
The ultimate goal of this Challenge was to verify whether transcriptomics data contain enough information for the diagnosis and/or prognosis of four human diseases (psoriasis, multiple sclerosis, COPD and lung cancer).
In addition, it was set up to:
- Identify the best methods for particular data types
- Determine how performance depends on the chosen methods
- Study whether the wisdom of crowds applies to diagnostic signatures
- Study the overlap of genes in the signatures (when applicable)
Background
Why a challenge to infer clinical phenotype from genomics data was needed in 2011
When the challenge was launched in 2011, the situation was as follows:
- A few gene expression-based biomarkers had been successfully adopted in clinical use:
  - MammaPrint (breast cancer recurrence assay; 70-gene profile; requires fresh tissue)
  - Oncotype DX (breast cancer recurrence assay; 21-gene profile; works on both fresh and fixed tissue)
- These successes were counterbalanced by several failures and reproducibility concerns:
  - Potti et al., Nat Med (2006), claimed to have identified genomic signatures of drug response. Three clinical trials in lung and breast cancer began in 2007 and 2008. The research was later deemed statistically flawed, at least 10 high-profile publications were retracted, and the clinical trials were stopped.
  - Amgen scientists tried to confirm the findings of 53 landmark papers in preclinical oncology research; only 6 (11%) were confirmed.
  - Bayer HealthCare reported that only about 25% of published preclinical studies could be validated.
Psoriasis
Psoriasis is the most prevalent autoimmune disease in the U.S.; according to current studies, as many as 7.5 million Americans (approximately 2.2 percent of the population) have psoriasis. It is a chronic inflammatory and hyperproliferative skin disease that, in addition to its cutaneous manifestations, is accompanied by inflammatory arthritis in up to 40% of cases.
The disease is diagnosed by physical examination of the skin lesions. Microscopic analysis of a psoriatic skin biopsy shows thick, red, flaky cells with no sign of inflammation, and blood tests can differentiate psoriatic arthritis from rheumatoid arthritis.
Psoriasis is typically treated with topical agents (both steroidal and non-steroidal) and with phototherapy (UVB, or UVA combined with a light-sensitizing medication). Systemic medications are also available, as are newer drugs that target the autoimmune response and specific components of the immune system (T cells, TNF, interleukins).

Clinical manifestation of psoriasis. (A) The red boxes show the sites where psoriasis most commonly affects the skin. (B) Schematic view of the skin structure of a healthy subject and a psoriatic patient. Psoriatic skin shows signs of inflammation and scales (dead skin).
Multiple sclerosis
Multiple sclerosis (MS) is an autoimmune disease that affects the central nervous system. The trigger of the autoimmune process in MS is unknown. MS is believed to result from some combination of genetic, environmental and infectious factors, and possibly other factors such as vascular problems. Studies of identical twins have shown a concordance rate for MS of 30%, suggesting that genetic background plays a limited but significant role in triggering the disease.
The symptoms of the disease result from inflammation, swelling, and lesions on the myelin.
There are several MS subtypes, defined by disease progression (see Figure 1): relapsing-remitting MS (RRMS), primary progressive MS (PPMS) and secondary progressive MS (SPMS). In 85% of patients, the disease follows a relapsing-remitting (RR) course, characterized by the onset or worsening of neurological symptoms (relapses) followed by partial or complete recovery (remissions).
As with most other autoimmune diseases, MS is significantly more common in women than in men (at least two to three times). It is most commonly diagnosed between the ages of 20 and 50. The risk of developing MS in the general population is 1 in 750, and over 2.5 million people worldwide are living with the disease. The disease can be managed, and its symptoms controlled with varying degrees of success, through an individualized, multifaceted approach that includes medications and other therapies. However, there is no cure for multiple sclerosis.
Diagnosis by a neurologist usually involves ruling out other nervous system disorders using invasive or expensive tests such as lumbar puncture, magnetic resonance imaging (MRI) of the brain and nerve function studies.

Progression of the disease for clinically isolated syndrome and the multiple sclerosis types. A clinically isolated syndrome (CIS) is an individual's first neurological episode, caused by inflammation or demyelination of nerve tissue. Multiple sclerosis can only be diagnosed once an MRI confirms lesions in the brain, which typically happens after multiple sites have been affected (usually over the course of several events). The main forms of MS are distinguished by their courses over time. RRMS, the most common form, describes patients who have relapses followed by periods of remission; an RRMS diagnosis requires a minimum of two relapses. PPMS patients have constant symptoms without remission. SPMS starts the same way as RRMS, but at some point remissions no longer occur.
COPD
COPD encompasses chronic obstructive bronchiolitis, with obstruction of the small airways, and emphysema, with enlargement of airspaces, destruction of lung parenchyma, loss of lung elasticity and closure of small airways. Although the disease manifests in the small airways, the challenge was to produce a COPD signature that is valid in the large airways, where samples are easier to collect.
COPD causes progressive airflow limitation that is not fully reversible and is associated with an abnormal inflammatory response to noxious particles or gases [1]. COPD is a major cause of chronic morbidity and mortality throughout the world, with prevalence varying across countries and population groups. In developed countries, smoking is a contributing factor to the disease.
As of 2011, COPD treatment was still in the active research and development phase. Pharmacotherapy, including long-acting bronchodilators and inhaled glucocorticosteroids, reduced symptoms and complications, but none of the existing medications could cure COPD or prevent the long-term decline in lung function.

COPD is a disease that is manifested in the small airways. The challenge is to produce a COPD signature that is valid in large airways where sample collection is easier to perform.

COPD stages and symptoms
The Global Initiative for Chronic Obstructive Lung Disease (GOLD) classifies COPD patients into GOLD stages 1 to 4 according to disease severity (stage 4 being the most severe). Diagnosis is based on spirometry (a test that measures expiratory airflow), performed with or without a bronchodilator (to differentiate COPD from asthma), together with questionnaires on respiratory symptoms.
Historically, GOLD stage 0 described a higher-risk population that did not present the clearer symptoms used to define stage 1. Because not all of these subjects eventually develop COPD, they were not included in this challenge. Subjects with alpha-1 antitrypsin deficiency, who represent a distinct group of COPD patients, were also excluded.
In summary, the COPD phenotype refers to GOLD stages 1 to 4, while controls are asymptomatic subjects without consistent respiratory symptoms.
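The spirometric part of GOLD staging can be sketched as a small function. The thresholds below (post-bronchodilator FEV1/FVC below 0.70, then FEV1 at 80%, 50% and 30% of the predicted value, where FEV1 is the forced expiratory volume in one second and FVC the forced vital capacity) are the commonly cited GOLD cut-offs, not values stated on this page, and the function name `gold_grade` is hypothetical. The symptom questionnaires that also entered the diagnosis are not modeled.

```python
from typing import Optional


def gold_grade(fev1_fvc: float, fev1_pct_predicted: float) -> Optional[int]:
    """Map post-bronchodilator spirometry to a GOLD grade (1-4).

    fev1_fvc: FEV1/FVC ratio (e.g. 0.62).
    fev1_pct_predicted: FEV1 as a percentage of the predicted value.
    Returns None when FEV1/FVC >= 0.70, i.e. no persistent airflow
    limitation by this spirometric criterion (assumed GOLD cut-offs).
    """
    if fev1_fvc >= 0.70:
        return None
    if fev1_pct_predicted >= 80:
        return 1  # mild
    if fev1_pct_predicted >= 50:
        return 2  # moderate
    if fev1_pct_predicted >= 30:
        return 3  # severe
    return 4      # very severe


print(gold_grade(0.62, 55))  # prints 2
```

Stage 0 subjects, who would not satisfy the FEV1/FVC criterion, fall into the `None` branch here, which mirrors their exclusion from the COPD class.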
Lung cancer
In 2006, medical expenses for cancer care in the United States were estimated at $104.1 billion. As the population ages, costs are expected to keep rising as cancer prevalence increases and expensive, targeted treatment strategies become the standard of care. The World Health Organization (WHO) projected that global cancer deaths would increase from 7.4 million in 2004 to 11.8 million in 2030, and that cancer would become the leading cause of death, followed by heart disease and stroke.
Non-small cell lung cancer (NSCLC) accounts for approximately 85% of all lung cancers. NSCLC is divided into adenocarcinoma (AC), squamous cell carcinoma (SCC) and large cell carcinoma (LCC) histologies.
NSCLC stage is generally defined by the TNM system. The T category describes the original (primary) tumor: its size and whether it has spread to surrounding tissue. The N category signifies any lymph node involvement (in and around the lungs), and the M category indicates whether the cancer has spread to other parts of the body, i.e. metastasized.
In overall TNM staging, stage 1 lung cancer is small and confined to one area of the lung. Stage 2 and 3 cancers are larger, may have grown into the surrounding tissues, and may involve cancer cells in the lymph nodes. Stage 4 cancer has spread to another part of the body.

Lung cancer subtypes. (A) Distribution of lung cancer subtypes in a study, with smoking status at the time of diagnosis. The pie chart shows the distribution of the non-small cell lung cancer (NSCLC) subtypes, SCC (squamous cell carcinoma), AC (adenocarcinoma) and LCC (large cell carcinoma), and of small cell lung cancer (SCLC). The distribution of current (red) and former (green) smokers is shown as a histogram for each subtype. (B) Schematic of the tissues involved in squamous cell carcinoma and adenocarcinoma.
The challenge
Challenge overview

Psoriasis Sub-challenge
The aim of this sub-challenge was to verify that a robust diagnostic signature for psoriasis can be extracted from gene expression data.
Participants were asked to develop and then submit a classifier that can stratify skin samples into one of two phenotype groups: psoriasis or control. Classifiers could be built using any publicly available gene expression data, with the related clinical, demographic and batch information, and were tested on an independent dataset.
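The train-then-test-on-independent-data workflow can be sketched with a deliberately simple classifier. The example below uses a nearest-centroid rule as a stand-in (it is not the method of any participating team); the gene values are hypothetical toy numbers, and real data such as the GEO series listed under Data would first need normalization and probe-to-gene mapping. The key point is that the scaling parameters and class centroids are estimated on the training set only and then applied unchanged to the test set.

```python
import statistics
from typing import Dict, List, Tuple

Matrix = List[List[float]]  # rows = samples, columns = genes


def fit_scaler(X: Matrix) -> Tuple[List[float], List[float]]:
    """Per-gene mean and SD, estimated on the training set only."""
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant genes
    return means, sds


def transform(X: Matrix, means: List[float], sds: List[float]) -> Matrix:
    """Z-score every gene with the (training) means and SDs."""
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def fit_centroids(X: Matrix, y: List[str]) -> Dict[str, List[float]]:
    """Mean expression profile of each phenotype group."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean(c) for c in zip(*rows)]
    return centroids


def predict(centroids: Dict[str, List[float]], X: Matrix) -> List[str]:
    """Assign each sample to the group whose centroid is nearest (Euclidean)."""
    preds = []
    for x in X:
        dist = {lab: sum((a - b) ** 2 for a, b in zip(x, mu))
                for lab, mu in centroids.items()}
        preds.append(min(dist, key=dist.get))
    return preds


# Hypothetical toy data: 2 genes; training and test sets from different studies.
X_train = [[5.0, 1.0], [5.2, 1.1], [8.0, 3.0], [8.3, 3.2]]
y_train = ["Control", "Control", "Psoriasis", "Psoriasis"]
X_test = [[5.1, 0.9], [8.1, 3.1]]

means, sds = fit_scaler(X_train)  # the test set never enters the fitting step
centroids = fit_centroids(transform(X_train, means, sds), y_train)
predictions = predict(centroids, transform(X_test, means, sds))
print(predictions)  # ['Control', 'Psoriasis']
```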

Multiple sclerosis stage Sub-challenge
The aim of this sub-challenge was to verify that a robust diagnostic signature for the different stages of relapsing-remitting multiple sclerosis (RRMS) can be extracted from gene expression data.
Participants were asked to develop and submit a classifier that can stratify MS patients into one of two phenotype groups, relapsing RRMS or remitting RRMS, based on the peripheral blood mononuclear cell (PBMC) transcriptome. Classifiers could be built using publicly available gene expression data, with clinical, demographic and batch information, and were tested on an independent dataset.
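Batch information matters here because training data had to be pooled from several GEO and ArrayExpress series. One simple adjustment, shown below as a hypothetical sketch, is to center each gene within its batch; it is a cruder alternative to dedicated methods such as ComBat and is not the approach of any particular team. It assumes every batch has a similar phenotype mix: GSE15245, for instance, lists only remitting samples, so centering it on its own mean would remove exactly the relapsing-versus-remitting difference the classifier is meant to learn.

```python
import statistics
from collections import defaultdict
from typing import Dict, List


def center_within_batch(expr: List[List[float]], batches: List[str]) -> List[List[float]]:
    """Subtract each gene's within-batch mean (rows = samples, columns = genes)."""
    members: Dict[str, List[int]] = defaultdict(list)
    for i, batch in enumerate(batches):
        members[batch].append(i)
    n_genes = len(expr[0])
    centered = [list(row) for row in expr]
    for idx in members.values():
        for g in range(n_genes):
            mu = statistics.fmean(expr[i][g] for i in idx)  # batch mean of gene g
            for i in idx:
                centered[i][g] = expr[i][g] - mu
    return centered


# Hypothetical example: two batches (e.g. two GEO series) with a constant offset.
expr = [[10.0, 2.0], [12.0, 4.0], [20.0, 5.0], [22.0, 7.0]]
batches = ["A", "A", "B", "B"]
print(center_within_batch(expr, batches))
# [[-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0], [1.0, 1.0]]
```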

Multiple sclerosis diagnostic Sub-challenge
The aim of this sub-challenge was to verify that a robust diagnostic signature for multiple sclerosis (MS) can be extracted from gene expression data.
Participants were asked to develop and submit a classifier that can stratify subjects into one of two phenotype groups, relapsing-remitting multiple sclerosis (RRMS) or control, based on the peripheral blood mononuclear cell (PBMC) transcriptome. Classifiers could be built using publicly available gene expression data, with clinical, demographic and batch information, and were tested on an independent dataset.

COPD Sub-challenge
The aim of this sub-challenge was to identify a classifier that can distinguish between COPD and control subjects using large airway gene expression data. At the time, publicly available training data were derived from both large and small airways, whereas the test data consisted of large airway data only. While gene signatures are the typical components of classifiers built from gene expression, we believe there is room to explore other biologically interpretable signatures that go beyond over- or under-expressed genes.

Schematic diagram of the COPD challenge. The training data (blue outline) consist of data from large airways (green symbols) and small airways (orange symbols), whereas the test data (yellow outline) consist of large airway data only.
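One example of a signature that goes beyond individual over- or under-expressed genes is a gene-set (pathway) score, which summarizes many genes into a single value per sample. The sketch below averages per-gene z-scores over the genes of a set; the gene names and values are hypothetical, and this is an illustration rather than the approach of any team. With training data pooled from large and small airways, computing z-scores on the pooled samples would mix tissue effects into the score, so one might standardize within each tissue instead.

```python
import statistics
from typing import Dict, List, Set


def gene_set_score(expr: Dict[str, List[float]], gene_set: Set[str]) -> List[float]:
    """Mean per-gene z-score over the genes of `gene_set`, for each sample.

    expr maps a gene symbol to its expression values (one value per sample).
    """
    members = [g for g in expr if g in gene_set]
    if not members:
        raise ValueError("no gene of the set is present in the data")
    z = {}
    for g in members:
        mu = statistics.fmean(expr[g])
        sd = statistics.pstdev(expr[g]) or 1.0  # a constant gene scores 0
        z[g] = [(v - mu) / sd for v in expr[g]]
    n_samples = len(expr[members[0]])
    return [statistics.fmean(z[g][j] for g in members) for j in range(n_samples)]


# Hypothetical genes and values for three samples.
expr = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [2.0, 4.0, 6.0],
    "GENE_C": [5.0, 5.0, 5.0],  # not in the set, ignored
}
scores = gene_set_score(expr, {"GENE_A", "GENE_B"})
print([round(s, 3) for s in scores])  # [-1.225, 0.0, 1.225]
```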

Lung cancer Sub-challenge
The aim of this sub-challenge was to classify adenocarcinoma (AC) and squamous cell carcinoma (SCC) samples, and their respective stages (I and II), based on the transcriptome of tumor samples. While gene signatures are the typical components of classifiers built from gene expression, we believe there is room to explore other biologically interpretable signatures that go beyond over- or under-expressed genes.
Data

Psoriasis Sub-challenge
Each participant could find any suitable training data from publicly available repositories. For convenience, we included a list of third party publicly available datasets that participants may be able to use for training purposes:
- GSE13355 - Normal skin: 58 samples, Lesional skin: 64 samples
- GSE14905 - Normal skin: 21 samples, Lesional skin: 28 samples

Multiple sclerosis stage Sub-challenge
Each participant could find any suitable training data from publicly available repositories. For convenience, we included a list of third party publicly available datasets that participants may be able to use for training purposes:
- GSE15245 - Relapsing RRMS: n/a, Remitting RRMS: 62 samples
- GSE19224 - Relapsing RRMS: 14 samples, Remitting RRMS: 14 samples
- E-MTAB-69 - Relapsing RRMS: 12 samples, Remitting RRMS: 14 samples

Multiple sclerosis diagnostic Sub-challenge
Each participant could find any suitable training data from publicly available repositories. For convenience, we included a list of third party publicly available datasets that participants may be able to use for training purposes:
- GSE14895 - RRMS: n/a, Relapsing RRMS: n/a, Remitting RRMS: n/a, Control: 11 samples
- GSE15245 - RRMS: n/a, Relapsing RRMS: n/a, Remitting RRMS: 62 samples, Control: n/a
- GSE19224 - RRMS: n/a, Relapsing RRMS: 14 samples, Remitting RRMS: 14 samples, Control: n/a
- GSE23832 - RRMS: 4 samples, Relapsing RRMS: n/a, Remitting RRMS: n/a, Control: 4 samples
- GSE24427 - RRMS: 50 samples, Relapsing RRMS: n/a, Remitting RRMS: n/a, Control: n/a
- GSE21942 - RRMS: n/a, Relapsing RRMS: n/a, Remitting RRMS: n/a, Control: 15 samples
- GSE26104 - RRMS: 8 samples, Relapsing RRMS: n/a, Remitting RRMS: n/a, Control: n/a
- E-MTAB-69 - RRMS: n/a, Relapsing RRMS: 12 samples, Remitting RRMS: 14 samples, Control: 18 samples
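Most of the datasets listed above contribute only one of the two classes, so dataset (batch) and phenotype are confounded when the series are pooled naively. The short check below tallies the counts from the list; treating relapsing and remitting samples as RRMS is an assumption, and the variable names are illustrative.

```python
from typing import Dict, Tuple

# Sample counts transcribed from the list above; "n/a" entries are omitted.
DATASETS: Dict[str, Dict[str, int]] = {
    "GSE14895": {"Control": 11},
    "GSE15245": {"Remitting": 62},
    "GSE19224": {"Relapsing": 14, "Remitting": 14},
    "GSE23832": {"RRMS": 4, "Control": 4},
    "GSE24427": {"RRMS": 50},
    "GSE21942": {"Control": 15},
    "GSE26104": {"RRMS": 8},
    "E-MTAB-69": {"Relapsing": 12, "Remitting": 14, "Control": 18},
}

MS_LABELS = ("RRMS", "Relapsing", "Remitting")  # assumption: all count as RRMS


def class_totals(counts: Dict[str, int]) -> Tuple[int, int]:
    """Return the (RRMS, Control) sample counts of one dataset."""
    ms = sum(counts.get(label, 0) for label in MS_LABELS)
    return ms, counts.get("Control", 0)


# Datasets in which both classes occur, i.e. where batch and class are not aliased.
both_classes = [name for name, c in DATASETS.items() if min(class_totals(c)) > 0]
total_ms = sum(class_totals(c)[0] for c in DATASETS.values())
total_ctrl = sum(class_totals(c)[1] for c in DATASETS.values())

print(both_classes)          # ['GSE23832', 'E-MTAB-69']
print(total_ms, total_ctrl)  # 178 48
```

Only two series contain both phenotypes, which is why within-series comparisons or careful batch modeling are preferable to simply concatenating all samples.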

COPD Sub-challenge
Training data could be obtained from any publicly available source.

Lung cancer Sub-challenge
Training data could be obtained from any publicly available source.
Rules and awards
Rules of the challenge can be viewed here.
Challenge outcome
- There is no one-size-fits-all method for classifying disease:
  - No single normalization method conferred a performance advantage
  - No single classification method conferred a performance advantage
  - The specifics of the methodology used to classify disease seem to be decisive in extracting signal from the data
- If the signal is strong, most methods will get the classification right, as was the case with psoriasis.
- If the signal is too weak or nonexistent, this can be recognized when no prediction attains statistical significance, as was the case with MS stages.
- When the signal is faint, the method used can be decisive. Crowd-sourcing is particularly relevant in these cases (COPD).
- The advantage of having many participants can be offset by the multiple testing problem that ensues
- The wisdom of crowds enhances the performance at least from the perspective of one of the performance metrics
- It is important to keep the test set data hidden from the participants to better represent the situation in the clinic
- Many of these lessons learned are consistent with the conclusions reached in the MAQC-II study (2010), to be discussed in a forthcoming session
- An open source software package was provided to the community that allows researchers worldwide to develop prediction models starting with raw microarray data.
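A common way to build a "wisdom of crowds" prediction is to average, for each sample, the ranks that the individual teams assigned to it. The sketch below assumes this rank-averaging scheme and uses the area under the ROC curve (AUROC) as the yardstick; neither is necessarily what the challenge used, and the team scores are invented so that each team makes a different mistake. When teams share the same errors, averaging cannot correct them, so a crowd does not always beat the best team.

```python
from typing import List, Sequence


def ranks(scores: Sequence[float]) -> List[float]:
    """Rank scores in ascending order (1 = lowest); ties get their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = avg_rank
        start = end + 1
    return result


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC via the Mann-Whitney U statistic (labels are 1 = disease, 0 = control)."""
    r = ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(ri for ri, y in zip(r, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def crowd_scores(team_scores: List[Sequence[float]]) -> List[float]:
    """Aggregate teams by averaging each sample's rank across teams."""
    per_team = [ranks(s) for s in team_scores]
    return [sum(col) / len(col) for col in zip(*per_team)]


# Hypothetical confidence scores from three teams for six samples.
labels = [1, 1, 1, 0, 0, 0]
teams = [
    [0.9, 0.8, 0.2, 0.3, 0.1, 0.4],
    [0.3, 0.9, 0.8, 0.1, 0.4, 0.2],
    [0.8, 0.2, 0.9, 0.3, 0.1, 0.7],
]
for t in teams:
    print(round(auroc(t, labels), 3))      # prints 0.778, then 0.889, then 0.778
print(auroc(crowd_scores(teams), labels))  # prints 1.0
```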
Challenge participants
More than 54 teams worldwide participated in the challenge. The map below shows participants per country.

Teams could participate in one or more sub-challenges. The number of teams submitting predictions for each sub-challenge is shown below.

Scoring and ranking
Scoring
Scoring Review Panel
- Richard A. Bonneau, New York University
- Alberto de la Fuente, CRS4 Bioinformatica
- Igor Jurisica, University of Toronto
- Daniel Marbach, MIT, Computational Biology Group
- Tamir Tuller, Tel Aviv University
IBM Scoring Team:
- Raquel Norel
- Erhan Bilal
- Gustavo Stolovitzky
Rationale behind the chosen scoring methodology
- Basic premise: no single metric can capture all the subtleties of a prediction.
- We used non-redundant metrics that highlight different qualities of a prediction:
  - Threshold-based versus threshold-free
  - Order-based versus confidence-based
  - Different ways of rewarding correct versus incorrect predictions
- All metrics must be generalizable to multi-class problems to accommodate the lung cancer sub-challenge.
- A metric should not reward pathological cases (e.g., predicting all subjects to be controls)
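The last two requirements can be illustrated with two textbook metrics on a hypothetical imbalanced test set. Plain accuracy rewards a classifier that calls every subject a control, whereas balanced accuracy (the mean of per-class recalls) does not, and it extends directly to multi-class problems such as the four lung cancer classes. These metrics are chosen for illustration only; the metrics actually used for scoring are described in the scoring paper by Norel et al. listed under Scientific publications.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of per-class recalls; defined for any number of classes."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced test set and a pathological "all Control" predictor.
y_true = ["Control"] * 90 + ["COPD"] * 10
y_pred = ["Control"] * 100
print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5, no better than chance
```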
Ranking
The Scoring Review Panel reviewed and approved the scoring methodology and procedures before the challenge closed, as well as the scoring results and final ranking shown below:

Overall best performing teams

- Team 221: PRB
  Team Members: Adi L. Tarca & Roberto Romero
  Institution: Wayne State University, Detroit, USA
- Team 227: COSBI
  Team Member: Mario Lauria
  Institution: Computational Systems Biology, Rovereto, Italy
- Team 161: BISON
  Team Members: M. Unger, P. Nandy, K.K. Dey, C. Zechner & H. Koeppl
  Institution: ETH, Zurich, Switzerland

Best performers announcement as published in Nature, 24 Jan. 2013, page 565
Final full ranking:

Challenge symposium
The symposium "The Diagnostic Signature Challenge: Smarter Algorithms for better Disease Detection" was held at the Omni Parker House Hotel in Boston, MA, USA, on 2–3 October 2012. The event included lectures, presentations by the best performers in the challenge, and social and networking events.
The objectives of the symposium were:
- to discuss and share experiences on sbv IMPROVER and the Diagnostic Signature Challenge
- to engage with experts in the fields of systems biology, crowd-sourcing and related topics
- to announce the best performing teams
- for the award winners to share their approaches with the scientific community.
Further details regarding the sbv IMPROVER Symposium 2012 can be found at the following links:
- Symposium Hosts
- Symposium Keynote Speakers
- Symposium Agenda
- Symposium presentations:
Session 1: The Need for Research Verification in Biomarker Discovery
- Keynote Presentation: Biomarker Discovery and Qualification; Donna L. Mendrick, Director, Division of Systems Biology, NCTR/FDA
- Keynote Presentation: Moving Beyond the Mean: The role of Variation in Determining Phenotype; John Quackenbush, Prof. of Computational Biology & Bioinformatics, Dana-Farber Cancer Institute
- Crowdsourcing Predictive Scientific Problems to do more in less Time; Will Cukierski, Kaggle Inc.
- IMPROVER and its Application to PMI R&D; Manuel C. Peitsch, VP, Biological Systems Research, PMI Research & Development & Gustavo Stolovitzky, Manager, Functional Genomics & Systems Biology, IBM Computational Biology Center
Session 2: The Diagnostic Signature Challenge
- Presentation on Scoring and Overall Results; Gustavo Stolovitzky, Manager, Functional Genomics & Systems Biology, IBM Computational Biology Center
- Presentation by the Overall Best Performing Team; Adi L. Tarca, Wayne State University (USA)
Session 3: Diagnostic Signature Challenge: Best Performer Talks
- Presentation by the Overall Second Best Performing Team; Mario Lauria, COSBI (Italy)
- MS: Genomic Biomarker Status and Challenge Scoring; Tamir Tuller, Assistant Professor, Laboratory of Computational Systems Biology, Tel Aviv University & Erhan Bilal, IBM Computational Biology Center
- Presentation by the Best Performing Team of Sub-challenge Multiple Sclerosis; Mario Lauria, COSBI (Italy)
- Presentation by the Overall Third Best Performing Team; Michael Unger, ETH Zurich (Switzerland)
- Psoriasis: Genomic Biomarker Status and Challenge Scoring; Stephanie Boué, PMI R&D & Raquel Norel, IBM Computational Biology Center
- Presentation by the Best Performing Team of Sub-challenge Psoriasis; Kai Wang, ISB (USA)
- COPD: Genomic Biomarker Status and Challenge Scoring; Julia Hoeng, PMI R&D & Raquel Norel, IBM Computational Biology Center
- Presentation by the Best Performing Team of Sub-challenge COPD; Lin Song, UCLA (USA)
- Lung Cancer: Genomic Biomarker Status and Challenge Scoring; Stephanie Boué, PMI R&D & Erhan Bilal, IBM Computational Biology Center
- Presentation by the Best Performing Team of Sub-challenge Lung Cancer; Sol Efroni, Bar Ilan University (Israel)
- Lessons Learned from the Challenge: Biomedicine & Computational; Stephanie Boué, PMI R&D & Gustavo Stolovitzky, IBM Computational Biology Center
- Roundtable: Feedback from the Diagnostics Signature Challenge
Session 4: Beyond Diagnostic Signatures / Towards Network Based Diagnostics
- Keynote Presentation: Measuring and Modeling Variability in responses to Therapeutic Drugs; Peter Sorger, Professor of Systems Biology, Harvard Medical School (HMS)
- Keynote Presentation: Identification of Predictive Response Markers Based on the Understanding of the Drug Mechanism of Action; Birgit Schoeberl, Vice President of Discovery, Merrimack Pharmaceuticals
- Case Study from the Challenge: Performance of a Mechanism-Based Method; Florian Martin & Yang Xiang, PMI R&D
Session 5: Reflection, Outlook and Further Challenges
- Species Translational Challenge; Leonidas Alexopoulos, Lecturer, Department of Mechanical Engineering & Group Leader, Systems Biology and Bioengineering Group, National Technical University of Athens
- Future IMPROVER Challenges 2-4; Julia Hoeng, Manager Computational Disease Biology, PMI Research & Development
- Symposium Gallery
Media library
Scientific publications
- Tarca, A.L. et al. Strengths and limitations of microarray-based phenotype prediction: lessons learned from the IMPROVER Diagnostic Signature Challenge. Bioinformatics 29 (22), 2892–2899 (2013).
- Hoeng, J. et al. sbv IMPROVER Diagnostic Signature Challenge: Preface to this special issue. Systems Biomedicine 1 (4), 193–195 (2013).
- Rhrissorrakrai, K. et al. sbv IMPROVER diagnostic signature challenge: design and results. Systems Biomedicine 1 (4), 196–207 (2013).
- Norel, R. et al. sbv IMPROVER Diagnostic Signature Challenge: scoring strategies. Systems Biomedicine 1 (4), 208–216 (2013).
- Tarca, A.L. et al. Methodological approach from the Best Overall Team in the sbv IMPROVER Diagnostic Signature Challenge. Systems Biomedicine 1 (4), 217–227 (2013).
- Lauria, M. Rank-based transcriptional signatures. Systems Biomedicine 1 (4), 228–239 (2013).
- Nandy, P. et al. Learning diagnostic signatures from microarray data using L1-regularized logistic regression. Systems Biomedicine 1 (4), 240–246 (2013).
- Zhao, C. et al. Relapsing-remitting multiple sclerosis classification using elastic net logistic regression on gene expression data. Systems Biomedicine 1 (4), 247–253 (2013).
- Cho, J.-H. et al. Kernel-based method for feature selection and disease diagnosis using transcriptomics data. Systems Biomedicine 1 (4), 254–260 (2013).
- Song, L., Horvath, S. Predicting COPD status with a random generalized linear model. Systems Biomedicine 1 (4), 261–267 (2013).
- Ben-Hamo, R. et al. Classification of lung adenocarcinoma and squamous cell carcinoma samples based on their gene expression profile in the sbv IMPROVER Diagnostic Signature Challenge. Systems Biomedicine 1 (4), 268–277 (2013).
- Tian, S., Suárez-Fariñas, M. Hierarchical-TGDR. Systems Biomedicine 1 (4), 278–287 (2013).
The challenge in the news
Tutorials and webinars
Flyers and posters
Testimonials
What they say about the challenge

