MEDIC

The Metagenomics Diagnosis for IBD Challenge (MEDIC) aimed to investigate the diagnostic potential of metagenomics data for distinguishing patients with Inflammatory Bowel Disease (IBD) from non-IBD subjects. Participants also attempted to classify Ulcerative Colitis (UC) and Crohn’s Disease (CD) subjects with data obtained from non-invasive clinical samples. The challenge came with a prize pool of $12,000.


BACKGROUND

Inflammatory bowel disease (IBD)

The umbrella term “Inflammatory bowel disease” includes a spectrum of chronic inflammatory disorders which recurrently affect the gastro-intestinal tract. Ulcerative colitis (UC) and Crohn’s disease (CD) are the two main clinically defined manifestations of IBD, each with distinctive clinical and pathological features (Titz et al., 2018).

A systematic review of population-based studies from 1990 to 2016 provides information on the incidence (the number of newly diagnosed cases in a population and given time period) and prevalence (the total number of cases in a population at a given time) of UC and CD around the world (Ng et al., 2017).

Europe and North America show the highest prevalence values worldwide, e.g., 505 and 286 UC subjects per 100,000 in Norway and the USA, respectively, and 322 and 319 CD subjects per 100,000 in Germany and Canada, respectively. While the incidence of IBD has stabilized in western countries, IBD has become a global disease with accelerating incidence in newly industrialized countries (Ng et al., 2017).

Figure 2. Pathomechanisms and selected biomarkers of IBD. The main pathogenesis-associated changes are shown for UC and CD. For both diseases, changes in gut microbiota, disruption of the epithelial barrier function, and chronic immune activation are observed. Differences have also been reported, such as for immune-regulatory processes and epithelial effects.
Biomarkers can be measured directly in tissue biopsies or upon release into the gut (e.g., fecal calprotectin) or the blood (e.g., autoreactive antibodies).
From Titz et al., Int. J. Mol. Sci. 2018, 19, 2775 (Titz et al., 2018)

Clinical and pathological features of UC and CD.

Disease | UC | CD
Mucosal inflammation | non-transmural | transmural
Location | generally begins in the rectum and extends proximally to the colon | entire gastrointestinal tract: from the oral cavity to the rectum
Inflammatory pattern | continuous area | patchy area
Clinical symptoms | bloody diarrhea | abdominal pain with weight loss
Co-morbidities | musculoskeletal, dermatological, ocular, and hepatobiliary co-morbidities | similar to those described in UC
Immune response | atypical T helper cell (Th) 2 response mediated by secretion of IL-13 by natural killer T cell | Th1 and Th17 cytokine profiles are dominant

IBD and Diagnosis

Endoscopy constitutes the gold standard for the diagnosis and monitoring of IBD (Bernstein et al., 2010). The diagnosis is usually confirmed by biopsies taken during colonoscopy and complemented with the measurement of molecular biomarkers including fecal calprotectin, serum C-reactive protein (CRP), and serum antibody markers including autoantibodies and microbial and peptide antibodies (Mitsuyama et al., 2016; Takedatsu et al., 2018). However, the low sensitivity and high variability of these biomarkers limit their clinical utility (see Supplementary Table S2, “IBD biomarker products”, in Titz et al., 2018). Thus, there is a need to identify novel molecular biomarkers that can be assessed with less invasive methods and could benefit IBD clinical management and treatment.


IBD and Microbiome

IBD comprises complex genetic disorders with multiple contributing genes (Bonen and Cho, 2003). However, not all subjects carrying mutations in the identified genes develop IBD. Other components, such as diet and microbiota, also seem to play a role in the etiology of the disease (Ananthakrishnan et al., 2018; Scotti et al., 2017). The human microbiome is composed of various microorganisms that colonize different body sites, such as the gut, mouth, genitals, skin, and airways, and varies in composition. The microbiome is recognized to benefit the host by supporting the maintenance of homeostasis, for example by contributing to the metabolism of nutrients, detoxification, and immunity, and by preventing the propagation of pathogenic microbes (Koppel et al., 2017; Lloyd-Price et al., 2016). A balanced interaction of microbes with the host plays an important part in preserving health. The dynamics and function of the microbiota can be influenced by many host-related and environmental factors, such as age, gender, diet, and drugs. Dysbiosis, a disruption of this balance, is associated with skin and neurological disorders as well as with immune-related diseases, metabolic diseases, and inflammatory bowel disease (Scotti et al., 2017).

The link between the pathogenesis of IBD and the intestinal microbiota has been established in: (i) animal models of colitis, showing that germ-free conditions prevent inflammation (Powrie and Leach, 1995); and (ii) human studies, showing that probiotics or surgical diversion of the fecal stream help the management of IBD and improve inflammation (Ganji-Arjenaki and Rafieian-Kopaei, 2018; Larson and Pemberton, 2004). Evidence also suggests that microbiome dysbiosis may cause an inappropriate immune response that alters the integrity of the intestinal epithelial barrier. An increase in epithelial permeability allows further infiltration of microbial organisms that, in turn, provoke further immune responses (Maloy and Powrie, 2011).

The characterization of the microbiome relies on 16S or shotgun sequencing of metagenomes from fecal or intestinal biopsy samples. Recent cross-sectional and longitudinal studies investigated microbiome changes in CD and/or UC compared to non-IBD using metagenomics sequencing data, and reported differences in composition and abundances between subjects with IBD and non-IBD subjects (He et al., 2017; Santoru et al., 2017; Schirmer et al., 2018).

The analysis of raw metagenomics data consists of converting sequence reads, by clustering or mapping, into relative abundances of operational taxonomic units (OTUs), which can be annotated with taxonomy ranks, pathways, or microbial genes. A plethora of approaches exists to analyze shotgun metagenomic sequencing data for taxonomic and functional profiling (Poussin et al., 2018). The taxonomic or pathway abundance matrix constitutes the starting point for downstream analyses, visualization, and interpretation (Poussin et al., 2018), or possibly for machine learning and the identification of discriminative metagenomics features and models predictive of IBD status (Figure 3). Our new sbv IMPROVER Challenge aims to explore this new avenue.
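
As a rough illustration of the last step described above, the following Python sketch turns per-sample read counts into relative abundances and arranges them as a samples-by-features matrix. It is a minimal sketch, not the processing used in the Challenge: the function names, the sample identifiers, and the read counts are all hypothetical, and real pipelines perform read mapping and annotation before this point.

```python
def relative_abundances(counts):
    """Convert per-sample read counts into relative abundances.

    counts: dict mapping sample id -> dict mapping taxon name -> read count.
    Returns the same structure; each non-empty sample's values sum to 1.0.
    """
    result = {}
    for sample, taxa in counts.items():
        total = sum(taxa.values())
        if total == 0:
            # A sample with no assigned reads has no defined composition.
            result[sample] = {taxon: 0.0 for taxon in taxa}
        else:
            result[sample] = {taxon: n / total for taxon, n in taxa.items()}
    return result


def to_matrix(abundances):
    """Arrange abundances as a samples x features matrix with a fixed column order."""
    features = sorted({taxon for taxa in abundances.values() for taxon in taxa})
    samples = sorted(abundances)
    # Taxa not observed in a sample get an abundance of 0.0.
    matrix = [[abundances[s].get(f, 0.0) for f in features] for s in samples]
    return samples, features, matrix


# Hypothetical read counts, for illustration only.
toy_counts = {
    "sample_A": {"Bacteroides": 60, "Faecalibacterium": 30, "Escherichia": 10},
    "sample_B": {"Bacteroides": 20, "Escherichia": 20},
}
samples, features, matrix = to_matrix(relative_abundances(toy_counts))
```

Each row of `matrix` can then serve as the feature vector of one sample for a downstream classifier.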

Figure 3. Schematic view of possible analysis paths and methods used for metagenomics data processing and downstream computational evaluation.

Modified from Poussin et al., Drug Discovery Today. 2018, 23(9), 1644 (Poussin et al., 2018)

References

The challenge

Aim

With the aim of finding the best classification algorithm that can be used in diagnosing Inflammatory Bowel Disease with data obtained from non-invasive clinical samples, the basic question we addressed to our participants was: Can you predict IBD status using metagenomics data?

More specifically, MEDIC aimed to verify that shotgun metagenomics sequencing data is sufficiently informative to allow for accurate classification of human subjects as:

  1. IBD vs. non-IBD
  2. UC vs. non-IBD
  3. CD vs. non-IBD
  4. UC vs. CD
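
The four 2-class problems listed above can all be derived from a single three-way diagnosis per subject. The Python sketch below shows one way to do this, assuming each subject carries exactly one of the labels "UC", "CD", or "non-IBD". The `TASKS` table, the `binary_labels` helper, and the subject identifiers are hypothetical and are not taken from the Challenge materials.

```python
# Each task maps its name to (positive classes, negative classes).
TASKS = {
    "IBD vs. non-IBD": ({"UC", "CD"}, {"non-IBD"}),
    "UC vs. non-IBD": ({"UC"}, {"non-IBD"}),
    "CD vs. non-IBD": ({"CD"}, {"non-IBD"}),
    "UC vs. CD": ({"UC"}, {"CD"}),
}


def binary_labels(diagnoses, task):
    """Return {subject: 1 or 0} for the subjects relevant to the task.

    Subjects whose diagnosis belongs to neither class of the task are dropped,
    e.g. non-IBD subjects in "UC vs. CD".
    """
    positive, negative = TASKS[task]
    labels = {}
    for subject, diagnosis in diagnoses.items():
        if diagnosis in positive:
            labels[subject] = 1
        elif diagnosis in negative:
            labels[subject] = 0
    return labels


# Hypothetical subjects, for illustration only.
toy_diagnoses = {"s1": "UC", "s2": "CD", "s3": "non-IBD", "s4": "CD"}
ibd_task = binary_labels(toy_diagnoses, "IBD vs. non-IBD")
uc_cd_task = binary_labels(toy_diagnoses, "UC vs. CD")
```

Note that the first problem pools UC and CD into one positive class, while the last one excludes non-IBD subjects entirely, so the four problems do not share the same set of subjects.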

With the analysis of the predictions submitted in the challenge, our goal is to answer the following scientific questions:

  • Which predictive computational approaches are the most accurate across the four 2-class problems described above?
  • What do the most discriminative metagenomic features tell us?
    • Are they rather based on taxonomy, functions/pathways and/or other types, e.g., k-mers?
    • Are they distinct between UC vs non-IBD and CD vs non-IBD or do they show commonalities?

Figure 1. Summary of the MEDIC challenge

Challenge overview

The Challenge was split into two sub-challenges:

  • In the first sub-challenge (“MEDIC RAW”), participants were provided with shotgun metagenomics sequencing reads, so that they could process the metagenomics data with the analysis pipeline of their choice to address the Challenge.
  • In the second sub-challenge (“MEDIC PROCESSED”), participants were provided with pre-calculated taxonomic and pathway abundance matrices derived from the raw data. This allowed data scientists with no access to metagenomics analysis pipelines to solve the Challenge, and made it possible to compare the performance of classification methods independently of the pre-processing steps.

Participants could take part in either one or both sub-challenges.

Data

Organizers provided participants with shotgun metagenomics sequencing data as raw and processed data for predictor model training and testing as described in the Technical Document.

Figure 5. MEDIC datasets

Rules for participation

The following content merely summarizes the key points of the Challenge Rules*. Your participation in the Challenge* requires that you have read and understood all the terms and conditions that apply to it.

  • You are responsible to ensure that your participation in the challenge does not violate (1) any local laws, regulations or policies which may be applicable to you; and/or (2) any policies, regulations or rules of your employer and/or affiliated entity, including any that relate to working with, receiving funding from, or otherwise engaging with a company engaged in the making, marketing and/or selling of tobacco products.
  • To enter the Challenge*, you must register with the Site*, create or join a team, download the Challenge Data* available on the Site, and provide a Submission* in accordance with the Challenge Rules, before the submission deadline (15th of January 2020).
  • You must have one individual (the “Team Leader”) who will be primarily responsible, among the teammates, for your and your team’s communications with the Challenge Organizers* and to whom any incentive (where applicable, according to the Challenge Rules) will be distributed on your behalf.
  • With respect to the Challenge Data, you will be granted a limited license (according to the terms described in the Challenge Rules) to use such data solely for purposes of preparing a Submission to be sent to the Challenge Organizers*.
  • You can submit multiple Submissions per team for each sub-challenge described in the Challenge Rules. The Submissions will be scored by comparing the predictions to the “gold standard” that is unseen by you or any of the participants and, after scoring, only your best Submission will be retained for final ranking.
  • You acknowledge and agree that the Challenge Organizers shall be granted a license on the Submission to publish and/or use the Submission as specified in the Challenge Rules.
  • Except to the extent embodied in the Submission, you shall retain all rights, title and interest in any intellectual property rights in and to the method (algorithm) used by you in creating the Submission, and are solely responsible for taking such steps as may be necessary to secure any such rights. The Challenge Organizers shall be granted a license to use, review, assess, test and otherwise analyze such method, as specified in the Challenge Rules.
  • The 3 Best Performing Entrants (Teams)* of each sub-challenge are eligible for a 2,000 USD prize to be shared among team members at the discretion of the Team Leader.

*All capitalized expressions (e.g. “Challenge Rules”, “Challenge Data”, “Submission”) shall have the meaning provided for in the Challenge Rules.


Awards & Opportunities

  • Contribute and help the scientific community to benchmark computational methods objectively and establish standards and best practices in computational metagenomics data analysis.
  • Win a 2,000 USD cash prize that will be awarded to each of the three best performing teams of each sub-challenge (see Rules).
  • Contribute to writing peer-reviewed scientific article(s) describing the outcome of the Challenge.
  • Show your data science skills
  • Receive an independent assessment of your methods
  • Collaborate with other researchers and grow your professional network

Challenge participants

50 participants in 26 teams.


sbv participation map

Scoring and ranking

Scoring

Gold Standard

Each submission is scored by comparing the predictions to the “gold standard”, which corresponds to the true class labels of the subjects from which the microbiome samples have been collected.

Scoring Methodology

Predefined metrics are applied to score anonymized participants’ predictions. Complementary metrics may be used to evaluate different aspects of the submitted predictions. Scores are then aggregated to provide final ranking of participants.

To avoid the optimization toward the maximization of specific scoring metrics, the scoring methods and metrics are only disclosed once the scoring is completed in accordance with the Challenge Rules.
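
The metrics and the aggregation method are not given here, so the following Python sketch is only a hypothetical illustration of how scores from several metrics could be combined into one ranking. It assumes that higher scores are better for every metric and aggregates by mean rank, with tied teams sharing the average of the ranks they span. The metric names, team names, and scores are invented for the example.

```python
def ranks(scores):
    """Rank teams by score (higher is better); tied teams share the average rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    result = {}
    i = 0
    while i < len(ordered):
        # Extend j to cover every team tied with the team at position i.
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for team in ordered[i:j + 1]:
            result[team] = average_rank
        i = j + 1
    return result


def aggregate(metric_scores):
    """metric_scores: {metric: {team: score}}. Returns {team: mean rank}."""
    per_metric = [ranks(team_scores) for team_scores in metric_scores.values()]
    teams = list(per_metric[0])
    return {t: sum(r[t] for r in per_metric) / len(per_metric) for t in teams}


# Invented metrics and scores, for illustration only.
toy_scores = {
    "metric_1": {"A": 0.9, "B": 0.8, "C": 0.8},
    "metric_2": {"A": 0.7, "B": 0.9, "C": 0.6},
}
mean_ranks = aggregate(toy_scores)
final_ranking = sorted(mean_ranks, key=mean_ranks.get)  # best team first
```

Rank-based aggregation is one common way to combine metrics that live on different scales; other schemes, such as averaging normalized scores, would be equally plausible here.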

Once all submissions are scored and the ranking established, teams are associated with their respective submission numbers and the winners announced.

Tie Resolution

If several teams obtained the same aggregated score, incentives would be allocated according to the Challenge Rules.

Scorers and Scoring Review Panel

A team of researchers from PMI R&D, Philip Morris Products S.A., in Neuchâtel (Switzerland) established a scoring methodology and performed the scoring on the blinded submissions under the review of an independent Scoring Review Panel including experts in the field of metagenomics and systems biology. Panel members reviewed the scoring strategy and procedure for the Challenge to ensure fairness and transparency.

Procedures

Blinded scoring: submissions are anonymized before scoring, so that the scoring team does not have access to the identity of the participating teams or the members of the teams. To help us maintain this, submissions (e.g. prediction files and write up) should not include any information regarding the identity or affiliations of the team or the members of the team.

Submissions and significance: Each submission consists of one .zip archive containing all the necessary files, in the specified format. At least one of the submissions must be significantly better than random prediction in at least one metric. The threshold score above which a prediction is considered significant is defined as the 95th percentile of the distribution of random prediction scores. All predictions are scored by computing the predefined metric(s) and compared to a random prediction score distribution to assess whether a prediction is better than random. If these requirements are not met, the Challenge organizers retain the right not to declare a best performer, in accordance with the Challenge Rules.
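
The significance check described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the organizers' procedure: accuracy stands in for the undisclosed metric(s), random predictions are drawn uniformly from the observed classes, the percentile uses the nearest-rank definition, and the function names and parameters (`n_random`, `seed`) are hypothetical.

```python
import math
import random


def accuracy(truth, predicted):
    """Fraction of subjects whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def random_score_threshold(truth, n_random=1000, percentile=95, seed=0):
    """Score at the given percentile of the random-prediction score distribution."""
    rng = random.Random(seed)  # fixed seed so the threshold is reproducible
    classes = sorted(set(truth))
    scores = []
    for _ in range(n_random):
        guess = [rng.choice(classes) for _ in truth]
        scores.append(accuracy(truth, guess))
    scores.sort()
    # Nearest-rank percentile: smallest score with at least `percentile` percent
    # of the random scores at or below it.
    rank = max(1, math.ceil(percentile / 100 * len(scores)))
    return scores[rank - 1]


def is_significant(truth, predicted, **kwargs):
    """True if the prediction scores strictly above the random threshold."""
    return accuracy(truth, predicted) > random_score_threshold(truth, **kwargs)


# Toy gold standard with 40 subjects and two balanced classes.
toy_truth = [0, 1] * 20
threshold = random_score_threshold(toy_truth)
```

With this construction, a prediction that merely matches chance level falls below the threshold, whereas a perfect prediction of `toy_truth` clears it.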

Ranking

The ranking in each subchallenge will be announced shortly.

Media library

Scientific publications

We are currently preparing the challenge outcome publication. Stay tuned.

The challenge in the news

Tutorials and webinars

Flyers and posters


Testimonials

What they say about the challenge