Microbiomics challenge

Accurately determining the composition and function of the microbiome can shed light on its role in diseases and lead to the development of new therapies and diagnostic tools.
The Microbiota composition prediction challenge was designed to evaluate the performance of computational microbiome analysis pipelines for their ability to predict the microbial composition of samples based on sequencing data.

Specifically, the questions that we aimed to address were the following:

  • Which pipelines best recover bacterial community composition and relative abundance?
  • Do technical biases and specific microbial composition affect the performance?



Microbiology is the study of microscopic organisms called microbes or microorganisms. Diverse types of microorganisms exist, including bacteria, archaea, and viruses. Microorganisms populate most of the earth and can be found in every part of the biosphere, including soil and oceans. They are also present on the epithelial surfaces and in the digestive tract of higher organisms such as humans, and the analysis of the composition of these populations is a rapidly expanding field of research.

The microbiome

In higher organisms, the microbiome comprises a complex collection of microorganisms, colonizing different body niches, such as the gut, mouth, genitals, skin, or airways. The composition of this microorganism population varies depending on the body part and the health status of the individuals.

The human microbiome is known to have a beneficial role for homeostasis, assisting for example in the bioconversion of nutrients and detoxification, supporting immunity, protecting against pathogenic microbes, and maintaining host development, metabolism and physiology (Koppel et al., 2017; Lloyd-Price et al., 2016).

It is now understood that a finely balanced interaction between microbes and the host is essential to health. Moreover, growing evidence suggests that the function of the indigenous microbiota can be influenced by many factors, including genetics, diet, age, and toxins. The disruption of this balance, called dysbiosis, is associated with a plethora of diseases, including cancers, immune-related diseases, metabolic diseases, inflammatory bowel disease, pulmonary pathologies, oral diseases, skin problems, and neurological disorders (Benson et al., 2010; Blázquez and Berin, 2017; Caminero et al., 2016; Galipeau et al., 2015; Koren et al., 2012; Riiser, 2015; Roy and Trinchieri, 2017; Scher et al., 2016; Schuppan et al., 2009; Shukla et al., 2017; Sommer and Backhed, 2013; Turnbaugh et al., 2007; Vatanen et al., 2016; Vogtmann and Goedert, 2016).


Introduction to the microbiome. Microorganisms are found in many environments on earth, including soil, the sea floor, and the human body, which are among the most studied. The figure shows the relative abundances of four dominant bacterial phyla at different body sites: mouth (Bik et al., 2010), distal esophagus (Pei et al., 2004), lung (Beck et al., 2012), and gut (Costello et al., 2009).

The common feature of these unhealthy conditions is the loss of microbiota diversity, defined as a decrease in the number and abundance of distinct types of microorganisms (Huttenhower et al., 2012; Mosca et al., 2016). Lower microbiome richness has been associated with metabolic dysfunctions, skin disorders, gastrointestinal disorders, and low-grade inflammation (Alekseyenko et al., 2013; Cotillard et al., 2013; Le Chatelier et al., 2013). Therefore, interrogating the composition of the microbiome can shed light on the etiology of diseases and, in the future, microbial abundances could potentially be used as markers for disease diagnosis.
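To make the notion of diversity concrete, the sketch below computes two common summaries from per-taxon read counts: richness (the number of distinct taxa observed) and the Shannon index (which also accounts for how evenly the taxa are represented). The choice of the Shannon index, the function names, and the counts are illustrative assumptions; the studies cited above may have used other measures.

```python
import math

def richness(counts):
    """Number of distinct taxa observed at least once."""
    return sum(1 for c in counts.values() if c > 0)

def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with nonzero counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h

# Hypothetical read counts per genus for two illustrative samples.
balanced = {"Bacteroides": 250, "Prevotella": 250,
            "Faecalibacterium": 250, "Roseburia": 250}
depleted = {"Bacteroides": 970, "Prevotella": 30,
            "Faecalibacterium": 0, "Roseburia": 0}

for name, sample in (("balanced", balanced), ("depleted", depleted)):
    print(name, richness(sample), round(shannon_index(sample), 3))
```

In this toy example the depleted sample has both fewer taxa and a lower Shannon index than the balanced one, which is the pattern described as a loss of diversity.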

Technologies and tools for microbiome analysis

Advances in genome sequencing technologies have enabled progress in the characterization of the microbial diversity, leading to a rapid expansion of the field known as microbiomics: the study of DNA of a microbial community.

An accurate analysis of microbiome sequencing data (e.g. accurate taxonomic assignment and relative abundance estimates) relies on computational methods. A plethora of analysis tools has been developed and published. However, limited information on the performance of these methods and on the contexts in which they apply makes it difficult for scientists to select the most appropriate software.

Initially, the evaluation of computational methods in microbiome analysis was limited to authors benchmarking their own method against existing methods when publishing a novel or improved method. However, this kind of evaluation remains restricted and difficult: only a limited number of methods are generally compared in a publication, authors risk falling into the “self-assessment trap”, which leads to biased results (Norel et al., 2011), and there is little consensus about benchmarking datasets and evaluation metrics in microbiomics.

For this reason, new initiatives such as the one presented here are undertaken to evaluate computational methods in microbiomics independently, comprehensively, and objectively (see also Assemblathon (Bradnam et al., 2013; Earl et al., 2011) and the CAMI initiatives (http://www.cami-challenge.org/)).

Microbiomics analysis pipeline in a nutshell


Analysis pipeline steps: Sample pipeline for the analysis of shotgun data

Quality control of reads: QC tools applied at this step check that the raw data are of good quality and provide insights for filtering/trimming.

Trimming/Filtering of low quality reads: Trimming refers to shortening sequencing reads by removing bases with poor quality base calls and bases from sequencing adapters. Filtering refers to removing sequencing reads completely, for instance when the average quality of the read is below a certain threshold, or when the trimmed read becomes too short.

Host genome contamination removal: All unwanted reads that belong to the host genome are filtered out.

Taxonomic assignment: Identification of the microbiome profile, i.e. the genomes represented in the sample and their abundances.
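The trimming and filtering step described above can be sketched in a few lines. This is a minimal illustration, not the behavior of any particular tool: it assumes Phred33-encoded quality strings, trims only from the 3' end, ignores adapter removal, and uses hypothetical thresholds (`min_q`, `min_len`, `min_mean_q`) and function names.

```python
PHRED_OFFSET = 33  # ASCII offset of the Phred33 encoding

def trim_3prime(seq, qual, min_q=20):
    """Trimming: drop bases from the 3' end until one has quality >= min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < min_q:
        end -= 1
    return seq[:end], qual[:end]

def keep_read(qual, min_len=50, min_mean_q=20):
    """Filtering: discard reads that are too short or too poor on average."""
    if not qual or len(qual) < min_len:
        return False
    mean_q = sum(ord(c) - PHRED_OFFSET for c in qual) / len(qual)
    return mean_q >= min_mean_q

# Toy read: 'I' encodes quality 40 and '#' encodes quality 2 in Phred33.
seq, qual = "ACGTACGTAC", "IIIIIIII##"
trimmed_seq, trimmed_qual = trim_3prime(seq, qual)
print(trimmed_seq, trimmed_qual)

# The toy read is short, so min_len is lowered for the demonstration.
print(keep_read(trimmed_qual, min_len=5))
```

With the default `min_len=50` the same toy read would be filtered out, which mirrors the case of a trimmed read that has become too short.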


The challenge


The biological interpretation of changes to the microbiome relies on the accurate qualitative and quantitative measurement and inference of the microbiome community composition and function, using advanced sequencing technologies and computational analysis approaches. Choosing the most suitable tool is challenging, as there is a large and ever-increasing variety of computational methods, and the issue of how to objectively benchmark them is still being explored.

A few crowdsourced initiatives have been conducted to evaluate the performance of metagenomics data analysis methods and provide guidance to the scientific community. The two Assemblathon efforts, run in 2010 and 2012 (Earl et al., 2011; Bradnam et al., 2013), focused on evaluating the performance of genome assembly methods. The Critical Assessment of Metagenome Interpretation (CAMI) team, in collaboration with the metagenomics community, organized a challenge in 2015 that aimed at evaluating methods in metagenomics for assembly, binning, and taxonomic profiling. CAMI provided an extensive benchmarking dataset to participants. Among the many results they collected, CAMI observed that (i) a good assembly step is crucial for subsequent binning; (ii) taxonomic profiling tools accurately predict higher-level taxa (e.g., family level), while giving poor predictions for lower-level taxa (e.g., species level).

In a spirit of continuity with CAMI and Assemblathon, the microbiota composition prediction challenge aimed at assessing objectively the performance of microbiomics computational analysis pipeline(s) as a whole, i.e. from quality control to taxonomy profiling, for the recovery of relative abundance and taxonomy assignment of bacterial communities, rather than assessing the individual steps of the process as CAMI already did.
The participants were provided with shotgun DNA sequencing data for several microbiome samples and asked to predict, at the phylum, genus, and species level, the composition and relative abundance of bacterial communities present in each sample.
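As an illustration of what a prediction at several taxonomic levels involves, the sketch below rolls a species-level profile up to the genus and phylum levels by summing relative abundances along each lineage. The data structure, the function name `profile_at_rank`, and the abundance values are hypothetical and do not reflect the challenge's submission format.

```python
from collections import defaultdict

RANKS = ("phylum", "genus", "species")

# Hypothetical profile: lineage (phylum, genus, species) -> relative abundance.
species_profile = {
    ("Firmicutes", "Staphylococcus", "Staphylococcus aureus"): 0.20,
    ("Firmicutes", "Staphylococcus", "Staphylococcus epidermidis"): 0.10,
    ("Firmicutes", "Bacillus", "Bacillus subtilis"): 0.30,
    ("Proteobacteria", "Escherichia", "Escherichia coli"): 0.40,
}

def profile_at_rank(profile, rank):
    """Sum relative abundances of all lineages sharing the same taxon at `rank`."""
    idx = RANKS.index(rank)
    out = defaultdict(float)
    for lineage, abundance in profile.items():
        out[lineage[idx]] += abundance
    return dict(out)

for rank in RANKS:
    print(rank, profile_at_rank(species_profile, rank))
```

Because abundances are summed upward, an error that confuses two species of the same genus disappears at the genus level, which is one reason predictions tend to look better at higher ranks.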


Challenge overview



The organizers provided participants with 19 samples of shotgun sequencing data (paired-end reads, 2x150, Phred33 quality scores), packaged for download as several .tar archives, each containing a subset of the samples. The composition of the dataset is summarized in the figure below.
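For readers unfamiliar with the Phred33 convention mentioned above: each character of a FASTQ quality line encodes a quality score Q as the ASCII code minus 33, and Q relates to the base-calling error probability p by Q = -10 log10(p). The following sketch decodes a quality string under that convention; the function names and the example string are illustrative.

```python
def phred33_to_scores(quality_line):
    """Decode a Phred33 quality string into integer quality scores."""
    return [ord(ch) - 33 for ch in quality_line]

def error_probability(q):
    """Convert a Phred quality score into a base-calling error probability."""
    return 10 ** (-q / 10)

# 'I' -> Q40, '5' -> Q20, '+' -> Q10 under the Phred33 encoding.
scores = phred33_to_scores("I5+")
print(scores)
print([error_probability(q) for q in scores])
```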


Rules for participation

Participants’ eligibility for scoring of their predictions was conditional on their compliance with the challenge and submission rules:

  • Submission completeness, including all prediction files and a description of the computational approach in a write-up;
  • Compliance with the data format for the predictions;
  • Compliance with the challenge rules.

Awards & Opportunities

  • Help the scientific community benchmark computational methods objectively and establish standards and best practices in computational microbiome analysis
  • Gain early access to new benchmarking datasets
  • Receive an independent assessment of your method(s)
  • Contribute to writing peer-reviewed scientific article(s) describing the outcome of the challenge
  • Grow your professional network by engaging with researchers from around the world
  • Have a chance to win a $2000 travel bursary to a conference of your choice.

Challenge participants

Eight submissions from Japan, India, and Armenia were scored.


Scoring and ranking


A team of researchers from Philip Morris R&D in Neuchâtel (Switzerland) established a scoring methodology and scored the blinded submissions under the review of an independent Scoring Review Panel, which included Prof. Alice McHardy, Helmholtz Centre for Infection Research, Germany, and Dr Luisa Cutillo, Department of Management and Quantitative Studies (DISAQ), University Parthenope of Naples, Italy.

Scoring proceeded according to the following steps:

  • Scoring of anonymized teams’ predictions against the gold standard, i.e. the known relative abundances of bacteria
  • Binary classification and abundance metrics computed using the OPAL software (https://github.com/CAMI-challenge/OPAL)
  • Score aggregation: weighted sum of ranks
  • Final team ranking approved by an external Scoring Review Panel
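The steps above can be illustrated with a small sketch. It is not the challenge's scoring code and does not use the OPAL API: the specific metrics (precision, recall, F1 for presence/absence and an L1 distance for abundances), the equal weights, the team names, and all function names are assumptions chosen for illustration.

```python
def presence_metrics(predicted, truth):
    """Binary classification of taxa as present/absent against the gold standard."""
    pred = {t for t, a in predicted.items() if a > 0}
    true = {t for t, a in truth.items() if a > 0}
    tp, fp, fn = len(pred & true), len(pred - true), len(true - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

def l1_error(predicted, truth):
    """Abundance metric: sum of absolute differences over all taxa."""
    taxa = set(predicted) | set(truth)
    return sum(abs(predicted.get(t, 0.0) - truth.get(t, 0.0)) for t in taxa)

def rank_teams(scores, higher_is_better):
    """Rank 1 is best; tied teams share the best rank of their group."""
    ranks, rank = {}, 1
    for value in sorted(set(scores.values()), reverse=higher_is_better):
        tied = [team for team, s in scores.items() if s == value]
        for team in tied:
            ranks[team] = rank
        rank += len(tied)
    return ranks

def weighted_rank_sum(ranks_by_metric, weights):
    """Aggregate per-metric ranks into one score per team (lower is better)."""
    teams = next(iter(ranks_by_metric.values()))
    return {team: sum(weights[m] * r[team] for m, r in ranks_by_metric.items())
            for team in teams}

# Hypothetical gold standard and two hypothetical submissions.
truth = {"taxon_A": 0.5, "taxon_B": 0.3, "taxon_C": 0.2}
submissions = {
    "team_x": {"taxon_A": 0.5, "taxon_B": 0.3, "taxon_C": 0.2},
    "team_y": {"taxon_A": 0.7, "taxon_D": 0.3},
}

f1 = {t: presence_metrics(p, truth)["f1"] for t, p in submissions.items()}
l1 = {t: l1_error(p, truth) for t, p in submissions.items()}
ranks = {"f1": rank_teams(f1, higher_is_better=True),
         "l1": rank_teams(l1, higher_is_better=False)}
print(weighted_rank_sum(ranks, weights={"f1": 1.0, "l1": 1.0}))
```

Aggregating ranks rather than raw metric values keeps metrics with different scales comparable; the weights then express how much each metric should count in the final ranking.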


The Scoring Review Panel reviewed and approved the scoring methodology and procedures before the challenge closed, as well as the scoring results and the final ranking given below:


The winners are:

  • 1st: Vijay Kumar Narsapuram from India - Dupont Pioneer India
  • 2nd: Emma Ghrejyan from Armenia - Center for Ecological-Noosphere Sciences, NAS RA; Russian-Armenian University
  • 3rd: Tigran Vardanyan from Armenia - ISTC labz

Media library

Scientific publications

We are preparing a publication on the outcome of this challenge. Stay tuned!

The challenge in the news

Tutorials and webinars

Flyers and posters




What they say about the challenge