
The landscape of microbial phenotypic traits and associated genes

Brbić, Maria; Piškorec, Matija; Vidulin, Vedrana; Kriško, Anita; Šmuc, Tomislav; Supek, Fran (2016) The landscape of microbial phenotypic traits and associated genes. Nucleic Acids Research. ISSN 0305-1048

This is the latest version of this item.

PDF - Accepted Version - article


Bacteria and Archaea display a variety of phenotypic traits and can adapt to diverse ecological niches. However, systematic annotation of prokaryotic phenotypes is lacking. We have therefore developed ProTraits, a resource containing ∼545 000 novel phenotype inferences, spanning 424 traits assigned to 3046 bacterial and archaeal species. These annotations were assigned by a computational pipeline that associates microbes with phenotypes by text-mining the scientific literature and the broader World Wide Web, while also being able to define novel concepts from unstructured text. Moreover, the ProTraits pipeline assigns phenotypes by drawing extensively on comparative genomics, capturing patterns in gene repertoires, codon usage biases, proteome composition and co-occurrence in metagenomes. Notably, we find that gene synteny is highly predictive of many phenotypes, and highlight examples of gene neighborhoods associated with spore-forming ability. A global analysis of trait interrelatedness outlined clusters in the microbial phenotype network, suggesting common genetic underpinnings. Our extended set of phenotype annotations allows detection of 57 088 high-confidence gene-trait links, which recover many known associations involving sporulation, flagella, catalase activity, aerobicity, photosynthesis and other traits. Over 99% of the commonly occurring gene families are involved in genetic interactions conditional on at least one phenotype, suggesting that epistasis has a major role in shaping microbial gene content.

Item Type: Article
Additional Information: This work was funded by the Croatian Science Foundation grants HRZZ-9623 (DescriptiveInduction) and HRZZ-5660 (MultiCaST) and by the European Union FP7 grants ICT-2013-612944 (MAESTRA) and REGPOT-2012-2013-1-316289 (InnoMol). FS acknowledges the support of the Spanish Ministry of Economy and Competitiveness, ‘Centro de Excelencia Severo Ochoa 2013-2017’ (SEV-2012-0208) and the FP7 project 4DCellFate (277899). Funding for open access charge: EU FP7 FET ICT-2013-612944 (MAESTRA).
Uncontrolled Keywords: Prokaryotic phenotypes, genomic representation, gene-trait associations, text-mining, non-negative matrix factorization, support vector machine, random forest
Subjects: NATURAL SCIENCES > Biology
TECHNICAL SCIENCES > Computing > Artificial Intelligence
Divisions: Division of Electronics
Project title | Project leader | Project code | Project type
Learning from Massive, Incompletely annotated, and Structured Data-MAESTRA | UNSPECIFIED | 612944 | EK
Croatian Science Foundation grants HRZZ-9623 | Dragan Gamberger | IP-11-2013-9623 | HRZZ
Enhancement of the Innovation Potential in SEE through new Molecular Solutions in Research and Development-INNOMOL | UNSPECIFIED | 316289 | EK
Spanish Ministry of Economy and Competitiveness | UNSPECIFIED | UNSPECIFIED | FP7
‘Centro de Excelencia Severo Ochoa 2013-2017’ | UNSPECIFIED | UNSPECIFIED | FP7
FP7 project 4DCellFate | UNSPECIFIED | 277899 | FP7
Depositing User: Maria Brbić
Date Deposited: 09 Dec 2016 16:22
DOI: 10.1093/nar/gkw964
