Environmental Pathway Genome Database Construction
Metabolic Pathways for the Whole Community
Metabolic potential can be inferred from primary sequence information using computational methods that assemble and/or cluster sequences, search for patterns or motifs representing genes, and predict metabolic networks. Understanding how these networks link microbial activity with ecosystem functions and services has emerged as a fundamental scientific problem guiding the efforts of scientists and engineers around the world. Although sequencing technologies are rapidly expanding our capacity to explore the metabolic potential of microbial communities, a number of computational and analytical challenges still limit our ability to work with metagenomic datasets.
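To make the gene-finding step concrete, the sketch below scans both strands of a DNA sequence in all three reading frames and reports open reading frames (ORFs) that begin with ATG and end at the first in-frame stop codon. This is a deliberately naive illustration, not the method used by MetaPathways: dedicated gene callers such as Prodigal model ribosome binding sites, codon usage, and partial genes at contig edges. The function names and the `min_codons` threshold are hypothetical choices for illustration.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int]]:
    """Return (strand, start, end) for ATG-initiated ORFs ending at a stop codon.

    Hypothetical helper: coordinates are 0-based and end-exclusive, relative to
    the strand being scanned; ORF length in codons includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Step through complete codons only (i + 3 <= len(s)).
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # close the ORF and look for the next start
    return orfs
```

For example, `find_orfs("ATGAAATAG", min_codons=1)` reports a single three-codon ORF on the forward strand, while the default threshold of 100 codons filters it out as too short to be a plausible gene.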
We have developed MetaPathways (PMID: 23800136, PMID: 25048541, PMID: 26076725, PMID: 27515739), a modular and scalable pipeline for constructing Environmental Pathway Genome Databases (ePGDBs) from metagenomic sequence information that interfaces with Pathway Tools from SRI International. Pathway Tools is a production-quality software environment that supports metabolic inference and flux balance analysis based on MetaCyc, a database of metabolic pathways and enzymes representing all domains of life. Unlike KEGG or SEED subsystems, MetaCyc emphasizes smaller, evolutionarily conserved or co-regulated units of metabolism, and it contains the largest collection of experimentally validated metabolic pathways. Within an ePGDB, navigable and extensively commented pathway descriptions, literature citations, and enzyme properties combine to provide a coherent structure for exploring and interpreting predicted pathways from genomes to biomes (PMID: 28398290). ePGDBs can be explored using the MetaPathways graphical user interface, exported to statistical software, or visualized in Pathway Tools to produce interactive wall charts representing the metabolic blueprints of individual-, population-, or community-level metabolic networks.
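As a rough illustration of how predicted pathways might be summarized across samples and exported for statistical software, the sketch below scores each pathway by the fraction of its reactions found among a sample's annotated reactions and writes a pathway-by-sample coverage table as CSV. This simple coverage threshold is an assumption made for illustration; it is not the PathoLogic inference algorithm in Pathway Tools, which weighs additional evidence. The pathway IDs, reaction IDs, sample names, and the `min_coverage` cutoff are all hypothetical and do not come from MetaCyc.

```python
import csv
import io
from typing import Dict, Set

# Hypothetical pathway definitions: pathway ID -> set of required reaction IDs.
PATHWAYS: Dict[str, Set[str]] = {
    "PWY-A": {"R1", "R2", "R3", "R4"},
    "PWY-B": {"R5", "R6"},
}


def pathway_coverage(reactions: Set[str], pathways: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of each pathway's reactions present among the annotated reactions."""
    return {pid: len(req & reactions) / len(req) for pid, req in pathways.items()}


def coverage_table(samples: Dict[str, Set[str]],
                   pathways: Dict[str, Set[str]],
                   min_coverage: float = 0.5) -> str:
    """Return CSV text of pathway x sample coverage.

    Coverage values below min_coverage are reported as 0.00 (pathway not called).
    """
    names = sorted(samples)
    cov = {name: pathway_coverage(samples[name], pathways) for name in names}
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["pathway", *names])
    for pid in sorted(pathways):
        cells = [cov[n][pid] if cov[n][pid] >= min_coverage else 0.0 for n in names]
        writer.writerow([pid, *(f"{c:.2f}" for c in cells)])
    return buf.getvalue()


samples = {"surface": {"R1", "R2", "R3", "R5"}, "deep": {"R1", "R5", "R6"}}
table = coverage_table(samples, PATHWAYS)
print(table)
```

With these toy inputs, PWY-A is called only in the surface sample (3 of 4 reactions), and PWY-B is called in both samples. Emitting plain CSV keeps the output readable by R, pandas, or spreadsheet tools without extra dependencies.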