Metagenomic And Metatranscriptomic Analysis Essay

Abstract

Although the composition of the human microbiome is now well-studied, the microbiota’s >8 million genes and their regulation remain largely uncharacterized. This knowledge gap is in part because of the difficulty of acquiring large numbers of samples amenable to functional studies of the microbiota. We conducted what is, to our knowledge, one of the first human microbiome studies in a well-phenotyped prospective cohort incorporating taxonomic, metagenomic, and metatranscriptomic profiling at multiple body sites using self-collected samples. Stool and saliva were provided by eight healthy subjects, with the former preserved by three different methods (freezing, ethanol, and RNAlater) to validate self-collection. Within-subject microbial species, gene, and transcript abundances were highly concordant across sampling methods, with only a small fraction of transcripts (<5%) displaying between-method variation. Next, we investigated relationships between the oral and gut microbial communities, identifying a subset of abundant oral microbes that routinely survive transit to the gut, but with minimal transcriptional activity there. Finally, systematic comparison of the gut metagenome and metatranscriptome revealed that a substantial fraction (41%) of microbial transcripts were not differentially regulated relative to their genomic abundances. Of the remainder, consistently underexpressed pathways included sporulation and amino acid biosynthesis, whereas up-regulated pathways included ribosome biogenesis and methanogenesis. Across subjects, metatranscriptional profiles were significantly more individualized than DNA-level functional profiles, but less variable than microbial composition, indicative of subject-specific whole-community regulation. The results thus detail relationships between community genomic potential and gene expression in the gut, and establish the feasibility of metatranscriptomic investigations in subject-collected and shipped samples.

Of all the human microbiomes across the diverse landscape of the human body, the oral and gut microbiomes are the two best studied to date. Both are subject to distinctive environments along the gastrointestinal tract and have their own unique ecologies. Several hundred taxa live in and along the saliva, teeth, and gingival structures, and over 500 taxa have been estimated to inhabit the distal gut (1). However, although robust methods exist for studying the diversity of the oral and gut microbiota, recent studies suggest that functional activity may vary widely across hosts and in response to distinct perturbations (2, 3), and highlight the need for methods that bridge metagenomic and metatranscriptomic interrogations of the microbiome.

Early high-throughput metatranscriptomic investigations of microbial communities were focused largely on ocean-derived environmental samples (4–6). These efforts demonstrated the feasibility of RNA-based profiling of microbial community structure, function, and diversity, and also produced large amounts of novel sequence information (transcripts) unseen by earlier metagenomic investigations. Metatranscriptomic analysis has subsequently been applied to the human gut microbiome, revealing strong intersubject and temporal variability in microbial gene expression, as well as core modules of actively transcribed versus repressed functions (7–9). In addition, metatranscriptomic analyses of the gut microbiome during exposure to dietary (10) and xenobiotic (2) interventions have revealed significant alterations of the microbial community gene-expression profile, but often without large changes in overall community structure. One of the next major challenges facing human microbiome studies is relating the current understanding of microbial ecology to this growing knowledge of the biomolecular activities and regulatory systems of the microbiota (11).

Although recent population studies have established a framework for interrogating the community composition and genomic potential of these microbiomes, it is not yet well-understood how genomic potential relates to whole-community transcriptional regulation. This knowledge gap is in part because of the lack of standardized human microbiome sampling methodologies appropriate both for functional assays of the microbiota and for large cohort-based research (12, 13). Despite the success of efforts by the Human Microbiome Project (14) and MetaHIT consortia (15), identification of best practices and experimental processes that affect microbiome measurements represents a key challenge for functional meta’omics and for enabling microbiome investigation in larger epidemiological studies. In particular, the field requires development of microbiome sampling methods that are: (i) cost-effective, (ii) easily applied outside of a clinical setting, (iii) amenable to a variety of downstream meta’omic analyses, (iv) highly accurate in comparison with clinically collected controls, and (v) devoid of large biases or batch effects.

In this work, we proposed and validated a method for studying functional aspects of the microbiota in large human cohorts. We then applied the data collected during the validation process to address important knowledge gaps regarding relationships between the oral metagenome, gut metagenome, and gut metatranscriptome. Working with eight subjects from the Health Professionals Follow-up Study (HPFS) cohort, we demonstrated the representativeness of self-collected, self-shipped saliva and stool samples in metagenomic and metatranscriptomic assays of the microbiome. Comparing saliva and stool samples from the same subject further allowed us to explore microbial co-occurrence relationships between the oral and gut environments. In particular, only a small number of abundant oral residents survived transport to the gut environment, and their functional activity there was consistently greatly reduced. This proved to be the case even when microbes were identified at the strain level, indicating the transport of one population per species rather than the differentiation of two niche-specific subpopulations.

Finally, we compared and contrasted the metagenomic and metatranscriptomic compositions of the human gut. Although metagenomic analysis reveals the functional potential of a microbial community, it remains largely unknown how this potential is translated to functional activity, as measured by the metatranscriptome. Our analysis revealed that although functional potential and activity were often closely coupled in the gut, they were also distinguished by two strong forces: (i) a subset of microbial functional activities that were consistently transcriptionally up- or down-regulated in the gut, and (ii) activities that varied in a highly subject-specific manner in the context of a common functional potential. Together, these results provide a community-wide profile of biomolecular regulatory processes in the gut, as well as validating one of the first protocols appropriate for large-scale functional profiling of the microbiome in human populations.

Results

Self-Collected Stool Aliquots Provide Representative Metagenomes and Metatranscriptomes.

We recruited eight members of the HPFS cohort to provide saliva and stool samples to dissect relationships between the human oral metagenome, gut metagenome, and gut metatranscriptome. To simultaneously evaluate the feasibility of sample self-collection and shipping methods in functional studies of the human microbiome, saliva and stool samples were self-collected by the subjects and then stored on ice for delivery to our laboratory facilities within 24 h following an established protocol (14). We additionally evaluated this standard transport procedure relative to freshly collected, immediately processed samples and found only minimal differences (SI Appendix, SI Methods and Fig. S1). Upon arrival, aliquots of each stool sample were fixed in either ethanol or RNAlater and then stored at ambient temperature for 48 h to simulate shipping conditions; additional aliquots were kept frozen as controls. DNA and RNA were subsequently extracted from the samples, assessed for quality (RNA integrity number, RIN, scores) (Methods and SI Appendix, Table S1), and sequenced by Illumina HiSeq (Methods). The resulting raw read data were processed to remove low-quality reads and human contamination and finally profiled at the taxonomic and functional levels using MetaPhlAn (16) and HUMAnN (17), respectively (Fig. 1).

Fig. 1.

A self-sampling method compatible with metagenomic and metatranscriptomic sequencing of the human microbiome. (A) Eight participants from the HPFS cohort were recruited to assess the viability of self-collection methods in meta’omics studies and to simultaneously investigate relationships between the human oral metagenome, gut metagenome, and gut metatranscriptome. (B) Subjects self-collected samples of saliva and stool, which were returned to the laboratory. (C) Saliva samples were frozen and stool samples were tested under three conditions, including simulated shipping conditions: (i) frozen control, (ii) fixed in ethanol, and (iii) fixed in RNAlater. (D) DNA was extracted from all saliva and stool samples; RNA was extracted from stool samples only and reverse-transcribed to cDNA. All samples were then sequenced using the Illumina HiSeq platform. Raw sequence data were filtered to remove low quality and human host reads. (E) Metagenomic and metatranscriptomic read data were profiled for functional and taxonomic composition using HUMAnN (17) and MetaPhlAn (16), respectively.

We first sought to determine whether subject-collected, fixed, and shipped samples provided equivalent metagenomic and metatranscriptomic data to state-of-the-art fresh-frozen sample-collection protocols (14). This was assessed quantitatively by determining the extent to which stool samples taken from the same individual but handled by different methods yielded equivalent metagenomic and metatranscriptomic profiles. We thus compared profiles of microbial species, gene, and transcript abundances from 24 stool samples: one from each of eight subjects subdivided and stored by three different methods before DNA/RNA extraction and sequencing (frozen control, ethanol-fixed and mock-shipped, and RNAlater-fixed and mock-shipped) (Fig. 1). We found that for all three types of meta’omic profiles (species, genes, and transcripts), within-subject correlations between frozen and mock-shipped samples were universally very strong (minimum Spearman’s r = 0.83, P ≪ 0.001), with gene-level abundances being the most consistent between methods, followed by species, and then transcripts. Both direct comparisons (Fig. 2 A–C) and overview ordination (SI Appendix, Fig. S2) supported the conclusion that subject-shipment of samples had minimal effect on meta’omic profiling, particularly in contrast to the typically large intersubject differences.
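The within-subject comparison described above can be illustrated with a minimal sketch of Spearman's rank correlation between two profiles of the same features (for example, a frozen control and a mock-shipped aliquot from one subject). The function names and the toy abundance values are illustrative assumptions, not the study's code or data; ties receive their average rank.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's r: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical relative abundances of five species in two aliquots of one stool sample.
frozen = [0.40, 0.30, 0.20, 0.08, 0.02]
ethanol = [0.38, 0.31, 0.18, 0.10, 0.03]
print(spearman(frozen, ethanol))
```

Because the statistic depends only on ranks, it is insensitive to the exact scale of the abundance values, which suits compositional profiles that span several orders of magnitude.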

Fig. 2.

Taxonomic and functional profiles are consistent across sample handling methods. Global profiles of (A) species composition, (B) gene-level functional composition, and (C) transcript-level functional composition were highly concordant in within-subject comparisons of frozen controls vs. mock-shipped samples (Spearman’s rank correlation coefficient); black bars represent the averages across each group of eight correlation coefficients. Sample handling effect was further quantified by two-way ANOVA for all (D) species, (E) genes, and (F) transcripts detected with relative abundance of at least 10⁻⁴ (0.01%) in at least three samples. Following correction for multiple hypothesis testing, <5% of transcripts showed a strong, significant effect from choice of sample handling method; we observed no significant sample handling effects for either species or genes. Vertical red lines represent the threshold for statistical significance (Benjamini–Hochberg FDR, α = 0.05); features above the horizontal red lines have greater between-method variation than between-subject variation.

Effects of Sample Handling Method on Individual Meta’omic Features.

To assess the contribution of individual features (e.g., specific microbial species or genes) to this strong global agreement, we performed two-way ANOVA tests on each metagenomic and metatranscriptomic feature, normalizing abundance data (Methods) to partition feature variance across the eight subjects and three sample handling methods (Fig. 2 D–F). Only features exceeding a minimum relative abundance of 10⁻⁴ (0.01%) in at least 3 of the 24 samples were considered. Relative to between-subject variation, no individual microbial species demonstrated statistically significant variation across sample collection methods after correction for multiple hypothesis testing (Benjamini–Hochberg α = 0.05) (18) (Fig. 2D). Similarly, sample handling method was not observed to have a statistically significant effect on the relative abundance of any individual genes (Fig. 2E). These findings are consistent with the strong within-subject, between-method agreements observed for DNA-level species and gene relative-abundance profiles in the correlation analyses described above (Fig. 2 A and B), and further suggest that, in addition to strong global agreement, individual metagenomic measurements are robust to subject-collected stool sample handling methods.
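As a rough illustration of the per-feature test, the sketch below partitions the variance of one feature across subjects and handling methods (two-way ANOVA with one observation per cell) and applies the Benjamini–Hochberg adjustment to a list of p-values. It assumes the abundances have already been normalized (the normalization itself is described only in Methods) and that each p-value has been obtained separately from the F distribution, which for 8 subjects and 3 methods has (2, 14) degrees of freedom for the method effect. Function names and example numbers are hypothetical.

```python
def two_way_anova(table):
    """table[i][j]: normalized abundance of one feature, subject i, method j.

    Returns (F_subject, F_method) for a design with one observation per cell.
    Assumes the residual sum of squares is nonzero.
    """
    n_sub, n_met = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n_sub * n_met)
    sub_means = [sum(row) / n_met for row in table]
    met_means = [sum(row[j] for row in table) / n_sub for j in range(n_met)]
    ss_sub = n_met * sum((m - grand) ** 2 for m in sub_means)
    ss_met = n_sub * sum((m - grand) ** 2 for m in met_means)
    ss_tot = sum((v - grand) ** 2 for row in table for v in row)
    ss_err = ss_tot - ss_sub - ss_met
    ms_err = ss_err / ((n_sub - 1) * (n_met - 1))
    f_subject = (ss_sub / (n_sub - 1)) / ms_err
    f_method = (ss_met / (n_met - 1)) / ms_err
    return f_subject, f_method


def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values (q-values) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy 2-subject x 2-method table and four made-up p-values.
print(two_way_anova([[1.0, 2.0], [3.0, 5.0]]))
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print([q <= 0.05 for q in q_values])
```

A feature would be called method-sensitive only if its adjusted p-value for the method effect falls at or below the chosen α (0.05 in the text).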

The effect of sample shipping on metatranscriptomics was comparably small, with only a very small minority (n = 84, <5% of total) of transcripts exhibiting statistically significant variation across sample handling methods (Fig. 2F and Dataset S1). The nature of these differentially abundant transcripts was consistent with a pattern of live cells responding to an altered environment via changes in gene regulation. For example, up-regulated genes in ethanol-fixed samples were largely involved in oxidative metabolic processes, a signal consistent with bacteria responding to a combination of oxygen exposure and a new carbon source. On the other hand, a subset of genes up-regulated in the RNAlater-fixed samples were involved in response to osmotic stress [e.g., the glycine betaine/proline transport system (19)], which is consistent with the high saline content of RNAlater solution. It is also possible that some transcripts experienced a sample handling method effect because of variation in RNA stability across the three storage conditions.

Comparison of Oral-Gut Microbial Ecology in the HPFS and Human Microbiome Project Cohorts.

After establishing the data quality of subject-collected samples, we next sought to meta-analyze the relationship between the oral and gut microbial communities in our own HPFS cohort and the larger healthy population of the Human Microbiome Project (HMP) (20) (Fig. 3A). In addition to characterizing potential microbial transit along the gastrointestinal tract, this contextualized our eight subjects within a broader population, as the HPFS participants were healthy but of both restricted geography (Boston metropolitan area) and age range (over 65). We identified a subset of 62 commonly occurring species in the eight frozen HPFS saliva and stool samples and compared their abundance profiles with 69 oral (tongue dorsum) and 81 stool samples from the HMP. As expected, differences in body-site specific ecology proved to be the largest effect in both cohorts, with HPFS and HMP stool samples forming a single, well-mixed cluster and HPFS saliva samples associated with but distinct from HMP oral metagenomes from the buccal mucosa and tongue dorsum (SI Appendix, Fig. S3). Whereas the compositions of the oral versus gut samples were largely distinct, we did observe a small number of species that occurred regularly at both body sites (Fig. 3A).

Fig. 3.

Oral-gut ecology in the HPFS and HMP cohorts. (A) We isolated species observed in the eight pairs of frozen stool and saliva samples from the HPFS cohort with relative abundance of at least 10⁻² (1%) in two HPFS samples. The taxonomic profiles of these species were compared with stool and tongue samples from the HMP cohort, with tongue representing the oral community. Samples were clustered by Bray–Curtis distance, and species were clustered by rank correlation. Note that samples cluster strongly by body site (oral vs. gut). Highly abundant oral species are more likely to be detected at low levels in the gut. Green numbers associate oral-gut co-occurring species with detailed abundance profiles in B and C. (B) Eight abundant oral species detected in the HPFS saliva samples were detectable at low abundance in the stool samples from the same individuals, but showed minimal transcriptional activity in the stool. Gray lines connect oral DNA (light blue), gut DNA (dark blue), and gut RNA (red) from the same individual. (C) D. invisus is an unusual example of a gut-dominant species that also occurs in the oral cavity.
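For readers unfamiliar with the Bray–Curtis distance used to cluster samples in Fig. 3A, a minimal sketch follows. It computes the standard dissimilarity between two abundance vectors over the same species; the function name and the toy profiles are assumptions for illustration, and the clustering step itself is not shown.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors.

    0 means identical profiles; 1 means no shared species.
    """
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0


# Hypothetical three-species profiles for an oral and a stool sample.
oral = [0.7, 0.3, 0.0]
stool = [0.0, 0.1, 0.9]
print(bray_curtis(oral, stool))
```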

Detection of Oral Bacterial Strains in the Gut Microbiota.

Although the oral and gut environments are anatomically linked, the degree of exchange between their resident microbiota is not completely understood (21). Bacterial species from the oral community are carried along with food into the stomach, but the degree to which they survive or remain biologically active in the lower gastrointestinal tract has not been systematically characterized, particularly whether oral microbes contribute to the stable commensal gut community as measured by the stool. We first examined our frozen saliva and stool samples for cases of bacterial species co-occurring in the oral and gut communities of each subject. We defined a species as co-occurring if, for at least two subjects, the species occurred with relative abundance greater than 10⁻² (1%) in a subject’s saliva sample and greater than 10⁻⁵ (0.001%) in the same subject’s frozen stool sample. Of 33 species meeting the first criterion (common oral species), eight met our criteria for detection in the stool: four members of the Streptococcus genus (Streptococcus salivarius, Streptococcus parasanguinis, Streptococcus australis, and Streptococcus sanguinis), along with Haemophilus parainfluenzae, Veillonella atypica, Veillonella parvula, and Actinomyces odontolyticus (Fig. 3B). For each of these species, the typical drop in relative abundance between the oral and gut communities was one-to-two orders-of-magnitude, with higher oral abundance generally corresponding to higher gut abundance. This finding suggests that, although DNA from these oral species does survive transit to the gut, it does not form a dominant component of that community.
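The co-occurrence rule stated above can be written as a short filter. This is a sketch under assumptions: the nested-dictionary layout, the function name, and the toy abundances are hypothetical, while the three thresholds are taken directly from the text.

```python
ORAL_MIN = 1e-2     # relative abundance must exceed 1% in saliva
GUT_MIN = 1e-5      # and exceed 0.001% in the same subject's frozen stool
MIN_SUBJECTS = 2    # in at least two subjects


def co_occurring_species(saliva, stool):
    """saliva, stool: dict of subject -> dict of species -> relative abundance.

    Returns the sorted list of species meeting both thresholds within the
    same subject for at least MIN_SUBJECTS subjects.
    """
    counts = {}
    for subject, oral_profile in saliva.items():
        gut_profile = stool.get(subject, {})
        for species, oral_abundance in oral_profile.items():
            if oral_abundance > ORAL_MIN and gut_profile.get(species, 0.0) > GUT_MIN:
                counts[species] = counts.get(species, 0) + 1
    return sorted(s for s, n in counts.items() if n >= MIN_SUBJECTS)


# Toy data for three hypothetical subjects.
saliva = {
    "s1": {"Streptococcus salivarius": 0.05, "Veillonella parvula": 0.02},
    "s2": {"Streptococcus salivarius": 0.08, "Veillonella parvula": 0.03},
    "s3": {"Streptococcus salivarius": 0.004},
}
stool = {
    "s1": {"Streptococcus salivarius": 2e-4, "Veillonella parvula": 5e-6},
    "s2": {"Streptococcus salivarius": 9e-4, "Veillonella parvula": 3e-5},
    "s3": {"Streptococcus salivarius": 1e-3},
}
print(co_occurring_species(saliva, stool))
```

Requiring both thresholds within the same subject, rather than anywhere in the cohort, is what distinguishes within-subject co-occurrence from a species that is simply common at both body sites across different people.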

If a small fraction of oral microbiome members do survive transit to the healthy gut, the gut metatranscriptome provides one means of assessing their active biological contribution to this community, albeit agnostic of their transcriptional activities at the oral site. Our metagenomic samples were assessed, among other methods, by mapping DNA reads to sets of species-specific marker genes using MetaPhlAn (16) (Methods). We applied the same procedure to RNA read data, in which case MetaPhlAn provides a profile of a subset of the metatranscriptome that can be unambiguously assigned to individual species. Based on this analysis, we found that when oral species were detectable in the gut at the DNA level, they rarely appeared to be transcriptionally active there. It was only in those cases where an oral species achieved its highest DNA abundance in the gut that its species-specific RNA was detected, and typically with relative abundance one-to-two orders-of-magnitude lower than the corresponding DNA abundance (some cases of RNA nondetection may correspond to exceptionally small, but still nonzero, RNA abundances that fell below our detection limit). As oral species’ abundances in the gut community were already multiple orders-of-magnitude lower than in the oral microbiome, this indicates that oral microbes that do survive transit to the gut are not stable, active contributors to its ecology.

Although this evidence is consistent with a pattern of abundant oral species passaging through the gut at low levels, co-occurrence could also be explained by separate strains of the same species adapted to the oral and gut environments. To evaluate whether oral and gut signatures of a species represent the same or different strains, we compared profiles of species-specific marker-gene presence and absence across subjects’ stool and saliva samples. These profiles capture gene gain and loss events in specific strains of a species and were used here as molecular “barcodes” for identifying or differentiating strains (Methods). Barcodes for common oral species often differed markedly across the eight saliva samples, indicating the presence of subject-specific strain variation (SI Appendix, Figs. S4–S11). However, in cases of within-subject oral-gut species co-occurrence, we rarely detected markers for a species in a subject’s stool sample that were not also seen in the subject’s saliva sample. This finding supports the hypothesis that oral-gut species co-occurrence is driven not by separate pools of niche-adapted strains of the same species, but instead by oral strains surviving transit to the gut in low quantities.
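The barcode comparison can be pictured as a set operation on marker-gene presence calls. The sketch below is only an illustration of that idea: it assumes each sample's barcode for a species is available as a collection of detected marker identifiers, and the identifiers and function names are hypothetical rather than taken from MetaPhlAn.

```python
def stool_only_markers(saliva_markers, stool_markers):
    """Markers detected in stool but absent from the same subject's saliva."""
    return set(stool_markers) - set(saliva_markers)


def shared_fraction(saliva_markers, stool_markers):
    """Fraction of stool-detected markers also seen in saliva (None if no stool markers)."""
    stool = set(stool_markers)
    if not stool:
        return None
    return len(stool & set(saliva_markers)) / len(stool)


# Hypothetical marker calls for one species in one subject.
saliva_barcode = {"m01", "m02", "m03", "m05", "m08"}
stool_barcode = {"m02", "m05", "m08"}
print(stool_only_markers(saliva_barcode, stool_barcode))
print(shared_fraction(saliva_barcode, stool_barcode))
```

Under the single-strain hypothesis, the stool barcode should be close to a subset of the saliva barcode (fewer markers detected because of lower coverage), so stool-only markers should be rare.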

An even smaller number of abundant gut microbes occurred at appreciable levels in the oral community, the only significant example being Dialister invisus; this was true for samples from both the HMP and HPFS cohorts (Fig. 3C). There were three cases of D. invisus oral-gut co-occurrence among the eight HPFS subjects, and in each case the same strain was carried with high abundance in the stool and low abundance in the saliva (SI Appendix, Fig. S12). Curiously, despite its high DNA abundance in the gut, D. invisus made almost no contribution to the pool of species-specific transcripts, an indicator of reduced transcriptional activity (Fig. 3C and SI Appendix, Fig. S13). Of the remaining five subjects, four carried D. invisus exclusively in the oral community. This evidence resolves previously disparate ecologies for D. invisus, which was isolated from the human oral cavity (22) but also identified as a marker for human stool (23), suggesting that it is atypically capable of persisting at high abundance in both the oral and gut communities and may freely transit between the two.

Relating Microbial Genes in the Gut at the DNA and RNA Levels.

We next investigated possible global models for metagenome vs. metatranscriptome regulation in the gut microbiota. Among the host-adapted microbes in this community, DNA and RNA abundances would be correlated if many genes were not differentially regulated and were transcribed at the same constant rate. This would be the case, for example, if typical gut microbe molecular activity was regulated by genome modifications over evolutionary time, as opposed to transcriptional regulation on a more rapid time scale. To test this theory, we quantified the relative abundances of genes and transcripts using HUMAnN (17) in the HPFS stool samples, with the Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthogroups (KO) database as a reference set for gene families (Methods). Across all samples, a total of 3,292 KOs were observed with relative abundance of at least 10⁻⁴ (0.01%) at either the DNA or RNA level. Averaged over the eight frozen stool samples, we found that gene abundance and corresponding transcript abundance were well correlated (Spearman’s r = 0.76; two-tailed P ≪ 0.001) (Fig. 4).

Fig. 4.

Up- and down-regulated pathways and clades in the gut metatranscriptome. Gene and transcript relative abundances are generally well correlated (Spearman’s r = 0.76). (A–H) Each scatterplot illustrates the average gene (DNA) and transcript (RNA) relative abundance for 3,292 KOs from the eight frozen HPFS stool samples, highlighting a prominent over- or underexpressed functional module. Red circles correspond to KOs where RNA > DNA; blue circles correspond to KOs where DNA > RNA. Marks on the x or y axis margins represent KOs with zero measured abundance in one dataset but nonzero abundance in the other. The trends illustrated here were all of large effect (fold-change > 2) and statistically significant following FDR correction (Methods).

To identify differentially regulated transcripts, we computed the log RNA/DNA abundance ratio for each gene on a subject-by-subject basis and then tested whether the mean of the eight log ratios was significantly different from zero following false-discovery rate (FDR) correction (indicating a pattern of consistent over- or underexpression) (Dataset S2). Although a substantial fraction of KOs were not consistently differentially regulated in the gut (nonsignificant fold-change or fold-change < 2; n = 1,340, 41%), we also observed many transcripts whose relative abundances were an order-of-magnitude higher or lower than expected from the DNA abundance of their corresponding gene families (significant fold-change > 10; n = 724, 22%) (Fig. 4). For example, tetA, which encodes a transporter protein conferring tetracycline resistance, was on average 1,000-times more abundant at the RNA level than the DNA level, one of the strongest such effects (one-sample t test, two-tailed P < 10⁻⁵) (Fig. 4A). We then used these rankings of gene-level expression as input to a functional enrichment analysis, searching for KEGG pathways and modules whose member genes (KOs) were enriched for consistent over- or underexpression (Fig. 4 and Dataset S3). All of the over- and underexpression relationships discussed in the following sections were both of large effect (fold-change > 2) and statistically significant following FDR correction (Methods).
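The per-gene test can be sketched as follows for a single KO: compute the log RNA/DNA ratio in each subject, then the mean and the one-sample t statistic against zero. Several details here are assumptions made for illustration: the text does not state the logarithm base (base 2 is used below), zero abundances are not handled (both values are required to be positive), and the function name and toy numbers are hypothetical. A two-tailed p-value would then come from a t distribution with n − 1 degrees of freedom before FDR correction across all KOs.

```python
import math
from statistics import mean, stdev


def log_ratio_summary(dna, rna):
    """dna, rna: per-subject relative abundances of one KO (all values > 0).

    Returns (mean log2 RNA/DNA ratio, one-sample t statistic vs. 0,
    fold-change magnitude implied by the mean log ratio).
    """
    ratios = [math.log2(r / d) for d, r in zip(dna, rna)]
    m = mean(ratios)
    se = stdev(ratios) / math.sqrt(len(ratios))  # standard error of the mean
    if se > 0:
        t_stat = m / se
    else:
        # No between-subject variation: the statistic is degenerate.
        t_stat = 0.0 if m == 0 else math.copysign(math.inf, m)
    return m, t_stat, 2 ** abs(m)


# Toy example: one KO measured in four hypothetical subjects.
dna = [1e-4, 1e-4, 1e-4, 1e-4]
rna = [2e-4, 4e-4, 2e-4, 4e-4]
mean_log_ratio, t_stat, fold_change = log_ratio_summary(dna, rna)
print(mean_log_ratio, t_stat, fold_change)
```

Averaging in log space means the reported fold change is a geometric mean across subjects, so a single subject with an extreme ratio has less influence than it would on an arithmetic mean of raw ratios.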

Microbial genes encoding ribosomal proteins were among the most strongly overexpressed (Fig. 4B). Note that these are distinct from the ribosomal RNAs (rRNAs) depleted from metatranscriptomic assays (Methods). Indeed, we observed distinct clusters of ribosomal protein-coding gene overexpression across three domains of life, with bacterial ribosomal genes having the highest overall abundance, followed by archaeal ribosomal genes, and finally eukaryotic ribosomal genes detectable at the low end of DNA relative abundance. Notably, these archaeal ribosomal genes occurred as part of a “burst” of other highly expressed archaea-associated functions, including methanogenesis (Fig. 4C) and the archaeal RNA polymerase. This signal can be explained predominantly by the presence of the archaeon Methanobrevibacter smithii in five of the eight HPFS subjects. In these five subjects, the relative abundance of M. smithii at the DNA level ranged from 0.005 to 0.053 (0.5–5.3%), whereas its relative contribution to the pool of species-specific transcripts ranged from 0.021 to 0.147 (2.1–14.7%) (SI Appendix, Fig. S13

TY - CHAP

T1 - Metagenomic and metatranscriptomic analysis reveal genetic adaptation of deep-sea microbial communities

AU - Wu,Jieying

AU - Gao,Weimin

AU - Zhang,Weiwen

AU - Meldrum,Deirdre R.

PY - 2014/7/1

Y1 - 2014/7/1

N2 - The water body underlying the photic zone in oceans represents the largest water mass on earth (comprising 1.3×10¹⁸ m³), and is also the largest aqueous habitat for various microorganisms. This realm differs distinctly from the photic zone, in terms of its relatively lower temperature (approximately 2–4°C), higher pressure and richer inorganic nutrients. Differences in physical geochemical parameters between upper-sea and deep-sea environments create fundamentally different challenges to microbial communities living in these environments. Recent studies found that prokaryotic microbes in deep-sea environments are well-adapted to the special dwelling environments after long evolution, carrying genetic features that enable them to live and reproduce in the extreme environmental conditions. Recent progress in sequencing technologies is fueling a rapid increase in the number and scope of deep-sea microbial community-targeted studies. While metagenomic analysis can provide information on the taxonomic composition and metabolic potential of microbial communities in the deep sea, metatranscriptomics serves to unveil the actual metabolic activities of the communities at a specific time and location, and how those activities are changing in response to environmental and biotic challenges. Here we provide a summary of recent progress in applying integrated metagenomic and metatranscriptomic analyses to uncover the special genetic features in the well-adapted deep-sea microbial communities.

UR - http://www.scopus.com/inward/record.url?scp=84952946871&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84952946871&partnerID=8YFLogxK

M3 - Chapter

SN - 9781633216624

SN - 9781633216372

SP - 1

EP - 20

BT - Deep Sea: Biodiversity, Human Dimension and Ecological Significance

PB - Nova Science Publishers, Inc.

ER -