forked from MakisDi/metagen
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmetagen.Rmd
233 lines (214 loc) · 9.66 KB
/
metagen.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
---
title: "16s rRna metagenomics analysis report"
author: "Makis Digaletos"
output:
html_document:
toc: true
toc_depth: 3
number_sections: false
toc_float:
collapsed: false
smooth_scroll: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
library(rmarkdown)
library(dplyr)
library(tibble)
library(tidyr)
library(ggplot2)
library(dada2)
library(phyloseq)
library(Biostrings)
library(treemap)
library(highcharter)
library(kableExtra)
library(heatmaply)
library(plotly)
ps <- readRDS(paste0(path,"/out/",DB,"_ps.rds"))
data_trunc <- read.csv(file=paste0(path,"/out/data_trunc.tsv"), sep="")
colnames(data_trunc) <- c("Total Input","QC Filtered","Denoised","Chimeras Filtered","#Bacteria", "%Bacteria", "#Human", "%Human", "#Mouse", "%Mouse")
enzymes_pre <- read.delim(paste0(path,"/out/functionalAnn_out/EC_metagenome_out/pred_metagenome_unstrat_descrip.tsv.gz"))
pathways <- read.delim(paste0(path,"/out/functionalAnn_out/pathways_out/path_abun_unstrat_descrip.tsv.gz"))
```
#Quality Control
Dataset Quality control was executed with the DADA2 in-built QC commands. The mini-pipeline involves left trimming the 15bp ends (suggested for IonTorrent data), discarding reads shorter than 50bp, denoising and finally merging any chimeric reads to increase eficiency. FastQ Screen was used to generate mappings of QC filtered fastQ reads to reference contaminant sequences (hg19, mm10).
### Data filtering
The table shows how many reads were truncated in the fitering steps followed in the DADA-2 pipeline for taxonomy classification. The 'Chimeras Filtered' column shows the number of fully filtered reads that are used for the downstream analysis.
```{r trunc, warning=FALSE}
kable(data_trunc) %>%
kable_styling("striped", full_width = F) %>%
add_header_above(c(" ", "FastQ reads" = 2, "DADA2 reads" = 4, "Human" = 2, "Mouse" = 2)) %>%
add_header_above(c(" ", "DADA2 reads filtering" = 6, "QC Filtered Contaminant reads" = 4))
```
### Contaminant Reads mapping
The graphs below depict mapping profiles of the analysed samples. Reference sequences used are hg19, mm10, hg19_16srRna, mm10_16srRna and E.Coli_16srRna.
```{r, echo=FALSE,out.width="49%", out.height="35%",fig.cap="Reads mapping distributions",fig.show='hold',fig.align='center'}
map_dstrib_PNGs <- list.files(path=paste0(path,"/out/plots/reads_distributions"), pattern='.png', full.names=TRUE)
knitr::include_graphics(map_dstrib_PNGs)
```
### Quality Control plots
In gray-scale is a heat map of the frequency of each quality score at each base position. The mean quality score at each position is shown by the green line, and the quartiles of the quality score distribution by the orange lines. The red line shows the scaled proportion of reads that extend to at least that position.
```{r qc, echo=FALSE, out.width = '100%'}
knitr::include_graphics(paste0(path,"out/plots/QC.png"))
```
The error rates for each possible transition (A→C, A→G, …) are shown. Points are the observed error rates for each consensus quality score. The black line shows the estimated error rates after convergence of the machine-learning algorithm. The red line shows the error rates expected under the nominal definition of the Q-score.
```{r noise, echo=FALSE, out.width = '100%'}
knitr::include_graphics(paste0(path,"out/plots/noise.png"))
```
#Taxonomic Analysis
Taxonomic Analysis was run according to the standard PhyloSeq Bioconductor package pipeline to generate the following five basic visualisations.
### Alpha diversity
Alpha diversity will visualize how many different species could be decected in a microbial ecosystem.
```{r alpha, warning=FALSE}
alpha <- plot_richness(
ps,
"Subject",
color="Organ",
measures=c("Chao1", "Shannon", "ACE"),
title = "Alpha diversity"
) +
geom_point(size=2, alpha=0.7)
ggplotly(alpha, width = 800, height = 600)
```
### Beta diversity
Beta diversity will depict how different is the microbial composition in one environment compared to another based on the Order of each species. Samples/species are separated on two side-by-side panels.
```{r ordination, warning=FALSE}
wh0 = genefilter_sample(
ps,
filterfun_sample(function(x) x > 1),
A=1
)
GP1 = prune_taxa(wh0, ps)
GP1 = ps
GP1 = transform_sample_counts(GP1, function(x) 1E6 * x/sum(x))
phylum.sum = tapply(
taxa_sums(GP1),
tax_table(GP1)[, "Order"],
sum,
na.rm=TRUE
)
top10phyla = names(sort(phylum.sum, TRUE))[1:10]
GP1 = prune_taxa((tax_table(GP1)[, "Order"] %in% top10phyla), GP1)
GP.ord <- ordinate(GP1, "CCA", "bray")
ordination <- plot_ordination(
GP1,
GP.ord,
type="split",
color="Order",
label="Subject",
title="Beta diversity"
) + geom_point(size=2) + scale_color_brewer(palette="Spectral")
ggplotly(ordination, width = 800, height = 600)
```
### OTU abundance analysis
Abundance of top 30 most abundant OTUs accross all samples. At each OTU family’s horizontal position, the abundance values for each OTU are stacked in order from greatest to least, separate by a thin horizontal line. The values are stacked in order as a means of displaying both the sum total value while still representing the individual OTU abundances.
```{r barplot, warning=FALSE}
# Abundance barplot based in top30 appearing OTUs accross all samples
top30 <- names(sort(taxa_sums(ps), decreasing=TRUE))[1:30]
ps_top30 <- transform_sample_counts(ps, function(OTU) OTU/sum(OTU))
ps_top30 <- prune_taxa(top30, ps_top30)
barplot <- plot_bar(
ps_top30,
"Family",
fill="Genus",
facet_grid=(~Subject),
title="Abundance barplot"
) +
scale_fill_brewer(palette="Paired") +
theme(axis.title.y = element_text(margin = margin(t = 0, r = 200, b = 100, l = 0)))
ggplotly(barplot, width = 800, height = 600)
```
### Dendrogram
To capture the species diversity as much as possible, 200 random OTUs among the 1000 most abundant ones are shown in this figure. Any species-level annotation available will be displayed next to the relevant point. OTUs are distinguished in terms of abundance, sample, and phylum by size, shape and color of the points respectively.
```{r tree, warning=FALSE}
# 200 random OTUs among the 1000 most abundant ones
ps_top1000 <- prune_taxa(names(sort(taxa_sums(ps), TRUE))[1:1000], ps)
ps_tree = prune_taxa(taxa_names(ps_top1000)[sample(1:nrow(tax_table(ps_top1000)),size = 200)], ps_top1000)
tree <- plot_tree(
ps_tree,
ladderize="left",
shape="Sample",
color="Phylum",
label.tips = "Species",
text.size = 4,
size="abundance",
base.spacing = 0.03,
method = "sampledodge",
title = "Tree clustering",
plot.margin=0.1
)
ggplotly(tree, width = 800, height = 600)
```
### Phyla structure network analysis
The network helps identify any underlying structures in the co-occurence of different phyla across all datasets. The graph represents the 200 most abundant OTUs.
```{r net, warning=FALSE}
# top-200 appearing OTUs by phylum
ps_net <- prune_taxa(names(sort(taxa_sums(ps), TRUE))[1:200], ps)
jg <- make_network(ps_net, "taxa", "jaccard", 0.3)
net <- plot_network(title = "Bacterial classes co-occurence network",
g = jg,
physeq = ps_net,
type = "taxa",
color = "Phylum",
line_weight = 0.4,
label = "Class")
ggplotly(net, width = 800, height = 600)
```
# Additional plots
Additional plots were generated to give a picture of the datasets enzyme distributions and involved pathways. PiCRUST2 was used to generate the functional annotations for the treemap and add KEGG_IDs for the pathway analysis.
### Functional Annotation
This is the functional annotation of the OTUs. The treemap depicts the actual abundance of enzyme classes, groupped by sample.
```{r treemap, include=FALSE}
# Enzymes Treemap
class <- case_when(
grepl(pattern = "EC:1", x = enzymes_pre$function.) ~ "Oxidoreductases",
grepl(pattern = "EC:2", x = enzymes_pre$function.) ~ "Transferases",
grepl(pattern = "EC:3", x = enzymes_pre$function.) ~ "Hydrolases",
grepl(pattern = "EC:4", x = enzymes_pre$function.) ~ "Lyases",
grepl(pattern = "EC:5", x = enzymes_pre$function.) ~ "Isomerases",
grepl(pattern = "EC:6", x = enzymes_pre$function.) ~ "Ligases",
grepl(pattern = "EC:7", x = enzymes_pre$function.) ~ "Translocases"
)
enzymes <- add_column(enzymes_pre, class, .after = 2)
enzymes_expanded <- gather(enzymes, "sample", "abundance", 4:ncol(enzymes))
tree <- treemap(enzymes_expanded,
aspRatio = 2,
index = c("sample",
"class"),
vSize ="abundance",
vColor = "sample",
type="categorical",
align.labels = list(c("centre",
"bottom"),
c("left",
"top")),
algorithm = "pivotSize" ,
sortID = "abundance",
palette = "Spectral",
title = "Enzyme class abundance",
fontcolor.labels = c("#435856","white"),
fontsize.labels = c(15,9),
fontface.labels = 2,
border.col = "grey",
bg.labels = 220)
```
```{r hctreemap, warning=FALSE}
hctreemap(tree, allowDrillToNode = TRUE) %>%
hc_title(text = "Enzyme class abundance") %>%
hc_tooltip(pointFormat = "Group: <b>{point.name}</b><br>
Abundance: {point.value:,.0f}")
```
### Pathways analysis
The heatmap demonstrates the relative (on a 0-1 scale) abundance of the pathways that our OTUs were found to participate in.
```{r heatmap, warning=FALSE}
pt <- pathways
pt[] <- paste("description:",pathways$description)
rownames(pathways) <- pathways[,1]
heatmaply(normalize(pathways[,3:ncol(pathways)]),
custom_hovertext = pt,
xlab = "Samples",
ylab = "KEGG Pathway",
main = "Pathways relative abundance") %>%
layout(width=1000, height=1000)
```