-
Notifications
You must be signed in to change notification settings - Fork 17
/
Copy pathw01-geo-gse31873.Rmd
123 lines (96 loc) · 3.11 KB
/
w01-geo-gse31873.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
---
title: "Format GEO GSE31873 Dataset"
author: "K Siegmund"
date: "5/19/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# {.tabset}
## Gene Expression Data
Import C42b cell line data from GEO.
Illumina HumanHT-12 V4.0 expression beadchip
GPL10558; 47231 rows
The download from GEO can take a few minutes, depending on computer and internet speed. Therefore I ran the command earlier, and saved the object as an R data set that we can load directly from the data folder.
```{r libraries}
library(purrr)
#if(!require("GEOquery")) {BiocManager::install("GEOquery")}
#library(GEOquery)
#gse31873<-getGEO('GSE31873',GSEMatrix=TRUE)
load("data/JBC 2012/gse31873.rda")
```
```{r showgse}
show(gse31873)
```
We want to oragnize the data into a list object, jbcdat, with three matrices:
* E for the expression data (exprs)
* targets for sample annotation (pData)
* genes for gene annotation (fData)
```{r exprs}
gse <- gse31873$GSE31873_series_matrix.txt.gz
jbcdat <- NULL
jbcdat$E <- exprs(gse)
jbcdat$E[1:4,1:3]
```
pData accesses the sample annotation information (p stands for phenotype.)
```{r pDannot}
names(pData(gse))
```
Check if the sample annotation data and gene expression data are ordered identically.
```{r checkorder}
identical(as.character(pData(gse)$geo_accession),
colnames(jbcdat$E))
```
The first variable 'title' contains the sample treatment information and the variable 'description' the array names.
```{r annot}
pData(gse)$title # sample treatment information
pData(gse)$description # array name
```
Now I will create a sample annotation dataframe named targets. I will have to break up the string in 'title' to get the different treatment factors.
```{r sampannot}
trtvars <- strsplit(as.character(pData(gse)$title)," ")
trt_time <- map_chr(trtvars,pluck,2)
rep <- map_chr(trtvars,pluck,4)
trts <- strsplit(as.character(trt_time),"_")
treatment <- map_chr(trts,pluck,1)
hour <- map_chr(trts,pluck,2)
jbcdat$targets <-
data.frame(barcode = pData(gse)$description,
treatment,hour,rep,
type = factor(trt_time),
type_rep = paste(trt_time,rep,sep="_"))
#levels(jbcdat$targets$type) <- c("")
rownames(jbcdat$targets) <- pData(gse)$geo_accession
head(jbcdat$targets)
rm(trtvars,treatment,trt_time,trts,hour)
```
fData is the feature (gene) annotation data.
```{r fDannot}
names(fData(gse))
```
Now let's save some array feature annotation information.
```{r featannot}
jbcdat$genes <- data.frame(fData(gse)[,c("Entrez_Gene_ID","Symbol","Chromosome")])
head(jbcdat$genes,n=12)
```
```{r dataobj}
names(jbcdat)
```
Let's save this data set for future use.
```{r savedata}
#jbcdir=c("data/JBC 2012")
#save(jbcdat,file = file.path(jbcdir,"jbcdat.rda"))
```
## MDS plot
Label samples by their treatment and color.
```{r MDSplot}
library(limma)
limma::plotMDS(jbcdat$E,labels=jbcdat$targets$type,
col=unclass(jbcdat$targets$type),
xlim = c(-1.5,1.5), ylim=c(-1,1),
main="MDS plot") #color by type
```
```{r sessionInfo}
sessionInfo()
```