Seurat新版教程:Guided Clustering Tutorial-(下)
分享是一种态度
回顾
Seurat新版教程:Guided Clustering Tutorial-(上)
好了,最重要的一步来了,聚类分析。Seurat采用的是graph-based聚类方法,k-means方法在V3中已经不存在了。
聚类
1# Cluster the cells
2#Identify clusters of cells by a shared nearest neighbor (SNN) modularity optimization based clustering algorithm. First calculate k-nearest neighbors and construct the SNN graph. Then optimize the modularity function to determine clusters. For a full description of the algorithms, see Waltman and van Eck (2013) The European Physical Journal B. Thanks to Nigel Delaney (evolvedmicrobe@github) for the rewrite of the Java modularity optimizer code in Rcpp!
3pbmc <- FindNeighbors(pbmc, dims = 1:10)
4pbmc <- FindClusters(pbmc, resolution = 0.5)
5Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
6
7Number of nodes: 2638
8Number of edges: 96033
9
10Running Louvain algorithm...
110% 10 20 30 40 50 60 70 80 90 100%
12[----|----|----|----|----|----|----|----|----|----|
13**************************************************|
14Maximum modularity in 10 random starts: 0.8720
15Number of communities: 9
16Elapsed time: 0 seconds
17
18> table(pbmc@active.ident) # 查看每一类有多少个细胞
19
20 0 1 2 3 4 5 6 7 8
21697 483 480 344 271 162 155 32 14
22 # 提取某一类细胞。
23> head(subset(as.data.frame(pbmc@active.ident),pbmc@active.ident=="2"))
24
25 pbmc@active.ident
26AAACCGTGCTTCCG 2
27AAAGAGACGCGAGA 2
28AAAGCAGATATCGG 2
29AAAGTTTGTAGCGT 2
30AAATGTTGAACGAA 2
31AAATGTTGTGGCAT 2
提取某一cluster细胞。
1> pbmc
2An object of class Seurat
313714 features across 2638 samples within 1 assay
4Active assay: RNA (13714 features)
5 1 dimensional reduction calculated: pca
6> subpbmc<-subset(x = pbmc,idents="2")
7> subpbmc
8An object of class Seurat
913714 features across 480 samples within 1 assay
10Active assay: RNA (13714 features)
11 1 dimensional reduction calculated: pca
12
13?WhichCells
14head(WhichCells(pbmc,idents="2"))
15
16[1] "AAACCGTGCTTCCG" "AAAGAGACGCGAGA" "AAAGCAGATATCGG" "AAAGTTTGTAGCGT" "AAATGTTGAACGAA" "AAATGTTGTGGCAT"
17# Look at cluster IDs of the first 5 cells
18head(Idents(pbmc), 5)
19
20AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC AAACCGTGCTTCCG AAACCGTGTATGCG
21 1 3 1 2 6
22Levels: 0 1 2 3 4 5 6 7 8
23
24#提取部分细胞
25> pbmc
26An object of class Seurat
2732738 features across 2700 samples within 1 assay
28Active assay: RNA (32738 features)
29> head(colnames(pbmc@assays$RNA@counts)[1:30])
30[1] "AAACATACAACCAC" "AAACATTGAGCTAC" "AAACATTGATCAGC" "AAACCGTGCTTCCG" "AAACCGTGTATGCG" "AAACGCACTGGTAC"
31> subbset<-subset(x=pbmc,cells=colnames(pbmc@assays$RNA@counts)[1:30])
32> subbset
33An object of class Seurat
3432738 features across 30 samples within 1 assay
35Active assay: RNA (32738 features)
当然,我们用的基本都是默认参数,建议?FindClusters一下,看看具体的参数设置,比如虽然是图聚类,但是却有不同的算法,这个要看相应的文献了。
Algorithm for modularity optimization (1 = original Louvain algorithm; 2 = Louvain algorithm with multilevel refinement; 3 = SLM algorithm; 4 = Leiden algorithm). Leiden requires the leidenalg python.
系统发育分析(Phylogenetic Analysis of Identity Classes)
1#Constructs a phylogenetic tree relating the 'average' cell from each identity class.
2# Tree is estimated based on a distance matrix constructed in either gene expression space or PCA spac
3
4?BuildClusterTree
5pbmc<-BuildClusterTree(pbmc)
6Tool(object = pbmc, slot = 'BuildClusterTree')
7
8
9Phylogenetic tree with 9 tips and 8 internal nodes.
10
11Tip labels:
12 0, 1, 2, 3, 4, 5, ...
13
14Rooted; includes branch lengths.
15
16PlotClusterTree(pbmc)
识别低质量样本。
1#Calculate the Barcode Distribution Inflection
2?CalculateBarcodeInflections
3pbmc<-CalculateBarcodeInflections(pbmc)
4SubsetByBarcodeInflections(pbmc)
可视化降维
非线性降维——这个目的是为了可视化,而不是特征提取(PCA),虽然它也可以用来做特征提取。
UMAP
1#Run non-linear dimensional reduction (UMAP/tSNE)
2# If you haven't installed UMAP, you can do so via reticulate::py_install(packages =
3# 'umap-learn')
4pbmc <- RunUMAP(pbmc, dims = 1:10)
5
6UMAP(a=None, angular_rp_forest=False, b=None, init='spectral',
7 learning_rate=1.0, local_connectivity=1, metric='correlation',
8 metric_kwds=None, min_dist=0.3, n_components=2, n_epochs=None,
9 n_neighbors=30, negative_sample_rate=5, random_state=None,
10 repulsion_strength=1.0, set_op_mix_ratio=1.0, spread=1.0,
11 target_metric='categorical', target_metric_kwds=None,
12 target_n_neighbors=-1, target_weight=0.5, transform_queue_size=4.0,
13 transform_seed=42, verbose=True)
14Construct fuzzy simplicial set
15Construct embedding
16 completed 0 / 500 epochs
17 completed 50 / 500 epochs
18 completed 100 / 500 epochs
19 completed 150 / 500 epochs
20 completed 200 / 500 epochs
21 completed 250 / 500 epochs
22 completed 300 / 500 epochs
23 completed 350 / 500 epochs
24 completed 400 / 500 epochs
25 completed 450 / 500 epochs
26> head(pbmc@reductions$umap@cell.embeddings) # 提取UMAP坐标值。
27 UMAP_1 UMAP_2
28AAACATACAACCAC 1.7449461 -2.770269
29AAACATTGAGCTAC 4.6293025 12.092374
30AAACATTGATCAGC 5.2499804 -2.011440
31AAACCGTGCTTCCG -11.9875174 -1.568554
32AAACCGTGTATGCG 0.1626114 -8.743275
33AAACGCACTGGTAC 2.9192858 -1.487868
t-SNE
1pbmc <- RunTSNE(pbmc, dims = 1:10)
2 head(pbmc@reductions$tsne@cell.embeddings)
3 tSNE_1 tSNE_2
4AAACATACAACCAC -11.701490 7.120466
5AAACATTGAGCTAC -21.981401 -21.330793
6AAACATTGATCAGC -1.292978 23.822324
7AAACCGTGCTTCCG 30.877776 -9.926240
8AAACCGTGTATGCG -34.619197 8.773753
9AAACGCACTGGTAC -3.046157 13.013471
比较一下两个可视化的结果。
1# note that you can set `label = TRUE` or use the LabelClusters function to help label
2# individual clusters
3plot1<-DimPlot(pbmc, reduction = "umap",label = TRUE)+scale_color_npg()
4plot2<-DimPlot(pbmc, reduction = "tsne",label = TRUE)+scale_color_npg()
5CombinePlots(plots = list(plot1, plot2),legend="bottom")
6
可以看出,两者的降维可视化的结构是一致的,UMAP方法更加紧凑——在降维图上,同一cluster离得更近,不同cluster离得更远,作为一种后来的算法有一定的优点,但是t-SNE结构也能很好地反映cluster的空间结构。
差异分析
1# Finding differentially expressed features (cluster biomarkers)
2# find all markers of cluster 1
3cluster1.markers <- FindMarkers(pbmc, ident.1 = 1, min.pct = 0.25)
4head(cluster1.markers, n = 5)
5
6 p_val avg_logFC pct.1 pct.2 p_val_adj
7IL32 1.894810e-92 0.8373872 0.948 0.464 2.598542e-88
8LTB 7.953303e-89 0.8921170 0.981 0.642 1.090716e-84
9CD3D 1.655937e-70 0.6436286 0.919 0.431 2.270951e-66
10IL7R 3.688893e-68 0.8147082 0.747 0.325 5.058947e-64
11LDHB 2.292819e-67 0.6253110 0.950 0.613 3.144372e-63
12
这是一种one-others的差异分析方法,就是cluster1与其余的cluster来做比较,当然这个是可以指定的,参数就是ident.2。
1# find all markers distinguishing cluster 5 from clusters 0 and 3
2cluster5.markers <- FindMarkers(pbmc, ident.1 = 5, ident.2 = c(0, 3), min.pct = 0.25)
3head(cluster5.markers, n = 5)
4 p_val avg_logFC pct.1 pct.2 p_val_adj
5FCGR3A 7.583625e-209 2.963144 0.975 0.037 1.040018e-204
6IFITM3 2.500844e-199 2.698187 0.975 0.046 3.429657e-195
7CFD 1.763722e-195 2.362381 0.938 0.037 2.418768e-191
8CD68 4.612171e-192 2.087366 0.926 0.036 6.325132e-188
9RP11-290F20.3 1.846215e-188 1.886288 0.840 0.016 2.531900e-184
10
看看输出结果都是什么。
1?FindMarkers
2data.frame with a ranked list of putative markers as rows,
3and associated statistics as columns (p-values, ROC score, etc., depending on the test used (test.use)).
4The following columns are always present:
5
6avg_logFC: log fold-chage of the average expression between the two groups. Positive values indicate that the gene is more highly expressed in the first group
7pct.1: The percentage of cells where the gene is detected in the first group
8pct.2: The percentage of cells where the gene is detected in the second group
9p_val_adj: Adjusted p-value, based on bonferroni correction using all genes in the dataset
我们还可以输出一个总表。
1# find markers for every cluster compared to all remaining cells, report only the positive ones
2> pbmc.markers <- FindAllMarkers(pbmc, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
3Calculating cluster 0
4 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 08s
5Calculating cluster 1
6 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 06s
7Calculating cluster 2
8 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 21s
9Calculating cluster 3
10 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 07s
11Calculating cluster 4
12 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 08s
13Calculating cluster 5
14 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 27s
15Calculating cluster 6
16 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 21s
17Calculating cluster 7
18 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 24s
19Calculating cluster 8
20 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 19s
21
22> head(pbmc.markers)
23 p_val avg_logFC pct.1 pct.2 p_val_adj cluster gene
24RPS12 2.008629e-140 0.5029988 1.000 0.991 2.754633e-136 0 RPS12
25RPS27 2.624075e-140 0.5020359 0.999 0.992 3.598656e-136 0 RPS27
26RPS6 1.280169e-138 0.4673635 1.000 0.995 1.755623e-134 0 RPS6
27RPL32 4.358823e-135 0.4242773 0.999 0.995 5.977689e-131 0 RPL32
28RPS14 3.618793e-128 0.4283480 1.000 0.994 4.962812e-124 0 RPS14
29RPS25 5.612007e-124 0.5203411 0.997 0.975 7.696306e-120 0 RPS25
30
31> pbmc.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC)
32# A tibble: 18 x 7
33# Groups: cluster [9]
34 p_val avg_logFC pct.1 pct.2 p_val_adj cluster gene
35 <dbl> <dbl> <dbl> <dbl> <dbl> <fct> <chr>
36 1 1.96e-107 0.730 0.901 0.594 2.69e-103 0 LDHB
37 2 1.61e- 82 0.922 0.436 0.11 2.20e- 78 0 CCR7
38 3 7.95e- 89 0.892 0.981 0.642 1.09e- 84 1 LTB
39 4 1.85e- 60 0.859 0.422 0.11 2.54e- 56 1 AQP3
40 5 0. 3.86 0.996 0.215 0. 2 S100A9
41 6 0. 3.80 0.975 0.121 0. 2 S100A8
42 7 0. 2.99 0.936 0.041 0. 3 CD79A
43 8 9.48e-271 2.49 0.622 0.022 1.30e-266 3 TCL1A
44 9 2.96e-189 2.12 0.985 0.24 4.06e-185 4 CCL5
4510 2.57e-158 2.05 0.587 0.059 3.52e-154 4 GZMK
4611 3.51e-184 2.30 0.975 0.134 4.82e-180 5 FCGR3A
4712 2.03e-125 2.14 1 0.315 2.78e-121 5 LST1
4813 7.95e-269 3.35 0.961 0.068 1.09e-264 6 GZMB
4914 3.13e-191 3.69 0.961 0.131 4.30e-187 6 GNLY
5015 1.48e-220 2.68 0.812 0.011 2.03e-216 7 FCER1A
5116 1.67e- 21 1.99 1 0.513 2.28e- 17 7 HLA-DPB1
5217 7.73e-200 5.02 1 0.01 1.06e-195 8 PF4
5318 3.68e-110 5.94 1 0.024 5.05e-106 8 PPBP
这里可以注意一下only.pos 参数,可以指定返回positive markers 基因。test.use可以指定检验方法,可选择的有:wilcox,bimod,roc,t,negbinom,poisson,LR,MAST,DESeq2。
cluster间保守conserved marker基因.
1#Finds markers that are conserved between the groups
2 head(FindConservedMarkers(pbmc, ident.1 = 0, ident.2 = 1, grouping.var = "groups"))
3Testing group g2: (0) vs (1)
4 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 04s
5Testing group g1: (0) vs (1)
6 |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed = 05s
7 g2_p_val g2_avg_logFC g2_pct.1 g2_pct.2 g2_p_val_adj g1_p_val g1_avg_logFC g1_pct.1 g1_pct.2
8S100A4 6.687228e-33 -0.8410733 0.678 0.959 9.170864e-29 2.583278e-35 -0.9569347 0.687 0.945
9MALAT1 2.380860e-16 0.2957331 1.000 1.000 3.265112e-12 7.501490e-20 0.3201783 1.000 1.000
10VIM 1.192054e-14 -0.4921326 0.816 0.951 1.634783e-10 1.193930e-19 -0.4945881 0.798 0.945
11ANXA2 3.115304e-13 -0.6406856 0.169 0.461 4.272327e-09 2.186333e-18 -0.7695240 0.164 0.504
12ANXA1 5.226901e-18 -0.7544607 0.451 0.800 7.168173e-14 1.413468e-17 -0.6660324 0.507 0.803
13S100A11 1.832303e-16 -0.6955104 0.288 0.665 2.512820e-12 1.208765e-17 -0.7757913 0.256 0.584
14 g1_p_val_adj max_pval minimump_p_val
15S100A4 3.542707e-31 6.687228e-33 5.166555e-35
16MALAT1 1.028754e-15 2.380860e-16 1.500298e-19
17VIM 1.637356e-15 1.192054e-14 2.387860e-19
18ANXA2 2.998337e-14 3.115304e-13 4.372665e-18
19ANXA1 1.938430e-13 1.413468e-17 1.045380e-17
20S100A11 1.657700e-13 1.832303e-16 2.417530e-17
绘制基因小提琴图,其中右边的图使用原始的count取log绘制的,其实趋势还是蛮一致的。
1plot1<-VlnPlot(pbmc, features = c("MS4A1", "CD79A"))+scale_color_npg()
2# you can plot raw counts as well
3plot2<- VlnPlot(pbmc, features = c("MS4A1", "CD79A"),ncol=1, same.y.lims=T,slot = "counts", log = TRUE)+scale_color_npg()
4CombinePlots(plots = list(plot1, plot2))
1plot1<-FeaturePlot(pbmc, features = c("MS4A1", "GNLY", "CD3E", "CD14", "FCER1A", "FCGR3A", "LYZ", "PPBP",
2 "CD8A"),min.cutoff = 0, max.cutoff = 4)
3top10 <- pbmc.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
4plot2<-DoHeatmap(pbmc, features = top10$gene) + NoLegend()+scale_color_npg()
5#CombinePlots(plots = list(plot1, plot2))
6
7library(gridExtra)
8grid.arrange(plot1,plot2,ncol = 2, nrow = 1)
gridExtra 也是可以用来组合seurat绘制的图片的,不足为奇,seurat本身用的ggplot2语法。
细胞周期分析
与细胞周期相关的基因。
1> cc.genes
2$`s.genes`
3 [ ] "MCM5" "PCNA" "TYMS" "FEN1" "MCM2" "MCM4" "RRM1" "UNG" "GINS2" "MCM6"
4[ ] "CDCA7" "DTL" "PRIM1" "UHRF1" "MLF1IP" "HELLS" "RFC2" "RPA2" "NASP" "RAD51AP1"
5[ ] "GMNN" "WDR76" "SLBP" "CCNE2" "UBR7" "POLD3" "MSH2" "ATAD2" "RAD51" "RRM2"
6[ ] "CDC45" "CDC6" "EXO1" "TIPIN" "DSCC1" "BLM" "CASP8AP2" "USP1" "CLSPN" "POLA1"
7[ ] "CHAF1B" "BRIP1" "E2F8"
8
9$g2m.genes
10 [ ] "HMGB2" "CDK1" "NUSAP1" "UBE2C" "BIRC5" "TPX2" "TOP2A" "NDC80" "CKS2" "NUF2" "CKS1B"
11[ ] "MKI67" "TMPO" "CENPF" "TACC3" "FAM64A" "SMC4" "CCNB2" "CKAP2L" "CKAP2" "AURKB" "BUB1"
12[ ] "KIF11" "ANP32E" "TUBB4B" "GTSE1" "KIF20B" "HJURP" "CDCA3" "HN1" "CDC20" "TTK" "CDC25C"
13[ ] "KIF2C" "RANGAP1" "NCAPD2" "DLGAP5" "CDCA2" "CDCA8" "ECT2" "KIF23" "HMMR" "AURKA" "PSRC1"
14[ ] "ANLN" "LBR" "CKAP5" "CENPE" "CTCF" "NEK2" "G2E3" "GAS2L3" "CBX5" "CENPA"
#
1?CellCycleScoring
2## Not run:
3# pbmc_small doesn't have any cell-cycle genes
4# To run CellCycleScoring, please use a dataset with cell-cycle genes
5# An example is available at http://satijalab.org/seurat/cell_cycle_vignette.html
6pbmc <- CellCycleScoring(
7 object = pbmc,
8 g2m.features = cc.genes$g2m.genes,
9 s.features = cc.genes$s.genes
10)
11head(x = pbmc@meta.data)[,c(7,8,9,10,11)] # 我是为了好看取了几列来看,你大可不必。
12
13 seurat_clusters groups S.Score G2M.Score Phase
14AAACATACAACCAC 1 g1 0.10502807 -0.04507596 S
15AAACATTGAGCTAC 3 g1 -0.02680010 -0.04986215 G1
16AAACATTGATCAGC 1 g2 -0.01387173 0.07792135 G2M
17AAACCGTGCTTCCG 2 g2 0.04595348 0.01140204 S
18AAACCGTGTATGCG 6 g1 -0.02602771 0.04413069 G2M
19AAACGCACTGGTAC 1 g2 -0.03692587 -0.08341568 G1
20
21VlnPlot(pbmc, features = c("nFeature_RNA", "nCount_RNA", "percent.mt","percent.HB","G2M.Score","S.Score"), ncol = 6)#+scale_color_npg()
在UMAP空间中绘制细胞周期信息。
1umapem<-pbmc@reductions$umap@cell.embeddings
2metada= pbmc@meta.data
3dim(umapem);dim(metada)
4
5metada$bar<-rownames(metada)
6umapem$bar<-rownames(umapem)
7ccdata<-merge(umapem,metada,by="bar")
8head(ccdata)
9library(ggplot2)
10plot<-ggplot(ccdata, aes(UMAP_1, UMAP_2,label=Phase))+geom_point(aes(colour = factor(Phase)))+
11 #plot<-plot+scale_colour_manual(values=c("#CC33FF","Peru","#660000","#660099","#990033","black","red", "#666600", "green","#6699CC","#339900","#0000FF","#FFFF00","#808080"))+
12 labs("@yunlai",x = "", y="")
13plot=plot+scale_color_aaas() +
14 theme_bw()+theme(panel.grid=element_blank(),legend.title=element_blank(),legend.text = element_text(color="black", size = 10, face = "bold"))
15plot<-plot+guides(colour = guide_legend(override.aes = list(size=5))) +theme(plot.title = element_text(hjust = 0.5))
16
17plot
探索
我们可以用SingleR或者celaref来定义每一个细胞的类型,而不只是cluster某某某了。其中SingleR与seurat是无缝衔接的,seurat对象可以读到这个里面。这里先不写,假定我们已经知道了各个类群:
1# singleR
2#Assigning cell type identity to clusters
3new.cluster.ids <- c("Naive CD4 T", "Memory CD4 T", "CD14+ Mono", "B", "CD8 T", "FCGR3A+ Mono",
4 "NK", "DC", "Platelet")
5names(new.cluster.ids) <- levels(pbmc)
6pbmc <- RenameIdents(pbmc, new.cluster.ids)
7plot1<-DimPlot(pbmc, reduction = "umap", label = TRUE, pt.size = 0.5) + NoLegend()
8plot2<-DimPlot(pbmc, reduction = "tsne", label = TRUE, pt.size = 0.5) + NoLegend()
9grid.arrange(plot1,plot2,ncol = 2, nrow = 1)
平均表达谱函数AverageExpression
1?AverageExpression
2
3AverageExp<-AverageExpression(pbmc,features=unique(top10$gene))
4Finished averaging RNA for cluster Naive CD4 T
5Finished averaging RNA for cluster Memory CD4 T
6Finished averaging RNA for cluster CD14+ Mono
7Finished averaging RNA for cluster B
8Finished averaging RNA for cluster CD8 T
9Finished averaging RNA for cluster FCGR3A+ Mono
10Finished averaging RNA for cluster NK
11Finished averaging RNA for cluster DC
12Finished averaging RNA for cluster Platelet
13> typeof(AverageExp)
14[1] "list"
15> head(AverageExp$RNA)
16 Naive CD4 T Memory CD4 T CD14+ Mono B CD8 T FCGR3A+ Mono NK DC Platelet
17RPS3A 80.173486 61.905672 24.1161656 65.4588054 53.2413307 26.3218430 38.9542301 39.4926725 8.2843982
18LDHB 13.697489 13.663320 2.5790368 2.8923463 7.1405019 1.3868494 4.4117033 3.1508971 2.0904628
19CCR7 2.984692 1.293789 0.1020109 1.0788038 0.1631841 0.1413904 0.1196927 0.1473077 0.0000000
20CD3D 10.724878 11.342980 0.4632153 0.3310177 13.9581981 0.2767605 1.1144081 0.2506665 0.0000000
21CD3E 7.320622 7.807142 0.4356602 0.5741410 7.6701063 0.4992164 3.5389591 0.4322447 1.6960081
22NOSIP 5.912196 5.196203 1.2965789 0.8373659 2.4063675 2.0503487 2.1314856 1.0916285 0.0804829
23
24library(psych)
25library(pheatmap)
26coorda<-corr.test(AverageExp$RNA,AverageExp$RNA,method="spearman")
27pheatmap(coorda$r)
1saveRDS(pbmc, file = "D:\\Users\\Administrator\\Desktop\\RStudio\\single_cell\\filtered_gene_bc_matrices\\hg19pbmc_tutorial.rds")
富集分析
有了基因列表其实就可以做富集分析了借助enrichplot等R包可以做出很多好的探索。
1require(org.Hs.eg.db)
2library(topGO)
3library(DOSE)
4x=as.list(org.Hs.egALIAS2EG)
5geneList<-rep(0,nrow(pbmc))
6names(geneList)<-row.names(pbmc)
7geneList<-geneList[intersect(names(geneList),names(x))]
8newwallgenes=names(geneList)
9
10for (ii in 1:length(geneList)){
11 names(geneList)[ii]<-x[[names(geneList)[ii]]][1]
12
13}
14
15
16gene_erichment_results=list()
17for (c1 in as.character(unique(levels(pbmc.markers$cluster)))){
18 print(paste0("RUN ", c1))
19 testgenes<-subset(pbmc.markers,cluster==c1)$gene
20 gene_erichment_results[[c1]]=list()
21 testgeneList=geneList
22 testgeneList[which(newwallgenes %in% testgenes)]= 1
23 #gene_erichment_results=list()
24 tab1=c()
25 for(ont in c("BP","MF","CC")){
26 sampleGOdata<-suppressMessages(new("topGOdata",description="Simple session",ontology=ont,allGenes=as.factor(testgeneList),
27 nodeSize=10,annot=annFUN.org,mapping="org.Hs.eg.db",ID="entrez"))
28 resultTopGO.elim<-suppressMessages(runTest(sampleGOdata,algorithm="elim",statistic="Fisher"))
29
30 resultTopGO.classic<-suppressMessages(runTest(sampleGOdata,algorithm="classic",statistic="Fisher"))
31 tab1<-rbind(tab1,GenTable(sampleGOdata,Fisher.elim=resultTopGO.elim,Fisher.classic=resultTopGO.classic,orderBy="Fisher.elim",
32 topNodes=200))
33 }
34 gene_erichment_results[[c1]][["topGO"]]=tab1
35 x<-suppressMessages(enrichDO(gene=names(testgeneList)[testgeneList==1],ont="DO",pvalueCutoff=1,pAdjustMethod="BH",universe=names(testgeneList),
36 minGSSize=5,maxGSSize=500,readable=T))
37 gene_erichment_results[[c1]][["DO"]]=x
38 dgn<-suppressMessages(enrichDGN(names(testgeneList)[testgeneList==1]))
39 gene_erichment_results[[c1]][["DGN"]]=dgn
40}
41
42
43gene_erichment_results[["8"]][["topGO"]][1:5,]
44
45gene_erichment_results[["1"]][["topGO"]][1:5,]
46 GO.ID Term Annotated Significant Expected Rank in Fisher.classic Fisher.elim Fisher.classic
471 GO:0019221 cytokine-mediated signaling pathway 516 22 4.15 3 4.5e-05 1.4e-10
482 GO:0045059 positive thymic T cell selection 11 3 0.09 55 7.9e-05 8.7e-05
493 GO:0050850 positive regulation of calcium-mediated ... 30 4 0.24 61 9.1e-05 0.00010
504 GO:0033209 tumor necrosis factor-mediated signaling... 98 6 0.79 70 0.00013 0.00016
515 GO:0002250 adaptive immune response 301 11 2.42 45 0.00040 3.8e-05
可视化
1library(enrichplot)
2dotplot(gene_erichment_results[[1]][["DGN"]], showCategory=30)
1## categorySize can be scaled by 'pvalue' or 'geneNum'
2p1<-cnetplot(gene_erichment_results[[1]][["DGN"]], categorySize="pvalue", foldChange=geneList)
3p2<-cnetplot(gene_erichment_results[[1]][["DGN"]], foldChange=geneList, circular = TRUE, colorEdge = TRUE)
4
5plot_grid(p1, p2, ncol=2)
如果你对单细胞转录组研究感兴趣,但又不知道如何入门,也许你可以关注一下下面的课程
看完记得顺手点个“在看”哦!
长按扫码可关注