【单细胞测序攻略:二聚体过滤】DoubletDecon包过滤Seurat对象的二聚体(Doublet)

    科技2025-08-01  3

    单细胞测序攻略:二聚体过滤——DoubletDecon包攻略

    DoubletDecon介绍提醒:1.一直到2020年7月一直在更新,直接对接seurat比较好用2.需要单个样本全部seurat流程走一遍,导出所需文件后进行去除二聚体过程3.去除二聚体以后用subset取去除二聚体后文件的细胞名称,即可得到取出后的交集4.整个包的使用流程参考链接里的protocol文章,本篇只是对代码进行入门介绍 注意:不要用Improved_Seurat_Pre_Process,会报错代码来源:参考文献: 第一步 安装包第二步 加载包第三步 读取数据第四步 质量控制第五步 标准流程 Pipeline第六步 Seurat数据导出——Output第七步 数据预处理(读入DoubletDecon)Seurat_Pre_Process()注意:Improved_Seurat_Pre_Process会报错,别用。先把需要的数据导出来 第八步 二聚体分析第九步 提取过滤二聚体后的细胞名称第十步 细胞名称连接符号变了,要换回seurat的版本第十一步 取交集,所谓去除二聚体后的seurat对象包信息作者简介

    DoubletDecon介绍

    提醒:

    1.一直到2020年7月一直在更新,直接对接seurat比较好用

    2.需要单个样本全部seurat流程走一遍,导出所需文件后进行去除二聚体过程

    3.去除二聚体以后用subset取去除二聚体后文件的细胞名称,即可得到取出后的交集

    4.整个包的使用流程参考链接里的protocol文章,本篇只是对代码进行入门介绍

    注意:不要用Improved_Seurat_Pre_Process,会报错

    代码来源:

    https://github.com/JonathanShor/DoubletDetection https://github.com/EDePasquale/DoubletDecon/blob/master/seurat-3.0.R

    参考文献:

    https://www.cell.com/cell-reports/fulltext/S2211-1247(19)31286-0 https://www.biorxiv.org/content/10.1101/2020.04.23.058156v1.full

    第一步 安装包

    if(!require(devtools)){ install.packages("devtools") # If not already installed } devtools::install_github('EDePasquale/DoubletDecon')

    第二步 加载包

    library(plyr) library(dplyr) library(Matrix) library(ggplot2) library(cowplot) library(Seurat) library(harmony) library(DoubletDecon) library(clusterProfiler)

    第三步 读取数据

    sce.10x <- Read10X(data.dir = 'D:/HSW/HD/scRNA-seq/') testdata_1 <- CreateSeuratObject(counts = sce.10x, project = "testdata_1", min.cells=3, min.features=500) testdata_1

    第四步 质量控制

    testdata_1[["percent.mt"]] <- PercentageFeatureSet(testdata_1, pattern = "^MT-") testdata_1[["percent.HB"]]<-PercentageFeatureSet(testdata_1,features="HBB") VlnPlot(testdata_1, features = c("nFeature_RNA", "nCount_RNA", "percent.mt","percent.HB"), ncol = 2) plot1 <- FeatureScatter(testdata_1, feature1 = "nCount_RNA", feature2 = "percent.mt") plot2 <- FeatureScatter(testdata_1, feature1 = "nCount_RNA", feature2 = "nFeature_RNA") CombinePlots(plots = list(plot1, plot2)) testdata_1 <- subset(testdata_1, subset = nFeature_RNA > 200 & nFeature_RNA < 4000 & percent.mt < 10 & percent.HB < 7) dim(testdata_1) testdata_1

    第五步 标准流程 Pipeline

    testdata_1 <- NormalizeData(testdata_1) testdata_1 <- FindVariableFeatures(testdata_1, selection.method = "vst", nfeatures = 3000, verbose = FALSE) testdata_1 <- ScaleData(testdata_1, verbose = FALSE) testdata_1<- RunPCA(testdata_1, features = VariableFeatures(object =testdata_1)) testdata_1 <- JackStraw(testdata_1 , num.replicate = 100,dims = 50) testdata_1 <- ScoreJackStraw(testdata_1, dims = 1:20) JackStrawPlot(testdata_1, dims =1:20) ElbowPlot(testdata_1,ndims=50) testdata_1<- FindNeighbors(testdata_1, dims = 1:10) testdata_1<- FindClusters(testdata_1, resolution = 0.6) testdata_1<- RunTSNE(testdata_1, dims = 1:10) testdata_1<- RunUMAP(testdata_1, dims = 1:10) testdata_1.markers <- FindAllMarkers(testdata_1, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25) testdata_1.markers %>% group_by(cluster) %>% top_n(n = 2, wt = avg_logFC) top50 <- testdata_1.markers %>% group_by(cluster) %>% top_n(n = 50, wt = avg_logFC)

    第六步 Seurat数据导出——Output

    write.table(top50,file="Top50Genes.txt",sep="\t",col.names= NA) write.table(x = Idents(object = testdata_1),"Cluster.txt",sep="\t",col.names= NA) data<- testdata_1@assays$RNA@data write.table(data,file="counts.txt",sep="\t",col.names= NA)

    第七步 数据预处理(读入DoubletDecon)Seurat_Pre_Process()

    注意:Improved_Seurat_Pre_Process会报错,别用。先把需要的数据导出来

    location="D:/HSW/HD/scRNA-seq/" expressionFile=paste0(location, "counts.txt") genesFile=paste0(location, "Top50Genes.txt") clustersFile=paste0(location, "Cluster.txt") newFiles=Seurat_Pre_Process(expressionFile, genesFile, clustersFile) filename="test_example"

    第八步 二聚体分析

    results=Main_Doublet_Decon(rawDataFile=newFiles$newExpressionFile, groupsFile=newFiles$newGroupsFile, filename=filename, location=location, fullDataFile=NULL, removeCC=FALSE, species="hsa", rhop=1.1, write=TRUE, PMF=TRUE, useFull=FALSE, heatmap=FALSE, centroids=TRUE, num_doubs=100, only50=FALSE, min_uniq=4, nCores=-1)

    第九步 提取过滤二聚体后的细胞名称

    LIST<-row.names(results$Final_nondoublets_groups) head(LIST)

    第十步 细胞名称连接符号变了,要换回seurat的版本

    LIST=gsub('[.]','-',LIST)

    第十一步 取交集,所谓去除二聚体后的seurat对象

    testdata_1_RemoveDoublet<-subset(x = testdata_1, cells=LIST)

    包信息

    sessionInfo() R version 4.0.2 (2020-06-22) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19041) Matrix products: default locale: [1] LC_COLLATE=Chinese (Simplified)_China.936 LC_CTYPE=Chinese (Simplified)_China.936 [3] LC_MONETARY=Chinese (Simplified)_China.936 LC_NUMERIC=C [5] LC_TIME=Chinese (Simplified)_China.936 attached base packages: [1] stats4 parallel grid stats graphics grDevices utils datasets methods [10] base other attached packages: [1] Matrix_1.2-18 shiny_1.5.0 stringr_1.4.0 doParallel_1.0.15 [5] iterators_1.0.12 foreach_1.5.0 R.utils_2.10.1 R.oo_1.24.0 [9] R.methodsS3_1.8.1 tidyr_1.1.1 mygene_1.24.0 GenomicFeatures_1.40.1 [13] AnnotationDbi_1.50.3 GenomicRanges_1.40.0 GenomeInfoDb_1.24.2 IRanges_2.22.2 [17] S4Vectors_0.26.1 MCL_1.0 plyr_1.8.6 gplots_3.0.4 [21] DeconRNASeq_1.30.0 pcaMethods_1.80.0 Biobase_2.48.0 BiocGenerics_0.34.0 [25] limSolve_1.5.6 dplyr_1.0.1 clusterProfiler_3.16.1 ROCR_1.0-11 [29] KernSmooth_2.23-17 fields_11.5 spam_2.5-1 dotCall64_1.0-0 [33] DoubletFinder_2.0.3 harmony_1.0 Rcpp_1.0.5 Seurat_3.2.0 [37] sctransform_0.2.1 cowplot_1.0.0 ggplot2_3.3.2 DoubletDecon_1.1.6 loaded via a namespace (and not attached): [1] rappdirs_0.3.1 rtracklayer_1.48.0 knitr_1.29 [4] bit64_4.0.5 DelayedArray_0.14.1 irlba_2.3.3 [7] data.table_1.13.0 rpart_4.1-15 RCurl_1.98-1.2 [10] generics_0.0.2 callr_3.4.3 usethis_1.6.1 [13] RSQLite_2.2.0 RANN_2.6.1 europepmc_0.4 [16] future_1.18.0 chron_2.3-56 bit_4.0.4 [19] enrichplot_1.8.1 spatstat.data_1.4-3 xml2_1.3.2 [22] httpuv_1.5.4 SummarizedExperiment_1.18.2 assertthat_0.2.1 [25] viridis_0.5.1 xfun_0.16 hms_0.5.3 [28] promises_1.1.1 fansi_0.4.1 progress_1.2.2 [31] caTools_1.18.0 dbplyr_1.4.4 igraph_1.2.5 [34] DBI_1.1.0 htmlwidgets_1.5.1 purrr_0.3.4 [37] ellipsis_0.3.1 RSpectra_0.16-0 backports_1.1.8 [40] biomaRt_2.44.1 deldir_0.1-28 vctrs_0.3.2 [43] remotes_2.2.0 abind_1.4-5 withr_2.2.0 [46] ggforce_0.3.2 triebeard_0.3.0 checkmate_2.0.0 [49] GenomicAlignments_1.24.0 prettyunits_1.1.1 goftest_1.2-2 [52] cluster_2.1.0 DOSE_3.14.0 ape_5.4-1 [55] lazyeval_0.2.2 crayon_1.3.4 pkgconfig_2.0.3 [58] labeling_0.3 tweenr_1.0.1 nlme_3.1-148 [61] pkgload_1.1.0 nnet_7.3-14 devtools_2.3.1 [64] rlang_0.4.7 globals_0.12.5 lifecycle_0.2.0 [67] miniUI_0.1.1.1 downloader_0.4 BiocFileCache_1.12.1 [70] rsvd_1.0.3 rprojroot_1.3-2 polyclip_1.10-0 [73] matrixStats_0.56.0 lmtest_0.9-37 urltools_1.7.3 [76] zoo_1.8-8 base64enc_0.1-3 ggridges_0.5.2 [79] processx_3.4.3 png_0.1-7 viridisLite_0.3.0 [82] bitops_1.0-6 Biostrings_2.56.0 blob_1.2.1 [85] qvalue_2.20.0 jpeg_0.1-8.1 gridGraphics_0.5-0 [88] scales_1.1.1 lpSolve_5.6.15 memoise_1.1.0 [91] magrittr_1.5 ica_1.0-2 gdata_2.18.0 [94] zlibbioc_1.34.0 compiler_4.0.2 scatterpie_0.1.5 [97] RColorBrewer_1.1-2 fitdistrplus_1.1-1 Rsamtools_2.4.0 [100] cli_2.0.2 XVector_0.28.0 listenv_0.8.0 [103] patchwork_1.0.1 pbapply_1.4-2 ps_1.3.4 [106] htmlTable_2.1.0 Formula_1.2-3 MASS_7.3-51.6 [109] mgcv_1.8-31 tidyselect_1.1.0 stringi_1.4.6 [112] GOSemSim_2.14.2 askpass_1.1 latticeExtra_0.6-29 [115] ggrepel_0.8.2 fastmatch_1.1-0 tools_4.0.2 [118] future.apply_1.6.0 rstudioapi_0.11 foreign_0.8-80 [121] gridExtra_2.3 farver_2.0.3 Rtsne_0.15 [124] ggraph_2.0.3 digest_0.6.25 rvcheck_0.1.8 [127] BiocManager_1.30.10 proto_1.0.0 quadprog_1.5-8 [130] later_1.1.0.1 RcppAnnoy_0.0.16 httr_1.4.2 [133] colorspace_1.4-1 XML_3.99-0.5 fs_1.5.0 [136] tensor_1.5 reticulate_1.16 splines_4.0.2 [139] uwot_0.1.8 expm_0.999-5 spatstat.utils_1.17-0 [142] graphlayouts_0.7.0 ggplotify_0.0.5 plotly_4.9.2.1 [145] sessioninfo_1.1.1 xtable_1.8-4 jsonlite_1.7.0 [148] spatstat_1.64-1 tidygraph_1.2.0 testthat_2.3.2 [151] R6_2.4.1 Hmisc_4.4-1 gsubfn_0.7 [154] pillar_1.4.6 htmltools_0.5.0 mime_0.9 [157] glue_1.4.1 fastmap_1.0.1 BiocParallel_1.22.0 [160] codetools_0.2-16 maps_3.3.0 fgsea_1.14.0 [163] pkgbuild_1.1.0 utf8_1.1.4 lattice_0.20-41 [166] tibble_3.0.3 sqldf_0.4-11 curl_4.3 [169] leiden_0.3.3 gtools_3.8.2 GO.db_3.11.4 [172] openssl_1.4.2 survival_3.2-3 limma_3.44.3 [175] desc_1.2.0 munsell_0.5.0 DO.db_2.9 [178] GenomeInfoDbData_1.2.3 reshape2_1.4.4 gtable_0.3.0

    作者简介

    何世伟 复旦大学医学博士生 厦门大学公共卫生硕士 研究方向:儿科学、生物信息学、表观遗传流行病学、循证医学 联系方式:swheok@foxmail.com

    Processed: 0.014, SQL: 8