-
Notifications
You must be signed in to change notification settings - Fork 16
Open
Description
Hi. I will be very grateful for any hint on how to overcome the error. I wish to deconvolve my bulk RNA seq data from the lungs of mice using single cell RNA seq data. For practice, I am following your tutorial using my own bulk seq data and the single cell rna seq data you have provided. All seems to go well until when I try to obtain the Cell type-specific differential expression by running the following code. From the beginning I thought the presence of factors(e.g. for th treatment conditions or phenoData) in my data set was the cause but after getting rid of the factors I still get the same error:
csfit <- bseqsc_csdiff(the_best_ilc_bulk_expr_set[genes, ] ~ phenoData| alpha + beta + ductal + acinar,
verbose = 2, nperms = 5000, .rng = 12345)`
Then I get this error:
Groups: bleo=5L | bleo_ko_i=5L | bleo_wt_i=5L | Nacl=5L | NA0L
Cell type(s): 'alpha', 'beta', ..., 'acinar' (4 total)
Fitting mode: auto
Data (filtered): 1202 features x 20 samples
Model has factor effect with more than 2 levels: fitting lm interaction model
Fitting model with nonnegative effects
Model with more than 2 groups: switching to version 2
Fitting linear interaction model ... Error in dimnames(covmat.unscaled) <- list(xnames, xnames) :
length of 'dimnames' [1] not equal to array extent
In addition: Warning messages:
1: In lsfit(D, G, intercept = FALSE) : 'X' matrix was collinear
2: In sqrt(((n - p) * stddevmat^2 - resids^2/(1 - hatdiag[good]))/(n - :
NaNs produced
Here are the details of my code:
# inspired from https://github.com/cozygene/bisque/issues/4
# creating expression set from my bulk rna seq data
library(openxlsx)
library(Biobase)
library(bseqsc)
library(tidyverse)
bseqsc_config('CIBERSORT.R')
data(pancreasMarkers)
# read in the raw data
bulk <- read.xlsx("all_samples.xlsx",
sheet = "rawCounts_unrepeated_genes")
head(bulk)
bulk$Gene <- NULL
rownames(bulk) <- toupper(bulk$external_gene_name)
bulk$external_gene_name <- NULL
bulk <- as.matrix(bulk)
head(bulk)
# create expression matrix
featureData <- as.character(rownames(bulk))
featureData <- as(data.frame(featureData,
stringsAsFactors = FALSE), "AnnotatedDataFrame")
rownames(featureData) <- rownames(bulk)
phenoData <- c(rep("bleo", 5), rep("Nacl", 5), rep("bleo_wt_i", 5), rep("bleo_ko_i", 5))
phenoData <- as(as.data.frame(phenoData), "AnnotatedDataFrame")
rownames(phenoData) <- colnames(bulk)
the_best_ilc_bulk_expr_set <- ExpressionSet(assayData = bulk,
phenoData = phenoData, featureData = featureData)
# read in the single cell rna seq data and fit the model
eislet <- readRDS('islet-eset.rds')
B <- bseqsc_basis(eislet, pancreasMarkers,
clusters = 'cellType', samples = 'sampleID', ct.scale = TRUE)
fit <- bseqsc_proportions(the_best_ilc_bulk_expr_set, B, verbose = TRUE)
pData(the_best_ilc_bulk_expr_set) <- cbind(pData(the_best_ilc_bulk_expr_set), t(coef(fit)))
fit_edger<- fitEdgeR(the_best_ilc_bulk_expr_set, ~phenoData,
coef = c("phenoDatableo_ko_i", "phenoDatableo_wt_i", "phenoDataNacl"))
# extended
fit_edger_ext <- fitEdgeR(the_best_ilc_bulk_expr_set, ~ phenoData + beta + ductal + acinar +gamma,
coef = c("phenoDatableo_ko_i", "phenoDatableo_wt_i",
"phenoDataNacl", "beta", "ductal",
"acinar", "gamma"))
fit_edger_ext$Symbol <- rownames(fit_edger_ext)
# gather P-values from both models
df_fit_edger <- as.data.frame(fit_edger, stringsAsFactors = FALSE)
df_fit_edger$Symbol <- rownames(df_fit_edger)
req_df_fit_edger <- df_fit_edger["PValue"]
colnames(req_df_fit_edger) <- "Base"
head(req_df_fit_edger)
req_df_fit_edger$Symbol <- rownames(req_df_fit_edger)
df_fit_edger_ext <- as.data.frame(fit_edger_ext, stringsAsFactors = FALSE)
df_fit_edger_ext$Symbol <- rownames(df_fit_edger_ext)
head(df_fit_edger_ext)
req_df_fit_edger_ext <- df_fit_edger_ext["PValue"]
colnames(req_df_fit_edger_ext) <- "Adjusted"
req_df_fit_edger_ext$Symbol <- rownames(req_df_fit_edger_ext)
edger_pvals <- req_df_fit_edger_ext %>%
inner_join(req_df_fit_edger)
head(edger_pvals)#
rownames(edger_pvals) <- edger_pvals$Symbol
edger_pvals <- mutate(edger_pvals, Regulated = Adjusted <= 0.01 & Adjusted <= Base)
# plot Base vs Adjusted
pvalueScatter(Base ~ Adjusted, edger_pvals, pval.th = 0.01, label.th = 3.5)
# ER genes
genes_ER <- c('HSPA5', 'MAFA', 'HERPUD1', 'DDIT3', 'UCN3', 'NEUROD1')
# Fit on ER stress genes and genes regulated beyond cell type proportion differences
genes <- union(genes_ER, subset(edger_pvals, Regulated)$Symbol)
csfit <- bseqsc_csdiff(the_best_ilc_bulk_expr_set[genes, ] ~ phenoData| alpha + beta + ductal + acinar,
verbose = 2, nperms = 5000, .rng = 12345)
Then I have the error:
Groups: bleo=5L | bleo_ko_i=5L | bleo_wt_i=5L | Nacl=5L | NA0L
Cell type(s): 'alpha', 'beta', ..., 'acinar' (4 total)
Fitting mode: auto
Data (filtered): 1202 features x 20 samples
Model has factor effect with more than 2 levels: fitting lm interaction model
Fitting model with nonnegative effects
Model with more than 2 groups: switching to version 2
Fitting linear interaction model ... Error in dimnames(covmat.unscaled) <- list(xnames, xnames) :
length of 'dimnames' [1] not equal to array extent
In addition: Warning messages:
1: In lsfit(D, G, intercept = FALSE) : 'X' matrix was collinear
2: In sqrt(((n - p) * stddevmat^2 - resids^2/(1 - hatdiag[good]))/(n - :
NaNs produced
> sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=de_DE.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.26.6 limma_3.40.6 preprocessCore_1.46.0 e1071_1.7-2 forcats_0.4.0 stringr_1.4.0
[7] dplyr_0.8.3 purrr_0.3.2 readr_1.3.1 tidyr_0.8.3 tibble_2.1.3 ggplot2_3.2.0
[13] tidyverse_1.2.1 bseqsc_1.0 csSAM_1.4 Rcpp_1.0.2 Biobase_2.44.0 BiocGenerics_0.30.0
[19] openxlsx_4.1.0.1
loaded via a namespace (and not attached):
[1] nlme_3.1-140 lubridate_1.7.4 bit64_0.9-7 doParallel_1.0.14 RColorBrewer_1.1-2 httr_1.4.0
[7] tools_3.6.0 backports_1.1.4 R6_2.4.0 DBI_1.0.0 lazyeval_0.2.2 colorspace_1.4-1
[13] withr_2.1.2 tidyselect_0.2.5 gridExtra_2.3 xbioc_0.1.17 bit_1.1-14 compiler_3.6.0
[19] cli_1.1.0 rvest_0.3.4 xml2_1.2.1 pkgmaker_0.28 scales_1.0.0 NMF_0.22
[25] digest_0.6.20 pkgconfig_2.0.2 bibtex_0.4.2 rlang_0.4.0 readxl_1.3.1 rstudioapi_0.10
[31] RSQLite_2.1.2 generics_0.0.2 jsonlite_1.6 dendextend_1.12.0 zip_2.0.3 magrittr_1.5
[37] Formula_1.2-3 munsell_0.5.0 S4Vectors_0.22.0 viridis_0.5.1 stringi_1.4.3 yaml_2.2.0
[43] plyr_1.8.4 grid_3.6.0 blob_1.2.0 crayon_1.3.4 lattice_0.20-38 haven_2.1.1
[49] splines_3.6.0 hms_0.5.0 locfit_1.5-9.1 zeallot_0.1.0 pillar_1.4.2 rngtools_1.4
[55] reshape2_1.4.3 codetools_0.2-16 stats4_3.6.0 glue_1.3.1 BiocManager_1.30.4 modelr_0.1.4
[61] vctrs_0.2.0 foreach_1.4.7 cellranger_1.1.0 gtable_0.3.0 assertthat_0.2.1 gridBase_0.4-7
[67] xtable_1.8-4 broom_0.5.2 class_7.3-15 viridisLite_0.3.0 iterators_1.0.12 AnnotationDbi_1.46.0
[73] registry_0.5-1 memoise_1.1.0 IRanges_2.18.1 cluster_2.1.0
Metadata
Metadata
Assignees
Labels
No labels