[BioC] EdgeR - problems running estimateCRDisp
Gordon K Smyth
smyth at wehi.EDU.AU
Sun Oct 24 01:31:57 CEST 2010
Dear Josquin,
estimateCDDisp() has worked on all the datasets we have tested it out on
so far (although that's admitedly a very limited number), so we are not
able to reproduce your problems. The function is liable to convergence
issues however because it needs to fit a very large number of generalized
linear models, one or more of which could conceivably fail to converge.
That might be the problem with your data. I don't see any problems with
the code example that you give, you seem to be using the functions
correctly, so the issue must be with the count data itself. Would you be
willing to share your data with us offline so that we diagnose and perhaps
solve the problem?
To see estimateCDDisp() working as intended, you can run the example:
library(edgeR)
example(glmFit)
CR <- estimateCRDisp(d, design)
fit <- glmFit(d, design, dispersion=CR$CR.common.dispersion)
results <- glmLRT(d, fit, coef=2)
topTags(results)
Best wishes
Gordon
----- ORIGINAL MESSAGE -------------
[BioC] EdgeR - problems running estimateCRDisp
josquin.tibbits at dpi.vic.gov.au josquin.tibbits at dpi.vic.gov.au
Fri Oct 22 07:44:00 CEST 2010
I am running the new EdgeR package 1.8 and have updated R to 1.12.0 and
updated all the packages I have installed.
I have an RNAseq experiment with 24 samples and am wanting to run a glm
analysis using the Cox-Reid common dispersion (and tagwise) paramaters. I
have created a DGEList object using a targets file and this successfully
ran in the calcNormFactors() and in the plotMDS.dge() functions. I have
also created what appears to be a sensible and ok design matrix using
model.matrix(). These then are being input to estimateCRDisp() to
estimate the common dispersion. The code I have run is as follows (in
red) with output (blue) which is pretty much exactly the same as the
worked example in the manual on pp 51-- and does not seem abnormal at
all......
targets <- read.delim("Lp_NUE_DGEbymSEQ_TARGETS_EdgeRIN.txt",
stringsAsFactors = FALSE, row.names = "label")
targets$Nitrogen <- factor(targets$Nitrogen)
targets$Endophyte <- factor(targets$Endophyte)
targets$Tissue <- factor(targets$Tissue)
targets
## CREATE THE DGEList Object
DGEList.object <- readDGE(targets, skip = 1, comment.char = '#')
colnames(DGEList.object) <- row.names(targets)
## CALCULATE THE NORMALISATION (f) FACTORS
DGEList.object <- calcNormFactors(DGEList.object)
## Exclude data where rowsums are < 30 YOU CAN CHANGE THIS (there being 24
samples in the experiment)
DGEList.object <- DGEList.object[rowSums(DGEList.object$counts) > 30, ]
DGEList.object
An object of class "DGEList"
$samples
files Endophyte Tissue Nitrogen
lib.size norm.factors
Lp_NUE_LFFULL Lp_NUE_LFFULL_ER_RAWREADS.txt Free Leaf Full
3786614 0.9244799
Lp_NUE_LFNO3 Lp_NUE_LFNO3_ER_RAWREADS.txt Free Leaf Partial
5311856 0.8120410
Lp_NUE_LFK Lp_NUE_LFK_ER_RAWREADS.txt Free Leaf Full
3482772 0.7224960
Lp_NUE_LFPO4 Lp_NUE_LFPO4_ER_RAWREADS.txt Free Leaf Full
6208397 0.7367309
Lp_NUE_LFCa Lp_NUE_LFCa_ER_RAWREADS.txt Free Leaf Full
3597004 0.8147619
19 more rows ...
$counts
Lp_NUE_LFFULL Lp_NUE_LFNO3 Lp_NUE_LFK Lp_NUE_LFPO4
Lp_NUE_LFCa Lp_NUE_LFNH4 Lp_NUE_RFFULL Lp_NUE_RFNO3 Lp_NUE_RFK
Lp_NUE_RFPO4
>prg_cds_44 345 382 120 261 141
215 888 717 698 1112
>prg_cds_141 290 451 126 422 235
459 676 559 474 879
>prg_cds_2 233 277 131 333 239
376 246 185 124 282
>prg_cds_16 433 608 184 614 346
423 924 756 595 913
>prg_cds_84 287 353 90 241 248
394 823 612 989 1061
Lp_NUE_RFCa Lp_NUE_RFNH4 Lp_NUE_LSFULL Lp_NUE_LSNO3
Lp_NUE_LSK Lp_NUE_LSPO4 Lp_NUE_LSCa Lp_NUE_LSNH4 Lp_NUE_RSFULL
Lp_NUE_RSNO3
>prg_cds_44 1107 439 954 257 205
107 868 106 316 1147
>prg_cds_141 583 1052 773 325 230
193 600 264 414 1046
>prg_cds_2 235 454 177 232 177
203 138 239 334 228
>prg_cds_16 817 1040 2373 464 220
276 895 291 714 2655
>prg_cds_84 890 1653 894 268 169
123 714 209 311 1078
Lp_NUE_RSK Lp_NUE_RSPO4 Lp_NUE_RSCa Lp_NUE_RSNH4
>prg_cds_44 622 903 114 474
>prg_cds_141 505 711 205 1166
>prg_cds_2 93 136 157 340
>prg_cds_16 1398 1660 311 1258
>prg_cds_84 893 771 154 1269
63905 more rows ...
## Produce an MDS plot of the data
#plotMDS.dge(DGEList.object, main = "MDS plot for Lp_NUE_DGEbymSEQ")
## DEFINE THE DESIGN MATRIX
design <- model.matrix(~ Endophyte + Tissue + Nitrogen , data = targets)
print("This is the design matrix")
design
(Intercept) EndophyteSTWT TissueRoot NitrogenPartial
Lp_NUE_LFFULL 1 0 0 0
Lp_NUE_LFNO3 1 0 0 1
Lp_NUE_LFK 1 0 0 0
Lp_NUE_LFPO4 1 0 0 0
Lp_NUE_LFCa 1 0 0 0
Lp_NUE_LFNH4 1 0 0 1
Lp_NUE_RFFULL 1 0 1 0
Lp_NUE_RFNO3 1 0 1 1
Lp_NUE_RFK 1 0 1 0
Lp_NUE_RFPO4 1 0 1 0
Lp_NUE_RFCa 1 0 1 0
Lp_NUE_RFNH4 1 0 1 1
Lp_NUE_LSFULL 1 1 0 0
Lp_NUE_LSNO3 1 1 0 1
Lp_NUE_LSK 1 1 0 0
Lp_NUE_LSPO4 1 1 0 0
Lp_NUE_LSCa 1 1 0 0
Lp_NUE_LSNH4 1 1 0 1
Lp_NUE_RSFULL 1 1 1 0
Lp_NUE_RSNO3 1 1 1 1
Lp_NUE_RSK 1 1 1 0
Lp_NUE_RSPO4 1 1 1 0
Lp_NUE_RSCa 1 1 1 0
Lp_NUE_RSNH4 1 1 1 1
attr(,"assign")
[1] 0 1 2 3
attr(,"contrasts")
attr(,"contrasts")$Endophyte
[1] "contr.treatment"
attr(,"contrasts")$Tissue
[1] "contr.treatment"
attr(,"contrasts")$Nitrogen
[1] "contr.treatment"
## ESTIMATE THE Cox-Reid common dispersion
DGEList.object <- estimateCRDisp(DGEList.object, design)
names(DGEList.object)
This step is giving the following error:::
Loading required package: MASS
Error: NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: glm.fit: algorithm did not converge
2: step size truncated due to divergence
> names(DGEList.object)
[1] "samples" "counts"
Clearly the function is not working. I have tried two different computers
and a run it in 64 bit but always with the same result. I have also tried
using a balanced design and a simple design all with the same results. I
have used the DGEList object to analyse the data using the original
estimateCommonDisp() and exactTest() which worked fine.
I was hoping someone might be able to help resolve this for me????
Thanks in advance....................... Josquin
The sessionInfo() and traceback() are below:
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: i386-pc-mingw32/i386 (32-bit)
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] MASS_7.3-8 edgeR_1.8.1
loaded via a namespace (and not attached):
[1] limma_3.6.0
> traceback()
4: .Fortran("dqrls", qr = x[good, ] * w, n = ngoodobs, p = nvars,
y = w * z, ny = 1L, tol = min(1e-07, control$epsilon/1000),
coefficients = double(nvars), residuals = double(ngoodobs),
effects = double(ngoodobs), rank = integer(1L), pivot = 1L:nvars,
qraux = double(nvars), work = double(2 * nvars), PACKAGE = "base")
3: glm.fit(design, y[i, ], offset = offset[i, ], family = f)
2: adjustedProfileLik(spline.disp[i], y.select, design = design,
offset = offset.mat.select + lib.size.mat.select)
1: estimateCRDisp(DGEList.object, design)
Notice:
This email and any attachments may contain information t...{{dropped:9}}
More information about the Bioconductor
mailing list