[BioC] EdgeR additive linear model, errors

Gordon K Smyth smyth at wehi.EDU.AU
Sat Apr 6 00:05:19 CEST 2013


Dear Colleen,

Note also that your version of edgeR is fairly old.  There have been many 
improvements to the package since the version you're using.

Best wishes
Gordon

> Date: Thu, 4 Apr 2013 09:58:48 -0400
> From: Colleen Burge <cab433 at cornell.edu>
> To: bioconductor at r-project.org
> Subject: Re: [BioC] EdgeR additive linear model, errors
>
> Dear all,
>
> Sorry, my error, the time points should be J, N, O, and M.  I wrote "N"
> twice in the top of my e-mail but not my data :)
>
> Colleen
>
>
> On Thu, Apr 4, 2013 at 9:49 AM, Colleen Burge <cab433 at cornell.edu> wrote:
>
>> Dear all,
>>
>> I'm trying to set up my analysis in EdgeR to look at differential
>> expression at 4 time points (J, N, M, N) in three samples (A,B,C).  Our
>> hypothesis is that environmental conditions at "N" should have an affect on
>> gene expression, and make gene expression significantly different at this
>> time point than the pre: J or post M &N time points.  When I plot the MDS,
>> I can see that my experiment is having a "batch" affect, where samples are
>> grouping by genotype.  I would like to test for differences, particularly
>> between N and the other 3 time points accounting for the affect of the
>> samples.  I believe an additive linear model with genotype as the blocking
>> factor is appropriate (but PLEASE correct me if I'm wrong), is the correct
>> analysis.  I have not been able to get to contrasts, due to error messages,
>> and I may also need help with this :)
>>
>> I have tried the analysis a couple of different ways:
>>
>> model1: ~genotype +group
>>
>> or
>>
>> model ~genotype:genotype + group
>>
>> *Script for model1*
>>
>> #Consideration that treatment is the same for all genotypes
>>
>> setwd ("~/Desktop/CURRENT FILES/R Scripts/Bleaching_data")
>> library(edgeR)
>>
>> # making a matrix of factors called "targets",Time=Time , SF = genotype
>> targets <-readTargets("TargetsSF.txt")
>> targets
>>
>> # importing raw data
>> rawdata <- read.delim("SF_BLEACH.txt",row.names="BLContig")
>> head(rawdata)
>>
>> # setting groups equal to time differences
>> group <- targets$Time
>> group <- as.factor(group)
>> group
>>
>>
>> # making my DGE list
>> y <- DGEList(counts=rawdata,group=group)
>> head(y$counts)
>>
>> # filtering out lowly expressed genes
>> keep <- rowSums(cpm(y)>1) >= 4
>> y <- y[keep,]
>> table(keep)
>> dim(y)
>> # retained 67416 genes
>>
>> #recompute the library sizes:
>> y$samples$lib.size <- colSums(y$counts)
>>
>> #calculating normalization factors
>> y <- calcNormFactors(y)
>> y$samples
>> # looks good
>> n <- y$samples$lib.size
>>
>> # examining sample for outliers
>> plotMDS(y)
>>
>>
>> ## OK adding indiv SF
>> targets
>> genotype <- factor(targets$SF)
>> genotype
>>
>> design <- model.matrix(~ genotype + group)
>> design
>>
>> y <- estimateGLMCommonDisp(y, design, verbose=TRUE)
>> # Disp = 0.19995 , BCV = 0.4472
>> y <- estimateGLMTrendedDisp(y,design)
>> # Loading required package: splines
>> y <- estimateGLMTagwiseDisp(y,design)
>> plotBCV(y)
>> fit <- glmFit(y, design)
>> lrt <- glmLRT(y,fit, coef=5:6)
>> #error here: Error in beta[k, ] <- betaj[decr, ] :  NAs are not allowed in
>> subscripted assignments
>>
>> *Script for model2: consideration that treatment affects are different
>> between genotypes*
>>
>> setwd ("~/Desktop/CURRENT FILES/R Scripts/Bleaching_data")
>> library(edgeR)
>>
>> # making a matrix of factors called "targets",Time=Time , SF = subject
>> targets <-readTargets("TargetsSF.txt")
>> targets
>>
>> # importing raw data
>> rawdata <- read.delim("SF_BLEACH.txt",row.names="BLContig")
>> head(rawdata)
>>
>> # setting groups equal to time differences
>> group <- targets$Time
>> group <- as.factor(group)
>> group
>>
>>
>> # making my DGE list
>> y <- DGEList(counts=rawdata,group=group)
>> head(y$counts)
>>
>> # filtering out lowly expressed genes
>> keep <- rowSums(cpm(y)>1) >= 4
>> y <- y[keep,]
>> dim(y)
>> # retained 67416 genes
>>
>> #recompute the library sizes:
>> y$samples$lib.size <- colSums(y$counts)
>>
>> #calculating normalization factors
>> y <- calcNormFactors(y)
>> y$samples
>> # looks good
>> n <- y$samples$lib.size
>>
>> # examining sample for outliers
>> plotMDS(y)
>>
>>
>> ## OK adding indiv SF
>> targets
>> genotype <- factor(targets$SF)
>> genotype
>>
>> design <- model.matrix(~ genotype *group)
>> design
>> y <- estimateGLMCommonDisp(y, design, verbose=TRUE)
>>
>> #ERROR HERE: Warning message: In estimateGLMCommonDisp.default(y =
>> y$counts, design = design,  :No residual df: setting dispersion to NA
>> #Can clearly see that the model is parameterizing each sample separately
>>
>>
>> #Tried this iteration of the model
>> design2 <- model.matrix (~ genotype + genotype:group)
>> design2
>>
>>
>> y <- estimateGLMCommonDisp(y, design2, verbose=TRUE)
>> #Same error as above!!
>>
>>
>>
>> Here is the version of R/edgeR
>>
>> R version 2.15.1 (2012-06-22)
>>
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>>
>>
>> locale:
>>
>> [1] C/en_US.UTF-8/C/C/C/C
>>
>>
>>
>> attached base packages:
>>
>> [1] splines   stats     graphics  grDevices utils     datasets  methods
>> base
>>
>>
>>
>> other attached packages:
>>
>> [1] edgeR_2.6.12 limma_3.12.3
>>
>>
>>
>> loaded via a namespace (and not attached):
>>
>> [1] tools_2.15.1
>>
>>
>>
>> Thanks you!!!!
>>
>> Colleen Burge
>>
>> Postdoctoral Research Associate
>>
>> Dept. Ecol & Evol. Biology
>>
>> Cornell University
>

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list