[BioC] EdgeR GLM gene expression analysis of 3 groups

Mark Robinson mrobinson at wehi.EDU.AU
Fri Mar 25 01:20:40 CET 2011


Hi Shona.

1. Your counts aren't integers.  edgeR expects raw counts.

2. My guess is you don't really need a sample-specific effect.  Try:

  design <- model.matrix(~ age, data = targets)

Cheers,
Mark

On 2011-03-24, at 10:35 PM, Shona Wood wrote:

> Hi,
> 
> I have been trying to compare RNA-seq data 3 groups; young, middle aged and old. 
> I have done pairwise comparisons between these groups but i would like to 
> compare all three together. I have been trying to use the GLM method to do this, 
> following the tutorial, 11 Case study: Oral carcinomas vs matched normal tissue. 
> i understand that this only compares two conditions but i tried to adapt it to 
> my needs. Being new at this I am unsure if I am doing it right and I get an 
> error when trying to estimateCRdisp. Though I think the problem is actually to 
> do with the way i have specified the design because "middle" doesnt seem to be 
> in the design. Please see below:
> 
> R version 2.12.2 (2011-02-25)
> Copyright (C) 2011 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: i386-pc-mingw32/i386 (32-bit)
> 
> library(edgeR)
> setwd ("/Users/swood/Documents/rna-seq analysis")
> targets<-read.delim (file="targets.txt",
> + stringsAsFactors = FALSE, row.names = "label")
> Error in data[[rowvar]] : attempt to select less than one element
> targets<-read.delim (file="targets.txt")
> targets$age<-factor(targets$age)
> targets$sample<-factor(targets$sample)
> targets
>   X  files    age sample
> 1 p1 p1.txt  young      1
> 2 p2 p2.txt  young      2
> 3 p3 p3.txt  young      3
> 4 p4 p4.txt    old      4
> 5 p5 p5.txt    old      5
> 6 p6 p6.txt    old      6
> 7 p7 p7.txt middle      7
> 8 p8 p8.txt middle      8
> 9 p9 p9.txt middle      9
> d<-readDGE(targets, skip=1, comment.char='#')
> colnames(d)<-row.names(targets)
> d<-calcNormFactors(d)
> d
> An object of class "DGEList"
> $samples
>   X  files    age sample lib.size norm.factors
> 1 p1 p1.txt  young      1 674816.7    0.7598895
> 2 p2 p2.txt  young      2 648856.9    0.9095203
> 3 p3 p3.txt  young      3 449797.3    1.2774867
> 4 p4 p4.txt    old      4 699036.0    0.8532141
> 5 p5 p5.txt    old      5 441649.7    1.1919854
> 6 p6 p6.txt    old      6 670755.9    0.8147330
> 7 p7 p7.txt middle      7 699630.1    1.0262252
> 8 p8 p8.txt middle      8 505795.8    1.2979023
> 9 p9 p9.txt middle      9 478527.1    1.0262468
> 
> $counts
>                          1        2        3       4        5       6       7
> ENSRNOG00000000007 38.23700 37.89900 34.85820 43.6703 40.81260 43.4009 42.4969
> ENSRNOG00000000008  0.00000  0.00000  0.00000  0.0000  0.00000  0.0000  0.0000
> ENSRNOG00000000009  0.00000  0.00000  0.00000  0.0000  0.00000  0.0000  0.0000
> ENSRNOG00000000010  1.40963  8.28797  2.09491 17.1853  1.76635 14.7567  1.0840
> ENSRNOG00000000012  0.00000  0.00000  0.00000  0.0000  0.00000  0.0000  0.0000
>                          8       9
> ENSRNOG00000000007 45.81270 48.6937
> ENSRNOG00000000008  0.00000  0.0000
> ENSRNOG00000000009  0.00000  0.0000
> ENSRNOG00000000010  5.44889 17.7295
> ENSRNOG00000000012  0.00000  0.0000
> 29510 more rows ...
> 
> d<-d[rowSums(d$counts)>9,]
> design <- model.matrix(~ sample + age, data = targets)
> design
>  (Intercept) sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9
> 1           1       0       0       0       0       0       0       0       0
> 2           1       1       0       0       0       0       0       0       0
> 3           1       0       1       0       0       0       0       0       0
> 4           1       0       0       1       0       0       0       0       0
> 5           1       0       0       0       1       0       0       0       0
> 6           1       0       0       0       0       1       0       0       0
> 7           1       0       0       0       0       0       1       0       0
> 8           1       0       0       0       0       0       0       1       0
> 9           1       0       0       0       0       0       0       0       1
>  ageold ageyoung
> 1      0        1
> 2      0        1
> 3      0        1
> 4      1        0
> 5      1        0
> 6      1        0
> 7      0        0
> 8      0        0
> 9      0        0
> attr(,"assign")
> [1] 0 1 1 1 1 1 1 1 1 2 2
> attr(,"contrasts")
> attr(,"contrasts")$sample
> [1] "contr.treatment"
> 
> attr(,"contrasts")$age
> [1] "contr.treatment"
> 
> d<-estimateCRDisp(d, design)
> Error in beta[i, ] <- z %*% X : 
>  number of items to replace is not a multiple of replacement length
> 
> 
> If you could help me out with the design or suggest a reason for the error i 
> would really appreciate it.
> 
> Thanks!
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: mrobinson at wehi.edu.au
e: m.robinson at garvan.org.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------


______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}



More information about the Bioconductor mailing list