[BioC] EdgeR GLM gene expression analysis of 3 groups
Mark Robinson
mrobinson at wehi.EDU.AU
Fri Mar 25 01:20:40 CET 2011
Hi Shona.
1. Your counts aren't integers. edgeR expects raw counts.
2. My guess is you don't really need a sample-specific effect. Try:
design <- model.matrix(~ age, data = targets)
Cheers,
Mark
On 2011-03-24, at 10:35 PM, Shona Wood wrote:
> Hi,
>
> I have been trying to compare RNA-seq data 3 groups; young, middle aged and old.
> I have done pairwise comparisons between these groups but i would like to
> compare all three together. I have been trying to use the GLM method to do this,
> following the tutorial, 11 Case study: Oral carcinomas vs matched normal tissue.
> i understand that this only compares two conditions but i tried to adapt it to
> my needs. Being new at this I am unsure if I am doing it right and I get an
> error when trying to estimateCRdisp. Though I think the problem is actually to
> do with the way i have specified the design because "middle" doesnt seem to be
> in the design. Please see below:
>
> R version 2.12.2 (2011-02-25)
> Copyright (C) 2011 The R Foundation for Statistical Computing
> ISBN 3-900051-07-0
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> library(edgeR)
> setwd ("/Users/swood/Documents/rna-seq analysis")
> targets<-read.delim (file="targets.txt",
> + stringsAsFactors = FALSE, row.names = "label")
> Error in data[[rowvar]] : attempt to select less than one element
> targets<-read.delim (file="targets.txt")
> targets$age<-factor(targets$age)
> targets$sample<-factor(targets$sample)
> targets
> X files age sample
> 1 p1 p1.txt young 1
> 2 p2 p2.txt young 2
> 3 p3 p3.txt young 3
> 4 p4 p4.txt old 4
> 5 p5 p5.txt old 5
> 6 p6 p6.txt old 6
> 7 p7 p7.txt middle 7
> 8 p8 p8.txt middle 8
> 9 p9 p9.txt middle 9
> d<-readDGE(targets, skip=1, comment.char='#')
> colnames(d)<-row.names(targets)
> d<-calcNormFactors(d)
> d
> An object of class "DGEList"
> $samples
> X files age sample lib.size norm.factors
> 1 p1 p1.txt young 1 674816.7 0.7598895
> 2 p2 p2.txt young 2 648856.9 0.9095203
> 3 p3 p3.txt young 3 449797.3 1.2774867
> 4 p4 p4.txt old 4 699036.0 0.8532141
> 5 p5 p5.txt old 5 441649.7 1.1919854
> 6 p6 p6.txt old 6 670755.9 0.8147330
> 7 p7 p7.txt middle 7 699630.1 1.0262252
> 8 p8 p8.txt middle 8 505795.8 1.2979023
> 9 p9 p9.txt middle 9 478527.1 1.0262468
>
> $counts
> 1 2 3 4 5 6 7
> ENSRNOG00000000007 38.23700 37.89900 34.85820 43.6703 40.81260 43.4009 42.4969
> ENSRNOG00000000008 0.00000 0.00000 0.00000 0.0000 0.00000 0.0000 0.0000
> ENSRNOG00000000009 0.00000 0.00000 0.00000 0.0000 0.00000 0.0000 0.0000
> ENSRNOG00000000010 1.40963 8.28797 2.09491 17.1853 1.76635 14.7567 1.0840
> ENSRNOG00000000012 0.00000 0.00000 0.00000 0.0000 0.00000 0.0000 0.0000
> 8 9
> ENSRNOG00000000007 45.81270 48.6937
> ENSRNOG00000000008 0.00000 0.0000
> ENSRNOG00000000009 0.00000 0.0000
> ENSRNOG00000000010 5.44889 17.7295
> ENSRNOG00000000012 0.00000 0.0000
> 29510 more rows ...
>
> d<-d[rowSums(d$counts)>9,]
> design <- model.matrix(~ sample + age, data = targets)
> design
> (Intercept) sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9
> 1 1 0 0 0 0 0 0 0 0
> 2 1 1 0 0 0 0 0 0 0
> 3 1 0 1 0 0 0 0 0 0
> 4 1 0 0 1 0 0 0 0 0
> 5 1 0 0 0 1 0 0 0 0
> 6 1 0 0 0 0 1 0 0 0
> 7 1 0 0 0 0 0 1 0 0
> 8 1 0 0 0 0 0 0 1 0
> 9 1 0 0 0 0 0 0 0 1
> ageold ageyoung
> 1 0 1
> 2 0 1
> 3 0 1
> 4 1 0
> 5 1 0
> 6 1 0
> 7 0 0
> 8 0 0
> 9 0 0
> attr(,"assign")
> [1] 0 1 1 1 1 1 1 1 1 2 2
> attr(,"contrasts")
> attr(,"contrasts")$sample
> [1] "contr.treatment"
>
> attr(,"contrasts")$age
> [1] "contr.treatment"
>
> d<-estimateCRDisp(d, design)
> Error in beta[i, ] <- z %*% X :
> number of items to replace is not a multiple of replacement length
>
>
> If you could help me out with the design or suggest a reason for the error i
> would really appreciate it.
>
> Thanks!
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
------------------------------
Mark Robinson, PhD (Melb)
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: mrobinson at wehi.edu.au
e: m.robinson at garvan.org.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852
------------------------------
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}
More information about the Bioconductor
mailing list