[BioC] EdgeR repeated measurment
Teshome Tilahun Bizuayehu
Teshome.Tilahun.Bizuayehu at uin.no
Tue Apr 8 14:29:15 CEST 2014
Thanks,
On the model that I used
quality <- factor(targets$quality, levels=c("Good","Low"))
time <- factor(targets$time, levels=c("0day","14day","28day"))
design <- model.matrix(~quality+quality:indv+quality:time)
I found that LR values from Low vs Good at 0 day, Low vs good at 14 day, and Low vs Good at 28 day have similar values, so I wonder how this LR value be similar?
Tesh
________________________________________
From: Gordon K Smyth <smyth at wehi.EDU.AU>
Sent: Saturday, April 5, 2014 9:09 AM
To: Teshome Tilahun Bizuayehu
Cc: Bioconductor mailing list
Subject: EdgeR repeated measurment
> Date: Fri, 4 Apr 2014 01:06:47 -0700 (PDT)
> From: "teshe [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, ttb at uin.no
> Subject: [BioC] EdgeR repeated measurment
>
>
> I have a miRNA data from an experiment, which contain repeated measurements. The experiment set up is as follows: 12 indviduals were assigned into two groups, Good and Low based on previous observations; where Good means those produce good quality and Low means those produce low quality. Samples were taken from each individual in a week interval for a month. The miRNA-seq data has three of these time points (0-day, 14-day and 28-day) for the eleven individuals, unfortunetly we lost one of the samples from Good-0-day group. Our aim is to check whether miRNA are involved in determining the quality and whether the time of sampling affect the quality through miRNAs.
> Hypotheses of test are:
>
> 1.Mean miRNA expression is the same regardless of quality at any time point
> 2.Mean miRNA expression is the same at different time point within the same quality group
>
> Below is the detail of the analysis, which I used in edgeR.
>
> My question is, am I in the write track?
I don't have time to go through your analysis, but I'd recommend that you
used the built-in function cpm() to compute 'm' near the beginning.
> How can I check whether the assumption of repeated measurments met or
> not?
You are not making distributional assumptions about the repeated measures,
so there is nothing to check.
> As you can see from the head of the data table individual S4 looks an
> outlier, which is also clear after normalization in a plot (plot is not
> shown). My question is that should I remove it? Is there any effect in
> case of removing or missing samples? Does sample imbalance affect the
> model?
edgeR is able to handle imbalanced numbers of samples without any problem.
Best wishes
Gordon
> R version 3.1.0 beta (2014-03-28 r65330)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
>> y <- read.table(file="salmon",row.names="miRNA", header=TRUE)
>> head(y)
> S2 S12 S13 S19 S20 S4 S6 S9 S15 S23 S30 S42 S47 S52 S53 S59 S60 S44
> let-7a 50 17 7 13 18 570 40 25 89 126 36 22 12 21 23 59 6 27
> let-7b 3 3 1 2 0 54 6 2 11 7 7 0 4 5 3 22 0 8
> let-7d 0 1 0 0 1 21 0 2 0 0 1 0 0 1 0 6 0 1
> let-7e 0 0 1 3 1 491 4 2 15 8 11 0 6 1 0 3 0 2
> let-7g 40 1 1 1 1 162 3 4 10 57 13 2 1 8 2 8 0 7
> let-7h 0 0 0 0 0 271 1 2 0 1 2 0 1 0 0 2 2 0
> S46 S49 S55 S43 S50 S82 S87 S92 S93 S99 S100 S84 S85 S89 S95 S83 S90
> let-7a 110 18 5 42 1 24 14 2 11 47 5 48 28 30 26 6 18
> let-7b 15 1 4 5 1 3 0 0 2 11 0 30 5 1 1 0 3
> let-7d 1 0 0 0 0 0 2 0 0 1 0 0 2 0 0 0 0
> let-7e 12 2 2 33 0 7 5 0 3 21 3 1 18 12 2 0 11
> let-7g 4 1 0 11 2 9 6 0 0 19 5 10 7 5 4 2 1
> let-7h 0 0 1 9 0 0 0 0 2 3 1 0 0 0 0 0 3
>> time <- factor(c("0day", "0day", "0day","0day","0day","0day","0day","0day","0day","0day","0day","14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "14day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day", "28day"))
>> quality <- factor(c("Good", "Good","Good","Good","Good","Low","Low","Low","Low","Low","Low","Good","Good","Good","Good","Good","Good","Low","Low","Low","Low","Low","Low","Good", "Good","Good","Good","Good","Good","Low","Low","Low","Low","Low","Low"))
>> indv <- factor(c("F427","F437","F438","F445","F446","F429","F431","F434","F440","F447","F448","F427","F432","F437","F438","F445","F446","F429","F431","F434","F440","F447","F448","F427","F432","F437","F438","F445","F446","F429","F431","F434","F440","F447","F448"))
>> dim(y)
> [1] 140 35
>> dge <- DGEList(counts=y)
>> dge <- calcNormFactors(dge)
>> m <- 1e6 * t(t(dge$counts)/dge$samples$lib.size)
>> keep <- rowSums(m > 1) >= 3
>> dge <- dge[keep,]
>> dim(dge)
> [1] 135 35
>> indv <- factor(c("1","2","3","4","5","1","2","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6","1","2","3","4","5","6"))
>> targets <- data.frame(quality,indv,time)
>> targets
> quality indv time
> 1 Good 1 0day
> 2 Good 2 0day
> 3 Good 3 0day
> 4 Good 4 0day
> 5 Good 5 0day
> 6 Low 1 0day
> 7 Low 2 0day
> 8 Low 3 0day
> 9 Low 4 0day
> 10 Low 5 0day
> 11 Low 6 0day
> 12 Good 1 14day
> 13 Good 2 14day
> 14 Good 3 14day
> 15 Good 4 14day
> 16 Good 5 14day
> 17 Good 6 14day
> 18 Low 1 14day
> 19 Low 2 14day
> 20 Low 3 14day
> 21 Low 4 14day
> 22 Low 5 14day
> 23 Low 6 14day
> 24 Good 1 28day
> 25 Good 2 28day
> 26 Good 3 28day
> 27 Good 4 28day
> 28 Good 5 28day
> 29 Good 6 28day
> 30 Low 1 28day
> 31 Low 2 28day
> 32 Low 3 28day
> 33 Low 4 28day
> 34 Low 5 28day
> 35 Low 6 28day
>> quality <- factor(targets$quality, levels=c("Good","Low"))
>> time <- factor(targets$time, levels=c("0day","14day","28day"))
>> design <- model.matrix(~quality+quality:indv+quality:time)
>> dge <- estimateGLMCommonDisp(dge,design)
>> dge <- estimateGLMTrendedDisp(dge,design)
>> dge <- estimateGLMTagwiseDisp(dge,design)
>> fit <- glmFit(dge, design)
>> lrt <- glmLRT(fit, coef="qualityGood:time14day")
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="Good_14dayvs0day.csv")
>
>> lrt <- glmLRT(fit, coef="qualityGood:time28day")
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="Good_28dayvs0day.csv")
>
>> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1,0))
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="Good_28dayvs14day.csv")
>
>> lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,1,0,0))
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="Low_14dayvs0day.csv")
>
>> lrt <- glmLRT(fit, contrast=c(0,-1,0,0,0,0,0,0,0,0,0,0,0,0,0,1))
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="Low_28dayvs0day.csv")
>
>> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,1))
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="Low_28dayvs14day.csv")
>
>> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,1))
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="28day_LowvsGood.csv")
>
>> lrt <- glmLRT(fit, contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,-1,1,0,0))
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="14day_LowvsGood.csv")
>
>> lrt <- glmLRT(fit, coef="qualityLow")
>> topTags(lrt)
>> tab <- topTags(lrt, n=Inf)
>> write.table(tab, file="0day_LowvsGood.csv")
>
> -- output of sessionInfo():
>
> R version 3.1.0 beta (2014-03-28 r65330)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] edgeR_3.4.2 limma_3.18.13
>
> loaded via a namespace (and not attached):
> [1] tools_3.1.0
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:6}}
More information about the Bioconductor
mailing list