[BioC] linear model for illumina Data(identifying differentially expressed genes)
Md.Mamunur Rashid
mamunur.rashid at kcl.ac.uk
Wed Aug 26 11:27:47 CEST 2009
Dear All,
first I would like to apologize for a long mail.
I am working with probe profile file(tab separated file) generated by illumina beadstudio software. Main objective is to identify differentially expressed genes.
the experiment is done using single channel illumina microarray data.
There are 8 samples (from 3 groups)
1. 4135447095_A& 4135447095_B the replicates (patient with rupture 'R')
2. 4135447095_C, 4135447095_D are replicates (control 1 - 'C')
3. 4135447095_E& 4135447095_F are replicates (control 2 - 'C')
4. 4135447095_G& 4135447095_H are replicates( patient without rupture 'W')
I have used following code (lumi and limma package ) to pre-process the data and identify the differentially expressed genes.
Finally using the Venn diagram method in limma shows that around 5000 genes are differentially expressed which is really unexpected
(!!! OR may be I am not understanding it. I would really appreciate if anybody can enlight me about the data in the diagram).
I have attached the code below and the*probe profile file* is uploaded in the link below.
http://www.esnips.com/doc/471328df-77e3-4c70-ba03-e12ccea21ac2/test_Sample_Probe_Profile
thanks in advance.
regards,
mamun
## ***************************************
## ********** R code ******************
## loading library
library(lumi)
library(limma)
library(mgcv)
library(lumiHumanAll.db)
library(lumiHumanIDMapping)
library(annotate)
library(GO.db)
## data acquisition and pre-process
pilot_raw<- lumiR("test_Sample_Probe_Profile.txt",detectionTh =0.05)
pilot_b<- lumiB(pilot_raw)
pilot_t<- lumiT(pilot_b)
pilot_norm<- lumiN(pilot_t)
## identifying differentially expressed genes ##
#################################################
pilot_Matrix<- exprs(pilot_norm)
probeList<- rownames(pilot_Matrix)
## creating the design matrix
pilot_sampleType<- c('R','R','C','C','C','C','W','W')
design_norm_pilot<- model.matrix(~0+pilot_sampleType)
colnames(design_norm_pilot)<- c('C','R','W')
pilot_fit<- lmFit(pilot_Matrix,design_norm_pilot)
pilot.contrast<- makeContrasts (R-C,W-R,W-C, levels=design_norm_pilot)
pilot_fit2<- contrasts.fit(pilot_fit,pilot.contrast)
pilot_fit2<- eBayes(pilot_fit2)
topTable(pilot_fit2,coef=1, adjust="BH")
result<- decideTests(pilot_fit2)
*vennDiagram(result)*
## ******* End of Code *********** ##
*Result of top expressed genes :*
line ID logFC AveExpr t P.Value adj.P.Val B
9271 5220477 3.207187 7.518875 47.05839 1.099642e-13 2.439447e-09 19.48339
13425 2190519 2.783416 6.019115 43.02541 2.837996e-13 3.147905e-09 18.97582
20501 4590356 -2.479955 8.163643 -33.27002 4.288202e-12 3.170982e-08 17.26973
19555 4180546 1.647948 6.693443 31.70328 7.128156e-12 3.953275e-08 16.91037
12424 4150224 -1.335114 8.892449 -25.96928 5.799401e-11 2.573078e-07 15.30650
14224 3710184 1.776777 6.479575 25.18035 8.012080e-11 2.962333e-07 15.04307
7469 2230601 2.028836 8.904665 24.04887 1.296230e-10 3.757779e-07 14.64359
6648 7650678 1.425908 9.373222 23.94687 1.355131e-10 3.757779e-07 14.60626
4552 6770646 -1.651834 7.018473 -22.85906 2.202405e-10 5.428683e-07 14.19372
21007 5570673 -1.369335 10.130594 -22.26554 2.897959e-10 6.428833e-07 13.95698
More information about the Bioconductor
mailing list