[BioC] differential expression- edgeR
Mark Robinson
mark.robinson at imls.uzh.ch
Mon Dec 23 17:35:45 CET 2013
Dear Ashutosh,
I wasn't able to (nicely) get exactly the same files from GEO, but here is an alternative:
See:
http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-71/samples/
f <- "ftp://ftp.ebi.ac.uk/pub/databases/microarray/data/experiment/MTAB/E-MTAB-71/E-MTAB-71.raw.1.zip"
bf <- basename(f)
download.file(f, bf) # download ZIP from ArrayExpress
unzip(bf) # should put the 8 TXT files in current dir
tg <- dir(".","^DLCK|WT")
library("edgeR")
counts <- readDGE(tg)$counts
counts[is.na(counts)] <- 0
grp <- sapply(colnames(counts),function(u) strsplit(u,"_")[[1]][1])
d <- DGEList(counts=counts, group=grp)
This should return:
> d
An object of class "DGEList"
$counts
DLCK.TG_1 DLCK.TG_2 DLCK.TG_3 DLCK.TG_4 WT_1 WT_3 WT_4 WT_6
AAAAAAAAAAAAAAAAA 22653 3059 1366 6574 7782 35096 6623 9633
AAAAAAAAAAAAAAAAC 82 51 55 93 412 134 335 519
AAAAAAAAAAAAAAAAG 2 3 7 9 59 5 45 84
AAAAAAAAAAAAAAAAT 118 471 359 717 1842 94 2465 3311
AAAAAAAAAAAAAAACA 67 4 4 12 17 108 12 21
844311 more rows ...
$samples
group lib.size norm.factors
DLCK.TG_1 DLCK.TG 651172 1
DLCK.TG_2 DLCK.TG 2685418 1
DLCK.TG_3 DLCK.TG 3202246 1
DLCK.TG_4 DLCK.TG 2460753 1
WT_1 WT 3142262 1
WT_3 WT 294909 1
WT_4 WT 3517977 1
WT_6 WT 3558260 1
[… proceed from here …]
Alternatively, from your original code, I think you want 'skip=1' and is it possible that you have an extra unwanted file in your 'targets' list ?
Anyways, hope that helps.
Best, Mark
----------
Prof. Dr. Mark Robinson
Bioinformatics, Institute of Molecular Life Sciences
University of Zurich
http://ow.ly/riRea
On 23.12.2013, at 16:50, Ashutosh [guest] <guest at bioconductor.org> wrote:
>
> I am new to R and edgeR. I am trying to follow to example
> 9. Case Study: deep-sequenced short tags from the edgeR manual.
>
> I download the data from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM272105
> by clicking the full table and then saving webpage as .txt file containing
>
> First 4 rows as below folled by fifth which contains the tag seq and count.
>
> #SEQUENCE =
> #COUNT =
> #TPM = tags per million
> SEQUENCE COUNT TPM
> CATCGCCAGCGGGCACC 1 0.37
>
> Now we I follow the steps of 9.3 reading data and creating the DGElist: After running
> < d<- readDGE(targets, skip = 5, comment.char = "#"), I get
> Error in taglist[[i]] : subscript out of bounds
>
> Can anyone please help, how I can solve this issue.
>
> Best regards,
> Ashutosh
>
>
> -- output of sessionInfo():
>
> R version 3.0.2 (2013-09-25)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] edgeR_3.4.2 limma_3.18.7
>
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list