[BioC] illumina beadarray GEO files

Mark Dunning mark.dunning at gmail.com
Wed Jun 15 17:22:28 CEST 2011


Hi Nathalie,

There is a calculateDetection function in beadarray that will compute
the detection scores commonly used for thresholding. However, this
relies on the negative controls being identifiable and present in the
data, which is not always the case for GEO-submitted data. The
approach that James suggests may be the only option.
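Here is a minimal R sketch of what a negative-control-based score looks like. It assumes an Illumina-style definition: each probe's detection p-value is the fraction of negative-control intensities at or above the probe's own intensity. The helper `detectionP` and the toy numbers are made up for illustration. This is not the code of beadarray's `calculateDetection`.

```r
## Hypothetical helper (not part of beadarray): empirical detection p-value,
## i.e. the fraction of negative-control intensities that are greater than or
## equal to each probe's intensity. Small values suggest the probe is detected
## above background.
detectionP <- function(signal, negCtrl) {
  vapply(signal, function(s) mean(negCtrl >= s), numeric(1))
}

## Toy numbers, made up for illustration
neg <- c(2, 3, 4, 6)
detectionP(c(1, 5, 100), neg)
## [1] 1.00 0.25 0.00
```

If the GEO files do not include the negative-control intensities, there is nothing to pass as `negCtrl`, and this route is not available.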

Best wishes,

Mark



On Wed, Jun 15, 2011 at 3:09 PM, James F. Reid
<james.reid at ifom-ieo-campus.it> wrote:
> Hi Nathalie,
>
> On 06/15/2011 02:13 PM, Nathalie Conte wrote:
>>
>> Hi,
>> I want to look at this experiment, which is deposited in GEO under
>> the accession GSM290549. It contains 6 files:
>> GSM296418.csv.gz (CSV, 293.0 Kb)
>>   ftp:  <ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/samples/GSM296nnn/GSM296418/GSM296418%2Ecsv%2Egz>
>>   http: <http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?mode=raw&acc=GSM296418&db=GSM296418%2Ecsv%2Egz&is_ftp=true>
>> GSM296418.locs.gz (LOCS, 7.2 Mb)
>>   ftp:  <ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/samples/GSM296nnn/GSM296418/GSM296418%2Elocs%2Egz>
>>   http: <http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?mode=raw&acc=GSM296418&db=GSM296418%2Elocs%2Egz&is_ftp=true>
>> GSM296418.tif.gz (TIFF, 51.7 Mb)
>>   ftp:  <ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/samples/GSM296nnn/GSM296418/GSM296418%2Etif%2Egz>
>>   http: <http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?mode=raw&acc=GSM296418&db=GSM296418%2Etif%2Egz&is_ftp=true>
>> GSM296418.txt.gz (TXT, 11.4 Mb)
>>   ftp:  <ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/samples/GSM296nnn/GSM296418/GSM296418%2Etxt%2Egz>
>>   http: <http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?mode=raw&acc=GSM296418&db=GSM296418%2Etxt%2Egz&is_ftp=true>
>> GSM296418.xml.gz (XML, 665 b)
>>   ftp:  <ftp://ftp.ncbi.nih.gov/pub/geo/DATA/supplementary/samples/GSM296nnn/GSM296418/GSM296418%2Exml%2Egz>
>>   http: <http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?mode=raw&acc=GSM296418&db=GSM296418%2Exml%2Egz&is_ftp=true>
>>
>> I am trying to find a way to reanalyse this, but am struggling to find an
>> appropriate approach. I just want to know whether the genes on this array are
>> below or above the detection threshold used in beadarray.
>> Does anybody have advice on how to analyse microarray data
>> deposited in GEO to get this kind of information?
>> Many thanks,
>> Nat
>
> I am not sure what you mean by the 'detection level threshold used in
> beadarray', but you can use GEOquery as Sean suggested in a previous
> thread.
> Judging from a density plot of the expression values on this array, I would say
> that the large peak represents the 'non-expressed' genes, or genes expressed
> at very low levels, which could be similar to the detection threshold
> you mention. Running the following code, I find that roughly 30% (5821 of
> 20589) of the ~20K genes fall below this threshold (about 5.37 on the log2
> scale, with signals ranging from 0 to 16).
>
> require("GEOquery") || stop("Could not load package 'GEOquery'.")
> ## download single array GSM290549
> gsm <- getGEO("GSM290549")
>
> ## extract 'Illumina average value' signal data
> head(Table(gsm), n=3)
> ##      ID_REF     VALUE
> ##1 ILMN_10000  105.0698
> ##2 ILMN_10001  2355.704
> ##3 ILMN_10002 -9.846933
> x <- as.numeric(Table(gsm)[, 'VALUE'])
> range(x)
> ##[1]   -35.65039 53405.58000
>
> ## transform the data as described by the authors of the original study
> Meta(gsm)$data_processing
> ##[1] "Data were extracted with Illumina BeadStudio software using
> ##background subtraction and cubic spline normalization. Data were then
> ##adjusted by shifting the absolute minimum value for each array to be
> ##equal to 1; and then log2 transformed."
> ## shift so min(x) becomes 1; identical to x + abs(min(x)) + 1 when min(x) < 0
> y <- log2(x - min(x) + 1)
> range(y)
> ##[1]  0.00000 16.25923
>
> ## plot kernel density of signals
> yDens <- density(y)
> plot(yDens, main=Meta(gsm)$geo_accession)
> ## calculate the density peak value
> densPeak <- yDens$x[which.max(yDens$y)]
> ## draw it
> abline(v=densPeak, lwd=2, lty=2)
> densPeak
> ##[1] 5.367655
> 2^(densPeak)
> ##[1] 41.28812
> sum(y < densPeak)
> ##[1] 5821
> sum(y > densPeak)
> ##[1] 14768
>
> HTH.
> J.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
>
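The density-peak approach in James's code can be wrapped into one R function that returns an above/below call for every probe. This directly answers the "above or below the threshold" question for each gene. It is a minimal sketch under stated assumptions. `callByDensityPeak` is a hypothetical helper, not part of GEOquery or beadarray. The threshold is the heuristic density peak, not beadarray's detection score. The toy data are simulated, not taken from GSM290549.

```r
## Hypothetical helper (not part of GEOquery or beadarray): per-probe
## above/below calls using the density-peak threshold from James's code.
callByDensityPeak <- function(ids, x) {
  # Shift so the minimum becomes 1, then log2: min(x) maps to exactly 0.
  y <- log2(x - min(x) + 1)
  # Threshold = location of the highest peak of the kernel density estimate
  # (density() with its default Gaussian kernel and bandwidth).
  d <- density(y)
  peak <- d$x[which.max(d$y)]
  res <- data.frame(ID_REF = ids, log2Signal = y, aboveThreshold = y > peak,
                    stringsAsFactors = FALSE)
  attr(res, "threshold") <- peak  # keep the cut-off alongside the calls
  res
}

## Toy data, simulated for illustration only: many low "background" probes
## plus a smaller set of high-intensity probes.
set.seed(1)
x <- c(rnorm(500, mean = 40, sd = 5), rexp(200, rate = 1 / 2000))
calls <- callByDensityPeak(paste0("probe", seq_along(x)), x)
attr(calls, "threshold")
table(calls$aboveThreshold)
```

You can apply it to the real array with `tab <- Table(gsm)` and `callByDensityPeak(tab$ID_REF, as.numeric(tab$VALUE))`. Because min(x) is negative there, the transform matches James's, so it should give essentially the same split as his code (5821 of 20589 probes, about 28%, below the peak). Probes exactly at the peak are counted as not above.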


