[BioC] filtering probes in affymetrix data
Sabet, Julia A
julia.sabet at tufts.edu
Thu Feb 13 21:59:38 CET 2014
I get this:
> traceback()
3: input[ind, ]
2: input[ind, ]
1: getMainProbes(eset.filt)
>
-----Original Message-----
From: James W. MacDonald [mailto:jmacdon at uw.edu]
Sent: Thursday, February 13, 2014 3:57 PM
To: Sabet, Julia A
Cc: bioconductor at r-project.org
Subject: Re: [BioC] filtering probes in affymetrix data
What do you get when you run
traceback()
right after that error?
Jim
On 2/13/2014 3:51 PM, Sabet, Julia A wrote:
> Thank you both. Solved that problem. Now when I do the next line of code, I get another error message:
>
>> eset.filt <- getMainProbes(eset.filt)
> Error in orig[[nm]][i, , ..., drop = drop] :
> (subscript) logical subscript too long
>
>
>
> -----Original Message-----
> From: James W. MacDonald [mailto:jmacdon at uw.edu]
> Sent: Thursday, February 13, 2014 3:44 PM
> To: Sabet, Julia A
> Cc: bioconductor at r-project.org
> Subject: Re: [BioC] filtering probes in affymetrix data
>
> Hi Julia,
>
> On 2/13/2014 3:23 PM, Sabet, Julia A wrote:
>> Thank you Jim. I think my R version is up to date and I am making sure to use "library()". I started the whole thing over and now I have this new error message, at an earlier step:
>>
>> library(pd.mogene.2.0.st)
>>> con <- db(pd.mogene.2.0.st)
>>> dbGetQuery(con, "select * from type_dict;")
>> type type_id
>> 1 1 main
>> 2 2 control->affx
>> 3 3 control->chip
>> 4 4 control->bgp->antigenomic
>> 5 5 control->bgp->genomic
>> 6 6 normgene->exon
>> 7 7 normgene->intron
>> 8 8 rescue->FLmRNA->unmapped
>> 9 9 control->affx->bac_spike
>> 10 10 oligo_spike_in
>> 11 11 r1_bac_spike_at
>>> table(dbGetQuery(con, "select type from featureSet;")[,1])
>> 1 2 4 7 9
>> 263551 18 23 5331 18
>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner
>>> join
>> + featureSet on core_mps.fsetid=featureSet.fsetid where
>> + featureSet.type='4';")
>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile,
>> + probs=0.95)
>>> library(genefilter)
>>> minval <- max(bkg)
>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <-
> The above line has a bit extra at the end that R doesn't like.
>
>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt"
> And this is your hint. Error messages are your friends.
>
> Best,
>
> Jim
>
>
>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <-
>>> eset[ind,]
>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt"
>>> ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt <-
>> Error: unexpected symbol in "ind <- genefilter(eset, filterfun(kOverA(12, minval))) eset.filt"
>> Here is my sessionInfo() output:
>>
>> R version 3.0.2 (2013-09-25)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C LC_TIME=English_United States.1252
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] BiocInstaller_1.12.0 genefilter_1.44.0 mogene20sttranscriptcluster.db_2.13.0
>> [4] org.Mm.eg.db_2.10.1 AnnotationDbi_1.24.0 pd.mogene.2.0.st_2.12.0
>> [7] RSQLite_0.11.4 DBI_0.2-7 oligo_1.26.1
>> [10] Biostrings_2.30.1 XVector_0.2.0 IRanges_1.20.6
>> [13] Biobase_2.22.0 oligoClasses_1.24.0 BiocGenerics_0.8.0
>>
>> loaded via a namespace (and not attached):
>> [1] affxparser_1.34.0 affyio_1.30.0 annotate_1.40.0 bit_1.1-11 codetools_0.2-8 ff_2.2-12
>> [7] foreach_1.4.1 GenomicRanges_1.14.4 iterators_1.0.6 preprocessCore_1.24.0 splines_3.0.2 stats4_3.0.2
>> [13] survival_2.37-7 tools_3.0.2 XML_3.98-1.1 xtable_1.7-1 zlibbioc_1.8.0
>> I appreciate your help...
>> Julia
>>
>> -----Original Message-----
>> From: James W. MacDonald [mailto:jmacdon at uw.edu]
>> Sent: Thursday, February 13, 2014 12:08 PM
>> To: Sabet, Julia A
>> Cc: bioconductor at r-project.org
>> Subject: Re: [BioC] filtering probes in affymetrix data
>>
>> Hi Julia,
>>
>> You should always include the output from sessionInfo() with any questions, so we can see what versions you are running, and what you have loaded.
>>
>> My guess is you are using an old version of R, prior to the introduction of that function, or you forgot to do library(affycoretools).
>>
>> Best,
>>
>> Jim
>>
>> On Thursday, February 13, 2014 12:03:54 PM, Sabet, Julia A wrote:
>>> Thank you so much, Jim. I did everything you recommended and everything seemed to be working and then I installed the affycoretools package and when I did:
>>> eset.filt <- getMainProbes(eset.filt)
>>>
>>> This error resulted:
>>> Error: could not find function "getMainProbes"
>>>
>>> What should I do?
>>> Thanks!
>>> Julia
>>>
>>>
>>> -----Original Message-----
>>> From: James W. MacDonald [mailto:jmacdon at uw.edu]
>>> Sent: Thursday, February 13, 2014 9:36 AM
>>> To: Sabet, Julia A
>>> Cc: bioconductor at r-project.org
>>> Subject: Re: [BioC] filtering probes in affymetrix data
>>>
>>> Hi Julia,
>>>
>>> There are several different things you can do. I'll show you one possibility.
>>>
>>> First, note that there are multiple different control probes on this
>>> array that aren't intended to measure differential expression, and
>>> should be excluded. So first let's look at the possible types of
>>> probesets:
>>>
>>>> library(pd.mogene.2.0.st)
>>>> con <- db(pd.mogene.2.0.st)
>>>> dbGetQuery(con, "select * from type_dict;")
>>> type type_id
>>> 1 1 main
>>> 2 2 control->affx
>>> 3 3 control->chip
>>> 4 4 control->bgp->antigenomic
>>> 5 5 control->bgp->genomic
>>> 6 6 normgene->exon
>>> 7 7 normgene->intron
>>> 8 8 rescue->FLmRNA->unmapped
>>> 9 9 control->affx->bac_spike
>>> 10 10 oligo_spike_in
>>> 11 11 r1_bac_spike_at
>>>
>>> These are all the possible types of probesets, but we don't have all of them on this array. To see which ones we do have we can do:
>>>
>>>
>>>> table(dbGetQuery(con, "select type from featureSet;")[,1])
>>> 1 2 4 7 9
>>> 263551 18 23 5331 18
>>>
>>> So we only have these probeset types:
>>>
>>> 1 1 main
>>> 2 2 control->affx
>>> 4 4 control->bgp->antigenomic
>>> 7 7 normgene->intron
>>> 9 9 control->affx->bac_spike
>>>
>>> And the 'main' probesets are those that we want to use for
>>> differential expression. Now one thing you could do is to say that
>>> the antigenomic probesets should give a good measure of background,
>>> as they are supposed to have sequences that don't exist in mice. So
>>> you could just extract those probesets, get some measure and use
>>> that as the lower limit of what you think is expressed or not.
>>> That's pretty naive, as a probe with higher GC content will have
>>> higher background than one with a lower GC content, but worrying
>>> about that is way beyond what I am prepared to go into.
>>>
>>> Now we can get the probeset IDs for the antigenomic probesets
>>>
>>> antigm <- dbGetQuery(con, "select meta_fsetid from core_mps inner
>>> join featureSet on core_mps.fsetid=featureSet.fsetid where
>>> featureSet.type='4';")
>>>
>>> And then extract those probesets and get a summary statistic.
>>>
>>> bkg <- apply(exprs(eset)[as.character(antigm[,1]),], 2, quantile,
>>> probs=0.95)
>>>
>>> Which will give us the 95th percentile of these background probes.
>>> You could then use the kOverA function in genefilter to filter out
>>> any probesets where all samples are below the background values. The
>>> idea being that you want to filter out any probesets unless k
>>> samples have expression levels >= A. So if you have 10 samples,
>>> where 5 are controls and 5 are treated, you would do something like
>>>
>>> minval <- max(bkg)
>>> ind <- genefilter(eset, filterfun(kOverA(5, minval))) eset.filt <-
>>> eset[ind,]
>>>
>>> You should also filter out all the non-main probesets. You can do
>>> that using getMainProbes() in the affycoretools package
>>>
>>> eset.filt <- getMainProbes(eset.filt)
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>
>>> On Wednesday, February 12, 2014 10:16:31 PM, Sabet, Julia A wrote:
>>>> Hello all,
>>>> I am totally new to R/Bioconductor and have begun processing data from my Affymetrix Mouse Gene 2.0 ST arrays. I normalized the data like this:
>>>>
>>>> library(pd.mogene.2.0.st)
>>>> eset <- rma(affyRaw)
>>>>
>>>> and added gene annotation and I am following the limma user's
>>>> guide, which recommends removing "probes that appear not be expressed in any of the experimental conditions." I have read on previous posts that filtering may not be necessary. Should I filter, and if so, how? Using what code?
>>>>
>>>> Thank you!
>>>> Julia Sabet
>>>>
>>>> [[alternative HTML version deleted]]
>>>>
>>>> _______________________________________________
>>>> Bioconductor mailing list
>>>> Bioconductor at r-project.org
>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>>> Search the archives:
>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> --
>>> James W. MacDonald, M.S.
>>> Biostatistician
>>> University of Washington
>>> Environmental and Occupational Health Sciences
>>> 4225 Roosevelt Way NE, # 100
>>> Seattle WA 98105-6099
>> --
>> James W. MacDonald, M.S.
>> Biostatistician
>> University of Washington
>> Environmental and Occupational Health Sciences
>> 4225 Roosevelt Way NE, # 100
>> Seattle WA 98105-6099
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
More information about the Bioconductor
mailing list