[BioC] Problems using text to subset array information from an expression set
Kasper Daniel Hansen
khansen at stat.Berkeley.EDU
Wed Apr 5 06:42:00 CEST 2006
On Apr 4, 2006, at 2:33 PM, Jeff Lande wrote:
> Kasper,
>
> On traceback(), I just get
>
>> traceback()
> 1: newdata["1007_s_at", ]
>
> One thing that I noticed when trying to compare expression sets that I was
> able to use text for subscripting and those that I was not was that the
> se.exprs was a <0 x 0 matrix> for the former and NA for the latter.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is the crucial part. I have looked into the issue and the
culprit is indeed the rma function. The output object from that
function (in your case alldata) has an se.exprs slot holding a matrix
filled with NA, with the same dimensions as the exprs slot. The code
for subsetting an exprSet checks whether nrow(se.exprs) > 0 (which is
true for the alldata object but not for a <0 x 0 matrix>) and, if so,
proceeds to subset se.exprs as well. That subsetting fails because
only the exprs slot of the output from rma has the relevant rownames.
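To illustrate the mechanism with plain matrices (a minimal sketch in
base R; exprs_mat and se_mat are made-up stand-ins for the two slots
with toy values, not the actual exprSet internals):

```r
# Toy stand-in for the exprs slot: it carries row and column names.
exprs_mat <- matrix(c(7.1, 8.2, 6.5, 9.0), nrow = 2,
                    dimnames = list(c("1007_s_at", "1053_at"),
                                    c("AA1.CEL", "AA2.CEL")))

# Toy stand-in for the se.exprs slot: same shape, all NA, no dimnames.
se_mat <- matrix(NA_real_, nrow = 2, ncol = 2)

# Character subscripting works on the matrix that has dimnames ...
exprs_ok <- exprs_mat["1007_s_at", , drop = FALSE]

# ... but raises an error on the matrix that has none.
se_result <- tryCatch(se_mat["1007_s_at", , drop = FALSE],
                      error = function(e) e)
se_failed <- inherits(se_result, "error")

# Numeric subscripting is unaffected, which matches alldata[1,1] working.
numeric_ok <- se_mat[1, , drop = FALSE]
```

Here nrow(se_mat) > 0 holds, so a subsetting method that checks only
the row count would go on to index se_mat by name and hit the error.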
I would suggest either adding the relevant rownames to the giant
se.exprs slot (which would take up some space) or simply setting the
se.exprs slot to a <0 x 0 matrix>, which is clearly allowable
according to the documentation for exprSet.
I have cc'ed the package maintainer for affy, who will hopefully make
the necessary changes (short summary: the output from the rma
function is not subsettable by probe ID).
Jeff: a cleaner way to solve your problem (for now, until it is fixed
in the codebase) is
dimnames(se.exprs(alldata)) = dimnames(exprs(alldata))
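
The same idea on plain matrices, as a rough sketch (exprs_mat and
se_mat are hypothetical stand-ins for the two slots with toy values;
this assumes nothing about the actual affy or Biobase code):

```r
# Toy stand-ins for the exprs and se.exprs slots.
exprs_mat <- matrix(c(7.1, 8.2, 6.5, 9.0), nrow = 2,
                    dimnames = list(c("1007_s_at", "1053_at"),
                                    c("AA1.CEL", "AA2.CEL")))
se_mat <- matrix(NA_real_, nrow = 2, ncol = 2)

# Option 1: copy the dimnames across so character subscripts resolve.
dimnames(se_mat) <- dimnames(exprs_mat)
se_row <- se_mat["1007_s_at", , drop = FALSE]

# Option 2: use an empty matrix, which an nrow(...) > 0 check would skip.
se_empty <- matrix(nrow = 0, ncol = 0)
skip_se <- !(nrow(se_empty) > 0)
```

Option 1 keeps a full-size matrix of NA around, while option 2 takes
essentially no space.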
/Kasper
> Also, there is a UNIX version that I use for processing large data sets and
> a PC version that I use for less memory intensive work (I have control of
> updating packages, etc with the PC version but I don't have administrative
> rights on the UNIX version). I used a workaround to subset by arrays. When
> I assigned a new phenoData object to the subset (within the PC version), I
> was able to use text subscripting on the resulting expression set (code
> below).
>
>> atsarrays <- c("AA1.CEL", ..., "AA132.CEL")
>> atsmatch <- sampleNames(alldata) %in% atsarrays
>> atsdata <- alldata[,atsmatch]
>> pd <- read.phenoData("ATS_phenodata.TXT")
>> phenoData(atsdata) <- pd
>> atsdata <- new('exprSet', exprs=exprs(atsdata), phenoData = pd)
>>
>
> I'm still confused why I was having trouble using text to subscript, but I
> seem to be able to continue on with analysis now.
>
> Thanks,
>
> Jeff
>
> Jeff: you are using a very old version of Biobase (1.5.12). If I use
> a current version (1.8.0) I can subset exprSet's in the way you want
> (tested by running the example for exprSet and then subsetting using
> eset["31738_at",]
> )
>
> It might also (instead of just being an old version) be because of
> the way the exprSet is constructed using rma. Could you do the
> following
> 1) Do a traceback() after the error
> 2) test what the rownames/colnames are of
> exprs(Data), se.exprs(Data)
> I assume that se.exprs(Data) is a <0 x 0 matrix>.
>
> /Kasper
>
> On Apr 4, 2006, at 11:31 AM, Benilton Carvalho wrote:
>
>> isn't
>>
>> exprs(alldata)["1007_s_at",]
>> exprs(alldata)[, "AA100.CEL"]
>>
>> what you want?
>>
>> b
>>
>> On Tue, 4 Apr 2006, Jeff Lande wrote:
>>
>>> I have an odd problem that I cannot seem to figure out.
>>>
>>> I have a set of CEL files in a directory, which I read using the ReadAffy()
>>> command. Then I run the rma command to preprocess.
>>>
>>>> Data <- ReadAffy()
>>>> alldata <- rma(Data)
>>>
>>> I've done this many times before without problems. However, when I try to
>>> use text instead of numbers for subscripting, I get an error.
>>>
>>> For example, I am able to access data from the first row and column using
>>> numeric subscripts
>>>
>>>> alldata[1,1]
>>> Expression Set (exprSet) with
>>> 1 genes
>>> 1 samples
>>> phenoData object with 1 variables and 1 cases
>>> varLabels
>>> sample: arbitrary numbering
>>>
>>> but using text for either subscript, I get an error.
>>>
>>>> alldata["1007_s_at",]
>>> Error in alldata["1007_s_at", ] : no 'dimnames' attribute for array
>>>> alldata[,"AA100.CEL"]
>>> Error in alldata[, "AA100.CEL"] : no 'dimnames' attribute for array
>>>
>>> I actually went through what I think was the same process last week (and
>>> many times previously) and had no problems, so I'm stumped.
>>>
>>> Here is my session information:
>>>
>>>> sessionInfo()
>>> R version 2.1.0, 2005-04-18, ia64-unknown-linux-gnu
>>>
>>> attached base packages:
>>> [1] "tools" "methods" "stats" "graphics" "grDevices" "utils"
>>> [7] "datasets" "base"
>>>
>>> other attached packages:
>>> hgu133acdf affy reposTools Biobase
>>> "1.4.3" "1.6.7" "1.5.19" "1.5.12"
>>>
>>> I must be missing something obvious, but I just can't figure out what is
>>> going wrong. Does anyone have insight into this problem?
>>>
>>> Jeff Lande
>>> Post-Doctoral Associate
>>> University of Minnesota
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>