[BioC] Problems using text to subset array information from an expression set

Wed Apr 5 06:42:00 CEST 2006

On Apr 4, 2006, at 2:33 PM, Jeff Lande wrote:

> Kasper,
>
> On traceback(), I just get
>
>> traceback()
> 1: newdata["1007_s_at", ]
>
> One thing I noticed when comparing the expression sets that I was
> able to subscript with text against those that I was not is that
> se.exprs was a <0 x 0 matrix> for the former and NA for the latter.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This is the crucial part. I have looked into the issue, and the
culprit is indeed the rma function. Its output object (in your case
alldata) has an se.exprs slot holding a matrix filled with NA, with the
same dimensions as the exprs slot but lacking dimnames. The code for
subsetting an exprSet checks whether nrow(se.exprs) > 0 (true for the
alldata object but not for a <0 x 0 matrix>) and, if so, also subsets
se.exprs. That subsetting fails because only the exprs slot of the
output from rma has the relevant rownames.

I would suggest either adding the relevant rownames to the giant
se.exprs slot (which would take up some space) or simply setting the
se.exprs slot to a <0 x 0 matrix>, which is clearly allowed
according to the documentation for exprSet.

I have cc'ed the package maintainer for affy; he will hopefully make
the necessary changes (short summary: the output from the rma
function cannot be subset by probe ID).

Jeff: a cleaner way to solve your problem (now and until it is fixed  
in the codebase) is
   dimnames(se.exprs(alldata)) = dimnames(exprs(alldata))
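To check the effect of that one-liner without Biobase, the sketch below uses plain matrices as stand-ins for exprs(alldata) and se.exprs(alldata). That substitution is an assumption so the snippet runs in base R. Once se.exprs carries the same dimnames as exprs, character subscripts work on both rows and columns.

```r
# Stand-ins for exprs(alldata) and se.exprs(alldata); values are illustrative.
ex <- matrix(c(5.1, 7.3, 6.2, 8.0), nrow = 2,
             dimnames = list(c("1007_s_at", "1053_at"),
                             c("AA1.CEL", "AA2.CEL")))
se <- matrix(NA_real_, nrow = 2, ncol = 2)   # no dimnames, as rma leaves it

# Same fix as above, applied to the plain matrix.
dimnames(se) <- dimnames(ex)

# Character subscripts now succeed on rows and on columns.
row_se <- se["1007_s_at", , drop = FALSE]
col_se <- se[, "AA2.CEL", drop = FALSE]
print(row_se)
print(col_se)
```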

/Kasper

> Also, there is a UNIX version that I use for processing large data
> sets and a PC version that I use for less memory-intensive work (I
> control package updates etc. on the PC version, but I don't have
> administrative rights on the UNIX version). I used a workaround to
> subset by arrays. When I assigned a new phenoData object to the
> subset (within the PC version), I was able to use text subscripting
> on the resulting expression set (code below).
>
>> atsarrays <- c("AA1.CEL", ..., "AA132.CEL")
>> atsmatch <- sampleNames(alldata) %in% atsarrays
>> atsdata <- alldata[,atsmatch]
>> pd <- read.phenoData("ATS_phenodata.TXT")
>> phenoData(atsdata) <- pd
>> atsdata <- new('exprSet', exprs=exprs(atsdata), phenoData = pd)
>>
>
> I'm still confused about why I was having trouble using text to
> subscript, but I seem to be able to continue with the analysis now.
>
> Thanks,
>
> Jeff
>
> Jeff: you are using a very old version of Biobase (1.5.12). If I use
> a current version (1.8.0), I can subset exprSets the way you want
> (tested by running the example for exprSet and then subsetting with
>   eset["31738_at",]
> ).
>
> It might also (instead of just being an old version) be because of
> the way the exprSet is constructed by rma. Could you do the
> following:
>    1) Do a traceback() after the error
>    2) Check what the rownames/colnames of exprs(Data) and
>       se.exprs(Data) are
> I assume that se.exprs(Data) is a <0 x 0 matrix>.
>
> /Kasper
>
> On Apr 4, 2006, at 11:31 AM, Benilton Carvalho wrote:
>
>> isn't
>>
>>   exprs(alldata)["1007_s_at",]
>>   exprs(alldata)[, "AA100.CEL"]
>>
>> what you want?
>>
>> b
>>
>> On Tue, 4 Apr 2006, Jeff Lande wrote:
>>
>>> I have an odd problem that I cannot seem to figure out.
>>>
>>> I have a set of CEL files in a directory, which I read using the
>>> ReadAffy() command. Then I run the rma command to preprocess.
>>>
>>>> Data <- ReadAffy()
>>>> alldata <- rma(Data)
>>>
>>> I've done this many times before without problems. However, when
>>> I try to use text instead of numbers for subscripting, I get an
>>> error.
>>>
>>> For example, I am able to access data from the first row and
>>> column using numeric subscripts:
>>>
>>>> alldata[1,1]
>>> Expression Set (exprSet) with
>>>        1 genes
>>>        1 samples
>>>                 phenoData object with 1 variables and 1 cases
>>>         varLabels
>>>                sample: arbitrary numbering
>>>
>>> but using text for either subscript, I get an error.
>>>
>>>> alldata["1007_s_at",]
>>> Error in alldata["1007_s_at", ] : no 'dimnames' attribute for array
>>>> alldata[,"AA100.CEL"]
>>> Error in alldata[, "AA100.CEL"] : no 'dimnames' attribute for array
>>>
>>> I actually went through what I think was the same process last
>>> week (and many times previously) and had no problems, so I'm
>>> stumped.
>>>
>>> Here is my session information:
>>>
>>>> sessionInfo()
>>> R version 2.1.0, 2005-04-18, ia64-unknown-linux-gnu
>>>
>>> attached base packages:
>>> [1] "tools"     "methods"   "stats"     "graphics"  "grDevices" "utils"
>>> [7] "datasets"  "base"
>>>
>>> other attached packages:
>>> hgu133acdf       affy reposTools    Biobase
>>>   "1.4.3"    "1.6.7"   "1.5.19"   "1.5.12"
>>>
>>> I must be missing something obvious, but I just can't figure out
>>> what is going wrong. Does anyone have insight into this problem?
>>>
>>> Jeff Lande
>>> Post-Doctoral Associate
>>> University of Minnesota
>>>
>>> _______________________________________________
>>> Bioconductor mailing list
>>> Bioconductor at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>>>
>>
>