[BioC] Creating a new instance of oligoSnpSet

Martin Morgan mtmorgan at fhcrc.org
Wed Nov 26 22:22:44 CET 2008


Hi Steven --

Steven McKinney wrote:
> Hi all,
> 
> Thanks to Robert Scharpf for a quick and detailed
> off-line response.  For anyone else that may encounter
> this issue:  my problem was that my featureData object's
> 'data' slot data frame did not have names "chromosome" 
> and "position" .
> 
> I originally defined my featureData object as
> 
>> cclfd <-
> +   new("AnnotatedDataFrame",
> +       data = data.frame(position = pData(featureData(ccld)[, "MapInfo"]),
> +         chromosome = pData(featureData(ccld)[, "CHR"]),
> +         stringsAsFactors = FALSE),
> +       varMetadata = data.frame(labelDescription = c("position", "chromosome")))
> 
> extracting directly from my ccld object (a SnpSetIllumina object
> from beadarraySNP command read.SnpSetIllumina()
>  ccld <- read.SnpSetIllumina(samplesheet = "ccl_CNV370SampleSheet_8samples.csv",
>                              reportfile = "ccl_FinalReport_2.txt")
> )
> 
> 
> This yielded an AnnotatedDataFrame object with slot 'data'
> containing a data frame whose names were not those I had
> put in the data.frame() code above (namely "position"
> and "chromosome").
> 
>> str(cclfd)
> Formal class 'AnnotatedDataFrame' [package "Biobase"] with 4 slots
>   ..@ varMetadata      :'data.frame':	2 obs. of  1 variable:
>   .. ..$ labelDescription: chr [1:2] "position" "chromosome"
>   ..@ data             :'data.frame':	373397 obs. of  2 variables:
>   .. ..$ MapInfo: num [1:373397] 1.64e+08 1.66e+08 1.66e+08 1.66e+08 1.67e+08 ...
>   .. ..$ CHR    : Factor w/ 25 levels "1","10","11",..: 18 18 18 18 18 18 18 18 18 18 ...
>   .. .. ..- attr(*, "names")= chr [1:373397] "cnvi0000001" "cnvi0000002" "cnvi0000003" "cnvi0000004" ...
>   ..@ dimLabels        : chr [1:2] "rowNames" "columnNames"
>   ..@ .__classVersion__:Formal class 'Versions' [package "Biobase"] with 1 slots
>   .. .. ..@ .Data:List of 1
>   .. .. .. ..$ : int [1:3] 1 1 0
> 
> So that's my R lesson for today - names specified in a
> data.frame() call don't necessarily stick!

Hmm, I'm not sure that's the right lesson -- you don't have to be that 
suspicious of data.frame.

It might be AnnotatedDataFrame or oligoSnpSet, though. I wonder what 
your sessionInfo() is? Also what does str(featureData(ccld)) say? An 
unusual thing is the 'names' attribute of cclfd. Any chance of creating 
a reproducible example (i.e., without access to your files, maybe by 
referencing help pages [using the 'example()' function] or making a 
version with just a few features and using dput)?
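
For instance, a minimal sketch of the dput approach, using a made-up 
three-row data frame in place of your real feature data (the object 
names 'small' and 'rebuilt' and all values are illustrative only; the 
column names just mimic the MapInfo and CHR columns shown above):

```r
# Toy stand-in for a few rows of the feature data; values are invented.
small <- data.frame(MapInfo = c(1.64e8, 1.66e8, 1.67e8),
                    CHR = factor(c("1", "1", "2")),
                    row.names = c("cnvi0000001", "cnvi0000002",
                                  "cnvi0000003"))

# dput() prints an R expression that recreates the object; that text can
# be pasted into an email so others can rebuild 'small' without the files.
dput(small)

# Round trip: deparse() yields the same text as dput(), and evaluating it
# gives back an equivalent object.
rebuilt <- eval(parse(text = deparse(small)))
```

On the real object, something like dput(head(fData(ccld), 3)) would 
presumably do the same job, though I have not run that here.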

A couple of short-cuts / tips. fData(obj) gives you direct access to 
pData(featureData(obj)). 'Extract-then-subset', fData(obj)[, "cols"], 
will usually be more efficient than subset-then-extract, 
pData(featureData(obj)[, "cols"]); there's also a subtle difference that 
might be causing problems here: as you do it, you end up with a 1-column 
data frame for 'chromosome', whereas extract-then-subset results in a 
vector. '[[' pulls out a single column, as in featureData(obj)[["cols"]]. 
Also, '[[<-' can be useful for defining a single column and creating a 
labelDescription, and obj[["cols"]] gives direct access to 
pData(obj)[["cols"]].
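
To illustrate the data frame versus vector difference, here is a small 
sketch using a plain data frame in place of fData(ccld). The object 'fd' 
and its values are made up, and drop = FALSE is used only to mimic the 
1-column data frame that subset-then-extract returns; nothing here needs 
Bioconductor:

```r
# Toy stand-in for fData(ccld); values are invented.
fd <- data.frame(MapInfo = c(100, 200, 300),
                 CHR = factor(c("1", "1", "2")))

# Subset-then-extract analogue: a 1-column data frame. data.frame() keeps
# the inner column name, so the 'position =' label is not used.
one_col <- fd[, "MapInfo", drop = FALSE]
from_df <- names(data.frame(position = one_col))
print(from_df)    # "MapInfo"

# Extract-then-subset analogue: a plain vector, so the label is used.
from_vec <- names(data.frame(position = fd[, "MapInfo"]))
print(from_vec)   # "position"

# stringsAsFactors = FALSE only stops character vectors from being
# converted; a column that is already a factor stays a factor.
still_factor <- is.factor(data.frame(chromosome = fd[, "CHR"],
                                     stringsAsFactors = FALSE)$chromosome)
print(still_factor)   # TRUE

# Converting explicitly gives the character column.
fixed <- data.frame(position = fd[, "MapInfo"],
                    chromosome = as.character(fd[, "CHR"]),
                    stringsAsFactors = FALSE)
str(fixed)
```

That would be consistent with the MapInfo and CHR names, and the factor 
CHR column, in the str(cclfd) output, so data.frame may be doing exactly 
what it documents.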

Martin

> Explicitly forcing column names and
> mode "character" for the chromosome column
> solves the problem
> 
>  ccld.position <- pData(featureData(ccld)[, "MapInfo"])
>  names(ccld.position) <- "position"
>  ccld.chromosome <- pData(featureData(ccld)[, "CHR"])
>  names(ccld.chromosome) <- "chromosome"
>  ccld.chromosome$chromosome <- as.character(ccld.chromosome$chromosome)
> 
>  cclfd <-
>    new("AnnotatedDataFrame",
>        data = data.frame(position = ccld.position,
>          chromosome = ccld.chromosome,
>          stringsAsFactors = FALSE),
>        varMetadata = data.frame(labelDescription = c("position", "chromosome")))
>  
> and I can create the oligoSnpSet object successfully.
> 
>> cclss <-
> +   new("oligoSnpSet", copyNumber = logR, calls = gt,
> +       phenoData = annotatedDataFrameFrom(logR, byrow = FALSE),
> +       featureData = cclfd, annotation = "HumanCNV370-Quad")
>> str(cclss)
> Formal class 'oligoSnpSet' [package "oligoClasses"] with 6 slots
> 
> 
> So it was the absence of columns named "chromosome" and "position"
> in the 'data' slot of the featureData object that caused internal 
> code to attempt to acquire chromosome positional information from 
> an annotation source.
> 
> With the featureData@data data frame having the correct column
> labels "chromosome" and "position", the annotation argument
> is not processed further (it is just added to the SnpSet
> object's 'annotation' slot).
> 
> Thanks again to Robert Scharpf.
> 
> Best
> 
> Steve McKinney
> 
> 
> 
> 
> -----Original Message-----
> From: bioconductor-bounces at stat.math.ethz.ch on behalf of Steven McKinney
> Sent: Tue 11/25/2008 9:56 PM
> To: Bioconductor at stat.math.ethz.ch
> Subject: [BioC] Creating a new instance of oligoSnpSet
>  
> Hello All,
> 
> I am trying to get some Illumina HumanCNV370-Quad
> data into VanillaICE to do some copy number analysis.
> 
> In attempting to create an object of class "oligoSnpSet"
> I can not seem to specify an annotation that works.
> 
> e.g. as specified in a vignette
> 
>> cclss <-
> +   new("oligoSnpSet", copyNumber = logR, calls = gt,
> +       phenoData = annotatedDataFrameFrom(logR, byrow = FALSE),
> +       featureData = cclfd, annotation = "Illumina550k")
> Loading required package: Illumina550k
> Error in db(object) : Illumina550k package not available
> In addition: Warning message:
> In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE,  :
>   there is no package called 'Illumina550k'
> Error in dbGetQuery(db(object), sql) : 
>   error in evaluating the argument 'conn' in selecting a method for function 'dbGetQuery'
> 
> or even if I specify some annotation that does exist
> 
>> cclss <-
> +   new("oligoSnpSet", copyNumber = logR, calls = gt,
> +       phenoData = annotatedDataFrameFrom(logR, byrow = FALSE),
> +       featureData = cclfd, annotation = "hgu133plus2cdf")
> Loading required package: hgu133plus2cdf
> Error in db(object) : 
>   trying to get slot "getdb" from an object of a basic class ("environment") with no slots
> Error in dbGetQuery(db(object), sql) : 
>   error in evaluating the argument 'conn' in selecting a method for function 'dbGetQuery'
> 
> 
> Is there a way to work around this annotation bit of building
> an eSet object? 
> 
> I can't figure out from documentation, reading source code, or
> experimenting, as to what will work for this annotation argument.
> 
> I'm a bit hooped as there does not yet appear to be annotation
> for the Illumina HumanCNV370-Quad, but I have annotation
> information from other files from Illumina etc.
> 
> Can I put some dummy object as an argument for annotation
> and patch it up with my known info?
> 
> Any ideas?
> 
> 
> Steven McKinney
> 
> Statistician
> Molecular Oncology and Breast Cancer Program
> British Columbia Cancer Research Centre
> 
> email: smckinney +at+ bccrc +dot+ ca
> 
> tel: 604-675-8000 x7561
> 
> BCCRC
> Molecular Oncology
> 675 West 10th Ave, Floor 4
> Vancouver B.C. 
> V5Z 1L3
> Canada
> 
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> 


-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


