[R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)
Steven McKinney
smckinney at bccrc.ca
Fri Aug 3 19:37:49 CEST 2007
Hi all,
What are current methods people use in R to identify
mis-spelled column names when selecting columns
from a data frame?
Alice Johnstone recently tackled this issue
(see [BioC] posting below).
Due to a mis-spelled column name ("FileName"
instead of "Filename") which produced no warning,
Alice spent a fair amount of time tracking down
this bug. With my fumbling fingers I'll be tracking
down such a bug soon too.
Is there an options() setting or debugging technique
that will flag data frame column extractions that
reference a non-existent column? It seems to me
that the "[.data.frame" extractor used to throw an
error if given a mis-spelled variable name, and I
still see lines of code in "[.data.frame" such as
    if (any(is.na(cols)))
        stop("undefined columns selected")
In R 2.5.1 a NULL is silently returned.
> foo <- data.frame(Filename = c("a", "b"))
> foo[, "FileName"]
NULL
Has something changed so that the code lines
    if (any(is.na(cols)))
        stop("undefined columns selected")
in "[.data.frame" no longer work properly (if
I am understanding the intention properly)?
If not, could "[.data.frame" check an
options() variable setting (say
warn.undefined.colnames) and throw a warning
if a non-existent column name is referenced?
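In the meantime, one workaround that needs no options() setting is a small checked accessor. This is only a sketch: get_col is a hypothetical helper name, not a function in base R, and the wording of its error message is made up for illustration.

```r
# Hypothetical helper (not part of base R): extract one column by exact name,
# stopping with an informative error if the name is not present.
get_col <- function(df, name) {
  stopifnot(is.data.frame(df), is.character(name), length(name) == 1L)
  if (!name %in% names(df)) {
    stop(sprintf("undefined column '%s'; available columns: %s",
                 name, paste(names(df), collapse = ", ")),
         call. = FALSE)
  }
  df[[name]]
}

foo <- data.frame(Filename = c("a", "b"), stringsAsFactors = FALSE)
get_col(foo, "Filename")   # the column as a character vector

# The mis-spelled name now fails instead of passing silently;
# tryCatch is used here only so the example runs to completion.
msg <- tryCatch(get_col(foo, "FileName"),
                error = function(e) conditionMessage(e))
print(msg)
```

Listing the available column names in the message makes a capitalisation slip such as FileName versus Filename easy to spot.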
> sessionInfo()
R version 2.5.1 (2007-06-27)
powerpc-apple-darwin8.9.1
locale:
en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages:
[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" "base"
other attached packages:
plotrix lme4 Matrix lattice
"2.2-3" "0.99875-4" "0.999375-0" "0.16-2"
>
Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program
British Columbia Cancer Research Centre
email: smckinney +at+ bccrc +dot+ ca
tel: 604-675-8000 x7561
BCCRC
Molecular Oncology
675 West 10th Ave, Floor 4
Vancouver B.C.
V5Z 1L3
Canada
-----Original Message-----
From: bioconductor-bounces at stat.math.ethz.ch on behalf of Johnstone, Alice
Sent: Wed 8/1/2007 7:20 PM
To: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame
For interest's sake, I have found out why I wasn't getting my expected
results when using read.AnnotatedDataFrame.
It turns out the error was in the ReadAffy command, where I specified
the filenames to be read from my AnnotatedDataFrame object. There was a
typo: a capital N ($FileName) rather than the lowercase n ($Filename)
used in my target file. Whoops. This meant the filenames argument was
ignored without any error message(!). Instead of using the information
in the AnnotatedDataFrame object (which listed the filenames, but not in
alphabetical order), ReadAffy read the .cel files in alphabetical order
from the working directory. Hence each file was given the wrong label
(labels follow the order of the AnnotatedDataFrame object) and my
comparisons were confused without it being obvious why or where.
Our solution: after correcting $Filename, pass the filenames through
as.character so that each file is assigned to the correct target, now
that we are using read.AnnotatedDataFrame rather than read.phenoData.
Data<-ReadAffy(filenames=as.character(pData(pd)$Filename),phenoData=pd)
Hurrah!
It may be beneficial to others if, when the filenames argument isn't
specified, the filenames were read from the phenoData object whenever
it includes them.
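A check along the following lines would have caught the typo before any files were read. It is a sketch only: checked_filenames is a hypothetical helper, not part of affy or Biobase, and the file names in the demonstration are invented.

```r
# Hypothetical helper: return the file names stored in a pheno data table,
# stopping early if the column is absent or a listed file does not exist.
checked_filenames <- function(pdata, column = "Filename", dir = ".") {
  if (!column %in% names(pdata)) {
    stop(sprintf("column '%s' not found; columns are: %s",
                 column, paste(names(pdata), collapse = ", ")),
         call. = FALSE)
  }
  files <- as.character(pdata[[column]])
  absent <- files[!file.exists(file.path(dir, files))]
  if (length(absent) > 0L) {
    stop("files not found: ", paste(absent, collapse = ", "), call. = FALSE)
  }
  files
}

# Self-contained demonstration with made-up file names in a temporary directory.
demo_dir <- file.path(tempdir(), "cel_demo")
dir.create(demo_dir, showWarnings = FALSE)
targets <- data.frame(Filename = c("b.cel", "a.cel"),
                      Target = c("control", "treatment1"),
                      stringsAsFactors = FALSE)
invisible(file.create(file.path(demo_dir, targets$Filename)))

# The order of the table is kept, not the alphabetical order on disk.
files <- checked_filenames(targets, "Filename", dir = demo_dir)

# The capital-N typo is reported instead of being silently ignored.
typo <- tryCatch(checked_filenames(targets, "FileName", dir = demo_dir),
                 error = function(e) conditionMessage(e))
print(typo)
```

With the objects from this thread the call would presumably become ReadAffy(filenames = checked_filenames(pData(pd)), phenoData = pd); that line has not been run here.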
Thanks!
-----Original Message-----
From: Martin Morgan [mailto:mtmorgan at fhcrc.org]
Sent: Thursday, 26 July 2007 11:49 a.m.
To: Johnstone, Alice
Cc: bioconductor at stat.math.ethz.ch
Subject: Re: [BioC] read.phenoData vs read.AnnotatedDataFrame
Hi Alice --
"Johnstone, Alice" <Alice.Johnstone at esr.cri.nz> writes:
> Using R 2.5.0 and Bioconductor I have been following code to analyse
> Affymetrix expression data: 2 treatments vs control. The original
> code was run last year and used the read.phenoData command; however,
> with the newer version I get:
> Warning messages:
> read.phenoData is deprecated, use read.AnnotatedDataFrame instead
> The phenoData class is deprecated, use AnnotatedDataFrame (with
> ExpressionSet) instead
>
> I use the read.AnnotatedDataFrame command, but when it comes to the
> end of the analysis the comparison of the treatment to the controls
> gets mixed up compared to what you get using the original
> read.phenoData, i.e. it looks like the 3 groups get labelled wrongly
> and so the comparisons are different (but they can still be matched up).
> My questions are,
> 1) do you need to set up your target file differently when using
> read.AnnotatedDataFrame - what is the standard format?
I can't quite tell where things are going wrong for you, so it would
help if you can narrow down where the problem occurs. I think
read.AnnotatedDataFrame should be comparable to read.phenoData. Does
> pData(pd)
look right? What about
> pData(Data)
and
> pData(eset.rma)
? It's not important but pData(pd)$Target is the same as pd$Target.
Since the analysis is on eset.rma, it probably makes sense to use the
pData from there to construct your design matrix
> targs<-factor(eset.rma$Target)
> design<-model.matrix(~0+targs)
> colnames(design)<-levels(targs)
Does design look right?
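As a rough illustration of what to look for, here is a sketch with made-up target labels (base R only, no Bioconductor objects; the labels are an assumption standing in for eset.rma$Target). Each row of design should contain a single 1, in the column named after that sample's target.

```r
# Made-up target labels standing in for eset.rma$Target (assumption for illustration).
target <- c("control", "treatment1", "treatment2",
            "control", "treatment1", "treatment2")

targs <- factor(target)
design <- model.matrix(~ 0 + targs)
colnames(design) <- levels(targs)

# Each sample should be flagged in exactly one column: the one matching its label.
assigned <- colnames(design)[max.col(design, ties.method = "first")]
stopifnot(all(rowSums(design) == 1), identical(assigned, target))

print(design)
```

If the stopifnot() line fails on real data, the sample labels and the rows of the design matrix are out of step.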
> I have three columns sample, filename and target.
> 2) do you need to use a different model matrix to what I have?
> 3) do you use a different command for making the contrasts?
Depends on the question! If you're performing the same analysis as last
year, then the model matrix and contrasts have to be the same!
> I have included my code below if that is of any assistance.
> Many Thanks!
> Alice
>
>
>
> ##Read data
> pd<-read.AnnotatedDataFrame("targets.txt",header=T,row.name="sample")
> Data<-ReadAffy(filenames=pData(pd)$FileName,phenoData=pd)
> ##normalisation
> eset.rma<-rma(Data)
> ##analysis
> targs<-factor(pData(pd)$Target)
> design<-model.matrix(~0+targs)
> colnames(design)<-levels(targs)
> fit<-lmFit(eset.rma,design)
> cont.wt<-makeContrasts("treatment1-control","treatment2-control",
>     levels=design)
> fit2<-contrasts.fit(fit,cont.wt)
> fit2.eb<-eBayes(fit2)
> testconts<-classifyTestsF(fit2.eb,p.value=0.01)
> topTable(fit2.eb,coef=2,n=300)
> topTable(fit2.eb,coef=1,n=300)
>
>
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at stat.math.ethz.ch
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives:
> http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org