[R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)
Steven McKinney
smckinney at bccrc.ca
Fri Aug 3 23:50:05 CEST 2007
> What would break is that three methods for doing the same thing would
> give different answers.
>
> Please do have the courtesy to actually read the detailed explanation you
> are given.
Sorry Prof. Ripley, I am attempting to read carefully, as this
issue has deeper coding/debugging implications, and as you
point out,
"[.data.frame is one of the most complex functions in R"
so please bear with me. This change in behaviour has
taken away a side-effect debugging tool, discussed below.
>
>
> On Fri, 3 Aug 2007, Steven McKinney wrote:
>
> >
> >> -----Original Message-----
> >> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
> >> Sent: Fri 8/3/2007 1:05 PM
> >> To: Steven McKinney
> >> Cc: r-help at stat.math.ethz.ch
> >> Subject: Re: [R] FW: Selecting undefined column of a data frame (was [BioC] read.phenoData vs read.AnnotatedDataFrame)
> >>
> >> I've since seen your followup; a more detailed explanation may help.
> >> The path through the code for your argument list does not go where you
> >> quoted, and there is a reason for it.
> >
> >
> >> Generally when you extract in R and ask for a non-existent index you get
> >> NA or NULL as the result (and no warning), e.g.
> >>
> >>> y <- list(x=1, y=2)
> >>> y[["z"]]
> >> NULL
> >>
> >> Because data frames 'must' have (column) names, they are a partial
> >> exception and when the result is a data frame you get an error if it would
> >> contain undefined columns.
> >>
> >> But in the case of foo[, "FileName"], the result is a single column and so
> >> will not have a name: there seems no reason to be different from
> >>
> >>> foo[["FileName"]]
> >> NULL
> >>> foo$FileName
> >> NULL
> >>
> >> which similarly select a single column. At one time they were different
> >> in R, for no documented reason.
This difference provided a side-effect debugging tool, in that where
> bar <- foo[, "FileName"]
used to throw an error, alerting me to a typo, it now does not.
Having been burned by NULL results due to typos in code lines using
the $ extractor such as
> bar <- foo$FileName
I learned to use
> bar <- foo[, "FileName"]
to help cut down on typo bugs. With the ubiquity of
camelCase object names, this is a constant typing bug hazard.
I am wondering what to do now to double-check spelling
when accessing columns of a data frame.
If "[.data.frame" stays as is, can a debug mechanism
be implemented in R that forces strict adherence
to existing list names in debug mode? This would also help debug
typos in camelCase names when using the $ and [[
extractors and accessors.
Are there other debugging tools already in R that
can help point out such camelCase list element
name typos?
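
One stopgap, offered only as a minimal sketch and not as an established
tool: wrap the extraction in a small helper that checks the requested name
against names() before extracting. The name strict_col and the wording of
its error message are hypothetical (nothing like this exists in base R
under that name); the helper only assumes that its first argument is a
data frame and its second a single character string.

```r
# Hypothetical helper (not part of base R): extract one column by exact
# name, and stop with an error if that name is not a column of df.
strict_col <- function(df, name) {
  stopifnot(is.data.frame(df), is.character(name), length(name) == 1L)
  if (!name %in% names(df)) {
    # Look for a column that differs only in letter case, which is the
    # typical camelCase slip ("FileName" typed for "Filename").
    near <- names(df)[tolower(names(df)) == tolower(name)]
    hint <- if (length(near)) {
      paste0(" (did you mean \"", near[1L], "\"?)")
    } else {
      ""
    }
    stop("undefined column \"", name, "\"", hint, call. = FALSE)
  }
  df[[name]]
}

foo <- data.frame(Filename = c("a", "b"))
bar <- strict_col(foo, "Filename")                      # correct spelling: the column
res <- try(strict_col(foo, "FileName"), silent = TRUE)  # typo: an error, not NULL
```

This does nothing for existing code that already uses $ or [[ directly,
so it is a workaround for new code rather than the debug mode asked
about above.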
> >>
> >>
> >> On Fri, 3 Aug 2007, Prof Brian Ripley wrote:
> >>
> >>> You are reading the wrong part of the code for your argument list:
> >>>
> >>>> foo["FileName"]
> >>> Error in `[.data.frame`(foo, "FileName") : undefined columns selected
> >>>
> >>> [.data.frame is one of the most complex functions in R, and does many
> >>> different things depending on which arguments are supplied.
> >>>
> >>>
> >>> On Fri, 3 Aug 2007, Steven McKinney wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> What are current methods people use in R to identify
> >>>> mis-spelled column names when selecting columns
> >>>> from a data frame?
> >>>>
> >>>> Alice Johnson recently tackled this issue
> >>>> (see [BioC] posting below).
> >>>>
> >>>> Due to a mis-spelled column name ("FileName"
> >>>> instead of "Filename") which produced no warning,
> >>>> Alice spent a fair amount of time tracking down
> >>>> this bug. With my fumbling fingers I'll be tracking
> >>>> down such a bug soon too.
> >>>>
> >>>> Is there any options() setting, or debug technique
> >>>> that will flag data frame column extractions that
> >>>> reference a non-existent column? It seems to me
> >>>> that the "[.data.frame" extractor used to throw an
> >>>> error if given a mis-spelled variable name, and I
> >>>> still see lines of code in "[.data.frame" such as
> >>>>
> >>>>     if (any(is.na(cols)))
> >>>>         stop("undefined columns selected")
> >>>>
> >>>>
> >>>>
> >>>> In R 2.5.1 a NULL is silently returned.
> >>>>
> >>>>> foo <- data.frame(Filename = c("a", "b"))
> >>>>> foo[, "FileName"]
> >>>> NULL
> >>>>
> >>>> Has something changed so that the code lines
> >>>>     if (any(is.na(cols)))
> >>>>         stop("undefined columns selected")
> >>>> in "[.data.frame" no longer work properly (if
> >>>> I am understanding the intention properly)?
> >>>>
> >>>> If not, could "[.data.frame" check an
> >>>> options() variable setting (say
> >>>> warn.undefined.colnames) and throw a warning
> >>>> if a non-existent column name is referenced?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>> sessionInfo()
> >>>> R version 2.5.1 (2007-06-27)
> >>>> powerpc-apple-darwin8.9.1
> >>>>
> >>>> locale:
> >>>> en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
> >>>>
> >>>> attached base packages:
> >>>> [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
> >>>> "base"
> >>>>
> >>>> other attached packages:
> >>>> plotrix lme4 Matrix lattice
> >>>> "2.2-3" "0.99875-4" "0.999375-0" "0.16-2"
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> Steven McKinney
> >>>>
> >>>> Statistician
> >>>> Molecular Oncology and Breast Cancer Program
> >>>> British Columbia Cancer Research Centre
> >>>>
> >>>> email: smckinney +at+ bccrc +dot+ ca
> >>>>
> >>>> tel: 604-675-8000 x7561
> >>>>
> >>>> BCCRC
> >>>> Molecular Oncology
> >>>> 675 West 10th Ave, Floor 4
> >>>> Vancouver B.C.
> >>>> V5Z 1L3
> >>>> Canada
> >>>>
> >>>>
> >>
> >>
> >> --
> >> Brian D. Ripley, ripley at stats.ox.ac.uk
> >> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> >> University of Oxford, Tel: +44 1865 272861 (self)
> >> 1 South Parks Road, +44 1865 272866 (PA)
> >> Oxford OX1 3TG, UK Fax: +44 1865 272595
> >>
> >>
> >>
> >
> >