[R] In praise of "options(warnPartialMatchDollar = TRUE)"

David Winsemius dwinsemius at comcast.net
Sun Nov 27 21:12:44 CET 2016


> On Nov 27, 2016, at 11:21 AM, Chris Evans <chrishold at psyctc.org> wrote:
> 
> Was about to reply just to sender but thought there were good byproducts of this so, to all ...
> 
> Many thanks Dr. Winsemius, and apologies for the HTML: tired sloppiness.  My bad.
> 
> Aha. For the first time I can really see a logic for dataFrame[['variable']] -- thanks for that.  I will have to break myself of the "$" habit.  Pity as it's a lot more keystrokes!
> 
> I understand about the nasty asset-stripping side effects of cbind but thought in this situation it would cause no problems. Is the preferred route to turn a first vector into a data frame and then add the others using dataframe[['newVariable']] <- nextVector ?

With an existing data frame, cbind is safe because the cbind.data.frame method does not coerce its arguments to matrix class. The danger lies in using cbind on bare vectors.
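
A minimal sketch of the dangerous case, assuming two small made-up vectors (the names d, f, m and bad are illustrative only and do not come from the code earlier in the thread):

```r
# Illustrative vectors with attributes: a Date and a factor
d <- as.Date(1:3, origin = "1970-01-01")
f <- factor(c("x", "y", "z"))

# cbind on bare vectors builds a plain numeric matrix
m <- cbind(d, f)
is.matrix(m)    # TRUE
is.numeric(m)   # TRUE: only the underlying numbers survive

# Wrapping the matrix in data.frame() cannot bring the classes back
bad <- data.frame(m)
sapply(bad, class)   # neither column is a Date or a factor any more
```

By the time data.frame() sees the result, the Date and factor classes are already gone, so no option or warning can recover them.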

This would be the preferred method of constructing a data.frame from objects with attributes:

dt <- as.Date( 1:10, origin="1970-01-01")
fac <- factor(letters[1:10])
nums <- 10:1

dfrm <- data.frame( dt, fac, nums)

#OR skip the preliminary vector creation and use a named argument list:

dfrm <- data.frame( dt=as.Date( 1:10, origin="1970-01-01"),
                    fac=factor(letters[1:10]),
                    nums=10:1)

The second method avoids leaving loose vectors in the global environment, where they might later be picked up by mistake if you write a function whose parameter or local object shares a name with one of them.
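
One way a loose vector can bite, as a hedged sketch: the function mean_of and its typo are hypothetical, invented only to illustrate the scoping issue.

```r
# A leftover vector in the global environment
nums <- 10:1

# Hypothetical function: the author meant to use the argument 'x'
# but typed 'nums' in the body. R finds 'nums' in the enclosing
# environment, so no error is raised.
mean_of <- function(x) {
  mean(nums)   # bug: should be mean(x)
}

leaked <- mean_of(c(100, 200))
leaked   # the mean of the leftover 'nums', not of the argument

# Without the loose vector the same mistake fails loudly
rm(nums)
res <- try(mean_of(c(100, 200)), silent = TRUE)
inherits(res, "try-error")   # TRUE: object 'nums' not found
```

With the named-argument form of data.frame() there is no leftover nums to find, so this kind of typo surfaces as an error straight away.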


After either method, you can `cbind` onto that dfrm object to your heart's content, because the cbind.data.frame method is dispatched.
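
A short sketch of that safe case, together with the dataframe[['newVariable']] <- nextVector route asked about above. The column name extra and the object dfrm2 are made up for illustration.

```r
dfrm <- data.frame(dt   = as.Date(1:10, origin = "1970-01-01"),
                   fac  = factor(letters[1:10]),
                   nums = 10:1)

# 'extra' is a hypothetical new Date column
extra <- as.Date(11:20, origin = "1970-01-01")

# The first argument is a data frame, so cbind.data.frame is used
dfrm2 <- cbind(dfrm, extra)
sapply(dfrm2, class)   # dt and extra are still "Date", fac still "factor"

# Assigning with [[<- also keeps the class of the new column
dfrm[["extra"]] <- extra
class(dfrm[["extra"]])   # "Date"
```

Both routes leave the existing columns untouched; which one to use is mostly a matter of taste.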

-- 
David.

> 
> Very best wishes and thanks again: this is an amazing list,
> 
> Chris
> 
> 
> ----- Original Message -----
>> From: "David Winsemius" <dwinsemius at comcast.net>
>> To: "Chris Evans" <chrishold at psyctc.org>
>> Cc: r-help at r-project.org
>> Sent: Sunday, 27 November, 2016 18:25:18
>> Subject: Re: [R] In praise of "options(warnPartialMatchDollar = TRUE)"
> 
>>> On Nov 27, 2016, at 7:12 AM, Chris Evans <chrishold at psyctc.org> wrote:
>>> 
>>> I am just posting this to the list because someone else may one day waste an
>>> hour or so because s/he has unknowingly hit a partial match failure using "$".
>>> It's my folly that I did but I am surprised that options(warnPartialMatchDollar
>>> = TRUE) isn't the default setting.
>>> 
>>> Here's a bit of reproducible code that shows the challenge.
>>> 
>>> #rm(list=ls()) ### BEWARE: me making sure environment was clean
>>> set.seed(12345) # get fully reproducible example
>>> nRows <- 100
>>> Sample <- sample(0:1,nRows,replace=TRUE)
>>> data2 <- data.frame(cbind(1:nRows,Sample)) # create data frame
>> 
>> Using data.frame( cbind( ...) ) is a predictable method for creating later
>> headaches, and there is no options-warning available. cbind coerces an argument
>> list of vectors to matrix class, thus dropping all attributes (dates, times and
>> factors are all destroyed).
>> 
>>> table(data2$Samp) # call which silently achieves partial match
>>> data2$innoccuousname <-
>>> factor(data2$Samp,labels=c("Non-clinical","Clinical"),levels=0:1)
>>> str(data2$Samp) # all fine, no apparent destruction of the non-existent vector
>>> data2$Samp
>>> data2$SampFac <-
>>> factor(data2$Samp,labels=c("Non-clinical","Clinical"),levels=0:1)
>>> str(data2$Samp) # returns NULL because there is no longer a single partial match
>>> # to "Samp" but no warning!
>>> str(data2$Sample) # but of course, data2$Sample is still there
>>> 
>>> Because I had used "data2$Samp" all the way through a large file of R (markup)
>>> code and hadn't noticed that the variable names in the SPSS file I was reading
>>> in had changed from "Samp" to "Sample" I appeared to be destroying data2$Samp.
>>> 
>>> I have now set options(warnPartialMatchDollar = TRUE) in my Rprofile.site file
>>> and am just posting this here in case it helps someone some day.
>> 
>> This is one of the reasons many experienced R programmers eschew the use of the
>> "$" function in programming.
>> 
>> The preferred use would be :
>> 
>> data2[['Samp']]
>> 
>> (No partial match.)
>> 
>>> 
>>> 	[[alternative HTML version deleted]]
>> 
>> Plain text is generally preferred on Rhelp but there does not appear to have
>> been a problem in this posting instance.
>> 
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
>> David Winsemius
>> Alameda, CA, USA
> 

David Winsemius
Alameda, CA, USA


