[R] Confused about using data.table package
David Winsemius
dwinsemius at comcast.net
Sun Feb 19 22:01:36 CET 2017
> On Feb 19, 2017, at 11:37 AM, C W <tmrsg11 at gmail.com> wrote:
>
> Hi R,
>
> I am a little confused by the data.table package.
>
> library(data.table)
>
> df <- data.frame(w=rnorm(20, -10, 1), x= rnorm(20, 0, 1), y=rnorm(20, 10, 1),
> z=rnorm(20, 20, 1))
>
> df <- data.table(df)
setDT(df) is preferred. It converts df to a data.table by reference, so the assignment is not even needed.
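Here is a minimal sketch of the conversion step, assuming a reasonably recent data.table release. The set.seed() call is an addition that is not in the original post; it only makes the simulated data reproducible. setDT() changes df in place, whereas data.table(df) builds a separate copy.

```r
library(data.table)

set.seed(1)  # added for reproducibility; not part of the original question
df <- data.frame(w = rnorm(20, -10, 1), x = rnorm(20, 0, 1),
                 y = rnorm(20, 10, 1), z = rnorm(20, 20, 1))

# setDT() converts the data.frame to a data.table by reference:
# df itself changes class, so no `df <- ...` assignment is required.
setDT(df)
```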
>
> #drop column w
>
> df_1 <- df[, w := NULL] # I thought you are supposed to do: df_1 <- df[, -w]
Nope. The "[.data.table" function is very different from the "[.data.frame" function. As you should be able to see, an expression in the `j` position of "[.data.table" is evaluated in the environment of the data.table object. Unquoted column names are therefore found as columns, and the result of applying any function to them is returned. Here that function is just a unary minus, so df[, -w] returns the negated values of w rather than dropping the column.
Actually "nope" on two counts. You cannot use a unary minus with column names in `[.data.frame` either. It would have needed to be df[ , !colnames(df) %in% "w"] # logical indexing
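The three behaviours discussed above can be compared side by side. This is a sketch that assumes a recent data.table. The names dt, df_no_w and neg_w are illustrative and do not appear in the original thread, and set.seed() is added for reproducibility.

```r
library(data.table)

set.seed(1)  # added for reproducibility
df <- data.frame(w = rnorm(20, -10, 1), x = rnorm(20, 0, 1),
                 y = rnorm(20, 10, 1), z = rnorm(20, 20, 1))

# Base R: drop column "w" by logical indexing on the column names.
# (%in% binds more tightly than !, so this is !(colnames(df) %in% "w").)
df_no_w <- df[, !colnames(df) %in% "w"]

dt <- as.data.table(df)  # a separate copy; df is left unchanged

# data.table: j is evaluated with the columns in scope, so -w is simply
# the negated column w (a numeric vector); no column is dropped.
neg_w <- dt[, -w]

# := NULL removes column w from dt itself (by reference).
dt[, w := NULL]
```

Note that := NULL returns its result invisibly and modifies dt directly. In the question's df_1 <- df[, w := NULL], the original df therefore loses w as well.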
>
> df_2 <- df[x<y] # aren't you supposed to do df_2 <- df[x<y]?
I don't see a difference.
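The question repeats the same expression twice. Presumably the intended comparison was with the base-R data.frame idiom, although that is an assumption. The sketch below shows that the two forms select the same rows. The object names are illustrative, and set.seed() is added for reproducibility.

```r
library(data.table)

set.seed(1)  # added for reproducibility
df <- data.frame(w = rnorm(20, -10, 1), x = rnorm(20, 0, 1),
                 y = rnorm(20, 10, 1), z = rnorm(20, 20, 1))
dt <- as.data.table(df)

# data.table: the expression in i is evaluated with the columns in scope,
# so x and y are dt$x and dt$y; rows where x < y is TRUE are kept.
dt_rows <- dt[x < y]

# Base-R equivalent: explicit df$ prefixes, plus a trailing comma for "all columns".
df_rows <- df[df$x < df$y, ]
```

With these simulated means (x around 0, y around 10), x < y holds for essentially every row, so nearly all rows are kept.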
>
> df_3 <- df[, a := x-y] # created new column a using x minus y, why are we
> using colon equals?
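You need to study the package's extensive documentation more closely. The ":=" operator adds, modifies, or deletes columns by reference, that is, in place and without copying the table. It is discussed in detail there (see ?`:=` and the reference-semantics vignette).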
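A minimal sketch of what "by reference" means in practice, assuming a recent data.table. The names dt and dt_copy are illustrative. The use of copy() goes slightly beyond the original reply: it is the documented way to get an independent table, because a plain `<-` only creates a second name for the same object.

```r
library(data.table)

set.seed(1)  # added for reproducibility
dt <- data.table(x = rnorm(5, 0, 1), y = rnorm(5, 10, 1))

# := adds column a to dt itself, by reference; no copy of the table is made.
dt[, a := x - y]

# Because := works in place, `dt2 <- dt` would not give an independent table;
# copy() does, so deleting a from the copy leaves dt untouched.
dt_copy <- copy(dt)
dt_copy[, a := NULL]
```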
>
> I am a bit confused by this syntax.
It's non-standard for R, but many people find the package's efficiency worth the extra effort of learning what is essentially a different evaluation strategy.
>
> Thanks!
>
> [[alternative HTML version deleted]]
R-help is a plain-text mailing list.
--
David
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA