[R] How does the data.frame function generate column names?

Joshua Wiley jwiley.psych at gmail.com
Mon Jan 24 01:22:54 CET 2011


Hi,

Welcome to R!  What you have run into is a feature of how subsetting
works.  By default, "[" drops the result to the lowest possible number
of dimensions, so selecting a single column returns a plain vector
with no column name.  The odd name you see, "d.8.10...c..", is an
attempt to convert the expression d[8:10, "c"] into a valid name.  R
does this, roughly, by converting disallowed characters (such as "[",
":" and the quotes) into periods (.).  It happens because data.frame()
uses the expression that was passed to it as the name of the column,
unless the object already has a column name.  Here is some code (you
should be able to copy and paste it), with comments that explain a bit
further and will hopefully give you a better feel for indexing and
creating data frame objects.
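
As a rough sketch of that naming step (an approximation of what
data.frame() does, not its actual internal code), deparsing the
unevaluated expression and passing the text through make.names()
reproduces the name.  The data frame is rebuilt here so the snippet
runs on its own; expr_text, guess, actual and raw_name are names made
up for this illustration.

```r
## same data as in your example
d <- data.frame(a = 1:10, b = 11:20, c = 21:30)

## the unevaluated expression, as text
expr_text <- deparse(quote(d[8:10, "c"]))
expr_text

## characters that are not allowed in a name become periods
guess <- make.names(expr_text)
guess

## compare with the name data.frame() actually assigns
actual <- names(data.frame(d[8:10, "c"]))
identical(guess, actual)

## with check.names = FALSE the expression text is kept unchanged
raw_name <- names(data.frame(d[8:10, "c"], check.names = FALSE))
raw_name
```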

Cheers,

Josh

################################################
## your data (in one step)
d <- data.frame(a = 1:10, b = 11:20, c = 21:30)

## because only one column of 'd' is selected, the result is
## dropped to the lowest possible number of dimensions (one, i.e.
## a vector), which loses the column name, so use drop = FALSE
f <- data.frame(d[8:10, "c", drop = FALSE])

## another option is to explicitly name the column
g <- data.frame(c = d[8:10, "c"])

## here you have selected two columns so there must
## be at least two dimensions, and names are kept
g2 <- data.frame(d[8:10, c("b", "c")])

## to "see" what is happening
d[8:10, "c", drop = FALSE]
d[8:10, "c", drop = TRUE] # default

## for more details, see the documentation
?"["  # see the "drop" argument description
?data.frame # under the "value" section on names

################################################
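
One way to check the difference described in the comments above is to
look at the dimensions and names of each result.  This is only a
sketch; the object names v and df are made up for the illustration.

```r
d <- data.frame(a = 1:10, b = 11:20, c = 21:30)

v  <- d[8:10, "c"]                # default drop = TRUE: a plain vector
df <- d[8:10, "c", drop = FALSE]  # still a data frame with one column

dim(v)      # NULL, a vector has no dimensions
names(v)    # NULL, so there is no column name to reuse
dim(df)     # 3 rows, 1 column
names(df)   # "c"

## data.frame() can only reuse "c" in the second case; in the first
## it falls back on the expression it was given (here the symbol v)
names(data.frame(v))
names(data.frame(df))
```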

On Sun, Jan 23, 2011 at 1:53 PM, H Roark <hrbuilder at hotmail.com> wrote:
>
> Hi all,
>
> I'm a new R user and am confused about how R behaves when converting a vector to a data frame when using the data.frame function.  I'm specifically interested in cases where the vector is expressed as a subset of another data frame.  For example, say I want to create a data frame from the last three rows of the third column of the data frame, d, that I've created below:
>
> a<-(1:10)
> b<-(11:20)
> c<-(21:30)
> d<-data.frame(a,b,c)
>
> To do that, I know that I could do:
>
> e<-d[8:10,"c"]
> f<-data.frame(e)
>
> However, I would like for the single column in the data frame, f, to be named "c".  Obviously, I could just use the vector, c<-d[8:10,"c"], in place of the vector e.  However, I wonder why I can't do:
>
> g<-data.frame(d[8:10,"c"])
>
> This expression returns the proper values, but the resulting variable is named "d.8.10...c.." and not "c" as I expected it to be named.
>
> Could someone explain the mechanics of this statement and tell me why it produced such an oddly named variable?  I'm especially confused as to why I get the result I expect if I use the data.frame function on multiple vectors, as in:
>
> g2<-data.frame(d[8:10,c("b","c")])
>
> which produces a data frame with columns named "b" and "c".
>
> Many thanks in advance,
> Alec



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/


