[R] How to automatically create data frames from an existing one?

William Dunlap wdunlap at tibco.com
Wed Jan 11 18:36:38 CET 2017

You can use the 'with' function or the 'data' argument to many functions
to use the variables in the data frame without copying them out to the
global environment.  Leaving them in the data.frame keeps them from
getting lost among the temporary variables in the global environment.

> Data <- read.csv(header=TRUE, text=
+ "Name,Education,Wage
+ Abe,PhD,105
+ Bob,MS,108
+ Chuck,BS,118
+ Dave,PhD,102")
> with(Data, tapply(Wage, Education, mean))
   BS    MS   PhD
118.0 108.0 103.5
> lm(data=Data, Wage ~ Education - 1)

Call:
lm(formula = Wage ~ Education - 1, data = Data)

Coefficients:
 EducationBS   EducationMS  EducationPhD
       118.0         108.0         103.5
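
Since the original question mentions matrix operations, here is a rough
sketch of how those can also be done without copying columns out of the
data frame. It assumes the same toy Data as above; the names X, y and
beta are only illustrative. model.matrix() builds the design matrix from
the formula, and the normal equations are then solved directly.

```r
# Same toy data as in the example above.
Data <- read.csv(header=TRUE, text=
"Name,Education,Wage
Abe,PhD,105
Bob,MS,108
Chuck,BS,118
Dave,PhD,102")

# Design matrix with one indicator column per Education level (no intercept).
X <- model.matrix(~ Education - 1, data=Data)
y <- Data$Wage

# Solve the normal equations (X'X) beta = X'y.
beta <- solve(crossprod(X), crossprod(X, y))
beta
```

The values in beta should agree with the coefficients that lm() reports
for the same formula.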

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Jan 11, 2017 at 3:53 AM, Tunga Kantarcı <tungakantarci at gmail.com> wrote:
> I have a data frame with several columns representing variables, and
> the variable names appear in the top row of the data frame. That is,
> I had a csv file where the variable names were stored in the top row,
> and when I imported the csv file into R, R created a data frame, which
> I named rwrdatafile, where I can see all the variables with their
> names in the top row in RStudio. For example, one of the columns
> stores wage data and I can create a standalone data frame (shall I
> call it a vector data frame?) for wage, but I need to do this for all
> variables.
> That is, I can execute the command
> wage = rwrdatafile[,1,drop=FALSE]
> which nicely creates wage. RStudio shows it as data in its
> environment window, and if I click on it, I can inspect it in a
> spreadsheet-like view and work with that data, say in regression
> analysis. The problem is that there are many variables stored in the
> data frame rwrdatafile, and it is very tedious to repeat the above
> routine for each variable. Hence I attempted to write a for loop for
> this, but to no avail.
> In particular, I tried
> for (i in 1:k){
>   assign(names(rwrdatafile)[i],rwrdatafile[,i])
> }
> and in fact this nicely assigns each column in the data frame to a
> name, but I do not see the variables as data in the environment
> section. But what I need are variables that I can work with in matrix
> operations.
> I also tried
> for(i in 1:k){
>   names(rwrdatafile)[i] = rwrdatafile[,i,drop=FALSE]
> }
> thinking that this for loop would just repeat what I do for
> wage = rwrdatafile[,1,drop=FALSE]
> for all the variables in rwrdatafile.
> Please note that I do need to use a for loop; in fact, I need to
> translate the MATLAB code below, which does the job in MATLAB, and
> imitate it as closely as possible in R.
> # MATLAB code generating variables from structure array rwrdatafile
> [N,k] = size(rwrdatafile.data);
> for i = 1:k
>     eval([cell2mat(rwrdatafile.textdata(i)) '= rwrdatafile.data(:,i);'])
> end
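
If separate objects really are required, a loop along the following
lines might be a close R counterpart of the MATLAB code. This is only a
sketch: the small rwrdatafile built here is a made-up stand-in for the
imported file, and its column names (wage, educ) are hypothetical. The
one change from the assign() attempt quoted above is drop=FALSE, which
keeps each piece a one-column data frame rather than a plain vector;
that is probably why the earlier objects did not show up as data in
RStudio.

```r
# Hypothetical stand-in for the imported data frame; the column names
# and values are invented for illustration.
rwrdatafile <- data.frame(wage = c(105, 108, 118, 102),
                          educ = c(20, 18, 16, 20))

k <- ncol(rwrdatafile)
for (i in seq_len(k)) {
  # drop=FALSE keeps the result a one-column data frame named after
  # the column, instead of simplifying it to a vector.
  assign(names(rwrdatafile)[i], rwrdatafile[, i, drop = FALSE])
}

is.data.frame(wage)

# The pieces can be bound together again and converted for matrix work.
X <- as.matrix(cbind(wage, educ))
```

Keeping the columns in rwrdatafile and using with() or a 'data'
argument, as described at the top of this message, avoids the copies
altogether.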
