[R] assigning and saving datasets in a loop, with names changing with "i"
Marie Pierre Sylvestre
MP.Sylvestre at epimgh.mcgill.ca
Wed Dec 19 03:24:32 CET 2007
Dear R users,
I am analysing a very large data set and I need to perform several data
manipulations. The dataset is so big that the only way I can play with it
without having memory problems (e.g. "cannot allocate vector of size...")
is to write a batch script to:
1. cut the data into pieces
2. save the pieces in separate .RData files
3. Remove everything from the environment
4. load one of the pieces
5. perform the manipulations on it
6. save it and remove it from the environment
7. Redo 4-6 for every piece
8. Merge everything together at the end
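The eight steps above can be sketched roughly as follows. This is only an illustration under assumptions: the built-in mtcars data stands in for the large data set, process_piece is a hypothetical placeholder for the real manipulations, and the pieces are written to tempdir() with saveRDS/readRDS rather than save/load.

```r
# Toy stand-in for the large data set (assumption: the real data is far bigger)
big <- mtcars

# Hypothetical placeholder for the real manipulations
process_piece <- function(df) {
  df$kpl <- df$mpg * 0.425144  # miles per gallon -> km per litre
  df
}

out_dir <- tempdir()

# Steps 1-3: cut the data into pieces, save each piece, clear them from memory
pieces <- split(big, big$cyl)
ids <- names(pieces)
piece_files <- file.path(out_dir, paste0("piece_", ids, ".rds"))
done_files <- file.path(out_dir, paste0("done_", ids, ".rds"))
for (k in seq_along(pieces)) {
  saveRDS(pieces[[k]], piece_files[k])
}
n_rows <- nrow(big)
rm(big, pieces)

# Steps 4-7: load one piece at a time, process it, save it, drop it
for (k in seq_along(piece_files)) {
  piece <- readRDS(piece_files[k])
  piece <- process_piece(piece)
  saveRDS(piece, done_files[k])
  rm(piece)
}

# Step 8: merge everything together at the end
result <- do.call(rbind, lapply(done_files, readRDS))
```

Only one piece is held in memory inside the second loop; the final rbind still needs room for the full result.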
It works if coded line by line but since I'll have to perform these tasks
on other data sets, I am trying to automate this as much as I can.
I am using a loop in which I used 'assign' and 'get' (pseudo code below).
My problem is when I use 'get', it prints the whole object on the screen.
I am wondering whether there is a more efficient way to do what I need to
do. Any help would be appreciated. Please keep in mind that the whole
process is quite computer-intensive, so I can't keep everything in the
environment while R performs calculations.
Say I have one big dataframe called data. I use 'split' to divide it into a
list of 12 dataframes (call this list my.list).
my.fun is a function that takes a dataframe, performs several
manipulations on it and returns a dataframe.
for (i in 1:12){
  # [[i]] extracts the i-th dataframe; [i] would return a list of length one
  assign(paste("data", i, sep = ""), my.fun(my.list[[i]])) # this works
  # now I need to save this new object as an RData file.
  # The following line does not work (shown commented out):
  # save(paste("data", i, sep = ""),
  #      file = paste(paste("data", i, sep = ""), "RData", sep = "."))
  # This works but it is a bit convoluted!!!
  temp <- get(paste("data", i, sep = ""))
  save(temp, file = "lala.RData")
}
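One possibility, offered as a sketch rather than a definitive answer: save() has a 'list' argument that accepts object names as a character vector, so the name built with paste() can be passed directly, without get(). The toy my.list and my.fun below are stand-ins for the real ones, and the files are written to tempdir() as an assumption.

```r
# Toy stand-ins (assumptions): the real my.list and my.fun are described in the post
my.list <- split(mtcars, rep(1:4, length.out = nrow(mtcars)))
my.fun <- function(df) df[order(df$mpg), ]

out_dir <- tempdir()
for (i in seq_along(my.list)) {
  obj_name <- paste("data", i, sep = "")
  assign(obj_name, my.fun(my.list[[i]]))
  # 'list =' takes the object NAME as a string, so no get() is needed
  save(list = obj_name,
       file = file.path(out_dir, paste(obj_name, "RData", sep = ".")))
  # remove the object by name before moving on to the next piece
  rm(list = obj_name)
}

# Loading a file back restores the object under its original name
e <- new.env()
loaded <- load(file.path(out_dir, "data1.RData"), envir = e)
```

Here load() returns the names of the restored objects, so 'loaded' can be used to check what came back from each file.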
I am *sure* there is something more clever to do but I can't find it. Any
help would be appreciated.
best regards,
MP