[R-SIG-Finance] sapply on a dataframe column of 30000 entries killed R session
George Kumar
grgkumar4 at gmail.com
Tue Oct 9 05:18:07 CEST 2012
Hi,
I have a data frame of six columns with 25,000 entries each; the fourth column is of type character and holds numbers in a format like 2.43B or 3.13M. These are numbers, and I would like to convert this column to numeric so that I can load the data into SQL. So I wrote the following code:
fun <- function(x) {
  if (is.na(x)) {
    return(NA)
  }
  if (grepl("M", x, fixed = TRUE)) {
    return(as.numeric(sub("M", "", x, fixed = TRUE)) * 1e6)
  }
  if (grepl("B", x, fixed = TRUE)) {
    return(as.numeric(sub("B", "", x, fixed = TRUE)) * 1e9)
  }
  # Fall through for plain numbers: without this, the function
  # returns NULL and sapply produces a list, not a numeric vector.
  as.numeric(x)
}
df=read.table("MYFILE", header=TRUE, sep="\t", as.is=TRUE)
df[,4] = sapply(df[, 4], fun)
But this never came back: the OS killed the R session. Using "free" on Linux, I saw that the system ran out of memory.
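For comparison, the same conversion can be done without per-element function calls by operating on the whole column at once. This is only a sketch of the intended suffix handling (it assumes the entries carry at most one M or B suffix, or are NA):

```r
# Vectorized conversion: build a multiplier vector from the suffixes,
# strip the letters, then multiply. NA entries stay NA because
# as.numeric(NA) * 1 is NA, and grepl() returns FALSE for NA.
to_numeric <- function(v) {
  mult <- rep(1, length(v))
  mult[grepl("M", v, fixed = TRUE)] <- 1e6
  mult[grepl("B", v, fixed = TRUE)] <- 1e9
  as.numeric(gsub("[MB]", "", v)) * mult
}

# e.g. df[, 4] <- to_numeric(df[, 4])
```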
Any suggestions on how to handle this problem?
Thanks in advance.
George
--
View this message in context: http://r.789695.n4.nabble.com/sapply-on-a-dataframe-column-of-30000-entries-killed-R-session-tp4645524.html
Sent from the Rmetrics mailing list archive at Nabble.com.