[R-SIG-Finance] sapply on a dataframe column of 30000 entries killed R session

George Kumar grgkumar4 at gmail.com
Tue Oct 9 05:18:07 CEST 2012


Hi,

I have a data frame of 6 columns with 25000 entries each. The 4th column is
of type character and holds numbers in a format like 2.43B or 3.13M. I would
like to convert this column to numeric so that I can put the data into SQL.
So I wrote the following code:

fun = function(x)
{
    if (is.na(x)) {
        return(NA)
    }
    if (length(grep("M", x))) {
        x = unlist(strsplit(x, "M"))
        x = as.numeric(x)
        return(x * 1000000)
    }
    if (length(grep("B", x))) {
        x = unlist(strsplit(x, "B"))
        x = as.numeric(x)
        return(x * 1000000000)
    }
}
df = read.table("MYFILE", header=TRUE, sep="\t", as.is=TRUE)
df[, 4] = sapply(df[, 4], fun)

But the call never returned, and the OS killed the R session. Using "free" in
Linux, I saw that the system had run out of memory.
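
One thing that may be worth checking, offered as a guess rather than a
confirmed diagnosis: fun has no final return, so an entry containing neither
"M" nor "B" (a plain number, for example) yields NULL, and sapply then returns
a list instead of a numeric vector. The sketch below uses made-up sample
values to show that behaviour. Whether the real file contains such entries,
and whether this explains the memory use, are assumptions.

```r
# Same function as in the post, reproduced so the example is self-contained.
fun <- function(x) {
    if (is.na(x)) {
        return(NA)
    }
    if (length(grep("M", x))) {
        x <- as.numeric(unlist(strsplit(x, "M")))
        return(x * 1000000)
    }
    if (length(grep("B", x))) {
        x <- as.numeric(unlist(strsplit(x, "B")))
        return(x * 1000000000)
    }
    # No return here: inputs without "M" or "B" fall through and give NULL.
}

# Made-up sample values; "512" has no suffix.
vals <- c("2.43B", "3.13M", "512", NA)
res <- sapply(vals, fun)

is.list(res)        # TRUE: the results have unequal lengths, so no simplification
is.null(res[[3]])   # TRUE: the unsuffixed entry fell through
```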

Any suggestions on how to handle this problem?
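
For comparison, the same conversion can be written without a per-element
function call. The following is only a sketch: suffix_to_numeric is a
hypothetical helper name, the sample values are made up, and it assumes the
suffix is a single trailing "M" or "B" and that unsuffixed entries are plain
numbers.

```r
# Hypothetical helper (not from the original post): convert strings such as
# "2.43B" or "3.13M" to numbers in one vectorized pass.
suffix_to_numeric <- function(x) {
    # Multiplier per element; entries without a suffix keep a multiplier of 1.
    mult <- rep(1, length(x))
    mult[grepl("M", x, fixed = TRUE)] <- 1e6
    mult[grepl("B", x, fixed = TRUE)] <- 1e9
    # Strip a trailing M or B, then convert; NA inputs stay NA.
    as.numeric(sub("[MB]$", "", x)) * mult
}

vals <- c("2.43B", "3.13M", "512", NA)   # made-up sample values
out <- suffix_to_numeric(vals)
is.numeric(out)   # TRUE: a plain numeric vector, one value per input
```

Applied to the data frame above, this would be
df[, 4] = suffix_to_numeric(df[, 4]).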

Thanks in advance.
George
