[BioC] Help on alternative and efficient data frame manipulation

Zhu, Lihua (Julie) Julie.Zhu at umassmed.edu
Wed Dec 28 21:10:33 CET 2011


Thanks, Steve,

The matrix approach is definitely faster. I will try the list approach to
see whether it is faster as well.

Best regards,

Julie


On 12/28/11 3:06 PM, "Steve Lianoglou" <mailinglist.honeypot at gmail.com>
wrote:

> Hi,
> 
> On Wed, Dec 28, 2011 at 3:01 PM, Zhu, Lihua (Julie)
> <Julie.Zhu at umassmed.edu> wrote:
>> Hi,
>> 
>> I have a data frame with 5000 columns and 16000 rows. In columns 4 to
>> 5000, I would like to set every value x to 1 if x > 0. The following code
>> works but is very slow. Are there more efficient ways to modify a large
>> number of entries in a data frame? Many thanks for your kind help!
>> 
>> id <- 4:ncol(mydata)
>> for (i in id) {mydata[mydata[,i]>0,i]=1}
> 
> You might have better results if you treat the columns of the
> data.frame as a list, so something like:
> 
> for (i in 4:ncol(mydata)) {
>   mydata[[i]] <- ifelse(mydata[[i]] > 0, 1, mydata[[i]])
> }
> 
> 
> ## Or, what if you convert to a matrix?
> ## (drop only columns 1:3 so that column 4 is modified too)
> m <- as.matrix(mydata[, -(1:3)])
> m[m > 0] <- 1
> ans <- cbind(mydata[, 1:3], as.data.frame(m))
> 
> 
> Are any of those better?
> 
> -steve
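
For reference, here is a minimal sketch of both suggestions on a toy data frame. It assumes the first three columns hold annotation and columns 4 onward are numeric with no NA values. The column names (id, chr, start, s1..s3) and the values are made up for illustration and are not from the original data.

```r
# Toy stand-in for the poster's data (the real one is 16000 x 5000).
# All column names and values here are hypothetical.
mydata <- data.frame(
  id    = 1:6,
  chr   = rep(c("chr1", "chr2"), 3),
  start = seq(100, 600, by = 100),
  s1    = c(0, 2, 5, 0, 1, 0),
  s2    = c(3, 0, 0, 7, 0, 0),
  s3    = c(0, 0, 4, 0, 0, 9)
)

cols <- 4:ncol(mydata)

# Approach 1: treat the data frame as a list and rewrite each column once.
by_list <- mydata
by_list[cols] <- lapply(by_list[cols], function(x) {
  x[x > 0] <- 1   # vectorized replacement within a single column
  x
})

# Approach 2: pull the numeric block out as a matrix, modify it in one step,
# then reattach the leading annotation columns.
m <- as.matrix(mydata[, cols])
m[m > 0] <- 1
by_matrix <- cbind(mydata[, 1:3], as.data.frame(m))

# Both approaches should agree on the toy data.
stopifnot(isTRUE(all.equal(by_list, by_matrix)))
```

To compare speed on data of realistic size, each approach can be wrapped in system.time(). Both avoid the repeated row-and-column indexed assignment into the data frame that the original loop performs.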


