[BioC] Help on alternative and efficient data frame manipulation
Zhu, Lihua (Julie)
Julie.Zhu at umassmed.edu
Wed Dec 28 21:10:33 CET 2011
Thanks, Steve,
The matrix approach is definitely faster. I will also try the list approach to see
whether it is faster still.
Best regards,
Julie
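A rough way to check these timings is to run the loop from the original question and Steve's list and matrix suggestions (quoted below) on a simulated data frame with `system.time()`. This is a sketch under stated assumptions: the data are made up and much smaller than the 16000 x 5000 frame in the question, the names `n_row`, `n_col`, `d1`, `d2` and `d3` are illustrative only, and the elapsed times will vary with the machine and R version.

```r
# Hypothetical simulated data: 3 ID columns followed by numeric columns.
# Much smaller than the real 16000 x 5000 data frame.
set.seed(42)
n_row <- 1000
n_col <- 200
mydata <- data.frame(
  id1 = seq_len(n_row), id2 = seq_len(n_row), id3 = seq_len(n_row),
  matrix(rnorm(n_row * n_col), nrow = n_row)
)
id <- 4:ncol(mydata)  # columns to modify

# 1. Original loop: subset rows of the data frame, one column at a time
t_loop <- system.time({
  d1 <- mydata
  for (i in id) d1[d1[, i] > 0, i] <- 1
})

# 2. Treat the columns as a list and rebuild each column with ifelse()
t_list <- system.time({
  d2 <- mydata
  for (i in id) d2[[i]] <- ifelse(d2[[i]] > 0, 1, d2[[i]])
})

# 3. Convert the target columns to a matrix and replace in one vectorized step
t_matrix <- system.time({
  d3 <- mydata
  m <- as.matrix(d3[id])
  m[m > 0] <- 1
  d3[id] <- as.data.frame(m)
})

# Elapsed seconds for each approach
print(rbind(loop = t_loop, list = t_list, matrix = t_matrix)[, "elapsed"])
```

All three variants should produce the same data frame, so only the timings are expected to differ.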
On 12/28/11 3:06 PM, "Steve Lianoglou" <mailinglist.honeypot at gmail.com>
wrote:
> Hi,
>
> On Wed, Dec 28, 2011 at 3:01 PM, Zhu, Lihua (Julie)
> <Julie.Zhu at umassmed.edu> wrote:
>> Hi,
>>
>> I have a data frame with 5000 columns and 16000 rows. I would like to
>> convert every value x in columns 4 to 5000 to 1 if x > 0. The following code
>> works, but it is very slow. Is there a more efficient way to modify a large
>> number of entries in a data frame? Many thanks for your kind help!
>>
>> id <- 4:ncol(mydata)
>> for (i in id) {mydata[mydata[, i] > 0, i] <- 1}
>
> You might have better results if you treat the columns of the
> data.frame as a list, so something like:
>
> for (i in 4:ncol(mydata)) {
>   mydata[[i]] <- ifelse(mydata[[i]] > 0, 1, mydata[[i]])
> }
>
>
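> ## Or, what if you convert to a matrix?
> ## Column 4 is one of the columns to change, so set aside only columns 1:3.
> ## (Columns 4 onward must all be numeric, or as.matrix() returns characters.)
> m <- as.matrix(mydata[, -(1:3)])
> m[m > 0] <- 1
> ans <- cbind(mydata[, 1:3], as.data.frame(m))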
>
>
> Are any of those better?
>
> -steve
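Another option, not taken from the thread, is to replace all target columns in a single assignment with `lapply()`. It treats the data frame as a list, as Steve suggests, but without an explicit loop. This sketch assumes every column from 4 onward is numeric; the toy data frame and the function names `binarize_lapply()` and `binarize_matrix()` are hypothetical. Using `which()` means missing values simply stay `NA`, whereas the original `mydata[mydata[,i]>0,i]=1` loop can fail with "missing values are not allowed in subscripted assignments of data frames" if a column contains `NA`.

```r
# Hypothetical toy data: 3 annotation columns followed by 5 numeric columns
set.seed(1)
mydata <- data.frame(
  gene  = paste0("g", 1:6),
  chr   = rep(c("chr1", "chr2"), 3),
  start = seq(100, 600, by = 100),
  matrix(rnorm(6 * 5), nrow = 6, dimnames = list(NULL, paste0("s", 1:5)))
)
id <- 4:ncol(mydata)  # columns 4 onward are the ones to modify

# Set every positive value in the columns `cols` to 1, column by column
binarize_lapply <- function(df, cols) {
  df[cols] <- lapply(df[cols], function(x) {
    x[which(x > 0)] <- 1  # which() drops NAs, so missing values stay NA
    x
  })
  df
}

# Same result via one matrix; only valid if all `cols` are numeric
binarize_matrix <- function(df, cols) {
  m <- as.matrix(df[cols])
  m[which(m > 0)] <- 1
  df[cols] <- as.data.frame(m)
  df
}

a <- binarize_lapply(mydata, id)
b <- binarize_matrix(mydata, id)
print(a)
```

Both functions leave columns 1 to 3 untouched (including the positive `start` values) and should return the same data frame as the original loop.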