[BioC] Help on alternative and efficient data frame manipulation
Zhu, Lihua (Julie)
Julie.Zhu at umassmed.edu
Wed Dec 28 21:41:01 CET 2011
Steve,
Converting to a matrix gave a much larger speedup than treating the columns
as a list. Here are the timings for a 100 by 100 data frame.

id <- 4:ncol(mydata)

## Columns accessed as list elements
system.time(for (i in id) {
    mydata[[i]] <- ifelse(mydata[[i]] > 0, 1, mydata[[i]])
})
   user  system elapsed
  0.034   0.000   0.037

## Columns accessed with matrix-style indexing
system.time(for (i in id) {
    mydata[, i] <- ifelse(mydata[, i] > 0, 1, mydata[, i])
})
   user  system elapsed
  0.038   0.003   0.042

## Conversion to a matrix
system.time({
    m <- as.matrix(mydata[, -(id)])
    m[m > 0] <- 1
    ans <- cbind(mydata[, 1:4], as.data.frame(m))
})
   user  system elapsed
  0.006   0.000   0.009

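
One detail worth checking in the matrix timing: with id <- 4:ncol(mydata), the expression mydata[, -(id)] drops columns 4 onward, so only the first three columns are converted. The sketch below is one way to rerun the comparison with the matrix step applied to columns 4 onward and to confirm that both approaches give the same result. The synthetic data frame and its column names (name, chr, start) are assumptions made for illustration, not the original data, and timings will vary by machine.

```r
set.seed(1)

# Hypothetical stand-in for mydata: 3 annotation columns followed by
# 97 numeric columns (100 rows by 100 columns in total).
mydata <- data.frame(
    name  = paste0("g", 1:100),
    chr   = sample(c("chr1", "chr2"), 100, replace = TRUE),
    start = 1:100,
    matrix(rnorm(100 * 97), nrow = 100)
)
id <- 4:ncol(mydata)

# -(id) removes columns 4 onward, leaving only the 3 annotation columns.
n_kept <- ncol(mydata[, -(id)])

# List-style column access, working on a copy so mydata stays unchanged.
by_list <- mydata
t_list <- system.time(for (i in id) {
    by_list[[i]] <- ifelse(by_list[[i]] > 0, 1, by_list[[i]])
})

# Matrix approach applied to columns 4 onward.
t_matrix <- system.time({
    m <- as.matrix(mydata[, id])
    m[m > 0] <- 1
    by_matrix <- cbind(mydata[, -(id)], as.data.frame(m))
})

# Both approaches should produce the same data frame.
same <- isTRUE(all.equal(by_list, by_matrix))

print(t_list)
print(t_matrix)
```

Working on a copy (by_list) keeps the input identical for both approaches, which matters because once the values have been replaced a second pass has nothing left to change.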
Many thanks for your great suggestions!
Best regards,
Julie
On 12/28/11 3:10 PM, "Julie Zhu" <julie.zhu at umassmed.edu> wrote:
> Thanks, Steve,
>
> The matrix approach is definitely faster. I will try the list approach to
> see whether it is faster as well.
>
> Best regards,
>
> Julie
>
>
> On 12/28/11 3:06 PM, "Steve Lianoglou" <mailinglist.honeypot at gmail.com>
> wrote:
>
>> Hi,
>>
>> On Wed, Dec 28, 2011 at 3:01 PM, Zhu, Lihua (Julie)
>> <Julie.Zhu at umassmed.edu> wrote:
>>> Hi,
>>>
>>> I have a data frame consisting of 5000 columns and 16000 rows. I would like
>>> to set every value x in columns 4 to 5000 to 1 if x > 0. The following code
>>> works but it is very slow. Are there more efficient ways to modify a large
>>> number of entries in a data frame? Many thanks for your kind help!
>>>
>>> id <- 4:ncol(mydata)
>>> for (i in id) {mydata[mydata[,i]>0,i]=1}
>>
>> You might have better results if you treat the columns of the
>> data.frame as a list, so something like:
>>
>> for (i in 4:ncol(mydata)) {
>>     mydata[[i]] <- ifelse(mydata[[i]] > 0, 1, mydata[[i]])
>> }
>>
>>
>> ## Or, what if you convert to a matrix?
>> ## (columns 1:3 are kept as they are; columns 4 onward are converted)
>> m <- as.matrix(mydata[, -(1:3)])
>> m[m > 0] <- 1
>> ans <- cbind(mydata[, 1:3], as.data.frame(m))
>>
>>
>> Are any of those better?
>>
>> -steve