[BioC] sum of columns

Steve Lianoglou lianoglou.steve at gene.com
Fri Jul 12 17:27:44 CEST 2013


Hi,

On Fri, Jul 12, 2013 at 1:40 AM, chris Jhon <cjhon217 at gmail.com> wrote:
> Hi Alex,
> Thank you i used this
>
> pos = rep(NA, nrow(D))
>
> for (i in 1:nrow(D)) {
>   if (sum(D[i, -c(1,2)]) > 0) pos[i] = i
> }
>
> and it is working, and I can get all the !is.na entries and put them in vector P
>
> P=pos[!is.na(pos)]
>
> The subset does not work, though; I got the error
> Error: memory exhausted (limit reached?)
>
> Any idea?
> Thank you very much

It's not clear what you are expecting to get out of this ... it seems
pretty obvious that you do not have enough memory to process this
data, yet you keep asking if anybody has "any idea".

The simple idea is that you should use a machine with more RAM.

If that is not sufficient advice, could you please specify a bit more
clearly what you want help with?

You might consider reading in the file line by line: if the
numbers/counts in the current line are not sufficient to keep that row
of data in the next step of the analysis, simply ignore the line;
otherwise, append the line to a new file that you are building, which
will contain the reduced set of data you want to work with.
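
A minimal sketch of that line-by-line filter follows. It is only an
illustration under assumptions that are not in the original post: the
file is tab-delimited with one header line, the counts start in column
3 (after the gene and symbol columns), and "sufficient" means the
counts in the row sum to more than zero. The function name filter_rows
and its arguments are hypothetical.

```r
# Hypothetical helper: stream `infile` in chunks and write only the rows
# whose count columns sum to more than zero to `outfile`.
# Assumes tab-delimited input with a single header line.
filter_rows <- function(infile, outfile, first_count_col = 3,
                        sep = "\t", chunk_size = 10000) {
  con_in <- file(infile, open = "r")
  con_out <- file(outfile, open = "w")
  on.exit({
    close(con_in)
    close(con_out)
  })

  # Copy the header line through unchanged
  writeLines(readLines(con_in, n = 1), con_out)

  kept <- 0
  repeat {
    # Only `chunk_size` lines are held in memory at any time
    lines <- readLines(con_in, n = chunk_size)
    if (length(lines) == 0) break
    fields <- strsplit(lines, sep, fixed = TRUE)
    keep <- vapply(fields, function(f) {
      counts <- as.numeric(f[first_count_col:length(f)])
      # isTRUE() drops rows whose sum is NA (e.g. malformed lines)
      isTRUE(sum(counts) > 0)
    }, logical(1))
    writeLines(lines[keep], con_out)
    kept <- kept + sum(keep)
  }
  invisible(kept)
}

# Toy usage with the example table from the original question
in_path <- tempfile()
out_path <- tempfile()
writeLines(c("gene\tsymbol\tsample1\tsample2\tsample3\tsample4",
             "gene1\tA\t0\t0\t0\t0",
             "gene2\tB\t0\t10\t2\t0",
             "gene3\tC\t0\t0\t0\t0"), in_path)
n_kept <- filter_rows(in_path, out_path)
reduced <- read.delim(out_path)
```

Because only one chunk of lines is in memory at a time, peak memory
use depends on chunk_size rather than on the size of the whole file.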

Once that's done, you can restart R, load just the filtered
data, and move on. Still, if you keep running out of RAM trying to
take a subset of a data.frame that you want to process, then I suspect
actually *processing* the data, even after it has been filtered, will be
problematic given the limited resources of the machine you are working
with.

The last question to ask yourself is whether it makes sense that you
are running out of memory. How big is the data you are trying to process?
How much RAM do you have?
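
As a rough sketch of how to answer the first of those questions from
within R, something like the following could be used. The data frame D
here is a toy stand-in (its contents are assumed, taken from the
example in the original question); on the real data you would run the
same calls on your own object.

```r
# Toy stand-in for the real data frame
D <- data.frame(gene    = c("gene1", "gene2", "gene3"),
                symbol  = c("A", "B", "C"),
                sample1 = c(0, 0, 0),
                sample2 = c(0, 10, 0),
                sample3 = c(0, 2, 0),
                sample4 = c(0, 0, 0),
                stringsAsFactors = FALSE)

# Number of rows and columns
dims <- dim(D)
print(dims)

# Approximate memory taken by the object itself
size_bytes <- object.size(D)
print(size_bytes, units = "Kb")

# Memory currently in use by the whole R session
print(gc())
```

How much RAM the machine has is a question for the operating system
rather than for R; on a Linux box, running `free -m` in a shell is one
way to check.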

Also, while you are sorting all of these things out, you might as well
upgrade to the latest version of R (3.0.1), as it seems you are
working with R-2.14, which is a bit outdated. If you'd like help
with later parts of your analysis, you will be asked to upgrade to
the latest and greatest version of R anyway.

-steve


> On 7/11/13, Alessandro Brozzi <alessandro.brozzi at gmail.com> wrote:
>> you might try to allocate first an empty vector of NA and then use a for
>> loop:
>>
>> pos = rep(NA, nrow(D))
>>
>> for (i in 1:nrow(D)) {
>>   if (sum(D[i, -c(1,2)]) > 0) pos[i] = i
>> }
>>
>> the subset:
>>
>> D[ pos[!is.na(pos)] , ]
>>
>> should be what you are seeking.
>>
>> hth,
>> alex
>>
>> On Thu, Jul 11, 2013 at 11:22 AM, chris Jhon <cjhon217 at gmail.com> wrote:
>>
>>> Thank you very much. However, I got an error
>>> Error: cannot allocate vector of size 290 Kb
>>>
>>> > sessionInfo()
>>> R version 2.14.0 (2011-10-31)
>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>
>>> locale:
>>> [1] C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> loaded via a namespace (and not attached):
>>> [1] tools_2.14.0
>>>
>>> On 7/11/13, Alessandro Brozzi <alessandro.brozzi at gmail.com> wrote:
>>> > D[ rowSums( D[ , -c(1,2) ] ) > 0 , ]
>>> >
>>> > where 1 and 2 are the indices of the non-numerical columns
>>> >
>>> >
>>> > On Thu, Jul 11, 2013 at 11:12 AM, chris Jhon <cjhon217 at gmail.com>
>>> > wrote:
>>> >
>>> >> Hi Alex,
>>> >>
>>> >> Thank you.
>>> >>
>>> >> However, I got an error due to the memory limit
>>> >>
>>> >> Error: memory exhausted (limit reached?)
>>> >>
>>> >> In addition, I have one column that has no numerical value (e.g. gene
>>> >> name). Will rowSums work only for numerical columns?
>>> >>
>>> >> On 7/11/13, Alessandro Brozzi <alessandro.brozzi at gmail.com> wrote:
>>> >> > let's call D your data frame, then:
>>> >> >
>>> >> > D[ rowSums(D) > 0 , ]
>>> >> >
>>> >> > alex
>>> >> >
>>> >> >
>>> >> > On Thu, Jul 11, 2013 at 10:56 AM, chris Jhon <cjhon217 at gmail.com>
>>> >> > wrote:
>>> >> >
>>> >> >> Hi All,
>>> >> >>
>>> >> >> I have a data frame like this
>>> >> >>
>>> >> >>
>>> >> >> gene         symbol    sample1 sample2 sample3 sample4
>>> >> >>
>>> >> >> gene1        A         0       0       0       0
>>> >> >> gene2        B         0       10      2       0
>>> >> >> gene3        C         0       0       0       0
>>> >> >>
>>> >> >> and i would like to subset the data frame to have only genes that have
>>> >> >> sum in all samples greater than zero.
>>> >> >>
>>> >> >> How to do this in R
>>> >> >>
>>> >> >> Thank you for any help
>>> >> >>
>>> >> >> _______________________________________________
>>> >> >> Bioconductor mailing list
>>> >> >> Bioconductor at r-project.org
>>> >> >> https://stat.ethz.ch/mailman/listinfo/bioconductor
>>> >> >> Search the archives:
>>> >> >> http://news.gmane.org/gmane.science.biology.informatics.conductor
>>> >> >>
>>> >> >
>>> >>
>>> >
>>>
>>
>



--
Steve Lianoglou
Computational Biologist
Bioinformatics and Computational Biology
Genentech


