[R] working on a data frame

William Dunlap wdunlap at tibco.com
Fri Jul 25 20:07:24 CEST 2014


> if
> yourData[,8]==0,
> then
> yourData[,8]==1, yourData[,10] <- yourData[,9]/yourData[,8]

You could express this in R as
   is8Zero <- yourData[,8] == 0
   yourData[is8Zero, 8] <- 1
   yourData[is8Zero, 10] <- yourData[is8Zero,9] / yourData[is8Zero,8]
Note how logical (Boolean) values are used as subscripts - read the '['
as 'such that' when using logical subscripts.
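
As a small illustration of the logical-subscript idea, here is the same
recipe run on a made-up data frame. The values are invented for the
example, and only columns 8, 9 and 10 matter; treat it as a sketch rather
than as the original poster's data.

```r
# Toy data: 10 columns so that positions 8, 9 and 10 match the thread.
# All values are invented for illustration.
yourData <- as.data.frame(matrix(1, nrow = 4, ncol = 10))
yourData[, 8] <- c(2, 0, 4, 0)
yourData[, 9] <- c(6, 5, 8, 0)
yourData[, 10] <- yourData[, 9] / yourData[, 8]   # contains Inf and NaN

is8Zero <- yourData[, 8] == 0     # one TRUE/FALSE per row
yourData[is8Zero, 8] <- 1         # rows such that column 8 was 0
yourData[is8Zero, 10] <- yourData[is8Zero, 9] / yourData[is8Zero, 8]

print(yourData[, 8:10])
```

If column 8 can contain NA, the comparison yields NA for those rows;
wrapping the subscript in which(), as in which(is8Zero), drops them.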

There are many more ways to express the same thing.
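
One such alternative, sketched on invented toy data, uses replace() to
build the corrected column 8 and then recomputes all of column 10. This
assumes column 10 is always meant to be column 9 divided by column 8, as
described earlier in the thread, so recomputing the untouched rows does
not change them.

```r
# Invented toy data; only columns 8, 9 and 10 are used.
yourData <- as.data.frame(matrix(1, nrow = 4, ncol = 10))
yourData[, 8] <- c(2, 0, 4, 0)
yourData[, 9] <- c(6, 5, 8, 0)

# replace(x, test, value) returns a copy of x with value where test is TRUE
yourData[, 8] <- replace(yourData[, 8], yourData[, 8] == 0, 1)

# Recompute the whole column in one vectorized step
yourData[, 10] <- yourData[, 9] / yourData[, 8]

print(yourData[, 8:10])
```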

(I am tempted to change the algorithm to avoid the divide by zero problem
by making the quotient (numerator + epsilon)/(denominator + epsilon) where
epsilon is a very small number.  I am assuming that the raw numbers are
counts or at least cannot be negative.)
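
A minimal sketch of that epsilon idea, on invented non-negative counts.
The value of epsilon here is an assumption chosen only for illustration;
a sensible value depends on the scale of the real data.

```r
# Invented counts standing in for column 9 (numerator) and column 8
# (denominator).
numerator   <- c(6, 5, 8, 0)
denominator <- c(2, 0, 4, 0)

epsilon <- 1e-6   # assumed value, not taken from the thread

quotient <- (numerator + epsilon) / (denominator + epsilon)

print(quotient)
print(log2(quotient))   # finite for every row, including the 0/0 row
```

Note that where the denominator is zero the quotient is roughly
numerator/epsilon, so those rows depend strongly on the epsilon chosen.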

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Fri, Jul 25, 2014 at 10:44 AM, Matthew
<mccormack at molbio.mgh.harvard.edu> wrote:
> Thank you for your comments, Peter.
>
> A couple of questions.  Can I do something like the following ?
>
> if
> yourData[,8]==0,
> then
> yourData[,8]==1, yourData[,10] <- yourData[,9]/yourData[,8]
>
>
> I think I am just going to have to learn more about R. I thought getting
> into R would be like going from Perl to Python or Java etc., but it seems
> like R programming works differently.
>
> Matthew
>
>
> On 7/25/2014 12:06 AM, Peter Alspach wrote:
>>
>> Tena koe Matthew
>>
>> " Column 10 contains the result of the value in column 9 divided by the
>> value in column 8. If the value in column 8==0, then the division can not be
>> done, so  I want to change the zero to a one in order to do the division.".
>> That being the case, think in terms of vectors, as Sarah says.  Try:
>>
>> yourData[,10] <- yourData[,9]/yourData[,8]
>> yourData[yourData[,8]==0,10] <- yourData[yourData[,8]==0,9]
>>
>> This doesn't change the 0 to 1 in column 8, but it doesn't appear you
>> actually need to do that.
>>
>> HTH ....
>>
>> Peter Alspach
>>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
>> On Behalf Of Matthew McCormack
>> Sent: Friday, 25 July 2014 3:16 p.m.
>> To: Sarah Goslee
>> Cc: r-help at r-project.org
>> Subject: Re: [R] working on a data frame
>>
>>
>> On 7/24/2014 8:52 PM, Sarah Goslee wrote:
>>>
>>> Hi,
>>>
>>> Your description isn't clear:
>>>
>>> On Thursday, July 24, 2014, Matthew
>>> <mccormack at molbio.mgh.harvard.edu> wrote:
>>>
>>>      I am coming from the perspective of Excel and VBA scripts, but I
>>>      would like to do the following in R.
>>>
>>>       I have a data frame with 14 columns and 32,795 rows.
>>>
>>>      I want to check the value in column 8 (row 1) to see if it is a 0.
>>>      If it is not a zero, proceed to the next row and check the value
>>>      for column 8.
>>>      If it is a zero, then
>>>      a) change the zero to a 1,
>>>      b) divide the value in column 9 (row 1) by 1,
>>>
>>>
>>> Row 1, or the row in which column 8 == 0?
>>
>> All rows in which the value in column 8==0.
>>>
>>> Why do you want to divide by 1?
>>
>> Column 10 contains the result of the value in column 9 divided by the
>> value in column 8. If the value in column 8==0, then the division can not be
>> done, so  I want to change the zero to a one in order to do the division.
>> This is a fairly standard thing to do with this data. (The data are
>> measurements of amounts at two time points. Sometimes a thing will not be
>> present in the beginning (0), but very present at the later time. Column 10
>> is the log2 of the change. Infinite is not an easy number to work with, so
>> it is common to change the 0 to a 1. On the other hand, something may be
>> present at time 1, but not at the later time. In this case column 10 would
>> be taking the log2 of a number divided by 0, so again the zero is commonly
>> changed to a one in order to get a useable value in column 10. In both the
>> preceding cases there was a real change, but Inf and NaN are not helpful.)
>>>
>>>      c) place the result in column 10 (row 1) and
>>>
>>>
>>> Ditto on the row 1 question.
>>
>> I want to work on all rows where column 8 (and column 9) contain a zero.
>> Column 10 contains the result of the value in column 9 divided by the
>> value in column 8. So, for row 1, column 10 row 1 contains the ratio column
>> 9 row 1 divided by column 8 row 1, and so on through the whole
>> 32,000 or so rows.
>>
>> Most rows do not have a zero in columns 8 or 9. Some rows have a zero in
>> column 8 only, and some rows have a zero in column 9 only. I want to get rid
>> of the zeros in these two columns and then do the division to get a
>> manageable value in column 10. Division by zero and Inf are not considered
>> 'manageable' by me.
>>>
>>> What do you want column 10 to be if column 8 isn't 0? Does it already
>>> have a value? I suppose it must.
>>
>> Yes column 10 does have something, but this something can be Inf or NaN,
>> which I want to get rid of.
>>>
>>>      d) repeat this for each of the other 32,794 rows.
>>>
>>>      Is this possible with an R script, and is this the way to go about
>>>      it? If it is, could anyone get me started?
>>>
>>>
>>> Assuming you want to put the new values in the rows where column 8 ==
>>> 0, you can do it in two steps:
>>>
>>> mydata[,10] <- ifelse(mydata[,8] == 0, mydata[,9]/whatever, mydata[,10])
>>> # where whatever is the thing you want to divide by that probably isn't 1
>>> mydata[,8] <- ifelse(mydata[,8] == 0, 1, mydata[,8])
>>>
>>> R programming is best done by thinking about vectorizing things,
>>> rather than doing them in loops. Reading the Intro to R that comes
>>> with your installation is a good place to start.
>>
>> Would it be better to change the data frame into a matrix, or something
>> else ?
>> Thanks for your help.
>>>
>>> Sarah
>>>
>>>
>>>      Matthew
>>>
>>>
>>>
>>>
>>> --
>>> Sarah Goslee
>>> http://www.stringpage.com
>>> http://www.sarahgoslee.com
>>> http://www.functionaldiversity.org
>>
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>


