[R] Trouble retrieving the second largest value from each row of a data.frame

David Winsemius dwinsemius at comcast.net
Sun Jul 25 02:09:32 CEST 2010


On Jul 24, 2010, at 4:54 PM, <mpward at illinois.edu> wrote:

> THANKS, but I have one issue and one question.
>
> For some reason the "secondstrongest" values for rows 3 and 6 are
> incorrect (they equal the strongest value); the remaining 10 are correct.

In my run of Wiley's code I instead get identical strongest and
secondstrongest values for rows 2, 5, and 6. Holtman's and my solutions
did not suffer from that defect, although mine suffered from my
misreading of your request: I thought you wanted the top 3. The fix is
trivial.
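The repeated values trace back to one line of Wiley's my.fun: which.max(data[-strongest]) returns a position within the shortened vector data[-strongest], but data[secondstrongest] then applies that position to the full row. Whenever the runner-up sits to the right of the maximum, the lookup lands one column too early, which in rows 2, 5 and 6 is the maximum itself. A minimal sketch of a per-row helper that avoids the mismatch by ordering each row once (top_two and the three-row example are illustrative, not code from this thread):

```r
# Hypothetical helper: the two largest values in each row plus their column names.
top_two <- function(df) {
  m <- as.matrix(df)
  # For each row, order() gives the column positions from largest to smallest;
  # keep the first two positions.
  idx <- t(apply(m, 1, order, decreasing = TRUE))[, 1:2, drop = FALSE]
  rows <- seq_len(nrow(m))
  data.frame(
    strongest           = m[cbind(rows, idx[, 1])],
    secondstrongest     = m[cbind(rows, idx[, 2])],
    strongestantenna    = colnames(m)[idx[, 1]],
    secondstrongantenna = colnames(m)[idx[, 2]],
    row.names = rownames(m),
    stringsAsFactors = FALSE
  )
}

# Rows 1, 2 and 6 of the example data
example <- data.frame(
  value0   = c(-13007L, -12838L, -11068L),
  value60  = c(-11707L, -13210L, -11698L),
  value120 = c(-11072L, -11176L, -12430L),
  value180 = c(-12471L, -11799L, -12430L),
  value240 = c(-12838L, -13210L, -12430L),
  value300 = c(-13357L, -13845L, -12814L),
  row.names = c("1", "2", "6")
)
top_two(example)
```

Calling top_two(yourdata) should return the same four columns as my.finder(), with the secondstrongest value taken from the correct column.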
>
> These data are being used to track radio-tagged birds; they come from
> automated radio telemetry receivers. I will be applying the following
> formula:
>
>   diff <- (strongest - secondstrongest) / 100
>   bearingdiff <- 30 - (-0.0624 * diff^2) - (2.8346 * diff)
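A sketch of that formula as a vectorised R function, checked against row 1 of the example data. bearing_diff is a hypothetical name, and the intermediate is called d so that it does not mask base R's diff():

```r
# Mike's bearing-difference formula, vectorised over both arguments.
bearing_diff <- function(strongest, secondstrongest) {
  d <- (strongest - secondstrongest) / 100   # 'd' rather than 'diff' to avoid masking base::diff
  30 - (-0.0624 * d^2) - (2.8346 * d)
}

# Row 1 of the example data: strongest = -11072, secondstrongest = -11707
bearing_diff(-11072, -11707)   # about 14.52
```

Because every operation is vectorised, the same call works on whole columns of the data frame returned by my.finder().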

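vals <- c("value0", "value60", "value120", "value180", "value240",
          "value300")
# bearing of each antenna in degrees: value0 -> 0, value60 -> 60, ...
value.str2 <- (match(yourdata$secondstrongestantenna, vals) - 1) * 60
value.str1 <- (match(yourdata$strongestantenna, vals) - 1) * 60
# gap between the two antennas' positions in vals (1 or 5 = adjacent)
change.ind <- abs(match(yourdata$strongestantenna, vals) -
                  match(yourdata$secondstrongestantenna, vals))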
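The change.ind test works because match() turns each antenna name into its position in vals, so neighbouring antennas differ by 1, and the value0/value300 pair differs by 5 at the wrap-around. A small sketch with made-up antenna pairs (not taken from the data) showing which differences count as adjacent:

```r
vals <- c("value0", "value60", "value120", "value180", "value240", "value300")

# Illustrative (strongest, second strongest) antenna pairs
strongestantenna       <- c("value120", "value120", "value0",   "value300")
secondstrongestantenna <- c("value60",  "value180", "value300", "value120")

# Positions 1..6 in vals; the absolute gap is what change.ind measures
change.ind <- abs(match(strongestantenna, vals) -
                  match(secondstrongestantenna, vals))

# Neighbours differ by 1, or by 5 for the value0/value300 wrap-around
adjacent <- change.ind == 1 | change.ind == 5
adjacent   # TRUE TRUE TRUE FALSE
```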

>
> A) Then the bearingdiff is added to the strongestantenna (value0 =
> 0 degrees) if the secondstrongestantenna is greater (e.g. value0 and
> value60),

> B) or, if the secondstrongestantenna is smaller than the
> strongestantenna, the bearingdiff is subtracted from the
> strongestantenna.

>
> C) The only exception is that if value0 (0 degrees, i.e. 360 degrees) is
> strongest and value300 (300 degrees) is the secondstrongestantenna, then
> the bearing is 360-bearingdiff.


> D) Also the strongestantenna and secondstrongestantenna have to be  
> next to each other (e.g. value0 with value60, value240 with  
> value300, value0 with value300) or the results should be NA.

After setting finalbearing with rules A, B, and C:
# NA unless the two antennas are neighbours (change.ind of 1 or 5)
yourdata$finalbearing <- with(yourdata,
                              ifelse(change.ind < 5 & change.ind > 1,
                                     NA, finalbearing))
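Rules A to D can also be collapsed into one vectorised step if the six antennas are treated as a circle: a neighbour one position up (mod 6) means add the bearingdiff, one position down means subtract it, anything else is NA, and taking the result mod 360 turns 0 - bearingdiff into 360 - bearingdiff (rule C). This is a sketch under that assumption; final_bearing is a hypothetical name, and the circular treatment also gives 300 + bearingdiff when value300 is strongest and value0 second, a case the rules above do not spell out.

```r
# Hypothetical helper combining rules A-D, assuming the antennas form a circle
# of six positions spaced 60 degrees apart.
final_bearing <- function(strongest, secondstrongest,
                          strongestantenna, secondstrongestantenna) {
  vals <- c("value0", "value60", "value120", "value180", "value240", "value300")
  d <- (strongest - secondstrongest) / 100
  bearingdiff <- 30 - (-0.0624 * d^2) - (2.8346 * d)

  pos1 <- match(strongestantenna, vals)       # positions 1..6
  pos2 <- match(secondstrongestantenna, vals)
  step <- (pos2 - pos1) %% 6                   # 1 = neighbour at +60 deg, 5 = at -60 deg
  base <- (pos1 - 1) * 60                      # bearing of the strongest antenna

  bearing <- ifelse(step == 1, base + bearingdiff,      # rule A
             ifelse(step == 5, base - bearingdiff,      # rule B
                    NA_real_))                          # rule D: not adjacent
  bearing %% 360                               # rule C: a negative bearing wraps to 360 - x
}

# Row 1 of the example data: value120 strongest, value60 second
final_bearing(-11072, -11707, "value120", "value60")   # about 105.48
```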

> I have been trying to use a series of if/else statements to produce
> these bearings, but all I am producing is errors. Any suggestions
> would be appreciated.


>
> Again, THANKS for your efforts.
>
> Mike
>
> ---- Original message ----
>> Date: Fri, 23 Jul 2010 23:01:56 -0700
>> From: Joshua Wiley <jwiley.psych at gmail.com>
>> Subject: Re: [R] Trouble retrieving the second largest value from  
>> each row of  a data.frame
>> To: mpward at illinois.edu
>> Cc: r-help at r-project.org
>>
>> Hi,
>>
>> Here is a little function that will do what you want and return a  
>> nice output:
>>
>> # Function to calculate the top two values and return them
>> my.finder <- function(mydata) {
>>   my.fun <- function(data) {
>>     strongest <- which.max(data)
>>     secondstrongest <- which.max(data[-strongest])
>>     strongestantenna <- names(data)[strongest]
>>     secondstrongantenna <- names(data[-strongest])[secondstrongest]
>>     value <- matrix(c(data[strongest], data[secondstrongest],
>>                       strongestantenna, secondstrongantenna), ncol = 4)
>>     return(value)
>>   }
>>   dat <- apply(mydata, 1, my.fun)
>>   dat <- t(dat)
>>   dat <- as.data.frame(dat, stringsAsFactors = FALSE)
>>   colnames(dat) <- c("strongest", "secondstrongest",
>>                      "strongestantenna", "secondstrongantenna")
>>   dat[ , "strongest"] <- as.numeric(dat[ , "strongest"])
>>   dat[ , "secondstrongest"] <- as.numeric(dat[ , "secondstrongest"])
>>   return(dat)
>> }
>>
>>
>> #Using your example data:
>>
>> yourdata <- structure(list(value0 = c(-13007L, -12838L, -12880L,  
>> -12805L,
>> -12834L, -11068L, -12807L, -12770L, -12988L, -11779L), value60 =  
>> c(-11707L,
>> -13210L, -11778L, -11653L, -13527L, -11698L, -14068L, -11665L,
>> -11736L, -12873L), value120 = c(-11072L, -11176L, -11113L, -11071L,
>> -11067L, -12430L, -11092L, -11061L, -11137L, -12973L), value180 =  
>> c(-12471L,
>> -11799L, -12439L, -12385L, -11638L, -12430L, -11709L, -12373L,
>> -12570L, -12537L), value240 = c(-12838L, -13210L, -13089L, -11561L,
>> -13527L, -12430L, -11607L, -11426L, -13467L, -12973L), value300 =  
>> c(-13357L,
>> -13845L, -13880L, -13317L, -13873L, -12814L, -13025L, -12805L,
>> -13739L, -11146L)), .Names = c("value0", "value60", "value120",
>> "value180", "value240", "value300"), class = "data.frame",  
>> row.names = c("1",
>> "2", "3", "4", "5", "6", "7", "8", "9", "10"))
>>
>> my.finder(yourdata)  # what you want is in a nicely labeled data frame
>>
>> #A potential problem is that it is not very efficient
>>
>> #Here is a test using a matrix of 100,000 rows
>> #sampled from the same range as your data
>> #(with 5 columns rather than your 6)
>>
>> data.test <- matrix(
>> sample(seq(min(yourdata),max(yourdata)), size = 500000, replace =  
>> TRUE),
>> ncol = 5)
>>
>> system.time(my.finder(data.test))
>>
>> #On my system I get
>>
>>> system.time(my.finder(data.test))
>>  user  system elapsed
>>  2.89    0.00    2.89
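The timing above comes from calling my.fun once per row through apply(). Since the original data have a couple of million rows, a loop-free alternative may be worth trying: base R's max.col() finds the column of each row's maximum for the whole matrix at once, and masking that entry with -Inf and calling it again gives the runner-up. This is a sketch (top_two_fast is a hypothetical name) and has not been timed against the version above.

```r
# Hypothetical vectorised alternative: no per-row R function calls.
top_two_fast <- function(m) {
  m <- as.matrix(m)
  rows <- seq_len(nrow(m))
  i1 <- max.col(m, ties.method = "first")    # column of each row's largest value
  m2 <- m
  m2[cbind(rows, i1)] <- -Inf                # mask the largest value
  i2 <- max.col(m2, ties.method = "first")   # column of the second largest
  data.frame(strongest       = m[cbind(rows, i1)],
             secondstrongest = m[cbind(rows, i2)],
             strongestcol    = i1,
             secondcol       = i2)
}

# Rows 1 and 10 of the example data
m <- matrix(c(-13007, -11707, -11072, -12471, -12838, -13357,
              -11779, -12873, -12973, -12537, -12973, -11146),
            nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("value0", "value60", "value120",
                                    "value180", "value240", "value300")))
res <- top_two_fast(m)
colnames(m)[res$secondcol]   # "value60" "value0"
```

With ties.method = "first", a row whose maximum appears twice reports that value as both strongest and secondstrongest; the antenna labels are recovered with colnames(m)[res$strongestcol] and colnames(m)[res$secondcol].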
>>
>> Hope that helps,
>>
>> Josh
>>
>>
>>
>> On Fri, Jul 23, 2010 at 6:20 PM,  <mpward at illinois.edu> wrote:
>>> I have a data frame with a couple million lines and want to  
>>> retrieve the largest and second largest values in each row, along  
>>> with the label of the column these values are in. For example
>>>
>>> row 1
>>> strongest=-11072
>>> secondstrongest=-11707
>>> strongestantenna=value120
>>> secondstrongantenna=value60
>>>
>>> Below is the code I am using and a truncated data.frame.   
>>> Retrieving the largest value was easy, but I have been getting  
>>> errors every way I have tried to retrieve the second largest  
>>> value. I have not even tried to retrieve the labels for the values
>>> yet.
>>>
>>> Any help would be appreciated
>>> Mike
>>>
>>>
>>>> data<- 
>>>> data.frame(value0,value60,value120,value180,value240,value300)
>>>> data
>>>   value0 value60 value120 value180 value240 value300
>>> 1  -13007  -11707   -11072   -12471   -12838   -13357
>>> 2  -12838  -13210   -11176   -11799   -13210   -13845
>>> 3  -12880  -11778   -11113   -12439   -13089   -13880
>>> 4  -12805  -11653   -11071   -12385   -11561   -13317
>>> 5  -12834  -13527   -11067   -11638   -13527   -13873
>>> 6  -11068  -11698   -12430   -12430   -12430   -12814
>>> 7  -12807  -14068   -11092   -11709   -11607   -13025
>>> 8  -12770  -11665   -11061   -12373   -11426   -12805
>>> 9  -12988  -11736   -11137   -12570   -13467   -13739
>>> 10 -11779  -12873   -12973   -12537   -12973   -11146
>>>> #largest value in the row
>>>> strongest<-apply(data,1,max)
>>>>
>>>>
>>>> #second largest value in the row
>>>> n<-function(data)(1/(min(1/(data[1,]-max(data[1,]))))+  
>>>> (max(data[1,])))
>>>> secondstrongest<-apply(data,1,n)
>>> Error in data[1, ] : incorrect number of dimensions
>>>>
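The error arises because apply(data, 1, n) passes each row to n as a plain named vector, not as a one-row data frame, so data[1,] asks for two dimensions that are not there. A minimal sketch of the same reciprocal trick written for a vector, next to a sort()-based alternative (both function names are hypothetical):

```r
# The reciprocal trick for a plain vector: 1/(x - max(x)) is Inf at the maximum
# and negative elsewhere; the most negative reciprocal belongs to the value
# closest to the maximum, and inverting it recovers that gap.
second_recip <- function(x) 1 / min(1 / (x - max(x))) + max(x)

# Simpler: sort the row in decreasing order and take the second entry.
second_sort <- function(x) sort(x, decreasing = TRUE)[2]

row1 <- c(value0 = -13007, value60 = -11707, value120 = -11072,
          value180 = -12471, value240 = -12838, value300 = -13357)
second_recip(row1)   # -11707, up to floating-point rounding
second_sort(row1)    # c(value60 = -11707)
# On the full data frame: apply(data, 1, second_sort)
```

One behavioral difference: the reciprocal version returns the largest value strictly below the maximum, while the sort() version returns the maximum again when it occurs twice in a row.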
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>>
>> -- 
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> University of California, Los Angeles
>> http://www.joshuawiley.com/
>