[R] Follow-up Question: data frames; matching/merging

Mon Feb 8 19:05:13 CET 2010

Wow.. thanks for the deluge of responses!

Aggregate seems like the way to go here.

But, suppose that instead of integers in column V2, I actually have
dates (and instead of keeping the minimum integer, I want to keep the
earliest date):

> df = data.frame(V1=c(1,1,2,3,1,2),V2=c('2002-03-13','1989-03-10','1988-01-20','1997-05-15','1996-11-18','2000-01-12'));
> aggregate(df$V2,df['V1'],min) #thanks, Phil
Error in Summary.factor(c(6L, 2L, 3L), na.rm = FALSE) :
  min not meaningful for factors

So I checked:
> min(df$V2)
Error in Summary.factor(c(6L, 2L, 1L, 4L, 3L, 5L), na.rm = FALSE) :
  min not meaningful for factors

This can be fixed with:
> min(as.Date(df$V2))
[1] "1988-01-20"

However, I get a strange response when I try:
> aggregate(as.Date(df$V2),df['V1'],min)
  V1    x
1  1 7008
2  2 6593
3  3 9996

What I'm looking for, of course, is:
> aggregate(as.Date(df$V2),df['V1'],min)
  V1    V2
1  1   1996-11-18
2  2   1988-01-20
3  3   1997-05-15

Any idea how to produce the desired output?

Thanks,
Jonathan

On Mon, Feb 8, 2010 at 12:11 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:
> Here are 3 solutions assuming DF contains the data frame:
>
>> # 1. aggregate
>> aggregate(DF[2], DF[1], min)
>  V1 V2
> 1  a  2
> 2  b  9
> 3  c  4
>
>> # 2. aggregate.formula - requires R 2.11.x
>> aggregate(V2 ~ V1, DF, min)
>  V1 V2
> 1  a  2
> 2  b  9
> 3  c  4
>
>> # 3. SQL using sqldf
>> library(sqldf)
>> sqldf("select V1, min(V2) V2 from DF group by V1")
>  V1 V2
> 1  a  2
> 2  b  9
> 3  c  4
>
>> # 4. summaryBy in the doBy package
>> library(doBy)
>> summaryBy(V2 ~., DF, FUN = min, keep.names = TRUE)
>  V1 V2
> 1  a  2
> 2  b  9
> 3  c  4
>
> On Mon, Feb 8, 2010 at 11:39 AM, Jonathan <jonsleepy at gmail.com> wrote:
>> Hi all,
>>    I'm feeling a little guilty to ask this question, since I've
>> written a solution using a rather clunky for loop that gets the job
>> done.  But I'm convinced there must be a faster (and probably more
>> elegant) way to accomplish what I'm looking to do (perhaps using the
>> "merge" function?).  I figured somebody out there might've already
>> figured this out:
>>
>> I have a dataframe with two columns (let's call them V1 and V2).  All
>> rows are unique, although column V1 has several redundant entries.
>>
>> Ex:
>>
>>     V1     V2
>> 1    a        3
>> 2    a        2
>> 3    b        9
>> 4    c        4
>> 5    a        7
>> 6    b        11
>>
>>
>> What I'd like is to return a dataframe cut down to have only unique
>> entires in V1.  V2 should contain a vector, for each V1, that is the
>> minimum of all the possible choices from the set of redundant V1's.
>>
>> Example output:
>>
>>      V1     V2
>> 1     a        2
>> 2     b        9
>> 3     c        4
>>
>>
>> If somebody could (relatively easily) figure out how to get closer to
>> a solution, I'd appreciate hearing how.  Also, I'd be interested to
>> hear how you came upon the answer (so I can get better at searching
>> the R resources myself).
>>
>> Regards,
>> Jonathan
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>