[R] subsetting a data.frame to the 'unique' of a column

Spencer Graves spencer.graves at pdf.com
Thu Dec 23 18:58:58 CET 2004


      Thanks, Bert, for the correction.  Moreover, I see now that mine 
didn't even give an acceptable answer: it converted levels "a" and "c" of 
the factor DF$c to 1 and 3.  I confess I didn't read the documentation 
before replying.  Here is "duplicated" with my example case: 

 > DF[!duplicated(DF$a), ]
  a b c
1 1 1 a
3 2 3 c
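
The same result as a self-contained sketch. It assumes the three-row DF from the quoted message below, with stringsAsFactors = TRUE set explicitly so that column c is a factor in any R version; the names keep and DF_unique are illustrative only.

```r
# Example data from the thread; stringsAsFactors = TRUE is set explicitly
# (an assumption) so that c is a factor regardless of R version.
DF <- data.frame(a = c(1, 1, 2), b = 1:3, c = letters[1:3],
                 stringsAsFactors = TRUE)

# duplicated() flags every repeat of a value after its first occurrence,
# so negating it keeps the first row for each distinct value of a.
keep <- !duplicated(DF$a)
DF_unique <- DF[keep, ]

# The factor column is subset, not converted, so its labels survive.
as.character(DF_unique$c)
```

Because the rows are selected by subscripting, every column keeps its original type, which is what the aggregate() attempt failed to do for the factor.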

      Thanks again for the correction.  spencer graves

Berton Gunter wrote:

>Spencer's solution is considerably less efficient than using duplicated()
>and subscripting: in a small example with 3 columns and 10000 rows, it took
>5 times as long on my Windows setup.
>
>The reason is that aggregate() is basically a wrapper for tapply and tapply
>basically loops in R. duplicated() loops in C (and uses hashing, I believe).
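
A rough way to reproduce a comparison of this kind is sketched here. The data frame big, its column contents, and the choice of 500 distinct keys are assumptions made for illustration, not the example that was actually timed, and the measured ratio will vary with machine and R version.

```r
# Hypothetical benchmark data: 10000 rows and 3 columns, matching the size
# described above; 500 distinct keys is an arbitrary assumption.
set.seed(1)
n <- 10000
big <- data.frame(a = sample(500, n, replace = TRUE),
                  b = seq_len(n),
                  c = rnorm(n))

# First row per key via aggregate(), which applies the function group by group.
t_agg <- system.time(
  res_agg <- aggregate(big[2:3], big[1], function(x) x[1])
)

# First row per key via duplicated() and logical subscripting.
t_dup <- system.time(
  res_dup <- big[!duplicated(big$a), ]
)

# Both approaches should return one row per distinct value of a.
nrow(res_agg) == nrow(res_dup)

# Elapsed seconds for each approach.
c(aggregate = t_agg[["elapsed"]], duplicated = t_dup[["elapsed"]])
```

For a single run at this size the elapsed times can be too small to compare reliably; repeating each call several times gives steadier numbers.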
>
>Cheers,
>
>-- Bert Gunter
>Genentech Non-Clinical Statistics
>South San Francisco, CA
> 
>"The business of the statistician is to catalyze the scientific learning
>process."  - George E. P. Box
>
>>-----Original Message-----
>>From: r-help-bounces at stat.math.ethz.ch 
>>[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Spencer Graves
>>Sent: Thursday, December 23, 2004 9:06 AM
>>To: Göran Broström
>>Cc: Rudi Alberts; r-help at stat.math.ethz.ch
>>Subject: Re: [R] subsetting a data.frame to the 'unique' of a column
>>
>>      What about "aggregate"? 
>>
>> DF <- data.frame(a=c(1,1,2), b=1:3, c=letters[1:3])
>> aggregate(DF[2:3], DF[1], function(x)x[1])
>>  a b c
>>1 1 1 1
>>2 2 3 3
>>
>>      hope this helps.  spencer graves
>>
>>Göran Broström wrote:
>>
>>>On Thu, Dec 23, 2004 at 11:28:31AM -0800, Rudi Alberts wrote:
>>>
>>>>Hi,
>>>>
>>>>I often run into this problem:
>>>>I have a data.frame with one column containing entries that are not
>>>>unique. What I then want is a subset of the data.frame in which
>>>>the entries in that column have become the 'unique' of the original
>>>>column. 
>>>>Normally I program around it by taking the unique of the column and
>>>>making a new data.frame with it and filling the rest of the data.
>>>>
>>>>(By the way, when moving to the smaller data.frame for example 5 rows
>>>>with the same value in that column will be replaced by one row for that
>>>>value. I don't mind which of the rows now..)
>>>>
>>>>
>>>>something like this, however, this gives me the complete df.
>>>>
>>>>df[df$colname %in% unique(df$colname),]
>>>>
>>>>or this, which doesn't work
>>>>
>>>>df[df$colname == unique(df$colname),]
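
Why neither attempt does the job can be seen on a small case. The data frame dat and its values are hypothetical stand-ins for the poster's data (named dat rather than df to avoid masking stats::df).

```r
# Hypothetical stand-in for the poster's data frame.
dat <- data.frame(colname = c("x", "x", "y", "z", "z"), value = 1:5)

# Every entry is, by definition, a member of the set of unique entries,
# so this test is TRUE for all rows and nothing is dropped.
all(dat$colname %in% unique(dat$colname))

# == compares element by element and recycles the shorter vector
# (x, y, z, x, y), so matches depend on row order, not on uniqueness.
# R also warns here because 5 is not a multiple of 3.
suppressWarnings(dat$colname == unique(dat$colname))

# duplicated() marks the repeats, so its negation keeps one row per value.
dat[!duplicated(dat$colname), ]
```

The last line keeps the first row seen for each value, which fits the requirement that any one row per value is acceptable.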
>>>>
>>>Use 'duplicated':
>>>
>>> > df[!duplicated(df$colname), ]
>>>
>>-- 
>>Spencer Graves, PhD, Senior Development Engineer
>>O:  (408)938-4420;  mobile:  (408)655-4567
>>

-- 
Spencer Graves, PhD, Senior Development Engineer
O:  (408)938-4420;  mobile:  (408)655-4567



