[R] Keep rows in a dataset if one value in a column is duplicated

Simon Knapp sleepingwell at gmail.com
Fri Sep 28 02:35:19 CEST 2012


#By using cbind in:
PairIDs<-cbind(PairID, PairIDDuplicates)

#You create a numeric matrix (the logical
#vector PairIDDuplicates gets converted
#to numeric - note that your second column
#contains 1s and 0s, not TRUEs and FALSEs).
#Matrices are not subsettable using $
#(they are basically a vector with
#a dimension attribute), hence your error.
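
#A minimal sketch of the coercion and the error, using made-up
#toy values (ids and flag are hypothetical names, not taken
#from your data):

```r
# Toy IDs, invented for illustration only
ids  <- c(101, 102, 102, 103, 104, 104)
flag <- ids %in% ids[duplicated(ids)]   # logical vector

m <- cbind(ids, flag)
is.matrix(m)    # the result is a matrix, not a data frame
typeof(m)       # the logical column has been coerced to numeric
m[, "flag"]     # holds 1s and 0s

# $ on a matrix fails; capture the message instead of stopping
msg <- tryCatch(m$flag, error = function(e) conditionMessage(e))
msg

# A data frame keeps each column's own type
d <- data.frame(ids, flag)
is.logical(d$flag)
```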

#Two ways you could have avoided your error are:
# 1) changing the cbind to data.frame
PairIDs <- data.frame(PairID, PairIDDuplicates)
names(PairIDs) <- c("Pairid","Pairiddups")
Health2PairsOnly <- PairIDs[PairIDs$Pairiddups,]

# 2) keeping the matrix and indexing it by column name:
PairIDs<-cbind(PairID, PairIDDuplicates)
colnames(PairIDs) <- c("Pairid","Pairiddups")
Health2PairsOnly <- PairIDs[PairIDs[,'Pairiddups']==1,]

#With option 1 you can save a line of code by naming
#the columns in the data.frame call itself:
PairIDs <- data.frame(Pairid=PairID, Pairiddups=PairIDDuplicates)
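
#As a rough check that both options select the same rows,
#here is a sketch on made-up toy values (ids and flag are
#hypothetical stand-ins for PairID and PairIDDuplicates):

```r
# Toy IDs, invented for illustration only
ids  <- c(101, 102, 102, 103, 104, 104)
flag <- ids %in% ids[duplicated(ids)]

# Option 1: data frame, subset with the logical column
df <- data.frame(Pairid = ids, Pairiddups = flag)
from_df <- df[df$Pairiddups, ]

# Option 2: numeric matrix, subset where the coerced flag equals 1
mat <- cbind(ids, flag)
colnames(mat) <- c("Pairid", "Pairiddups")
from_mat <- mat[mat[, "Pairiddups"] == 1, ]

# Same IDs either way; only the container type differs
all(from_df$Pairid == from_mat[, "Pairid"])
is.data.frame(from_df)
is.matrix(from_mat)
```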



#Note that there is a fair bit of redundancy throughout
#your code. A neater way of subsetting your original
#data, for instance, would be:
PairIDdup <- unique(PairID[duplicated(PairID)])
Health2[PairID %in% PairIDdup,]
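
#To see what this keeps, here is a sketch on a made-up
#stand-in for Health2 (the toy data frame and its score
#column are assumptions; only the pairid column name comes
#from your post). It also shows an equivalent one-liner
#that uses duplicated() in both directions:

```r
# Hypothetical stand-in for Health2
toy <- data.frame(pairid = c(101, 102, 102, 103, 104, 104),
                  score  = c(5, 7, 6, 4, 9, 8))
ids <- toy$pairid

# Approach from above: IDs that occur more than once
dup_ids <- unique(ids[duplicated(ids)])
pairs1  <- toy[ids %in% dup_ids, ]

# Alternative: a row is kept if its ID repeats an earlier
# value or is repeated by a later one
pairs2 <- toy[duplicated(ids) | duplicated(ids, fromLast = TRUE), ]

pairs1
identical(pairs1, pairs2)
```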



Have Fun!
Simon Knapp



On Fri, Sep 28, 2012 at 5:46 AM, GradStudentDD <dd7kc at virginia.edu> wrote:
> Hi,
>
> I have a data set of observations by either one person or a pair of people.
> I want to only keep the pair observations, and was using the code below
> until it gave me the error " $ operator is invalid for atomic vectors". I am
> just beginning to learn R, so I apologize if the code is really rough.
>
> Basically I want to keep all the rows in the data set for which the value of
> "Pairiddups" is TRUE. How do I do it? And how do I get past the error?
>
> Thank you so much,
> Diana
>
> PairID<-c(Health2$pairid)
>
> duplicated(PairID, incomparables=TRUE, fromLast=TRUE)
>
> PairIDdup=duplicated(PairID)
> cbind(PairID, PairIDdup)
> PairID[which(PairIDdup)]
>
> PairIDDuplicates<-PairID%in%PairID[which(PairIDdup)]
> PairIDs<-cbind(PairID, PairIDDuplicates)
>
> colnames(PairIDs)<-c("Pairid","Pairiddups")
>
> Health2PairsOnly<-PairIDs[ which(PairIDs$Pairiddups=='TRUE'), ]



