[R] Remove duplicate elements in lists via recursive indexing

Janko Thyson janko.thyson.rstuff at googlemail.com
Mon May 23 16:58:00 CEST 2011


Hi Timothy,

and thanks for the answer. Loops were exactly what I was trying to 
avoid as much as possible. My initial idea was that once I had 
recursive indexes at my disposal (which were retrieved via recursive 
loops), I could simply use them in the same way as ordinary indexes 
(that is, 'do-all-at-once', as in 'x <- x[-idx.drop]'). But I think 
that even though recursive indexes are nice, you can't get around 
looping, which in turn means that you constantly have to adapt your 
recursive index set to the most recent 'state' of your list.
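
A minimal sketch of one way to keep the loop but avoid re-adapting the 
index set after every removal: delete in reverse positional order, so 
that each removal only shifts elements that have already been handled. 
The helper name 'dropByRecursiveIndex' is hypothetical (it is not part 
of the attached code), and the index list is assumed to come from a 
routine such as the 'checkDuplicates' mentioned in the original question.

```r
# Example list from the original question: three elements share one name.
x <- list(a = list(a.1.1 = 1, a.1.1 = 2, a.1.1 = 3))

# Recursive indexes of the duplicates, as a (hypothetical)
# 'checkDuplicates' routine might return them.
idx.dupl <- list(c(1, 2), c(1, 3))

# Hypothetical helper: remove elements by recursive index, processing the
# indexes in descending lexicographic order. Later siblings and deeper
# elements are removed first, so the remaining indexes stay valid.
dropByRecursiveIndex <- function(x, idx.list) {
    if (length(idx.list) == 0) return(x)
    depth <- max(lengths(idx.list))
    # Pad shorter indexes with 0 so that all of them have equal length.
    padded <- do.call(rbind, lapply(idx.list, function(i) {
        c(i, rep(0, depth - length(i)))
    }))
    # Sort by first position, then second, and so on, all descending.
    cols <- lapply(seq_len(depth), function(j) padded[, j])
    ord <- do.call(order, c(cols, list(decreasing = TRUE)))
    for (i in ord) {
        x[[idx.list[[i]]]] <- NULL
    }
    x
}

x.clean <- dropByRecursiveIndex(x, idx.dupl)
str(x.clean)
```

This still loops, but the index set is computed once and never updated; 
the sorting step is what makes the original indexes safe to reuse.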

In case you're interested, you'll find my current solution in the 
attachments ('listDuplicatesProcess.txt', along with the example script 
'listDuplicatesProcess_examples.txt'). It builds on some other code, so 
you'd have to source 'flatten.txt' and 'envirToList.txt' as well.

Regards,
Janko

On 23.05.2011 14:23, Timothy Bates wrote:
> Dear Janko,
> I think this requires a for loop. The approach I took here was to mark the rows to keep, then dump the dups all in one hit:
>
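> testData = expand.grid(letters[1:4], 1:3)
> # Flag column: TRUE marks the rows to keep
> testData$keep = FALSE
> uniqueIDS = unique(testData$Var1)
> for(thisID in uniqueIDS) {
> 	# match() returns the position of the first occurrence only
> 	firstCaseOnly = match(thisID, testData$Var1)
> 	testData[firstCaseOnly, "keep"] = TRUE
> }
>
> # Drop every row that was not flagged, then print the result
> (testData = testData[testData$keep, ])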
>
>
> On 23 May 2011, at 11:59 AM, Janko Thyson wrote:
>
>> Dear list,
>>
>> I'm trying to solve something pretty basic here, but I can't really come up with a good solution. Basically, I would just like to remove duplicated named elements in lists via their respective recursive indexes (given that I have a routine that identifies these recursive indexes). Here's a little example:
>>
>> # VECTORS
>> # Here, it's pretty simple to remove duplicated entries
>> y<- c(1,2,3,1,1)
>> idx.dupl<- which(duplicated(y))
>> y<- y[-idx.dupl]
>> # /
>>
>> # LISTS
>> x<- list(a=list(a.1.1=1, a.1.1=2, a.1.1=3))
>>
>> x[[c(1,1)]]
>> x[[c(1,2)]] # Should be removed.
>> x[[c(1,3)]] # Should be removed.
>>
>> # Let's say a 'checkDuplicates' routine would give me:
>> idx.dupl<- list(c(1,2), c(1,3))
>>
>> # Remove first duplicate:
>> x[[idx.dupl[[1]]]]<- NULL
>> x
>> # Problem:
>> # Once I remove the first duplicate, my duplicate index would have to be
>> # updated as well, as there is no third element anymore.
>> x[[idx.dupl[[2]]]]<- NULL
>>
>> # So something like this would not work:
>> sapply(idx.dupl, function(x.idx){
>>     x[[x.idx]]<<- NULL
>> })
>> # /
>>
>> Sorry if I'm missing something totally obvious here, but do you have any idea how to solve this?
>>
>> Thanks a lot,
>> Janko
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: envirToList.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110523/6d5a4d15/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: flatten.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110523/6d5a4d15/attachment-0001.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: flatten_examples.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110523/6d5a4d15/attachment-0002.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: listDuplicatesProcess.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110523/6d5a4d15/attachment-0003.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: listDuplicatesProcess_examples.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110523/6d5a4d15/attachment-0004.txt>

