[R] unique problem

David Winsemius dwinsemius at comcast.net
Mon Jun 14 19:10:21 CEST 2010


On Jun 14, 2010, at 12:32 PM, Assa Yeroslaviz wrote:

> I thought unique() deleted the whole line.
> I don't really need the row names, but I thought of them as a way of getting
> the unique items.
>
> Is there a way of deleting whole lines completely according to their
> identifiers?
>
> What I really need are unique values on the first column.
>
> Assa
>
> On Mon, Jun 14, 2010 at 18:04, jim holtman <jholtman at gmail.com> wrote:
>
>> Your process does remove all the duplicate entries based on the
>> content of the two columns.  After you do this, there are still
>> duplicate entries in the first column that you are trying to use as
>> rownames and therefore the error.  Why do you want to use non-unique
>> entries as rownames?  Do you really need the row names, or should you
>> only be keeping unique values for the first column?
>>
>> On Mon, Jun 14, 2010 at 8:54 AM, Assa Yeroslaviz <frymor at gmail.com> wrote:
>>> Hello everybody,
>>>
>>> I have a matrix of 2 columns and over 27k rows.
>>> Some of the rows are duplicated, so I tried to remove them with the command
>>> unique():
>>>
>>>> Workbook5 <- read.delim(file =  "Workbook5.txt")
>>>> dim(Workbook5)
>>> [1] 27748     2
>>>> Workbook5 <- unique(Workbook5)

Jim already showed you one way in another thread and it is probably  
more intuitive than this way, but just so you know...

  Workbook5 <- Workbook5[ !duplicated(Workbook5[ ,1] ) , ]

... should have worked. Logical indexing on first column with return  
of both columns of qualifying rows.
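
A small sketch of the difference, on made-up data (the column names and values below are hypothetical stand-ins, not the real Workbook5). Note that unique() returns the distinct values rather than a logical vector, so the logical index comes from !duplicated(), which is TRUE at the first occurrence of each value:

```r
# Toy stand-in for Workbook5 (hypothetical IDs and values, not the real data)
wb <- data.frame(
  ProbeID = c("A_1", "A_2", "A_1", "A_3", "A_2", "A_1"),
  Symbol  = c("g1",  "g2",  "g1",  "g3",  "g2x", "g1y"),
  stringsAsFactors = FALSE
)

# unique() on the whole data frame drops only rows that match in BOTH columns,
# so the first column can still contain repeats afterwards
u_rows <- unique(wb)
anyDuplicated(u_rows$ProbeID) > 0

# duplicated() flags every repeat after the first occurrence;
# negating it gives a logical index that keeps one row per ID
keep <- !duplicated(wb$ProbeID)
wb_first <- wb[keep, ]

# the first column is now unique, so it can be used for row names
rownames(wb_first) <- wb_first$ProbeID
```

This keeps whichever row happens to come first for each ID; if the second column differs between repeats, you would need to decide which one you actually want.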

-- 
David.
>>>> dim(Workbook5)
>>> [1] 20101     2
>>>
>>> It removed a lot of lines, but unfortunately not all of them. I wanted to add
>>> the row names to the matrix and got this error message:
>>>> rownames(Workbook5) <- Workbook5[,1]
>>> Error in `row.names<-.data.frame`(`*tmp*`, value = c(1L, 2L, 3L, 4L, 5L,  :
>>>   duplicate 'row.names' are not allowed
>>> In addition: Warning message:
>>> non-unique values when setting 'row.names': ‘A_51_P102339’,
>>> ‘A_51_P102518’, ‘A_51_P103435’, ‘A_51_P103465’,
>>> ‘A_51_P103594’, ‘A_51_P104409’, ‘A_51_P104718’,
>>> ‘A_51_P105869’, ‘A_51_P106428’, ‘A_51_P106799’,
>>> ‘A_51_P107176’, ‘A_51_P107959’, ‘A_51_P108767’,
>>> ‘A_51_P109258’, ‘A_51_P109708’, ‘A_51_P110341’,
>>> ‘A_51_P111757’, ‘A_51_P112427’, ‘A_51_P112662’,
>>> ‘A_51_P113672’, ‘A_51_P115018’, ‘A_51_P116496’,
>>> ‘A_51_P116636’, ‘A_51_P117666’, ‘A_51_P118132’,
>>> ‘A_51_P118168’, ‘A_51_P118400’, ‘A_51_P118506’,
>>> ‘A_51_P119315’, ‘A_51_P120093’, ‘A_51_P120305’,
>>> ‘A_51_P120738’, ‘A_51_P120785’, ‘A_51_P121134’,
>>> ‘A_51_P121359’, ‘A_51_P121412’, ‘A_51_P121652’,
>>> ‘A_51_P121724’, ‘A_51_P121829’, ‘A_51_P122141’,
>>> ‘A_51_P122964’, ‘A_51_P123422’, ‘A_51_P123895’,
>>> ‘A_51_P124008’, ‘A_51_P124719’, ‘A_51_P125648’,
>>> ‘A_51_P125679’, ‘A_51_P125779’ [... truncated]
>>>
>>> Is there a better way to discard the duplications in the text file (an Excel
>>> file is the origin)?
>>>
>>>> R.version
>>>              _
>>> platform       x86_64-apple-darwin9.8.0
>>> arch           x86_64
>>> os             darwin9.8.0
>>> system         x86_64, darwin9.8.0
>>> status         Patched
>>> major          2
>>> minor          11.1
>>> year           2010
>>> month          06
>>> day            03
>>> svn rev        52201
>>> language       R
>>> version.string R version 2.11.1 Patched (2010-06-03 r52201)
>>>
>>> THX
>>>
>>> Assa
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>>
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
>>

David Winsemius, MD
West Hartford, CT


