[R] [FORGED] Re: remove
Bert Gunter
bgunter.4567 at gmail.com
Sun Feb 12 17:19:25 CET 2017
My understanding was that the discordant names had already been identified. So
in the example the OP gave, removing the rows with first = "Alex" is done
by:
df[df$first != "Alex", ]
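This is a minimal, self-contained sketch of that subscripting step, plus the equivalent ?subset call. It assumes the flagged names are already known. The object names `kept`, `kept2`, `kept3` and `bad` are hypothetical and used only for illustration. The explicit stringsAsFactors = FALSE is also an addition: it keeps the columns as character regardless of R version, since the default only became FALSE in R 4.0.0.

```r
# Val's example data, reproduced from the original post below
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 Jack
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West")

# Logical subscripting: keep every row whose first name is not "Alex"
kept <- df[df$first != "Alex", ]

# The same selection via subset(), which evaluates the condition inside df
kept2 <- subset(df, first != "Alex")

# If several names have been flagged, %in% drops them all in one step
bad <- c("Alex")  # hypothetical vector of already-identified names
kept3 <- df[!df$first %in% bad, ]

# Sort by first name and week, then renumber rows to match the desired output
kept <- kept[order(kept$first, kept$week), ]
rownames(kept) <- NULL
kept
```

Subscripting alone keeps the original row order (Bob and Cory interleaved). That is why the last two steps are needed to reproduce the OP's sorted, renumbered output.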
If that is not the case, as others have pointed out, various forms of
tapply() (by, ave, etc.) can be used. I agree that that is not so
"basic," so I apologize if my understanding was incorrect.
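When the discordant names are not known in advance, here is a rough sketch of the group-wise route with ave() and tapply(). It uses the same toy data. The helper names `n_last`, `consistent`, `res_ave` and `res_tapply` are hypothetical. Passing row indices to ave(), rather than the last-name column itself, is a design choice made here so the result stays an integer count even though `last` is character.

```r
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 Jack
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West")

# ave(): for each row, count the distinct last names recorded
# for that row's first name (the count is repeated on every row of the group)
n_last <- ave(seq_len(nrow(df)), df$first,
              FUN = function(i) length(unique(df$last[i])))
res_ave <- df[n_last == 1, ]

# tapply(): one TRUE/FALSE per first name; keep rows whose name is TRUE
consistent <- tapply(df$last, df$first, function(x) length(unique(x)) == 1)
res_tapply <- df[df$first %in% names(consistent)[consistent], ]

# Sort and renumber to match the desired output
res_ave <- res_ave[order(res_ave$first, res_ave$week), ]
rownames(res_ave) <- NULL
res_ave
```

Both versions drop every row for a person with more than one recorded last name, with no explicit loop over names.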
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
On Sat, Feb 11, 2017 at 10:04 PM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>
> On 12/02/17 18:36, Bert Gunter wrote:
>>
>> Basic stuff!
>>
>> Either subscripting or ?subset.
>>
>> There are many good R tutorials on the web. You should spend some
>> (more?) time with some.
>
>
> Uh, Bert, perhaps I'm being obtuse (a common occurrence) but it doesn't seem
> basic to me. The only way that I can see how to go at it is via
> a for loop:
>
> rdln <- function(X) {
>     # Remove discordant last names.
>     ok <- logical(nrow(X))
>     for(nm in unique(X$first)) {
>         xxx <- unique(X$last[X$first==nm])
>         if(length(xxx)==1) ok[X$first==nm] <- TRUE
>     }
>     Y <- X[ok,]
>     Y <- Y[order(Y$first),]
>     rownames(Y) <- seq_len(nrow(Y))
>     Y
> }
>
> Calling the toy data frame "melvin" rather than "df" (since "df" is the name
> of the built-in F density function, it is bad form to use it as the name of
> another object) I get:
>
>> rdln(melvin)
>   first week last
> 1   Bob    1 John
> 2   Bob    2 John
> 3   Bob    3 John
> 4  Cory    1 Jack
> 5  Cory    2 Jack
>
> which is the desired output. If there is a "basic stuff" way to do this
> I'd like to see it. Perhaps I will then be toadally embarrassed, but they
> say that this is good for one.
>
> cheers,
>
> Rolf
>
> --
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
>
>> On Sat, Feb 11, 2017 at 9:02 PM, Val <valkremk at gmail.com> wrote:
>>>
>>> Hi all,
>>> I have a big data set and want to remove rows conditionally.
>>> In my data file each person was recorded for several weeks. Somehow
>>> during the recording periods, their last name was misreported. For
>>> each person, the last name should be the same; otherwise the person
>>> should be removed from the data. For example, in the following data
>>> set, Alex was found to have two last names:
>>>
>>> Alex West
>>> Alex Joseph
>>>
>>> Alex should be removed from the data. If this happens, I want to
>>> remove all rows with Alex. Here is my data set:
>>>
>>> df <- read.table(header=TRUE, text='first week last
>>> Alex 1 West
>>> Bob 1 John
>>> Cory 1 Jack
>>> Cory 2 Jack
>>> Bob 2 John
>>> Bob 3 John
>>> Alex 2 Joseph
>>> Alex 3 West
>>> Alex 4 West ')
>>>
>>> Desired output
>>>
>>>   first week last
>>> 1   Bob    1 John
>>> 2   Bob    2 John
>>> 3   Bob    3 John
>>> 4  Cory    1 Jack
>>> 5  Cory    2 Jack