[R] Row exclude

David Carlson dcarlson using tamu.edu
Mon Jan 31 01:27:05 CET 2022


You need to add "-": `(dat3 <- dat1[-unique(c(BadName, BadAge,
BadWeight)), ])`, which negates the index (i.e., NOT these rows).
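
As a small sketch of how the sign of the index changes the result, using the data and the BadName/BadAge/BadWeight vectors from earlier in this thread: a negative index drops the flagged rows, while a positive index returns only the flagged rows. The names `bad`, `kept` and `flagged`, and the call to sort(), are additions for illustration and were not part of the original commands; sort() is there only so the flagged rows come back in their original order.

```r
dat1 <- read.table(text = "Name, Age, Weight
  Alex,  20,  13X
  Bob,  25,  142
  Carol, 24,  120
  John,  3BC,  175
  Katy,  35,  160
  Jack3, 34,  140", sep = ",", header = TRUE, stringsAsFactors = FALSE,
  strip.white = TRUE)

BadName   <- which(grepl("[[:digit:]]", dat1$Name))
BadAge    <- which(grepl("[[:alpha:]]", dat1$Age))
BadWeight <- which(grepl("[[:alpha:]]", dat1$Weight))

# sort() is an addition here: it restores the original row order
bad <- sort(unique(c(BadName, BadAge, BadWeight)))

kept    <- dat1[-bad, ]  # negative index: drop the flagged rows
flagged <- dat1[bad, ]   # positive index: keep only the flagged rows
```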

David

On Sun, Jan 30, 2022 at 11:00 AM Val <valkremk using gmail.com> wrote:

> Thank you David.
>
> What about if I want to list the excluded rows?
> I used this
>     (dat3 <- dat1[unique(c(BadName, BadAge, BadWeight)), ])
>
> It did not work. The desired output is,
>   Alex,  20,  13X
>  John,  3BC, 175
>  Jack3, 34,  140
>
> Thank you,
>
> On Sat, Jan 29, 2022 at 10:15 PM David Carlson <dcarlson using tamu.edu> wrote:
>
>> It is possible that there would be errors on the same row for different
>> columns. This does not happen in your example. If row 4 were "John6, 3BC,
>> 175X" then row 4 would be included 3 times, but we only need to remove it
>> once. Removing the duplicates is not necessary since R would not get
>> confused, but length(unique(c(BadName, BadAge, BadWeight))) indicates how
>> many lines are being removed.
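
The point about duplicates can be checked with a short sketch. It uses the thread's data with row 4 changed to the hypothetical "John6, 3BC, 175X" described above; the names `all_bad`, `with_dups`, `without_dups`, `same_result`, `n_flags` and `n_removed` are introduced only for this illustration.

```r
# Thread data, with row 4 changed to the hypothetical "John6, 3BC, 175X"
dat1 <- read.table(text = "Name, Age, Weight
  Alex,  20,  13X
  Bob,  25,  142
  Carol, 24,  120
  John6,  3BC,  175X
  Katy,  35,  160
  Jack3, 34,  140", sep = ",", header = TRUE, stringsAsFactors = FALSE,
  strip.white = TRUE)

BadName   <- which(grepl("[[:digit:]]", dat1$Name))    # rows 4 and 6
BadAge    <- which(grepl("[[:alpha:]]", dat1$Age))     # row 4
BadWeight <- which(grepl("[[:alpha:]]", dat1$Weight))  # rows 1 and 4

all_bad <- c(BadName, BadAge, BadWeight)  # row 4 appears three times

with_dups    <- dat1[-all_bad, ]          # duplicates left in the index
without_dups <- dat1[-unique(all_bad), ]  # duplicates removed first

same_result <- identical(with_dups, without_dups)  # same rows either way
n_flags     <- length(all_bad)          # number of flags raised
n_removed   <- length(unique(all_bad))  # number of rows actually removed
```

Here the two subsets match, so unique() matters only for the count: five flags were raised but three rows are removed.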
>>
>> David
>>
>> On Sat, Jan 29, 2022 at 8:32 PM Val <valkremk using gmail.com> wrote:
>>
>>> Thank you David for your help.
>>>
>>> I just have one question on this. What is the purpose of  using the
>>> "unique" function on this?
>>>   (dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ])
>>>
>>> I got the same result without using it.
>>>        (dat2 <- dat1[-(c(BadName, BadAge, BadWeight)), ])
>>>
>>> My concern is that when I apply this to a large data set, the
>>> "unique" function may consume resources (time and memory).
>>>
>>> Thank you.
>>>
>>> On Sat, Jan 29, 2022 at 12:30 AM David Carlson <dcarlson using tamu.edu>
>>> wrote:
>>>
>>>> Given that you know which columns should be numeric and which should be
>>>> character, finding characters in numeric columns or numbers in character
>>>> columns is not difficult. Your data frame consists of three character
>>>> columns so you can use regular expressions as Bert mentioned. First
>>>> you should strip the whitespace out of your data:
>>>>
>>>> dat1 <-read.table(text="Name, Age, Weight
>>>>   Alex,  20,  13X
>>>>   Bob,  25,  142
>>>>   Carol, 24,  120
>>>>   John,  3BC,  175
>>>>   Katy,  35,  160
>>>>   Jack3, 34,  140",sep=",", header=TRUE, stringsAsFactors=FALSE,
>>>> strip.white=TRUE)
>>>>
>>>> Now check to see if all of the fields are character as expected.
>>>>
>>>> sapply(dat1, typeof)
>>>> #        Name         Age      Weight
>>>> # "character" "character" "character"
>>>>
>>>> Now identify character variables containing numbers and numeric
>>>> variables containing characters:
>>>>
>>>> BadName <- which(grepl("[[:digit:]]", dat1$Name))
>>>> BadAge <- which(grepl("[[:alpha:]]", dat1$Age))
>>>> BadWeight <- which(grepl("[[:alpha:]]", dat1$Weight))
>>>>
>>>> Next remove those rows:
>>>>
>>>> (dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ])
>>>> #    Name Age Weight
>>>> #  2   Bob  25    142
>>>> #  3 Carol  24    120
>>>> #  5  Katy  35    160
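
One possible variation, sketched here as an assumption rather than as the method used above, is to skip which() and combine the three grepl() results into a single logical vector. The names `bad`, `excluded`, `clean` and `none` are hypothetical. A logical mask gives both the kept and the excluded rows, and it also behaves sensibly when nothing is flagged, whereas a negative index built from an empty which() result selects no rows at all.

```r
dat1 <- read.table(text = "Name, Age, Weight
  Alex,  20,  13X
  Bob,  25,  142
  Carol, 24,  120
  John,  3BC,  175
  Katy,  35,  160
  Jack3, 34,  140", sep = ",", header = TRUE, stringsAsFactors = FALSE,
  strip.white = TRUE)

# TRUE for any row with a digit in Name or a letter in Age or Weight
bad <- grepl("[[:digit:]]", dat1$Name) |
       grepl("[[:alpha:]]", dat1$Age)  |
       grepl("[[:alpha:]]", dat1$Weight)

dat2     <- dat1[!bad, ]  # rows to keep
excluded <- dat1[bad, ]   # rows that were flagged, in original order

# Edge case: no flagged rows in an already clean data frame
clean <- dat2
none  <- which(grepl("[[:digit:]]", clean$Name))  # integer(0)
n_negative_index <- nrow(clean[-none, ])          # -integer(0) selects nothing
n_logical_mask   <- nrow(clean[!grepl("[[:digit:]]", clean$Name), ])
```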
>>>>
>>>> You still need to convert Age and Weight to numeric, e.g. dat2$Age <-
>>>> as.numeric(dat2$Age).
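
A minimal sketch of that conversion for both columns at once. To keep the example self-contained, `dat2` is rebuilt here by hand from the output shown above, and `num_cols` is a hypothetical helper name.

```r
# dat2 as it looks after the bad rows are removed (all columns character)
dat2 <- data.frame(Name   = c("Bob", "Carol", "Katy"),
                   Age    = c("25", "24", "35"),
                   Weight = c("142", "120", "160"),
                   stringsAsFactors = FALSE)

num_cols <- c("Age", "Weight")  # columns expected to be numeric
dat2[num_cols] <- lapply(dat2[num_cols], as.numeric)

col_classes <- sapply(dat2, class)
```

Base R's type.convert(dat2, as.is = TRUE) is another option; it chooses the column types itself rather than taking a list of columns.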
>>>>
>>>> David Carlson
>>>>
>>>>
>>>> On Fri, Jan 28, 2022 at 11:59 PM Bert Gunter <bgunter.4567 using gmail.com>
>>>> wrote:
>>>>
>>>>> As character 'polluted' entries will cause a column to be read in (via
>>>>> read.table and relatives) as factor or character data, this sounds like a
>>>>> job for regular expressions. If you are not familiar with this subject,
>>>>> time to learn. And, yes, some heavy lifting will be required.
>>>>> See ?regexp for a start maybe? Or the stringr package?
>>>>>
>>>>> Cheers,
>>>>> Bert
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jan 28, 2022, 7:08 PM Val <valkremk using gmail.com> wrote:
>>>>>
>>>>> > Hi All,
>>>>> >
>>>>> > I want to remove rows that contain a character string in an integer
>>>>> > column or a digit in a character column.
>>>>> >
>>>>> > Sample data
>>>>> >
>>>>> > dat1 <-read.table(text="Name, Age, Weight
>>>>> >  Alex,  20,  13X
>>>>> >  Bob,   25,  142
>>>>> >  Carol, 24,  120
>>>>> >  John,  3BC,  175
>>>>> >  Katy,  35,  160
>>>>> >  Jack3, 34,  140",sep=",",header=TRUE,stringsAsFactors=FALSE)
>>>>> >
>>>>> > If the Age/Weight column contains any character(s) then remove that row;
>>>>> > if the Name column contains a digit then remove that row.
>>>>> > Desired output
>>>>> >
>>>>> >    Name   Age weight
>>>>> > 1   Bob     25    142
>>>>> > 2   Carol   24    120
>>>>> > 3   Katy    35    160
>>>>> >
>>>>> > Thank you,
>>>>> >
>>>>> > ______________________________________________
>>>>> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> > PLEASE do read the posting guide
>>>>> > http://www.R-project.org/posting-guide.html
>>>>> > and provide commented, minimal, self-contained, reproducible code.
>>>>> >