[R] Row exclude

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Sat Jan 29 08:26:08 CET 2022


Looks like David decided to supply a detailed answer.
I came up with a rather different solution: it calculates which row indices to keep using three grep statements, then intersects those results to use as an index into the original data.
To each their own.
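My code is not reproduced here, so what follows is only a minimal sketch of that "keep-index" idea, not the exact code I used. It assumes grep() with invert = TRUE (which returns the indices of elements that do NOT match) to find the rows that pass each check, and Reduce() to intersect the three index vectors.

```r
# Sketch of the keep-index approach (illustrative, not the original code).
dat1 <- read.table(text = "Name, Age, Weight
  Alex,  20,  13X
  Bob,  25,  142
  Carol, 24,  120
  John,  3BC,  175
  Katy,  35,  160
  Jack3, 34,  140", sep = ",", header = TRUE,
  stringsAsFactors = FALSE, strip.white = TRUE)

# invert = TRUE returns indices of elements that do NOT match the pattern
okName   <- grep("[[:digit:]]", dat1$Name,   invert = TRUE)  # no digits in Name
okAge    <- grep("[[:alpha:]]", dat1$Age,    invert = TRUE)  # no letters in Age
okWeight <- grep("[[:alpha:]]", dat1$Weight, invert = TRUE)  # no letters in Weight

# Keep only the rows that pass all three checks
keep <- Reduce(intersect, list(okName, okAge, okWeight))
dat2 <- dat1[keep, ]
dat2
```

Working from "rows to keep" rather than "rows to drop" also means an empty result simply yields an empty data frame, instead of depending on negative indexing.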


-----Original Message-----
From: David Carlson via R-help <r-help using r-project.org>
To: Bert Gunter <bgunter.4567 using gmail.com>
Cc: r-help using R-project.org (r-help using r-project.org) <r-help using r-project.org>
Sent: Sat, Jan 29, 2022 1:30 am
Subject: Re: [R] Row exclude

Given that you know which columns should be numeric and which should be
character, finding letters in numeric columns or digits in character
columns is not difficult. Because of the polluted entries, all three columns
of your data frame are read in as character, so you can use regular
expressions, as Bert mentioned. First, strip the whitespace out of your data:

dat1 <- read.table(text = "Name, Age, Weight
  Alex,  20,  13X
  Bob,  25,  142
  Carol, 24,  120
  John,  3BC,  175
  Katy,  35,  160
  Jack3, 34,  140", sep = ",", header = TRUE, stringsAsFactors = FALSE,
  strip.white = TRUE)

Now check to see if all of the fields are character as expected.

sapply(dat1, typeof)
#        Name        Age      Weight
# "character" "character" "character"

Now identify character variables containing numbers and numeric variables
containing characters:

BadName <- which(grepl("[[:digit:]]", dat1$Name))
BadAge <- which(grepl("[[:alpha:]]", dat1$Age))
BadWeight <- which(grepl("[[:alpha:]]", dat1$Weight))
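For the numeric columns, a different test from the regular expressions above is to attempt the conversion and flag anything that comes back NA. This is a swapped-in technique, not part of the answer above, and the helper name notNumeric is hypothetical. The two tests disagree on some inputs: "1e3" contains a letter but is a valid number, while "12.5.1" contains no letters but is not a number.

```r
# notNumeric is a hypothetical helper: TRUE where a string cannot be
# parsed as a number. suppressWarnings() hides the
# "NAs introduced by coercion" warning from as.numeric().
notNumeric <- function(x) is.na(suppressWarnings(as.numeric(x)))

age    <- c("20", "25", "24", "3BC", "35", "34")
weight <- c("13X", "142", "120", "175", "160", "140")

which(notNumeric(age))     # rows with a bad Age
which(notNumeric(weight))  # rows with a bad Weight

# Inputs where the conversion test and the [[:alpha:]] test disagree
notNumeric(c("1e3", "12.5.1"))            # FALSE TRUE
grepl("[[:alpha:]]", c("1e3", "12.5.1"))  # TRUE FALSE
```

Note that this test also flags empty strings and genuine NA values, which may or may not be what you want.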

Next remove those rows:

(dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ])
#    Name Age Weight
# 2   Bob  25    142
# 3 Carol  24    120
# 5  Katy  35    160
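One caveat with indexing by dat1[-unique(c(BadName, BadAge, BadWeight)), ]: if no row is bad, the index is -integer(0), which selects zero rows rather than all of them. A minimal sketch that avoids this combines the grepl() results into a single logical vector with | and indexes with its negation; the names bad, clean and noneBad are illustrative only.

```r
dat1 <- read.table(text = "Name, Age, Weight
  Alex,  20,  13X
  Bob,  25,  142
  Carol, 24,  120
  John,  3BC,  175
  Katy,  35,  160
  Jack3, 34,  140", sep = ",", header = TRUE,
  stringsAsFactors = FALSE, strip.white = TRUE)

# TRUE for every row that fails at least one check
bad <- grepl("[[:digit:]]", dat1$Name) |
       grepl("[[:alpha:]]", dat1$Age) |
       grepl("[[:alpha:]]", dat1$Weight)

dat2 <- dat1[!bad, ]  # logical indexing: keeps all rows when none are bad

# The pitfall: on already-clean data the negative index is empty ...
clean   <- dat2
noneBad <- which(grepl("[[:digit:]]", clean$Name))  # integer(0)
nrow(clean[-noneBad, ])  # 0: every row is dropped
# ... while the logical version keeps everything
nrow(clean[!grepl("[[:digit:]]", clean$Name), ])  # 3
```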

You still need to convert Age and Weight to numeric, e.g. dat2$Age <-
as.numeric(dat2$Age).
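A sketch of that last step, converting both columns at once with lapply() and, to match the 1, 2, 3 row numbering in the desired output, resetting the row names. It assumes every remaining Age and Weight value parses cleanly, which the checks above are meant to ensure; dat2 is rebuilt by hand here only so the snippet runs on its own.

```r
# dat2 as left by the row removal: character columns, row names 2, 3, 5
dat2 <- data.frame(Name   = c("Bob", "Carol", "Katy"),
                   Age    = c("25", "24", "35"),
                   Weight = c("142", "120", "160"),
                   row.names = c(2L, 3L, 5L),
                   stringsAsFactors = FALSE)

numCols <- c("Age", "Weight")
dat2[numCols] <- lapply(dat2[numCols], as.numeric)  # character -> numeric

rownames(dat2) <- NULL  # renumber the rows 1, 2, 3
dat2
sapply(dat2, typeof)
```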

David Carlson


On Fri, Jan 28, 2022 at 11:59 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:

> As character 'polluted' entries will cause a column to be read in (via
> read.table and relatives) as factor or character data, this sounds like a
> job for regular expressions. If you are not familiar with this subject,
> time to learn. And, yes, some heavy lifting will be required.
> See ?regexp for a start maybe? Or the stringr package?
>
> Cheers,
> Bert
>
> On Fri, Jan 28, 2022, 7:08 PM Val <valkremk using gmail.com> wrote:
>
> > Hi All,
> >
> > I want to remove rows that contain a character string in an integer
> > column or a digit in a character column.
> >
> > Sample data
> >
> > dat1 <-read.table(text="Name, Age, Weight
> >  Alex,  20,  13X
> >  Bob,  25,  142
> >  Carol, 24,  120
> >  John,  3BC,  175
> >  Katy,  35,  160
> >  Jack3, 34,  140",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >
> > If the Age or Weight column contains any letter(s), remove that row;
> > if the Name column contains a digit, remove that row.
> > Desired output
> >
> >    Name Age Weight
> > 1   Bob  25    142
> > 2 Carol  24    120
> > 3  Katy  35    160
> >
> > Thank you,
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >