[R] Row exclude

David Carlson dcarlson using tamu.edu
Sat Jan 29 07:30:42 CET 2022


Given that you know which columns should be numeric and which should be
character, finding letters in numeric columns or digits in character
columns is not difficult. Your data frame consists of three character
columns, so you can use regular expressions as Bert mentioned. First,
strip the leading and trailing whitespace from your data when you read it:

dat1 <-read.table(text="Name, Age, Weight
  Alex,  20,  13X
  Bob,  25,  142
  Carol, 24,  120
  John,  3BC,  175
  Katy,  35,  160
  Jack3, 34,  140",sep=",", header=TRUE, stringsAsFactors=FALSE,
strip.white=TRUE)
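If the data have already been read without strip.white=TRUE, as in the original question, the blanks can also be removed afterwards. A minimal sketch using base R's trimws() on every column (the object name raw is only illustrative):

```r
# Read the sample data as in the question, i.e. without strip.white
raw <- read.table(text = "Name, Age, Weight
  Alex,  20,  13X
  Bob,  25,  142
  Carol, 24,  120
  John,  3BC,  175
  Katy,  35,  160
  Jack3, 34,  140", sep = ",", header = TRUE, stringsAsFactors = FALSE)

# The fields still carry their leading blanks at this point, e.g. "  Alex"
raw$Name[1]

# Strip leading and trailing whitespace from every column;
# assigning to raw[] (not raw) keeps the result a data frame
raw[] <- lapply(raw, trimws)
sapply(raw, typeof)
```

Note that trimws() always returns character, so this is only appropriate here because every column is already character.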

Now check that all of the columns are character, as expected.

sapply(dat1, typeof)
#        Name         Age      Weight
# "character" "character" "character"
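Since every column came in as character, a different technique (not the regex approach used in this reply) is to attempt the conversion and look for NA values, because as.numeric() returns NA, with a warning, for a string such as "13X". A minimal sketch, assuming Age and Weight contain no genuinely missing values; the helper not_numeric() is hypothetical:

```r
dat1 <- read.table(text = "Name, Age, Weight
  Alex,  20,  13X
  Bob,  25,  142
  Carol, 24,  120
  John,  3BC,  175
  Katy,  35,  160
  Jack3, 34,  140", sep = ",", header = TRUE, stringsAsFactors = FALSE,
  strip.white = TRUE)

# Hypothetical helper: TRUE where a field cannot be parsed as a number.
# suppressWarnings() hides the "NAs introduced by coercion" warning.
not_numeric <- function(x) is.na(suppressWarnings(as.numeric(x)))

# Rows whose Age or Weight is not a valid number
bad_num <- not_numeric(dat1$Age) | not_numeric(dat1$Weight)
which(bad_num)
```

The two tests are not equivalent: as.numeric() accepts strings such as "1e3" or "Inf", which a [[:alpha:]] test would reject, but it rejects strings such as "2#5" that contain no letters at all.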

Now identify character variables containing digits and numeric variables
containing letters:

BadName <- which(grepl("[[:digit:]]", dat1$Name))
BadAge <- which(grepl("[[:alpha:]]", dat1$Age))
BadWeight <- which(grepl("[[:alpha:]]", dat1$Weight))
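The [[:alpha:]] patterns only look for letters, so a value such as "2#5" or "14.2.1" would still pass as numeric. If Age and Weight should be whole numbers and Name should contain letters only (an assumption; names with spaces or hyphens would be rejected), a stricter sketch anchors each pattern so the entire field must match:

```r
dat1 <- read.table(text = "Name, Age, Weight
  Alex,  20,  13X
  Bob,  25,  142
  Carol, 24,  120
  John,  3BC,  175
  Katy,  35,  160
  Jack3, 34,  140", sep = ",", header = TRUE, stringsAsFactors = FALSE,
  strip.white = TRUE)

# ^ and $ anchor each pattern, so the whole field must match, not just part of it
ok_name   <- grepl("^[[:alpha:]]+$", dat1$Name)    # letters only
ok_age    <- grepl("^[[:digit:]]+$", dat1$Age)     # digits only
ok_weight <- grepl("^[[:digit:]]+$", dat1$Weight)  # digits only

# A row is kept only if every field passes its test
keep <- ok_name & ok_age & ok_weight
which(!keep)
dat1[keep, ]
```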

Next remove those rows:

(dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ])
#    Name Age Weight
# 2   Bob  25    142
# 3 Carol  24    120
# 5  Katy  35    160
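One caveat about negative indexing: if no row is bad, c(BadName, BadAge, BadWeight) is integer(0), and dat1[-integer(0), ] returns zero rows instead of all of them. A sketch that avoids this by combining the tests into a logical vector and indexing with its negation (the name bad is illustrative):

```r
dat1 <- read.table(text = "Name, Age, Weight
  Alex,  20,  13X
  Bob,  25,  142
  Carol, 24,  120
  John,  3BC,  175
  Katy,  35,  160
  Jack3, 34,  140", sep = ",", header = TRUE, stringsAsFactors = FALSE,
  strip.white = TRUE)

# TRUE for any row with a digit in Name or a letter in Age or Weight
bad <- grepl("[[:digit:]]", dat1$Name) |
  grepl("[[:alpha:]]", dat1$Age) |
  grepl("[[:alpha:]]", dat1$Weight)

# Indexing with !bad still works when bad is all FALSE
dat2 <- dat1[!bad, ]
dat2

# The pitfall: an empty negative index selects no rows at all
nrow(dat1[-integer(0), ])
```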

You still need to convert Age and Weight to numeric, e.g.,
dat2$Age <- as.numeric(dat2$Age).
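A minimal sketch that performs both conversions at once and also resets the row names, so the result is numbered 1 to 3 like the desired output in the question rather than keeping the original row numbers 2, 3 and 5:

```r
dat1 <- read.table(text = "Name, Age, Weight
  Alex,  20,  13X
  Bob,  25,  142
  Carol, 24,  120
  John,  3BC,  175
  Katy,  35,  160
  Jack3, 34,  140", sep = ",", header = TRUE, stringsAsFactors = FALSE,
  strip.white = TRUE)

# Same row filter as above
BadName   <- which(grepl("[[:digit:]]", dat1$Name))
BadAge    <- which(grepl("[[:alpha:]]", dat1$Age))
BadWeight <- which(grepl("[[:alpha:]]", dat1$Weight))
dat2 <- dat1[-unique(c(BadName, BadAge, BadWeight)), ]

# Convert the remaining Age and Weight strings to numbers in one step
dat2[c("Age", "Weight")] <- lapply(dat2[c("Age", "Weight")], as.numeric)

# Discard the original row names (2, 3, 5) so rows are numbered 1, 2, 3
rownames(dat2) <- NULL
dat2
```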

David Carlson


On Fri, Jan 28, 2022 at 11:59 PM Bert Gunter <bgunter.4567 using gmail.com> wrote:

> As character 'polluted' entries will cause a column to be read in (via
> read.table and relatives) as factor or character data, this sounds like a
> job for regular expressions. If you are not familiar with this subject,
> time to learn. And, yes, some heavy lifting will be required.
> See ?regexp for a start maybe? Or the stringr package?
>
> Cheers,
> Bert
>
>
>
>
> On Fri, Jan 28, 2022, 7:08 PM Val <valkremk using gmail.com> wrote:
>
> > Hi All,
> >
> > I want to remove rows that contain a character string in an integer
> > column or a digit in a character column.
> >
> > Sample data
> >
> > dat1 <-read.table(text="Name, Age, Weight
> >  Alex,  20,  13X
> >  Bob,   25,  142
> >  Carol, 24,  120
> >  John,  3BC,  175
> >  Katy,  35,  160
> >  Jack3, 34,  140",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >
> > If the Age or Weight column contains any non-numeric character(s), remove that row;
> > if the Name column contains a digit, remove that row.
> > Desired output
> >
> >    Name Age Weight
> > 1   Bob  25    142
> > 2 Carol  24    120
> > 3  Katy  35    160
> >
> > Thank you,
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
