[R] Row exclude

Avi Gross @v|gro@@ @end|ng |rom ver|zon@net
Sat Jan 29 04:25:09 CET 2022


You may need a few more steps than that, Val.

I commend you for stating your need clearly and showing a reasonable set of test data and spelling out the expected result.


If your data is polluted the way you describe, then read.table() will likely treat those columns as character rather than numeric. In your example you want to recognize "13X" as containing an X. Similarly, "3BC" contains a B. Those two columns can be handled by the same technique and converted to numeric afterward. You also seem to want any numeral in the first column to disqualify the row.
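To see the point about column types for yourself, here is a minimal sketch in base R using the sample data from the original post. The strip.white = TRUE argument is my addition, not part of the original call; it trims the blanks that follow each comma. The name col_classes is only for illustration.

```r
# Sample data from the original post. strip.white = TRUE is an added
# assumption that removes the blanks around each comma-separated field.
dat1 <- read.table(text = "Name, Age, Weight
 Alex,  20,  13X
 Bob,  25,  142
 Carol, 24,  120
 John,  3BC,  175
 Katy,  35,  160
 Jack3, 34,  140", sep = ",", header = TRUE,
  stringsAsFactors = FALSE, strip.white = TRUE)

# One stray letter is enough to make a whole column character,
# so Age and Weight are not read as integers here.
col_classes <- sapply(dat1, class)
print(col_classes)
```

Because a single "13X" or "3BC" forces the whole column to character, the filtering has to happen on text first and the conversion to numeric afterward.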

So consider what techniques you have learned and thus what you are allowed to do for such an assignment. Unless we know otherwise, we may assume this is homework of some sort.

We, reading this, have no idea what parts of base R you can use, and I hope nobody jumps in offering tidyverse packages.

So ask yourself how to use one of the dozens of ways to make a copy of your data that includes only the rows where column 1 contains no digits from 0 to 9. You can, for example, count the characters of some kind and compare that count to the length of the item. You might use regular expressions. Whatever you do should remove the sixth row in your example and nothing else.
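As one possible sketch of that first step, assuming base R only, the code below flags digits in column 1 with a regular expression and then cross-checks the result with the count-and-compare idea. The names has_digit, no_digits, same_length and step1 are made up for illustration, and strip.white = TRUE is my addition to the original read.table() call.

```r
# Sample data from the original post, with blanks trimmed (added assumption).
dat1 <- read.table(text = "Name, Age, Weight
 Alex,  20,  13X
 Bob,  25,  142
 Carol, 24,  120
 John,  3BC,  175
 Katy,  35,  160
 Jack3, 34,  140", sep = ",", header = TRUE,
  stringsAsFactors = FALSE, strip.white = TRUE)

# Technique A: a regular expression that matches any single digit.
has_digit <- grepl("[0-9]", dat1$Name)

# Technique B: delete the digits, then compare lengths before and after.
no_digits <- gsub("[0-9]", "", dat1$Name)
same_length <- nchar(no_digits) == nchar(dat1$Name)

# Keep only the rows whose Name has no digit.
step1 <- dat1[!has_digit, ]
print(step1)
```

Either technique alone is enough; both are shown only to illustrate that they pick out the same rows.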

Can you now take the result and shorten it by removing any row whose column 2 contains one or more letters, using some new technique? An example might be to try converting the value to an integer and back to character and seeing whether the two match. Again, there are lots of possibilities, but you need only one that works.
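The integer round trip could look something like the sketch below. It assumes the values carry no surrounding blanks (hence the added strip.white = TRUE) and no leading zeros, since "020" would not survive the round trip. The names step1, as_int, round_trip_ok and step2 are hypothetical.

```r
# Sample data from the original post, with blanks trimmed (added assumption).
dat1 <- read.table(text = "Name, Age, Weight
 Alex,  20,  13X
 Bob,  25,  142
 Carol, 24,  120
 John,  3BC,  175
 Katy,  35,  160
 Jack3, 34,  140", sep = ",", header = TRUE,
  stringsAsFactors = FALSE, strip.white = TRUE)

# Result of the earlier step: rows whose Name contains no digit.
step1 <- dat1[!grepl("[0-9]", dat1$Name), ]

# Convert to integer; entries such as "3BC" become NA (warning silenced).
as_int <- suppressWarnings(as.integer(step1$Age))

# A value passes only if it converts and converts back to the same text.
round_trip_ok <- !is.na(as_int) & as.character(as_int) == step1$Age

step2 <- step1[round_trip_ok, ]
print(step2)
```

The same test applied to the Weight column of step2 would then take care of column 3.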

Can you take that shorter version and repeat pretty much the same filter on column 3?

That should work and, if you are ambitious, you can even find a way to create a compound filter that handles all three columns at once.
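For what it is worth, a compound filter might be sketched as follows, again in base R and again only as one of many possibilities. It swaps the round-trip test for an anchored regular expression that accepts digits only. The names keep and clean are hypothetical, and strip.white = TRUE is my addition to the original call.

```r
# Sample data from the original post, with blanks trimmed (added assumption).
dat1 <- read.table(text = "Name, Age, Weight
 Alex,  20,  13X
 Bob,  25,  142
 Carol, 24,  120
 John,  3BC,  175
 Katy,  35,  160
 Jack3, 34,  140", sep = ",", header = TRUE,
  stringsAsFactors = FALSE, strip.white = TRUE)

# One logical vector combining all three conditions:
# no digit in Name, and nothing but digits in Age and Weight.
keep <- !grepl("[0-9]", dat1$Name) &
  grepl("^[0-9]+$", dat1$Age) &
  grepl("^[0-9]+$", dat1$Weight)

clean <- dat1[keep, ]

# The surviving columns can now safely be made integer.
clean$Age <- as.integer(clean$Age)
clean$Weight <- as.integer(clean$Weight)

# Renumber the rows 1, 2, 3 to match the desired output.
rownames(clean) <- NULL
print(clean)
```

Building the logical vector first and subsetting once keeps each condition readable and easy to test on its own.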



-----Original Message-----
From: Val <valkremk using gmail.com>
To: r-help using R-project.org (r-help using r-project.org) <r-help using r-project.org>
Sent: Fri, Jan 28, 2022 10:08 pm
Subject: [R] Row exclude

Hi All,

I want to remove rows that contain a character string in an integer
column or a digit in a character column.

Sample data

dat1 <- read.table(text="Name, Age, Weight
 Alex,  20,  13X
 Bob,  25,  142
 Carol, 24,  120
 John,  3BC,  175
 Katy,  35,  160
 Jack3, 34,  140",sep=",",header=TRUE,stringsAsFactors=FALSE)

If the Age/Weight column contains any non-digit character(s), then remove that row.
If the Name column contains a digit, then remove that row.

Desired output

   Name Age Weight
1   Bob  25    142
2 Carol  24    120
3  Katy  35    160

Thank you,
