[R] Cleaning

Sarah Goslee sarah.goslee at gmail.com
Thu Nov 12 01:02:36 CET 2015


Hi,

On Wed, Nov 11, 2015 at 6:51 PM, Ashta <sewashm at gmail.com> wrote:
> Hi all,
>
> I have a data frame with a huge number of rows and columns.
>
> When I looked at the data, I found several garbage values that need to be
> cleaned. As a sample, here is the frequency distribution of one
> variable:
>
>     Var1 Freq
> 1    :    3
> 2    ]    6
> 3    MSN 1040
> 4    YYZ  300
> 5    \\    4
> 6    +     3
> 7    ?>   15

Please use dput() to provide your data. I made a guess at what you had
in R, but I could be wrong.
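A minimal sketch of what dput() gives you, using a small made-up data frame (the name `dat` and its values are only illustrative, loosely based on the table above). dput() prints R code that rebuilds the object exactly, and that code can be pasted straight into a message:

```r
# Hypothetical example data; substitute your own object
dat <- data.frame(Var1 = c(":", "MSN", "YYZ"),
                  Freq = c(3L, 1040L, 300L),
                  stringsAsFactors = FALSE)

# Prints a structure(...) call that recreates 'dat' when pasted into R
dput(dat)

# For a large data frame, share only the first few rows
dput(head(dat, 2))
```

Whoever receives the output can paste it into R and get the same object, including column types, which a printed table cannot guarantee.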


> and so on.
>
> I want to keep only the rows that contain a valid value of the variable,
> in this case MSN and YYZ. I tried the following:
>
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>
> but I am not getting the desired result.

What are you getting? How does it differ from the desired result?

>  I have
>
> Any help or ideas?

I get:

> dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ", "\\\\",
+ "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names = c("X",
+ "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
>
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> test
  X Var1 Freq
3 3  MSN 1040
4 4  YYZ  300

Which seems reasonable to me.
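An equivalent, more compact way to write the same subset is %in%, which scales better once the list of valid codes grows. This is a sketch using the same example data as above; the vector name `valid` is hypothetical. The trimws() line only matters if the real values carry leading or trailing spaces (e.g. "MSN "), which is an assumption: the sample shown doesn't reveal any, but stray whitespace is one common reason an == comparison silently fails to match.

```r
# Same example data as in the transcript above
dat <- data.frame(X = 1:7,
                  Var1 = c(":", "]", "MSN", "YYZ", "\\\\", "+", "?>"),
                  Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L),
                  stringsAsFactors = FALSE)

# Hypothetical vector of codes considered valid
valid <- c("MSN", "YYZ")

# %in% is TRUE for each element of Var1 that appears in 'valid'
test <- dat[dat$Var1 %in% valid, ]
test

# If values might carry stray whitespace, trim before matching
# (trimws() needs R >= 3.2.0)
test2 <- dat[trimws(dat$Var1) %in% valid, ]
```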


>
>         [[alternative HTML version deleted]]

Please don't post in HTML either: it introduces all sorts of errors to
your message.

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org



More information about the R-help mailing list