[R] Cleaning
Sarah Goslee
sarah.goslee at gmail.com
Thu Nov 12 01:02:36 CET 2015
Hi,
On Wed, Nov 11, 2015 at 6:51 PM, Ashta <sewashm at gmail.com> wrote:
> Hi all,
>
> I have a data frame with a large number of rows and columns.
>
> When I looked at the data, it has several garbage values that need to be
> cleaned. As a sample, I am showing you the frequency distribution
> of one variable:
>
>   Var1 Freq
> 1    :    3
> 2    ]    6
> 3  MSN 1040
> 4  YYZ  300
> 5   \\    4
> 6    +    3
> 7   ?>   15
Please use dput() to provide your data. I made a guess at what you had
in R, but could be wrong.
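For anyone unsure what that looks like, here is a minimal sketch. The small data frame is a hypothetical stand-in for the real data (I don't have the original), and head() is only there to keep the posted output short:

```r
# Hypothetical stand-in for the poster's data; the real object may differ
dat <- data.frame(Var1 = c(":", "MSN", "YYZ"),
                  Freq = c(3L, 1040L, 300L),
                  stringsAsFactors = FALSE)

# dput() prints R code that recreates the object, including column types.
# Paste that output into the email instead of a printed table.
dput(head(dat, 20))

# Round trip: parsing the dput() text back gives the same object
txt  <- paste(capture.output(dput(dat)), collapse = "\n")
dat2 <- eval(parse(text = txt))
same <- identical(dat, dat2)
```

The advantage over a printed table is that readers can see whether a column is character or factor, and whether values carry stray whitespace.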
> and continues.
>
> I want to keep only those rows that contain a valid value,
> in this case MSN and YYZ. I tried the following:
>
> *test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]*
>
> but I am not getting the desired result.
What are you getting? How does it differ from the desired result?
> I have
>
> Any help or idea?
I get:
> dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ", "\\\\",
+ "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names = c("X",
+ "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
>
> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
> test
  X Var1 Freq
3 3  MSN 1040
4 4  YYZ  300
Which seems reasonable to me.
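If the real data differ from my guess, two things that commonly break this kind of comparison are NA values and stray whitespace. This is only a sketch under those assumptions: the `messy` data frame and the `valid` vector are made up for illustration, and I don't know whether either problem applies to the actual data.

```r
# The guessed data from this message
dat <- data.frame(X = 1:7,
                  Var1 = c(":", "]", "MSN", "YYZ", "\\\\", "+", "?>"),
                  Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L),
                  stringsAsFactors = FALSE)

valid <- c("MSN", "YYZ")   # assumed list of acceptable codes

# %in% scales better than chaining == with | when there are many valid codes
test <- dat[dat$Var1 %in% valid, ]

# Hypothetical messy column: padded values and an NA
messy <- data.frame(Var1 = c("MSN ", NA, " YYZ", "+"),
                    stringsAsFactors = FALSE)

# With ==, the NA propagates into the index and yields an all-NA row,
# and the padded values fail to match at all
by_eq <- messy[messy$Var1 == "MSN" | messy$Var1 == "YYZ", , drop = FALSE]

# %in% never returns NA, and trimws() strips leading/trailing whitespace
clean <- messy[trimws(messy$Var1) %in% valid, , drop = FALSE]
```

If Var1 is a factor, the unused levels remain after subsetting; droplevels(test) removes them.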
>
> [[alternative HTML version deleted]]
Please don't post in HTML either: it introduces all sorts of errors to
your message.
Sarah
--
Sarah Goslee
http://www.functionaldiversity.org