[R] Cleaning
Boris Steipe
boris.steipe at utoronto.ca
Thu Nov 12 05:33:44 CET 2015
If what you posted here is what you typed, your syntax is wrong.
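Assuming the snippet quoted below is what was actually run, two things stand out. First, `test < which(...)` is a comparison, not an assignment (`<-`). Second, `x != "BAA" | x != "FAG"` is TRUE for every non-missing value, because no value can equal both strings at once, so `dat[-test, ]` would drop every row. Here is a minimal sketch of what I guess was intended. The toy data frame is made up, and I am assuming the column is called `Var1` (your snippet has `var1`) and that BAA and FAG are the values to drop:

```r
# Hypothetical toy data; the real 'dat' was never posted.
dat <- data.frame(Var1 = c("BAA", "FAG", "MSN", "YYZ", "BAA"),
                  Freq = c(2L, 5L, 1040L, 300L, 1L),
                  stringsAsFactors = FALSE)

# A value cannot equal both "BAA" and "FAG", so the OR of the two
# inequalities is TRUE for every row.
always_true <- dat$Var1 != "BAA" | dat$Var1 != "FAG"

# Assumed intent: drop the rows whose Var1 is one of the listed values.
drop_values <- c("BAA", "FAG")
cleaned <- dat[!(dat$Var1 %in% drop_values), ]
```

Indexing with the logical vector directly is also safer than `dat[-which(...), ]`: if nothing matches, `which()` returns `integer(0)` and the negative index then selects no rows at all.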
I strongly advise you to consult the two links here:
http://adv-r.had.co.nz/Reproducibility.html
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
... and please read the posting guide and don't post in HTML.
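As a rough illustration of what those links ask for, a self-contained example could look something like the sketch below. The data frame reuses the values Sarah guessed at further down the thread, so it is an assumption about your data rather than the real thing; `dput()` prints a text representation that others can paste straight into their own session.

```r
# Small stand-in for the real data (values taken from Sarah's guess below).
dat <- data.frame(X = 1:7,
                  Var1 = c(":", "]", "MSN", "YYZ", "\\\\", "+", "?>"),
                  Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L),
                  stringsAsFactors = FALSE)

# Paste the output of this call into the message instead of a printed table.
dput(dat)

# The code being asked about, runnable against the data above.
myvalues <- c("YYZ", "MSN")
test <- subset(dat, Var1 %in% myvalues)
```

For a large data frame, something like `dput(head(dat, 20))` keeps the post short while still showing the structure.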
B.
On Nov 11, 2015, at 10:03 PM, Ashta <sewashm at gmail.com> wrote:
> Sarah,
>
> Thank you very much. For the other variables,
> I was trying to do the same job in a different way because it is easier to
> list them
>
> Example
>
> test < which(dat$var1 !="BAA" | dat$var1 !="FAG" )
> {
> dat <- dat[-test,]} and I did not get the right result. What am I
> missing here?
>
>
>
>
>
> On Wed, Nov 11, 2015 at 7:54 PM, Sarah Goslee <sarah.goslee at gmail.com>
> wrote:
>
>> On Wed, Nov 11, 2015 at 8:44 PM, Ashta <sewashm at gmail.com> wrote:
>>> Hi Sarah,
>>>
>>> I used the following to clean my data, but the program crashed several times.
>>>
>>> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>>>
>>> What is the difference between these two
>>>
>>> test <- dat[dat$Var1 %in% "YYZ" | dat$Var1 %in% "MSN" ,]
>>
>> Besides that you're using %in% wrong? I told you how to proceed.
>>
>> myvalues <- c("YYZ", "MSN")
>>
>> test <- subset(dat, Var1 %in% myvalues)
>>
>>
>> > subset(dat, Var1 %in% myvalues)
>>   X Var1 Freq
>> 3 3  MSN 1040
>> 4 4  YYZ  300
>>
>>>
>>>
>>>
>>>
>>> On Wed, Nov 11, 2015 at 6:38 PM, Sarah Goslee <sarah.goslee at gmail.com>
>>> wrote:
>>>>
>>>> Please keep replies on the list so others may participate in the
>>>> conversation.
>>>>
>>>> If you have a character vector containing the potential values, you
>>>> might look at %in% for one approach to subsetting your data.
>>>>
>>>> Var1 %in% myvalues
>>>>
>>>> Sarah
>>>>
>>>> On Wed, Nov 11, 2015 at 7:10 PM, Ashta <sewashm at gmail.com> wrote:
>>>>> Thank you Sarah for your prompt response!
>>>>>
>>>>> I have the list of valid values of the variable Var1; there are around 20.
>>>>> How can I modify this one to include all 20 valid values?
>>>>>
>>>>> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>>>>>
>>>>> Is there an efficient way of doing it?
>>>>>
>>>>> Thank you again
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 11, 2015 at 6:02 PM, Sarah Goslee <sarah.goslee at gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On Wed, Nov 11, 2015 at 6:51 PM, Ashta <sewashm at gmail.com> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a data frame with a huge number of rows and columns.
>>>>>>>
>>>>>>> When I looked at the data, it has several garbage values that need to
>>>>>>> be cleaned. As a sample, I am showing you the frequency distribution
>>>>>>> of one variable:
>>>>>>>
>>>>>>>   Var1 Freq
>>>>>>> 1    :    3
>>>>>>> 2    ]    6
>>>>>>> 3  MSN 1040
>>>>>>> 4  YYZ  300
>>>>>>> 5   \\    4
>>>>>>> 6    +    3
>>>>>>> 7   ?>   15
>>>>>>
>>>>>> Please use dput() to provide your data. I made a guess at what you had
>>>>>> in R, but could be wrong.
>>>>>>
>>>>>>
>>>>>>> and continues.
>>>>>>>
>>>>>>> I want to keep those rows that contain only a valid variable value
>>>>>>>
>>>>>>> In this case MSN and YYZ. I tried the following
>>>>>>>
>>>>>>> test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>>>>>>>
>>>>>>> but I am not getting the desired result.
>>>>>>
>>>>>> What are you getting? How does it differ from the desired result?
>>>>>>
>>>>>>> I have
>>>>>>>
>>>>>>> Any help or idea?
>>>>>>
>>>>>> I get:
>>>>>>
>>>>>> > dat <- structure(list(X = 1:7, Var1 = c(":", "]", "MSN", "YYZ", "\\\\",
>>>>>> + "+", "?>"), Freq = c(3L, 6L, 1040L, 300L, 4L, 3L, 15L)), .Names = c("X",
>>>>>> + "Var1", "Freq"), class = "data.frame", row.names = c(NA, -7L))
>>>>>> >
>>>>>> > test <- dat[dat$Var1 == "YYZ" | dat$Var1 =="MSN" ,]
>>>>>> > test
>>>>>>   X Var1 Freq
>>>>>> 3 3  MSN 1040
>>>>>> 4 4  YYZ  300
>>>>>>
>>>>>> Which seems reasonable to me.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> [[alternative HTML version deleted]]
>>>>>>
>>>>>> Please don't post in HTML either: it introduces all sorts of errors to
>>>>>> your message.
>>>>>>
>>>>>> Sarah
>>>>>>
>>>
>>>
>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.