[R] Removing values containing a specific character
arun
smartpink111 at yahoo.com
Sun Jan 27 20:16:40 CET 2013
Hi,
I tried with bigger dataset.
set.seed(25)
names <- sample(c("bob", "joe", "craig@gmail.com", "emily", "jane@yahoo.com"),5e6,replace=TRUE)
set.seed(1651)
emails <- sample(c("bobj@cup.com", "joesmith@gmail.com", "craig@gmail.com",
"emily2@yahoo.com", "jane@yahoo.com"),5e6,replace=TRUE)
df <- data.frame(names, emails)
dim(df)
#[1] 5000000 2
df[]<-lapply(df,as.character)
system.time(df[,1][grep("@",df$names)]<- "" )
# user system elapsed
# 1.732 0.108 1.844
system.time(dfNew1<-df[grep("\\w+",df$names),])
# user system elapsed
# 0.896 0.024 0.923
system.time(dfNew2<- df[df$names!="",])
# user system elapsed
# 0.460 0.028 0.490
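A possible variant, sketched here but not benchmarked: both steps can share a single logical index from `grepl()` with `fixed = TRUE`, which matches "@" as a literal string rather than as a regular expression. Whether this is faster than the timings above depends on the machine and R version, so treat that as an assumption to test. The small `df` below is a hypothetical stand-in for the 5e6-row data frame.

```r
# Hypothetical small stand-in for the 5e6-row data frame above
df <- data.frame(
  names  = c("bob", "joe", "craig@gmail.com", "emily", "jane@yahoo.com"),
  emails = c("bobj@cup.com", "joesmith@gmail.com", "craig@gmail.com",
             "emily2@yahoo.com", "jane@yahoo.com"),
  stringsAsFactors = FALSE
)

# One pass over the names column: TRUE where the name contains a literal "@"
has_at <- grepl("@", df$names, fixed = TRUE)

# First result: blank out the names that contain "@"
df1 <- df
df1$names[has_at] <- ""

# Second result: keep only the rows whose name has no "@"
df2 <- df[!has_at, ]
df2
```

Computing `has_at` once avoids searching the 5 million strings twice when both results are needed.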
A.K.
________________________________
From: Yasha Podeswa <ypodeswa at gmail.com>
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>; Uwe Ligges <ligges at statistik.tu-dortmund.de>
Sent: Sunday, January 27, 2013 2:05 PM
Subject: Re: [R] Removing values containing a specific character
You two were 100% right: it was just a memory issue. This was part of a bigger project in which I had a number of data frames loaded, each with 1-5 million rows. I cleaned up my code so that fewer data frames are loaded at once, and everything is working great. Thanks for the help!
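For anyone hitting the same problem, the following is a minimal sketch of one way to see which loaded objects use the most memory and to release one that is no longer needed. The objects `big1` and `big2` are hypothetical placeholders for large data frames. Note that `gc()` only reclaims memory that R no longer references; it does not shrink objects that are still in use.

```r
# Hypothetical placeholders for several large data frames loaded at once
big1 <- data.frame(x = runif(1e6))
big2 <- data.frame(y = sample(letters, 1e6, replace = TRUE),
                   stringsAsFactors = FALSE)

# Approximate size in bytes of every object in the current environment
sizes <- sapply(ls(), function(nm) object.size(get(nm)))
sort(sizes, decreasing = TRUE)

# Drop an object that is no longer needed, then run the garbage collector
rm(big2)
invisible(gc())
exists("big2")
```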
On Jan 27, 2013 9:46 AM, "arun" <smartpink111 at yahoo.com> wrote:
Hi Yasha,
>
> I guess you got Uwe's response.
>
> I created `df2` as a copy so that both results could be taken from the original dataset.
>However, the second result can also be obtained from the modified `df`. For example, after you get the first result:
>df[,1][grep("@",df$names)]<- ""
>#you can get the second result by:
>df[df$names!="",]
> # names emails
>#1 bob bobj at cup.com
>#2 joe joesmith at gmail.com
>#4 emily emily2 at yahoo.com
>
>#or
>df[grep("\\w+",df$names),]
># names emails
>#1 bob bobj at cup.com
>#2 joe joesmith at gmail.com
>#4 emily emily2 at yahoo.com
>
>But I am not sure how this will perform on 5.5 million rows.
>A.K.
>
>
>
>
>----- Original Message -----
>From: ypodeswa <ypodeswa at gmail.com>
>To: r-help at r-project.org
>Cc:
>Sent: Sunday, January 27, 2013 1:11 AM
>Subject: Re: [R] Removing values containing a specific character
>
>Actually, it worked perfectly for my sample data, but my actual data has
>5.5 million rows, and grep doesn't seem to work with over a million rows.
>Any idea on a workaround?
>
>
>On Sat, Jan 26, 2013 at 9:37 PM, Yasha Podeswa <ypodeswa at gmail.com> wrote:
>
>> Awesome, thanks Arun, that's exactly what I was looking for!
>>
>>
>> On Sat, Jan 26, 2013 at 9:21 PM, arun kirshna [via R] <ml-node+s789695n4656749h63 at n4.nabble.com> wrote:
>>
>>> Hi,
>>> Try this:
>>> df[]<-lapply(df,as.character)
>>> df2<-df
>>> df[,1][grep("@",df$names)]<- ""
>>> df
>>> #names emails
>>> #1 bob bobj at cup.com
>>> #2 joe joesmith at gmail.com
>>> #3 craig at gmail.com
>>> #4 emily emily2 at yahoo.com
>>> #5 jane at yahoo.com
>>>
>>> #2nd part:
>>>
>>> df2[-grep("@",df2$names),]
>>> #names emails
>>> #1 bob bobj at cup.com
>>> #2 joe joesmith at gmail.com
>>> #4 emily emily2 at yahoo.com
>>> A.K.
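One caveat with the `df2[-grep("@",df2$names),]` idiom: if no name contains "@", `grep()` returns `integer(0)`, and indexing with `-integer(0)` selects zero rows rather than all of them. Indexing with the logical vector from `grepl()` avoids that edge case. A minimal sketch, using a hypothetical data frame in which no name contains "@":

```r
# Hypothetical data in which no name contains "@"
df2 <- data.frame(names  = c("bob", "joe"),
                  emails = c("bobj@cup.com", "joesmith@gmail.com"),
                  stringsAsFactors = FALSE)

# Negative integer indexing: grep() finds nothing, so -integer(0) keeps no rows
bad <- df2[-grep("@", df2$names), ]
nrow(bad)

# Logical indexing keeps every row whose name does not contain "@"
good <- df2[!grepl("@", df2$names), ]
nrow(good)
```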