[R] Data Frame Manipulation using function
David Winsemius
dwinsemius at comcast.net
Fri Jul 9 05:06:06 CEST 2010
On Jul 8, 2010, at 10:33 PM, Erik Iverson wrote:
>
>> I have a data frame:
>> id url urlType
>> 1 1 www.yahoo.com <http://www.yahoo.com> 1
>> 2 2 www.google.com/?search= <http://www.google.com/?search=> 2
>> 3 3 www.google.com <http://www.google.com> 1
>> 4 4 www.yahoo.com/?query= <http://www.yahoo.com/?query=> 2
>> 5 5 www.gmail.com <http://www.gmail.com> 1
>
> This is not output from ?dput, which means more work to read it in.
>
Yeah, it was kind of a pain, but ...
dta <- read.table(textConnection(' id url urlType
1 1 "www.yahoo.com <http://www.yahoo.com>" 1
2 2 "www.google.com/?search= <http://www.google.com/?search=>" 2
3 3 "www.google.com <http://www.google.com>" 1
4 4 "www.yahoo.com/?query= <http://www.yahoo.com/?query=>" 2
5 5 "www.gmail.com <http://www.gmail.com>" 1') )
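On the dput point, here is a minimal sketch of what a paste-ready version of the data could look like. The data frame below is a simplified reconstruction (plain URLs, without the mail-client link residue) and is an assumption, not the poster's actual object.

```r
# Simplified stand-in for the poster's data (assumed, not the original object)
dta <- data.frame(
  id = 1:5,
  url = c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
          "www.yahoo.com/?query=", "www.gmail.com"),
  urlType = c(1, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

# dput() prints an R expression that rebuilds the object;
# that printed text is what would go into the email.
dput(dta)

# Round trip: capture the dput() text and evaluate it, as a reader would
dput_text <- paste(capture.output(dput(dta)), collapse = "\n")
dta2 <- eval(parse(text = dput_text))
identical(dta, dta2)
```

A reader can then paste the dput output straight into an assignment without guessing at column types or quoting.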
>
>> Here is the definition for WHITELIST:-
>> WHITELIST = "[?]query=, [?]search="
>> WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
>
> What is the 'trim' function? I do not have that defined.
>
> Perhaps David's answer will work for you...
Seems to ... after I fixed my incorrect cmd-V paste of the function
name and guessed that trim was the one in gdata:
> require(gdata)
> checkBaseLine <- function(s){
+ for (listItem in WHITELIST){
+ if(regexpr(as.character(listItem), s)[1] > -1){
+ return(TRUE)
+ }
+ }
+ return(FALSE)
+ }
>
> #Here is the definition for WHITELIST:-
>
> WHITELIST = "[?]query=, [?]search="
> WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
> vcheck <- Vectorize(checkBaseLine)
>
> dta[ dta$urlType != 1 & vcheck(dta$url) , "url" ]
[1] www.google.com/?search= <http://www.google.com/?search=> www.yahoo.com/?query= <http://www.yahoo.com/?query=>
5 Levels: www.gmail.com <http://www.gmail.com> www.google.com <http://www.google.com> ... www.yahoo.com/?query= <http://www.yahoo.com/?query=>
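For comparison, a sketch of a different route to the same subset: collapse the whitelist into a single regular expression and let grepl() do the vectorized matching, so neither the loop nor Vectorize is needed. This swaps in base-R trimws() (R >= 3.2.0) for gdata's trim, and the data frame is a simplified stand-in with plain URLs; both are assumptions, not what was run above.

```r
# Simplified stand-in for the data (assumed; plain URLs only)
dta <- data.frame(
  id = 1:5,
  url = c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
          "www.yahoo.com/?query=", "www.gmail.com"),
  urlType = c(1, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

WHITELIST <- "[?]query=, [?]search="

# Split on commas and strip surrounding whitespace from each pattern
patterns <- trimws(strsplit(WHITELIST, ",")[[1]])

# Join the patterns with "|" so one regex matches any whitelist entry
pattern <- paste(patterns, collapse = "|")

# grepl() is already vectorized over its second argument;
# as.character() keeps this working if url is a factor
hits <- grepl(pattern, as.character(dta$url))

result <- dta[dta$urlType != 1 & hits, "url"]
result
```

With stringsAsFactors = FALSE (the default from R 4.0.0 on) the result is a plain character vector, so no "Levels:" line is printed.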
--
David.