[R] Data Frame Manipulation using function

David Winsemius dwinsemius at comcast.net
Fri Jul 9 05:06:06 CEST 2010


On Jul 8, 2010, at 10:33 PM, Erik Iverson wrote:

>
>> I have a data frame:
>>      id      url                                                         urlType
>> 1     1      www.yahoo.com <http://www.yahoo.com>                             1
>> 2     2      www.google.com/?search= <http://www.google.com/?search=>         2
>> 3     3      www.google.com <http://www.google.com>                           1
>> 4     4      www.yahoo.com/?query= <http://www.yahoo.com/?query=>             2
>> 5     5      www.gmail.com <http://www.gmail.com>                             1
>
> This is not output from ?dput, which means more work to read it in.
>

Yeah, it was kind of a pain, but ...

dta <- read.table(textConnection('     id      url                                                         urlType
1     1      "www.yahoo.com <http://www.yahoo.com>"                            1
2     2      "www.google.com/?search= <http://www.google.com/?search=>"        2
3     3      "www.google.com <http://www.google.com>"                          1
4     4      "www.yahoo.com/?query= <http://www.yahoo.com/?query=>"            2
5     5      "www.gmail.com <http://www.gmail.com>"                            1') )
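
For comparison, here is a rough sketch of the dput route Erik was asking for. The data frame below is only my guess at the poster's data with the mail client's <http://...> autolink text left out, and the names dta_clean and rebuilt are made up for this illustration. The point is that the text dput() prints can be pasted into an email and evaluated on the other end to get the same object back, with no read.table guesswork.

```r
# Hypothetical reconstruction of the poster's data (autolink text omitted)
dta_clean <- data.frame(
  id      = 1:5,
  url     = c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
              "www.yahoo.com/?query=", "www.gmail.com"),
  urlType = c(1L, 2L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# dput() prints a structure(...) call that can be pasted straight into a message
dput(dta_clean)

# Round trip: capture that text, then parse and evaluate it to rebuild the object
txt     <- paste(capture.output(dput(dta_clean)), collapse = "\n")
rebuilt <- eval(parse(text = txt))

# all.equal() compares the rebuilt object with the original
all.equal(dta_clean, rebuilt)
```

A reader would normally just paste the printed structure(...) call after `dta <- `; the capture.output/parse step only imitates that copy and paste inside one script.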


>
>> Here is the definition for WHITELIST:-
>> WHITELIST = "[?]query=, [?]search="
>> WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
>
> What is the 'trim' function?  I do not have that defined.
>
> Perhaps David's answer will work for you...

Seems to ... after I fixed my incorrect cmd-V paste of the function
name and guessed that trim was the one in gdata:

 > require(gdata)
 > checkBaseLine <- function(s){
+ for (listItem in WHITELIST){
+ if(regexpr(as.character(listItem), s)[1] > -1){
+ return(TRUE)
+ }
+ }
+ return(FALSE)
+ }
 >
 > #Here is the definition for WHITELIST:-
 >
 > WHITELIST = "[?]query=, [?]search="
 > WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
 > vcheck <- Vectorize(checkBaseLine)
 >
 > dta[ dta$urlType != 1 & vcheck(dta$url) , "url" ]
[1] www.google.com/?search= <http://www.google.com/?search=> www.yahoo.com/?query= <http://www.yahoo.com/?query=>
5 Levels: www.gmail.com <http://www.gmail.com> www.google.com <http://www.google.com> ... www.yahoo.com/?query= <http://www.yahoo.com/?query=>
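
A possible alternative, offered only as a sketch: grepl() is already vectorized over the strings it searches, so the loop plus Vectorize() could probably be replaced by a single call on a combined pattern. This assumes the whitelist entries are ordinary regular expressions that can safely be joined with "|", and it uses base trimws() (R 3.2.0 or later) instead of gdata's trim. The names whitelist_raw, patterns, combined, dta_demo and hits are invented for this example, and dta_demo is a stand-in for the data without the autolink text.

```r
whitelist_raw <- "[?]query=, [?]search="

# Split on commas and strip surrounding whitespace (base R, so gdata is not needed)
patterns <- trimws(strsplit(whitelist_raw, ",")[[1]])

# Join the patterns into one alternation: "[?]query=|[?]search="
combined <- paste(patterns, collapse = "|")

# Stand-in data (assumed; the <http://...> autolink text is dropped)
dta_demo <- data.frame(
  id      = 1:5,
  url     = c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
              "www.yahoo.com/?query=", "www.gmail.com"),
  urlType = c(1L, 2L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# One vectorized match over the whole column, combined with the urlType filter
hits <- dta_demo$urlType != 1 & grepl(combined, dta_demo$url)
dta_demo[hits, "url"]
```

With these five rows it should select the same two URLs as the vcheck() version, rows 2 and 4.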

-- 
David.


