[R] Data Frame Manipulation using function

David Winsemius dwinsemius at comcast.net
Fri Jul 9 04:25:54 CEST 2010


On Jul 8, 2010, at 10:09 PM, harsh yadav wrote:

> Hi,
>
> Here is a somewhat detailed explanation of what I want to achieve:
>
> I have a data frame:
>
>   id  url                      urlType
> 1  1  www.yahoo.com            1
> 2  2  www.google.com/?search=  2
> 3  3  www.google.com           1
> 4  4  www.yahoo.com/?query=    2
> 5  5  www.gmail.com            1
>
> I want to get all the URLs that are not of type `1` and satisfy the
> condition defined by the following function:
>
> checkBaseLine <- function(s){
> for (listItem in WHITELIST){
> if(regexpr(as.character(listItem), s)[1] > -1){
> return(TRUE)
> }
> }
> return(FALSE)
> }
>
> Here is the definition for WHITELIST:-
>
> WHITELIST = "[?]query=, [?]search="
> WHITELIST <- unlist(trim(strsplit(trim(WHITELIST), ",")))
>
> Now, for the given data frame I want to apply the above function for
> all row values for a given column:-
>
> That is:
>
> It works fine when I define a condition like:
> data <- data[data$urlType != 1,]

Arrrgh. Why do people keep using "data" as an object name? Is there  
some water pump from which I can remove the handle?

Anyway ... try:

vcheck <- Vectorize(checkBaseLine)

data[ data$urlType != 1 & vcheck(data$url) , "url" ]
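
The reason for Vectorize(): checkBaseLine() only looks at the first element of its argument (the `[1]` after regexpr), so when it is handed the whole column it returns a single TRUE or FALSE, which then gets recycled across every row. A minimal self-contained sketch follows. The object name `urls` and the sample rows are assumptions reconstructed from the data frame shown in the question, and trimws() is swapped in for trim(), which is not a base R function. The grepl() line at the end is an alternative I would consider, not something from the original code.

```r
# Assumed sample data, rebuilt from the data frame printed in the question.
urls <- data.frame(
  id      = 1:5,
  url     = c("www.yahoo.com", "www.google.com/?search=", "www.google.com",
              "www.yahoo.com/?query=", "www.gmail.com"),
  urlType = c(1, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

# trimws() (base R) stands in for the trim() used in the original post.
WHITELIST <- trimws(unlist(strsplit("[?]query=, [?]search=", ",")))

# The poster's function, unchanged in behaviour: tests ONE string at a time.
checkBaseLine <- function(s) {
  for (listItem in WHITELIST) {
    if (regexpr(as.character(listItem), s)[1] > -1) {
      return(TRUE)
    }
  }
  return(FALSE)
}

# Vectorize() wraps the function so it is applied to each element of `url`.
vcheck <- Vectorize(checkBaseLine)
result <- urls[urls$urlType != 1 & vcheck(urls$url), "url"]
print(result)

# Alternative sketch: grepl() is already vectorized over its second argument,
# so the whitelist can be collapsed into a single alternation pattern.
result2 <- urls[urls$urlType != 1 &
                grepl(paste(WHITELIST, collapse = "|"), urls$url), "url"]
print(result2)
```

With the sample rows above, both versions should pick out the two URLs of type 2 that carry a query or search parameter.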

-- 
David
>
> However, I want to combine two logical conditions together like:
> data <- data[data$urlType != 1 & checkBaseLine(data$url),]
>
> This would check whether the column `urlType` contains row values that != 1,
> and the column `url` contains row values that satisfy the function
> definition.
>
> Any ideas how this can be done?
>
> Thanks in advance.
>
> Regards,
> Harsh Yadav
>
>
> On Thu, Jul 8, 2010 at 9:43 PM, Erik Iverson <eriki at ccbr.umn.edu>  
> wrote:
>
>> It will be a lot easier to help you if you follow the posting guide and
>> provide commented, minimal, self-contained, reproducible code.
>>
>> You gave your function definition, which is good.  Use ?dput to give us a
>> small data.frame that can accurately show what you want.
>>
>>
>> harsh yadav wrote:
>>
>>> Hi all,
>>>
>>> I have a data frame for which I want to limit the output by checking
>>> whether
>>> row values for specific column meets particular conditions.
>>>
>>> Here are the more specific details:
>>>
>>> I have a function that checks whether an input string exists in a  
>>> defined
>>> list:-
>>>
>>> checkBaseLine <- function(s){
>>> for (listItem in WHITELIST){
>>> if(regexpr(as.character(listItem), s)[1] > -1){
>>> return(TRUE)
>>> }
>>> }
>>> return(FALSE)
>>> }
>>>
>>> Now, I have a data frame for which I want to apply the above  
>>> function for
>>> all row values for a given column:-
>>>
>>> This works fine when I define a condition like:
>>> data <- data[data$urlType != 1,]
>>>
>>> However, I want to combine two logical conditions together like:
>>> data <- data[data$urlType != 1 & checkBaseLine(data$url),]
>>>
>>> This would check whether the column `urlType` contains row values that != 1,
>>> and the column `url` contains row values that get evaluated using the
>>> defined function.
>>>
>>> Any ideas how this can be done?
>>>
>>> Thanks in advance.
>>>
>>> Regards,
>>> Harsh Yadav


David Winsemius, MD
West Hartford, CT


