[R] (no subject)

David Winsemius dwinsemius at comcast.net
Fri Jul 15 02:57:14 CEST 2011


On Jul 14, 2011, at 6:15 PM, Tyler Rinker wrote:

>
>
> Good Afternoon R Community,
>
> I often work with very large databases and want to select cases by
> a particular word or numeric value.  I created the following simple
> function to do just that.  It searches a particular column for the
> phrase and returns a data frame with the rows that contain that
> phrase (for a particular column).
>
> Search<-function(term, dataframe, column.name, variation=.02,...){
>    te<-substitute(term)
>    te<-as.character(te)
>    cn<-substitute(column.name)
>    cn<-as.character(cn)
>    HUNT<-agrep(te,dataframe[,cn],ignore.case=TRUE,
>                max.distance=variation,...)
>    ### dataframe[c(HUNT),]

    HUNTL <- (1:NROW(dataframe) %in% HUNT)

> }
>

You would make life simpler by keeping your results as logical vectors  
the same length as your dataframe.

Then:

  logHunt <- sapply(dfrmname, Search, term=term)
  indexL <- rowSums(logHunt) >= 1
  dfrmname[indexL, ]

Untested in absence of test data.
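
For illustration, here is a minimal sketch of that approach on made-up data. It assumes the search function is rewritten to take a single column vector rather than a data frame plus a column name; search_col, dfrm, and the sample values are hypothetical names, not from the original post.

```r
# Hypothetical per-column helper (name and defaults are assumptions):
# fuzzy-match one column and return a logical vector, one element per row.
search_col <- function(x, term, variation = 0.02, ...) {
  hits <- agrep(term, as.character(x), ignore.case = TRUE,
                max.distance = variation, ...)
  seq_along(x) %in% hits
}

# Made-up test data
dfrm <- data.frame(
  name = c("apple pie", "banana", "cherry", "kiwi"),
  note = c("none", "likes apple", "none", "none"),
  stringsAsFactors = FALSE
)

# One logical column per data frame column (rows x columns matrix)
logHunt <- sapply(dfrm, search_col, term = "apple")

# Keep a row if any column matched; each row appears at most once
indexL <- rowSums(logHunt) >= 1
result <- dfrm[indexL, ]
```

Because the rows are selected with a single logical index, no row can be returned twice, so no separate unique() step is needed. Note that with a one-row data frame sapply() returns a vector rather than a matrix, and rowSums() would then fail.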

-- 
David.


> I would like to modify this to search all columns for the phrase,
> keep only the unique rows, and return a data frame of the rows
> (minus repeats) in which any column contains the phrase.
>
> I assumed this would be an easy task for me using sapply() and
> unique() or union().  Because this function takes more than one
> argument (the vector {column} is not the only argument), I don’t know
> how to set it up.  Could someone tell me how to apply this function
> to multiple columns and return one data frame with all the agrep
> matches?  (I’ll figure out how to deal with duplicates after that;
> that’s the easy part.)
>
> Thank you in advance for your help,
> Tyler Rinker
>
> PS if your idea is a for loop please explain it well or provide the  
> code because I do not have a programming background and for loops  
> are very difficult to wrap my head around.
>
> Running windows 7
> R version 2.14.0 (beta)
>

David Winsemius, MD
West Hartford, CT


