[R] (no subject)
David Winsemius
dwinsemius at comcast.net
Fri Jul 15 02:57:14 CEST 2011
On Jul 14, 2011, at 6:15 PM, Tyler Rinker wrote:
>
>
> Good Afternoon R Community,
>
> I often work with very large databases and want to search for
> select cases by a particular word or numeric value. I created the
> following simple function to do just that. It searches a particular
> column for the phrase and returns a data frame with the rows that
> contain that phrase (for a particular column).
>
> Search<-function(term, dataframe, column.name, variation=.02,...){
> te<-substitute(term)
> te<-as.character(te)
> cn<-substitute(column.name)
> cn<-as.character(cn)
> HUNT<-agrep(te,dataframe[,cn],ignore.case=TRUE,
>             max.distance=variation,...)
> ### dataframe[c(HUNT),]
HUNTL <- (1:NROW(dataframe) %in% HUNT)
> }
>
You would make life simpler by having Search return a logical vector with
one element per row of your data frame (see the HUNTL line I added above).
Then:
logHunt <- sapply(dfrmname, Search, term=term)
indexL <- rowSums(logHunt) >=1
dfrmname[indexL, ]
Untested in absence of test data.
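
As posted, Search expects a whole data frame plus a column name, so
sapply() over the columns would hand it a single column in the
`dataframe` slot. Below is a minimal sketch of a column-wise variant that
fits the sapply()/rowSums() idea. The helper name `search_col`, the toy
data frame `dfrm`, and the search term are made up for illustration and
are not part of the original post.

```r
# Hypothetical column-wise helper: returns a logical vector with one
# element per entry of `x`, TRUE where agrep() finds an approximate
# match for `term`. Extra arguments in ... are passed on to agrep().
search_col <- function(x, term, variation = 0.02, ...) {
  hits <- agrep(term, as.character(x), ignore.case = TRUE,
                max.distance = variation, ...)
  seq_along(x) %in% hits
}

# Made-up test data (an assumption, only for demonstration)
dfrm <- data.frame(
  name = c("apple pie", "banana", "cherry", "grape"),
  note = c("none", "apple sauce", "none", "none"),
  stringsAsFactors = FALSE
)

# One logical column per data-frame column; extra arguments such as
# `term` are forwarded by sapply() to search_col()
logHunt <- sapply(dfrm, search_col, term = "apple")

# A row is kept if any of its columns matched
indexL <- rowSums(logHunt) >= 1
dfrm[indexL, ]
```

Because each row is selected at most once by the logical index, no
separate unique() step should be needed. Note that sapply() only returns
a matrix here when the data frame has more than one row; for a one-row
data frame rowSums() would need a different approach.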
--
David.
> I would like to modify this to search all columns for the phrase,
> keep only the unique rows, and return a data frame of the rows
> (minus repeated rows) that contain the phrase in any column.
>
> I assumed this would be an easy task for me using sapply() and
> unique() or union(). Because this function takes more than one
> argument (the vector {column} is not the only argument), I don't know
> how to set it up. Could someone tell me how to apply this function to
> multiple columns and return one data frame with all the agrep
> matches? (I'll figure out how to deal with duplicates after that;
> that's the easy part.)
>
> Thank you in advance for your help,
> Tyler Rinker
>
> PS if your idea is a for loop please explain it well or provide the
> code because I do not have a programming background and for loops
> are very difficult to wrap my head around.
>
> Running Windows 7
> R version 2.14.0 (beta)
David Winsemius, MD
West Hartford, CT