[R] filtering out unwanted words in a Term Document Matrix

Ingo Feinerer feinerer at logic.at
Mon May 16 12:27:12 CEST 2011


> Hi Y'all,
> 
> I am using the text mining package (tm). I am trying to filter out all of the words in a Term Document Matrix that are not in a list of words that I am interested in.  I am using the following code:
> 
> z<-tm_intersect(txt.dtm, c("communications", "safety", "climate", "blood", "surface", "cleanliness", "amenities", "monitoring", "staff", "competency", "policy", "procedure", "inconsistency", "physician", "orders", "treatment", "times", "care", "plan", "strategies", "concerns", "meetings", "equipment", "treatment", "options", "delivery", "care", "discharge", "welfare", "violations", "HIPPS", "professionalism", "lack", "boundaries crossing", "transportation", "benefits", "assistance", "beneficiary", "complaint", "grievance", "inquiry", "formal", "data", "processing", "concern", "facility", "abuse", "data", "request", "disruptive", "information", "patient", "discharge", "transfer", "physical", "ethics", "resolution", "professional","reimbursement", "financial", "request", "status", "educational", "material", "forms", "technical", "assistance", "staff", "related", "quality", "care","disruptive","behavior","special","needs","mental","illness","noncompliance","illegal", "immigrant", "abusive", "violent","litigation", "prisoner", "corporate", "lockout", "disposition", "discharge", "reason"))
> 
> I get the following error:
> 
>   "no applicable method for 'tm_intersect' applied to an object of class "c('TermDocumentMatrix', 'simple_triplet_matrix')" "
> 
> What am I doing wrong?  I'd greatly appreciate any ideas or thoughts on this!!!!  Thank you!!

As the error message indicates, tm_intersect has no method for
term-document matrices. You can subset the matrix directly instead, e.g.:

library(tm)
data(crude)                     # example corpus shipped with tm
m <- TermDocumentMatrix(crude)
z <- m[c("oil", "zone"), ]      # keep only the rows for these terms
inspect(z)

Make sure you subset only by terms that actually occur in the matrix;
asking for a term that is not there makes the subsetting fail. You can
list all terms in the matrix with Terms(m).
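
One way to follow this advice with a longer word list is to intersect
the list with Terms(m) before subsetting. The following is only a
sketch: the vector `wanted` and the entry "notaword" are made up for
illustration, and the crude corpus stands in for your own data.
intersect() also drops duplicated entries, which helps if the list
names the same word more than once.

```r
library(tm)
data(crude)
m <- TermDocumentMatrix(crude)

# Hypothetical word list; "notaword" is assumed not to occur in the corpus
wanted <- c("oil", "zone", "notaword", "oil")

# Keep only the wanted words that are actually terms of the matrix
# (intersect() also removes the duplicated "oil")
keep <- intersect(wanted, Terms(m))

# Subsetting is now safe because every entry of keep is a row of m
z <- m[keep, ]
inspect(z)
```

Words in `wanted` that never occur in the corpus are silently dropped,
so compare `keep` against `wanted` if you need to know which ones
were missing.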

Best regards,
  Ingo Feinerer


