[R] Finding unique elements faster

apeshifter ch_koch at gmx.de
Tue Dec 9 11:02:57 CET 2014


Thank you all for your suggestions! I must say I am amazed by the number of
people willing to help out a stranger! Feels like it was a good idea to
start using R - back when I was still using Perl for such tasks, I would
have been happy to have this kind of support!

@ Gheorghe Postelnicu: Unfortunately, the data is not yet in a data frame
when this part of the program starts. At this point, I am trying to fill in
all the relevant vectors (all.word.pairs, word1, word2, freq.word1,
freq.word2, typefreq.w1, typefreq.w2, ...) and then combine them into a
data frame. I will try to get my head around the doParallel package for the
foreach loop, since parallel computing would certainly be helpful.
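
Just so I know whether I am on the right track: the kind of
foreach/doParallel skeleton I have in mind looks roughly like this (an
untested sketch with toy data, not my actual script):

library(doParallel)            # pulls in foreach and parallel as well

all.word.pairs <- c("the cat", "the dog", "a cat")   # toy stand-in

cl <- makeCluster(detectCores() - 1)
registerDoParallel(cl)

## process each pair in parallel; .combine = rbind stacks the
## one-row results into a single data frame at the end
res <- foreach(pair = all.word.pairs, .combine = rbind) %dopar% {
    parts <- strsplit(pair, " ")[[1]]
    data.frame(word1 = parts[1], word2 = parts[2],
               stringsAsFactors = FALSE)
}

stopCluster(cl)

As far as I understand, foreach keeps the results in input order by
default, which would be important for me since the pairs have to stay in
corpus order.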

@ Jeff Newmiller: Sounds interesting, but I fear the same problem applies
as with Gheorghe's suggestion: I would need a data frame first, for which I
do not yet have all the correct values... I will keep the package in mind,
though, for future projects.

@ Stefan Evert-3: I am not sure I understand what you mean in the second
example. Since counting types is exactly my problem at the moment, I do
not see how I could provide a function that would work more efficiently
in the context you are describing. The line of code I gave was exactly my
attempt at doing this... Sorry, I might just not be getting what you are
aiming at... :-/  Your assumptions are quite correct, though: word1 and
word2 do indeed contain word tokens, as does all.word.pairs. The reason
for this is that I need the word pairs within the vector to be in the same
order as they appeared in the original corpus files. Also, thank you for
the link. I will check it out when I am analysing collocates, although I
did not find notes on my specific problem in the slides. Please do not
think I was designing my script without reference material, though. I was
in fact using Gries 2009: "Quantitative Corpus Linguistics with R"
<http://www.amazon.de/Quantitative-Corpus-Linguistics-Practical-Introduction-ebook/dp/B001Y35H5A/ref=sr_1_1?ie=UTF8&qid=1418119630&sr=8-1&keywords=gries+quantitative+corpus+linguistics>
for this. The trouble is that the methods in the book work fine for simple
n-gram frequency calculations (where, e.g., table() would just do the
trick), but methods for this number of repeated table lookups are not
covered.
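
To make clearer what I mean by repeated table lookups, here is a toy
version of the pattern (simplified, not my actual code):

word1 <- c("the", "the", "a", "the", "a")    # toy token vector

## what I have been doing, in effect: one full pass over the whole
## vector for every single token
freq.word1.slow <- sapply(word1, function(w) sum(word1 == w))

## the alternative, as I understand the suggestions: tabulate once,
## then do a single vectorized lookup by name
tab <- table(word1)
freq.word1 <- as.vector(tab[word1])

The second version does the counting only once, which I suppose is the
point. If that is what you meant, then I think I see it now.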

Best,
Christopher


