[R] Problem with lsa package (data.frame) on Windows XP
Tine Stalmans
tine_stalmans at hotmail.com
Mon Aug 20 20:33:28 CEST 2007
Dear Uwe,
Thanks very much for your prompt reply.
I include the following pieces of information, alongside a zip file with two
folders where the corpus resides.
###############################
##Full reproducible code:
################################
library("lsa")
# load training text
matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\cuentos\\",
    stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1,
    stopwords=NULL, vocabulary=NULL)
print(matrix1,bag_lines = 3, bag_cols = 3)
matrix1 = lw_bintf(matrix1) * gw_idf(matrix1) # weighting
space = lsa(matrix1, dims = dimcalc_share()) # create LSA space
#as.textmatrix(space)
# fold-in test and gold standard essays
matrix2 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\respuestas\\",
    stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1,
    stopwords=NULL, vocabulary=rownames(matrix1))
# multiplying by the idf weight here gives NaN because it divides by 0
matrix2 = lw_bintf(matrix2)
matrix2fld = fold_in(matrix2, space)
r <- cor(matrix2fld[,"respId1.txt"], matrix2fld[,"respAl1.txt"],
    method = "pearson") # use = "complete.obs", method = "pearson");
print(r)
######################
#end code
########################
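A note on the NaN comment in the code above: the sketch below uses base R only and shows how an idf weight can turn into NaN when a term from the forced vocabulary never occurs in the new corpus. The functions `binary_tf` and `idf` are hypothetical stand-ins that assume an idf of the form log2(N/df) + 1; they are not the lsa functions themselves, and the toy matrix is invented for illustration.

```r
# Toy term-by-document matrix; the second term never occurs,
# as can happen when a vocabulary is forced onto a new corpus.
m <- matrix(c(2, 0, 1,
              0, 0, 0),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("voy", "yo"), c("D1", "D2", "D3")))

# Hypothetical stand-ins (assumed formulas, not the lsa implementations)
binary_tf <- function(m) (m > 0) * 1
idf <- function(m) log2(ncol(m) / rowSums(m > 0)) + 1

# For "yo" the document frequency is 0, so the idf is Inf,
# and 0 * Inf evaluates to NaN.
weighted <- binary_tf(m) * idf(m)
print(weighted)
```

Under these assumptions, only the row of the unseen term becomes NaN; rows of terms that occur at least once stay finite.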
I tried to run a traceback; however, adding the traceback() command to the
script did not change the original error message.
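A likely reason is that source() stops at the first error, so a traceback() call placed later in the script is never evaluated; calling traceback() at the prompt right after source() fails is the usual route. As an alternative, the sketch below records the call stack with a calling handler. The function `failing_step` is a hypothetical stand-in for the failing textmatrix() call, and the error text is copied from the message reported here.

```r
# Hypothetical stand-in for the step that fails inside the script
failing_step <- function() {
  stop("arguments imply differing number of rows: 1, 0")
}

call_stack <- NULL
msg <- tryCatch(
  withCallingHandlers(
    failing_step(),
    # Runs while the failing frames are still on the stack
    error = function(e) call_stack <<- sys.calls()
  ),
  # Runs after unwinding; keeps only the message text
  error = function(e) conditionMessage(e)
)

print(msg)
print(call_stack)
```

Replacing `failing_step()` with the real source() call would, under the same assumptions, list the calls leading up to data.frame().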
###########################
#R output, including error message:
###################################
>source("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\lsa.R")
$matrix
D1 D2 D3 D8 D9 D10 D13 D14 D15
1. 11 1 0 0 0 0 0 0 0 0
2. 1493 1 0 0 0 0 0 0 0 0
3. 1503 1 0 0 0 0 0 0 0 0
896. voy 0 0 0 0 2 0 1 0 0
897. vuelv 0 0 0 0 0 0 0 0 0
898. yo 0 0 0 0 0 0 0 0 0
1790. unic 0 0 0 0 0 0 0 0 1
1791. verific 0 0 0 0 0 0 0 0 1
1792. vier 0 0 0 0 0 0 0 0 1
$legend
[1] "D1 = paraR_1.txt" "D2 = paraR_10.txt" "D3 = paraR_11.txt"
[4] "D8 = paraR_2.txt" "D9 = paraR_3.txt" "D10 = paraR_4.txt"
[7] "D13 = paraR_7.txt" "D14 = paraR_8.txt" "D15 = paraR_9.txt"
Error in data.frame(docs = basename(file), terms = names(tab), Freq = tab, :
  arguments imply differing number of rows: 1, 0
In addition: There were 16 warnings (use warnings() to see them)
##########################
#end output
##############################
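The message itself can be reproduced in base R without the lsa package. The sketch below assumes one possible cause, which nothing in this thread confirms: a document that yields no terms at all, so that the term and frequency vectors are empty while the file name still has length 1. The values are hypothetical.

```r
# Hypothetical values: one file name, but no terms and no counts
docs  <- "respId1.txt"
terms <- character(0)
freq  <- integer(0)

# data.frame() cannot recycle a zero-length column to one row
msg <- tryCatch(
  data.frame(docs = docs, terms = terms, Freq = freq),
  error = function(e) conditionMessage(e)
)
print(msg)
```

If that assumption holds, checking the respuestas folder for empty files, or for files whose words all fall outside the vocabulary of matrix1, would be a reasonable first step.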
R version: R 2.5.1 (running on Windows XP)
LSA package: lsa_0.57
Rstem package: 0.3-0 (available at www.omegahat.org/Rstem/)
Thanks in advance for your advice.
Tina.
>From: "Uwe Ligges" <ligges at statistik.uni-dortmund.de>
>To: "Walter Rojas" <walterrojas at mac.com>
>Cc: <r-help at stat.math.ethz.ch>
>Date: August 19, 2007 08:45:28 AM PDT
>Subject: Re: [R] Problem with lsa package (data.frame) on Windows XP
>
>Please specify reproducible examples, it is almost impossible to help
>otherwise. Also, please provide all error messages and a traceback().
>Please tell us versions of R and versions of the packages you are using.
>If you are sure this is an error in the package, please send that
>reproducible example to the package maintainer.
>
>Uwe Ligges
>
>
>Walter Rojas wrote:
>> Dear R team,
>>
>> The following piece of code (to use the lsa package) works fine on my
>> mac os x, but when I run the same code on Windows XP, it doesn't work
>> any more.
>>
>> ### code:
>> library("lsa")
>> matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\cuentos\\",
>> stemming=TRUE, language="spanish",
>> minWordLength=2, minDocFreq=1, stopwords=NULL, vocabulary=NULL)
>> print(matrix1,bag_lines = 3, bag_cols = 3)
>> matrix1 = lw_bintf(matrix1) * gw_idf(matrix1)
>> space = lsa(matrix1, dims = dimcalc_share())
>> as.textmatrix(space)
>>
>> ### the following line fails on windows XP
>> matrix2 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\respuestas\\",
>> stemming=TRUE, language="spanish",
>> minWordLength=2, minDocFreq=1, stopwords=NULL,
>> vocabulary=rownames(matrix1))
>> matrix2 = lw_bintf(matrix2)
>> matrix2fld = fold_in(matrix2, space)
>> r <- cor(matrix2fld[,"respId1.txt"], matrix2fld[,"respAl1.txt"],
>> method = "pearson")
>> print(r)
>>
>>
>> An error occurs when creating the second textmatrix with the
>> vocabulary of the first. The error I get is:
>>
>> in data.frame(docs = basename(file), terms = names(tab), Freq = tab, :
>> arguments imply differing number of rows: 1, 0
>>
>> When I change the vocabulary argument to NULL, it doesn't report this
>> error any more; however, then the code will fail on the fold_in
>> method further down.
>>
>> I found another user who reported this same problem on-line; however,
>> I didn't find any answers.
>>
>> Thank you very much in advance for your reply.
>> Tine.
>>