[R] Problem with lsa package (data.frame) on Windows XP

Tine Stalmans tine_stalmans at hotmail.com
Mon Aug 20 20:33:28 CEST 2007


Dear Uwe,

Thanks very much for your prompt reply.

Below is the information you asked for. I have also attached a zip file with the two
folders that contain the corpus.

###############################
##Full reproducible code:
################################
library("lsa")

# load training text
matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\cuentos\\",
    stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1,
    stopwords=NULL, vocabulary=NULL)
print(matrix1, bag_lines = 3, bag_cols = 3)
matrix1 = lw_bintf(matrix1) * gw_idf(matrix1) # weighting
space = lsa(matrix1, dims = dimcalc_share()) # create LSA space
#as.textmatrix(space)

# fold-in test and gold standard essays
matrix2 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\respuestas\\",
    stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1,
    stopwords=NULL, vocabulary=rownames(matrix1))
matrix2 = lw_bintf(matrix2) # adding idf here gives NaN because it divides by 0
matrix2fld = fold_in(matrix2, space)
r <- cor(matrix2fld[,"respId1.txt"], matrix2fld[,"respAl1.txt"],
    method = "pearson") # use = "complete.obs", method = "pearson");
print(r)

######################
#end code
########################
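The comment on the lw_bintf(matrix2) line says that adding idf weighting to the test matrix gives NaN because of a division by zero. The base-R sketch below shows one plausible way this happens, under stated assumptions. matrix2 is built with vocabulary = rownames(matrix1), so any training term that never occurs in the test essays has a document frequency of 0 there. Its idf then becomes Inf, and 0 * Inf is NaN. The idf form log2(ndocs / df) + 1 and the toy matrix are assumptions for illustration, not taken from the lsa sources.

```r
# Toy 3-term x 2-document binary matrix standing in for the test essays.
# The third term comes from the training vocabulary but occurs in neither essay.
m <- matrix(c(1, 0,
              1, 1,
              0, 0), nrow = 3, byrow = TRUE,
            dimnames = list(c("term1", "term2", "term3"), c("essay1", "essay2")))

df  <- rowSums(m > 0)            # document frequency per term: 1, 2, 0
idf <- log2(ncol(m) / df) + 1    # assumed idf form; df = 0 gives 2/0 = Inf
w   <- m * idf                   # each row scaled by its idf; 0 * Inf = NaN

print(idf)
print(w)
```

If this is the cause, one option is to compute the global weights on the training matrix and reuse them for the test matrix. In that case it is worth checking first that the rows of the two matrices are in the same order.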


I tried to get a traceback, but including traceback() in the script did not change
the output; I still got only the original error message.
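In R, traceback() prints the call stack of the last uncaught error. It therefore has to be typed at the console after source() has stopped. A traceback() line placed inside the script is never reached, because execution halts at the error. To capture the stack from inside a script, one option is to record sys.calls() in a calling handler, as in the base-R sketch below. The functions f() and g() are hypothetical stand-ins for the lsa call chain that ends in data.frame().

```r
# Hypothetical stand-ins for the lsa call chain that ends in the error.
g <- function() data.frame(docs = "a.txt", terms = character(0))
f <- function() g()

captured <- NULL
result <- tryCatch(
  withCallingHandlers(
    f(),
    # Runs at the point of the error, while the failing frames still exist.
    error = function(e) captured <<- as.list(sys.calls())
  ),
  error = function(e) conditionMessage(e)  # keep the script running
)

print(result)  # the error message
# One deparsed call per line, outermost first.
for (cl in captured) cat(paste(deparse(cl), collapse = " "), "\n")
```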

###########################
#R output, including error message:
###################################

>source("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\lsa.R")
$matrix
              D1 D2 D3 D8 D9 D10 D13 D14 D15
1. 11          1  0  0  0  0   0   0   0   0
2. 1493        1  0  0  0  0   0   0   0   0
3. 1503        1  0  0  0  0   0   0   0   0
896. voy       0  0  0  0  2   0   1   0   0
897. vuelv     0  0  0  0  0   0   0   0   0
898. yo        0  0  0  0  0   0   0   0   0
1790. unic     0  0  0  0  0   0   0   0   1
1791. verific  0  0  0  0  0   0   0   0   1
1792. vier     0  0  0  0  0   0   0   0   1

$legend
[1] "D1 = paraR_1.txt"  "D2 = paraR_10.txt" "D3 = paraR_11.txt"
[4] "D8 = paraR_2.txt"  "D9 = paraR_3.txt"  "D10 = paraR_4.txt"
[7] "D13 = paraR_7.txt" "D14 = paraR_8.txt" "D15 = paraR_9.txt"

Error in data.frame(docs = basename(file), terms = names(tab), Freq = tab,  :
        arguments imply differing number of rows: 1, 0
In addition: There were 16 warnings (use warnings() to see them)

##########################
#end output
##############################
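The error message itself can be reproduced in base R alone. data.frame() refuses to combine a length-1 argument with length-0 arguments, and that is exactly what "differing number of rows: 1, 0" reports. Reading the call in the message, one guess (not confirmed against the lsa source) is that for at least one file in the respuestas folder the term table tab ends up empty, while docs = basename(file) still has length 1. That could happen, for example, if none of the file's stemmed words are in vocabulary = rownames(matrix1). It would also fit the observation in the quoted message below that the error disappears with vocabulary = NULL. The vectors here are hypothetical stand-ins.

```r
# Hypothetical stand-ins: one file name, but no terms left after filtering.
docs  <- "respId1.txt"
terms <- character(0)
freq  <- integer(0)

msg <- tryCatch(
  data.frame(docs = docs, terms = terms, Freq = freq),
  error = function(e) conditionMessage(e)
)
print(msg)  # -> "arguments imply differing number of rows: 1, 0"

# With at least one surviving term the same call succeeds.
ok <- data.frame(docs = docs, terms = "voy", Freq = 2L)
print(ok)
```

If this guess is right, a useful next step would be to check which essay in respuestas shares no stemmed terms with the training corpus, or has trouble with accented characters on Windows.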

R version: 2.5.1 (running on Windows XP)
lsa package: lsa_0.57
Rstem package: 0.3-0 (available at www.omegahat.org/Rstem/)

Thanks in advance for your advice.

Tine.

 >From: "Uwe Ligges" <ligges at statistik.uni-dortmund.de>
 >To: "Walter Rojas" <walterrojas at mac.com>
 >Cc: <r-help at stat.math.ethz.ch>
 >Date: August 19, 2007 08:45:28 AM PDT
 >Subject: Re: [R] Problem with lsa package (data.frame) on Windows XP
 >
 >Please specify reproducible examples, it is almost impossible to help
 >otherwise. Also, please provide all error messages and a traceback().
 >Please tell us versions of R and versions of the packages you are using.
 >If you are sure this is an error in the package, please send that
 >reproducible example to the package maintainer.
 >
 >Uwe Ligges
 >
 >
 >Walter Rojas wrote:
 >> Dear R team,
 >>
 >> The following piece of code (to use the lsa package) works fine on my
 >> Mac OS X machine, but when I run the same code on Windows XP, it no
 >> longer works.
 >>
 >> ### code:
 >> library("lsa")
 >> matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\cuentos\\",
 >>     stemming=TRUE, language="spanish",
 >>     minWordLength=2, minDocFreq=1, stopwords=NULL, vocabulary=NULL)
 >> print(matrix1,bag_lines = 3, bag_cols = 3)
 >> matrix1 = lw_bintf(matrix1) * gw_idf(matrix1)
 >> space = lsa(matrix1, dims = dimcalc_share())
 >> as.textmatrix(space)
 >>
 >> ### the following line fails on Windows XP
 >> matrix2 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE.000\\LSA\\respuestas\\",
 >>     stemming=TRUE, language="spanish",
 >>     minWordLength=2, minDocFreq=1, stopwords=NULL, vocabulary=rownames(matrix1))
 >> matrix2 = lw_bintf(matrix2)
 >> matrix2fld = fold_in(matrix2, space)
 >> r <- cor(matrix2fld[,"respId1.txt"], matrix2fld[,"respAl1.txt"],
 >>     method = "pearson")
 >> print(r)
 >>
 >>
 >> An error occurs when creating the second textmatrix with the
 >> vocabulary of the first. The error I get is:
 >>
 >> Error in data.frame(docs = basename(file), terms = names(tab), Freq = tab,  :
 >>          arguments imply differing number of rows: 1, 0
 >>
 >> When I change the vocabulary argument to NULL, it doesn't report this
 >> error any more; however, then the code will fail on the fold_in
 >> method further down.
 >>
 >> I found another user who reported this same problem on-line; however,
 >> I didn't find any answers.
 >>
 >> Thank you very much in advance for your reply.
 >> Tine.
 >>
 >> ______________________________________________
 >> R-help at stat.math.ethz.ch mailing list
 >> https://stat.ethz.ch/mailman/listinfo/r-help
 >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 >> and provide commented, minimal, self-contained, reproducible code.
 >
 >
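The quoted message notes that with vocabulary = NULL the code later fails in fold_in. Folding in is a matrix product against the term vectors of the training space. The new document-term matrix therefore needs exactly the same term rows, in the same order, as the matrix the space was built from. The base-R sketch below assumes the usual LSA fold-in: d_hat = S_k^-1 T_k^T d, mapped back to term space as T_k S_k d_hat. The toy matrix, k = 2 and the helper fold() are hypothetical and are not the lsa package's own code.

```r
# Toy 5-term x 4-document training matrix (hypothetical data).
A <- matrix(c(1, 0, 1, 0,
              0, 1, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 1,
              1, 0, 0, 1), nrow = 5, byrow = TRUE)

s  <- svd(A)
k  <- 2
tk <- s$u[, 1:k]   # term vectors (5 x k)
sk <- s$d[1:k]     # singular values (length k)

# Hypothetical fold-in helper: project new documents into the k-dim space
# and map them back to term space, as T_k S_k (S_k^-1 T_k^T m).
fold <- function(m) {
  if (nrow(m) != nrow(tk))
    stop("new matrix has ", nrow(m), " terms, space has ", nrow(tk))
  dhat <- diag(1 / sk, k) %*% t(tk) %*% m   # k x ndocs
  tk %*% diag(sk, k) %*% dhat               # nterms x ndocs
}

same_vocab  <- fold(A[, 1, drop = FALSE])   # same 5 terms: works
other_vocab <- tryCatch(fold(matrix(1, nrow = 3, ncol = 1)),
                        error = function(e) conditionMessage(e))
print(dim(same_vocab))
print(other_vocab)
```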
