[R] Help with

Greg Snow 538280 at gmail.com
Thu Oct 18 19:35:41 CEST 2012


Another option is to read the data into a data frame with read.table
(or a similar function) and then use the xtabs function, something
like:

result <- xtabs( count ~ docID + wordID, data=mydf)
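As a rough, self-contained sketch of the above using the example data
from the question: the column names and the inlined text are my
assumptions, and for the real file you would pass a file name to
read.table instead (possibly with skip= if the file starts with header
lines, which I have not checked).

```r
# Example rows from the question, inlined so the sketch runs on its own.
# The column names are assumptions chosen to match the formula below.
mydf <- read.table(text = "
1 1 3
1 2 54
1 3 11
1 4 17
2 1 5
2 4 78
2 5 20
", col.names = c("docID", "wordID", "count"))

# Sum 'count' for every docID/wordID pair; pairs that never occur get 0.
result <- xtabs(count ~ docID + wordID, data = mydf)

# xtabs returns a table object; rebuild it as a plain (dense) matrix
# while keeping the docID/wordID labels as row and column names.
m <- matrix(result, nrow = nrow(result), dimnames = dimnames(result))
print(m)
```

Note that only the docIDs and wordIDs that actually appear in the data
get a row or column, so m["2", "5"] (indexing by label) is safer than
m[2, 5] if some IDs are missing from the file.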



On Thu, Oct 18, 2012 at 6:44 AM, Rui Esteves <ruimaximo at gmail.com> wrote:
> Hi,
>
> I downloaded a dataset from UCI repositories named Bag of Words:
> http://archive.ics.uci.edu/ml/machine-learning-databases/bag-of-words/readme.txt
>
>
> The dataset is in a text file with the following structure:
> ---
>
> docID1 wordID1 count
> docID1 wordID2 count
> docID1 wordID3 count
> docID1 wordID4 count
> ...
> docID2 wordID2 count
> docID2 wordID5 count
> docID2 wordID6 count
> ---
>
> Where docIDx is an integer that identifies document x, wordIDy is
> an integer that identifies word y, and count is an integer giving
> the number of times that wordIDy appears in docIDx.
>
>
> Example:
>
> ---
>
> 1 1 3
> 1 2 54
> 1 3 11
> 1 4 17
> 2 1 5
> 2 4 78
> 2 5 20
> ---
>
> I would like to import the file into a matrix (not sparse) where:
>
> the wordIDy would correspond to the column [,y]
>
> the docIDx would correspond to the row [x,]
>
> the value in [x,y] would be the count of wordIDy in the docIDx
>
> So, for the previous example it would be like:
>
>
>      [,1] [,2] [,3] [,4] [,5]
> [1,]    3   54   11   17    0
> [2,]    5    0    0   78   20
>
>
> I don't have a clue about how to do this.
>
> Can someone please help me?
>
> Thank you
>
> Rui
>



-- 
Gregory (Greg) L. Snow Ph.D.
538280 at gmail.com



