[R] merging corpora and metadata
Henri-Paul Indiogine
hindiogine at gmail.com
Thu Nov 17 22:43:30 CET 2011
Greetings!
I lose all my metadata after concatenating corpora. Here is an
example of what happens:
> meta(corpus.1)
   MetaID cid fid selfirst selend fname
1       0   1  11     2169   2518 WCPD-2001-01-29-Pg217.scrb
2       0   1  14     9189   9702 WCPD-2003-01-13-Pg39.scrb
3       0   1  14     2109   2577 WCPD-2003-01-13-Pg39.scrb
....
....
17      0   1 114    17863  18256 WCPD-2007-04-30-Pg515.scrb
> meta(corpus.2)
   MetaID cid fid selfirst selend fname
1       0   2   2    11016  11600 DCPD-200900595.scrb
2       0   2   6    19510  20098 DCPD-201000636.scrb
3       0   2   6    23935  24573 DCPD-201000636.scrb
....
....
94      0   2 127    16225  17128 WCPD-2009-01-12-Pg22-3.scrb
> tot.corpus <- c(corpus.1, corpus.2)
> meta(tot.corpus)
    MetaID
1        0
2        0
3        0
....
....
111      0
>
This is an excerpt from the structure of corpus.1:
..$ MetaData:List of 2
.. ..$ create_date: POSIXlt[1:1], format: "2011-11-17 21:09:57"
.. ..$ creator : chr "henk"
..$ Children: NULL
..- attr(*, "class")= chr "MetaDataNode"
- attr(*, "DMetaData")='data.frame': 17 obs. of 6 variables:
..$ MetaID : num [1:17] 0 0 0 0 0 0 0 0 0 0 ...
..$ cid : int [1:17] 1 1 1 1 1 1 1 1 1 1 ...
..$ fid : int [1:17] 11 14 14 17 46 80 80 80 91 91 ...
..$ selfirst: num [1:17] 2169 9189 2109 8315 9439 ...
..$ selend : num [1:17] 2518 9702 2577 8881 10102 ...
..$ fname : chr [1:17] "WCPD-2001-01-29-Pg217.scrb" "WCPD-2003-01-13-Pg39.scrb" "WCPD-2003-01-13-Pg39.scrb" "WCPD-2004-05-17-Pg856.scrb" ...
- attr(*, "class")= chr [1:3] "VCorpus" "Corpus" "list"
Any ideas on how I can keep the metadata in the merged corpus?
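One possible workaround, given here only as a minimal sketch: since both metadata tables have the same six columns, they could be stacked with rbind() in the same order as the documents and then reattached to the merged corpus. The data frames meta.1 and meta.2 below are hypothetical stand-ins for meta(corpus.1) and meta(corpus.2), filled with the first two rows shown above. The commented-out last line assumes the merged corpus keeps its per-document metadata in the "DMetaData" attribute, as the structure of corpus.1 suggests.

```r
# Hypothetical stand-ins for meta(corpus.1) and meta(corpus.2);
# the values are the first two rows of each table printed above.
meta.1 <- data.frame(
  MetaID   = c(0, 0),
  cid      = c(1L, 1L),
  fid      = c(11L, 14L),
  selfirst = c(2169, 9189),
  selend   = c(2518, 9702),
  fname    = c("WCPD-2001-01-29-Pg217.scrb", "WCPD-2003-01-13-Pg39.scrb"),
  stringsAsFactors = FALSE
)
meta.2 <- data.frame(
  MetaID   = c(0, 0),
  cid      = c(2L, 2L),
  fid      = c(2L, 6L),
  selfirst = c(11016, 19510),
  selend   = c(11600, 20098),
  fname    = c("DCPD-200900595.scrb", "DCPD-201000636.scrb"),
  stringsAsFactors = FALSE
)

# Same columns in both frames, so rbind() stacks them row-wise,
# matching the document order of c(corpus.1, corpus.2).
tot.meta <- rbind(meta.1, meta.2)
rownames(tot.meta) <- NULL  # renumber rows 1..n for the merged set

print(tot.meta)

# Assumption: the corpus stores this table in its "DMetaData" attribute.
# attr(tot.corpus, "DMetaData") <- tot.meta
```

It may also be worth checking whether c() for corpora accepts recursive = TRUE in the installed version of the package. That is an assumption to verify against the help page for c() on corpora, not something the output above confirms.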
Thanks,
Henri-Paul
--
Henri-Paul Indiogine
Curriculum & Instruction
Texas A&M University
TutorFind Learning Centre
Email: hindiogine at gmail.com
Skype: hindiogine
Website: http://people.cehd.tamu.edu/~sindiogine