[Rd] alternative read.arff function for the package foreign
Juan Manuel Barreneche
jumanbar at gmail.com
Thu Oct 22 02:45:23 CEST 2015
Hello everyone, I guess this is really directed to the R Core Team, but I
understand that this is the best channel to submit this (please correct me
if I'm wrong!).
I would like to submit a function for consideration, as an upgrade to the
current read.arff in package foreign. Code on GitHub:
https://raw.githubusercontent.com/jumanbar/misc/master/R/read.arff.R
This function is a modified version of the one found in the foreign
package. The changes aim to correct a problem I found with the standard
read.arff: the levels of factors do not match what is explicitly declared in
the original arff file.
For example, if a nominal attribute in some arff data file has this line in
the header:
@attribute X {'A', 'B', 'C'}
but the data contain only instances of 'A' and 'B', and none of 'C', then
what R imports is:
dat <- read.arff("data.arff")
levels(dat$X)
[1] "a" "b"
Not only are the levels in lowercase, but one level has also disappeared.
This is troublesome, especially if I wish to export my data frame to an
arff file using write.arff.
With this version of read.arff, when dealing with the aforementioned case,
I get:
levels(dat$X)
[1] "A" "B" "C"
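
To illustrate the idea of taking factor levels from the header rather than from the data, here is a minimal base R sketch. It is not the code of the proposed function: the helper name declared_levels is hypothetical, and the parsing is deliberately simplified (it assumes a single nominal attribute and level names that contain no commas).

```r
# Hypothetical helper: extract the declared levels from one nominal
# @attribute line such as  @attribute X {'A', 'B', 'C'}
declared_levels <- function(attr_line) {
  # Keep only the text between the braces
  inside <- sub("^[^{]*\\{(.*)\\}[[:space:]]*$", "\\1", attr_line)
  parts <- trimws(strsplit(inside, ",", fixed = TRUE)[[1]])
  # Drop one leading and one trailing quote character, if present
  gsub("^['\"]|['\"]$", "", parts)
}

# Write a tiny ARFF file in which level 'C' is declared but never used
arff <- tempfile(fileext = ".arff")
writeLines(c(
  "@relation demo",
  "@attribute X {'A', 'B', 'C'}",
  "@data",
  "'A'",
  "'B'",
  "'A'"
), arff)

lines <- readLines(arff)
attr_line <- grep("^[[:space:]]*@attribute", lines,
                  ignore.case = TRUE, value = TRUE)
data_start <- grep("^[[:space:]]*@data", lines, ignore.case = TRUE)

# Read the data rows as plain character values
values <- read.table(text = lines[-seq_len(data_start)], sep = ",",
                     na.strings = "?", comment.char = "%",
                     colClasses = "character")[[1]]

# Build the factor with the levels declared in the header
x <- factor(values, levels = declared_levels(attr_line))
levels(x)
```

Passing the declared levels to factor() keeps the unused level 'C' and preserves the case and order given in the header, which is what a later export to arff would need.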
I can also set a couple of parameters that help me tune my workflow to
better fit my needs (for example, reading only a limited number of lines
when I just want to run a couple of quick tests and therefore don't need
the whole dataset).
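
As a rough sketch of what reading only the first few data rows can look like in base R (this is not the interface of the proposed function; the helper name read_arff_head and its n_rows argument are hypothetical), one can skip the header on an open connection and then pass nrows to read.table:

```r
# Hypothetical helper: read at most n_rows data rows after the @data marker.
# Attribute names and types from the header are ignored in this sketch.
read_arff_head <- function(path, n_rows) {
  con <- file(path, "r")
  on.exit(close(con))
  # Consume header lines up to and including the @data line
  repeat {
    line <- readLines(con, n = 1L)
    if (length(line) == 0L) stop("no @data section found")
    if (grepl("^[[:space:]]*@data", line, ignore.case = TRUE)) break
  }
  # read.table continues from the current position of the open connection
  read.table(con, sep = ",", na.strings = "?", comment.char = "%",
             nrows = n_rows, stringsAsFactors = FALSE)
}

# Small demo file with five data rows
arff <- tempfile(fileext = ".arff")
writeLines(c(
  "@relation demo",
  "@attribute n numeric",
  "@attribute X {'A', 'B'}",
  "@data",
  "1,'A'",
  "2,'B'",
  "3,'A'",
  "4,'B'",
  "5,'A'"
), arff)

small <- read_arff_head(arff, n_rows = 2)
nrow(small)
```

Only the requested rows are parsed, so a quick test does not pay the cost of loading the whole dataset.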
Thanks for your time,
Juan Manuel
--
MSc. Juan M. Barreneche Sarasola