[R] Example function for bigglm (biglm) data input from file
Yeh, Richard C
richard.c.yeh at bankofamerica.com
Mon Jan 22 20:01:53 CET 2007
This is a commented example function for use as the data argument to
bigglm (in the biglm package) when you want to read the data from a
file (instead of a URL), or to rescale or otherwise modify the data
before fitting the model. I hope it may be of help to someone out
there.
make.data <- function (filename, chunksize, ...) {
    conn <- NULL
    function (reset=FALSE) {
        if (reset) {
            if (!is.null(conn)) {
                close(conn)
            }
            # This is for a file.
            # For other methods, see: help("connections")
            # and replace the following definition of conn
            # (and possibly the read.table call).
            conn <<- file (description=filename, open="r")
        } else {
            # It's best that the file you use has no header
            # line, because when you use the connection to
            # read each excerpt, any header won't get re-read.
            # If you choose to skip the first line, then the
            # first line of each excerpt will be skipped.
            rval <- read.table (conn, nrows=chunksize,
                                skip=0, header=FALSE, ...)
            if (nrow(rval)==0) {
                # Then we have reached the end of the input.
                # Clean up:
                close(conn)
                conn <<- NULL
                rval <- NULL
            } else {
                # We did not reach the end of the input,
                # so this function will return data.
                # Here, you can define any derived fields
                # or put instructions to rescale input data
                # that you want done after the data are read
                # but before they are used for fitting.
                # For example:
                rval$rescaled_column <- rval$original_column / 1000000.0
                # If you don't want to do anything like this,
                # then delete this "else" clause, and make
                # the end of the function resemble the URL
                # example in bigglm.
            }
            return(rval)
        }
    }
}
a <- make.data (filename = "myfile", chunksize = 1000000,
    # In our definition of make.data, any remaining
    # arguments get passed to the read.table function by
    # the ... argument.
    # Define column types (read.table documents colClasses
    # as a character vector, so use c() rather than list()):
    colClasses = c("character", "character",
                   "integer", "numeric", "numeric"),
    # Define the column names in the call:
    # (recall that we cannot rely on the file header)
    col.names = c("fromState", "toState",
                  "first", "original_column", "second")
)
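library(biglm)
# Keep the fit in a variable rather than relying on .Last.value,
# which is only set at the interactive top level.
fit <- bigglm (formula = toState ~ 1 + first + rescaled_column,
               data = a, family = binomial(link='logit'),
               weights = ~second)
summary(fit)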
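
For anyone adapting the function above, here is a minimal sketch of the
calling convention that, as I read the bigglm help page, the data
function has to follow: called with reset=TRUE it rewinds to the start
of the data, and called with reset=FALSE it returns the next chunk, or
NULL when nothing is left. The sketch drives such a function by hand,
without biglm, so you can check your own reader before fitting. The
name make.reader, the temporary file, and its two-column layout are all
made up for illustration; the reader is a stripped-down stand-in for
make.data with the rescaling step removed.

```r
# Hypothetical stand-in for make.data (no rescaling step), so that
# this sketch runs on its own.
make.reader <- function(filename, chunksize, ...) {
    conn <- NULL
    function(reset = FALSE) {
        if (reset) {
            # Rewind: close any open connection and reopen the file.
            if (!is.null(conn)) close(conn)
            conn <<- file(description = filename, open = "r")
            return(invisible(NULL))
        }
        rval <- read.table(conn, nrows = chunksize, header = FALSE, ...)
        if (nrow(rval) == 0) {
            # End of input: clean up and signal it with NULL.
            close(conn)
            conn <<- NULL
            return(NULL)
        }
        rval
    }
}

# Illustrative input: a small header-less file, 10 rows, 2 columns.
tmp <- tempfile()
write.table(data.frame(x = 1:10, y = (1:10) / 2), tmp,
            row.names = FALSE, col.names = FALSE)

# col.names is passed through ... to read.table. Supplying it also
# matters at end of input: without it, read.table stops with an
# error instead of returning a zero-row data frame.
reader <- make.reader(tmp, chunksize = 4, col.names = c("x", "y"))

# Drive the reader the way a chunked fit would: reset once, then
# pull chunks until NULL comes back.
reader(reset = TRUE)
chunk.sizes <- integer(0)
total.x <- 0
repeat {
    chunk <- reader(reset = FALSE)
    if (is.null(chunk)) break
    chunk.sizes <- c(chunk.sizes, nrow(chunk))
    total.x <- total.x + sum(chunk$x)
}
print(chunk.sizes)
print(total.x)
unlink(tmp)
```

If the chunk sizes add up to the number of rows in the file and the
loop ends cleanly, the reader should be safe to hand to bigglm as its
data argument.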