[R] R 1.2.1 - read.table - factors problem or is it a data.frame problem

gordon.harrington at uni.edu
Wed Jan 31 10:11:37 CET 2001

Patrick Connolly cites the read.table help page to show how to coerce input
columns to character or to numeric. Indeed, coercion with a logical vector
sets the mode regardless of the column content. He also notes that one can
set factors with factor().

However, the problem encountered is not one of setting factors but of unsetting
them. The manual states that variables of mode or type character will become
factors. My data input efforts showed no relationship between type and factor.
For no evident reason, most character variables did not become factors while
many real variables did. It is a bit disconcerting to get output with
thousands of floating point factor levels, or error messages that one's data
are of the wrong mode for any analysis whatsoever.
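
To make the symptom concrete, here is a minimal sketch of one way a column of
reals can end up as a factor: a single entry that does not parse as a number.
I do not know that this is what happened with my data. The three-line input,
the stray "1,5" and the names col_is_factor, v2_chr and bad are invented for
illustration, and the text= and stringsAsFactors= arguments assume a current R
release rather than 1.2.1.

```r
# Hypothetical input: the "1,5" in the second column is not a valid number.
txt <- c("a 1.5 10",
         "b 1,5 20",
         "c 2.5 30")

# stringsAsFactors = TRUE asks for non-numeric columns to become factors,
# which is the default behaviour described in the help page quoted below.
d <- read.table(text = txt, header = FALSE, stringsAsFactors = TRUE)

# Which columns came back as factors?
col_is_factor <- sapply(d, is.factor)

# List the entries of V2 that cannot be parsed as numbers.
v2_chr <- as.character(d$V2)
bad <- v2_chr[is.na(suppressWarnings(as.numeric(v2_chr)))]
```

Here V2 is reported as a factor even though two of its three entries are
ordinary reals, and bad holds the one entry that caused it.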

How does one unset a factor assignment, and how does one avoid automatic
misassignment with other datasets?
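
For what it is worth, a sketch of the kind of answer I am after. It assumes a
current R release: the colClasses= and text= arguments of read.table may not
exist in 1.2.1, and the data and the names f, wrong, right and d2 are made up
for the example.

```r
# A factor whose levels are numeric text, as in the symptom described above.
f <- factor(c("3.25", "10.5", "3.25"))

# as.numeric() on a factor returns the internal level codes, not the values.
wrong <- as.numeric(f)

# Going through character first recovers the values themselves.
right <- as.numeric(as.character(f))

# To avoid guessing at read time, declare each column's class explicitly.
txt <- c("a 1.5 10",
         "b 2.0 20")
d2 <- read.table(text = txt, header = FALSE,
                 colClasses = c("character", "numeric", "factor"))
```

With the classes declared, an entry that does not fit a "numeric" column
should stop the read with an error instead of quietly turning the column into
a factor.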


> |> 
> |> R-1.2.1 Suse 7.0 binary
> |> 
> |> > fooframe <- read.table("foo", header=FALSE, as.is=c(1:22,398),
> |> col.names=foo.colheads)
> |> 
> |> cols 1-9 are alphabetic, 10-22 and 398 are numbers but unordered categorical
> |>      23-375 are numeric with and without decimal points
> |> 
> |> As I read the description the "as.is" index numbers should force those columns
> |> to be "character" and "factor". However only the 1-9 alpha become "character"
> |> but they did not become "factor". Everything else shows mode "numeric" but
> Here is your explanation:
>    as.is: the default behavior of `read.table' is to convert
>           non-numeric variables to factors.  The variable `as.is'
>           controls this conversion.  Its value is either a vector of
>           logicals (values are recycled if necessary), or a vector of
>           numeric indices which specify which columns should be left as
>           character strings.
> Since your column 10, etc are not character, as.is will not have an
> effect on them.  I think it is simple enough to convert numeric
> columns into factors (as distinct from continuous variables) with
> factor().
> |> "is.factor" distributes TRUE to various variables in no pattern discernible to
> |> me either in distribution or in the data content of the columns. (I tried
> |> giving as.is a type vector but that just made everything "numeric" with no
> |> pattern to factors.) No "as.is" parameter still leaves the odd distribution of
> |> factors.
> |> 
> |> The main effects are that for some statistical functions on data subsets, one
> |> is warned one cannot perform the operations on categorical data while others
> |> stop for NA's. There are no NA's in the dataset! Running "unique" on each
> |> variate and collecting outside the frame shows adequate dispersion for analysis
> |> with no zero variances. "cor" will only run "pairwise" though "complete.cases"
> |> finds no NA's.
> |> 
> |> What am I missing?
> My guess is that something unplanned is happening when you try as.is
> on numeric columns.

Gordon M. Harrington		Mail:	3720 Village Place, #6308
Professor Emeritus			Waterloo, IA 50702-5848
University of Northern Iowa 	Phone:	319-291-8535
gordon.harrington at uni.edu	Fax:	319-291-8491
dryfly at aya.yale.edu			319-291-8324
