[R] Re: reading in columns of a data set as factors

Thu Apr 27 17:30:38 CEST 2000

Dear Dr. Venables,

Did you mean to write

template <- structure(c(as.list(rep("", 66)), as.list(rep(0, 33)),
                        names = names))

clus.df <- data.frame(scan("clus.dat", what = template))

(A second bracket seems to be missing after names = names.)

If so, this code certainly loads successfully and is *much* faster than
mine, but it gives rather weird results: clus.df does not come back in the
form you would expect. For example, dim(clus.df) returns [1] 1031 198 (!).
I don't understand the structure function, so I can't fix it.
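
One plausible reading of the 198 columns, offered as a sketch rather than a
diagnosis: if names is a character vector of length 99, then placing
names = names inside c() appends those 99 strings to the list as extra
elements instead of naming the existing 99, giving scan() a template of 198
fields. Moving the extra bracket so that names = names becomes an argument
of structure() sets the names attribute instead. The toy sizes (3 + 2
columns instead of 66 + 33) and the inline data below are assumptions made
only to keep the example self-contained.

```r
# Hypothetical stand-ins: 3 binary + 2 numeric columns instead of 66 + 33.
names <- c("b1", "b2", "b3", "x1", "x2")

# names = names inside c(): the five strings become five extra list elements.
inside <- structure(c(as.list(rep("", 3)), as.list(rep(0, 2)), names = names))
length(inside)    # twice the intended number of fields

# names = names as an argument of structure(): sets the names attribute.
template <- structure(c(as.list(rep("", 3)), as.list(rep(0, 2))), names = names)
length(template)  # the intended number of fields

# Toy data in place of "clus.dat" (assumed whitespace-separated).
txt <- "0 1 0 1.5 2.5\n1 0 1 3.5 4.5"

# The "" entries make scan() read those fields as character; in current R,
# stringsAsFactors = TRUE is needed to turn them into factors.
clus.df <- data.frame(scan(text = txt, what = template, quiet = TRUE),
                      stringsAsFactors = TRUE)
dim(clus.df)
str(clus.df)
```

With the real file, the same pattern would use rep("", 66), rep(0, 33) and
scan("clus.dat", what = template).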

Looks like I am stuck with my method for the time being. Does the method I
am using (the loop) look OK, though?
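
For comparison, two loop-free ways of getting the same result are sketched
below. Neither comes from the thread: the first replaces the explicit for
loop with lapply(), and the second declares the column types when reading,
via the colClasses argument of read.table(). The toy data and the reduced
sizes (3 + 2 columns) are assumptions for illustration only.

```r
# Hypothetical toy data standing in for clus.dat: 3 binary, 2 numeric columns.
txt <- "0 1 0 1.5 2.5\n1 0 1 3.5 4.5\n1 1 0 0.5 9.0"
names <- c("b1", "b2", "b3", "x1", "x2")

clus.df <- read.table(text = txt, header = FALSE, col.names = names)

# Convert the binary columns in a single assignment; lapply() hides the loop.
clus.df[1:3] <- lapply(clus.df[1:3], as.factor)

# Alternatively, state the column types at read time.
clus.df2 <- read.table(text = txt, header = FALSE, col.names = names,
                       colClasses = c(rep("factor", 3), rep("numeric", 2)))

str(clus.df)
str(clus.df2)
```

The explicit loop does the same job and is perfectly serviceable. The
attach(clus.df) call before it is not needed for the conversion, and the
attached copy would not reflect the later changes to clus.df.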

I am getting a little desperate, so I am planning to buy the V&R third
edition, possibly both volumes (the data analysis one and the programming
one), though they are rather expensive. I have had serious problems
finding sufficiently detailed documentation about Splus/R. The
online help is often very cryptic, and much of the other available
documentation is either severely incomplete or outdated. The only other
option appears to be reading the source code, something which I don't feel
quite up to, and which is, in any case, an option currently only available
in R.

Thank you for your response.
                                   Sincerely, Faheem Mitha.

On Thu, 27 Apr 2000, Bill Venables wrote:

> > Dear people,
> > 
> > Replying to my own message here. The following appears to work. I read in
> > the data using read.table and then coerce the columns of the data frame to
> > factors afterwards.
> > 
> > clus.df <- read.table("clus.dat",header= FALSE, row.names=NULL,
> > col.names=names)
> > 
> 
> Ah.  So you don't have the column names as part of the file but
> in a separate vector, 'names'.  The way I would do it would be
> (warning: untested code...)
> 
> template <- structure(c(as.list(rep("", 66)), as.list(rep(0, 
> 33)),
> 			names = names)
> clus.df <- data.frame(scan("clus.dat", what = template))
> 
> I'd be surprised if it made much difference in time or resources,
> though.  Sometimes a loop is OK.
> 
> 
> 
> > attach(clus.df)
> > 
> > for(i in 1:66)
> > clus.df[,i] <- as.factor(clus.df[,i])
> > 
> > However, I don't find this completely satisfactory. For one thing, I have
> > somehow got the impression that loops in Splus/R should be avoided
> > whenever possible (gee, I wonder where I got that impression) and so is
> > there a more elegant way to do this (without loops)?
> > 
> >                                                    Faheem.
> > 
> > On Wed, 26 Apr 2000, Faheem Mitha wrote:
> > 
> > > Dear people,
> > > 
> > > I've spent some time trying to find a simple way to do the following. I
> > > can certainly think of complicated ways to do it...
> > > 
> > > I have a data set of 99 columns and 2000 rows. Each row corresponds to an
> > > individual item of data, each column corresponds to a variable. I want
> > > this data to be read into a data frame. The first 66 columns are binary
> > > (values 0 and 1), and I want these to be coerced into factors. The last
> > > 33 are ordinary numeric data. For concreteness, let us call the first 66
> > > variables b1 ... b66 and the last 33 x1 ... x33.
> > [stuff snipped]
> > 
> 
> -- 
> Bill Venables,      Statistician,     CMIS Environmetrics Project
> CSIRO Marine Labs, PO Box 120, Cleveland, Qld,  AUSTRALIA.   4163
> Tel: +61 7 3826 7251           Email: Bill.Venables at cmis.csiro.au
> 
> Fax: +61 7 3826 7304      http://www.cmis.csiro.au/bill.venables/
