[R] Re: reading in columns of a data set as factors

Bill Venables Bill.Venables at cmis.csiro.au
Fri Apr 28 02:26:08 CEST 2000

At 11:30 AM 4/27/00 -0400, Faheem Mitha wrote:
>Dear Dr. Venables,
>Did you mean to write
>template <- structure(c(as.list(rep("", 66)), as.list(rep(0, 
>33)),names = names))

Not quite!  That's what happens when I don't use emacs... Sigh.


template <- structure(c(as.list(rep("", 66)),
                        as.list(rep(0, 33))),
                      names = names)

The idea is to make a list whose component names are the names you want
for your columns and whose component values give the mode of each value to
be read in ("" for character, 0 for numeric).  So you could do it in two
steps:

template <- c(as.list(rep("", 66)), as.list(rep(0, 33)))
names(template) <- names

but that looks even more obscure because you have chosen to call your
vector of names, naturally enough, "names".  (It's a good idea to avoid
these obvious sorts of, uh, names for things...)
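A minimal sketch of the template approach in current R, assuming a hypothetical three-column layout (two character fields followed by one numeric field) in place of the original 66 + 33 columns, with the data supplied inline through scan()'s `text` argument rather than read from "clus.dat". The names `col_names`, `dat` and `clus_df` are illustrative only. The sketch checks that the one-step structure() form and the two-step names<- form build the same template, then converts the character columns to factors explicitly, since data.frame() no longer does that by default (stringsAsFactors defaults to FALSE from R 4.0.0).

```r
# Hypothetical layout: 2 character columns, then 1 numeric column
# (the data in the thread had 66 character and 33 numeric columns).
col_names <- c("site", "treatment", "yield")

# One-step form: build the list and attach the names in one structure() call
template1 <- structure(c(as.list(rep("", 2)), as.list(rep(0, 1))),
                       names = col_names)

# Two-step form: build the list, then assign the names
template2 <- c(as.list(rep("", 2)), as.list(rep(0, 1)))
names(template2) <- col_names

# Both forms give the same named list: "" means character, 0 means numeric
stopifnot(identical(template1, template2))

# Inline data standing in for a file such as "clus.dat"
dat <- "A ctl 1.5
B trt 2.25
A trt 3.0"

# scan() returns a list with one component per template entry, each read
# in the mode of the corresponding template value
cols <- scan(text = dat, what = template1, quiet = TRUE)

# Turn the character columns into factors explicitly
is_chr <- vapply(cols, is.character, logical(1))
cols[is_chr] <- lapply(cols[is_chr], factor)

clus_df <- data.frame(cols)
str(clus_df)
```

Using `col_names` rather than `names` for the vector of column names also avoids the confusing-looking `names = names` argument.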

>clus.df <- data.frame(scan("clus.dat", what = template))
>(A second bracket seems to be missing after names =names). 

(It was to go before the comma before the names = names, actually.)

>If so, this code certainly loads successfully, and is *much* faster than
>mine, but gives rather weird results. clus.df doesn't return the table in
>the form you would expect. For example, I get dim(clus.df) [1] 1031 198
>(!) I don't understand the structure function, so can't fix it.

The code above is still untested, but if I have got my brackets right this
time, I suspect it will work, and be as fast as the incorrect version.
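A small sketch of why the misplaced bracket could produce so many columns, using a hypothetical three-element name vector `col_names`. With the bracket in the wrong place, `names = col_names` becomes a third argument to c() rather than an argument to structure(), so the name strings are appended to the list as extra components instead of being attached as names. Assuming the original `names` held all 99 column names, the faulty template would have had 66 + 33 + 99 = 198 components, which would account for the 198 columns reported.

```r
# Hypothetical: 3 column names instead of the 99 in the thread
col_names <- c("a", "b", "c")

# Misplaced bracket: 'names = col_names' is swallowed by c(), so the three
# name strings are appended to the list as extra components
wrong <- structure(c(as.list(rep("", 2)), as.list(rep(0, 1)), names = col_names))

# Intended form: 'names = col_names' is an argument to structure()
right <- structure(c(as.list(rep("", 2)), as.list(rep(0, 1))),
                   names = col_names)

length(wrong)  # 6: the 2 + 1 template components plus the 3 name strings
length(right)  # 3: one component per column, named "a", "b", "c"
```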

>Looks like I am stuck with my method for the time being. Does the method I
>am using (the loop) look OK, though?

Yes, it did.  If I were only doing it once, I can't see that it matters all
that much if it runs a bit slowly.  It is the things you are going to be
doing all the time that you want to speed up.  I think the value of this
example is that it alerts people to the older function, scan(), which
offers greater flexibility than read.table() because it works at a lower
level.

>I am planning to buy the V&R third edition, possibly both volumes, the
>data analysis one and the programming one, though they are rather

Let me repeat: they are (at Springer's fairly insistent request) two books,
not two volumes of a single work.  There are links, of course, but
essentially each stands alone and the focus audience of each is rather
different.  The larger, earlier book is about data analysis and using
S-PLUS (or R, with a few exceptions like Trellis) to do it efficiently and
well.  The second book is about programming and has next to no explicit
data analysis in it.  It is very much a handbook for people who need to use
an S language (and software systems based on it) for programming (where
'S' is the generic term covering all three current dialects of the language).

>expensive, because I am getting a little desperate. 

At the time the first edition of MASS came out we were complimented on how
we had "kept the price below US$40".  At the time I winced: we had nothing
to do with pricing, of course, but I wasn't about to turn down a compliment
and run the risk of seeming to be fishing for more...  Now that Springer
has had the screws put on it and has had to jack its prices through the
roof, I guess we have to wear the odium a little, but really it has nothing
to do with us, honestly.  In fact, when I saw the prices in the catalogue I
was probably more annoyed and disappointed than you were.

>I have had serious
>problems finding sufficiently detailed documentation about Splus/R. The
>online help is often very cryptic, and much of the other available
>documentation is either severely incomplete or outdated. The only other
>option appears to be reading the source code, something which I don't feel
>quite up to, and which is, in any case, an option currently only available
>in R.

You have to realize that writing good documentation or technical material
of any kind is jolly hard work, very time-consuming and can take a terrible
toll on a person's friends, home life and personal well-being.  (Believe
me, but by all means check it out if you are not convinced...)
Compared to writing good documentation, programming is an absolute cinch.
It is not surprising at all to me that in a volunteer project like R the
programming gets done first as the easier bit (though by no means easy!)
and the documentation is at a much less well-developed stage.  

Personally I find reading code pretty tedious too, but ultimately it is the
only really safe way of finding things out (and contrary to what you say,
much of the critical code is available in S-PLUS, too).  I tend to take the
help information as a first cut and then use the code to fill in the bits
that I still cannot sort out.  Of course, I have been in the game for quite
some time, but I do sympathize with someone just coming in for the first
time; we all had to start sometime...

>Thank you for your response.
>                                   Sincerely, Faheem Mitha.

I bet you didn't notice that you had a cc to R-help in your headers!  Oh
well, it's probably a good idea to get the issue out in the open: we do
need more volunteers to help with the R project and writing *good*
documentation is an urgent need, but it is much harder than writing code
and for some reason nowhere near as glamorous.

Bill Venables.
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch
