[Rd] Behaviour of read.table with empty columns
John Fox
jfox at mcmaster.ca
Wed May 9 18:43:32 CEST 2007
Dear Brian (and Gabor),
Thanks -- that makes sense.
John
--------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox
--------------------------------
> -----Original Message-----
> From: r-devel-bounces at r-project.org
> [mailto:r-devel-bounces at r-project.org] On Behalf Of Prof Brian Ripley
> Sent: Wednesday, May 09, 2007 12:05 PM
> To: John Fox
> Cc: r-devel at r-project.org
> Subject: Re: [Rd] Behaviour of read.table with empty columns
>
> On Wed, 9 May 2007, John Fox wrote:
>
> > Dear r-devel list members,
> >
> > I stumbled across the following behaviour of read.table() recently:
> > Suppose that I have the data
> >
> > a " " ""
> > "" "" ""
> >
> > in a file or copied to the clipboard, and issue the command
> >
> >> DF <- read.table("clipboard")
> >> DF
> >   V1 V2 V3
> > 1  a NA NA
> > 2    NA NA
> >
> >> is.na(DF)
> >         V1   V2   V3
> > [1,] FALSE TRUE TRUE
> > [2,] FALSE TRUE TRUE
> >
> > I was surprised by the NAs. Note that they occur only when a column
> > consists entirely of empty strings or strings composed of blanks.
> >
> > On the other hand
> >
> >> data.frame(A=c("", "", ""))
> > A
> > 1
> > 2
> > 3
> >
> > works as I would have expected.
>
> How did you expect R to know that "" meant a character
> column? You are allowed to quote any type of column, so as
> far as read.table is concerned the column is entirely empty
> and so its type is unknown. It defaults to the simplest
> possible type, logical.
>
> The answer is I think to use colClasses="character".
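
A minimal sketch of that suggestion, not taken from the thread itself. It assumes a current version of R and uses the `text=` argument of read.table() in place of the clipboard so that it can be reproduced anywhere; the name `txt` is only illustrative.

```r
# The two example lines from the original post, held in a string
# instead of the clipboard (an assumption made for reproducibility).
txt <- 'a " " ""\n"" "" ""'

# Force every column to character so that no type guessing takes place.
DF <- read.table(text = txt, colClasses = "character")

# Inspect the result: column classes, and whether any cell is NA.
sapply(DF, class)
any(is.na(DF))
```

With the column classes fixed in advance, empty and blank strings should be kept as strings rather than turned into NA.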
>
> It is probably slightly more accurate to say that if
> colClasses is not given, all columns are read as character
> columns, and then converted to the simplest possible type.
> In earlier versions of R you could get NULL columns (if there
> were no rows at all), but now the simplest is logical.
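
The conversion step described above can be tried in isolation with type.convert(), the function that the read.table() help page names for this step. A small sketch, assuming a current R in which the `as.is` argument should be given explicitly; `blank_col` and `mixed_col` are illustrative names.

```r
# A column made up only of empty or blank strings carries no evidence
# of its type, so it is converted to logical NA.
blank_col <- type.convert(c("", " "), as.is = TRUE)
class(blank_col)

# One non-blank, non-numeric entry is enough to keep the column as
# character, and the empty string then stays an empty string.
mixed_col <- type.convert(c("a", ""), as.is = TRUE)
class(mixed_col)
```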
>
> Brian
>
> > A work-around for me was
> >
> >> DF[is.na(DF)] <- ""
> >> DF
> > V1 V2 V3
> > 1 a
> > 2
> >
> > But, as I said, I found the behaviour of read.table() puzzling.
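
A column-wise variant of this work-around, sketched under the assumption of a current R and with the data supplied through `text=` rather than the clipboard; `txt` and `all_na` are illustrative names. It only touches columns that were read as entirely NA, so an NA in an otherwise populated column would be left alone.

```r
# Same example data as in the post, held in a string for reproducibility.
txt <- 'a " " ""\n"" "" ""'
DF <- read.table(text = txt, stringsAsFactors = FALSE)

# Flag the columns in which every value is NA.
all_na <- vapply(DF, function(col) all(is.na(col)), logical(1))

# Overwrite only those columns with empty strings.
DF[all_na] <- ""
```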
> >
> > All this is with R 2.5.0 on a Windows XP Pro SP 2 system.
> >
> > Comments?
> >
> > Thanks,
> > John
> >
> > --------------------------------
> > John Fox, Professor
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario
> > Canada L8S 4M4
> > 905-525-9140x23604
> > http://socserv.mcmaster.ca/jfox
> >
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> --
> Brian D. Ripley, ripley at stats.ox.ac.uk
> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> University of Oxford, Tel: +44 1865 272861 (self)
> 1 South Parks Road, +44 1865 272866 (PA)
> Oxford OX1 3TG, UK Fax: +44 1865 272595
>