[Rd] Behaviour of read.table with empty columns

John Fox jfox at mcmaster.ca
Wed May 9 18:43:32 CEST 2007


Dear Brian (and Gabor),

Thanks -- that makes sense.

John

--------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 

> -----Original Message-----
> From: r-devel-bounces at r-project.org 
> [mailto:r-devel-bounces at r-project.org] On Behalf Of Prof Brian Ripley
> Sent: Wednesday, May 09, 2007 12:05 PM
> To: John Fox
> Cc: r-devel at r-project.org
> Subject: Re: [Rd] Behaviour of read.table with empty columns
> 
> On Wed, 9 May 2007, John Fox wrote:
> 
> > Dear r-devel list members,
> >
> > I stumbled across the following behaviour of read.table() recently:
> > Suppose that I have the data
> >
> > a  " " ""
> > "" ""  ""
> >
> > in a file or copied to the clipboard, and issue the command
> >
> >> DF <- read.table("clipboard")
> >> DF
> >  V1 V2 V3
> > 1  a NA NA
> > 2    NA NA
> >
> >> is.na(DF)
> >        V1   V2   V3
> > [1,] FALSE TRUE TRUE
> > [2,] FALSE TRUE TRUE
> >
> > I was surprised by the NAs. Note that they occur only when a column 
> > consists entirely of empty strings or strings composed of blanks.
> >
> > On the other hand
> >
> >> data.frame(A=c("", "", ""))
> >  A
> > 1
> > 2
> > 3
> >
> > works as I would have expected.
> 
> How did you expect R to know that "" meant a character 
> column?  You are allowed to quote any type of column, so as 
> far as read.table is concerned the column is entirely empty 
> and so its type is unknown.  It defaults to the simplest 
> possible type, logical.
> 
> The answer, I think, is to use colClasses="character".
> 
> It is probably slightly more accurate to say that if 
> colClasses is not given, all columns are read as character 
> columns, and then converted to the simplest possible type.  
> In earlier versions of R you could get NULL columns (if there 
> were no rows at all), but now the simplest is logical.
> 
> Brian
> 
> > A work-around for me was
> >
> >> DF[is.na(DF)] <- ""
> >> DF
> >  V1 V2 V3
> > 1  a
> > 2
> >
> > But, as I said, I found the behaviour of read.table() puzzling.
> >
> > All this is with R 2.5.0 on a Windows XP Pro SP 2 system.
> >
> > Comments?
> >
> > Thanks,
> > John
> 
> -- 
> Brian D. Ripley,                  ripley at stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
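
For readers who want to reproduce the behaviour discussed above, here is a minimal sketch. It assumes a temporary file in place of the clipboard used in the original post, and the names tmp, DF_default and DF_char are illustrative only. It shows the default conversion of all-blank columns to logical, the colClasses="character" remedy suggested in the reply, and the same conversion applied directly with type.convert().

```r
# Write the two data lines from the original post to a temporary file
# (an assumption: the post read them from the clipboard instead).
tmp <- tempfile(fileext = ".txt")
writeLines(c('a  " " ""', '"" ""  ""'), tmp)

# Default behaviour: fields are read as character and then converted,
# so columns holding only empty or blank strings become logical NA.
DF_default <- read.table(tmp)
print(sapply(DF_default, class))
print(is.na(DF_default))

# Suggested remedy: declare every column as character up front,
# which skips the type conversion and keeps the strings as read.
DF_char <- read.table(tmp, colClasses = "character")
print(sapply(DF_char, class))
print(is.na(DF_char))

# The same conversion in isolation: an all-blank character vector
# has no evidence of its type and is converted to logical.
blank <- type.convert(c("", " "), as.is = TRUE)
print(class(blank))

unlink(tmp)
```

The class of V1 in the default case depends on the R version (factor in R 2.5.0, as in the thread; character under the defaults of R 4.0.0 and later), so the sketch makes no claim about it.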


