[R] Reading word by word in a dataset

Thu Nov 4 14:49:52 CET 2004

On Thu, 4 Nov 2004, John wrote:

> Dear Andy,
> Why does my 'read.table()' NOT work in this example?
> I have the error, "subscript out of bounds", as you
> see below. My R version is 1.9.0.
                             ^^^^^

That is your problem. It works in the current version of R, 2.0.0. Using
"NULL" entries in colClasses was not documented in 1.9.0, and was not intended to work.

What does the posting guide say about this?
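In R 2.0.0 and later, a "NULL" entry in colClasses (the quoted string, not the NULL object) makes read.table() skip that column. A minimal sketch, assuming a current R (the text= argument of read.table() is newer than 2.0.0) and using an illustrative vector `lines` in place of the poster's file mtx.ex.1:

```r
# The three sample lines from the question; text= stands in for the file
lines <- c("i1-apple 10$ New_York",
           "i2-banana 5$ London",
           "i3-strawberry 7$ Japan")

# The string "NULL" in colClasses tells read.table() to drop that column
first <- read.table(text = lines,
                    colClasses = c("character", "NULL", "NULL"))
str(first)
first$V1
```

Only the first column survives, still as character because its class was given explicitly.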

> > system("more mtx.ex.1")
> i1-apple 10$ New_York
> i2-banana 5$ London
> i3-strawberry 7$ Japan
> >
> > read.table("mtx.ex.1", colClasses=c("character","NULL","NULL"), fill=T)
> Error in data[[i]] : subscript out of bounds
> >
> > read.table("mtx.ex.1", colClasses=c("character", NULL, NULL), fill=T)
>              V1  V2       V3
> 1      i1-apple 10$ New_York
> 2     i2-banana  5$   London
> 3 i3-strawberry  7$    Japan
> >
> > read.table("mtx.ex.1", colClasses=c("character", NULL, NULL), fill=T)[,1]
> [1] "i1-apple"      "i2-banana"     "i3-strawberry"
> >
> 
> Cheers,
> 
> John
> 
> 
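John's second call only appears to work. Inside c(), NULL objects simply vanish, so c("character", NULL, NULL) is the single string "character", and an unnamed colClasses is recycled across all columns; that is why all three columns come back. A small check, assuming a current R and an illustrative vector `lines`:

```r
# NULL objects are dropped by c(), leaving a length-1 vector
cc <- c("character", NULL, NULL)
length(cc)                  # 1
identical(cc, "character")  # TRUE

lines <- c("i1-apple 10$ New_York",
           "i2-banana 5$ London",
           "i3-strawberry 7$ Japan")

# An unnamed colClasses is recycled, so every column is read as character
all3 <- read.table(text = lines, colClasses = cc)
sapply(all3, class)
```

Writing the skipped columns as the quoted string "NULL" is what keeps them out of the result.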
>  --- "Liaw, Andy" <andy_liaw at merck.com> wrote: 
> > Don't give up on read.table() just yet:
> > 
> > > read.table("clipboard", colClasses=c("character", "NULL", "NULL"), fill=TRUE)
> >              V1
> > 1      i1-apple
> > 2     i2-banana
> > 3 i3-strawberry
> > 
> > Andy
> > 
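The fill=TRUE argument is what lets Andy's call cope with the ragged example in Spencer Graves's message quoted below: short lines are padded with blank fields instead of triggering the "did not have 3 elements" error. A minimal sketch, assuming a current R, with the ragged lines typed in as an illustrative vector rather than read from the clipboard:

```r
# Spencer's ragged example: the second line has only one field
ragged <- c("i1-apple        10$   New_York",
            "i2-banana",
            "i3-strawberry   7$    Japan")

# fill = TRUE pads the short line with blank fields; the two "NULL"
# columns are then discarded, leaving only the first field of each line
first <- read.table(text = ragged, fill = TRUE,
                    colClasses = c("character", "NULL", "NULL"))
first$V1
```

The number of columns is still taken from the longest of the first few lines, so the padding only works when such a line appears early in the file.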
> > > From: Spencer Graves
> > > 
> > >       Uwe and Andy's solutions are great for many applications but won't
> > > work if not all rows have the same number of fields.  Consider for
> > > example the following modification of Lee's example:
> > > 
> > > i1-apple        10$   New_York
> > > i2-banana
> > > i3-strawberry   7$    Japan
> > > 
> > >       If I copy this to "clipboard" and run Andy's code, I get the
> > > following:
> > > 
> > >  > read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
> > > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
> > >     line 2 did not have 3 elements
> > > 
> > >       We can get around this using "scan", then splitting things apart
> > > similar to the way Uwe described:
> > > 
> > >  > dat <-
> > > + scan("clipboard", character(0), sep="\n")
> > > Read 3 items
> > >  > dash <- regexpr("-", dat)
> > >  > dat2 <- substring(dat, pmax(0, dash)+1)
> > >  >
> > >  > blank <- regexpr(" ", dat2)
> > >  > if(any(blank<0))
> > > +   blank[blank<0] <- nchar(dat2[blank<0])
> > >  > substring(dat2, 1, blank)
> > > [1] "apple "      "banana"      "strawberry "
> > > 
> > >       hope this helps.  spencer graves
> > >     
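Spencer's result keeps a trailing blank ("apple ", "strawberry ") because substring(dat2, 1, blank) ends at the position of the blank itself. One alternative, swapped in here for illustration, is a single sub() call that captures the run of non-blank characters after the first "-". The vector `ragged` is a hypothetical stand-in for `dat` as returned by scan():

```r
# Hypothetical stand-in for dat <- scan("clipboard", character(0), sep="\n")
ragged <- c("i1-apple        10$   New_York",
            "i2-banana",
            "i3-strawberry   7$    Japan")

# Skip everything up to the first "-", capture the following non-blank
# characters, and throw away the rest of the line
words <- sub("^[^-]*-([^[:space:]]*).*$", "\\1", ragged)
words
```

Lines without a "-" do not match the pattern and are returned unchanged by sub(), so they would need separate handling.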
> > > Uwe Ligges wrote:
> > > 
> > > > Liaw, Andy wrote:
> > > >
> > > >> Using R-2.0.0 on WinXPPro, cut-and-pasting the data you have:
> > > >>
> > > >>
> > > >>> read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
> > > >>
> > > >>
> > > >>              V1
> > > >> 1      i1-apple
> > > >> 2     i2-banana
> > > >> 3 i3-strawberry
> > > >
> > > >
> > > >
> > > > ... and if only the words after "-" are of interest, the statement can
> > > > be followed by
> > > >
> > > >  sapply(strsplit(...., "-"), "[", 2)
> > > >
> > > >
> > > > Uwe Ligges
> > > >
> > > >
> > > >
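Uwe's one-liner written out in full, with a hypothetical vector `ids` standing in for the elided first argument (for instance, the V1 column returned by Andy's read.table() call):

```r
# Hypothetical stand-in for the first column read by read.table()
ids <- c("i1-apple", "i2-banana", "i3-strawberry")

# strsplit() returns a list of character vectors; "[" with index 2
# picks the piece after the first "-" from each element
after_dash <- sapply(strsplit(ids, "-"), "[", 2)
after_dash
```

Because strsplit() splits at every "-", a name that itself contains a hyphen would be cut short; only the second piece is kept.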
> > > >> HTH,
> > > >> Andy
> > > >>
> > > >>
> > > >>> From: j lee
> > > >>>
> > > >>> Hello All,
> > > >>>
> > > >>> I'd like to read the first word of each line into a new file.
> > > >>> If I have the following data file, how can I get the
> > > >>> first words: apple, banana, strawberry?
> > > >>>
> > > >>> i1-apple        10$   New_York
> > > >>> i2-banana       5$    London
> > > >>> i3-strawberry   7$    Japan
> > > >>>
> > > >>> Has a similar question already been posted to the
> > > >>> list? I am a bit new to R, having a few months of
> > > >>> experience now.
> > > >>>
> > > >>> Cheers,
> > > >>>
> > > >>> John

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595