[R] Reading word by word in a dataset

Thu Nov 4 14:02:03 CET 2004

Dear Andy,
Why does my 'read.table()' NOT work in this example?
I get the error "subscript out of bounds", as you
can see below. My R version is 1.9.0.

> system("more mtx.ex.1")
i1-apple 10$ New_York
i2-banana 5$ London
i3-strawberry 7$ Japan
>
> read.table("mtx.ex.1", colClasses=c("character","NULL","NULL"), fill=T)
Error in data[[i]] : subscript out of bounds
>
> read.table("mtx.ex.1", colClasses=c("character", NULL, NULL), fill=T)
             V1  V2       V3
1      i1-apple 10$ New_York
2     i2-banana  5$   London
3 i3-strawberry  7$    Japan
>
> read.table("mtx.ex.1", colClasses=c("character", NULL, NULL), fill=T)[,1]
[1] "i1-apple"      "i2-banana"     "i3-strawberry"
>

Cheers,

John
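
A note on the two calls above, as a minimal sketch rather than a diagnosis: c() drops bare NULL elements, so c("character", NULL, NULL) is just "character", which read.table() recycles across all columns. That is why the unquoted form returns all three columns. The quoted form c("character", "NULL", "NULL") is the one that asks read.table() to skip columns; Andy's working call was run under R-2.0.0, so the error under 1.9.0 may be a version difference, though the thread does not confirm this. The inline `lines` vector below stands in for the file "mtx.ex.1" and is an assumption made so the example runs without a file.

```r
# Stand-in for the contents of "mtx.ex.1" (assumed, so no file is needed).
lines <- c("i1-apple 10$ New_York",
           "i2-banana 5$ London",
           "i3-strawberry 7$ Japan")

# c() silently drops NULL, so the unquoted form has length 1.
cc_unquoted <- c("character", NULL, NULL)
cc_quoted   <- c("character", "NULL", "NULL")
length(cc_unquoted)   # 1
length(cc_quoted)     # 3

# A length-1 colClasses is recycled: every column is read as character.
full  <- read.table(text = lines, colClasses = cc_unquoted, fill = TRUE)
first <- full[, 1]
first
```

Subsetting with [, 1] after reading everything as character is the same workaround used in the last call above.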

 --- "Liaw, Andy" <andy_liaw at merck.com> wrote: 
> Don't give up on read.table() just yet:
> 
> > read.table("clipboard", colClasses=c("character", "NULL", "NULL"), fill=TRUE)
>              V1
> 1      i1-apple
> 2     i2-banana
> 3 i3-strawberry
> 
> Andy
> 
> > From: Spencer Graves
> > 
> >       Uwe and Andy's solutions are great for many applications but
> > won't work if not all rows have the same number of fields.  Consider,
> > for example, the following modification of Lee's example: 
> > 
> > i1-apple        10$   New_York
> > i2-banana
> > i3-strawberry   7$    Japan
> > 
> >       If I copy this to "clipboard" and run Andy's code, I get the 
> > following: 
> > 
> >  > read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
> > Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
> >     line 2 did not have 3 elements
> > 
> >       We can get around this using "scan", then splitting things
> > apart similar to the way Uwe described: 
> > 
> >  > dat <-
> > + scan("clipboard", character(0), sep="\n")
> > Read 3 items
> >  > dash <- regexpr("-", dat)
> >  > dat2 <- substring(dat, pmax(0, dash)+1)
> >  >
> >  > blank <- regexpr(" ", dat2)
> >  > if(any(blank<0))
> > +   blank[blank<0] <- nchar(dat2[blank<0])
> >  > substring(dat2, 1, blank)
> > [1] "apple "      "banana"      "strawberry "
> > 
> >       hope this helps.  spencer graves
> >     
> > Uwe Ligges wrote:
> > 
> > > Liaw, Andy wrote:
> > >
> > >> Using R-2.0.0 on WinXPPro, cut-and-pasting the data you have:
> > >>
> > >>
> > >>> read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
> > >>
> > >>
> > >>              V1
> > >> 1      i1-apple
> > >> 2     i2-banana
> > >> 3 i3-strawberry
> > >
> > >
> > >
> > > ... and if only the words after "-" are of interest, the statement
> > > can be followed by
> > >
> > >  sapply(strsplit(...., "-"), "[", 2)
> > >
> > >
> > > Uwe Ligges
> > >
> > >
> > >
> > >> HTH,
> > >> Andy
> > >>
> > >>
> > >>> From: j lee
> > >>>
> > >>> Hello All,
> > >>>
> > >>> I'd like to read the first words in lines into a new file.
> > >>> If I have a data file like the following, how can I get the
> > >>> first words: apple, banana, strawberry?
> > >>>
> > >>> i1-apple        10$   New_York
> > >>> i2-banana       5$    London
> > >>> i3-strawberry   7$    Japan
> > >>>
> > >>> Is there any similar question already posted to the
> > >>> list? I am a bit new to R, having a few months of
> > >>> experience now.
> > >>>
> > >>> Cheers,
> > >>>
> > >>> John
> > >>>
> > >>> ______________________________________________
> > >>> R-help at stat.math.ethz.ch mailing list
> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
> > >>> PLEASE do read the posting guide! 
> > >>> http://www.R-project.org/posting-guide.html
> > >>>
> > >>>
> > >>
> > >>
> > >
> > >
> > 
> > 
> > -- 
> > Spencer Graves, PhD, Senior Development Engineer
> > O:  (408)938-4420;  mobile:  (408)655-4567
> > 
> > 
> > 
> 
> 
>
>
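
For the ragged-row case raised in the quoted thread (a line with only one field), a regex-based sketch is one alternative to the scan/regexpr/substring steps. It is not the method anyone in the thread proposed: it treats each line as a string, keeps the text before the first whitespace, then drops everything up to and including the first "-". The `lines` vector is an assumed stand-in for the clipboard or file contents; in practice it might come from readLines().

```r
# Assumed stand-in for the data, including the short second row.
lines <- c("i1-apple        10$   New_York",
           "i2-banana",
           "i3-strawberry   7$    Japan")

# Keep only the first whitespace-delimited field of each line.
# Lines without whitespace are left unchanged.
first_field <- sub("[[:space:]].*$", "", lines)

# Drop the prefix up to and including the first dash.
# Fields without a dash are left unchanged.
words <- sub("^[^-]*-", "", first_field)
words
```

Because each line is handled independently, the number of fields per row does not matter, and no trailing blank is carried into the result.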