[R] Reading word by word in a dataset

Tony Plate tplate at acm.org
Mon Nov 1 22:15:36 CET 2004


Handling input where not all rows have the same number of fields 
seems like a good place to use the "flush" argument to scan() (with a 
single field requested, it skips everything after the first field on each line):

With the following copied to the clipboard:

i1-apple        10$   New_York
i2-banana
i3-strawberry   7$    Japan

do:

 > scan("clipboard", "", flush=TRUE)
Read 3 items
[1] "i1-apple"      "i2-banana"     "i3-strawberry"
 > sub("^[A-Za-z0-9]*-", "", scan("clipboard", "", flush=TRUE))
Read 3 items
[1] "apple"      "banana"     "strawberry"
 >
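
The same idea can be tried without the clipboard. Below is a minimal sketch, assuming the three sample lines are held in a character vector and passed through scan()'s "text" argument; the variable names (lines, first_fields, words) are illustrative only and do not come from the thread.

```r
# Sample lines from the thread; the second line has only one field.
lines <- c("i1-apple        10$   New_York",
           "i2-banana",
           "i3-strawberry   7$    Japan")

# what = "" requests a single character field per line; flush = TRUE
# discards the rest of each line once that field has been read.
first_fields <- scan(text = lines, what = "", flush = TRUE, quiet = TRUE)

# Strip the leading "i1-", "i2-", ... prefix from each first field.
words <- sub("^[A-Za-z0-9]*-", "", first_fields)
print(words)
```

To read from a file instead, the "text" argument would be replaced by a file name as the first argument to scan().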

-- Tony Plate

At Monday 01:59 PM 11/1/2004, Spencer Graves wrote:
>      Uwe and Andy's solutions are great for many applications but won't 
> work unless every row has the same number of fields.  Consider, for 
> example, the following modification of Lee's example:
>i1-apple        10$   New_York
>i2-banana
>i3-strawberry   7$    Japan
>
>      If I copy this to "clipboard" and run Andy's code, I get the following:
> > read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
>Error in scan(file = file, what = what, sep = sep, quote = quote, dec = 
>dec,  :
>    line 2 did not have 3 elements
>
>      We can get around this using "scan", then splitting things apart 
> similar to the way Uwe described:
> > dat <-
>+ scan("clipboard", character(0), sep="\n")
>Read 3 items
> > dash <- regexpr("-", dat)
> > dat2 <- substring(dat, pmax(0, dash)+1)
> >
> > blank <- regexpr(" ", dat2)
> > if(any(blank<0))
>+   blank[blank<0] <- nchar(dat2[blank<0])
> > substring(dat2, 1, blank)
>[1] "apple "      "banana"      "strawberry "
>
>      hope this helps.  spencer graves
>
>Uwe Ligges wrote:
>
>>Liaw, Andy wrote:
>>
>>>Using R-2.0.0 on WinXPPro, cut-and-pasting the data you have:
>>>
>>>
>>>>read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
>>>
>>>
>>>              V1
>>>1      i1-apple
>>>2     i2-banana
>>>3 i3-strawberry
>>
>>
>>
>>... and if only the words after "-" are of interest, the statement can be 
>>followed by
>>
>>  sapply(strsplit(...., "-"), "[", 2)
>>
>>
>>Uwe Ligges
>>
>>
>>
>>>HTH,
>>>Andy
>>>
>>>
>>>>From: j lee
>>>>
>>>>Hello All,
>>>>
>>>>I'd like to read the first word of each line into a new file.
>>>>If I have a data file like the following, how can I get the
>>>>first words: apple, banana, strawberry?
>>>>
>>>>i1-apple        10$   New_York
>>>>i2-banana       5$    London
>>>>i3-strawberry   7$    Japan
>>>>
>>>>Is there any similar question already posted to the
>>>>list? I am a bit new to R, having a few months of
>>>>experience now.
>>>>
>>>>Cheers,
>>>>
>>>>John
>>>>
>>>>______________________________________________
>>>>R-help at stat.math.ethz.ch mailing list
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide! 
>>>>http://www.R-project.org/posting-guide.html
>>>>
>>>
>>>
>>
>>
>
>
>--
>Spencer Graves, PhD, Senior Development Engineer
>O:  (408)938-4420;  mobile:  (408)655-4567
>



