[R] Reading word by word in a dataset
Tony Plate
tplate at acm.org
Mon Nov 1 22:15:36 CET 2004
When not all rows have the same number of fields, this seems like a
good place to use the "flush" argument to scan() (to skip everything
after the first field on the line):
With the following copied to the clipboard:
i1-apple 10$ New_York
i2-banana
i3-strawberry 7$ Japan
do:
> scan("clipboard", "", flush=TRUE)
Read 3 items
[1] "i1-apple" "i2-banana" "i3-strawberry"
> sub("^[A-Za-z0-9]*-", "", scan("clipboard", "", flush=TRUE))
Read 3 items
[1] "apple" "banana" "strawberry"
>
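The "clipboard" connection is Windows-specific, so here is a small sketch of the same idea on an in-memory character vector. It assumes an R version recent enough for scan() to accept a text= argument (R 2.0.0 did not have it); the names txt, first and first_words are made up for the illustration.

```r
# Sample data from the thread; the second line has only one field.
txt <- c("i1-apple 10$ New_York",
         "i2-banana",
         "i3-strawberry 7$ Japan")

# what = "" asks for a single character field per record;
# flush = TRUE discards whatever follows that field on the same line.
first <- scan(text = txt, what = "", flush = TRUE, quiet = TRUE)

# Strip the leading "i1-", "i2-", ... prefix.
first_words <- sub("^[A-Za-z0-9]*-", "", first)
print(first_words)
# [1] "apple"      "banana"     "strawberry"
```

The same two calls should work on a file by passing its name as the first argument to scan() instead of text=.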
-- Tony Plate
At Monday 01:59 PM 11/1/2004, Spencer Graves wrote:
> Uwe and Andy's solutions are great for many applications but won't
> work if not all rows have the same numbers of fields. Consider for
> example the following modification of Lee's example:
>i1-apple 10$ New_York
>i2-banana
>i3-strawberry 7$ Japan
>
> If I copy this to "clipboard" and run Andy's code, I get the following:
> > read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
>Error in scan(file = file, what = what, sep = sep, quote = quote, dec =
>dec, :
> line 2 did not have 3 elements
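One possible way around that error, if read.table() is preferred, is its fill= argument, which pads short rows instead of stopping. The sketch below is only that: it reads from a character vector through text= (an assumption about the R version, since R 2.0.0 did not have it), it reads all columns rather than using colClasses, and txt, dat and first are illustrative names.

```r
# Same ragged data as in the example above.
txt <- c("i1-apple 10$ New_York",
         "i2-banana",
         "i3-strawberry 7$ Japan")

# fill = TRUE pads rows that have fewer fields than the widest row,
# so line 2 no longer triggers "did not have 3 elements".
dat <- read.table(text = txt, fill = TRUE, stringsAsFactors = FALSE)

# Keep only the first column.
first <- dat[[1]]
print(first)
```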
>
> We can get around this using "scan", then splitting things apart
> similar to the way Uwe described:
> > dat <-
>+ scan("clipboard", character(0), sep="\n")
>Read 3 items
> > dash <- regexpr("-", dat)
> > dat2 <- substring(dat, pmax(0, dash)+1)
> >
> > blank <- regexpr(" ", dat2)
> > if(any(blank<0))
>+ blank[blank<0] <- nchar(dat2[blank<0])
> > substring(dat2, 1, blank)
>[1] "apple " "banana" "strawberry "
>
> hope this helps. spencer graves
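Note that the result above keeps the blank that follows "apple" and "strawberry", because substring() is asked to stop at the position of the blank rather than one character before it. A minimal sketch of one way to adjust this, keeping the same regexpr()/substring() approach, follows; the variable names last and words are not from the original message.

```r
dat <- c("i1-apple 10$ New_York",
         "i2-banana",
         "i3-strawberry 7$ Japan")

# Drop everything up to and including the first "-" (if there is one).
dash <- as.integer(regexpr("-", dat, fixed = TRUE))
dat2 <- substring(dat, pmax(0L, dash) + 1L)

# Position of the first blank, or -1 where there is none.
blank <- as.integer(regexpr(" ", dat2, fixed = TRUE))

# Last character to keep: one before the blank, or the end of the string.
last <- ifelse(blank < 0L, nchar(dat2), blank - 1L)

words <- substring(dat2, 1L, last)
print(words)
# [1] "apple"      "banana"     "strawberry"
```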
>
>Uwe Ligges wrote:
>
>>Liaw, Andy wrote:
>>
>>>Using R-2.0.0 on WinXPPro, cut-and-pasting the data you have:
>>>
>>>
>>>>read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
>>>
>>>
>>> V1
>>>1 i1-apple
>>>2 i2-banana
>>>3 i3-strawberry
>>
>>
>>
>>... and if only the words after "-" are of interest, the statement can be
>>followed by
>>
>> sapply(strsplit(...., "-"), "[", 2)
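To make the strsplit() idea concrete, here is a small sketch applied to a plain character vector rather than to the read.table() result; the vector first and the names parts and words are assumptions made for the illustration.

```r
# First fields as they would come out of the earlier step.
first <- c("i1-apple", "i2-banana", "i3-strawberry")

# Split each string at "-"; the result is a list of character vectors.
parts <- strsplit(first, "-", fixed = TRUE)

# "[" with index 2 picks the second piece of each split.
words <- sapply(parts, "[", 2)
print(words)
# [1] "apple"      "banana"     "strawberry"
```

An entry without any "-" would give NA here, since its split has only one piece.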
>>
>>
>>Uwe Ligges
>>
>>
>>
>>>HTH,
>>>Andy
>>>
>>>
>>>>From: j lee
>>>>
>>>>Hello All,
>>>>
>>>>I'd like to read the first word of each line into a new file.
>>>>If I have a data file like the following, how can I get the
>>>>first words: apple, banana, strawberry?
>>>>
>>>>i1-apple 10$ New_York
>>>>i2-banana 5$ London
>>>>i3-strawberry 7$ Japan
>>>>
>>>>Is there any similar question already posted to the
>>>>list? I am a bit new to R, having a few months of
>>>>experience now.
>>>>
>>>>Cheers,
>>>>
>>>>John
>>>>
>>>>______________________________________________
>>>>R-help at stat.math.ethz.ch mailing list
>>>>https://stat.ethz.ch/mailman/listinfo/r-help
>>>>PLEASE do read the posting guide!
>>>>http://www.R-project.org/posting-guide.html
>>>>
>>>
>>>
>>
>>
>
>
>--
>Spencer Graves, PhD, Senior Development Engineer
>O: (408)938-4420; mobile: (408)655-4567
>