[R] Reading word by word in a dataset
Spencer Graves
spencer.graves at pdf.com
Mon Nov 1 23:53:18 CET 2004
Dear Andy & Tony:
That's great. Unfortunately, I still spend most of my life in the
S-Plus world, and read.table in S-Plus 6.2 does not have the "fill"
argument. However, both Tony's solution and my ugly hack work in
S-Plus 6.2 and R 2.0.0.
Thanks again.
Spencer Graves
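
For R users, the "fill" route mentioned above might look like the following minimal sketch. It assumes a reasonably recent R in which read.table() accepts a text= argument (used here in place of the clipboard so the example is reproducible). The names txt, dat and first_words are illustrative only and do not come from the thread.

```r
# Sample data from the thread; line 2 has only one field.
txt <- c("i1-apple 10$ New_York",
         "i2-banana",
         "i3-strawberry 7$ Japan")

# fill = TRUE pads short rows instead of raising an error
# (R only; per the message above, S-Plus 6.2 lacks this argument).
dat <- read.table(text = txt, fill = TRUE, stringsAsFactors = FALSE)

# Keep the first column and strip everything up to the first "-".
first_words <- sub("^[^-]*-", "", dat$V1)
print(first_words)
```

Reading all columns and then discarding the extras is wasteful on large files, so the scan() approach quoted below may be preferable there.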
Tony Plate wrote:
> Trying to make it work when not all rows have the same number of
> fields seems like a good place to use the "flush" argument to scan()
> (to skip everything after the first field on the line):
>
> With the following copied to the clipboard:
>
> i1-apple 10$ New_York
> i2-banana
> i3-strawberry 7$ Japan
>
> do:
>
> > scan("clipboard", "", flush=T)
> Read 3 items
> [1] "i1-apple" "i2-banana" "i3-strawberry"
> > sub("^[A-Za-z0-9]*-", "", scan("clipboard", "", flush=T))
> Read 3 items
> [1] "apple" "banana" "strawberry"
> >
>
> -- Tony Plate
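
The flush approach above can be tried without a clipboard. The sketch below is a reproducible variant, assuming a version of R whose scan() accepts a text= argument; txt, first_fields and first_words are hypothetical names introduced for illustration.

```r
# Sample data from the thread; line 2 has only one field.
txt <- c("i1-apple 10$ New_York",
         "i2-banana",
         "i3-strawberry 7$ Japan")

# what = "" reads character fields; flush = TRUE discards the rest of
# each line after the first field, so short lines cause no error.
first_fields <- scan(text = txt, what = "", flush = TRUE, quiet = TRUE)

# Drop the alphanumeric prefix and the hyphen, as in the call above.
first_words <- sub("^[A-Za-z0-9]*-", "", first_fields)
print(first_words)
```

Because only the first field of each line is ever stored, the number of fields on a line does not matter.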
>
> At Monday 01:59 PM 11/1/2004, Spencer Graves wrote:
>
>> Uwe and Andy's solutions are great for many applications but
>> won't work unless all rows have the same number of fields. Consider,
>> for example, the following modification of Lee's example:
>> i1-apple 10$ New_York
>> i2-banana
>> i3-strawberry 7$ Japan
>>
>> If I copy this to "clipboard" and run Andy's code, I get the
>> following:
>> > read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
>> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
>> line 2 did not have 3 elements
>>
>> We can get around this using "scan", then splitting things apart
>> similar to the way Uwe described:
>> > dat <-
>> + scan("clipboard", character(0), sep="\n")
>> Read 3 items
>> > dash <- regexpr("-", dat)
>> > dat2 <- substring(dat, pmax(0, dash)+1)
>> >
>> > blank <- regexpr(" ", dat2)
>> > if(any(blank<0))
>> + blank[blank<0] <- nchar(dat2[blank<0])
>> > substring(dat2, 1, blank)
>> [1] "apple " "banana" "strawberry "
>>
>> hope this helps. spencer graves
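
Note that the output above keeps a trailing blank on "apple " and "strawberry ", because substring() is asked to stop at the position of the blank itself. A possible variant that stops one character earlier is sketched below. It uses a literal vector in place of the clipboard, and the names end and first_words are illustrative additions, not part of the original hack.

```r
# Sample data from the thread, as scan(..., sep = "\n") would return it.
dat <- c("i1-apple 10$ New_York",
         "i2-banana",
         "i3-strawberry 7$ Japan")

# Position of the first "-" in each line (-1 if absent).
dash <- as.integer(regexpr("-", dat, fixed = TRUE))
dat2 <- substring(dat, pmax(0, dash) + 1)

# Position of the first blank after the hyphen (-1 if absent).
blank <- as.integer(regexpr(" ", dat2, fixed = TRUE))

# Stop one character before the blank, or at the end of the string
# when the line has no blank at all.
end <- ifelse(blank < 0, nchar(dat2), blank - 1L)
first_words <- substring(dat2, 1, end)
print(first_words)
```

Only regexpr(), substring(), pmax(), nchar() and ifelse() are used, in the same spirit as the original hack. The fixed= argument is written for R and has not been checked against S-Plus 6.2.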
>>
>> Uwe Ligges wrote:
>>
>>> Liaw, Andy wrote:
>>>
>>>> Using R-2.0.0 on WinXPPro, cut-and-pasting the data you have:
>>>>
>>>>
>>>>> read.table("clipboard", colClasses=c("character", "NULL", "NULL"))
>>>>
>>>>              V1
>>>> 1      i1-apple
>>>> 2     i2-banana
>>>> 3 i3-strawberry
>>>
>>>
>>> ... and if only the words after "-" are of interest, the statement
>>> can be followed by
>>>
>>> sapply(strsplit(...., "-"), "[", 2)
>>>
>>>
>>> Uwe Ligges
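
As a concrete illustration of the strsplit() step, here is a minimal sketch. The data frame dat stands in for the read.table() result shown above, and passing its V1 column is an assumption about what the elided argument refers to.

```r
# Stand-in for the read.table() result quoted above (assumed shape).
dat <- data.frame(V1 = c("i1-apple", "i2-banana", "i3-strawberry"),
                  stringsAsFactors = FALSE)

# Split each entry at "-" and take the second piece of every split.
words <- sapply(strsplit(dat$V1, "-", fixed = TRUE), "[", 2)
print(words)
```

One caveat: an entry with more than one hyphen would be cut at the second hyphen, since only the second piece is kept.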
>>>
>>>
>>>
>>>> HTH,
>>>> Andy
>>>>
>>>>
>>>>> From: j lee
>>>>>
>>>>> Hello All,
>>>>>
>>>>> I'd like to read the first word of each line into a new file.
>>>>> If I have a data file like the following, how can I get the
>>>>> first words: apple, banana, strawberry?
>>>>>
>>>>> i1-apple 10$ New_York
>>>>> i2-banana 5$ London
>>>>> i3-strawberry 7$ Japan
>>>>>
>>>>> Is there any similar question already posted to the
>>>>> list? I am a bit new to R, having a few months of
>>>>> experience now.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> John
>>>>>
>>>>> ______________________________________________
>>>>> R-help at stat.math.ethz.ch mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide!
>>>>> http://www.R-project.org/posting-guide.html
>>>>>
>> --
>> Spencer Graves, PhD, Senior Development Engineer
>> O: (408)938-4420; mobile: (408)655-4567