[R] Pasting data into scan()
Gabor Grothendieck
ggrothendieck at gmail.com
Tue May 2 03:34:39 CEST 2006
On 5/1/06, Murray Jorgensen <maj at stats.waikato.ac.nz> wrote:
> The file TENSILE.DAT from the Hand et al "Handbook of Small Data Sets"
> looks like this:
>
> 0.023 0.032 0.054 0.069 0.081 0.094
> 0.105 0.127 0.148 0.169 0.188 0.216
> 0.255 0.277 0.311 0.361 0.376 0.395
> 0.432 0.463 0.481 0.519 0.529 0.567
> 0.642 0.674 0.752 0.823 0.887 0.926
>
> except that my mail client has replaced the tab separators by blanks. If
> I paste this data into R 2.2.1 what I get is
>
> > strength <- scan()
> 1: 0.0230.0320.0540.0690.0810.094
> 1: 0.1050.1270.1480.1690.1880.216
> Error in scan() : scan() expected 'a real', got
> '0.0230.0320.0540.0690.0810.094'
> > 0.2550.2770.3110.3610.3760.395
> Error: syntax error in "0.2550.2770"
> > 0.4320.4630.4810.5190.5290.567
> Error: syntax error in "0.4320.4630"
> > 0.6420.6740.7520.8230.8870.926
> Error: syntax error in "0.6420.6740"
>
> Aha! I thought, what I need is scan(sep = "\t")
> but this generates the same error messages.
1. If the separators are still there but you
don't know what they are, try this.
It replaces every character that cannot appear
in a number with a space:
L <- readLines("clipboard")
L <- gsub("[^-0-9.]", " ", L)
scan(textConnection(L))
2. If the separators are completely lost you may still be
able to recover the data if you can assume that every
number is of the form d.ddd where d is a digit. Just
search for that pattern and replace it with itself
followed by a space:
L <- readLines("clipboard")
L <- gsub("([0-9][.][0-9][0-9][0-9])", "\\1 ", L)
scan(textConnection(L))
3. A Google search for tensile.dat finds a data set
that looks like yours. Try this:
URL <- "http://statistics.byu.edu/resources/files/datasets/tensile.dat"
scan(URL)
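
To try approaches 1 and 2 without depending on the clipboard, a minimal sketch along the following lines should work. The input strings and the names raw, fused, x and y are made up for illustration: raw mimics data with mixed separators, and fused is the first line as it appeared in the failed paste. Note that the character class in approach 1 would also split numbers written in exponent notation such as 1e-5, so this assumes plain decimals.

```r
# Approach 1: separators exist but are unknown (hypothetical sample input)
raw <- c("0.023\t0.032;0.054", "0.105 0.127,0.148")
# Replace every character that is not a digit, minus sign or dot with a space
cleaned <- gsub("[^-0-9.]", " ", raw)
# scan() splits on whitespace by default
x <- scan(textConnection(cleaned), quiet = TRUE)

# Approach 2: separators lost, every number assumed to have the form d.ddd
fused <- "0.0230.0320.0540.0690.0810.094"
# Re-emit each d.ddd match (\\1) followed by a space
spaced <- gsub("([0-9][.][0-9][0-9][0-9])", "\\1 ", fused)
y <- scan(textConnection(spaced), quiet = TRUE)

print(x)
print(y)
```

Both calls should return six numbers each. Swapping textConnection(...) for the result of readLines("clipboard") gives the clipboard versions shown above.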