[R] Pasting data into scan()

Gabor Grothendieck ggrothendieck at gmail.com
Tue May 2 03:34:39 CEST 2006


On 5/1/06, Murray Jorgensen <maj at stats.waikato.ac.nz> wrote:
> The file TENSILE.DAT from the Hand et al "Handbook of Small Data Sets"
> looks like this:
>
> 0.023   0.032   0.054   0.069   0.081   0.094
> 0.105   0.127   0.148   0.169   0.188   0.216
> 0.255   0.277   0.311   0.361   0.376   0.395
> 0.432   0.463   0.481   0.519   0.529   0.567
> 0.642   0.674   0.752   0.823   0.887   0.926
>
> except that my mail client has replaced the tab separators by blanks. If
> I paste this data into R 2.2.1 what I get is
>
>  > strength <- scan()
> 1: 0.0230.0320.0540.0690.0810.094
> 1: 0.1050.1270.1480.1690.1880.216
> Error in scan() : scan() expected 'a real', got
> '0.0230.0320.0540.0690.0810.094'
>  > 0.2550.2770.3110.3610.3760.395
> Error: syntax error in "0.2550.2770"
>  > 0.4320.4630.4810.5190.5290.567
> Error: syntax error in "0.4320.4630"
>  > 0.6420.6740.7520.8230.8870.926
> Error: syntax error in "0.6420.6740"
>
> Aha! I thought, what I need is     scan(sep = "\t")
> but this generates the same error messages.

1. If the separators are still there but you don't
know what they are, try this.  It replaces every
character that cannot appear in a number with a space:

L <- readLines("clipboard")
L <- gsub("[^-0-9.]", " ", L)
scan(textConnection(L))
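
Here is a small self-contained sketch of the same idea that runs
without a clipboard.  The two input lines and their mixed separators
are made up for illustration; they simply stand in for whatever
readLines("clipboard") would return:

```r
# Made-up stand-in for readLines("clipboard"): the separators vary
# (semicolon, tab, comma, blank) and are assumed to be unknown.
L <- c("0.023;0.032\t0.054", "0.105,0.127 0.148")

# Replace every character that is not a minus sign, a digit or a
# decimal point with a space.
L <- gsub("[^-0-9.]", " ", L)

# Read the cleaned lines through a text connection and close it afterwards.
tc <- textConnection(L)
x <- scan(tc, quiet = TRUE)
close(tc)
x
```

Note that the pattern also blanks out letters, so numbers written in
scientific notation such as 1e-3 would be split apart by it.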

2. If the separators are completely lost you may still be
able to recover the data if you can assume that every
number is of the form d.ddd where d is a digit.  Just
search for that pattern and replace it with itself
followed by a space:

L <- readLines("clipboard")
L <- gsub("([0-9][.][0-9][0-9][0-9])", "\\1 ", L)
scan(textConnection(L))
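
As a sketch of how that behaves, here it is applied to the first two
fused lines from your session, typed in as literals in place of the
clipboard.  It relies on the same assumption as above, namely that
every value has exactly one digit before the point and three after it:

```r
# The first two fused lines from the original post, in place of
# readLines("clipboard").
L <- c("0.0230.0320.0540.0690.0810.094",
       "0.1050.1270.1480.1690.1880.216")

# Put a space after every d.ddd group; {3} is shorthand for three
# repetitions of [0-9].
L <- gsub("([0-9][.][0-9]{3})", "\\1 ", L)

# Read the separated values through a text connection and close it.
tc <- textConnection(L)
strength <- scan(tc, quiet = TRUE)
close(tc)
strength
```

If some values had a different number of decimals the groups would be
cut in the wrong places, so it is worth checking the count of values
read against what you expect.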

3. A Google search for tensile.dat turns up a data set
that looks like yours.  Try this:

URL <- "http://statistics.byu.edu/resources/files/datasets/tensile.dat"
scan(URL)



