[R] Can I improve the efficiency of my scan() command?

Fri Apr 11 23:14:20 CEST 2003

On Sat, 12 Apr 2003, Ko-Kang Kevin Wang wrote:

> Hi,
>
> Suppose I use the following code to read in a data set.
>
> ###############################################
> > rating <- scan("../Data/Rating.csv",
> +                what = list(
> +                  usage = "",
> +                  mileage = 0,
> +                  sex = "",
> +                  excess = "",
> +                  ncd = "",
> +                  primage = "",
> +                  minage = "",
> +                  drivers = "",
> +                  district = "",
> +                  cargroup = "",
> +                  car.age = 0,
> +                  wsclms = "",
> +                  adclms = "",
> +                  ftclms = "",
> +                  pdclms = "",
> +                  piclms = "",
> +                  adincur = 0,
> +                  pdincur = 0,
> +                  wsincur = 0,
> +                  ftincur = 0,
> +                  piincur = 0,
> +                  record = 0,
> +                  days = 0,
> +                  minagen = 0,
> +                  primagen = 0),
> +                sep=",", quiet = TRUE, skip = 1)
> > rating.df <- as.data.frame(rating)
> > rating.df <- rating.df[, c(-6, -7, -22)]
> > attach(rating.df)
> > summary(rating.df)
<snip>
> #########################################################################
>
> It works all right, but I'm wondering whether there is a more efficient
> way: the script above takes about 10 minutes to run on my 300,000 x 25
> CSV file.
>

It should be quicker not to convert to a data frame.  You can just keep
the data as a list of vectors and lapply() the summary() function.
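
A minimal sketch of that suggestion, assuming a tiny made-up three-column CSV written to a temporary file in place of Rating.csv (the file contents and the names tmp, rating.summary and usage.counts are illustrative only, not from the original post). It keeps the result of scan() as a list, drops a field by position, and applies summary() to each vector:

```r
# Toy data standing in for Rating.csv (made up for illustration).
tmp <- tempfile(fileext = ".csv")
writeLines(c("usage,mileage,sex",
             "private,12000,M",
             "business,30000,F",
             "private,8000,F"), tmp)

# Same scan() pattern as in the question, with only three fields.
rating <- scan(tmp,
               what = list(usage = "", mileage = 0, sex = ""),
               sep = ",", quiet = TRUE, skip = 1)

# Drop unwanted fields by position; this plays the role of
# rating.df[, c(-6, -7, -22)] but works directly on the list.
rating <- rating[-3]

# Summarise each vector without ever building a data frame.
rating.summary <- lapply(rating, summary)
print(rating.summary)

# summary() on a character vector only reports length, class and mode;
# table() is one way to get counts per level instead.
usage.counts <- table(rating$usage)
print(usage.counts)

unlink(tmp)
```

Note that the fields read as "" stay character vectors in the list, so their summaries will look different from the per-level counts a data frame of factors would show; table() or factor() on the individual vectors recovers those counts.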

	-thomas