[R] Can I improve the efficiency of my scan() command?
Thomas Lumley
tlumley at u.washington.edu
Fri Apr 11 23:14:20 CEST 2003
On Sat, 12 Apr 2003, Ko-Kang Kevin Wang wrote:
> Hi,
>
> Suppose I use the following code to read in a data set.
>
> ###############################################
> > rating <- scan("../Data/Rating.csv",
> + what = list(
> + usage = "",
> + mileage = 0,
> + sex = "",
> + excess = "",
> + ncd = "",
> + primage = "",
> + minage = "",
> + drivers = "",
> + district = "",
> + cargroup = "",
> + car.age = 0,
> + wsclms = "",
> + adclms = "",
> + ftclms = "",
> + pdclms = "",
> + piclms = "",
> + adincur = 0,
> + pdincur = 0,
> + wsincur = 0,
> + ftincur = 0,
> + piincur = 0,
> + record = 0,
> + days = 0,
> + minagen = 0,
> + primagen = 0),
> + sep=",", quiet = TRUE, skip = 1)
> > rating.df <- as.data.frame(rating)
> > rating.df <- rating.df[, c(-6, -7, -22)]
> > attach(rating.df)
> > summary(rating.df)
<snip>
> #########################################################################
>
> It worked all right, but I'm just wondering if there is a more efficient
> way (it takes about 10 minutes to run the above script on my 300,000 x
> 25 CSV file)?
>
It should be quicker not to convert to a data frame. You can just keep
the data as a list of vectors and lapply() the summary() function.
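
A minimal sketch of that suggestion, assuming a small temporary CSV as a stand-in for ../Data/Rating.csv (the three columns and their values here are hypothetical, chosen only to keep the example short). It skips as.data.frame() entirely and summarises the list that scan() returns:

```r
# Hypothetical stand-in for ../Data/Rating.csv: a tiny CSV with a header row.
tmp <- tempfile(fileext = ".csv")
writeLines(c("usage,mileage,sex",
             "private,12000,M",
             "business,30500,F",
             "private,8000,F"), tmp)

# Read the columns into a plain list of vectors, as in the original call.
rating <- scan(tmp,
               what = list(usage = "", mileage = 0, sex = ""),
               sep = ",", quiet = TRUE, skip = 1)

# Drop an unwanted component by position; the result is still a list.
rating <- rating[-3]

# Summarise each vector directly, with no as.data.frame() step.
rating.summary <- lapply(rating, summary)
print(rating.summary)

# Remove the temporary file.
unlink(tmp)
```

Note that summary() on a character vector reports only its length, class and mode. If per-level counts are wanted for the character columns, something like lapply(rating, function(x) if (is.character(x)) table(x) else summary(x)) may be closer to what the data frame summary showed.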
-thomas