[R] Can I improve the efficiency of my scan() command?
Pierre Kleiber
pkleiber at honlab.nmfs.hawaii.edu
Sat Apr 12 00:07:49 CEST 2003
Ko-Kang Kevin Wang wrote:
> Hi,
>
> Suppose I use the following codes to read in a data set.
>
> ###############################################
>
>>rating <- scan("../Data/Rating.csv",
>
> + what = list(
> + usage = "",
> + mileage = 0,
> + sex = "",
> + excess = "",
> + ncd = "",
> + primage = "",
> + minage = "",
> + drivers = "",
> + district = "",
> + cargroup = "",
> + car.age = 0,
> + wsclms = "",
[...]
>
> #########################################################################
>
> It worked all right, but I'm just wondering if there is a more efficient
> way (it takes about 10 minutes to run the above script for my 300,000 x
> 25 CSV file)?
>
> For example, the CSV file has 25 columns but I don't need 3 of them (6, 7,
> and 22). What I have done is to scan them in anyway, convert the list
> into a data frame, and then remove those 3 columns. I just wonder if it is
> possible to simply ignore them in scan() to make the process faster?
>
It might not make a lot of difference in your case, where you are
reading many fields and ignoring only a few, but if you want to read
a few fields out of many, it helps to preprocess the input file with a
utility such as awk. The following, for example, picks up fields 1, 2,
and 4:
> con <- pipe("awk -F , '{print $1,$2,$4}' ../Data/Rating.csv")
> rating <- scan(con, what = list(
+   usage = "",
+   mileage = 0,
+   excess = ""),
+   quiet = TRUE, skip = 1)
> close(con)
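The same field selection could also be done with cut instead of awk
(assuming, as above, a plain comma-separated file with no quoted fields
containing commas); since cut keeps the commas, scan() then needs
sep = ",", for instance:
> con <- pipe("cut -d, -f1,2,4 ../Data/Rating.csv")
> rating <- scan(con, what = list(
+   usage = "",
+   mileage = 0,
+   excess = ""),
+   sep = ",", quiet = TRUE, skip = 1)
> close(con)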
I do this sort of thing a lot using various utilities; so I've defined
the following function to take care of opening and closing the
connection:
scanpipe <- function(x, ...) {
  con <- pipe(x)          # open a connection to the shell command
  out <- scan(con, ...)   # any further arguments are passed on to scan()
  close(con)
  out
}
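With that, the awk example above could be written, for instance, as:
rating <- scanpipe("awk -F , '{print $1,$2,$4}' ../Data/Rating.csv",
                   what = list(usage = "", mileage = 0, excess = ""),
                   quiet = TRUE, skip = 1)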
--
-----------------------------------------------------------------
Pierre Kleiber Email: pkleiber at honlab.nmfs.hawaii.edu
Fishery Biologist Tel: 808 983-5399/737-7544
NOAA FISHERIES - Honolulu Laboratory Fax: 808 983-2902
2570 Dole St., Honolulu, HI 96822-2396
-----------------------------------------------------------------