[R] Can I improve the efficiency of my scan() command?
Liaw, Andy
andy_liaw at merck.com
Sat Apr 12 00:28:22 CEST 2003
> From: Pierre Kleiber [mailto:pkleiber at honlab.nmfs.hawaii.edu]
>
> Ko-Kang Kevin Wang wrote:
[snipped]
> >
> > It worked all right, but I'm just wondering if there is a more
> > efficient way (it takes about 10 minutes to run the above script
> > for my 300,000 x 25 CSV file)?
> >
> > For example, the CSV file has 25 columns, but I don't need 3 of
> > them (6, 7, and 22). What I have done is to scan them in anyway,
> > convert the list into a data frame, and then remove the 3 columns.
> > Just wondering if it is possible to simply ignore them in scan()
> > to make the process faster?
> >
>
>
> It might not make a lot of difference in your case, where you are
> reading many fields and want to ignore only a few; but if you want
> to read a few fields out of many, it helps to preprocess the input
> file with, for example, awk, as in the following, which picks up
> fields 1, 3, and 4:
>
> > con <- pipe("awk -F , '{print $1, $3, $4}' ../Data/Rating.csv")
> > rating <- scan(con, what = list(
> +     usage = "",
> +     mileage = 0,
> +     excess = ""),
> +     quiet = TRUE, skip = 1)
> > close(con)
Or even pipe("cut -d, -f1,3-4 ...")
Andy
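
To make the difference between the two preprocessing commands concrete, here is a small shell sketch on a made-up two-row file with the same layout (file name and data are hypothetical). Note that awk re-joins the selected fields with spaces, which scan()'s default whitespace separator handles, whereas cut keeps the comma delimiter, so the scan() call reading its output would need sep = ",":

```shell
# Tiny CSV standing in for Rating.csv (hypothetical data)
printf 'usage,junk,mileage,excess\nprivate,x,12000,250\nbusiness,y,34000,500\n' > rating.csv

# awk: selected fields come out space-separated (awk's default OFS)
awk -F , '{print $1, $3, $4}' rating.csv > by_awk.txt

# cut: same fields, but the comma delimiter is preserved
cut -d, -f1,3-4 rating.csv > by_cut.txt

sed -n 2p by_awk.txt   # private 12000 250
sed -n 2p by_cut.txt   # private,12000,250
```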
>
> I do this sort of thing a lot using various utilities; so I've defined
> the following function to take care of opening and closing the
> connection:
>
> scanpipe <- function(x, ...) {
>     con <- pipe(x)
>     on.exit(close(con))  # close the connection even if scan() fails
>     scan(con, ...)
> }
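
Worth noting: scan() can also skip fields by itself, with no external preprocessing. A NULL component in the what list tells scan() to discard the corresponding field (see ?scan), so the unwanted columns never have to be read in and deleted afterwards. A minimal sketch for the file above, assuming an unwanted second column (the column names are made up):

```r
## A NULL entry in 'what' makes scan() skip that field entirely
con <- file("../Data/Rating.csv")
rating <- scan(con, what = list(usage = "", NULL, mileage = 0, excess = ""),
               sep = ",", skip = 1, quiet = TRUE)
close(con)
```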
>
>
> --
> -----------------------------------------------------------------
> Pierre Kleiber Email: pkleiber at honlab.nmfs.hawaii.edu
> Fishery Biologist Tel: 808 983-5399/737-7544
> NOAA FISHERIES - Honolulu Laboratory Fax: 808 983-2902
> 2570 Dole St., Honolulu, HI 96822-2396
> -----------------------------------------------------------------
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help
>