[R] how to make the code more efficient using lapply

Fri May 25 09:21:44 CEST 2018

Hi Stephen,
I am not sure that the "for loop" is the source of slowness.
You seem to be doing a lot of unnecessary work each time through the loop.
e.g. no need to check if it's the last file, just move that section outside
of the loop.
It will be executed when the loop finishes. As it is you are calling
list.files() each time
through the loop which could be slow.

In any case here's a possible way to do it. Warning: untested!

f <- function(fn) {
  temp<-read_xlsx(fn,sheet=1,range=cell_cols(c(1,30,38:42)))
  temp<-temp[temp$Id %in% c("geneA","geneB","geneC"),]
}
myL <- lapply( X=list.files(), FUN=f )
temp.df.all<-do.call("rbind",myL)
names(temp.df.all)<-gsub("^.*] ","",names(temp.df.all))
write_xlsx(temp.df.all, path="output.xlsx")

HTH,
Eric

On Fri, May 25, 2018 at 9:24 AM, Stephen HonKit Wong <stephen66 using gmail.com>
wrote:

> Dear All,
>
> I have a following for-loop code which is basically intended to read in
> many excel files (each file has many columns and rows) in a directory and
> extract the some rows and columns out of each file and then combine them
> together into a dataframe. I use for loop which can do the work but quite
> slow. How to make it faster using lapply function ?  Thanks in advance!
>
>
>
> temp.df<-c() # create an empty list to store the extracted result from each
> excel file inside for-loop
>
>
> for (i in list.files()) {  # loop through each excel file in the directory
>
>   temp<-read_xlsx(i,sheet=1,range=cell_cols(c(1,30,38:42)))  # from
> package
> "readxl" to read in excel file
>
>   temp<-temp[grep("^geneA$|^geneB$|^geneC$",temp$Id),]   # extract rows
> based on temp$id
>
>   names(temp)<-gsub("^.*] ","",names(temp)) # clean up column names
>
>   temp.df<-append(temp.df, list(as.data.frame(temp))) # change the
> dataframe to list, so it can be append to list.
>
>   if (i == list.files()[length(list.files())]){ # if it is last excel
> file,
> then combine all the rows in the list into a dataframe because they all
> have same column names
>
>     temp.df.all<-do.call("rbind",temp.df)
>
>     write_xlsx(temp.df.all, path="output.xlsx")  # write_xlsx from package
> writexl.
>
>   }
>
>   }
>
>
>
>
> *Stephen*
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/
> posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]