[R] how to make the code more efficient using lapply

Fri May 25 08:24:12 CEST 2018

Dear All,

I have a following for-loop code which is basically intended to read in
many excel files (each file has many columns and rows) in a directory and
extract the some rows and columns out of each file and then combine them
together into a dataframe. I use for loop which can do the work but quite
slow. How to make it faster using lapply function ?  Thanks in advance!

temp.df<-c() # create an empty list to store the extracted result from each
excel file inside for-loop

for (i in list.files()) {  # loop through each excel file in the directory

  temp<-read_xlsx(i,sheet=1,range=cell_cols(c(1,30,38:42)))  # from package
"readxl" to read in excel file

  temp<-temp[grep("^geneA$|^geneB$|^geneC$",temp$Id),]   # extract rows
based on temp$id

  names(temp)<-gsub("^.*] ","",names(temp)) # clean up column names

  temp.df<-append(temp.df, list(as.data.frame(temp))) # change the
dataframe to list, so it can be append to list.

  if (i == list.files()[length(list.files())]){ # if it is last excel file,
then combine all the rows in the list into a dataframe because they all
have same column names

    temp.df.all<-do.call("rbind",temp.df)

    write_xlsx(temp.df.all, path="output.xlsx")  # write_xlsx from package
writexl.

  }

  }

*Stephen*

	[[alternative HTML version deleted]]