[R] readxl question

Thu Aug 27 17:46:50 CEST 2020

I clearly didn't read well enough. As Petr pointed out, there is also 
the col_names argument.

```
# Solution 4a

map_dfr(files, function(cur_file, ranges){
   map_dfc(ranges, function(cur_range, df){
     read_excel(cur_file, sheet = 1, col_names = cur_range, range = 
cur_range)
   }, df = df)
}, ranges = ranges, .id = "filename")

```

On 2020-08-27 17:33, Ulrik Stervbo via R-help wrote:
> Hi Thomas,
> 
> I am not familiar with the use of the range argument, but it seems to
> me that the cell value becomes the column name. This might be fine,
> but you might get into trouble if you have repeated cell values since
> as.data.frame() will fix these.
> 
> I am also not sure about what you want, but this seems to capture your
> example (reading the same cells in a number of files):
> 
> ```
> library(readxl)
> 
> # Create test set
> path <- readxl_example("geometry.xls")
> 
> read_xls(path) # See the content
> 
> example_file1 <- tempfile(fileext = ".xls")
> example_file2 <- tempfile(fileext = ".xls")
> 
> file.copy(path, example_file1, overwrite = TRUE)
> file.copy(path, example_file2, overwrite = TRUE)
> 
> # Solve the problem using loops
> files <- c(example_file1, example_file2)
> ranges <- c("B4", "C5", "D6")
> 
> fr <- lapply(ranges, function(cur_range, files){
>   x <- lapply(files, read_xls, sheet = 1, range = cur_range)
>   t(as.data.frame(x))
> }, files = files)
> 
> # Loop over fr and save content if needed
> ```
> 
> A couple of variations over the theme, where the cell content is
> accessed after reading the file. This will not work well if the data
> in the excel files does not start at A1, but if you can adjust for
> this it should work just fine
> 
> ```
> # Solution #2
> 
> # Read the whole excel file, and access just the column - row
> # This will give really unexpected results if the data does not start 
> in the
> # cell A1 as is the case for geometry.xls. Also, it does not work with 
> ranges
> # spaning more than a single cell
> files <- rep(readxl_example("datasets.xlsx"), 3)
> ranges <- c("B4", "C5", "D6")
> 
> # Loop over the files to avoid re-reading
> fr <- lapply(files, function(cur_file, ranges){
>   df <- read_excel(cur_file, sheet = 1)
>   x <- lapply(ranges, function(cur_range, df){
>     cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
>     df[cr$row, cr$col][[1]]
>   }, df = df)
>   as.data.frame(setNames(x, ranges))
> 
> }, ranges = ranges)
> 
> # Solution 3
> # Like solution 2 but using purr
> 
> library(purrr)
> 
> files <- rep(readxl_example("datasets.xlsx"), 3)
> ranges <- c("B4", "C5", "D6")
> 
> map_dfr(files, function(cur_file, ranges){
>   map_dfc(ranges, function(cur_range, df){
>     df <- read_excel(cur_file, sheet = 1)
>     cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
>     setNames(df[cr$row, cr$col], cur_range)
>   }, df = df)
> 
> }, ranges = ranges)
> 
> # Solution 4
> # Like solution 3, but with the addition of the file name and producing 
> a single
> # data.frame at the end
> 
> library(purrr)
> 
> path <- readxl_example("datasets.xls")
> example_file1 <- tempfile(fileext = "_1.xls")
> example_file2 <- tempfile(fileext = "_2.xls")
> example_file3 <- tempfile(fileext = "_3.xls")
> 
> file.copy(path, example_file1, overwrite = TRUE)
> file.copy(path, example_file2, overwrite = TRUE)
> file.copy(path, example_file3, overwrite = TRUE)
> 
> files <- c(example_file1, example_file2, example_file3)
> 
> # Name the file paths with the file names. We can them make use of the 
> .id
> # argument to map_dfr()
> files <- setNames(files, basename(files))
> ranges <- c("B4", "C5", "D6")
> 
> map_dfr(files, function(cur_file, ranges){
>   map_dfc(ranges, function(cur_range, df){
>     df <- read_excel(cur_file, sheet = 1)
>     cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
>     setNames(df[cr$row, cr$col], cur_range)
>   }, df = df)
> }, ranges = ranges, .id = "filename")
> ```
> 
> HTH
> Ulrik
> 
> On 2020-08-26 15:38, PIKAL Petr wrote:
>> Hi
>> 
>> As OP has only about 250 files and in read_excel you cannot specify 
>> several
>> ranges at once, reading those values separately and concatenating them
>> together in one step seems to be the most efficient way. One probably 
>> could
>> design such function, but time spent on the function performing the 
>> task
>> only once is probably bigger than performing 250*3 reads.
>> 
>> I see inefficiency in writing each column into separate text file and
>> coppying it back to Excel file.
>> 
>> Cheers
>> Petr
>> 
>>> -----Original Message-----
>>> From: Upton, Stephen (Steve) (CIV) <scupton using nps.edu>
>>> Sent: Wednesday, August 26, 2020 2:44 PM
>>> To: PIKAL Petr <petr.pikal using precheza.cz>; Thomas Subia 
>>> <tgs77m using yahoo.com>
>>> Cc: r-help using r-project.org
>>> Subject: RE: [R] readxl question
>>> 
>>> From your example, it appears you are reading in the same excel file 
>>> for
>>> each function to get a value. I would look at creating a function 
>>> that
>>> extracts what you need from each file all at once, rather than 
>>> separate
>>> reads.
>>> 
>>> Stephen C. Upton
>>> SEED (Simulation Experiments & Efficient Designs) Center for Data 
>>> Farming
>>> SEED Center website: https://harvest.nps.edu
>>> 
>>> -----Original Message-----
>>> From: R-help [mailto:r-help-bounces using r-project.org] On Behalf Of PIKAL 
>>> Petr
>>> Sent: Wednesday, August 26, 2020 3:50 AM
>>> To: Thomas Subia <tgs77m using yahoo.com>
>>> Cc: r-help using r-project.org
>>> Subject: Re: [R] readxl question
>>> 
>>> NPS WARNING: *external sender* verify before acting.
>>> 
>>> 
>>> Hi
>>> 
>>> 
>>> Are you sure that your command read values from respective cells?
>>> 
>>> I tried it and got empty data frame with names
>>> > WO <- lapply(files, read_excel, sheet=1, range=("B3"))
>>> > as.data.frame(WO)
>>> [1] ano                 TP303               X96
>>> [4] X0                  X3.7519999999999998 X26.7
>>> <0 rows> (or 0-length row.names)
>>> 
>>> To get data, col_names argument should be set to FALSE WO <- 
>>> lapply(files,
>>> read_excel, sheet=1, range=("B3"), col_names=FALSE)
>>> WO2 <- lapply(files, read_excel, sheet=1, range=("B5"), 
>>> col_names=FALSE)
>>> 
>>> After that unlist and one rbind together with t should be enough to 
>>> give
>> you
>>> one table WO <- unlist(WO)
>>> WO2 <- unlist(WO2)
>>> result <- t(rbind(WO, WO2))
>>> result
>>>      WO      WO2
>>> ...1 "ano"   "ano"
>>> ...1 "TP303" "261119/2"
>>> ...1 "96"    "288"
>>> ...1 "0"     "192"
>>> ...1 "3.752" "25.92094"
>>> ...1 "26.7"  "38.6"
>>> >
>>> 
>>> And instead txt document you could do
>>> 
>>> write.table(result, "result.xls", sep = "\t", row.names = F)
>>> 
>>> And now "result.xls" is directly readable with Excel
>>> 
>>> Cheers
>>> Petr
>>> 
>>> >
>>> > -----Original Message-----
>>> > From: R-help <r-help-bounces using r-project.org> On Behalf Of Thomas Subia
>>> > via R-help
>>> > Sent: Saturday, August 22, 2020 6:25 AM
>>> > To: r-help using r-project.org
>>> > Subject: [R] readxl question
>>> >
>>> > Colleagues,
>>> >
>>> >
>>> >
>>> > I have 250 Excel files in a directory. Each of those files has the
>>> > same
>>> layout.
>>> > The problem is that the data in each Excel data is not in rectangular
>>> form. I've
>>> > been using readxl to extract the data which I need.
>>> > Each of my metrics are stored in a particular cell. For each metric, I
>>> create text
>>> > files which stores my metrics.
>>> >
>>> >
>>> >
>>> > library(plyr)
>>> >
>>> > library(readxl)
>>> >
>>> >
>>> >
>>> > files <- list.files(pattern="*.xls", full.names = FALSE)
>>> >
>>> >
>>> >
>>> > # Extract Work Order
>>> >
>>> > WO <- lapply(files, read_excel, sheet="Sheet1", range=("B9")) WO_list
>>> > <-
>>> > as.data.frame(WO) trans_WO <- t(WO_list) write.table(trans_WO
>>> > ,"WO.txt")
>>> >
>>> >
>>> >
>>> > # Extract bubble 14_1
>>> >
>>> > BUBBLE_14_1 <- lapply(files, read_excel, sheet="Sheet1",
>>> > range=("c46")) BUBBLE_14_1_list <- as.data.frame(BUBBLE_14_1)
>>> >
>>> > trans_BUBBLE_14_1 <- t(BUBBLE_14_1_list)
>>> >
>>> >
>>> >
>>> > write.table(trans_BUBBLE_14_1,"BUBBLE_14_1.txt")
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > # Extract bubble 14_2
>>> >
>>> > BUBBLE_14_2 <- lapply(files, read_excel, sheet="Sheet1",
>>> > range=("c62")) BUBBLE_14_2_list <- as.data.frame(BUBBLE_14_2)
>>> >
>>> > trans_BUBBLE_14_2 <- t(BUBBLE_14_2_list)
>>> >
>>> > write.table(trans_BUBBLE_14_2,"BUBBLE_14_2.txt")
>>> >
>>> >
>>> >
>>> > After the text files have been created, I cut and paste the contents
>>> > of
>>> each
>>> > text file to Excel.
>>> >
>>> > This has worked fine if the number of cells I am extracting from a
>>> > file is
>>> small.
>>> >
>>> > If the number gets larger, this method is inefficient.
>>> >
>>> >
>>> >
>>> > Any advice on how to do this would be appreciated.
>>> >
>>> >
>>> >
>>> > All the best,
>>> >
>>> >
>>> >
>>> > Thomas Subia
>>> >
>>> >
>>> >       [[alternative HTML version deleted]]
>>> >
>>> > ______________________________________________
>>> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide http://www.R-project.org/posting-
>>> > guide.html and provide commented, minimal, self-contained,
>>> > reproducible code.
>> 
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.