[R] readxl question
Ulrik Stervbo
U|r|k@Stervbo @end|ng |rom ruhr-un|-bochum@de
Thu Aug 27 17:46:50 CEST 2020
I clearly didn't read well enough. As Petr pointed out, there is also
the col_names argument.
```
# Solution 4a
map_dfr(files, function(cur_file, ranges){
map_dfc(ranges, function(cur_range, df){
read_excel(cur_file, sheet = 1, col_names = cur_range, range =
cur_range)
}, df = df)
}, ranges = ranges, .id = "filename")
```
On 2020-08-27 17:33, Ulrik Stervbo via R-help wrote:
> Hi Thomas,
>
> I am not familiar with the use of the range argument, but it seems to
> me that the cell value becomes the column name. This might be fine,
> but you might get into trouble if you have repeated cell values since
> as.data.frame() will fix these.
>
> I am also not sure about what you want, but this seems to capture your
> example (reading the same cells in a number of files):
>
> ```
> library(readxl)
>
> # Create test set
> path <- readxl_example("geometry.xls")
>
> read_xls(path) # See the content
>
> example_file1 <- tempfile(fileext = ".xls")
> example_file2 <- tempfile(fileext = ".xls")
>
> file.copy(path, example_file1, overwrite = TRUE)
> file.copy(path, example_file2, overwrite = TRUE)
>
> # Solve the problem using loops
> files <- c(example_file1, example_file2)
> ranges <- c("B4", "C5", "D6")
>
> fr <- lapply(ranges, function(cur_range, files){
> x <- lapply(files, read_xls, sheet = 1, range = cur_range)
> t(as.data.frame(x))
> }, files = files)
>
> # Loop over fr and save content if needed
> ```
>
> A couple of variations over the theme, where the cell content is
> accessed after reading the file. This will not work well if the data
> in the excel files does not start at A1, but if you can adjust for
> this it should work just fine
>
> ```
> # Solution #2
>
> # Read the whole excel file, and access just the column - row
> # This will give really unexpected results if the data does not start
> in the
> # cell A1 as is the case for geometry.xls. Also, it does not work with
> ranges
> # spaning more than a single cell
> files <- rep(readxl_example("datasets.xlsx"), 3)
> ranges <- c("B4", "C5", "D6")
>
> # Loop over the files to avoid re-reading
> fr <- lapply(files, function(cur_file, ranges){
> df <- read_excel(cur_file, sheet = 1)
> x <- lapply(ranges, function(cur_range, df){
> cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
> df[cr$row, cr$col][[1]]
> }, df = df)
> as.data.frame(setNames(x, ranges))
>
> }, ranges = ranges)
>
> # Solution 3
> # Like solution 2 but using purr
>
> library(purrr)
>
> files <- rep(readxl_example("datasets.xlsx"), 3)
> ranges <- c("B4", "C5", "D6")
>
> map_dfr(files, function(cur_file, ranges){
> map_dfc(ranges, function(cur_range, df){
> df <- read_excel(cur_file, sheet = 1)
> cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
> setNames(df[cr$row, cr$col], cur_range)
> }, df = df)
>
> }, ranges = ranges)
>
> # Solution 4
> # Like solution 3, but with the addition of the file name and producing
> a single
> # data.frame at the end
>
> library(purrr)
>
> path <- readxl_example("datasets.xls")
> example_file1 <- tempfile(fileext = "_1.xls")
> example_file2 <- tempfile(fileext = "_2.xls")
> example_file3 <- tempfile(fileext = "_3.xls")
>
> file.copy(path, example_file1, overwrite = TRUE)
> file.copy(path, example_file2, overwrite = TRUE)
> file.copy(path, example_file3, overwrite = TRUE)
>
> files <- c(example_file1, example_file2, example_file3)
>
> # Name the file paths with the file names. We can them make use of the
> .id
> # argument to map_dfr()
> files <- setNames(files, basename(files))
> ranges <- c("B4", "C5", "D6")
>
> map_dfr(files, function(cur_file, ranges){
> map_dfc(ranges, function(cur_range, df){
> df <- read_excel(cur_file, sheet = 1)
> cr <- cellranger::as.cell_addr(cur_range, strict = FALSE)
> setNames(df[cr$row, cr$col], cur_range)
> }, df = df)
> }, ranges = ranges, .id = "filename")
> ```
>
> HTH
> Ulrik
>
> On 2020-08-26 15:38, PIKAL Petr wrote:
>> Hi
>>
>> As OP has only about 250 files and in read_excel you cannot specify
>> several
>> ranges at once, reading those values separately and concatenating them
>> together in one step seems to be the most efficient way. One probably
>> could
>> design such function, but time spent on the function performing the
>> task
>> only once is probably bigger than performing 250*3 reads.
>>
>> I see inefficiency in writing each column into separate text file and
>> coppying it back to Excel file.
>>
>> Cheers
>> Petr
>>
>>> -----Original Message-----
>>> From: Upton, Stephen (Steve) (CIV) <scupton using nps.edu>
>>> Sent: Wednesday, August 26, 2020 2:44 PM
>>> To: PIKAL Petr <petr.pikal using precheza.cz>; Thomas Subia
>>> <tgs77m using yahoo.com>
>>> Cc: r-help using r-project.org
>>> Subject: RE: [R] readxl question
>>>
>>> From your example, it appears you are reading in the same excel file
>>> for
>>> each function to get a value. I would look at creating a function
>>> that
>>> extracts what you need from each file all at once, rather than
>>> separate
>>> reads.
>>>
>>> Stephen C. Upton
>>> SEED (Simulation Experiments & Efficient Designs) Center for Data
>>> Farming
>>> SEED Center website: https://harvest.nps.edu
>>>
>>> -----Original Message-----
>>> From: R-help [mailto:r-help-bounces using r-project.org] On Behalf Of PIKAL
>>> Petr
>>> Sent: Wednesday, August 26, 2020 3:50 AM
>>> To: Thomas Subia <tgs77m using yahoo.com>
>>> Cc: r-help using r-project.org
>>> Subject: Re: [R] readxl question
>>>
>>> NPS WARNING: *external sender* verify before acting.
>>>
>>>
>>> Hi
>>>
>>>
>>> Are you sure that your command read values from respective cells?
>>>
>>> I tried it and got empty data frame with names
>>> > WO <- lapply(files, read_excel, sheet=1, range=("B3"))
>>> > as.data.frame(WO)
>>> [1] ano TP303 X96
>>> [4] X0 X3.7519999999999998 X26.7
>>> <0 rows> (or 0-length row.names)
>>>
>>> To get data, col_names argument should be set to FALSE WO <-
>>> lapply(files,
>>> read_excel, sheet=1, range=("B3"), col_names=FALSE)
>>> WO2 <- lapply(files, read_excel, sheet=1, range=("B5"),
>>> col_names=FALSE)
>>>
>>> After that unlist and one rbind together with t should be enough to
>>> give
>> you
>>> one table WO <- unlist(WO)
>>> WO2 <- unlist(WO2)
>>> result <- t(rbind(WO, WO2))
>>> result
>>> WO WO2
>>> ...1 "ano" "ano"
>>> ...1 "TP303" "261119/2"
>>> ...1 "96" "288"
>>> ...1 "0" "192"
>>> ...1 "3.752" "25.92094"
>>> ...1 "26.7" "38.6"
>>> >
>>>
>>> And instead txt document you could do
>>>
>>> write.table(result, "result.xls", sep = "\t", row.names = F)
>>>
>>> And now "result.xls" is directly readable with Excel
>>>
>>> Cheers
>>> Petr
>>>
>>> >
>>> > -----Original Message-----
>>> > From: R-help <r-help-bounces using r-project.org> On Behalf Of Thomas Subia
>>> > via R-help
>>> > Sent: Saturday, August 22, 2020 6:25 AM
>>> > To: r-help using r-project.org
>>> > Subject: [R] readxl question
>>> >
>>> > Colleagues,
>>> >
>>> >
>>> >
>>> > I have 250 Excel files in a directory. Each of those files has the
>>> > same
>>> layout.
>>> > The problem is that the data in each Excel data is not in rectangular
>>> form. I've
>>> > been using readxl to extract the data which I need.
>>> > Each of my metrics are stored in a particular cell. For each metric, I
>>> create text
>>> > files which stores my metrics.
>>> >
>>> >
>>> >
>>> > library(plyr)
>>> >
>>> > library(readxl)
>>> >
>>> >
>>> >
>>> > files <- list.files(pattern="*.xls", full.names = FALSE)
>>> >
>>> >
>>> >
>>> > # Extract Work Order
>>> >
>>> > WO <- lapply(files, read_excel, sheet="Sheet1", range=("B9")) WO_list
>>> > <-
>>> > as.data.frame(WO) trans_WO <- t(WO_list) write.table(trans_WO
>>> > ,"WO.txt")
>>> >
>>> >
>>> >
>>> > # Extract bubble 14_1
>>> >
>>> > BUBBLE_14_1 <- lapply(files, read_excel, sheet="Sheet1",
>>> > range=("c46")) BUBBLE_14_1_list <- as.data.frame(BUBBLE_14_1)
>>> >
>>> > trans_BUBBLE_14_1 <- t(BUBBLE_14_1_list)
>>> >
>>> >
>>> >
>>> > write.table(trans_BUBBLE_14_1,"BUBBLE_14_1.txt")
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > # Extract bubble 14_2
>>> >
>>> > BUBBLE_14_2 <- lapply(files, read_excel, sheet="Sheet1",
>>> > range=("c62")) BUBBLE_14_2_list <- as.data.frame(BUBBLE_14_2)
>>> >
>>> > trans_BUBBLE_14_2 <- t(BUBBLE_14_2_list)
>>> >
>>> > write.table(trans_BUBBLE_14_2,"BUBBLE_14_2.txt")
>>> >
>>> >
>>> >
>>> > After the text files have been created, I cut and paste the contents
>>> > of
>>> each
>>> > text file to Excel.
>>> >
>>> > This has worked fine if the number of cells I am extracting from a
>>> > file is
>>> small.
>>> >
>>> > If the number gets larger, this method is inefficient.
>>> >
>>> >
>>> >
>>> > Any advice on how to do this would be appreciated.
>>> >
>>> >
>>> >
>>> > All the best,
>>> >
>>> >
>>> >
>>> > Thomas Subia
>>> >
>>> >
>>> > [[alternative HTML version deleted]]
>>> >
>>> > ______________________________________________
>>> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide http://www.R-project.org/posting-
>>> > guide.html and provide commented, minimal, self-contained,
>>> > reproducible code.
>>
>> ______________________________________________
>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list