[R] regex

Ivan Calandra
Tue Sep 17 08:48:43 CEST 2019


Dear useRs,

I still have problems using regular expressions. I have two problems for 
which I have found workarounds, but I'm sure there are better ways of 
doing it.

1) list CSV files with "_w_" in the name

Here is a sample of the files in the folder:
myfiles <- c("BU-072_1_E1_RE_SEC-01_local_a_0.2_0.2.csv", 
"BU-072_1_E1_RE_SEC-01_local_a_0.2_0.6.csv","BU-072_1_E1_RE_SEC-01_local_a_0.4_1.0.csv", 
"BU-072_1_E1_RE_SEC-01_local_a_1.0_0.2.csv","BU-072_1_E1_RE_SEC-01_local_a_1.0_0.6.csv", 
"BU-072_1_E1_RE_SEC-01_local_w_0.2_0.2.csv","BU-072_1_E1_RE_SEC-01_local_w_0.2_0.6.csv", 
"BU-072_1_E1_RE_SEC-01_local_w_0.4_1.0.csv","BU-072_1_E1_RE_SEC-01_local_w_1.0_0.2.csv", 
"BU-072_1_E1_RE_SEC-01_local_w_1.0_0.6.csv","BU-072_1_E1_RE_SEC-01_local_w_1.0_1.0.csv", 
"BU-072_1_E1_RE_SEC-01_local_a_0.2_0.2.xls","BU-072_1_E1_RE_SEC-01_local_a_0.2_0.6.xls", 
"BU-072_1_E1_RE_SEC-01_local_a_0.4_1.0.xls","BU-072_1_E1_RE_SEC-01_local_a_1.0_0.2.xls", 
"BU-072_1_E1_RE_SEC-01_local_a_1.0_0.6.xls","BU-072_1_E1_RE_SEC-01_local_w_0.2_0.2.xls", 
"BU-072_1_E1_RE_SEC-01_local_w_0.2_0.6.xls","BU-072_1_E1_RE_SEC-01_local_w_0.4_1.0.xls", 
"BU-072_1_E1_RE_SEC-01_local_w_1.0_0.2.xls","BU-072_1_E1_RE_SEC-01_local_w_1.0_0.6.xls", 
"BU-072_1_E1_RE_SEC-01_local_w_1.0_1.0.xls")

Here is what I did:
CSVs <- list.files(path=..., pattern="\\.csv$")
w.files <- CSVs[grep(pattern="_w_", CSVs)]

Of course, what I would like to do is list only the interesting files 
from the beginning, rather than subsetting the whole list of files. In 
other words, having a pattern that includes both "\\.csv$" and "_w_" in 
the list.files() call. I tried "_w_&\\.csv$" but it returns an empty vector.
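
A minimal sketch of one way to combine both conditions in a single pattern. It relies on the fact that "&" has no "and" meaning in a regular expression and is matched as a literal character, which would explain the empty result. It also assumes that "_w_" always appears before the extension, as it does in the sample names. The names sample.files, w.pattern and w.sample are hypothetical and used only for illustration.

```r
# Three names taken from the sample above, enough to show the behaviour
sample.files <- c("BU-072_1_E1_RE_SEC-01_local_a_0.2_0.2.csv",
                  "BU-072_1_E1_RE_SEC-01_local_w_0.2_0.2.csv",
                  "BU-072_1_E1_RE_SEC-01_local_w_0.2_0.2.xls")

# "_w_", then any characters, then ".csv" at the very end of the name
w.pattern <- "_w_.*\\.csv$"

# Keep only the names that match the combined pattern
w.sample <- grep(w.pattern, sample.files, value = TRUE)
print(w.sample)

# The same pattern can be passed to list.files() directly, e.g.
# w.files <- list.files(path = ..., pattern = w.pattern)
```

Since the pattern argument of list.files() is a regular expression, the same string should work there without the intermediate subsetting step.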

2) The units of the variables are given in the original headers. I would 
like to extract the units. This is what I did:
headers <- c("dist to origin on curve [mm]", "segment on section [mm]", 
"angle 1 [degree]", "angle 2 [degree]", "angle 3 [degree]")
units.var <- gsub(pattern="^.*\\[|\\]$", "", headers)

It seems to me overly complicated to use gsub() for this. Isn't there a way to 
extract what is interesting rather than deleting what is not?
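
A sketch of one possible "extract" approach with base R, using regexpr() to locate the text between the square brackets and regmatches() to pull it out. It assumes every header contains exactly one bracketed unit. Headers without a match would be silently dropped by regmatches(), so the result could be shorter than the input. The name units.extracted is hypothetical.

```r
headers <- c("dist to origin on curve [mm]", "segment on section [mm]",
             "angle 1 [degree]", "angle 2 [degree]", "angle 3 [degree]")

# Perl-style lookarounds: match the characters that are preceded by "["
# and followed by "]", without including the brackets themselves
m <- regexpr("(?<=\\[)[^\\]]+(?=\\])", headers, perl = TRUE)

# Extract the matched substrings
units.extracted <- regmatches(headers, m)
print(units.extracted)
```

An alternative that keeps one result per header is a capture group with a backreference, e.g. sub("^.*\\[(.*)\\]$", "\\1", headers), although that returns the header unchanged when there is no match.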

Thank you for your help! Best, Ivan

-- 
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra


