[R] Stringr / Regular Expressions advice

Sarah Goslee sarah.goslee at gmail.com
Thu Jun 26 22:46:14 CEST 2014


Hi,

On Thu, Jun 26, 2014 at 12:17 PM, VINCENT DEAN BOYCE
<vincentdeanboyce at gmail.com> wrote:
> Hello,
>
> Using R, I've loaded a .csv file comprised of several hundred rows and 3
> columns of data. The data within maps the output of a triaxial
> accelerometer, a sensor which measures an object's acceleration along the
> x,y and z axes. The data for each respective column sequentially
> oscillates, and ranges numerically from 100 to 500.

If your data are numeric, why are you using stringr?

It would be easier to provide you with an answer if we knew what your
data looked like.

dput(head(yourdata, 20))

and paste that into your non-HTML email.
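
dput() prints R code that recreates the object exactly, so anyone on the
list can paste it in and get the same data. A small sketch with a made-up
data frame (example.df and its column names are placeholders, not your data):

```r
# Made-up stand-in for the real accelerometer data
example.df <- data.frame(x = c(150, 400),
                         y = c(200, 200),
                         z = c(300, 300))

# Prints a structure(...) call; copy that text into the email
dput(head(example.df, 20))
```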

> I want to create a function that parses the data and detects patterns across
> the three columns.
>
> For instance, I would like to detect instances when the values for the x,y
> and z columns equal 150, 200, 300 respectively. Additionally, when a match
> is detected, I would like to know how many times the pattern appears.

That's easy enough:

fakedata <- data.frame(matrix(c(
100, 100, 200,
150, 200, 300,
100, 350, 100,
400, 200, 300,
200, 500, 200,
150, 200, 300,
150, 200, 300),
ncol=3, byrow=TRUE))

v.to.match <- c(150, 200, 300)

# TRUE for each row whose three values all equal v.to.match
v.matches <- apply(fakedata, 1, function(x) all(x == v.to.match))

# which rows match
which(v.matches)

# how many rows match
sum(v.matches)

> I have been successful using str_detect to provide a Boolean, however it
> seems to only work on a single vector, i.e., "400", not a range of values,
> i.e., "400 - 450". See below:

This is where I get confused, and where we need sample data. Are your
data numeric, as you state above, or some other format?

If your data are character strings such as "400 - 450", you can still match
them with the code I suggested above, as long as v.to.match holds the
same strings.

> # this works
>> vals <- str_detect (string = data_log$x_reading, pattern = "400")
>
> # this also works, but doesn't detect the particular range, rather the
> # existence of the numbers
>> vals <- str_detect (string = data_log$x_reading, pattern = "[400-450]")

Are you trying to match any numeric value in the range 400-450? Again,
actual data.
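
If the answer is yes and the column is numeric, a plain comparison does
the job and no regular expression is needed. Note that "[400-450]" is a
character class: it matches any single character that is 4, 0, anything
from 0 to 4, or 5, which is why it only detects the presence of those
digits. A minimal sketch with made-up values (x_reading here is a stand-in,
not your actual column):

```r
# Made-up numeric readings
x_reading <- c(100, 400, 425, 450, 475)

# Numeric range test: TRUE where 400 <= value <= 450
in.range <- x_reading >= 400 & x_reading <= 450
which(in.range)   # positions of the in-range values
sum(in.range)     # how many values are in range

# The character class matches any string containing one of the digits 0-5
class.match <- grepl("[400-450]", c("123", "999", "678"))
class.match       # TRUE FALSE FALSE
```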

> Also, it appears that I can only apply it to a single column, not to all
> three columns. However I may be mistaken.

You answer your own question unwittingly - apply().
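
For example, apply() with MARGIN = 2 runs a test on each column, and with
MARGIN = 1 on each row. A sketch, assuming numeric columns and using
made-up readings (the names readings, x, y and z are placeholders):

```r
# Made-up readings for the three axes
readings <- data.frame(x = c(100, 410, 450, 300),
                       y = c(420, 200, 430, 500),
                       z = c(400, 445, 449, 100))

# Per column: how many values fall in the range 400-450
per.column <- apply(readings, 2, function(col) sum(col >= 400 & col <= 450))
per.column

# Per row: are all three axes in range at the same time?
all.axes <- apply(readings, 1, function(r) all(r >= 400 & r <= 450))
which(all.axes)
```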

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org


