[R] data frame manipulation and regex
arnaud Gaboury
arnaud.gaboury at gmail.com
Wed Apr 28 14:30:18 CEST 2010
TY so much, David. We are getting close, but I need to keep "USD" in my
object names (i.e. "STANDARD LEAD USD").
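A minimal sketch of one way to keep "USD": capture it in a group and put it back with the backreference `\\1`. The vector `desc` is introduced here for illustration and copies the DESCRIPTION values quoted further down. The pattern assumes the date is always a three-letter month, a slash and a two-digit year at the very end of the string.

```r
# Hypothetical stand-in for avprix$DESCRIPTION (values copied from the post)
desc <- c("CORN Jul/10", "CORN May/10", "ROBUSTA COFFEE (10) Jul/10",
          "SOYBEANS Jul/10", "SPCL HIGH GRADE ZINC USD Jul/10",
          "STANDARD LEAD USD Jul/10")

# "(USD)" captures the currency code, " +" consumes the space(s) before the
# date, and "[[:alpha:]]{3}/[[:digit:]]{2}$" is month/year at the end.
# The replacement "\\1" writes the captured "USD" back, so only the date goes.
# Elements without "USD" do not match and are returned unchanged.
out <- sub("(USD) +[[:alpha:]]{3}/[[:digit:]]{2}$", "\\1", desc)
print(out)
```

On the real data frame the same call would be `avprix$DESCRIPTION <- sub("(USD) +[[:alpha:]]{3}/[[:digit:]]{2}$", "\\1", avprix$DESCRIPTION)`.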
> -----Original Message-----
> From: David Winsemius [mailto:dwinsemius at comcast.net]
> Sent: Wednesday, April 28, 2010 2:25 PM
> To: arnaud Gaboury
> Cc: r-help at r-project.org
> Subject: Re: [R] data frame manipulation and regex
>
>
> On Apr 28, 2010, at 5:14 AM, arnaud Gaboury wrote:
>
> > Dear group,
> >
> > Here is my data.frame :
> >
> > avprix <- structure(list(
> >   DESCRIPTION = c("CORN Jul/10", "CORN May/10",
> >                   "ROBUSTA COFFEE (10) Jul/10", "SOYBEANS Jul/10",
> >                   "SPCL HIGH GRADE ZINC USD Jul/10",
> >                   "STANDARD LEAD USD Jul/10"),
> >   prix = c(-1.5, -1082, 11084, 1983.5, -2464, -118),
> >   quantity = c(0, -3, 8, 2, -1, 0)),
> >   .Names = c("DESCRIPTION", "prix", "quantity"),
> >   row.names = c(NA, -6L), class = "data.frame")
> >
> >> avprix
> >                       DESCRIPTION    prix quantity
> > 1                     CORN Jul/10    -1.5        0
> > 2                     CORN May/10 -1082.0       -3
> > 3      ROBUSTA COFFEE (10) Jul/10 11084.0        8
> > 4                 SOYBEANS Jul/10  1983.5        2
> > 5 SPCL HIGH GRADE ZINC USD Jul/10 -2464.0       -1
> > 6        STANDARD LEAD USD Jul/10  -118.0        0
> >
> > I need to remove the date (i.e. Jul/10 in this example) from each
> > element of the DESCRIPTION column that contains "USD". I am trying to
> > do this with regular expressions, but I must admit I am going nowhere.
> > Both the elements of the DESCRIPTION column and the dates can change
> > every day.
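The request translates fairly directly into R: find the elements that contain "USD", then strip a trailing date from those elements only. This is a minimal sketch, assuming the date always has the form three-letter month, slash, two-digit year at the end of the string; `desc` and `has_usd` are names introduced for illustration.

```r
# Hypothetical stand-in for avprix$DESCRIPTION (values copied from the post)
desc <- c("CORN Jul/10", "CORN May/10", "ROBUSTA COFFEE (10) Jul/10",
          "SOYBEANS Jul/10", "SPCL HIGH GRADE ZINC USD Jul/10",
          "STANDARD LEAD USD Jul/10")

# Which elements mention USD? fixed = TRUE makes this a literal substring test
has_usd <- grepl("USD", desc, fixed = TRUE)

# Only for those elements, delete the trailing " Mon/yy" date;
# all other elements keep their dates
desc[has_usd] <- sub(" +[[:alpha:]]{3}/[[:digit:]]{2}$", "", desc[has_usd])
print(desc)
```

Splitting the test (`grepl`) from the edit (`sub`) keeps each regular expression simple, and "USD" can appear anywhere in the description, not only right before the date.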
>
> This searches for the pattern USD followed by any three characters, a
> forward slash and any two characters, and replaces the whole match,
> including "USD" itself, with an empty string:
> > sub("USD+.*(.../..)", "", avprix$DESCRIPTION)
> [1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
> [4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC "      "STANDARD LEAD "
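Why "USD" disappears can be checked by looking at what the pattern actually matches: `regexpr()` finds the first match in each string and `regmatches()` extracts it. Because the match starts at "USD", `sub()` deletes "USD" together with the date. The sketch also shows that `D+` means "one or more D", so `USD+` accepts "USDD" as well; plain `USD` is probably what was intended. The short vector `x` is introduced here for illustration.

```r
x <- c("CORN Jul/10", "STANDARD LEAD USD Jul/10")

# regmatches() keeps only the strings that matched and returns the matched text
matched <- regmatches(x, regexpr("USD+.*(.../..)", x))
print(matched)
# [1] "USD Jul/10"

# "D+" is a quantifier on D alone, not on the whole word "USD"
print(grepl("^USD+$", c("USD", "USDD", "US")))
# [1]  TRUE  TRUE FALSE
```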
>
> This tightens up the matching by requiring that the characters after
> the slash be digits:
>
> > sub("USD+.*(.../\\d{2})", "", avprix$DESCRIPTION)
> [1] "CORN Jul/10"                "CORN May/10"                "ROBUSTA COFFEE (10) Jul/10"
> [4] "SOYBEANS Jul/10"            "SPCL HIGH GRADE ZINC "      "STANDARD LEAD "
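The tightening can be pushed one step further so that "USD" survives. With `perl = TRUE`, R uses PCRE, which supports the lookbehind `(?<=USD)`: it requires "USD" immediately before the date without making "USD" part of the match, so `sub()` removes only the date. This is a sketch under stated assumptions: the date directly follows "USD" (after one or more spaces), is written as a capitalised three-letter month, a slash and two digits, and sits at the end of the string. A "USD" elsewhere in the text would not trigger the removal. `desc` is a name introduced for illustration.

```r
# Hypothetical stand-in for avprix$DESCRIPTION (values copied from the post)
desc <- c("CORN Jul/10", "CORN May/10", "ROBUSTA COFFEE (10) Jul/10",
          "SOYBEANS Jul/10", "SPCL HIGH GRADE ZINC USD Jul/10",
          "STANDARD LEAD USD Jul/10")

# (?<=USD)      : lookbehind, "USD" must precede the match but is not consumed
#  +            : the space(s) between "USD" and the date
# [A-Z][a-z]{2} : month abbreviation such as "Jul"
# /\\d{2}$      : slash, two-digit year, end of string
out <- sub("(?<=USD) +[A-Z][a-z]{2}/\\d{2}$", "", desc, perl = TRUE)
print(out)
```

Compared with a capture group and `\\1`, the lookbehind leaves the replacement string empty, at the cost of needing the PCRE engine.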
>
> -- David.
>
>
> >
> >
> > TY for any help.
> >
>
> David Winsemius, MD
> West Hartford, CT