[R] Extract

Val v@|kremk @end|ng |rom gm@||@com
Fri Jul 19 20:23:48 CEST 2024


Thank you and sorry for the confusion.
The desired result should have 8 comma-separated variables on
each line. The string variable is counted as one variable.
The output of your script is fine for me. Thank you!
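
For readers of the archive, the 8-column layout described above (Year, Sex, the original string, plus S1 to S5) could be sketched in base R roughly as follows. This is only an illustration, not the script from the thread: the names `parts`, `padded`, `n_max` and `result` are made up here, and it assumes the pieces are separated by whitespace and that any pieces beyond the fifth can be dropped.

```r
dat <- read.csv(text = "Year,Sex,string
2002,F,15 xc Ab
2003,F,14
2004,M,18 xb 25 35 21
2005,M,13 25
2006,M,14 ac 256 AV 35
2007,F,11", header = TRUE, stringsAsFactors = FALSE)

# Split each string on runs of whitespace (assumed separator)
parts <- strsplit(trimws(dat$string), "[[:space:]]+")

# Indexing past the end of a vector yields NA, so p[1:n_max] pads short
# rows with NA (and truncates any row that has more than n_max pieces)
n_max <- 5
padded <- t(vapply(parts, function(p) p[seq_len(n_max)], character(n_max)))
colnames(padded) <- paste0("S", seq_len(n_max))

# Keep the original 'string' column and append S1..S5 (all character)
result <- cbind(dat, as.data.frame(padded, stringsAsFactors = FALSE))
print(result)
```

All of S1 to S5 come out as character columns, because values such as "15" and "xc" share a column.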

On Fri, Jul 19, 2024 at 1:00 PM Ebert,Timothy Aaron <tebert using ufl.edu> wrote:
>
> The desired result is odd.
> 1) It looks like the string is duplicated in the desired result. The first line of data has "15, xc, Ab",  and the desired result has "15, xc, Ab, 15, xc, Ab"
> 2) The example has S1 through S5, but the desired result has data for eight variables in the first line (not five).
> 3) The desired result has a different number of variables for each line.
> 4) Are you assuming that all missing data is at the end of the string? If there are 5 variables (S1 .... S5), do you know that "15, xc, Ab" is S1 = 15, S2 = 'xc', and S3 = 'Ab' rather than S2=15, S4='xc' and S5='Ab' ?
>
> This isn't exactly what you asked for, but maybe I was confused somewhere. This approach puts the pieces of the string into variables in order, so string and numeric values end up mixed within the same columns. The string is not duplicated.
>
> library(tidyr)
>
> dat <- read.csv(text="Year,Sex,string
> 2002,F,15 xc Ab
> 2003,F,14
> 2004,M,18 xb 25 35 21
> 2005,M,13 25
> 2006,M,14 ac 256 AV 35
> 2007,F,11", header=TRUE, stringsAsFactors=FALSE)
>
> # split the 'string' column based on spaces
> dat_separated <- dat |>
>   separate(string, into = paste0("S", 1:5), sep = " ",
>            fill = "right", extra = "merge")
>
> Tim
>
>
> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Val
> Sent: Friday, July 19, 2024 12:52 PM
> To: r-help using R-project.org (r-help using r-project.org) <r-help using r-project.org>
> Subject: [R] Extract
>
> Hi All,
>
> I want to extract new variables from a string and add them to the data frame.
> The sample data is a CSV file.
>
> dat <- read.csv(text="Year,Sex,string
> 2002,F,15 xc Ab
> 2003,F,14
> 2004,M,18 xb 25 35 21
> 2005,M,13 25
> 2006,M,14 ac 256 AV 35
> 2007,F,11",header=TRUE)
>
> The string column holds at most five variables. Some rows have all five and others have fewer. Missing variables should be filled with NA. The desired result is shown below:
>
>
> Year,Sex,string,S1,S2,S3,S4,S5
> 2002,F,15 xc Ab,15,xc,Ab,NA,NA
> 2003,F,14,14,NA,NA,NA,NA
> 2004,M,18 xb 25 35 21,18,xb,25,35,21
> 2005,M,13 25,13,25,NA,NA,NA
> 2006,M,14 ac 256 AV 35,14,ac,256,AV,35
> 2007,F,11,11,NA,NA,NA,NA
>
> Any help?
> Thank you in advance.


