[R] find unique and summarize
Rui Barradas
ruipbarradas at sapo.pt
Sat Feb 3 06:26:16 CET 2018
Hello,
Thanks for the reproducible example.
See if the following does what you want.
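# The leading digits of ID are the country code.
IDNum <- sub("^(\\d+).*", "\\1", mydata$ID)
# Whatever follows the digits is the original unique ID, e.g. "CAN1540164".
Country <- sub("^\\d+(.*)", "\\1", mydata$ID)

# Table 1: number of records per original ID (rows) and country code (columns).
tbl1 <- table(Country, IDNum)
addmargins(tbl1)

# Table 2: sum of Y per original ID and country code.
tbl2 <- xtabs(Y ~ Country + IDNum, mydata)
addmargins(tbl2)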
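Note that table() and xtabs() sort the row and column labels alphabetically, so the country codes come out as 1, 33, 358, 44. If the order of first appearance in the data is preferred, a possible variation is sketched below. It is only an illustration on a few rows taken from the posted data; the names code and origID are made up for this sketch, and regmatches()/regexec() are swapped in for the two sub() calls.

```r
# A few rows from the posted data, enough to show the idea.
mydata <- data.frame(
  ID = c("1CAN1540164", "1CAN1540164", "44CAN1540164",
         "33USA1540165", "358USA1540165", "358FIN1540166"),
  Y  = c(20, 12, 30, 254, 321, 225),
  stringsAsFactors = FALSE
)

# One regular expression captures both pieces of each ID:
# element 1 is the full match, 2 the leading digits, 3 the remainder.
parts  <- regmatches(mydata$ID, regexec("^([0-9]+)(.*)$", mydata$ID))
code   <- vapply(parts, `[`, character(1), 2)  # country code (hypothetical name)
origID <- vapply(parts, `[`, character(1), 3)  # original unique ID (hypothetical name)

# Factor levels in order of first appearance, not alphabetical order.
mydata$code   <- factor(code,   levels = unique(code))
mydata$origID <- factor(origID, levels = unique(origID))

# Counts and sums of Y, with row and column totals labelled "Sum".
counts <- addmargins(table(origID = mydata$origID, code = mydata$code))
sums   <- addmargins(xtabs(Y ~ origID + code, data = mydata))

print(counts)
print(sums)
```

Combinations that do not occur in the data show up as 0 rather than "-".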
Hope this helps,
Rui Barradas
On 2/3/2018 3:00 AM, Val wrote:
> Hi all,
>
> I have a data set that needs to be summarized by unique ID (count and sum
> of a variable).
> A unique individual ID (a country name abbreviation followed by an
> integer) may have observations in several countries. The ID was then
> changed by adding the country code as a prefix, so the new ID was recorded
> as country code + original unique ID. Example: for the original ID
> "CAN1540164", if this ID has an observation in Canada, the ID was
> changed to "1CAN1540164". From this new ID I want to extract the
> country code and the original unique ID, and then summarize the data by
> unique ID and country code.
>
> The data set looks like this:
> mydata <- read.table(textConnection("GR ID iflag Y
> A 1CAN1540164 1 20
> A 1CAN1540164 1 12
> A 1CAN1540164 1 15
> A 44CAN1540164 1 30
> A 44CAN1540164 1 24
> A 44CAN1540164 1 25
> A 44CAN1540164 1 11
> A 33CAN1540164 1 12
> A 33CAN1540164 1 23
> A 33CAN1540164 1 65
> A 33CAN1540164 1 41
> A 358CAN1540164 1 28
> A 358CAN1540164 1 32
> A 358CAN1540164 1 41
> A 358CAN1540164 1 54
> A 358CAN1540164 1 29
> A 358CAN1540164 1 64
> B 1USA1540165 1 125
> B 1USA1540165 1 165
> B 44USA1540165 1 171
> B 33USA1540165 1 254
> B 33USA1540165 1 241
> B 33USA1540165 1 262
> B 358USA1540165 1 321
> C 358FIN1540166 1 225 "),header = TRUE ,stringsAsFactors = FALSE)
>
> From the above data there are three unique IDs and four country codes (1,
> 44, 33 and 358)
>
> I want the following two tables
>
> Table 1. count the unique ID by country code
>               1   44   33  358  TOT
> CAN1540164    3    4    4    6   17
> USA1540165    2    1    3    1    7
> FIN1540166    -    -    -    1    1
> TOT           5    5    7    8   25
>
>
> Table 2. Sum of the Y variable by unique ID and country code
>
>               1   44   33  358   TOT
> CAN1540164   47   90  141  248   526
> USA1540165  290  171  757  321  1539
> FIN1540166    -    -    -  225   225
> TOT         337  261  898  794  2290
>
>
> How do I do it in R?
>
> The first step is to split the new ID into the country code and the
> original unique ID.
>
> Thank you in advance