[R] find unique and summarize
Val
valkremk at gmail.com
Sat Feb 3 04:00:30 CET 2018
Hi all,
I have a data set that needs to be summarized by unique ID (count and sum
of a variable).
Each individual has a unique ID (a country-name abbreviation followed by an
integer) and may have observations in several countries. The recorded ID
was therefore changed by adding the country code as a prefix, so the new ID
is the country code followed by the original unique ID. For example, if the
original ID "CAN1540164" has an observation in Canada, it is recorded as
"1CAN1540164". From this new ID I want to extract the country code, recover
the original unique ID, and summarize the data by unique ID and country
code.
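To make the ID structure concrete, here is a minimal sketch of how the split might be done with base R's `sub()`. It assumes the country code is always the leading run of digits and that the original ID starts at the first letter; the names `new_id`, `country_code` and `original_id` are only illustrative.

```r
# Example IDs in the "country code + original ID" form described above.
new_id <- c("1CAN1540164", "44CAN1540164", "358FIN1540166")

# Assumption: the country code is the leading run of digits and the
# original ID begins at the first non-digit character.
country_code <- sub("^([0-9]+).*$", "\\1", new_id)  # keep only the leading digits
original_id  <- sub("^[0-9]+", "", new_id)          # drop the leading digits
```

If some IDs could start with a letter (no prefix), the first `sub()` call would return the ID unchanged, so that case would need separate handling.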
The data set looks like this:
mydata <- read.table(textConnection("GR ID iflag Y
A 1CAN1540164 1 20
A 1CAN1540164 1 12
A 1CAN1540164 1 15
A 44CAN1540164 1 30
A 44CAN1540164 1 24
A 44CAN1540164 1 25
A 44CAN1540164 1 11
A 33CAN1540164 1 12
A 33CAN1540164 1 23
A 33CAN1540164 1 65
A 33CAN1540164 1 41
A 358CAN1540164 1 28
A 358CAN1540164 1 32
A 358CAN1540164 1 41
A 358CAN1540164 1 54
A 358CAN1540164 1 29
A 358CAN1540164 1 64
B 1USA1540165 1 125
B 1USA1540165 1 165
B 44USA1540165 1 171
B 33USA1540165 1 254
B 33USA1540165 1 241
B 33USA1540165 1 262
B 358USA1540165 1 321
C 358FIN1540166 1 225"), header = TRUE, stringsAsFactors = FALSE)
From the above data there are three unique IDs and four country codes (1,
44, 33 and 358).
I want the following two tables.
Table 1. Count of records for each unique ID by country code
             1  44  33  358  TOT
CAN1540164   3   4   4    6   17
USA1540165   2   1   3    1    7
FIN1540166   -   -   -    1    1
TOT          5   5   7    8   25
Table 2. Sum of the Y variable by unique ID and country code
              1   44   33  358   TOT
CAN1540164   47   90  141  248   526
USA1540165  290  171  757  321  1539
FIN1540166    -    -    -  225   225
TOT         337  261  898  794  2290
How do I do this in R?
The first step is to get the country code and the unique ID by splitting
the new ID.
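One possible approach, sketched in base R and not necessarily the best one: split the ID with `sub()`, then cross-tabulate with `table()` and `xtabs()` and add totals with `addmargins()`. The data are rebuilt compactly here (ID and Y only) so the block runs on its own; the column names `code` and `uid` and the objects `tab_count` and `tab_sum` are my own illustrative names, and the split assumes the country code is the leading run of digits.

```r
# Rebuild the posted data compactly (ID and Y only) so this block is self-contained.
ids <- c("1CAN1540164", "44CAN1540164", "33CAN1540164", "358CAN1540164",
         "1USA1540165", "44USA1540165", "33USA1540165", "358USA1540165",
         "358FIN1540166")
mydata <- data.frame(
  ID = rep(ids, times = c(3, 4, 4, 6, 2, 1, 3, 1, 1)),
  Y  = c(20, 12, 15, 30, 24, 25, 11, 12, 23, 65, 41,
         28, 32, 41, 54, 29, 64, 125, 165, 171,
         254, 241, 262, 321, 225),
  stringsAsFactors = FALSE
)

# Assumption: the country code is the leading run of digits in the new ID.
mydata$code <- sub("^([0-9]+).*$", "\\1", mydata$ID)
mydata$uid  <- sub("^[0-9]+", "", mydata$ID)

# Use factors so rows and columns keep their order of first appearance
# (1, 44, 33, 358 and CAN, USA, FIN) instead of being sorted as text.
mydata$code <- factor(mydata$code, levels = unique(mydata$code))
mydata$uid  <- factor(mydata$uid,  levels = unique(mydata$uid))

# Table 1: number of records per unique ID and country code, plus totals.
tab_count <- addmargins(table(mydata$uid, mydata$code),
                        FUN = list(TOT = sum), quiet = TRUE)

# Table 2: sum of Y per unique ID and country code, plus totals.
tab_sum <- addmargins(xtabs(Y ~ uid + code, data = mydata),
                      FUN = list(TOT = sum), quiet = TRUE)

print(tab_count)
print(tab_sum)
```

Empty combinations come out as 0 here, not "-" as in the tables above; they would have to be replaced after the fact if the dash is wanted.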
Thank you in advance