[R] find unique and summerize
Val
valkremk at gmail.com
Mon Feb 5 02:20:49 CET 2018
Thank you so much Rui!
On Sun, Feb 4, 2018 at 12:20 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Hello,
>
> Please always cc the list.
>
> As for the question, I believe the following does it.
>
> a <- strsplit(mydata$ID, "[[:alpha:]]+")
> b <- strsplit(mydata$ID, "[[:digit:]]+")
>
> a <- sapply(a, `[`, 1)
> c <- sapply(a, `[`, 2)
> b <- sapply(b, function(x) x[x != ""])
>
> c2 <- sprintf("%010d", as.integer(c))
>
> newID <- paste0(a, b, c2)
>
>
> Hope this helps,
>
> Rui Barradas
>
> On 2/4/2018 2:01 AM, Val wrote:
>
>> Thank you so much again for your help!
>>
>> I have one more question related to this.
>>
>> 1. How do I further split this "358USA1540165 " into three parts.
>> a) 358
>> b) USA
>> c) 1540165
>>
>> I want to add leading zeros to the third part like "0001540165"
>> and then combine b and c to get this USA1540165
>> so USA1540165 changed to USA1540165
>>
>> The other one is that the data set has several country codes and if I
>> want to limit my data set to only certain country codes , how do I do that.
>>
>> Thank you again
>>
>>
>>
>>
>> On Sat, Feb 3, 2018 at 1:05 PM, Rui Barradas <ruipbarradas at sapo.pt
>> <mailto:ruipbarradas at sapo.pt>> wrote:
>>
>> Hello,
>>
>> As for the first question, instead of writing a xlsx file, maybe it
>> is easier to write a csv file and then open it with Excel.
>>
>> tbl2 <- addmargins(tbl1)
>> write.csv(tbl2, "tt1.csv")
>>
>> As for the second question, the following does it.
>>
>> inx <- apply(tbl1, 1, function(x) all(x != 0))
>> tbl1b <- addmargins(tbl1[inx, ])
>> tbl1b
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> On 2/3/2018 4:42 PM, Val wrote:
>>
>> Thank you so much Rui.
>>
>> 1. How do I export this table to excel file?
>> I used this
>> tbl1 <- table(Country, IDNum)
>> tbl2=addmargins(tbl1)
>> write.xlsx(tbl2,"tt1.xlsx"),sheetName="summary",
>> row.names=FALSE)
>> The above did not give me that table.
>>
>>
>> 2. I want select those unique Ids that do have records in all
>> countries.
>> From the above data set, this ID "FIN1540166" should be
>> excluded from the summary table and the table looks like as follow
>>
>> IDNum Country 1 33 358 44 Sum CAN1540164 47 141 248 90 526
>> USA1540165 290 757 321 171 1539 Sum 337 898 569 261 2065
>>
>> Thank you again
>>
>>
>> On Fri, Feb 2, 2018 at 11:26 PM, Rui Barradas
>> <ruipbarradas at sapo.pt <mailto:ruipbarradas at sapo.pt>
>> <mailto:ruipbarradas at sapo.pt <mailto:ruipbarradas at sapo.pt>>>
>> wrote:
>>
>> Hello,
>>
>> Thanks for the reproducible example.
>> See if the following does what you want.
>>
>> IDNum <- sub("^(\\d+).*", "\\1", mydata$ID)
>> Country <- sub("^\\d+(.*)", "\\1", mydata$ID)
>>
>> tbl1 <- table(Country, IDNum)
>> addmargins(tbl1)
>>
>> tbl2 <- xtabs(Y ~ Country + IDNum, mydata)
>> addmargins(tbl2)
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>>
>> On 2/3/2018 3:00 AM, Val wrote:
>>
>> Hi all,
>>
>> I have a data set need to be summarized by unique ID
>> (count and
>> sum of a
>> variable)
>> A unique individual ID (country name Abbreviation
>> followed by
>> an integer
>> numbers) may have observation in several countries.
>> Then the ID was
>> changed by adding the country code as a prefix and
>> new ID was
>> constructed
>> or recorded like (country code, + the original unique
>> ID Example
>> original ID "CAN1540164" , if this ID has an
>> observation in
>> CANADA then
>> the ID was changed to "1CAN1540164". From this new
>> ID I
>> want get out
>> the country code get the original unique ID and
>> summarize
>> the data by
>> unique ID and country code
>>
>> The data set look like
>> mydata <- read.table(textConnection("GR ID iflag Y
>> A 1CAN1540164 1 20
>> A 1CAN1540164 1 12
>> A 1CAN1540164 1 15
>> A 44CAN1540164 1 30
>> A 44CAN1540164 1 24
>> A 44CAN1540164 1 25
>> A 44CAN1540164 1 11
>> A 33CAN1540164 1 12
>> A 33CAN1540164 1 23
>> A 33CAN1540164 1 65
>> A 33CAN1540164 1 41
>> A 358CAN1540164 1 28
>> A 358CAN1540164 1 32
>> A 358CAN1540164 1 41
>> A 358CAN1540164 1 54
>> A 358CAN1540164 1 29
>> A 358CAN1540164 1 64
>> B 1USA1540165 1 125
>> B 1USA1540165 1 165
>> B 44USA1540165 1 171
>> B 33USA1540165 1 254
>> B 33USA1540165 1 241
>> B 33USA1540165 1 262
>> B 358USA1540165 1 321
>> C 358FIN1540166 1 225 "),header = TRUE
>> ,stringsAsFactors = FALSE)
>>
>> From the above data there are three unique IDs and
>> four
>> country codes (1,
>> 44, 33 and 358)
>>
>> I want the following two tables
>>
>> Table 1. count the unique ID by country code
>> 1 44 33 358 TOT
>> CAN1540164 3 4 4 6 17
>> USA1540165 2 1 3 1 7
>> FIN1540166 - - - 1 1
>> TOT 5 5 7 8 25
>>
>>
>> Table 2 Sum of Y variable by unique ID and country. code
>>
>> 1 44 33
>> 358 TOT
>> CAN1540164 47 90 141 248 526
>> USA1540165 290 171 757 321 1539
>> FIN1540166 - - - 225
>> 225
>> TOT 337 261 898 794
>> 2290
>>
>>
>> How do I do it in R?
>>
>> The first step is to get the unique country codes
>> unique ID
>> by splitting
>> the new ID
>>
>> Thank you in advance
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org <mailto:R-help at r-project.org>
>> <mailto:R-help at r-project.org <mailto:R-help at r-project.org>>
>> mailing list
>> -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> <https://stat.ethz.ch/mailman/listinfo/r-help>
>> <https://stat.ethz.ch/mailman/listinfo/r-help
>> <https://stat.ethz.ch/mailman/listinfo/r-help>>
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> <http://www.R-project.org/posting-guide.html>
>> <http://www.R-project.org/posting-guide.html
>> <http://www.R-project.org/posting-guide.html>>
>> and provide commented, minimal, self-contained,
>> reproducible code.
>>
>>
>>
>>
[[alternative HTML version deleted]]
More information about the R-help
mailing list