[R] counting duplicate items that occur in multiple groups
Bert Gunter
bgunter@4567 @end|ng |rom gm@||@com
Wed Nov 18 01:06:10 CET 2020
z <- with(Data2, tapply(Vendor,Account, I))
n <- vapply(z,length,1)
data.frame (Vendor = unlist(z),
Account = rep(names(z),n),
NumVen = rep(n,n)
)
## which gives:
Vendor Account NumVen
A1 V1 A1 1
A21 V2 A2 3
A22 V3 A2 3
A23 V1 A2 3
A3 V4 A3 1
A4 V2 A4 1
Of course this also works for Data1
Bill may be able to come up with a slicker version, however.
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Tue, Nov 17, 2020 at 3:34 PM Tom Woolman <twoolman using ontargettek.com>
wrote:
> Yes, good catch. Thanks
>
>
> Quoting Bert Gunter <bgunter.4567 using gmail.com>:
>
> > Why 0's in the data frame? Shouldn't that be 1 (vendor with that
> account)?
> >
> > Bert
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along
> and
> > sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> >
> > On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman <twoolman using ontargettek.com>
> > wrote:
> >
> >> Hi Bill. Sorry to be so obtuse with the example data, I was trying
> >> (too hard) not to share any actual values so I just created randomized
> >> values for my example; of course I should have specified that the
> >> random values would not provide the expected problem pattern. I should
> >> have just used simple dummy codes as Bill Dunlap did.
> >>
> >> So per Bill's example data for Data1, the expected (hoped for) output
> >> should be:
> >>
> >> Vendor Account Num_Vendors_Sharing_Bank_Acct
> >> 1 V1 A1 0
> >> 2 V2 A2 3
> >> 3 V3 A2 3
> >> 4 V4 A2 3
> >>
> >>
> >> Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct.
> >> The value is 3 for V2, V3 and V4 because they each share bank account
> >> A2.
> >>
> >>
> >> Likewise, in the Data2 frame, the same logic applies:
> >>
> >> Vendor Account Num_Vendors_Sharing_Bank_Acct
> >> 1 V1 A1 0
> >> 2 V2 A2 3
> >> 3 V3 A2 3
> >> 4 V1 A2 3
> >> 5 V4 A3 0
> >> 6 V2 A4 0
> >>
> >>
> >>
> >>
> >>
> >>
> >> Thanks!
> >>
> >>
> >> Quoting Bill Dunlap <williamwdunlap using gmail.com>:
> >>
> >> > What should the result be for
> >> > Data1 <- data.frame(Vendor=c("V1","V2","V3","V4"),
> >> > Account=c("A1","A2","A2","A2"))
> >> > ?
> >> >
> >> > Must each vendor have only one account? If not, what should the
> result
> >> be
> >> > for
> >> > Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
> >> > Account=c("A1","A2","A2","A2","A3","A4"))
> >> > ?
> >> >
> >> > -Bill
> >> >
> >> > On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twoolman using ontargettek.com
> >
> >> > wrote:
> >> >
> >> >> Hi everyone. I have a dataframe that is a collection of Vendor IDs
> >> >> plus a bank account number for each vendor. I'm trying to find a way
> >> >> to count the number of duplicate bank accounts that occur in more
> than
> >> >> one unique Vendor_ID, and then assign the count value for each row in
> >> >> the dataframe in a new variable.
> >> >>
> >> >> I can do a count of bank accounts that occur within the same vendor
> >> >> using dplyr and group_by and count, but I can't figure out a way to
> >> >> count duplicates among multiple Vendor_IDs.
> >> >>
> >> >>
> >> >> Dataframe example code:
> >> >>
> >> >>
> >> >> #Create a sample data frame:
> >> >>
> >> >> set.seed(1)
> >> >>
> >> >> Data <- data.frame(Vendor_ID = sample(1:10000), Bank_Account_ID =
> >> >> sample(1:10000))
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> Thanks in advance for any help.
> >> >>
> >> >> ______________________________________________
> >> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >> >>
> >>
> >> ______________________________________________
> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
>
>
>
>
[[alternative HTML version deleted]]
More information about the R-help
mailing list