[R] counting duplicate items that occur in multiple groups
Bert Gunter
bgunter@4567 @end|ng |rom gm@||@com
Wed Nov 18 00:33:04 CET 2020
Why the 0's in the data frame? Shouldn't those be 1 (one vendor with that account)?
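
To make the question concrete, here is a minimal base R sketch of the count I have in mind, using Bill's Data1 and Data2 from the quoted message below. The helper name count_sharing is only an illustrative choice, not something from the thread, and I am assuming that "sharing" means the number of distinct vendors attached to an account.

```r
# Bill Dunlap's example data, copied from the quoted message below
Data1 <- data.frame(Vendor  = c("V1", "V2", "V3", "V4"),
                    Account = c("A1", "A2", "A2", "A2"))
Data2 <- data.frame(Vendor  = c("V1", "V2", "V3", "V1", "V4", "V2"),
                    Account = c("A1", "A2", "A2", "A2", "A3", "A4"))

# Hypothetical helper: for each row, the number of distinct vendors
# that use that row's account.
count_sharing <- function(d) {
  n <- ave(seq_len(nrow(d)), d$Account,
           FUN = function(i) length(unique(d$Vendor[i])))
  as.integer(n)
}

Data1$Num_Vendors_Sharing_Bank_Acct <- count_sharing(Data1)
Data2$Num_Vendors_Sharing_Bank_Acct <- count_sharing(Data2)

print(Data1)
print(Data2)
```

Counted this way, an account held by a single vendor gets 1, not 0: Data1 yields 1 3 3 3 and Data2 yields 1 3 3 3 1 1. If 0 really is wanted for unshared accounts, something like ifelse(n == 1L, 0L, n) applied to the count would reproduce the table quoted below.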
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman <twoolman using ontargettek.com>
wrote:
> Hi Bill. Sorry to be so obtuse with the example data, I was trying
> (too hard) not to share any actual values so I just created randomized
> values for my example; of course I should have specified that the
> random values would not provide the expected problem pattern. I should
> have just used simple dummy codes as Bill Dunlap did.
>
> So per Bill's example data for Data1, the expected (hoped for) output
> should be:
>
>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> 1     V1      A1                             0
> 2     V2      A2                             3
> 3     V3      A2                             3
> 4     V4      A2                             3
>
>
> Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct.
> The value is 3 for V2, V3 and V4 because they each share bank account
> A2.
>
>
> Likewise, in the Data2 frame, the same logic applies:
>
>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> 1     V1      A1                             0
> 2     V2      A2                             3
> 3     V3      A2                             3
> 4     V1      A2                             3
> 5     V4      A3                             0
> 6     V2      A4                             0
>
> Thanks!
>
>
> Quoting Bill Dunlap <williamwdunlap using gmail.com>:
>
> > What should the result be for
> > Data1 <- data.frame(Vendor=c("V1","V2","V3","V4"),
> > Account=c("A1","A2","A2","A2"))
> > ?
> >
> > Must each vendor have only one account? If not, what should the result be
> > for
> > Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
> > Account=c("A1","A2","A2","A2","A3","A4"))
> > ?
> >
> > -Bill
> >
> > On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twoolman using ontargettek.com>
> > wrote:
> >
> >> Hi everyone. I have a dataframe that is a collection of Vendor IDs
> >> plus a bank account number for each vendor. I'm trying to find a way
> >> to count the number of duplicate bank accounts that occur in more than
> >> one unique Vendor_ID, and then assign the count value for each row in
> >> the dataframe in a new variable.
> >>
> >> I can do a count of bank accounts that occur within the same vendor
> >> using dplyr and group_by and count, but I can't figure out a way to
> >> count duplicates among multiple Vendor_IDs.
> >>
> >>
> >> Dataframe example code:
> >>
> >>
> >> #Create a sample data frame:
> >>
> >> set.seed(1)
> >>
> >> Data <- data.frame(Vendor_ID = sample(1:10000),
> >>                    Bank_Account_ID = sample(1:10000))
> >>
> >>
> >>
> >>
> >> Thanks in advance for any help.
> >>
> >> ______________________________________________
> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
>