[R] counting duplicate items that occur in multiple groups

Wed Nov 18 00:33:04 CET 2020

Why the 0's in the data frame? Shouldn't those be 1 (one vendor with that account)?
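
For what it's worth, here is a minimal base R sketch of that alternative. It assumes the wanted value is simply the number of distinct vendors using each row's account, so an unshared account gets 1 rather than 0. It uses Bill Dunlap's Data1 and Data2 from the quoted message below; the helper name count_vendors_per_account is made up for illustration and is not from any package.

```r
# Bill Dunlap's example data, copied from the quoted message
Data1 <- data.frame(Vendor  = c("V1", "V2", "V3", "V4"),
                    Account = c("A1", "A2", "A2", "A2"))
Data2 <- data.frame(Vendor  = c("V1", "V2", "V3", "V1", "V4", "V2"),
                    Account = c("A1", "A2", "A2", "A2", "A3", "A4"))

# Hypothetical helper: for every row, count the distinct vendors that
# use that row's account (assumption: an unshared account counts as 1).
count_vendors_per_account <- function(d) {
  # ave() applies FUN within each Account group and returns a vector
  # aligned with the original rows; as.character() guards against factors.
  n <- ave(as.character(d$Vendor), d$Account,
           FUN = function(v) length(unique(v)))
  d$Num_Vendors_Sharing_Bank_Acct <- as.integer(n)
  d
}

count_vendors_per_account(Data1)  # new column: 1 3 3 3
count_vendors_per_account(Data2)  # new column: 1 3 3 3 1 1
```

If the 0 convention from the quoted tables is really what is wanted, the 1's could be recoded afterwards, e.g. with ifelse(n > 1L, n, 0L) on the new column.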

Bert
Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman <twoolman using ontargettek.com>
wrote:

> Hi Bill. Sorry to be so obtuse with the example data. I was trying
> (too hard) not to share any actual values, so I just created randomized
> values for my example; of course, I should have specified that the
> random values would not reproduce the expected problem pattern. I should
> have just used simple dummy codes, as Bill Dunlap did.
>
> So per Bill's example data for Data1, the expected (hoped for) output
> should be:
>
>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> 1     V1      A1      0
> 2     V2      A2      3
> 3     V3      A2      3
> 4     V4      A2      3
>
>
> Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct.
> The value is 3 for V2, V3 and V4 because they each share bank account
> A2.
>
>
> Likewise, in the Data2 frame, the same logic applies:
>
>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> 1     V1      A1     0
> 2     V2      A2     3
> 3     V3      A2     3
> 4     V1      A2     3
> 5     V4      A3     0
> 6     V2      A4     0
>
> Thanks!
>
>
> Quoting Bill Dunlap <williamwdunlap using gmail.com>:
>
> > What should the result be for
> >   Data1 <- data.frame(Vendor=c("V1","V2","V3","V4"),
> > Account=c("A1","A2","A2","A2"))
> > ?
> >
> > Must each vendor have only one account?  If not, what should the result
> > be for
> >    Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
> > Account=c("A1","A2","A2","A2","A3","A4"))
> > ?
> >
> > -Bill
> >
> > On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twoolman using ontargettek.com>
> > wrote:
> >
> >> Hi everyone.  I have a dataframe that is a collection of Vendor IDs
> >> plus a bank account number for each vendor. I'm trying to find a way
> >> to count the number of duplicate bank accounts that occur in more than
> >> one unique Vendor_ID, and then assign the count value for each row in
> >> the dataframe in a new variable.
> >>
> >> I can do a count of bank accounts that occur within the same vendor
> >> using dplyr and group_by and count, but I can't figure out a way to
> >> count duplicates among multiple Vendor_IDs.
> >>
> >>
> >> Dataframe example code:
> >>
> >>
> >> #Create a sample data frame:
> >>
> >> set.seed(1)
> >>
> >> Data <- data.frame(Vendor_ID = sample(1:10000), Bank_Account_ID =
> >> sample(1:10000))
> >>
> >>
> >>
> >>
> >> Thanks in advance for any help.
> >>
> >> ______________________________________________
> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
>
