[R] counting duplicate items that occur in multiple groups

Wed Nov 18 00:34:47 CET 2020

Yes, good catch. Thanks.

Quoting Bert Gunter <bgunter.4567 using gmail.com>:

> Why 0's in the data frame? Shouldn't that be 1 (vendor with that account)?
>
> Bert
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman <twoolman using ontargettek.com>
> wrote:
>
>> Hi Bill. Sorry to be so obtuse with the example data; I was trying
>> (too hard) not to share any actual values, so I just created randomized
>> values for my example. Of course, I should have specified that the
>> random values would not produce the expected problem pattern. I should
>> have just used simple dummy codes as Bill Dunlap did.
>>
>> So per Bill's example data for Data1, the expected (hoped-for) output
>> should be:
>>
>>   Vendor Account Num_Vendors_Sharing_Bank_Acct
>> 1     V1      A1      0
>> 2     V2      A2      3
>> 3     V3      A2      3
>> 4     V4      A2      3
>>
>>
>> Here the new calculated variable is Num_Vendors_Sharing_Bank_Acct.
>> The value is 3 for V2, V3 and V4 because all three share bank account
>> A2.
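A minimal base-R sketch of this calculation, assuming the goal is the number of distinct vendors per account (so an unshared account such as A1 gets 1, as Bert points out, rather than 0). It uses tapply() to count unique vendors within each account and then looks up each row's account in that result; the helper name n_per_account is hypothetical.

```r
# Bill Dunlap's Data1 example
Data1 <- data.frame(Vendor  = c("V1", "V2", "V3", "V4"),
                    Account = c("A1", "A2", "A2", "A2"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: distinct vendors per account, as a 1-d array
# whose names are the account IDs
n_per_account <- tapply(Data1$Vendor, Data1$Account,
                        function(v) length(unique(v)))

# Look up each row's account by name and store the count in a new column
Data1$Num_Vendors_Sharing_Bank_Acct <-
  as.vector(n_per_account[Data1$Account])

Data1$Num_Vendors_Sharing_Bank_Acct   # 1 3 3 3
```

If 0 is really wanted for unshared accounts, those 1s can be recoded afterwards. In dplyr the same count would come from group_by(Account) followed by mutate() with n_distinct(Vendor).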
>>
>>
>> Likewise, in the Data2 frame, the same logic applies:
>>
>>   Vendor Account Num_Vendors_Sharing_Bank_Acct
>> 1     V1      A1     0
>> 2     V2      A2     3
>> 3     V3      A2     3
>> 4     V1      A2     3
>> 5     V4      A3     0
>> 6     V2      A4     0
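Because V1 appears with two different accounts here, one way to sketch this (an assumption about how repeated rows should be treated, not the only option) is to keep one row per (Vendor, Account) pair with unique(), tabulate the accounts, and look the counts back up for every row. Deduplicating first means a vendor listed twice with the same account is still counted once. The names pairs and vendors_per_account are hypothetical.

```r
# Bill Dunlap's Data2 example: V1 and V2 each have two accounts
Data2 <- data.frame(Vendor  = c("V1", "V2", "V3", "V1", "V4", "V2"),
                    Account = c("A1", "A2", "A2", "A2", "A3", "A4"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: one row per (Vendor, Account) pair
pairs <- unique(Data2[, c("Vendor", "Account")])

# After deduplication, the row count per account equals the number
# of distinct vendors using that account
vendors_per_account <- table(pairs$Account)

# table() is named by account, so index it with each row's account
Data2$Num_Vendors_Sharing_Bank_Acct <-
  as.vector(vendors_per_account[Data2$Account])

Data2$Num_Vendors_Sharing_Bank_Acct   # 1 3 3 3 1 1
```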
>>
>> Thanks!
>>
>>
>> Quoting Bill Dunlap <williamwdunlap using gmail.com>:
>>
>> > What should the result be for
>> >   Data1 <- data.frame(Vendor=c("V1","V2","V3","V4"),
>> >                       Account=c("A1","A2","A2","A2"))
>> > ?
>> >
>> > Must each vendor have only one account?  If not, what should the result
>> > be for
>> >   Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
>> >                       Account=c("A1","A2","A2","A2","A3","A4"))
>> > ?
>> >
>> > -Bill
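Bill's question can be checked directly on the data. A sketch, assuming a vendor-by-account contingency table is acceptable at the real data's size: a cell greater than 0 means that pair occurs, so row sums give distinct accounts per vendor and column sums give distinct vendors per account. The variable names are hypothetical.

```r
Data2 <- data.frame(Vendor  = c("V1", "V2", "V3", "V1", "V4", "V2"),
                    Account = c("A1", "A2", "A2", "A2", "A3", "A4"),
                    stringsAsFactors = FALSE)

# Hypothetical helper: TRUE where a (vendor, account) pair occurs at least once
vendor_by_account <- table(Data2$Vendor, Data2$Account) > 0

# Distinct accounts per vendor; any value above 1 means a vendor
# has more than one account
accounts_per_vendor <- rowSums(vendor_by_account)
accounts_per_vendor   # V1 = 2, V2 = 2, V3 = 1, V4 = 1

# Distinct vendors per account, the quantity the thread is after
vendors_per_account <- colSums(vendor_by_account)
vendors_per_account   # A1 = 1, A2 = 3, A3 = 1, A4 = 1
```

For many thousands of vendors and accounts this dense table gets large, so a grouped count is likely the better choice there.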
>> >
>> > On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twoolman using ontargettek.com>
>> > wrote:
>> >
>> >> Hi everyone.  I have a data frame that is a collection of Vendor IDs
>> >> plus a bank account number for each vendor. I'm trying to find a way
>> >> to count how many distinct Vendor_IDs share each bank account (i.e.,
>> >> accounts that occur under more than one Vendor_ID), and then store
>> >> that count for each row of the data frame in a new variable.
>> >>
>> >> I can count bank accounts within the same vendor using dplyr's
>> >> group_by() and count(), but I can't figure out a way to count
>> >> duplicates across multiple Vendor_IDs.
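The key change is to group by the account rather than the vendor, and to count distinct vendors rather than rows. A base-R sketch using the question's column names, on small hypothetical IDs that mirror Bill's Data2 pattern: drop repeated (vendor, account) rows with unique(), count vendors per account with aggregate(), then attach the count to every row with match() so the original row order is kept. In dplyr terms this corresponds to group_by(Bank_Account_ID) with mutate() and n_distinct(Vendor_ID).

```r
# Hypothetical toy IDs (not real data), same sharing pattern as Data2
Data <- data.frame(Vendor_ID       = c(1, 2, 3, 1, 4, 2),
                   Bank_Account_ID = c(10, 20, 20, 20, 30, 40))

# Hypothetical helper: one row per account with its distinct-vendor count
shared <- aggregate(Vendor_ID ~ Bank_Account_ID,
                    data = unique(Data), FUN = length)
names(shared)[2] <- "Num_Vendors_Sharing_Bank_Acct"

# Attach the count to every row without reordering Data
idx <- match(Data$Bank_Account_ID, shared$Bank_Account_ID)
Data$Num_Vendors_Sharing_Bank_Acct <- shared$Num_Vendors_Sharing_Bank_Acct[idx]

Data$Num_Vendors_Sharing_Bank_Acct   # 1 3 3 3 1 1
```

match() is used instead of merge() here because merge() sorts the result by the key column.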
>> >>
>> >>
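>> >> Data frame example code:
>> >>
>> >> # Create a sample data frame:
>> >> set.seed(1)
>> >> # Note: sample(1:10000) returns a permutation of 1:10000, so no
>> >> # Bank_Account_ID is repeated in this sample
>> >> Data <- data.frame(Vendor_ID = sample(1:10000),
>> >>                    Bank_Account_ID = sample(1:10000))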
>> >>
>> >> Thanks in advance for any help.
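As Tom notes later, random data like this cannot show the problem: sample(1:10000) without replacement is a permutation, so every account occurs exactly once. A sketch of test data that does contain shared accounts, assuming accounts are drawn with replacement from a smaller pool (the pool size of 2000 and the name Data_shared are arbitrary, hypothetical choices), followed by a per-row distinct-vendor count with ave().

```r
set.seed(1)

# Original construction: both columns are permutations, so no duplicates
Data <- data.frame(Vendor_ID = sample(1:10000),
                   Bank_Account_ID = sample(1:10000))
anyDuplicated(Data$Bank_Account_ID)   # 0: no account appears twice

# Hypothetical test data: 10000 vendors drawing from only 2000 accounts,
# so some accounts must be shared
Data_shared <- data.frame(
  Vendor_ID       = 1:10000,
  Bank_Account_ID = sample(1:2000, 10000, replace = TRUE)
)

# ave() applies FUN within each account group and returns one value per row
Data_shared$Num_Vendors_Sharing_Bank_Acct <-
  ave(Data_shared$Vendor_ID, Data_shared$Bank_Account_ID,
      FUN = function(v) length(unique(v)))

head(Data_shared)
```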
>> >>
>> >> ______________________________________________
>> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>>