[R] counting duplicate items that occur in multiple groups

Bert Gunter bgunter.4567 using gmail.com
Wed Nov 18 01:06:10 CET 2020


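## Group the vendors by account; I() returns each group unchanged
z <- with(Data2, tapply(Vendor, Account, I))
## Number of rows (vendors) listed under each account
n <- vapply(z, length, 1)
data.frame(Vendor = unlist(z),
   Account = rep(names(z), n),
   NumVen = rep(n, n)
)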

## which gives:

    Vendor Account NumVen
A1      V1      A1      1
A21     V2      A2      3
A22     V3      A2      3
A23     V1      A2      3
A3      V4      A3      1
A4      V2      A4      1

Of course, this also works for Data1.
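
As a quick check of that claim, the same steps can be run on Bill Dunlap's Data1 (quoted further down the thread). This is only a sketch; the name res1 is introduced here for illustration and is not part of the original post.

```r
## Data1 as given by Bill Dunlap later in this thread
Data1 <- data.frame(Vendor  = c("V1", "V2", "V3", "V4"),
                    Account = c("A1", "A2", "A2", "A2"))

z <- with(Data1, tapply(Vendor, Account, I))  # vendors grouped by account
n <- vapply(z, length, 1)                     # number of rows per account
res1 <- data.frame(Vendor  = unlist(z),
                   Account = rep(names(z), n),
                   NumVen  = rep(n, n))
res1  # V1 gets NumVen 1; V2, V3 and V4 each get 3
```

Note that the rows come back sorted by account rather than in their original order, which makes no difference for Data1 but can matter for larger data.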

Bill may be able to come up with a slicker version, however.
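
One possible shorter route, offered as a sketch rather than as anyone's posted answer, is base R's ave(), which keeps the original row order. The column name NumDistinct and the helper nd are assumptions made for this example; the second count only differs from the first when the same vendor/account pair appears more than once.

```r
## Data2 as given by Bill Dunlap later in this thread
Data2 <- data.frame(Vendor  = c("V1", "V2", "V3", "V1", "V4", "V2"),
                    Account = c("A1", "A2", "A2", "A2", "A3", "A4"))

## Rows per account, returned in the original row order
Data2$NumVen <- ave(seq_along(Data2$Account), Data2$Account, FUN = length)

## Distinct vendors per account, looked up by account name
nd <- tapply(Data2$Vendor, Data2$Account, function(v) length(unique(v)))
Data2$NumDistinct <- as.vector(nd[as.character(Data2$Account)])
Data2
```

For Data2 both new columns come out as 1, 3, 3, 3, 1, 1, matching the NumVen values shown above.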

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Nov 17, 2020 at 3:34 PM Tom Woolman <twoolman using ontargettek.com>
wrote:

> Yes, good catch. Thanks
>
>
> Quoting Bert Gunter <bgunter.4567 using gmail.com>:
>
> > Why 0's in the data frame? Shouldn't that be 1 (vendor with that
> > account)?
> >
> > Bert
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along and
> > sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> >
> > On Tue, Nov 17, 2020 at 3:29 PM Tom Woolman <twoolman using ontargettek.com>
> > wrote:
> >
> >> Hi Bill. Sorry to be so obtuse with the example data, I was trying
> >> (too hard) not to share any actual values so I just created randomized
> >> values for my example; of course I should have specified that the
> >> random values would not provide the expected problem pattern. I should
> >> have just used simple dummy codes as Bill Dunlap did.
> >>
> >> So per Bill's example data for Data1, the expected (hoped for) output
> >> should be:
> >>
> >>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> >> 1     V1      A1      0
> >> 2     V2      A2      3
> >> 3     V3      A2      3
> >> 4     V4      A2      3
> >>
> >>
> >> Where the new calculated variable is Num_Vendors_Sharing_Bank_Acct.
> >> The value is 3 for V2, V3 and V4 because they each share bank account
> >> A2.
> >>
> >>
> >> Likewise, in the Data2 frame, the same logic applies:
> >>
> >>   Vendor Account Num_Vendors_Sharing_Bank_Acct
> >> 1     V1      A1     0
> >> 2     V2      A2     3
> >> 3     V3      A2     3
> >> 4     V1      A2     3
> >> 5     V4      A3     0
> >> 6     V2      A4     0
> >>
> >>
> >>
> >>
> >>
> >>
> >> Thanks!
> >>
> >>
> >> Quoting Bill Dunlap <williamwdunlap using gmail.com>:
> >>
> >> > What should the result be for
> >> >   Data1 <- data.frame(Vendor=c("V1","V2","V3","V4"),
> >> > Account=c("A1","A2","A2","A2"))
> >> > ?
> >> >
> >> > Must each vendor have only one account?  If not, what should the result be
> >> > for
> >> >    Data2 <- data.frame(Vendor=c("V1","V2","V3","V1","V4","V2"),
> >> > Account=c("A1","A2","A2","A2","A3","A4"))
> >> > ?
> >> >
> >> > -Bill
> >> >
> >> > On Tue, Nov 17, 2020 at 1:20 PM Tom Woolman <twoolman using ontargettek.com>
> >> > wrote:
> >> >
> >> >> Hi everyone.  I have a dataframe that is a collection of Vendor IDs
> >> >> plus a bank account number for each vendor. I'm trying to find a way
> >> >> to count the number of duplicate bank accounts that occur in more than
> >> >> one unique Vendor_ID, and then assign the count value for each row in
> >> >> the dataframe in a new variable.
> >> >>
> >> >> I can do a count of bank accounts that occur within the same vendor
> >> >> using dplyr and group_by and count, but I can't figure out a way to
> >> >> count duplicates among multiple Vendor_IDs.
> >> >>
> >> >>
> >> >> Dataframe example code:
> >> >>
> >> >>
> >> >> #Create a sample data frame:
> >> >>
> >> >> set.seed(1)
> >> >>
> >> >> Data <- data.frame(Vendor_ID = sample(1:10000), Bank_Account_ID =
> >> >> sample(1:10000))
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> Thanks in advance for any help.
> >> >>
> >> >> ______________________________________________
> >> >> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
> >> >>
> >>
> >>
>
>
>
>
