[R] Seeking to validate data quality requirements - should I develop a package?

Architector Data Tools dtwadd at googlemail.com
Fri Aug 4 18:17:35 CEST 2017


Thanks Bert, I will definitely look through rseek, and reuse wherever
possible. Scanning through the first few pages, maybe "datacheck" can
provide something. But I have in mind a complete DQ package, a sports car
with 4 good wheels ;) and it still seems likely that I will need to develop
something at this point.

Regards,
David

On Fri, 4 Aug 2017, 3:15 pm Bert Gunter, <bgunter.4567 at gmail.com> wrote:

> Sounds like you'll be reinventing square wheels.
>
> Searching "data quality package" on rseek.org brought up many hits.
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
>
>
> On Fri, Aug 4, 2017 at 2:56 AM, Architector Data Tools via R-help
> <r-help at r-project.org> wrote:
> > I am planning to develop an R package to manage all aspects of data
> > quality. I am very experienced in data quality, but fairly new to R. I
> > have tried to find a suitable data quality package, and am surprised
> > not to find much to suit my requirements.  Developing the package
> > would be an ambitious effort, involving several contributors (that I
> > have already identified, and who also do not have much R experience
> > yet). So I am seeking some confidence that the effort is worthwhile.
> >
> > The package will be highly configurable so it can be applied to pretty
> > much any situation, and will implement sophisticated data quality
> > capabilities, including:
> >
> > (a) DEFINITION: integration with a data dictionary (perhaps metaData),
> > and with highly configurable and expressive data quality rules
> >
> > (b) MONITORING & DETECTION: automated data quality monitoring and
> > alerting against any data source. Automatically raise and update
> > quality issues
> >
> > (c) ANALYSIS & ROOT CAUSE: data quality dashboard, alerts,
> > drill-downs, plot trends, including perhaps a machine learning aspect
> > that detects noteworthy events in quality measurements for inclusion
> > in executive reports
> >
> > (d) WORKFLOW: basic data quality management workflow (i.e. implement
> > 'inbox' and 'actions', probably via Shiny)
> >
> > The requirements will be drawn from my professional experience (as
> > interim head of data quality at a global bank), although this project
> > is not sponsored either by my employer or any of my consulting
> > clients. I do, however, expect the package to be of interest to
> > financial service organisations who rely on good quality data for
> > their financial and risk models, and for any other process that relies
> > on good data.
> >
> > To sum up, if anyone can point to a data quality package that means I
> > don’t have to develop one, that would be great. Alternatively, any
> > comments of support would also be very useful!
> >
> > David
> >
> > David Twaddell
> > Architector Data Tools
> > Tel: +44 20 3239 1099 | +44 7447 936 984
> > Web: www.architector.co.uk
