[R] New vocabulary on a Friday afternoon. Was: Improving data processing efficiency

Greg Snow Greg.Snow at imail.org
Fri Jun 6 21:14:20 CEST 2008


I still like option number 4, so I think we need to come up with a formal definition for a "junk" of data.  I read somewhere that Tukey coined the word "bit" as it applies to computers; we can share the credit/blame for "junks" of data.

My proposal for a statistical/data definition of the word junk:

Junk (noun):
A quantity of data just large enough to get the client excited about the "great" dataset they provided, but not large enough to draw any useful conclusions from.

Example sentence:  We just received another junk of data from the boss; who gets to give him the bad news that it still does not prove his pet theory?


--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
(801) 408-8111



> -----Original Message-----
> From: Patrick Burns [mailto:pburns at pburns.seanet.com]
> Sent: Friday, June 06, 2008 12:58 PM
> To: Gabor Grothendieck
> Cc: Greg Snow; r-help at r-project.org
> Subject: Re: [R] Improving data processing efficiency
>
> My guess is that number 2 is closest to the mark.
> Typing too fast is unfortunately not one of my habitual attributes.
>
> Gabor Grothendieck wrote:
> > On Fri, Jun 6, 2008 at 2:28 PM, Greg Snow <Greg.Snow at imail.org> wrote:
> >
> >>> -----Original Message-----
> >>> From: r-help-bounces at r-project.org
> >>> [mailto:r-help-bounces at r-project.org] On Behalf Of Patrick Burns
> >>> Sent: Friday, June 06, 2008 12:04 PM
> >>> To: Daniel Folkinshteyn
> >>> Cc: r-help at r-project.org
> >>> Subject: Re: [R] Improving data processing efficiency
> >>>
> >>> That is going to be situation dependent, but if you have a
> >>> reasonable upper bound, then that will be much easier and not far
> >>> from optimal.
> >>>
> >>> If you pick the possibly too small route, then increasing the size
> >>> in largish junks is much better than adding a row at a time.
> >>>
> >> Pat,
> >>
> >> I am unfamiliar with the use of the word "junk" as a unit of measure for data objects.  I figure there are a few different possibilities:
> >>
> >> 1. You are using the term intentionally, meaning that you suggest he increase the size in terms of old cars and broken pianos rather than used-up pens and broken pencils.
> >>
> >> 2. This was a Freudian slip based on your opinion of some datasets you have seen.
> >>
> >> 3. Somewhere between your mind and the final product "jumps/chunks" became "junks" (possibly a Microsoft "correction", or just typing too fast combined with number 2).
> >>
> >> 4. "junks" is an official measure of data/object size that I need to learn more about (the history of the term possibly being related to 2 and 3 above).
> >>
> >>
> >
> > 5. Chinese sailing vessel.
> > http://en.wikipedia.org/wiki/Junk_(ship)
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
>
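Pat's underlying advice (preallocate to a reasonable upper bound, and if that guess may be too small, grow the storage in largish chunks rather than one row at a time) can be sketched as below. This is a minimal illustration, not code from the thread: it is written in Python rather than R, and the function name `collect_rows` and its parameters `initial_capacity` and `chunk` are hypothetical. Python lists already over-allocate internally, so the explicit buffer here only mimics the pattern that matters in R, where growing an object one row at a time typically copies it on every step.

```python
def collect_rows(rows, initial_capacity=1000, chunk=1000):
    """Hypothetical helper: store rows in a preallocated buffer, growing it
    by `chunk` slots whenever it fills up instead of by one row at a time.
    Returns the filled rows and the number of times the buffer was grown."""
    if chunk < 1:
        raise ValueError("chunk must be at least 1")
    buf = [None] * initial_capacity  # preallocate from an upper-bound guess
    n = 0        # number of slots actually filled
    resizes = 0  # how many times the guess turned out to be too small
    for row in rows:
        if n == len(buf):
            buf.extend([None] * chunk)  # grow in one largish junk/chunk
            resizes += 1
        buf[n] = row
        n += 1
    return buf[:n], resizes  # trim the unused tail


rows, resizes = collect_rows(range(2500), initial_capacity=1000, chunk=1000)
print(len(rows), resizes)
```

With a guess of 1000 rows and 2500 actual rows, the buffer is grown only twice rather than once per extra row; a larger `chunk` trades some wasted memory for fewer resizes.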