[R-sig-Fedora] Planning for R-repo...

Tue Mar 20 00:02:35 CET 2012

On 03/19/2012 05:12 AM, Pierre-Yves Chibon wrote:

> I think the idea of R-repo is more in line with what you have in
> mind.  R2spec is a tool to generate spec files and make them as
> compliant as possible with Fedora packaging guidelines. I think it
> should remain this way.  R-repo aims at generating and providing RPM
> in an automatic way. This has pros and cons but it will be up to the
> user/sysadmin to weigh them and decide to use R-repo or not.

I agree.  The critical piece I'd pick out of this is: We would like
the R-repo to be close to Fedora's guidelines, but we don't
necessarily expect to get it all the way there.

> My vision of R-repo is:
> - Generate a first layer of spec files automatically (done)
> - Build them (done)
> - Fix the ones that do not build (so by hand) (started, still far from
> done)
> - Perform the updates as they are needed (I have one script but we would
> need to finish step 3 first)

Do you mean that R-repo is a published tree of specfiles?  I was
envisioning that R-repo would be a live repository of RPMs, which we
would (hopefully, eventually) maintain not too far behind the state of
CRAN.

Then, in parallel to it, R2spec / R2rpm would be the toolset that
we're using to maintain and automate that build process.

> So hopefully we would face the dependency issues only in stage 1 and 3
> (and eventually 4). So if package X requires something outside R
> repository, you will detect it at stage 3 and fix it once and for all.

"Fix it once and for all".  Here's what I think that means:

1) human discovers the dependency (e.g. "Oh, GDD requires a package
    called 'libgd' to run, and 'libgd-devel' to build")

2) Note that human discovery in a table or data file, shipped with
    (R-repo?  cran2spec?)

3) Make sure that cran2spec can read that table to supply the
    dependency at specfile generation time.  Then, when
    GDD-1.2.whatevers-next comes out, we don't have to rediscover it.

Is that how you envision it?
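
A minimal sketch of steps 1-3 above, assuming a hypothetical table
format and hypothetical names (SYSTEM_DEPS, spec_dependency_lines);
none of this is existing R2spec or cran2spec code.  The GDD entry is
the example from step 1:

```python
# Hypothetical table of human-discovered system dependencies, keyed by
# R package name.  The layout is an assumption, not an R2spec format.
SYSTEM_DEPS = {
    "GDD": {"requires": ["libgd"], "build_requires": ["libgd-devel"]},
}


def spec_dependency_lines(r_package, table=SYSTEM_DEPS):
    """Return the extra spec-file dependency lines for one R package."""
    entry = table.get(r_package, {})
    lines = [f"BuildRequires: {name}" for name in entry.get("build_requires", [])]
    lines += [f"Requires: {name}" for name in entry.get("requires", [])]
    return lines


if __name__ == "__main__":
    # The lines a generator could add when it writes the GDD specfile.
    print("\n".join(spec_dependency_lines("GDD")))
```

Because the lookup is keyed on the package name rather than on a
version, the next GDD release would pick up the same lines with no
rediscovery.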

> I might have an idealistic vision of the reality here, so I went for a
> simple scheme.

I believe that, to be useful to the community, we have to have a
toolset that can build "most of" CRAN and friends without "too much"
effort.

How much is most of?  How much effort is too much?  I'll make up
numbers, you can shoot them down.

R2spec is at V4.1; so let's say that for version 4.3, "Enough" is all
the packages that pingou and asr care about installing on their
machines (and for me that includes ggplot2, so that's a lot...).

Then, "too much" means there shouldn't be any human intervention: That
means we set it up on a brand-new, clean Fedora or RHEL machine, say "GO!"...

% R2trop_plus_rpms ggplot2

and however many hours later we've got our preference list of
packages, and all their dependencies, as brand new signed RPMs, ready
to copy into my repo at work so my stats customers can just yum
install from them.
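
To make "our preference list of packages, and all their dependencies"
concrete, here is a rough sketch of how the full build set could be
computed.  The function name build_set and the dependency map are
assumptions made for illustration; the map is not checked against
CRAN metadata:

```python
# Illustrative dependency map (package -> packages it needs).
# The entries are an assumption and are not taken from CRAN.
DEPENDS_ON = {
    "ggplot2": ["plyr", "reshape2", "scales"],
    "reshape2": ["plyr", "stringr"],
    "scales": ["plyr", "stringr"],
    "plyr": [],
    "stringr": [],
}


def build_set(wanted, depends_on):
    """Return every package needed for `wanted`, dependencies included."""
    needed = set()
    stack = list(wanted)
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue  # already handled; also guards against cycles
        needed.add(pkg)
        stack.extend(depends_on.get(pkg, []))
    return needed


if __name__ == "__main__":
    print(sorted(build_set(["ggplot2"], DEPENDS_ON)))
```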

Then, say, by version 4.5 the target is (again, I'm making this up)
80% of all of CRAN can be built without human intervention.

If we got to that stage, I think we'd have a very good understanding
of what the rest of the problems are.  But while I'm fantasizing, I'll
pretend that 5.0 can build every package except for 20 Problem
Children.

> How does this sound for you?

I think we are talking about very similar things.  It remains to
determine whether the differences are critical (mostly, to you: you're
in charge).

I think that every by-hand tweak of a generated specfile is a loss.
This means that we need to engage the attention of one of the humans
in charge of the repository every time that package has a revision.
This adds up to "R-repo is always behind, because pingou has a job all
day. :)"

I do not say we must have -zero- by-hand fixes.  But I think we should
work very very hard to avoid them.  So:

> Basically, I think what you were saying makes sense and is the most
> sensible approach. First get auto-generated spec files, then review
> them by hand to fix build problems.

Yes; but I suggest that the review should result in changes to R2spec,
so that it generates the correct spec next time, and for everyone.

In my imagining, R2rpm eventually becomes: A process that is smart
enough to make the RPM correctly most of the time, plus a list of
known exceptions and tweaks we have to drop in for things we can't
figure out automatically.  That way anyone could build their own repo.

> For the dependencies, the two steps approach you were referring to
> is probably the best approach. First build with the minimum
> dependencies then rebuild as the others become available. Now
> that we have the first round in place and available we can start
> working on the second run.

I am working on a concrete plan for this; I will do a little testing
to make sure it can function, and then I will write it out in detail,
submitted for your consideration.
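
For illustration only, and not the plan mentioned above: one simple
way to approach multi-pass building is to build in rounds, where each
round builds whatever has all of its dependencies already built.
This is the plain "build in dependency order" idea rather than a true
build-then-rebuild scheme, and the name build_in_rounds and the
sample data are hypothetical:

```python
def build_in_rounds(depends_on):
    """Group packages into build rounds.

    Returns (rounds, stuck): `rounds` is a list of lists of package
    names, and `stuck` lists packages whose dependencies never became
    available (missing or circular), which would need a human.
    """
    built = set()
    rounds = []
    pending = set(depends_on)
    while pending:
        ready = sorted(p for p in pending if set(depends_on[p]) <= built)
        if not ready:
            break  # no progress possible; the rest need attention
        rounds.append(ready)
        built.update(ready)
        pending -= set(ready)
    return rounds, sorted(pending)


if __name__ == "__main__":
    sample = {"a": [], "b": ["a"], "c": ["a", "b"], "d": ["not-packaged"]}
    rounds, stuck = build_in_rounds(sample)
    print(rounds)
    print(stuck)
```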

- Allen S. Rout