[R] Proposal: Archive UseR conference presentations at www.r-project.org/useR-yyyy
Friedrich Leisch
friedrich.leisch at stat.uni-muenchen.de
Wed Sep 26 09:05:47 CEST 2007
>>>>> On Tue, 25 Sep 2007 09:17:23 -0500,
>>>>> hadley wickham (hw) wrote:
>> > but I can understand your desire to do
>> > that. Perhaps just taking a static snapshot using something like
>> > wget, and hosting that on the R-project website would be a good
>> > compromise.
>>
>> Hmm, wouldn't it be easier if the hosting institution made a tgz
>> file? wget over HTTP is rather bad at resolving links etc.
> Really? I've always found it to be rather excellent.
Sorry, my statement was rather ambiguous. wget itself is excellent, but
mirroring via HTTP is terrible (administering CRAN for a couple of
years gives you more experience than you ever wanted to have in that
arena).
With "links" I meant symbolic links on the server filesystem. HTTP
does not distinguish between a symbolic link and a real
file/directory, so for every symbolic link you get a full copy of the
target (and there is nothing wget can do about that, AFAIK).
> The reason I suggest it is that unless you have some way to generate a
> static copy of the site, you'll need to ensure that the R-project
> supports any dynamic content, e.g. the useR 2008 site
> uses some (fairly vanilla) php for including the header and
> footer.
I don't care how the tgz file I get is created, but it is probably
better if the local authors create (and check) it than if I do it.
So no problem if the tarball is created using wget ... but I'd
prefer not to do it myself.
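Just as a sketch of what I mean (directory and file names are only
placeholders): the local organizers could create the tarball directly
on the server filesystem, which keeps symbolic links as links and
avoids the HTTP-mirroring duplication described above.

    # run on the web server, from the document root
    tar czf useR-2006-snapshot.tar.gz useR-2006/
    # symlinks inside useR-2006/ stay symlinks in the archive;
    # add -h only if dereferenced (full) copies are really wanted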
>> we could include a note on the top page that this is only a snapshot
>> copy and have a link to the original site (in case something changes
>> there).
> That's reasonable, although it would be even better to have it on
> every page.
Again, if the authors create a tarball, they can put the note wherever
they like. I thought of adding a link "local copy from 200x-yy-zz" to
the list of conferences at www.R-project.org next to the links to the
original sites.
>> > The one problem is setting up a redirect so that existing links and
>> > google searches aren't broken. This would need to be put in place at
>> > least 6 months before the old website closed.
>>
>> Yes, very good point, I didn't think about that. But the R site is
>> searched very often, so material there appears in Google searches
>> rather quickly. Regarding bookmarks: I don't want to remove the old
>> site, just have an archive copy at a central location.
> In that case, should it be labelled noindex, as it's just a cache of
> material that should be available elsewhere? We need some
> machine-readable way of indicating where the canonical resource is.
> It's always frustrated me a little that when googling for R
> documentation, you find hundreds of copies of the same page hosted
> at different sites.
Well, two copies are not as bad as hundreds. But material might get
found faster on the www.R-project.org site, because it ranks
surprisingly high in many Google searches.
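For reference, the noindex label Hadley mentions would simply be a
standard robots meta tag in the <head> of each archived page,

    <meta name="robots" content="noindex">

next to the visible "local copy, original at ..." note discussed
above.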
Best,
Fritz