[Rd] Comments in the DESCRIPTION file

Hervé Pagès hpages at fhcrc.org
Sat Dec 8 02:10:09 CET 2012


Hi Simon,

On 12/06/2012 05:59 PM, Simon Urbanek wrote:
> On Dec 6, 2012, at 8:36 PM, Hervé Pagès wrote:
>
>> On 12/06/2012 04:53 PM, William Dunlap wrote:
>>> Why not just use some tag that R doesn't already use, say "Comment:", instead
>>> of a #?  If you allow # in position one of a line to mean a comment then people
>>> may expect # to be used as a comment anywhere on a line.
>>
>> I would stick to whatever the DCF spec says, if there is such a thing.
>> If the spec says # on position 1 means a comment then I think read.dcf()
>> should do that. Then the function can be used to read any DCF file,
>> not just DESCRIPTION files.
>>
>
> DCF itself doesn't define the meaning of # -- it only defines that no field name is allowed to start with #. In fact the same document says that lines starting with # are not permitted in general DCF files -- they are only permitted in Debian's source package control files. That leaves the status of # as comments somewhat confusing. My interpretation would be that generic DCF doesn't allow # but specific formats derived from DCF may choose to interpret it that way. In either case the current behavior of read.dcf() definitely satisfies the DCF definition.

Not if the definition says that no field name is allowed to start
with #:

   > read.dcf("toto.dcf")
        #Package Version
   [1,] "toto"   "0.0.0"
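
A rough way to flag such field names before parsing could look like the
sketch below. hash_field_names() is a hypothetical helper, not something
that exists in R, and it assumes the usual "name:" at the start of a line
convention for fields.

```r
# Hypothetical helper (not part of R): report field names starting with "#".
hash_field_names <- function(lines) {
  # A field line has "<name>:" at the very start; continuation lines begin
  # with whitespace and comment-like lines such as "# text" have a space
  # before any colon, so neither matches.
  is_field <- grepl("^[^[:space:]:]+:", lines)
  field_names <- sub(":.*$", "", lines[is_field])
  unique(field_names[startsWith(field_names, "#")])
}

bad <- hash_field_names(c("#Package: toto", "Version: 0.0.0"))
```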

> As both Brian and Bill pointed out, the proper way to do that is to define a data field with data/value as the comment.

which maybe works OK for inserting comments in DESCRIPTION files,
but not so well for inserting inter-record comments in DCF files with
multiple records.

In Bioconductor we maintain a big DCF file that we use to automatically
re-generate a collection of annotation packages at each release. The
file looks like:

# Annotation packages for Human

Package: hcg110.db
Version: 2.8.0
PkgTemplate: NCBICHIP.DB

Package: hgfocus.db
Version: 2.8.0
PkgTemplate: NCBICHIP.DB

# Annotation packages for Mouse

Package: mgu74a.db
Version: 2.8.0
PkgTemplate: NCBICHIP.DB

Package: mgu74av2.db
Version: 2.8.0
PkgTemplate: NCBICHIP.DB

The problem with putting those comments in key/value pairs (a Note field,
say) is that it contaminates the output of read.dcf() with fake records:

   > read.dcf("toto.dcf")
        Note                            Package       Version PkgTemplate
   [1,] "Annotation packages for Human" NA            NA      NA
   [2,] NA                              "hcg110.db"   "2.8.0" "NCBICHIP.DB"
   [3,] NA                              "hgfocus.db"  "2.8.0" "NCBICHIP.DB"
   [4,] "Annotation packages for Mouse" NA            NA      NA
   [5,] NA                              "mgu74a.db"   "2.8.0" "NCBICHIP.DB"
   [6,] NA                              "mgu74av2.db" "2.8.0" "NCBICHIP.DB"

The file really has 4 records of data and it'd be good to be able to add
inter-record comments without altering the number of records.
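
To make the record count concrete, here is a small sketch that feeds the
same four packages to read.dcf() with the comments stored in a Note field
(the field name is only an assumption for illustration), then drops the
Note-only rows by hand. That filtering step is exactly the extra work one
would like to avoid.

```r
# The four packages above, with each comment stored as a Note-only record.
note_text <- c(
  "Note: Annotation packages for Human", "",
  "Package: hcg110.db",   "Version: 2.8.0", "PkgTemplate: NCBICHIP.DB", "",
  "Package: hgfocus.db",  "Version: 2.8.0", "PkgTemplate: NCBICHIP.DB", "",
  "Note: Annotation packages for Mouse", "",
  "Package: mgu74a.db",   "Version: 2.8.0", "PkgTemplate: NCBICHIP.DB", "",
  "Package: mgu74av2.db", "Version: 2.8.0", "PkgTemplate: NCBICHIP.DB"
)

con <- textConnection(note_text)
all_recs <- read.dcf(con)   # includes the Note-only records
close(con)

# Keep only rows that describe a package, and drop the Note column.
is_pkg <- !is.na(all_recs[, "Package"])
pkg_recs <- all_recs[is_pkg, colnames(all_recs) != "Note", drop = FALSE]
```

Here nrow(all_recs) counts the Note records as well, while nrow(pkg_recs)
counts only the real data records.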

This is the reason why we use a "comment aware" version of read.dcf().

I can see why maybe you wouldn't like having people start using # to
insert comment lines in their DESCRIPTION files and I agree that it
should probably be discouraged. So maybe support for # comments could
be made optional in read.dcf() through an extra argument, and would be
disabled by default?
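
Something along these lines, as a sketch only: read_dcf_opt() and its
comment.char argument are hypothetical names, not an existing read.dcf()
argument. It filters the lines in memory and hands a text connection to
read.dcf(), so with the default comment.char = "" nothing is stripped.

```r
# Hypothetical wrapper (not part of R): opt-in stripping of comment lines.
read_dcf_opt <- function(file, ..., comment.char = "") {
  lines <- readLines(file, warn = FALSE)
  if (nzchar(comment.char)) {
    # Only lines whose very first character(s) match are treated as comments.
    lines <- lines[!startsWith(lines, comment.char)]
  }
  con <- textConnection(lines)
  on.exit(close(con))
  read.dcf(con, ...)
}

# Small demo file with inter-record comments.
demo_file <- tempfile(fileext = ".dcf")
writeLines(c(
  "# Annotation packages for Human", "",
  "Package: hcg110.db",   "Version: 2.8.0", "PkgTemplate: NCBICHIP.DB", "",
  "Package: hgfocus.db",  "Version: 2.8.0", "PkgTemplate: NCBICHIP.DB", "",
  "# Annotation packages for Mouse", "",
  "Package: mgu74a.db",   "Version: 2.8.0", "PkgTemplate: NCBICHIP.DB", "",
  "Package: mgu74av2.db", "Version: 2.8.0", "PkgTemplate: NCBICHIP.DB"
), demo_file)

stripped <- read_dcf_opt(demo_file, comment.char = "#")
unlink(demo_file)
```

With comment.char = "#" the comment lines never reach the parser, so the
result should contain only the package records.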

Thanks,
H.

>
> Cheers,
> Simon
>
>
>
>> Cheers,
>> H.
>>
>>>
>>> (It may also mess up some dcf parsing code that I've written - it checks that lines
>>> after tagged lines are either empty, the start of a new description, or start with a space,
>>> a continuation of the previous line.)
>>>
>>> Bill Dunlap
>>> Spotfire, TIBCO Software
>>> wdunlap tibco.com
>>>
>>>
>>>> -----Original Message-----
>>>> From: r-devel-bounces at r-project.org [mailto:r-devel-bounces at r-project.org] On Behalf
>>>> Of Hervé Pagès
>>>> Sent: Thursday, December 06, 2012 3:47 PM
>>>> To: Duncan Murdoch
>>>> Cc: christophe.genolini at u-paris10.fr; r-devel at r-project.org; Christophe Genolini
>>>> Subject: Re: [Rd] Comments in the DESCRIPTION file
>>>>
>>>>
>>>>
>>>> On 12/06/2012 03:41 PM, Hervé Pagès wrote:
>>>>> Hi,
>>>>>
>>>>> Wouldn't be hard to patch read.dcf() though.
>>>>>
>>>>> FWIW here's the "comment aware" version of read.dcf() I've been using
>>>>> for years:
>>>>>
>>>>>     .removeCommentLines <- function(infile=stdin(), outfile=stdout())
>>>>>     {
>>>>>       if (is.character(infile)) {
>>>>>           infile <- file(infile, "r")
>>>>>           on.exit(close(infile))
>>>>>       }
>>>>>       if (is.character(outfile)) {
>>>>>           outfile <- file(outfile, "w")
>>>>>           on.exit(close(outfile), add=TRUE)
>>>>>       }
>>>>>       while (TRUE) {
>>>>>           lines <- readLines(infile, n=25000L)
>>>>>           if (length(lines) == 0L)
>>>>>               return()
>>>>>           keep_it <- substr(lines, 1L, 1L) != "#"
>>>>>           writeLines(lines[keep_it], outfile)
>>>>>       }
>>>>>     }
>>>>>
>>>>>     read.dcf2 <- function(file, ...)
>>>>>     {
>>>>>       clean_file <- file.path(tempdir(), "clean.dcf")
>>>>
>>>> mmh, would certainly be better to just use tempfile() here.
>>>>
>>>> H.
>>>>
>>>>>       .removeCommentLines(file, clean_file)
>>>>>       on.exit(file.remove(clean_file))
>>>>>       read.dcf(clean_file, ...)
>>>>>     }
>>>>>
>>>>> Cheers,
>>>>> H.
>>>>>
>>>>> On 11/07/2012 01:53 AM, Duncan Murdoch wrote:
>>>>>> On 12-11-07 4:26 AM, Christophe Genolini wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Is it possible to add comments in the DESCRIPTION file?
>>>>>>
>>>>>>
>>>>>> The read.dcf function is used to read the DESCRIPTION file, and it
>>>>>> doesn't support comments.  (The current Debian control format
>>>>>> description does appear to support comments with leading # markers, but
>>>>>> R's read.dcf function doesn't support these.)
>>>>>>
>>>>>> You could probably get away with something like
>>>>>>
>>>>>> #: this is a comment
>>>>>>
>>>>>> since unrecognized fields are ignored, but I think this fact is
>>>>>> undocumented so I would say it's safer to assume that comments are not
>>>>>> supported.
>>>>>>
>>>>>> Duncan Murdoch
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-devel at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>
>>>>
>>>> --
>>>> Hervé Pagès
>>>>
>>>> Program in Computational Biology
>>>> Division of Public Health Sciences
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N, M1-B514
>>>> P.O. Box 19024
>>>> Seattle, WA 98109-1024
>>>>
>>>> E-mail: hpages at fhcrc.org
>>>> Phone:  (206) 667-5791
>>>> Fax:    (206) 667-1319
>>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


