[R] Building package - tab delimited example data issue

Peter Dalgaard P.Dalgaard at biostat.ku.dk
Thu Dec 6 16:03:48 CET 2007


Johannes Graumann wrote:
> Johannes Graumann wrote:
>
>   
>> On Thursday 06 December 2007 11:52:46 Peter Dalgaard wrote:
>>     
>>> Johannes Graumann wrote:
>>>       
>>>> Hello,
>>>>
>>>> I'm trying to integrate example data in the shape of a tab delimited
>>>> ASCII file into my package and therefore dropped it into the data
>>>> subdirectory. The build works out just fine, but when I attempt to
>>>> install I get:
>>>>
>>>> ** building package indices ...
>>>> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>>>   line 1 did not have 500 elements
>>>> Calls: <Anonymous> ... <Anonymous> -> switch -> assign -> read.table -> scan
>>>> Execution halted
>>>> ERROR: installing package indices failed
>>>> ** Removing '/usr/local/lib/R/site-library/MaxQuantUtils'
>>>> ** Restoring previous '/usr/local/lib/R/site-library/MaxQuantUtils'
>>>>
>>>> Accordingly the check delivers:
>>>>
>>>> ...
>>>> * checking whether package 'MaxQuantUtils' can be installed ... ERROR
>>>>
>>>> Can anyone tell me what I'm doing wrong? build/install without the ASCII
>>>> file works just fine.
>>>>
>>>> Joh
>>>>         
>>> If you had looked at help(data), you would have found a list of which
>>> file formats it supports and how they are read. Hint: TAB-delimited
>>> files are not among them. *Whitespace* separated files work, using
>>> read.table(filename, header=TRUE), but that is not a superset of
>>> TAB-delimited data if there are empty fields.
>>>
>>> A nice trick is to figure out how to read the data from the command line
>>> and drop the relevant code into a mydata.R file (assuming that the
>>> actual data file is mydata.txt). This gets executed when the data is
>>> loaded (by data(mydata) or when building the lazyload database) because
>>> .R files have priority over .txt.
>>>
>>> This is quite general and allows a nice way of incorporating data
>>> management while retaining the original data source:
>>>       
>>>> more ISwR/data/stroke.R
>>>>         
>>> stroke <-  read.csv2("stroke.csv", na.strings=".")
>>> names(stroke) <- tolower(names(stroke))
>>> stroke <-  within(stroke,{
>>>     sex <- factor(sex,levels=0:1,labels=c("Female","Male"))
>>>     dgn <- factor(dgn)
>>>     coma <- factor(coma, levels=0:1, labels=c("No","Yes"))
>>>     minf <- factor(minf, levels=0:1, labels=c("No","Yes"))
>>>     diab <- factor(diab, levels=0:1, labels=c("No","Yes"))
>>>     han <- factor(han, levels=0:1, labels=c("No","Yes"))
>>>     died <- as.Date(died, format="%d.%m.%Y")
>>>     dstr <- as.Date(dstr,format="%d.%m.%Y")
>>>     dead <- !is.na(died) & died < as.Date("1996-01-01")
>>>     died[!dead] <- NA
>>> })
>>>
>>>       
>>>> head ISwR/data/stroke.csv
>>>>         
>>> SEX;DIED;DSTR;AGE;DGN;COMA;DIAB;MINF;HAN
>>> 1;7.01.1991;2.01.1991;76;INF;0;0;1;0
>>> 1;.;3.01.1991;58;INF;0;0;0;0
>>> 1;2.06.1991;8.01.1991;74;INF;0;0;1;1
>>> 0;13.01.1991;11.01.1991;77;ICH;0;1;0;1
>>> 0;23.01.1996;13.01.1991;76;INF;0;1;0;1
>>> 1;13.01.1991;13.01.1991;48;ICH;1;0;0;1
>>> 0;1.12.1993;14.01.1991;81;INF;0;0;0;1
>>> 1;12.12.1991;14.01.1991;53;INF;0;0;1;1
>>> 0;.;15.01.1991;73;ID;0;0;0;1
>>>       
>> Thanks for your help. Very insightful, and your version of "RTFM" was not
>> too harsh either ;0)
>> Part of what I want to achieve by including the file is to showcase a
>> read-in function for this particular data type. Is there a slick way,
>> sticking to your example, to reference 'stroke.csv' directly? In the
>> examples section of some function.Rd I'd like to put something analogous to
>>
>>     # Use function to read in file:
>>     result <- function(<link to 'stroke.csv' in installed ISwR package>)
>>
>> without having to resort to marking the example as "No Run".
>>     
>
> Answering my own question and staying with the same example:
>         system.file("data/stroke.csv",package="ISwR")
> allows direct access to the example file (name).
>
>   
Yes, but...

This works only until you turn on LazyData for your package; after that you
end up with only

00Index  Rdata.rdb  Rdata.rds  Rdata.rdx

in the data directory. Use the "inst" source subdir for files you want
to have installed explicitly.
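
A minimal sketch of that suggestion, assuming a hypothetical package "mypkg"
that ships its raw file as inst/extdata/mydata.txt (the "extdata" name is a
common convention, not something required here). Files under inst/ are copied
as-is to the top level of the installed package, so they stay reachable
whether or not LazyData is on. system.file() returns "" when nothing is
found, hence the guard:

```r
# Hypothetical source layout (package and file names are assumptions):
#   mypkg/inst/extdata/mydata.txt  -> installed as <library>/mypkg/extdata/mydata.txt
path <- system.file("extdata", "mydata.txt", package = "mypkg")

# system.file() yields "" if the package or file is absent, so check first.
if (nzchar(path)) {
  result <- read.delim(path)  # stand-in for the package's own read-in function
} else {
  message("mydata.txt not found; is 'mypkg' installed?")
}
```

In an .Rd example the same call can be passed straight to the read-in
function being documented.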

Also, in principle, it is

system.file("data", "stroke.csv", package="ISwR")


although platforms that do not understand "/" as the path separator are
rare nowadays.
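
Returning to the original error, here is a small self-contained sketch of why
whitespace-separated reading is not a superset of TAB-delimited reading once
a field is empty. The file contents and column names are made up for
illustration:

```r
# Made-up TAB-delimited data; the first data line has an empty 'name' field.
f <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tvalue", "1\t\t3.5", "2\tb\t4.0"), f)

# Whitespace splitting merges the two adjacent TABs, so that line has only
# two fields and scan() stops with an error; keep the message for inspection.
ws <- tryCatch(read.table(f, header = TRUE),
               error = function(e) conditionMessage(e))

# Splitting on TAB keeps the empty field as its own column entry.
tab <- read.delim(f)

unlink(f)
```

Following the .R-file trick quoted above, a data/mydata.R file could then
hold a single line along the lines of mydata <- read.delim("mydata.txt").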

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)                  FAX: (+45) 35327907


