[R] why must a named colClasses in read.table be in correct order
Andreas Leha
andreas.leha at med.uni-goettingen.de
Thu Jul 9 05:15:16 CEST 2015
Hi Henrik,
Thank you very much for looking into this. And thanks for the patch!
Yes, let's hope this is a typo that gets fixed.
Regards,
Andreas
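
PS: Until a fix lands, one workaround is to put the named vector into the
header's column order before calling read.table(), so that it works whether
the names are honoured or not. A minimal sketch, assuming the input has a
header row; align_col_classes() is a hypothetical helper written for this
illustration, not part of utils or base R:

```r
# Hypothetical helper (not part of base R or utils): reorder a named
# colClasses vector so that it follows the column order of the data.
align_col_classes <- function(col_classes, col_names) {
  unknown <- setdiff(names(col_classes), col_names)
  if (length(unknown) > 0L)
    stop("no such column(s): ", paste(unknown, collapse = ", "))
  # Columns without an entry get NA, which read.table() treats as
  # "determine the class automatically" (its default behaviour).
  aligned <- rep(NA_character_, length(col_names))
  names(aligned) <- col_names
  aligned[names(col_classes)] <- col_classes
  aligned
}

kkk <- c("a\tb",
         "3.14\tx")

# Read the header plus one row just to learn the column order.
header <- names(read.table(text = kkk, sep = "\t", header = TRUE, nrows = 1))

# Deliberately given in the "wrong" order.
cclasses <- c(b = "character", a = "numeric")

data <- read.table(text = kkk, sep = "\t", header = TRUE,
                   colClasses = align_col_classes(cclasses, header))
str(data)
```

Unlike cclasses[order(names(cclasses))], this does not rely on the columns
happening to be in alphabetical order, and a partial specification such as
c(b = "character") is padded with NA for the remaining columns.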
Henrik Bengtsson <henrik.bengtsson at ucsf.edu> writes:
> Thanks for insisting; I was wrong and I'm happy to see that there is
> indeed code intended for named 'colClasses', which even goes back to
> 2004. But as you report, the names only work when
> length(colClasses) < cols (which also explains why I thought it was not
> supported). I'm not sure if that _strictly less than_ test is
> intentional or a mistake, but I would propose the following patch:
>
> [HB-X201]{hb}: svn diff src\library\utils\R\readtable.R
> Index: src/library/utils/R/readtable.R
> ===================================================================
> --- src/library/utils/R/readtable.R (revision 68642)
> +++ src/library/utils/R/readtable.R (working copy)
> @@ -139,7 +139,7 @@
> if (rlabp) col.names <- c("row.names", col.names)
>
> nmColClasses <- names(colClasses)
> - if(length(colClasses) < cols)
> + if(length(colClasses) <= cols)
> if(is.null(nmColClasses)) {
> colClasses <- rep_len(colClasses, cols)
> } else {
>
>
> Your example works with this patch. I've made it source():able so you
> can try it out (if you cannot source() https://, then download the
> file and source it locally):
>
> source("https://gist.githubusercontent.com/HenrikBengtsson/ed1eeb41a1b4d6c43b47/raw/ebe58f76e518dd014423bea466a5c93d2efd3c99/readtable-fix.R")
>
> kkk <- c("a\tb",
> "3.14\tx")
>
> colClasses <- c(a="numeric", b="character")
> data <- read.table(textConnection(kkk),
> sep="\t",
> header = TRUE,
> colClasses = colClasses)
> str(data)
> ### 'data.frame': 1 obs. of 2 variables:
> ### $ a: num 3.14
> ### $ b: chr "x"
>
> ## Does not work with utils::read.table(), but does with the patch
> data <- read.table(textConnection(kkk),
> sep="\t",
> header = TRUE,
> colClasses = rev(colClasses))
> str(data)
> ### 'data.frame': 1 obs. of 2 variables:
> ### $ a: num 3.14
> ### $ b: chr "x"
>
> Let's hope that the above is a (10-year-old) typo, and that changing a < to
> a <= adds support for named 'colClasses', which is really useful
> functionality.
>
> /Henrik
>
> On Wed, Jul 8, 2015 at 6:42 PM, Andreas Leha
> <andreas.leha at med.uni-goettingen.de> wrote:
>> Hi Henrik,
>>
>> Thanks for your reply.
>>
>> I am not (yet) convinced, though. The help page for read.table
>> mentions named colClasses and if I specify colClasses for not all
>> columns, the names are taken into account:
>>
>> --8<---------------cut here---------------start------------->8---
>> kkk <- c("a\tb",
>> "3.14\tx")
>> str(read.table(textConnection(kkk),
>> sep="\t",
>> header = TRUE))
>>
>> str(read.table(textConnection(kkk),
>> sep="\t",
>> header = TRUE,
>> colClasses=c(b="character")))
>> --8<---------------cut here---------------end--------------->8---
>>
>> What am I missing?
>>
>> Best,
>> Andreas
>>
>>
>>
>> On 09/07/2015 02:21, Henrik Bengtsson wrote:
>>> read.table() does not make use of names(colClasses) - only its values.
>>> Because of this, ordering is critical, as you noted. It shouldn't be
>>> too hard to add support for a named `colClasses` argument of
>>> utils::read.table(), but someone needs to convince the R core team
>>> that this is a good idea.
>>>
>>> As an alternative, see R.filesets::readDataFrame() for a
>>> read.table()-like function that matches names(colClasses) to column
>>> names, if they exist.
>>>
>>> /Henrik
>>> (author of R.filesets)
>>>
>>> On Wed, Jul 8, 2015 at 5:41 PM, Andreas Leha
>>> <andreas.leha at med.uni-goettingen.de> wrote:
>>>> Hi all,
>>>>
>>>> Apparently, the colClasses argument to read.table needs to be in the
>>>> order of the columns *even when it is named*. Why is that? And where
>>>> would I find it in the documentation?
>>>>
>>>> Here is a MWE:
>>>>
>>>> --8<---------------cut here---------------start------------->8---
>>>> kkk <- c("a\tb",
>>>> "3.14\tx")
>>>> read.table(textConnection(kkk),
>>>> sep="\t",
>>>> header = TRUE)
>>>>
>>>> cclasses=c(b="character",
>>>> a="numeric")
>>>>
>>>> read.table(textConnection(kkk),
>>>> sep="\t",
>>>> header = TRUE,
>>>> colClasses = cclasses) ## <--- error
>>>>
>>>> read.table(textConnection(kkk),
>>>> sep="\t",
>>>> header = TRUE,
>>>> colClasses = cclasses[order(names(cclasses))])
>>>> --8<---------------cut here---------------end--------------->8---
>>>>
>>>>
>>>> Thanks,
>>>> Andreas
>>>>