[Rd] unlist errors on a nested list of empty lists
Martin Maechler
maechler at stat.math.ethz.ch
Thu May 10 09:33:23 CEST 2018
>>>>> Steven Nydick <swnydick at gmail.com>
>>>>> on Wed, 9 May 2018 13:25:11 +0000 writes:
> I do not have access to the bug reporting system. If somebody can get me
> access, I can create a formal bug report.
> The latter issues seem like duplicates of:
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=12572 (with slightly
> different output), but as that bug was reported nearly 10 years ago, it
> might be worth creating an update under R version 3. I could not find the
> first issue when searching the bug reports (which I ran into when trying to
> parse JSON files), which is why I posted on r-devel.
Indeed, thanks a lot Steven (and Duncan!), I've found the
following:
1. The first issue is a new bug, in R "only" since R version
3.4.0, i.e., it worked up to R 3.3.3.
Duncan's patch basically fixes it.
I've found that the C code there can be simplified and
untangled, and after that, I will commit what is basically
Duncan Murdoch's bug fix.
2. The second set of issues is indeed an entirely different bug, and I
would say it actually points to a "design problem" of the whole thing.
The C code in islistfactor() talks about arbitrary trees with
all leaves being factors, whereas the R code -- in the case where
islistfactor() is TRUE -- actually only deals correctly with
simple trees, namely those of depth exactly 1. Those are the ones you
typically get from e.g. lapply(), and so this old design bug is
triggered relatively rarely.
Last but not least: I have created an account for you, Steven,
on the bugzilla site.
Given that we have holidays until the weekend and I have private
duties, I won't get to more for now.
Best
Martin Maechler
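
[Editorial illustration, not part of the original message.] The first
issue, as analysed by Duncan Murdoch in the quoted text below, comes down
to a tri-state result (TRUE / FALSE / NA) being tested with C's `!`
operator. The following is a minimal standalone sketch of that mechanism,
not the code in src/main/apply.c. The ToyNode type, the TOY_* constants
and the toy_* function names are hypothetical stand-ins for SEXP,
Rboolean/NA_LOGICAL and islistfactor(); the only fact borrowed from R is
that a logical NA is stored as INT_MIN, which is nonzero.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-ins for R's internals (assumptions of this sketch).
   R stores a logical NA as INT_MIN, so it is "truthy" in a C test. */
#define TOY_NA    INT_MIN
#define TOY_FALSE 0
#define TOY_TRUE  1

typedef enum { TOY_LIST, TOY_FACTOR, TOY_OTHER } ToyKind;

typedef struct ToyNode {
    ToyKind kind;
    int n;                              /* number of children (lists only) */
    const struct ToyNode *const *child; /* children (lists only) */
} ToyNode;

/* Mirrors the logic of the quoted original: an empty list yields NA,
   but the parent tests the child's result with `!`, so NA passes. */
static int toy_islistfactor_old(const ToyNode *x)
{
    assert(x != NULL);
    if (x->kind == TOY_LIST) {
        if (x->n == 0) return TOY_NA;
        for (int i = 0; i < x->n; i++)
            if (!toy_islistfactor_old(x->child[i])) return TOY_FALSE;
        return TOY_TRUE;
    }
    return x->kind == TOY_FACTOR;
}

/* Mirrors the logic of the quoted patch: compare explicitly against
   FALSE and TRUE, and stay NA unless some child is definitely TRUE. */
static int toy_islistfactor_new(const ToyNode *x)
{
    assert(x != NULL);
    if (x->kind == TOY_LIST) {
        int result = TOY_NA;
        for (int i = 0; i < x->n; i++) {
            int childresult = toy_islistfactor_new(x->child[i]);
            if (childresult == TOY_FALSE) return TOY_FALSE;
            else if (childresult == TOY_TRUE) result = TOY_TRUE;
        }
        return result;
    }
    return x->kind == TOY_FACTOR;
}

/* Toy equivalent of x <- list(list(list(), list())) */
static const ToyNode empty1 = { TOY_LIST, 0, NULL };
static const ToyNode empty2 = { TOY_LIST, 0, NULL };
static const ToyNode *const inner_kids[] = { &empty1, &empty2 };
static const ToyNode inner = { TOY_LIST, 2, inner_kids };
static const ToyNode *const outer_kids[] = { &inner };
static const ToyNode outer = { TOY_LIST, 1, outer_kids };

int toy_demo_old(void) { return toy_islistfactor_old(&outer); }
int toy_demo_new(void) { return toy_islistfactor_new(&outer); }
```

In this toy model, toy_demo_old() reports TOY_TRUE for a tree that
contains no factor at all, while toy_demo_new() reports TOY_NA. How the
real caller treats an NA result is not modelled here.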
> On Tue, May 8, 2018 at 7:51 PM Duncan Murdoch <murdoch.duncan at gmail.com>
> wrote:
>> On 08/05/2018 4:50 PM, Steven Nydick wrote:
>> > It also does the same thing if the factor is not on the first level of
>> > the list, which seems to be due to the fact that the islistfactor is
>> > recursive, but if a list is a list-factor, the first level lists are
>> > coerced into character strings.
>> >
>> > > x <- list(list(factor(LETTERS[1])))
>> > > unlist(x)
>> > Error in as.character.factor(x) : malformed factor
>> >
>> > However, if one of the factors is at the top level, and one is nested,
>> > then the result is:
>> >
>> > > x <- list(list(factor(LETTERS[1])), factor(LETTERS[2]))
>> > > unlist(x)
>> >
>> > [1] <NA> B
>> > Levels: B
>> >
>> > ... which does not seem to me to be desired behavior.
>>
>> The patch I suggested doesn't help with either of these. I'd suggest
>> collecting examples, and posting a bug report to bugs.r-project.org.
>>
>> Duncan Murdoch
>>
>>
>> >
>> >
>> > On Tue, May 8, 2018 at 2:22 PM Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
>> >
>> > On 08/05/2018 2:58 PM, Duncan Murdoch wrote:
>> > > On 08/05/2018 1:48 PM, Steven Nydick wrote:
>> > >> Reproducible example:
>> > >>
>> > >> x <- list(list(list(), list()))
>> > >> unlist(x)
>> > >>
>> > >> Error in as.character.factor(x) : malformed factor
>> > >
>> > > The error comes from the line
>> > >
>> > > structure(res, levels = lv, names = nm, class = "factor")
>> > >
>> > > which is called because unlist() thinks that some entry is a factor,
>> > > with NULL levels and NULL names. It's not legal for a factor to have
>> > > NULL levels. Probably it should never get here; the earlier test
>> > >
>> > > if (.Internal(islistfactor(x, recursive))) {
>> > >
>> > > should have been false, and then the result would have been
>> > >
>> > > .Internal(unlist(x, recursive, use.names))
>> > >
>> > > (with both recursive and use.names being TRUE), which returns NULL.
>> >
>> > And the problem is in the islistfactor function in src/main/apply.c,
>> > which looks like this:
>> >
>> > static Rboolean islistfactor(SEXP X)
>> > {
>> >     int i, n = length(X);
>> >
>> >     switch(TYPEOF(X)) {
>> >     case VECSXP:
>> >     case EXPRSXP:
>> >         if(n == 0) return NA_LOGICAL;
>> >         for(i = 0; i < LENGTH(X); i++)
>> >             if(!islistfactor(VECTOR_ELT(X, i))) return FALSE;
>> >         return TRUE;
>> >         break;
>> >     }
>> >     return isFactor(X);
>> > }
>> >
>> > One of those deeply nested lists is length 0, so at the lowest level it
>> > returns NA_LOGICAL. But then it does C-style logical testing on the
>> > results. I think to C NA_LOGICAL counts as true, so at the next level
>> > up we get the wrong answer.
>> >
>> > A fix would be to rewrite it like this:
>> >
>> > static Rboolean islistfactor(SEXP X)
>> > {
>> >     int i, n = length(X);
>> >     Rboolean result = NA_LOGICAL, childresult;
>> >     switch(TYPEOF(X)) {
>> >     case VECSXP:
>> >     case EXPRSXP:
>> >         for(i = 0; i < LENGTH(X); i++) {
>> >             childresult = islistfactor(VECTOR_ELT(X, i));
>> >             if(childresult == FALSE) return FALSE;
>> >             else if(childresult == TRUE) result = TRUE;
>> >         }
>> >         return result;
>> >         break;
>> >     }
>> >     return isFactor(X);
>> > }
>> >
>> >
>> >
>> > --
>> > Steven Nydick
>> > PhD, Quantitative Psychology
>> > M.A., Psychology
>> > M.S., Statistics
>> > --
>> > "Beware of the man who works hard to learn something, learns it, and
>> > finds himself no wiser than before, Bokonon tells us. He is full of
>> > murderous resentment of people who are ignorant without having come by
>> > their ignorance the hard way."
>> > -Kurt Vonnegut
>>
>>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel