[Rd] [External] Re: Wrong number of names?
luke-tierney using uiowa.edu
Tue Nov 2 19:39:24 CET 2021
On Mon, 1 Nov 2021, Martin Maechler wrote:
>>>>>> Duncan Murdoch
>>>>>> on Mon, 1 Nov 2021 06:36:17 -0400 writes:
>
> > The StackOverflow post
> > https://stackoverflow.com/a/69767361/2554330 discusses a
> > dataframe which has a named numeric column of length 1488
> > that has 744 names. I don't think this is ever legal, but
> > am I wrong about that?
>
> > The `dat.rds` file mentioned in the post is temporarily
> > available online in case anyone else wants to examine it.
>
> > Assuming that the file contains a badly formed object, I
> > wonder if readRDS() should do some sanity checks as it
> > reads.
>
> > Duncan Murdoch
>
> Good question.
>
> In the meantime, I've also added a bit to the SO page
> above, e.g.
>
> ---------------------------------------------------------------------------
>
> d <- readRDS("<.....>dat.rds")
> str(d)
> ## 'data.frame': 1488 obs. of 4 variables:
> ## $ facet_var: chr "AUT" "AUT" "AUT" "AUT" ...
> ## $ date : Date, format: "2020-04-26" "2020-04-27" ...
> ## $ variable : Factor w/ 2 levels "arima","prophet": 1 1 1 1 1 1 1 1 1 1 ...
> ## $ score : Named num 2.74e-06 2.41e-06 2.48e-06 2.39e-06 2.79e-06 ...
> ## ..- attr(*, "names")= chr [1:744] "new_confirmed10" "new_confirmed10" "new_confirmed10" "new_confirmed10" ...
>
> ds <- d$score
> c(length(ds), length(names(ds)))
> ## 1488 744
>
> dput(ds) # ->
>
> ## *** caught segfault ***
> ## address (nil), cause 'memory not mapped'
If I'm reading this right, then dput() is where the segfault is
happening, so dput() could use some more bulletproofing.
Best,
luke
>
> ---------------------------------------------------------------------------
>
> Hence "proving" that the dat.rds really contains an invalid object,
> when simple dput(.) directly gives a segmentation fault.
>
> I think we are aware that using C code and say .Call(..) one
> can create all kinds of invalid objects "easily".. and I think
> it's clear that it's not feasible to check for validity of such
> objects "everywhere".
>
> Your proposal to have at least our deserialization code used in
> readRDS() do (at least *some*) validity checks seems good, but
> maybe we should think of more cases, and / or do such validity
> checks already during serialization { <-> saveRDS() here } ?
>
> .. Such questions then really are for those who understand more than
> me about (de)serialization in R, its performance bottlenecks etc.
> Given the speed impact we should probably have such checks *optional*
> but have them *on* by default e.g., at least for saveRDS() ?
>
> Martin
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
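The mismatch Martin shows above (1488 values, 744 names) can be detected
at the R level before anything like dput() is called, since length() and
names() evidently still work on the object. Below is a minimal sketch of
such a check, plus an opt-in wrapper in the spirit of the "checks at
saveRDS() time" idea. Both check_names_length() and checked_saveRDS() are
hypothetical helper names made up for illustration; neither is part of
base R, and this is not how readRDS() or saveRDS() behave today.

```r
# Hypothetical helper (not in base R): return the names of the columns
# whose "names" attribute has a different length than the column itself.
check_names_length <- function(df) {
  bad <- vapply(df, function(col) {
    nms <- names(col)
    !is.null(nms) && length(nms) != length(col)
  }, logical(1))
  names(df)[bad]
}

# Hypothetical wrapper: refuse to serialize a data frame that fails the check.
checked_saveRDS <- function(df, file, ...) {
  bad <- check_names_length(df)
  if (length(bad) > 0L)
    stop("columns with inconsistent names: ", paste(bad, collapse = ", "))
  saveRDS(df, file, ...)
}

# A well-formed data frame with a named numeric column passes the check.
ok <- data.frame(x = 1:3)
ok$y <- c(a = 0.1, b = 0.2, c = 0.3)
check_names_length(ok)

# The wrapper then simply delegates to saveRDS().
f <- tempfile(fileext = ".rds")
checked_saveRDS(ok, f)
```

On the data frame from the post, check_names_length(d) would be expected
to report "score". This only covers the one inconsistency seen here; a
real validity check inside (de)serialization would have to live in the C
code and cover more cases.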
--
Luke Tierney
Ralph E. Wareham Professor of Mathematical Sciences
University of Iowa Phone: 319-335-3386
Department of Statistics and Fax: 319-335-3017
Actuarial Science
241 Schaeffer Hall email: luke-tierney using uiowa.edu
Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu