[R] Numeric class and sasxport.get
Frank E Harrell Jr
f.harrell at vanderbilt.edu
Thu Feb 5 18:56:46 CET 2009
Sebastien Bihorel wrote:
> OK, just so I get that straight: is the 'labelled' class something
> that you created in your package, or a class readily available in base R?
It's something we added for the Hmisc package.
Signing off,
Frank
>
> *Sebastien Bihorel, PharmD, PhD*
> PKPD Scientist
> Cognigen Corp
> Email: sebastien.bihorel at cognigencorp.com
> Phone: (716) 633-3463 ext. 323
>
>
> Frank E Harrell Jr wrote:
>> Sebastien Bihorel wrote:
>>> I also realized the flaw after testing the script on various datasets...
>>>
>>> Following up on your last note:
>>> 1- Is that the reason why the class of integer and regular numeric
>>> variables is solely "labelled" after sasxport.get?
>>
>> Yes. R gurus might correct me, but just creating a numeric vector
>> doesn't create a 'hard' class, and adding your own class attribute
>> equal to 'numeric' or 'integer' might cause problems downstream.
>>
>>> 2- Can class be 'soft' for other kinds of variables?
>>
>> Not that I can recall.
>>
>>> 3- Do you anticipate that the following wrapper function would create
>>> incompatibilities with other R functions?
>>
>> I'm going to beg off on that. I'm not enough of an expert on the
>> impact of adding such classes.
>>
>> Frank
>>
>>>
>>>
>>> library(Hmisc)  # provides sasxport.get
>>>
>>> SASxpt.get <- function(file, force.single = TRUE,
>>>                        method = c('read.xport', 'dataload', 'csv'),
>>>                        formats = NULL, allow = NULL,
>>>                        out = NULL, keep = NULL, drop = NULL,
>>>                        as.is = 0.5, FUN = NULL) {
>>>
>>>   foo <- sasxport.get(file = file, force.single = force.single,
>>>                       method = method,
>>>                       formats = formats, allow = allow, out = out,
>>>                       keep = keep, drop = drop, as.is = as.is, FUN = FUN)
>>>
>>>   # For each variable of class "labelled" (and only "labelled"),
>>>   # add the native class of the underlying data as a second class
>>>   for (j in seq_along(foo)) {
>>>     x <- foo[[j]]
>>>     if (identical(class(x), "labelled"))
>>>       class(foo[[j]]) <- c(class(x), class(unclass(x)))
>>>   }
>>>   foo
>>> }
>>>
>>>
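To try the class-fixing loop of the wrapper without a SAS transport file or Hmisc, it can be factored into a stand-alone function and applied to a hand-built data frame. This is only a sketch: add_native_class is a hypothetical name, and the simulated columns assume (as described in this thread) that numeric variables come back with the class "labelled" alone plus a "label" attribute.

```r
# Hypothetical helper: the class-fixing loop from the SASxpt.get wrapper,
# factored out so it can be tried without a SAS file or Hmisc.
add_native_class <- function(df) {
  for (j in seq_along(df)) {
    x <- df[[j]]
    # Only touch columns whose class is exactly "labelled"
    if (identical(class(x), "labelled"))
      class(df[[j]]) <- c("labelled", class(unclass(x)))
  }
  df
}

# Simulated import (assumption): numeric columns carry a "label" attribute
# and a bare "labelled" class; the factor column is left untouched.
df <- data.frame(id = 1:3, wt = c(70.5, 82.1, 64.0))
df$id <- structure(df$id, label = "Subject ID", class = "labelled")
df$wt <- structure(df$wt, label = "Weight (kg)", class = "labelled")
df$sex <- factor(c("F", "M", "F"))

fixed <- add_native_class(df)
print(lapply(fixed, class))
```

Because class<- replaces only the class attribute, the "label" attribute survives the change.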
>>>
>>>
>>> Frank E Harrell Jr wrote:
>>>> Sebastien Bihorel wrote:
>>>>> Thanks a lot Frank,
>>>>>
>>>>> One last question, though. I was tempted to remove all attributes
>>>>> of my variables after the sasxport.get call using
>>>>> foo <- sasxport.get(...)
>>>>> foo <- as.data.frame(lapply(unclass(foo),as.vector))
>>>>> Since I have never worked with objects of class 'labelled', I was
>>>>> wondering what I would lose by removing this attribute.
>>>>
>>>> Not a good idea, for many reasons including dates and other types.
>>>>
>>>> And the labelled class is needed if you subset the data, in order to
>>>> keep the labels.
>>>>
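Why keeping the class matters for subsetting can be sketched in base R: the default `[` drops a "label" attribute, while a class with its own `[` method can carry it along. The class mylabel and its method are hypothetical stand-ins for the idea, not Hmisc's actual labelled implementation.

```r
# A plain vector with a "label" attribute loses it when subset,
# because the default `[` drops most attributes.
w <- c(70.5, 82.1, 64.0)
attr(w, "label") <- "Weight (kg)"
print(attr(w[1:2], "label"))   # NULL

# Hypothetical class "mylabel" whose `[` method re-attaches the label
# and the class after default subsetting.
`[.mylabel` <- function(x, ...) {
  out <- NextMethod()           # default subsetting of the underlying data
  attr(out, "label") <- attr(x, "label")
  class(out) <- class(x)
  out
}
class(w) <- "mylabel"
print(attr(w[1:2], "label"))   # "Weight (kg)"
```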
>>>> Note that your original issue is related to "class" being "soft" for
>>>> integers and regular numerics:
>>>>
>>>> > x <- 1:3
>>>> > attributes(x)
>>>> NULL
>>>> > class(x)
>>>> [1] "integer"
>>>> > x <- runif(3)
>>>> > class(x)
>>>> [1] "numeric"
>>>> > attributes(x)
>>>> NULL
>>>>
>>>> Frank
>>>>
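The 'soft' (implicit) class can be seen by comparing class() with oldClass(), which returns only a stored class attribute. A minimal base-R sketch; the object z is hand-built with a bare "labelled" class to mimic what the thread describes for numeric SAS variables, not produced by sasxport.get.

```r
# Integer and double vectors have an implicit class: class() reports it,
# but no "class" attribute is stored on the object.
x <- 1:3
print(class(x))       # "integer"
print(oldClass(x))    # NULL: no class attribute
print(attributes(x))  # NULL

y <- runif(3)
print(class(y))       # "numeric"
print(oldClass(y))    # NULL

# Once a class attribute is set, class() returns only that attribute,
# so the implicit class is no longer reported.
z <- structure(c(1.5, 2.5), class = "labelled")
print(class(z))           # "labelled"
print(class(unclass(z)))  # "numeric": implicit class of the underlying data
```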
>>>>>
>>>>>
>>>>>
>>>>> Frank E Harrell Jr wrote:
>>>>>> Sebastien.Bihorel at cognigencorp.com wrote:
>>>>>>> The problem is actually not related to a broken command but to an
>>>>>>> attempt at operational qualification of R. A few years ago, my company
>>>>>>> developed a set of scripts for the 'operational qualification' of Splus.
>>>>>>> We are switching to R, so I am currently trying to port the scripts to R.
>>>>>>> All Splus scripts imported SAS data using the importData function, which
>>>>>>> I replaced with sasxport.get. One particular script returns the class of
>>>>>>> each variable of the imported data frame; the output must match the
>>>>>>> expected values: numeric, factor, integer, etc. The R 'translation'
>>>>>>> with sasxport.get is thus problematic.
>>>>>>> If there is no easy tweak of the function, we will probably have to
>>>>>>> remove this script from our list of 'qualification' scripts.
>>>>>>>
>>>>>>> Although it would be nice
>>>>>>
>>>>>> Then my advice is to write your own wrapper function for
>>>>>> sasxport.get that takes its output, looks for labelled variables,
>>>>>> and adds a new class of your choosing depending on properties of
>>>>>> the variable, making sure that you write methods needed for that
>>>>>> class (if any). Then test your new function, not sasxport.get
>>>>>> explicitly.
>>>>>>
>>>>>> Frank
>>>>>>
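A class-checking qualification script could compare each column's native class, ignoring the extra "labelled" class, against a vector of expected values. A sketch under assumptions: native_class and the expected values are hypothetical, and the data frame is built by hand rather than read with sasxport.get.

```r
# Hypothetical helper: report a column's "native" class, ignoring "labelled".
# A column whose only class is "labelled" falls back to its implicit class.
native_class <- function(v) {
  cl <- setdiff(class(v), "labelled")
  if (length(cl) > 0) cl[1] else class(unclass(v))
}

# Hand-built stand-in for an imported data set (assumption).
imported <- data.frame(
  dose = c(10, 20, 40),
  day  = as.Date(c("2009-01-01", "2009-01-02", "2009-01-03")),
  arm  = factor(c("A", "B", "A"))
)
imported$dose <- structure(imported$dose, label = "Dose (mg)", class = "labelled")
class(imported$arm) <- c("labelled", "factor")

observed <- vapply(imported, native_class, character(1))
expected <- c(dose = "numeric", day = "Date", arm = "factor")
print(observed)
print(identical(observed, expected))
```

Testing such a helper, rather than the raw class() output of sasxport.get, keeps the expected values independent of how the import function attaches labels.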
>>>>>>>
>>>>>>>> Sebastien Bihorel wrote:
>>>>>>>>> Frank,
>>>>>>>>>
>>>>>>>>> It would be a non-issue for me if the variables of class "labelled"
>>>>>>>>> (and only "labelled") could only be numeric variables (integer or
>>>>>>>>> numeric).
>>>>>>>>>
>>>>>>>>> Sebastien
>>>>>>>> 'labelled' can apply to any type of vector. I'm not clear on the
>>>>>>>> problem this causes you. Please provide a command that is broken by
>>>>>>>> this behavior.
>>>>>>>>
>>>>>>>> Frank
>>>>>>>>
>>>>>>>>> Frank E Harrell Jr wrote:
>>>>>>>>>> Sebastien Bihorel wrote:
>>>>>>>>>>> Dear R-users,
>>>>>>>>>>>
>>>>>>>>>>> The sasxport.get function (from the Hmisc package) automatically
>>>>>>>>>>> defines the class of imported variables. I have noticed that the
>>>>>>>>>>> class of theoretically numeric variables is simply "labelled",
>>>>>>>>>>> whereas character variables might end up being defined as
>>>>>>>>>>> "labelled" "Date" or "labelled" "factor".
>>>>>>>>>>> Is there a way to tell sasxport.get to define numeric variables as
>>>>>>>>>>> "labelled" "integer" or "labelled" "numeric"?
>>>>>>>>>> Sebastien,
>>>>>>>>>>
>>>>>>>>>> If that would fix a problem you're having we could look into it.
>>>>>>>>>> Otherwise I'd tend to leave well enough alone.
>>>>>>>>>>
>>>>>>>>>> Frank
>>>>>>>>>>
>>>>>>>>>>> Thank you
>>>>>>>>>>>
>>>>>>>>>>> Sebastien
>>>>>>>>>>>
>>>>>>>>>>> ______________________________________________
>>>>>>>>>>> R-help at r-project.org mailing list
>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>>>> and provide commented, minimal, self-contained, reproducible
>>>>>>>>>>> code.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Frank E Harrell Jr   Professor and Chair   School of Medicine
>>>>>>>>                      Department of Biostatistics   Vanderbilt University
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>