[R] Numeric class and sasxport.get
Sebastien Bihorel
Sebastien.Bihorel at cognigencorp.com
Thu Feb 5 15:09:25 CET 2009
OK, just so I get this straight: is the 'labelled' class something
that you created in your package, or a readily available class in base R?
*Sebastien Bihorel, PharmD, PhD*
PKPD Scientist
Cognigen Corp
Email: sebastien.bihorel at cognigencorp.com
<mailto:sebastien.bihorel at cognigencorp.com>
Phone: (716) 633-3463 ext. 323
Frank E Harrell Jr wrote:
> Sebastien Bihorel wrote:
>> I also realized the flaw after testing the script on various datasets...
>>
>> Following up on your last note:
>> 1- Is that the reason why the class of integer and regular numeric
>> variables is solely "labelled" following sasxport.get?
>
> Yes. R gurus might correct me, but just creating a numeric vector
> doesn't create a 'hard' class, and adding your own class attribute
> equal to 'numeric' or 'integer' might cause a problem downstream.
>
>> 2- Can class be 'soft' for other kinds of variables?
>
> Not that I can recall.
>
>> 3- Would you anticipate the following wrapper function to generate
>> incompatibilities with other R functions?
>
> I'm going to beg off on that. I'm not enough of an expert on the
> impact of adding such classes.
>
> Frank
>
>>
>>
>> SASxpt.get <- function(file, force.single = TRUE,
>>                        method = c('read.xport', 'dataload', 'csv'),
>>                        formats = NULL, allow = NULL, out = NULL,
>>                        keep = NULL, drop = NULL, as.is = 0.5, FUN = NULL) {
>>
>>   foo <- sasxport.get(file = file, force.single = force.single,
>>                       method = method, formats = formats, allow = allow,
>>                       out = out, keep = keep, drop = drop, as.is = as.is,
>>                       FUN = FUN)
>>
>>   # For each variable of class "labelled" (and only "labelled"), add
>>   # the native class as a second class element
>>   sglClassVarInd <- which(sapply(unclass(foo),
>>                                  function(v) length(class(v))) == 1)
>>
>>   # Looping over the indices directly is safe when none are found
>>   for (i in sglClassVarInd) {
>>     x <- foo[[i]]
>>     if (class(x) == "labelled")
>>       class(foo[[i]]) <- c(class(x), class(unclass(x)))
>>   }
>>   return(foo)
>> }
>>
>>
>> Frank E Harrell Jr wrote:
>>> Sebastien Bihorel wrote:
>>>> Thanks a lot Frank,
>>>>
>>>> One last question, though. I was tempted to remove all attributes
>>>> of my variables after the sasxport.get call using
>>>> foo <- sasxport.get(...)
>>>> foo <- as.data.frame(lapply(unclass(foo),as.vector))
>>>> Since I have never worked with objects of class 'labelled', I was
>>>> wondering what I would lose by removing this attribute.
>>>
>>> Not a good idea, for many reasons including dates and other types.
>>>
>>> And the labelled type is needed if you subset the data, in order to
>>> keep the labels.
>>>
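As a rough base-R illustration of the subsetting point: ordinary `[` drops most attributes, so a label stored as a plain attribute is lost unless the variable's class supplies its own `[` method. The class name "demo_labelled" and its method are made up for this sketch and are not Hmisc's implementation.

```r
# A label stored as a plain attribute is dropped by ordinary subsetting
x <- structure(c(10, 20, 30), label = "Dose (mg)")
attr(x[1:2], "label")            # NULL

# Hypothetical class with a `[` method that re-attaches the label
"[.demo_labelled" <- function(x, ...) {
  lab <- attr(x, "label")        # remember the label before subsetting
  out <- NextMethod("[")         # let the default method do the subsetting
  attr(out, "label") <- lab      # put the label back
  class(out) <- oldClass(x)      # make sure the class is kept as well
  out
}

y <- structure(c(10, 20, 30), label = "Dose (mg)", class = "demo_labelled")
attr(y[1:2], "label")            # the label survives the subset
```

Removing all attributes with as.vector, as proposed in the quoted message, would put every variable in the first situation.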
>>> Note that your original issue is related to "class" being "soft" for
>>> integers and regular numerics:
>>>
>>> > x <- 1:3
>>> > attributes(x)
>>> NULL
>>> > class(x)
>>> [1] "integer"
>>> > x <- runif(3)
>>> > class(x)
>>> [1] "numeric"
>>> > attributes(x)
>>> NULL
>>>
>>> Frank
>>>
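To make the 'soft' class point concrete in base R (no Hmisc needed): class() reports an implicit class for a bare vector, but no class attribute is stored, so once a single class attribute such as "labelled" is set, that is all class() returns. In this sketch the "labelled" class is assigned by hand, purely as a stand-in for what an import would produce.

```r
x <- 1:3
class(x)                 # "integer", an implicit class
oldClass(x)              # NULL: no class attribute is actually stored

# Stand-in for an imported variable: set the class attribute by hand
z <- structure(1:3, label = "Subject ID", class = "labelled")
class(z)                 # "labelled" only; the implicit class is hidden
inherits(z, "integer")   # FALSE
typeof(z)                # "integer": the storage type is unchanged
class(unclass(z))        # "integer": implicit class recovered by unclass()
```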
>>>>
>>>> Frank E Harrell Jr wrote:
>>>>> Sebastien.Bihorel at cognigencorp.com wrote:
>>>>>> The problem is actually not related to a broken command but to an
>>>>>> attempt at operational qualification of R. A few years ago, my
>>>>>> company developed a set of scripts for the 'operational
>>>>>> qualification' of Splus. We are switching to R, so I am currently
>>>>>> trying to port the scripts to R. All Splus scripts imported SAS
>>>>>> data using the importData function, which I replaced with
>>>>>> sasxport.get. One particular script returns the class of each
>>>>>> variable of the imported data frame; the output must match the
>>>>>> expected values: numeric, factor, integer, etc. The R 'translation'
>>>>>> with sasxport.get is thus problematic.
>>>>>> If there is no easy tweak of the function, we will probably have to
>>>>>> remove this script from our list of 'qualification' scripts.
>>>>>>
>>>>>> Although it would be nice
>>>>>
>>>>> Then my advice is to write your own wrapper function for
>>>>> sasxport.get that takes its output, looks for labelled variables,
>>>>> and adds a new class of your choosing depending on properties of
>>>>> the variable, making sure that you write methods needed for that
>>>>> class (if any). Then test your new function, not sasxport.get
>>>>> explicitly.
>>>>>
>>>>> Frank
>>>>>
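One way to act on this advice without touching the imported data at all is to have the qualification script compute a 'native' class itself. This is a minimal sketch under stated assumptions: the helper name native_class is hypothetical, and the mock data frame stands in for sasxport.get output (classes set by hand to mimic what the thread describes) so that the example runs without Hmisc or a SAS transport file.

```r
# Hypothetical helper: class of a variable with "labelled" ignored
native_class <- function(x) {
  cl <- setdiff(class(x), "labelled")
  if (length(cl) == 0) cl <- class(unclass(x))  # fall back to implicit class
  cl
}

# Mock import result; classes are assigned by hand for illustration only
dat <- data.frame(id = 1:3, conc = c(0.5, 1.2, 3.4),
                  sex = factor(c("F", "M", "F")))
class(dat$id)   <- "labelled"
class(dat$conc) <- "labelled"
class(dat$sex)  <- c("labelled", "factor")

sapply(dat, class)         # what the qualification script currently reports
sapply(dat, native_class)  # "integer", "numeric", "factor"
```

The same sapply call could be applied to the data frame returned by sasxport.get, leaving the labelled variables and their methods untouched.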
>>>>>>
>>>>>>> Sebastien Bihorel wrote:
>>>>>>>> Frank,
>>>>>>>>
>>>>>>>> It is a non-issue for me if variables of class "labelled" (and
>>>>>>>> only "labelled") can only be numerical variables (integer or
>>>>>>>> numeric).
>>>>>>>>
>>>>>>>> Sebastien
>>>>>>> 'labelled' can apply to any type of vector. I'm not clear on the
>>>>>>> problem this causes you. Please provide a command that is broken
>>>>>>> by this behavior.
>>>>>>>
>>>>>>> Frank
>>>>>>>
>>>>>>>> Frank E Harrell Jr wrote:
>>>>>>>>> Sebastien Bihorel wrote:
>>>>>>>>>> Dear R-users,
>>>>>>>>>>
>>>>>>>>>> The sasxport.get function (from the Hmisc package) automatically
>>>>>>>>>> defines the class of imported variables. I have noticed that the
>>>>>>>>>> class of theoretically numeric variables is simply "labelled",
>>>>>>>>>> although character variables might end up being defined as
>>>>>>>>>> "labelled" "Date" or "labelled" "factor".
>>>>>>>>>> Is there a way to tell sasxport.get to define numeric variables
>>>>>>>>>> as "labelled" "integer" or "labelled" "numeric"?
>>>>>>>>> Sebastien,
>>>>>>>>>
>>>>>>>>> If that would fix a problem you're having we could look into it.
>>>>>>>>> Otherwise I'd tend to leave well enough alone.
>>>>>>>>>
>>>>>>>>> Frank
>>>>>>>>>
>>>>>>>>>> Thank you
>>>>>>>>>>
>>>>>>>>>> Sebastien
>>>>>>>>>>
>>>>>>>>>> ______________________________________________
>>>>>>>>>> R-help at r-project.org mailing list
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>>> and provide commented, minimal, self-contained, reproducible
>>>>>>>>>> code.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Frank E Harrell Jr   Professor and Chair           School of Medicine
>>>>>>>                      Department of Biostatistics   Vanderbilt University
>>>>>>>