[R] Numeric class and sasxport.get
Sebastien Bihorel
Sebastien.Bihorel at cognigencorp.com
Wed Feb 4 20:36:58 CET 2009
I also realized the flaw after testing the script on various datasets...
Following up on your last note:
1- Is that the reason why the class of integer and regular numeric
variables is solely "labelled" following sasxport.get?
2- Can class be 'soft' for other 'kind' of variables?
3- Would you anticipate the following wrapper function to generate
incompatibilities with other R functions?
library(Hmisc)  # provides sasxport.get

SASxpt.get <- function(file, force.single = TRUE,
                       method = c('read.xport', 'dataload', 'csv'),
                       formats = NULL, allow = NULL, out = NULL,
                       keep = NULL, drop = NULL, as.is = 0.5, FUN = NULL) {
  # Resolve the default choice vector to a single method name
  method <- match.arg(method)
  foo <- sasxport.get(file = file, force.single = force.single,
                      method = method, formats = formats, allow = allow,
                      out = out, keep = keep, drop = drop, as.is = as.is,
                      FUN = FUN)
  # For each variable of class "labelled" (and only "labelled"), add the
  # native class as a second class element
  for (i in seq_along(foo)) {  # seq_along() is safe when foo has no columns
    x <- foo[[i]]
    if (identical(class(x), "labelled"))
      class(foo[[i]]) <- c("labelled", class(unclass(x)))
  }
  return(foo)
}
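
The class-adding step can be tried without Hmisc or a SAS transport file. The sketch below is an illustration only: it mimics the kind of output sasxport.get gives for numeric variables by putting a "label" attribute and the single class "labelled" on two made-up columns. The data frame foo and the helper addNativeClass are hypothetical names, not part of Hmisc.

```r
# Illustration only: a made-up data frame standing in for sasxport.get output.
foo <- data.frame(id = 1:3, wt = c(70.5, 82.1, 64.0))
for (v in names(foo)) {
  attr(foo[[v]], "label") <- paste("Label for", v)
  class(foo[[v]]) <- "labelled"
}

# With "labelled" as the only explicit class, class() no longer shows the
# implicit ("soft") class, but the storage type is unchanged.
sapply(foo, class)                          # both columns report "labelled"
sapply(foo, typeof)                         # "integer" and "double"
sapply(foo, function(x) class(unclass(x)))  # "integer" and "numeric"

# Hypothetical helper: same logic as the loop in the wrapper above.
addNativeClass <- function(df) {
  for (i in seq_along(df)) {
    x <- df[[i]]
    if (identical(class(x), "labelled"))
      class(df[[i]]) <- c("labelled", class(unclass(x)))
  }
  df
}

foo2 <- addNativeClass(foo)
lapply(foo2, class)  # each column now has "labelled" plus its native class
```

The labels are left in place, so this only changes what class() reports; whether any method written for "labelled" objects behaves differently with the extra class element is something that would still need testing.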
*Sebastien Bihorel, PharmD, PhD*
PKPD Scientist
Cognigen Corp
Email: sebastien.bihorel at cognigencorp.com
Phone: (716) 633-3463 ext. 323
Frank E Harrell Jr wrote:
> Sebastien Bihorel wrote:
>> Thanks a lot Frank,
>>
>> One last question, though. I was tempted to remove all attributes of
>> my variables after the sasxport.get call using
>> foo <- sasxport.get(...)
>> foo <- as.data.frame(lapply(unclass(foo),as.vector))
>> Since I have never worked with objects of class 'labelled', I was
>> wondering what I would lose by removing this attribute.
>
> Not a good idea, for many reasons including dates and other types.
>
> And the labelled type is needed if you subset the data, in order to keep
> the labels.
>
> Note that your original issue is related to "class" being "soft" for
> integers and regular numerics:
>
> > x <- 1:3
> > attributes(x)
> NULL
> > class(x)
> [1] "integer"
> > x <- runif(3)
> > class(x)
> [1] "numeric"
> > attributes(x)
> NULL
>
> Frank
>
>>
>>
>>
>> Frank E Harrell Jr wrote:
>>> Sebastien.Bihorel at cognigencorp.com wrote:
>>>> The problem is actually not related to a broken command but to an
>>>> attempt at
>>>> operational qualification of R. A few years ago, my company
>>>> developed a
>>>> set of scripts for the 'operational qualification' of Splus. We are
>>>> switching to R so I am currently trying to port the scripts to R.
>>>> All Splus scripts imported SAS data using the importData function,
>>>> which I
>>>> substituted by sasxport.get. One particular script returns the
>>>> class of
>>>> each variable of the imported data frame; the output must match the
>>>> expected values: numeric, factor, integer, etc... The R
>>>> 'translation' with
>>>> sasxport.get is thus problematic.
>>>> If there is no easy tweak of the function, we will probably have to
>>>> remove
>>>> this script from our list of 'qualification' scripts.
>>>>
>>>> Although it would be nice
>>>
>>> Then my advice is to write your own wrapper function for
>>> sasxport.get that takes its output, looks for labelled variables,
>>> and adds a new class of your choosing depending on properties of the
>>> variable, making sure that you write methods needed for that class
>>> (if any). Then test your new function, not sasxport.get explicitly.
>>>
>>> Frank
>>>
>>>>
>>>>> Sebastien Bihorel wrote:
>>>>>> Frank,
>>>>>>
>>>>>> It is a non existing issue for me if the variables of class
>>>>>> "labelled"
>>>>>> (and only "labelled") can only be numerical variables (integer or
>>>>>> numeric).
>>>>>>
>>>>>> Sebastien
>>>>> 'labelled' can apply to any type of vector. I'm not clear on the
>>>>> problem this causes you. Please provide a command that is broken by
>>>>> this behavior.
>>>>>
>>>>> Frank
>>>>>
>>>>>> Frank E Harrell Jr wrote:
>>>>>>> Sebastien Bihorel wrote:
>>>>>>>> Dear R-users,
>>>>>>>>
>>>>>>>> The sasxport.get function (from the Hmisc package) automatically
>>>>>>>> defines the class of imported variables. I have noticed that the
>>>>>>>> class of theoretically numeric variables is simply "labelled",
>>>>>>>> although character variables might end up being defined as
>>>>>>>> "labelled"
>>>>>>>> "Date" or "labelled" "factor".
>>>>>>>> Is there a way to tell sasxport.get to define numeric variables as
>>>>>>>> "labelled" "integer" or "labelled" "numeric"?
>>>>>>> Sebastien,
>>>>>>>
>>>>>>> If that would fix a problem you're having we could look into it.
>>>>>>> Otherwise I'd tend to leave well enough alone.
>>>>>>>
>>>>>>> Frank
>>>>>>>
>>>>>>>> Thank you
>>>>>>>>
>>>>>>>> Sebastien
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> Frank E Harrell Jr Professor and Chair School of Medicine
>>>>> Department of Biostatistics Vanderbilt
>>>>> University
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
>