[R] Numeric class and sasxport.get

Sebastien Bihorel Sebastien.Bihorel at cognigencorp.com
Thu Feb 5 15:09:25 CET 2009


OK, just so I get that straight: is the 'labelled' class something that
you created in your package, or a readily available class in base R?
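A minimal base-R sketch of the mechanism discussed in this thread (no Hmisc needed). Ordinary numeric and integer vectors carry no class attribute, so `oldClass()` returns NULL even though `class()` reports an implicit class. If a labelling step prepends "labelled" to the existing class *attribute*, numerics end up with "labelled" alone, while factors and Dates keep their second class. The helper `add_label()` is hypothetical and only mimics the behaviour described here; it is not the Hmisc source.

```r
# Hypothetical helper: attach a label and prepend "labelled" to whatever
# class *attribute* the vector already has (mimics the behaviour discussed
# in this thread; not the actual Hmisc code).
add_label <- function(x, label) {
  attr(x, "label") <- label
  class(x) <- c("labelled", setdiff(oldClass(x), "labelled"))
  x
}

oldClass(c(1.5, 2.5))  # NULL: no class attribute on a plain numeric
class(c(1.5, 2.5))     # "numeric": implicit ("soft") class

num <- add_label(c(1.5, 2.5), "Dose (mg)")
fac <- add_label(factor(c("a", "b")), "Group")
dat <- add_label(as.Date("2009-02-05"), "Visit date")

class(num)           # "labelled" only
class(fac)           # "labelled" "factor"
class(dat)           # "labelled" "Date"
class(unclass(num))  # "numeric": the implicit class reappears once unclassed
```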

*Sebastien Bihorel, PharmD, PhD*
PKPD Scientist
Cognigen Corp
Email: sebastien.bihorel at cognigencorp.com 
Phone: (716) 633-3463 ext. 323


Frank E Harrell Jr wrote:
> Sebastien Bihorel wrote:
>> I also realized the flaw after testing the script on various datasets...
>>
>> Following up on your last note:
>> 1- Is that the reason why the class of integer and regular numeric
>> variables is solely "labelled" after sasxport.get?
>
> Yes.  R gurus might correct me, but just creating a numeric vector
> doesn't create a 'hard' class, and adding your own class attribute
> equal to 'numeric' or 'integer' might cause a problem downstream.
>
>> 2- Can class be 'soft' for other kinds of variables?
>
> Not that I can recall.
>
>> 3- Would you expect the following wrapper function to cause
>> incompatibilities with other R functions?
>
> I'm going to beg off on that.  I'm not enough of an expert on the 
> impact of adding such classes.
>
> Frank
>
>>
>>
>> library(Hmisc)  # provides sasxport.get
>>
>> SASxpt.get <- function(file, force.single = TRUE,
>>                        method = c('read.xport', 'dataload', 'csv'),
>>                        formats = NULL, allow = NULL, out = NULL,
>>                        keep = NULL, drop = NULL, as.is = 0.5, FUN = NULL) {
>>
>>   foo <- sasxport.get(file = file, force.single = force.single,
>>                       method = method, formats = formats, allow = allow,
>>                       out = out, keep = keep, drop = drop,
>>                       as.is = as.is, FUN = FUN)
>>
>>   # For each variable of class "labelled" (and only "labelled"), add
>>   # the native (implicit) class as a second class element
>>   sglClassVarInd <- which(sapply(unclass(foo),
>>                                  function(v) length(class(v))) == 1)
>>
>>   # seq-safe loop: does nothing if no column qualifies
>>   for (i in sglClassVarInd) {
>>     x <- foo[[i]]
>>     if (identical(class(x), "labelled"))
>>       class(foo[[i]]) <- c(class(x), class(unclass(x)))
>>   }
>>   return(foo)
>> }
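The class-augmenting loop in the wrapper above can be tried without a SAS transport file. The sketch below factors it into a hypothetical helper, `add_native_class()`, and applies it to a hand-built data frame whose column classes mimic those reported in this thread; it assumes nothing about sasxport.get beyond that.

```r
# Hypothetical helper: the class-augmenting loop from SASxpt.get, factored
# out so it can be tried on any data frame (no SAS file or Hmisc needed).
add_native_class <- function(df) {
  for (i in seq_along(df)) {
    x <- df[[i]]
    if (identical(class(x), "labelled"))
      class(df[[i]]) <- c("labelled", class(unclass(x)))
  }
  df
}

# Small data frame whose columns mimic the classes seen after import
df <- data.frame(dose = c(1.5, 2.5), n = 1:2, grp = factor(c("a", "b")))
class(df$dose) <- "labelled"
class(df$n)    <- "labelled"
class(df$grp)  <- c("labelled", "factor")

df2 <- add_native_class(df)
lapply(df2, class)
# dose: "labelled" "numeric"; n: "labelled" "integer"; grp unchanged
```

This only checks that the loop does what it is meant to do; it says nothing about Frank's caveat that an explicit "numeric" or "integer" class attribute may interact badly with other code.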
>>
>>
>> Frank E Harrell Jr wrote:
>>> Sebastien Bihorel wrote:
>>>> Thanks a lot Frank,
>>>>
>>>> One last question, though. I was tempted to remove all attributes 
>>>> of my variables after the sasxport.get call using
>>>> foo <- sasxport.get(...)
>>>> foo <- as.data.frame(lapply(unclass(foo),as.vector))
>>>> Since I have never worked with objects of class 'labelled', I was
>>>> wondering what I would lose by removing this attribute.
>>>
>>> Not a good idea, for many reasons including dates and other types.
>>>
>>> And the labelled type is needed if you subset the data, in order to
>>> keep the labels.
>>>
>>> Note that your original issue is related to "class" being "soft" for 
>>> integers and regular numerics:
>>>
>>> > x <- 1:3
>>> > attributes(x)
>>> NULL
>>> > class(x)
>>> [1] "integer"
>>> > x <- runif(3)
>>> > class(x)
>>> [1] "numeric"
>>> > attributes(x)
>>> NULL
>>>
>>> Frank
>>>
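A quick base-R check of what the `as.data.frame(lapply(unclass(foo), as.vector))` clean-up proposed in this thread throws away. The data frame below is hand-built as a stand-in for real sasxport.get output (an assumption; the column names are made up).

```r
# Stand-in for imported data: a Date column and a labelled numeric column
foo <- data.frame(visit = as.Date(c("2009-02-05", "2009-02-06")))
foo$dose <- structure(c(1.5, 2.5), label = "Dose (mg)", class = "labelled")

# The attribute-stripping step proposed in the thread
stripped <- as.data.frame(lapply(unclass(foo), as.vector))

class(foo$visit)              # "Date"
class(stripped$visit)         # no longer "Date": plain day counts since 1970-01-01
attr(foo$dose, "label")       # "Dose (mg)"
attr(stripped$dose, "label")  # NULL: the label is gone
```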
>>>>
>>>> Frank E Harrell Jr wrote:
>>>>> Sebastien.Bihorel at cognigencorp.com wrote:
>>>>>> The problem is actually not related to a broken command but to an
>>>>>> attempt at operational qualification of R. A few years ago, my company
>>>>>> developed a set of scripts for the 'operational qualification' of
>>>>>> S-PLUS. We are switching to R, so I am currently trying to port the
>>>>>> scripts to R. All S-PLUS scripts imported SAS data using the importData
>>>>>> function, which I replaced with sasxport.get. One particular script
>>>>>> returns the class of each variable of the imported data frame; the
>>>>>> output must match the expected values: numeric, factor, integer, etc.
>>>>>> The R 'translation' with sasxport.get is thus problematic.
>>>>>> If there is no easy tweak of the function, we will probably have to
>>>>>> remove this script from our list of 'qualification' scripts.
>>>>>>
>>>>>> Although it would be nice
>>>>>
>>>>> Then my advice is to write your own wrapper function for 
>>>>> sasxport.get that takes its output, looks for labelled variables, 
>>>>> and adds a new class of your choosing depending on properties of 
>>>>> the variable, making sure that you write methods needed for that 
>>>>> class (if any).  Then test your new function, not sasxport.get 
>>>>> explicitly.
>>>>>
>>>>> Frank
>>>>>
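Frank's advice to add a class of your own and write the methods it needs, together with his point that the "labelled" class is what keeps labels through subsetting, can be illustrated with a small sketch. The class name "labelled_demo" and its `[` method are hypothetical, chosen to avoid clashing with Hmisc's own methods; they are not the Hmisc implementation.

```r
# Plain attribute, no class: default subsetting drops it
x <- c(1.5, 2.5, 3.5)
attr(x, "label") <- "Dose (mg)"
attr(x[1:2], "label")   # NULL

# Hypothetical "[" method for a class of our own: subset with the default
# method, then restore the label and the class attribute
`[.labelled_demo` <- function(x, ...) {
  lab <- attr(x, "label")
  out <- NextMethod("[")
  attr(out, "label") <- lab
  class(out) <- oldClass(x)
  out
}

y <- structure(c(1.5, 2.5, 3.5), label = "Dose (mg)", class = "labelled_demo")
attr(y[1:2], "label")   # "Dose (mg)"
```

Any other operation the qualification scripts rely on (printing, summaries, coercion) would need the same kind of method, which is the testing burden Frank alludes to.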
>>>>>>
>>>>>>> Sebastien Bihorel wrote:
>>>>>>>> Frank,
>>>>>>>>
>>>>>>>> It is a non-issue for me if the variables of class "labelled"
>>>>>>>> (and only "labelled") can only be numeric variables (integer or
>>>>>>>> numeric).
>>>>>>>>
>>>>>>>> Sebastien
>>>>>>> 'labelled' can apply to any type of vector.  I'm not clear on the
>>>>>>> problem this causes you.  Please provide a command that is broken
>>>>>>> by this behavior.
>>>>>>>
>>>>>>> Frank
>>>>>>>
>>>>>>>> Frank E Harrell Jr wrote:
>>>>>>>>> Sebastien Bihorel wrote:
>>>>>>>>>> Dear R-users,
>>>>>>>>>>
>>>>>>>>>> The sasxport.get function (from the Hmisc package) automatically
>>>>>>>>>> defines the class of imported variables. I have noticed that the
>>>>>>>>>> class of theoretically numeric variables is simply "labelled",
>>>>>>>>>> although character variables might end up being defined as
>>>>>>>>>> "labelled" "Date" or "labelled" "factor".
>>>>>>>>>> Is there a way to tell sasxport.get to define numeric variables as
>>>>>>>>>> "labelled" "integer" or "labelled" "numeric"?
>>>>>>>>> Sebastien,
>>>>>>>>>
>>>>>>>>> If that would fix a problem you're having we could look into it.
>>>>>>>>> Otherwise I'd tend to leave well enough alone.
>>>>>>>>>
>>>>>>>>> Frank
>>>>>>>>>
>>>>>>>>>> Thank you
>>>>>>>>>>
>>>>>>>>>> Sebastien
>>>>>>>>>>
>>>>>>>>>> ______________________________________________
>>>>>>>>>> R-help at r-project.org mailing list
>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>>> and provide commented, minimal, self-contained, reproducible 
>>>>>>>>>> code.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> -- 
>>>>>>> Frank E Harrell Jr   Professor and Chair           School of Medicine
>>>>>>>                      Department of Biostatistics   Vanderbilt University