[R] Numeric class and sasxport.get

Frank E Harrell Jr f.harrell at vanderbilt.edu
Thu Feb 5 18:56:46 CET 2009


Sebastien Bihorel wrote:
> Ok, just so as I get that straight, is the 'labelled' class something 
> that you created in your package or a readily available class in base R?

It's something we added for the Hmisc package.
Signing off,
Frank
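
The "soft" class behaviour discussed in the quoted messages below can be reproduced in base R without Hmisc. This is only a minimal sketch: the "labelled" class and the "label" attribute are set by hand as a stand-in for what sasxport.get produces, which is an assumption made for illustration rather than a description of Hmisc internals.

```r
# Base R only. Assigning "labelled" by hand is an assumption that stands in
# for the result of sasxport.get; Hmisc itself is not used here.
x <- 1:3
implicit_class <- class(x)          # implicit class, derived from the storage type
stored_class   <- attr(x, "class")  # NULL: no class attribute is actually stored

y <- x
attr(y, "label") <- "Subject ID"    # illustrative label text
class(y) <- "labelled"              # an explicit class attribute now hides the implicit one

explicit_class <- class(y)            # what a class-checking script would report
underlying     <- class(unclass(y))   # the implicit class is still recoverable
is_integer_s3  <- inherits(y, "integer")  # FALSE: "integer" is not in the class attribute
```

Because the integer vector never had a stored class attribute, setting one to "labelled" leaves nothing else for class() to report, which matches the behaviour described in the thread.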

> 
> *Sebastien Bihorel, PharmD, PhD*
> PKPD Scientist
> Cognigen Corp
> Email: sebastien.bihorel at cognigencorp.com 
> <mailto:sebastien.bihorel at cognigencorp.com>
> Phone: (716) 633-3463 ext. 323
> 
> 
> Frank E Harrell Jr wrote:
>> Sebastien Bihorel wrote:
>>> I also realized the flaw after testing the script on various datasets...
>>>
>>> Following up on your last note:
>>> 1- Is that the reason why the class of integer and regular numeric 
>>> variable is solely "labelled" following sasxport.get?
>>
>> Yes.  R gurus might correct me but just creating a numeric vector 
>> doesn't create a 'hard' class, and adding your own class attribute 
>> equal to 'numeric' or 'integer' might cause a problem downstream.
>>
>>> 2- Can class be 'soft' for other 'kind' of variables?
>>
>> Not that I can recall.
>>
>>> 3- Would you anticipate the following wrapper function to generate 
>>> incompatibilities with other R functions?
>>
>> I'm going to beg off on that.  I'm not enough of an expert on the 
>> impact of adding such classes.
>>
>> Frank
>>
>>>
>>>
>>> SASxpt.get <- function(file, force.single = TRUE,
>>>                  method=c('read.xport','dataload','csv'),
>>>                  formats=NULL, allow=NULL,
>>>                  out=NULL, keep=NULL, drop=NULL, as.is=0.5, FUN=NULL) {
>>>
>>>  foo <- sasxport.get(file=file, force.single=force.single,
>>>                      method=method,
>>>                      formats=formats, allow=allow, out=out, keep=keep,
>>>                      drop=drop, as.is=as.is, FUN=FUN)
>>>
>>>  # For each variable of class "labelled" (and only "labelled"), add
>>>  # the native class as a second class argument
>>>
>>>  sglClassVarInd <- which(lapply(lapply(unclass(foo),class),length)==1)
>>>
>>>  for (i in seq_along(sglClassVarInd)){
>>>    x <- foo[,sglClassVarInd[i]]
>>>    if (class(x)=="labelled")
>>>      class(foo[,sglClassVarInd[i]]) <- c(class(x), class(unclass(x)))
>>>  }
>>>  return(foo)
>>> }
>>>
>>>
>>> *Sebastien Bihorel, PharmD, PhD*
>>> PKPD Scientist
>>> Cognigen Corp
>>> Email: sebastien.bihorel at cognigencorp.com 
>>> <mailto:sebastien.bihorel at cognigencorp.com>
>>> Phone: (716) 633-3463 ext. 323
>>>
>>>
>>> Frank E Harrell Jr wrote:
>>>> Sebastien Bihorel wrote:
>>>>> Thanks a lot Frank,
>>>>>
>>>>> One last question, though. I was tempted to remove all attributes 
>>>>> of my variables after the sasxport.get call using
>>>>> foo <- sasxport.get(...)
>>>>> foo <- as.data.frame(lapply(unclass(foo),as.vector))
>>>>> Since I have never worked with objects of class 'labelled', I was 
>>>>> wondering what I would lose by removing this attribute.
>>>>
>>>> Not a good idea, for many reasons including dates and other types.
>>>>
>>>> And the labelled type is needed if you subset the data, in order to 
>>>> keep the labels.
>>>>
>>>> Note that your original issue is related to "class" being "soft" for 
>>>> integers and regular numerics:
>>>>
>>>> > x <- 1:3
>>>> > attributes(x)
>>>> NULL
>>>> > class(x)
>>>> [1] "integer"
>>>> > x <- runif(3)
>>>> > class(x)
>>>> [1] "numeric"
>>>> > attributes(x)
>>>> NULL
>>>>
>>>> Frank
>>>>
>>>>>
>>>>> *Sebastien Bihorel, PharmD, PhD*
>>>>> PKPD Scientist
>>>>> Cognigen Corp
>>>>> Email: sebastien.bihorel at cognigencorp.com 
>>>>> <mailto:sebastien.bihorel at cognigencorp.com>
>>>>> Phone: (716) 633-3463 ext. 323
>>>>>
>>>>>
>>>>> Frank E Harrell Jr wrote:
>>>>>> Sebastien.Bihorel at cognigencorp.com wrote:
>>>>>>> The problem is actually not related to a broken command but to an 
>>>>>>> attempt at
>>>>>>> operational qualification of R. A few years ago, my company 
>>>>>>> developed a
>>>>>>> set of scripts for the 'operational qualification' of Splus. We are
>>>>>>> switching to R so I am currently trying to port the scripts to R.
>>>>>>> All Splus scripts imported SAS data using the importData 
>>>>>>> function, which I
>>>>>>> substituted by sasxport.get. One particular script returns the 
>>>>>>> class of
>>>>>>> each variable of the imported data frame; the output must match the
>>>>>>> expected values: numeric, factor, integer, etc... The R 
>>>>>>> 'translation' with
>>>>>>> sasxport.get is thus problematic.
>>>>>>> If there is no easy tweak of the function, we will probably have 
>>>>>>> to remove
>>>>>>> this script from our list of 'qualification' scripts.
>>>>>>>
>>>>>>> Although it would be nice
>>>>>>
>>>>>> Then my advice is to write your own wrapper function for 
>>>>>> sasxport.get that takes its output, looks for labelled variables, 
>>>>>> and adds a new class of your choosing depending on properties of 
>>>>>> the variable, making sure that you write methods needed for that 
>>>>>> class (if any).  Then test your new function, not sasxport.get 
>>>>>> explicitly.
>>>>>>
>>>>>> Frank
>>>>>>
>>>>>>>
>>>>>>>> Sebastien Bihorel wrote:
>>>>>>>>> Frank,
>>>>>>>>>
>>>>>>>>> It is a non existing issue for me if the variables of class 
>>>>>>>>> "labelled"
>>>>>>>>> (and only "labelled") can only be numerical variables (integer or
>>>>>>>>> numeric).
>>>>>>>>>
>>>>>>>>> Sebastien
>>>>>>>> 'labelled' can apply to any type of vector.  I'm not clear on the
>>>>>>>> problem this causes you.  Please provide a command that is 
>>>>>>>> broken by
>>>>>>>> this behavior.
>>>>>>>>
>>>>>>>> Frank
>>>>>>>>
>>>>>>>>> Frank E Harrell Jr wrote:
>>>>>>>>>> Sebastien Bihorel wrote:
>>>>>>>>>>> Dear R-users,
>>>>>>>>>>>
>>>>>>>>>>> The sasxport.get function (from the Hmisc package) automatically
>>>>>>>>>>> defines the class of imported variables. I have noticed that the
>>>>>>>>>>> class of theoretically numeric variables is simply "labelled",
>>>>>>>>>>> although character variables might end up being defined as 
>>>>>>>>>>> "labelled"
>>>>>>>>>>> "Date" or "labelled" "factor".
>>>>>>>>>>> Is there a way to tell sasxport.get to define numeric 
>>>>>>>>>>> variable as
>>>>>>>>>>> "labelled" "integer" or "labelled" "numeric"?
>>>>>>>>>> Sebastien,
>>>>>>>>>>
>>>>>>>>>> If that would fix a problem you're having we could look into it.
>>>>>>>>>> Otherwise I'd tend to leave well enough alone.
>>>>>>>>>>
>>>>>>>>>> Frank
>>>>>>>>>>
>>>>>>>>>>> Thank you
>>>>>>>>>>>
>>>>>>>>>>> Sebastien
>>>>>>>>>>>
>>>>>>>>>>> ______________________________________________
>>>>>>>>>>> R-help at r-project.org mailing list
>>>>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>>>>>>> PLEASE do read the posting guide
>>>>>>>>>>> http://www.R-project.org/posting-guide.html
>>>>>>>>>>> and provide commented, minimal, self-contained, reproducible 
>>>>>>>>>>> code.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> Frank E Harrell Jr   Professor and Chair           School of 
>>>>>>>> Medicine
>>>>>>>>                       Department of Biostatistics   Vanderbilt 
>>>>>>>> University
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
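
The wrapper approach suggested in the thread (post-process the imported data frame and add a second class to variables that are only "labelled") might look roughly like the sketch below. The helper names add_native_class and make_labelled are hypothetical and not part of Hmisc, and the small data frame is built by hand so the example runs without a SAS transport file. As cautioned in the thread, adding "numeric" or "integer" as a class attribute might cause problems downstream, so this is an illustration, not a recommendation.

```r
# Hypothetical helper (not part of Hmisc): for every column whose class
# attribute is exactly "labelled", append the implicit class of the data.
add_native_class <- function(df) {
  for (nm in names(df)) {
    x <- df[[nm]]
    if (identical(class(x), "labelled")) {
      class(df[[nm]]) <- c("labelled", class(unclass(x)))
    }
  }
  df
}

# Hypothetical stand-in for an imported variable: sets a label attribute
# and the "labelled" class by hand.
make_labelled <- function(x, label) {
  attr(x, "label") <- label
  class(x) <- "labelled"
  x
}

# Mock data frame playing the role of sasxport.get output (assumption).
foo <- data.frame(id = 1:3, dose = c(0.5, 1, 2))
foo$id   <- make_labelled(foo$id, "Subject ID")
foo$dose <- make_labelled(foo$dose, "Dose (mg)")
foo$sex  <- factor(c("F", "M", "F"))   # already has a "hard" class, left untouched

out <- add_native_class(foo)
classes <- lapply(out, class)          # per-variable classes a qualification script could compare
```

Iterating over names(df) avoids the 1:length() pitfall when no column qualifies, and identical() keeps the comparison safe for variables that carry more than one class.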



