[R-sig-ME] PROC MIXED RANDOM equivalence in R nlme

Fri Aug 11 23:03:11 CEST 2017

I am trying to reproduce some old SAS PROC MIXED code using R and nlme.

As background, the data consist of 300+ vehicle emission test results 
(grams/mile of THC over a given driving cycle) versus the gasoline fuel 
properties, which include: Read Vapor Pressure (rv), Temperature for 50% 
Evaporation (t5), Temperature for 90% Evaporation (t9), Aromatics (ar), 
Olefins, (ol), Sulfur (su), Oxygenate (ox), Benzene (bz), and a code 
identifying the study and vehicle (study_vehicle). All variables are 
real numbers except "study_vehicle", which is character. Unfortunately, 
since the data are confidential, I'm unable to provide an example.

The goal is to determine the relationship between the emissions and the 
fuel properties.

The original SAS v6.12 code is provided below:

------------------------------------------------------------------
/* READ DATA */
DATA emiss;
    INFILE 'data.tab' LRECL=8000 FIRSTOBS=2 DLM='09'X MISSOVER DSD;
    INPUT study_vehicle $ thc rv t5 t9 ar ol ox su bz;

/* CREATE NEW VARIABLE */
ln_thc = log (thc);

/* PERFORM ANALYSIS RELATING LOG EMISSIONS TO FUEL PROPERTIES */
PROC MIXED DATA=emiss MAXITER=1000 CONVH=1E-8 METHOD=REML NOCLPRINT 
NOITPRINT;
CLASS study_vehicle;

MODEL ln_thc = rv t5 t9 ar ol ox su bz
                 /S DDFM=RES;

RANDOM         int rv t5 t9 ar ol ox su bz
                /SUB=study_vehicle;
RUN;
------------------------------------------------------------------

The R code I've devised for reading/processing the data and to replicate 
the PROC MIXED statement is shown below:

------------------------------------------------------------------
# LOAD REQUIRED LIBRARY
library (“lmne”)

# FORCE R TO USE THE SAME "CONTRAST" METHOD AS SAS
options(contrasts = c(factor = "contr.SAS", ordered = "contr.poly"))

# READ DATA (study_vehicle, thc, rv, t5, t9, ar, ol, ox, su, bz)
emiss <- read.table("data.csv", header=TRUE, sep=",") # NA for missing 
data (default)

# CREATE NEW VARIABLE
ln_thc <- log(thc)

# PERFORM ANALYSIS RELATING LOG EMISSIONS TO FUEL PROPERTIES
FitTHC <- lme(ln_thc ~ rv + t5 + t9 + ar + ol + ox + su + bz,
           data = emiss,
           random = ??????? | study_vehicle )

# OUTPUT THE RESULTS
summary(FitTHC)
------------------------------------------------------------------

A review of the dataframe "emiss" shows that all variables, except 
"study_vehicle", are of type "num". As expected, the character variable 
"study_vehicle" is of type "factor".  These properties match those from SAS.

As indicated by the questions marks, the problem I'm having is in 
constructing the equivalent R random code for the SAS RANDOM and any 
remaining settings (e.g. "/S DDFM=RES"). I've tried

random = ~1 + rv + t5 + t9 + ar + ol + ox + su + bz | study_vehicle)

but R gets caught in a processing loop that produces no errors or 
warnings, and no results, even after 15+ minutes of execution time. As I 
recall, the SAS code took less than 15 seconds back in 1999.

I have also tried

random = ~1 | study_vehicle)

and R immediately produces output.  However, although the regression 
coefficients are close to those from SAS, the degrees of freedom from 
SAS are higher than those from R.

Therefore, what is the equivalent R random code for the original SAS RANDOM?

Thanks!