[R] Importing data coming from Splus into R.

gerald.jean at dgag.ca gerald.jean at dgag.ca
Fri Feb 5 16:35:29 CET 2010


Hello there,

I spent all day yesterday trying to get a small data set from Splus into R,
with no luck!  Both Splus and R run on a 64-bit RedHat Linux machine; both
are 64-bit builds, and the versions are as follows:

Splus:
TIBCO Software Inc. Confidential Information
Copyright (c) 1988-2008 TIBCO Software Inc. ALL RIGHTS RESERVED.
TIBCO Spotfire S+ Version 8.1.1 for Linux 2.6.9-34.EL, 64-bit : 2008

R:
R version 2.8.0 (2008-10-20)
Copyright (C) 2008 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

I know that the "foreign" package has a function to import Splus data sets
directly into R, but I also know that it works only with 32-bit versions of
the software, hence I didn't try that route.  Here is what I have done:

In Splus:

ttt <- exportData(data = FMD.CR.test,
                  file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
                  type = "ASCII", delimiter = "@", quote = T,
                  na.string = "NA")
ttt.class <- unlist(lapply(FMD.CR.test, class))

### I am using "@" as the delimiter since some factor levels contain both
### "," and ";".

In R:

FMD.CR.test.fields <-
    count.fields(file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
                 sep = "@", quote = "\"", comment.char = "")
all(FMD.CR.test.fields == 327)
[1] TRUE  ## Hence all observations have the same number of fields; so far,
          ## so good!

FMD.CR.test.classes <- c("factor", "character", "factor", "factor", "factor",
                         "factor", "factor", "factor", "factor", "factor",
                         "factor", "numeric", "character", and so on)
names(FMD.CR.test.classes) <- c("RTA","police", "mnt.rent.bnct",
                         "mnt.rent.boni", "mnt.rent.cred.bnct",
                         "mnt.rent.epar.bnct", "mnt.rent.snbn",
                         "mnt.rent.trxl", "solde.eop", "solde.nenr.es",
                         "solde.enr.es", "num.enreg", "trouve", and so on)
FMD.CR.test <-
    read.table(file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
               header = TRUE, sep = "@", quote = "\"", as.is = FALSE,
               strip.white = FALSE, comment.char = "", na.strings = "NA",
               nrows = 65000, colClasses = FMD.CR.test.classes)
dim(FMD.CR.test)
[1] 64093   327  ## OK
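
For readers who want to try the same import steps without the original file, here is a minimal sketch on a hypothetical three-column file. The column names are borrowed from the listing above, but the data values, the temporary file and the object names `tmp`, `n.fields`, `classes`, `toy` and `col.classes` are made up for illustration. It assumes a current version of R, where a named `colClasses` vector is matched against the column names.

```r
# Hypothetical toy file using "@" as the delimiter; not the poster's data.
tmp <- tempfile(fileext = ".csv")
writeLines(c('"RTA"@"police"@"num.enreg"',
             '"G1A, B"@"000123"@1',
             '"H2B; C"@"000456"@2',
             '"G1A, B"@"000789"@3'), tmp)

# Check that every line (header included) has the same number of fields.
n.fields <- count.fields(tmp, sep = "@", quote = "\"", comment.char = "")
stopifnot(all(n.fields == 3))

# Named colClasses: one requested class per column name.
classes <- c(RTA = "factor", police = "character", num.enreg = "numeric")
toy <- read.table(tmp, header = TRUE, sep = "@", quote = "\"",
                  comment.char = "", na.strings = "NA",
                  colClasses = classes)

# Inspect the class of each column, one column at a time.
col.classes <- sapply(toy, class)
print(col.classes)

unlink(tmp)
```

Declaring `police` as "character" is what keeps the leading zeros of the made-up policy numbers in this sketch.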

### Testing if classes are the same as the Splus classes.

FMD.CR.test.R.classes <- apply(FMD.CR.test, 2, FUN = class)
sum(FMD.CR.test.R.classes == FMD.CR.test.classes)
[1] 79  ## Not exactly what I was expecting!
all(FMD.CR.test.R.classes == "character")
[1] TRUE

Hence all variables were imported as character, which I find very
inconvenient: the data set has a few hundred factor variables, and recoding
them is a lot of work that has already been done in Splus.  Furthermore, the
numeric variables would need conversion as well.
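
One thing that may be worth ruling out here is the class check itself. This is a general property of apply() and has not been confirmed for this data set. apply() converts a data frame to a matrix before looping over its columns, and a data frame with mixed column types becomes a character matrix, so every column is reported as "character" whatever class it has in the data frame. The sketch below uses a hypothetical toy data frame; `toy`, `via.apply` and `via.sapply` are illustrative names only.

```r
# Hypothetical toy data frame with three different column classes.
toy <- data.frame(grp = factor(c("a", "b", "a")),
                  id  = c("x1", "x2", "x3"),
                  amt = c(1.5, 2.5, 3.5),
                  stringsAsFactors = FALSE)

# apply() first coerces the data frame with as.matrix(); mixed column
# types give a character matrix, so each column reports "character".
via.apply <- apply(toy, 2, FUN = class)

# sapply() visits the columns as list elements, so the classes actually
# stored in the data frame are reported.
via.sapply <- sapply(toy, class)

print(via.apply)
print(via.sapply)
```

If the same pattern shows up on the real data, comparing `sapply(FMD.CR.test, class)` with `FMD.CR.test.classes` would show whether the import itself went wrong.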

I tried all combinations of the arguments "as.is", "stringsAsFactors" and
"colClasses", to no avail.  I also tried exporting the data set from Splus in
SAS transport format and reading it with the read.xport function of the
"foreign" package; the result is always the same, everything is imported as
character.  I searched the R-help archives and found several messages
related to this problem, but no satisfactory solution!

I am a long-time user of Splus and I am planning to use R more often,
mainly because of its wealth of packages and the convenience of installing
them.  I hope to find a reliable and user-friendly way of transferring data
between these two closely related pieces of software.

Thanks for any insights,

Gérald Jean
Senior Statistical Advisor,
VP Market Planning and Development,
Desjardins Groupe d'Assurances Générales
telephone: (418) 835-4900 ext. 7639
fax      : (418) 835-6657
e-mail   : gerald.jean at dgag.ca

"In God we trust, all others must bring data"  W. Edwards Deming
