[R] Importing data coming from Splus into R.

gerald.jean at dgag.ca gerald.jean at dgag.ca
Fri Feb 5 19:58:08 CET 2010


Hello Bill,

Here is what I tried with the Splus built-in data set "claims".

In Splus:

apply(claims, 2, class)
       age   car.age     type      cost    number
 "ordered" "ordered" "factor" "numeric" "numeric"
dump(list = "claims",
     fileout = "/home/jeg002/splus/R/Exemples/R/myclaims.txt",
     oldStyle = T)  ## I tried both oldStyle = T and oldStyle = F; same results.

In R:

claims <- source("/home/jeg002/splus/R/Exemples/R/myclaims.txt")
apply(claims$value, 2, class)  ## oldStyle = T this time.
        age     car.age        type        cost      number
"character" "character" "character" "character" "character"

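An aside on the class check used above: in R, apply() first coerces a data
frame with as.matrix(), and a data frame that mixes factors and numbers
becomes a character matrix.  apply(x, 2, class) therefore reports "character"
for every column, whatever the columns really hold.  sapply() or lapply()
look at each column as it is stored.  A minimal sketch, assuming a small
made-up data frame ("toy" is hypothetical, not the real "claims" data):

```r
# Hypothetical stand-in for the S-PLUS "claims" data (values are made up)
toy <- data.frame(
  age    = factor(c("17-20", "21-24", "25-29"), ordered = TRUE),
  type   = factor(c("A", "B", "A")),
  cost   = c(289, 372, 189),
  number = c(8, 10, 9)
)

# apply() converts the data frame with as.matrix() before calling class(),
# so every column is seen as a character vector
via_apply <- apply(toy, 2, class)

# sapply() visits the columns of the data frame itself; class(x)[1] keeps
# only the first class, since an ordered factor has two
via_sapply <- sapply(toy, function(x) class(x)[1])

print(via_apply)
print(via_sapply)
```

If sapply() shows the expected classes on the imported object, the import
itself may well be fine and only the check was misleading.
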
I must admit I had not tried "write.table" from Splus.  I have now, again
with the "claims" data set.  On the first attempt R complained that it had no
method to convert the character variables to the "ordered" class.  I made a
copy of the data set in Splus, changed the class of two variables from
"ordered" to "factor" and gave it another try.  Here are the results:

In Splus:

new.claims <- claims
class(new.claims$age) <- "factor"
class(new.claims$car.age) <- "factor"
apply(new.claims, 2, class)
      age  car.age     type      cost    number
 "factor" "factor" "factor" "numeric" "numeric"
write.table(data = new.claims,
            file = "/home/jeg002/splus/R/Exemples/R/myclaims.txt",
            sep = "@", append = F, quote.strings = T,
            dimnames.write = T, na = NA, end.of.row = "\n",
            justify.format = "decimal")

In R:

claims.classes <- c("character", "factor", "factor", "factor", "numeric",
                    "numeric")  ## The first "character" is for the row.names
claims <-
    read.table(file = "/home/jeg002/splus/R/Exemples/R/myclaims.txt",
               header = TRUE, sep = "@", quote = "\"", as.is = FALSE,
               strip.white = FALSE, comment.char = "", na.strings = "NA",
               nrows = 200, colClasses = claims.classes)
apply(claims, 2, class)
  row.names         age     car.age        type        cost      number
"character" "character" "character" "character" "character" "character"

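On the "ordered" complaint: the read.table() documentation says colClasses
accepts the atomic vector classes plus "factor", "Date" and "POSIXct", and
that any other class needs an as() method for conversion from "character".
There is none for "ordered", which is consistent with the error reported
above.  One possible workaround is to read such columns as "factor" and
restore the ordering afterwards.  A minimal sketch, assuming made-up lines in
the same "@"-separated layout (the data, the object "toy" and the level order
are all hypothetical):

```r
# Made-up lines imitating the "@"-separated export (not real S-PLUS output)
txt <- c('"age"@"type"@"cost"',
         '"17-20"@"A"@289',
         '"21-24"@"B"@372',
         '"25-29"@"A"@189')

# Read the ordered column as a plain factor first
toy <- read.table(text = txt, header = TRUE, sep = "@", quote = "\"",
                  comment.char = "", na.strings = "NA",
                  colClasses = c("factor", "factor", "numeric"))

# Restore the ordering after import; the level order is an assumption
# and would have to be taken from the S-PLUS object
toy$age <- factor(toy$age, levels = c("17-20", "21-24", "25-29"),
                  ordered = TRUE)

# Check column by column on the data frame itself
col_classes <- sapply(toy, function(x) class(x)[1])
print(col_classes)
```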

I'd be more than happy to supply you a small sample of my data set if the
built-in "claims" doesn't do the job.

Thanks for your support,

Gérald Jean
Senior Statistical Advisor,
VP Market Planning and Development,
Desjardins Groupe d'Assurances Générales
telephone: (418) 835-4900 ext. (7639)
fax      : (418) 835-6657
e-mail   : gerald.jean at dgag.ca

"In God we trust, all others must bring data"  W. Edwards Deming


"William Dunlap" <wdunlap at tibco.com> wrote on 2010/02/05 12:37:25:

> For a data.frame with only numeric and factor
> columns using dump() on the S+ end and source()
> on the R end ought to work.  If you have timeDate
> columns you will need to convert them to character
> data before exporting and convert them to your
> favorite R time/date class after importing them.
>
> If you could send me a fairly small sample of your
> data that shows the incompatibility between S+'s
> write.table and R's read.table I could try to fix
> things up so they were more compatible.
>
> Code that reads the S+ native binary format must
> be 32/64 bit aware, since S+ integers are 32 bits
> on 32-bit versions of S+ and 64 bits on 64-bit
> versions.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> > [mailto:r-help-bounces at r-project.org] On Behalf Of Uwe Ligges
> > Sent: Friday, February 05, 2010 8:05 AM
> > To: Gerald Jean
> > Cc: r-help at r-project.org
> > Subject: Re: [R] Importing data coming from Splus into R.
> >
> > 1. I am stuck with a copy of S-PLUS 4.x. At that time I used dump() in
> > S-PLUS and source() to get things into R afterwards ...
> >
> > 2. Why do you think that 32-bit vs. 64-bit issues matter? The file
> > format does not change (well, this is a guess, since I do not have any
> > 64-bit S-PLUS version available).
> >
> > Best,
> > Uwe Ligges
> >
> >
> > On 05.02.2010 16:35, gerald.jean at dgag.ca wrote:
> > >
> > > Hello there,
> > >
> > > I spent all day yesterday trying to get a small data set from Splus into R,
> > > no luck!  Both Splus and R are run on a 64-bit RedHat Linux machine; both
> > > are 64-bit versions, as follows:
> > >
> > > Splus:
> > > TIBCO Software Inc. Confidential Information
> > > Copyright (c) 1988-2008 TIBCO Software Inc. ALL RIGHTS RESERVED.
> > > TIBCO Spotfire S+ Version 8.1.1 for Linux 2.6.9-34.EL, 64-bit : 2008
> > >
> > > R:
> > > R version 2.8.0 (2008-10-20)
> > > Copyright (C) 2008 The R Foundation for Statistical Computing
> > > ISBN 3-900051-07-0
> > >
> > > I know that the "foreign" package has a function to directly import Splus
> > > data sets into R, but I also know that it works only for 32-bit
> > > versions of the software, hence I didn't try that route.  Here is what I
> > > have done:
> > >
> > > In Splus:
> > >
> > > ttt <- exportData(data = FMD.CR.test,
> > >                   file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
> > >                   type = "ASCII", delimiter = "@", quote = T,
> > >                   na.string = "NA")
> > > ttt.class <- unlist(lapply(FMD.CR.test, class))
> > >
> > > ### I am using "@" as delimiter since some factor levels contain both the
> > > ### "," and the ";".
> > >
> > > In R:
> > >
> > > FMD.CR.test.fields <-
> > >     count.fields(file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
> > >                  sep = "@", quote = "\"", comment.char = "")
> > > all(FMD.CR.test.fields == 327)
> > > [1] TRUE  ## Hence all observations have the same number of fields; so far, so good!
> > >
> > > FMD.CR.test.classes <- c("factor", "character", "factor", "factor", "factor",
> > >                          "factor", "factor", "factor", "factor", "factor",
> > >                          "factor", "numeric", "character", and so on)
> > > names(FMD.CR.test.classes) <- c("RTA", "police", "mnt.rent.bnct",
> > >                          "mnt.rent.boni", "mnt.rent.cred.bnct",
> > >                          "mnt.rent.epar.bnct", "mnt.rent.snbn",
> > >                          "mnt.rent.trxl", "solde.eop", "solde.nenr.es",
> > >                          "solde.enr.es", "num.enreg", "trouve", and so on)
> > > FMD.CR.test <-
> > >      read.table(file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
> > >                 header = TRUE, sep = "@", quote = "\"", as.is = FALSE,
> > >                 strip.white = FALSE, comment.char = "", na.strings = "NA",
> > >                 nrows = 65000, colClasses = FMD.CR.test.classes)
> > > dim(FMD.CR.test)
> > > [1] 64093   327  ## OK
> > >
> > > ### Testing if classes are the same as the Splus classes.
> > >
> > > FMD.CR.test.R.classes<- apply(FMD.CR.test, 2, FUN = class)
> > > sum(FMD.CR.test.R.classes == FMD.CR.test.classes)
> > > [1] 79  ## Not exactly what I was expecting!
> > > all(FMD.CR.test.R.classes == "character")
> > > [1] TRUE
> > >
> > > Hence all variables were imported as character, which I find very
> > > inconvenient: since the data set has a few hundred factor variables,
> > > recoding them is a lot of work, and this work has already been done in Splus;
> > > furthermore, the numeric variables would need conversion as well.
> > >
> > > I tried all combinations of the arguments "as.is", "stringsAsFactors" and
> > > "colClasses" to no avail.  I also tried to export the data set in SAS
> > > transport format from Splus and read it through the foreign package's
> > > read.xport function; always the same result, everything is imported as
> > > character.  I searched the r-help archives and found several messages
> > > relating to this problem but no satisfactory solution!
> > >
> > > I am a long-time user of Splus and I am planning to use R more often,
> > > mainly due to its wealth of packages and the convenience of installing
> > > them.  I hope to find a reliable and user-friendly way of transferring data
> > > between the two cousin pieces of software.
> > >
> > > Thanks for any insights,
> > >
> > > Gérald Jean
> > > Senior Statistical Advisor,
> > > VP Market Planning and Development,
> > > Desjardins Groupe d'Assurances Générales
> > > telephone: (418) 835-4900 ext. (7639)
> > > fax      : (418) 835-6657
> > > e-mail   : gerald.jean at dgag.ca
> > >
> > > "In God we trust, all others must bring data"  W. Edwards Deming
> > >
> > >
> > > This communication (and/or the attachments) is intended for named
> > > recipients only and may contain privileged or confidential information
> > > which is not to be disclosed. If you received this communication by mistake
> > > please destroy all copies.
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >


