[R] Importing data coming from Splus into R.
gerald.jean at dgag.ca
Fri Feb 5 18:34:08 CET 2010
Uwe Ligges <ligges at statistik.tu-dortmund.de> wrote on 2010/02/05 11:04:44:
> 1. I am stuck with a copy of S-PLUS 4.x. At that time I used dump() in
> S-PLUS and source() to get things into R afterwards ...
>
> 2. Why do you think that 32-bit vs. 64-bit issues matter? The file
> format does not change (well, this is guessed since I do not have any
> 64-bit S-PLUS version available).
The "R-data_ImportExport" manual says:
Function read.S which can read binary objects produced by S-PLUS 3.x, 4.x
or 2000 on (32-bit) Unix or Windows (and can read them on a different OS).
This is able to read many but not all S objects: in particular it can read
vectors, matrices and data frames and lists containing those.
Function data.restore reads S-PLUS data dumps (created by data.dump) with
the same restrictions (except that dumps from the Alpha platform can also
be read). It should be possible to read data dumps from S-PLUS 5.x and
later written with data.dump(oldStyle=T).
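For reference, the data.restore route described in the manual excerpt could be sketched roughly as follows. This is only a sketch under assumptions: the dump file name is hypothetical, the S-PLUS call in the comment is my guess at how the dump would be written, and nothing here confirms that a dump from 64-bit S+ 8.1 can actually be read.

```r
library(foreign)  # ships with R as a recommended package

# Hypothetical path: a dump written on the S-PLUS side with something like
#   data.dump("FMD.CR.test", file = "FMD-CR-test.sdd", oldStyle = T)
dump.file <- "FMD-CR-test.sdd"

restored <- new.env()
if (file.exists(dump.file)) {
  # Restore the dumped objects into a separate environment rather than
  # the global workspace, then inspect the stored column classes.
  data.restore(dump.file, env = restored)
  print(sapply(get("FMD.CR.test", envir = restored), class))
} else {
  message("No dump file at ", dump.file, "; nothing restored.")
}
```

Restoring into its own environment keeps the imported objects from overwriting anything already in the workspace.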
Following Richard Heiberger's suggestion I also tried "dput" in Splus and
"dget" in R; this works, but all columns are imported as character. The
same thing happens with "dump" from Splus and "source" from R.
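One thing that may be worth checking, assuming the classes were tested with apply(..., 2, FUN = class) as in the quoted message below: apply() converts a data frame to a matrix first, so a data frame holding any non-numeric column reports "character" for every column, whatever was actually imported. The toy file, column names and values in this sketch are made up for illustration; only the "@" delimiter and the read.table arguments mirror the original call.

```r
# Write a tiny "@"-delimited file to a temporary location (made-up data).
tmp <- tempfile(fileext = ".csv")
writeLines(c('"grp"@"id"@"amount"',
             '"a,b"@"001"@1.5',
             '"c;d"@"002"@2.5',
             '"a,b"@"003"@NA'),
           tmp)

# Request one class per column, in column order.
wanted <- c("factor", "character", "numeric")
toy <- read.table(tmp, header = TRUE, sep = "@", quote = "\"",
                  comment.char = "", na.strings = "NA",
                  colClasses = wanted)

# apply() first converts the data frame with as.matrix(); with a
# non-numeric column present the result is a character matrix, so every
# column appears to be "character".
via.apply <- apply(toy, 2, class)

# sapply() visits the columns of the data frame itself and reports the
# classes that were actually stored.
via.sapply <- sapply(toy, class)

print(via.apply)
print(via.sapply)
unlink(tmp)
```

If sapply() reports the requested classes on the real data set, the import itself preserved the structure and only the check was misleading.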
If I get this right there is no easy way of going back and forth between
the cousins while preserving the structure of the data, too bad!
Thanks for your interest in my problem,
Gérald Jean
Senior Statistical Advisor,
VP Planning and Market Development,
Desjardins Groupe d'Assurances Générales
telephone: (418) 835-4900 ext. (7639)
fax: (418) 835-6657
email: gerald.jean at dgag.ca
"In God we trust, all others must bring data" W. Edwards Deming
>
> Best,
> Uwe Ligges
>
>
> On 05.02.2010 16:35, gerald.jean at dgag.ca wrote:
> >
> > Hello there,
> >
> > I spent all day yesterday trying to get a small data set from Splus into
> > R, no luck! Both Splus and R run on a 64-bit RedHat Linux machine; both
> > are 64-bit builds, with the following versions:
> >
> > Splus:
> > TIBCO Software Inc. Confidential Information
> > Copyright (c) 1988-2008 TIBCO Software Inc. ALL RIGHTS RESERVED.
> > TIBCO Spotfire S+ Version 8.1.1 for Linux 2.6.9-34.EL, 64-bit : 2008
> >
> > R:
> > R version 2.8.0 (2008-10-20)
> > Copyright (C) 2008 The R Foundation for Statistical Computing
> > ISBN 3-900051-07-0
> >
> > I know that the "foreign" package has a function to directly import Splus
> > data sets into R, but I also know that it works only for 32-bit versions
> > of the software, hence I didn't try that route. Here is what I have done:
> >
> > In Splus:
> >
> > ttt <- exportData(data = FMD.CR.test,
> >                   file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
> >                   type = "ASCII", delimiter = "@", quote = T,
> >                   na.string = "NA")
> > ttt.class <- unlist(lapply(FMD.CR.test, class))
> >
> > ### I am using "@" as delimiter since some factor levels contain both
> > ### the "," and the ";".
> >
> > In R:
> >
> > FMD.CR.test.fields <-
> >   count.fields(file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
> >                sep = "@", quote = "\"", comment.char = "")
> > all(FMD.CR.test.fields == 327)
> > [1] TRUE ## Hence all observations have the same number of fields,
> > ## so far, so good!
> >
> > FMD.CR.test.classes <- c("factor", "character", "factor", "factor",
> >                          "factor", "factor", "factor", "factor",
> >                          "factor", "factor", "factor", "numeric",
> >                          "character", and so on)
> > names(FMD.CR.test.classes) <- c("RTA", "police", "mnt.rent.bnct",
> >                                 "mnt.rent.boni", "mnt.rent.cred.bnct",
> >                                 "mnt.rent.epar.bnct", "mnt.rent.snbn",
> >                                 "mnt.rent.trxl", "solde.eop",
> >                                 "solde.nenr.es", "solde.enr.es",
> >                                 "num.enreg", "trouve", and so on)
> > FMD.CR.test <-
> >   read.table(file = "/home/jeg002/splus/R/Exemples/R/FMD-CR-test.csv",
> >              header = TRUE, sep = "@", quote = "\"", as.is = FALSE,
> >              strip.white = FALSE, comment.char = "", na.strings = "NA",
> >              nrows = 65000, colClasses = FMD.CR.test.classes)
> > dim(FMD.CR.test)
> > [1] 64093 327 ## OK
> >
> > ### Testing if classes are the same as the Splus classes.
> >
> > FMD.CR.test.R.classes<- apply(FMD.CR.test, 2, FUN = class)
> > sum(FMD.CR.test.R.classes == FMD.CR.test.classes)
> > [1] 79 ## Not exactly what I was expecting!
> > all(FMD.CR.test.R.classes == "character")
> > [1] TRUE
> >
> > Hence all variables were imported as character, which I find very
> > inconvenient. The data set has a few hundred factor variables, and
> > recoding them is a lot of work that has already been done in Splus;
> > furthermore, the numeric variables would need conversion as well.
> >
> > I tried all combinations of the arguments "as.is", "stringsAsFactors" and
> > "colClasses" to no avail. I also tried to export the data set in SAS
> > transport format from Splus and read it through the foreign package's
> > read.xport function, always with the same result: everything is imported
> > as character. I searched the r-help archives and found several messages
> > relating to this problem, but no satisfactory solution!
> >
> > I am a long-time user of Splus and I am planning to use R more often,
> > mainly due to its wealth of packages and the convenience of installing
> > them. I hope to find a reliable and user-friendly way of transferring
> > data between the two cousin pieces of software.
> >
> > Thanks for any insights,
> >
> > Gérald Jean
> > Senior Statistical Advisor,
> > VP Planning and Market Development,
> > Desjardins Groupe d'Assurances Générales
> > telephone: (418) 835-4900 ext. (7639)
> > fax: (418) 835-6657
> > email: gerald.jean at dgag.ca
> >
> > "In God we trust, all others must bring data" W. Edwards Deming
> >
> > This communication (and/or the attachments) is intended for named
> > recipients only and may contain privileged or confidential information
> > which is not to be disclosed. If you received this communication by
> > mistake please destroy all copies.
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.