[Rd] writeForeignSAS and potential extensions
Stephen D. Weigand
weigand.stephen at charter.net
Fri Jul 14 07:00:17 CEST 2006
Sorry, looks like my work e-mailer put the attachments
in the body. Please e-mail weigand at mayo.edu if interested
and I'll send you a copy of the files.
I think it'll also work to grab the files from:
ftp://mayoftp:''@ftp.mayo.edu/pub/weigand/writeForeignSAS7.R
ftp://mayoftp:''@ftp.mayo.edu/pub/weigand/diff.txt
Thank you,
Stephen
On Jul 13, 2006, at 1:48 PM, Stephen Weigand wrote:
> Dear R-devel,
>
> I've made some potential extensions to writeForeignSAS
> in 'foreign' that I wanted to pass along if anyone is
> interested. I've attached the diff -u output against
> the version found in foreign_0.8-15 and an .R file
> with my changes. (In this .R file, the function is named
> writeForeignSAS7 to simplify testing/comparisons.)
>
> I've tried to alter the current version as little as
> possible while making the following changes:
>
> * Try to convert data.frame names to SAS-legal names and
> allow the user to specify an 8- or 32-character limit.
>
> * For factors, try to convert the variable name to a
> SAS-legal 8-character name not ending in a digit
>
> * Read in 'datafile' with DSD specified in the INFILE
> statement. SAS says this "changes how SAS treats
> delimiters when list input is used and sets the default
> delimiter to a comma. When you specify DSD, SAS treats
> two consecutive delimiters as a missing value and
> removes quotation marks from character values." The
> point of this is the added safety of using 'quote=TRUE'
> when writing 'datafile' via write.table
>
> * Functionality to write out Dates and read them in with
> an INFORMAT statement
>
> * Functionality to write out datetime variables
> (assuming a class of POSIXct) and read them in with an
> INFORMAT statement
>
> * In order to handle character variables a bit better,
> use a LENGTH statement to tell SAS the maximum character
> width of values in the variable. Without this, some
> character values can be truncated.
>
> If it'd be helpful to make any changes or add anything,
> I'd be happy try to do so.
>
> Finally, some testing code that works in SAS 6.12, 8.2,
> and 9.
>
> d <-
> structure(list(a.b = as.integer(c(1, 2)),
> alphabetsoup =
> structure(as.integer(c(1, 2)),
> .Label = c("A", "B"),
> class = "factor"),
> datevar1 = structure(c(13342, 12977),
> class = "Date"),
> datetimevar1 = structure(c(1152802685,
> 1152716285),
> class = c("POSIXt", "POSIXct")),
> charactervariable = c("L",
> "Last, First")),
> .Names = c("a.b", "alphabetsoup",
> "datevar1", "datetimevar1",
> "charactervariable"),
> row.names = c("1", "2"),
> class = "data.frame")
>
> require(foreign)
>
> ### adQuote here to (temporarily) avoid ':::'
> adQuote <- function (x) paste("\"", x, "\"", sep = "")
>
> dfile <- file.path(tempdir(), "test.dat")
> cfile <- file.path(tempdir(), "test.sas")
> write.foreign(d, datafile = dfile, codefile = cfile,
> package = "SAS7", validvarname = "V6")
> file.show(dfile)
> file.show(cfile)
>
> Sincerely,
>
> Stephen
>
> ::::::::::::::::::::::::::::::::::
> Stephen Weigand
> Division of Biostatistics
> Mayo Clinic Rochester, Minn., USA
> Phone (507) 266-1650, fax 284-9542
> --- writeForeignSAS.R Fri Feb 17 03:30:53 2006
> +++ /tmp/writeForeignSAS.R Thu Jul 13 12:24:24 2006
> @@ -1,21 +1,52 @@
> -writeForeignSAS<-function(df,datafile,codefile,dataname="rdata"){
> +make.SAS.names <- function(varnames, validvarname = c("V7", "V6")){
> + validvarname <- match.arg(validvarname)
> + nmax <- if(validvarname == "V7") 32 else 8
>
> + x <- sub("^([0-9])", "_\\1", varnames)
> + x <- gsub("[^a-zA-Z0-9_]", "_", x)
> + x <- abbreviate(x, minlength = nmax)
> +
> + if (any(nchar(x) > nmax) || any(duplicated(x)))
> + stop("Cannot uniquely abbreviate the variable names to ",
> + nmax, " or fewer characters")
> + names(x) <- varnames
> + x
> +}
> +
> +make.SAS.formats <- function(varnames){
> + x <- sub("^([0-9])", "_\\1", varnames)
> + x <- gsub("[^a-zA-Z0-9_]", "_", x)
> + x <- sub("([0-9])$", "\\1f", x) # can't end in digit so append 'f'
> + x <- abbreviate(x, minlength = 8)
> +
> + if(any(nchar(x) > 8) || any(duplicated(x)))
> + stop("Cannot uniquely abbreviate format names to conform to ",
> + " eight-character limit and not ending in a digit")
> + names(x) <- varnames
> + x
> +}
> +
> +writeForeignSAS7<-function(df,datafile,codefile,dataname="rdata",
> + validvarname = c("V7", "V6")){
> factors <- sapply(df, is.factor)
> strings <- sapply(df, is.character)
> -
> + dates <- sapply(df, FUN = function(x) inherits(x, "Date"))
> + datetimes <- sapply(df, FUN = function(x) inherits(x, "POSIXct"))
> +
> varlabels <- names(df)
> - varnames <- abbreviate(names(df), 8)
> - if (any(sapply(varnames, nchar) > 8))
> - stop("Cannot abbreviate the variable names to eight or fewer
> letters")
> - if (any(abbreviated <- (varnames != varlabels)))
> - message("Some variable names were abbreviated.")
> + varnames <- make.SAS.names(names(df), validvarname = validvarname)
> + if (any(varnames != varlabels))
> + message("Some variable names were abbreviated or otherwise
> altered.")
>
>
> dfn<-df
> if (any(factors))
> dfn[factors]<-lapply(dfn[factors], as.numeric)
> + if (any(datetimes))
> + dfn[datetimes] <- lapply(dfn[datetimes],
> + FUN = function(x) format(x, "%d%b%Y
> %H:%M:%S"))
> write.table(dfn, file = datafile, row = FALSE, col = FALSE,
> - sep = ",", quote = FALSE, na = ".")
> + sep = ",", quote = TRUE, na = "")
> lrecl<-max(sapply(readLines(datafile),nchar))+4
>
> cat("* Written by R;\n", file=codefile)
> @@ -22,24 +53,50 @@
> cat("* ",deparse(sys.call(-2))[1],";\n\n",file=codefile,append=TRUE)
> if (any(factors)){
> cat("PROC FORMAT;\n",file=codefile,append=TRUE)
> - for(v in 1:ncol(df)){
> - if (factors[v]){
> - cat("value ",varnames[v],"\n",file=codefile,append=TRUE)
> - values<-levels(df[[v]])
> + fmtnames <- make.SAS.formats(varnames[factors])
> + fmt.values <- lapply(df[, factors, drop = FALSE], levels)
> + names(fmt.values) <- fmtnames
> + for (f in fmtnames){
> + cat("value",f,"\n",file=codefile,append = TRUE)
> + values<-fmt.values[[f]]
> for(i in 1:length(values)){
> cat("
> ",i,"=",adQuote(values[i]),"\n",file=codefile,append=TRUE)
> }
> cat(";\n\n",file=codefile,append=TRUE)
> - }
> - }
> + }
> }
>
> cat("DATA ",dataname,";\n",file=codefile,append=TRUE)
> +
> + if (any(strings)){
> + cat("LENGTH", file = codefile, append = TRUE)
> + lengths <- sapply(df[,strings, drop = FALSE],
> + FUN = function(x) max(nchar(x)))
> + names(lengths) <- varnames[strings]
> + for(v in varnames[strings])
> + cat("\n", v, "$", lengths[v],file=codefile,append=TRUE)
> + cat("\n;\n\n", file = codefile, append = TRUE)
> + }
> +
> + if (any(dates)){
> + cat("INFORMAT", file = codefile, append = TRUE)
> + for(v in varnames[dates])
> + cat("\n", v, file = codefile, append = TRUE)
> + cat("\n YYMMDD10.\n;\n\n", file = codefile, append = TRUE)
> + }
> +
> + if (any(datetimes)){
> + cat("INFORMAT", file = codefile, append = TRUE)
> + for(v in varnames[datetimes])
> + cat("\n", v, file = codefile, append = TRUE)
> + cat("\n DATETIME18.\n;\n\n", file = codefile, append = TRUE)
> + }
> +
> cat("INFILE ",adQuote(datafile),
> - "\n DELIMITER=','",
> + "\n DSD",
> "\n LRECL=",lrecl,";\n",
> file=codefile,append=TRUE)
> -
> +
> cat("INPUT",file=codefile,append=TRUE)
> for(v in 1:ncol(df)){
> cat("\n",varnames[v],file=codefile,append=TRUE)
> @@ -49,16 +106,26 @@
> cat("\n;\n",file=codefile,append=TRUE)
>
> for(v in 1:ncol(df)){
> - if (abbreviated[v])
> + if (varnames[v] != names(varnames)[v])
> cat("LABEL ",varnames[v],"=",adQuote(varlabels[v]),";\n",
> file=codefile,append=TRUE)
> - }
> -
> - for(v in 1:ncol(df)){
> - if(factors[v])
> - cat("FORMAT ",varnames[v],paste(varnames[v],".",sep=""),";\n",
> + }
> +
> + if (any(factors)){
> + for (f in 1:length(fmtnames))
> + cat("FORMAT", names(fmtnames)[f],paste(fmtnames[f],".",sep =
> ""),";\n",
> file=codefile,append=TRUE)
> }
> -
> +
> + if (any(dates)){
> + for(v in varnames[dates])
> + cat("FORMAT", v, "yymmdd10.;\n", file = codefile, append = TRUE)
> + }
> +
> + if (any(datetimes)){
> + for(v in varnames[datetimes])
> + cat("FORMAT", v, "datetime18.;\n", file = codefile, append =
> TRUE)
> + }
> +
> cat("RUN;\n",file=codefile,append=TRUE)
> }
> make.SAS.names <- function(varnames, validvarname = c("V7", "V6")){
> validvarname <- match.arg(validvarname)
> nmax <- if(validvarname == "V7") 32 else 8
>
> x <- sub("^([0-9])", "_\\1", varnames)
> x <- gsub("[^a-zA-Z0-9_]", "_", x)
> x <- abbreviate(x, minlength = nmax)
>
> if (any(nchar(x) > nmax) || any(duplicated(x)))
> stop("Cannot uniquely abbreviate the variable names to ",
> nmax, " or fewer characters")
> names(x) <- varnames
> x
> }
>
> make.SAS.formats <- function(varnames){
> x <- sub("^([0-9])", "_\\1", varnames)
> x <- gsub("[^a-zA-Z0-9_]", "_", x)
> x <- sub("([0-9])$", "\\1f", x) # can't end in digit so append 'f'
> x <- abbreviate(x, minlength = 8)
>
> if(any(nchar(x) > 8) || any(duplicated(x)))
> stop("Cannot uniquely abbreviate format names to conform to ",
> " eight-character limit and not ending in a digit")
> names(x) <- varnames
> x
> }
>
> writeForeignSAS7<-function(df,datafile,codefile,dataname="rdata",
> validvarname = c("V7", "V6")){
> factors <- sapply(df, is.factor)
> strings <- sapply(df, is.character)
> dates <- sapply(df, FUN = function(x) inherits(x, "Date"))
> datetimes <- sapply(df, FUN = function(x) inherits(x, "POSIXct"))
>
> varlabels <- names(df)
> varnames <- make.SAS.names(names(df), validvarname = validvarname)
> if (any(varnames != varlabels))
> message("Some variable names were abbreviated or otherwise
> altered.")
>
>
> dfn<-df
> if (any(factors))
> dfn[factors]<-lapply(dfn[factors], as.numeric)
> if (any(datetimes))
> dfn[datetimes] <- lapply(dfn[datetimes],
> FUN = function(x) format(x, "%d%b%Y
> %H:%M:%S"))
> write.table(dfn, file = datafile, row = FALSE, col = FALSE,
> sep = ",", quote = TRUE, na = "")
> lrecl<-max(sapply(readLines(datafile),nchar))+4
>
> cat("* Written by R;\n", file=codefile)
> cat("* ",deparse(sys.call(-2))[1],";\n\n",file=codefile,append=TRUE)
> if (any(factors)){
> cat("PROC FORMAT;\n",file=codefile,append=TRUE)
> fmtnames <- make.SAS.formats(varnames[factors])
> fmt.values <- lapply(df[, factors, drop = FALSE], levels)
> names(fmt.values) <- fmtnames
> for (f in fmtnames){
> cat("value",f,"\n",file=codefile,append = TRUE)
> values<-fmt.values[[f]]
> for(i in 1:length(values)){
> cat("
> ",i,"=",adQuote(values[i]),"\n",file=codefile,append=TRUE)
> }
> cat(";\n\n",file=codefile,append=TRUE)
> }
> }
>
> cat("DATA ",dataname,";\n",file=codefile,append=TRUE)
>
> if (any(strings)){
> cat("LENGTH", file = codefile, append = TRUE)
> lengths <- sapply(df[,strings, drop = FALSE],
> FUN = function(x) max(nchar(x)))
> names(lengths) <- varnames[strings]
> for(v in varnames[strings])
> cat("\n", v, "$", lengths[v],file=codefile,append=TRUE)
> cat("\n;\n\n", file = codefile, append = TRUE)
> }
>
> if (any(dates)){
> cat("INFORMAT", file = codefile, append = TRUE)
> for(v in varnames[dates])
> cat("\n", v, file = codefile, append = TRUE)
> cat("\n YYMMDD10.\n;\n\n", file = codefile, append = TRUE)
> }
>
> if (any(datetimes)){
> cat("INFORMAT", file = codefile, append = TRUE)
> for(v in varnames[datetimes])
> cat("\n", v, file = codefile, append = TRUE)
> cat("\n DATETIME18.\n;\n\n", file = codefile, append = TRUE)
> }
>
> cat("INFILE ",adQuote(datafile),
> "\n DSD",
> "\n LRECL=",lrecl,";\n",
> file=codefile,append=TRUE)
>
> cat("INPUT",file=codefile,append=TRUE)
> for(v in 1:ncol(df)){
> cat("\n",varnames[v],file=codefile,append=TRUE)
> if(strings[v])
> cat(" $ ",file=codefile,append=TRUE)
> }
> cat("\n;\n",file=codefile,append=TRUE)
>
> for(v in 1:ncol(df)){
> if (varnames[v] != names(varnames)[v])
> cat("LABEL ",varnames[v],"=",adQuote(varlabels[v]),";\n",
> file=codefile,append=TRUE)
> }
>
> if (any(factors)){
> for (f in 1:length(fmtnames))
> cat("FORMAT", names(fmtnames)[f],paste(fmtnames[f],".",sep =
> ""),";\n",
> file=codefile,append=TRUE)
> }
>
> if (any(dates)){
> for(v in varnames[dates])
> cat("FORMAT", v, "yymmdd10.;\n", file = codefile, append = TRUE)
> }
>
> if (any(datetimes)){
> for(v in varnames[datetimes])
> cat("FORMAT", v, "datetime18.;\n", file = codefile, append =
> TRUE)
> }
>
> cat("RUN;\n",file=codefile,append=TRUE)
> }
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
More information about the R-devel
mailing list