[R] Using write.csv as a connection for read.csv

Kevin Thorpe kevin.thorpe using utoronto.ca
Mon Jul 9 19:14:20 CEST 2018


Thanks Jeff and all others.

I guess I will need to use the tempfile route for the time being (I'm running on a Linux OS).

After I re-loaded the data frames that were broken before, they seemed fine, but after using them for a while they broke again.

I am trying to build my analysis with rmarkdown and related tools. I have not been able to determine (yet) exactly what set of interactions is "breaking" things. I certainly don't expect the list to debug everything I'm doing.

The only thing I can say is that there appears to be some weird interaction between SAS data sets imported by haven and other packages. Note that I encountered what I think are related issues with an imported data set when I tried working with it in the tidyverse.
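If the labelled columns from haven do turn out to be the culprit, one thing I may try before resorting to a CSV round trip is stripping those attributes directly, e.g. with haven's zap_labels(). A rough sketch (df here is just a stand-in for one of my imported frames):

library(haven)
df <- zap_labels( df )   # drop the value labels haven attaches to columns imported from SAS
str( df )                # the columns should now be plain atomic vectors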

Maybe I'm getting too old to learn new stuff. :-)

Sorry I am not being much help with my own problem. I just have not been able to determine where things break. If I can come up with a reproducible example that reliably breaks, I'll post it.

Kevin
  
 
--
 Kevin E. Thorpe
 Head of Biostatistics,  Applied Health Research Centre (AHRC)
 Li Ka Shing Knowledge Institute of St. Michael's
 Assistant Professor, Dalla Lana School of Public Health
 University of Toronto
 email: kevin.thorpe using utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
 
     



From: Jeff Newmiller <jdnewmil using dcn.davis.ca.us>
Sent: Monday, July 9, 2018 1:01 PM
To: r-help using r-project.org; Kevin Thorpe; R Help Mailing List
Subject: Re: [R] Using write.csv as a connection for read.csv
  

TL;DR: If you want to do this, go ahead and use a temporary file or text connection.

Others have pointed out that write.csv returns NULL rather than a file connection, but I haven't seen comments on your impulse to avoid the use of files.

*nix operating systems are admirably efficient at multitasking... to the point that shells can run multiple programs connected by pipes, pausing the producers if they get ahead of the consumers and resuming them when the consumers run out of data, thus minimizing the amount of temporary disk space used.

R does not presume this to be among the fundamental capabilities of the operating system; rather, it assumes single-tasking behaviour by default. This means that even if you did connect write.csv to a pipe, it would run to completion before read.csv got a chance to process any of the data. MSDOS used to simulate command-line program chaining by writing all the data to a temporary file before running the consumer program. R is similar... and, as with MSDOS, there is little reason to avoid temporary files in R.

set.seed( 42 )                                   # reproducible example data
DF <- data.frame( X=1:100, Y=rnorm( 100 ) )
fname <- tempfile()                              # path to a temporary file
write.csv( DF, file=fname, row.names=FALSE )
DF2 <- read.csv( file=fname )
all.equal( DF$X, DF2$X ) && all.equal( DF$Y, DF2$Y )   # TRUE: the round trip preserved the data
unlink( fname )                                  # clean up the temporary file
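
The text-connection route mentioned in the TL;DR also works without touching the file system: capture write.csv's console output and hand it back to read.csv via its text= argument. A minimal sketch (csv_text and DF3 are just illustrative names):

csv_text <- capture.output( write.csv( DF, row.names=FALSE ) )   # CSV lines as a character vector
DF3 <- read.csv( text=csv_text )                                 # read.csv/read.table accept text= input
all.equal( DF$X, DF3$X ) && all.equal( DF$Y, DF3$Y )             # should again be TRUE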


On July 9, 2018 7:42:00 AM PDT, Kevin Thorpe <kevin.thorpe using utoronto.ca> wrote:
>Hi.
>
>I have some data frames I created previously that seem to not be
>working correctly anymore. I *think* the problem is that some of the
>variables in the data frame are of a type called labelled. There are
>other attributes in the data frame as well. I thought that the easiest
>way to fix this was to convert to, say a csv and re-load.
>
>I tried something like read.csv(write.csv(df,row.names=FALSE)) but got
>the error
>
>Error in read.table(file = file, header = header, sep = sep, quote =
>quote,  : 
>  'file' must be a character string or connection
>
>I guess there must be a way to send the output of write.csv to a
>connection that read.csv can use but I was mystified by the help page
>on connections, at least I could not determine how to achieve my
>desired result.
>
>I realize I could write to a file and read it back in, but that feels
>klunky somehow. Maybe my approach to convert my data to strip the
>"weird" stuff is wrong-headed and I would accept alternative
>strategies.
>
>I would like a more general solution to fix this because I expect to
>encounter it some more. For those wondering how I found myself in such
>a mess, the data frames were initially imported from SAS data sets
>through the haven package. I then did some standard manipulation and
>added some additional labels with the upData() function from Hmisc
>(both packages have been updated since initial creation of the data
>frames).
>
>Thanks,
>
>Kevin
> 
>--
> Kevin E. Thorpe
> Head of Biostatistics,  Applied Health Research Centre (AHRC)
> Li Ka Shing Knowledge Institute of St. Michael's
> Assistant Professor, Dalla Lana School of Public Health
> University of Toronto
> email: kevin.thorpe using utoronto.ca  Tel: 416.864.5776  Fax: 416.864.3016
> 
>     
>______________________________________________
>R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.
    


