[R] Process substitution and read.table/scan
grassi.e at gmail.com
Wed Apr 3 14:19:27 CEST 2013
Hello, I asked the same question on stackoverflow
but did not completely understand the issue, so I'm reporting it here:
I've looked around for an explanation of what puzzles me and only found this:
which partially helps, but I would really like to understand the
full story. I noticed that some of my R scripts give different (i.e.
wrong) results when I use process substitution.
I tried to pinpoint the problem with a test case:
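args <- commandArgs(TRUE)
file <- args[1]  # path passed on the command line
data <- read.table(file, header=FALSE)
mean(data$V1)    # read.table names a single unnamed column V1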
with an input file generated in this way:
$ for i in `seq 1 10`; do echo $i >> p; done
$ for i in `seq 1 500`; do cat p >> test; done
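As a sanity check on what the script should report, the contents of test can be rebuilt in memory. This is only a sketch: the names values, expected_lines and expected_mean are made up for illustration, and it assumes that neither p nor test existed before the two loops above were run (both loops append).

```r
# Rebuild the contents of 'test' in memory: the numbers 1..10, repeated 500 times.
values <- rep(1:10, times = 500)

# Number of data lines the file should contain.
expected_lines <- length(values)

# Mean that ./mean.R should report when every line is read.
expected_mean <- mean(values)

print(expected_lines)
print(expected_mean)
```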
leads me to this:
$ ./mean.R test
$ ./mean.R <(cat test)
Further tests reveal that some lines are lost... but I would like to
understand why. Does read.table (scan gives the same results) use
P.S. With a smaller test file (100) an error is reported:
$ ./mean.R <(cat test3)
Error in read.table(file, header = F) : no lines available in input
Other notes: with a modified script that uses scan the results are the same.
Printing the whole data.frame results in 5001 lines in the first case
(which is correct) and only 3050 with process substitution.
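One way to turn a silent short read into an error is to compare the number of rows actually read with the number expected. The sketch below assumes the expected row count is known in advance; read_checked and expected_rows are hypothetical names, not part of R, and an in-memory connection stands in for the real input.

```r
# Read a one-column table from an already-open connection and stop with an
# error if the number of rows differs from what the caller expects.
read_checked <- function(con, expected_rows) {
  data <- read.table(con, header = FALSE)
  if (nrow(data) != expected_rows) {
    stop(sprintf("expected %d rows but read %d",
                 as.integer(expected_rows), nrow(data)))
  }
  data
}

# Example with an in-memory connection standing in for the input file.
con <- textConnection(c("1", "2", "3"))
data <- read_checked(con, expected_rows = 3)
close(con)
print(mean(data$V1))
```

The check does not recover the missing lines; it only makes the failure visible instead of returning a mean computed from partial data.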
I checked the read.table source code and saw that it moves around in the
file to check column types and so on... I thought this might
explain the problem, but I would prefer an error message
rather than a result computed from partial data. Then someone
on stackoverflow pointed me to fifo(), which solves the problem (i.e.
the mean is reported correctly even with process substitution), and
therefore I'm even more puzzled: does fifo() allow seeks and peeks
around a named pipe?
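The fifo() workaround mentioned above might look roughly like the sketch below. It assumes the script's single argument is the path the shell substitutes for <(cat test) (typically a /dev/fd/... name under bash on Linux) and that a blocking read is wanted; mean_from_connection and mean_via_fifo are hypothetical helper names. It illustrates the workaround only and does not explain why passing the plain file name behaves differently.

```r
# Compute the mean of the first column read from an already-open connection.
mean_from_connection <- function(con) {
  data <- read.table(con, header = FALSE)
  mean(data$V1)
}

# Open the path explicitly as a FIFO instead of passing the file name to
# read.table(), and make sure the connection is closed afterwards.
mean_via_fifo <- function(path) {
  con <- fifo(path, open = "r", blocking = TRUE)
  on.exit(close(con))
  mean_from_connection(con)
}

# In the script this would be called as (left commented out here):
# print(mean_via_fifo(commandArgs(TRUE)[1]))
```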
I'm willing to read the relevant code to understand what's really
happening (and even help if someone thinks that this issue could
represent a small bug) but I would really appreciate some pointers.
Here are the sessionInfo() output and other possibly relevant details:
R version 3.0.0 beta (2013-03-23 r62384)
Platform: x86_64-pc-linux-gnu (64-bit)
 LC_CTYPE=en_US.utf8 LC_NUMERIC=C
 LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
 LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
 LC_PAPER=C LC_NAME=C
 LC_ADDRESS=C LC_TELEPHONE=C
 LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
attached base packages:
 stats graphics grDevices utils datasets methods base
$ uname -a
Linux femto 3.6-trunk-amd64 #1 SMP Debian 3.6.9-1~experimental.1
I use the debian R package: r-base-core, 3.0.0~20130324-1
I started on R-help since this could be of general interest;
sorry if that's a bad call.