Colleagues,

Using R 2.7.0 on OS X, I am having trouble understanding the function
textConnection.  My situation is as follows:
1.  I am trying to read a lengthy file (45000 lines) that has headers
~ every 1000 lines.  read.table (or its variants) fails because of the
recurrent headers.
2.  My present approach is the following:
	a.  use readLines to read the file, save as an array
	b.  use grep to find the recurrent headers (not including the first  
set)
	c.  delete the recurrent headers from the array
	d.  write the array to a temp file
	e.  read the temp file using read.table
	f.   delete the temp file
3.  My understanding is that textConnection might enable me to replace
steps d-f with a single step akin to
read.table(textConnection(array)).  This appears to work but it is
very slow.  I executed the code on successively larger chunks of the array:
for (Each in 1000 * 1:45)
	{
	cat("N lines =", Each, "\t", date(), "\n")
	A <- read.table(textConnection(Z[1:Each]), header=TRUE)
	}
yielding:
N lines = 1000 	 Sun Oct 12 07:09:48 2008
N lines = 2000 	 Sun Oct 12 07:09:48 2008
N lines = 3000 	 Sun Oct 12 07:09:48 2008
N lines = 4000 	 Sun Oct 12 07:09:50 2008
N lines = 5000 	 Sun Oct 12 07:09:52 2008
N lines = 6000 	 Sun Oct 12 07:09:56 2008
N lines = 7000 	 Sun Oct 12 07:10:01 2008
N lines = 8000 	 Sun Oct 12 07:10:09 2008
N lines = 9000 	 Sun Oct 12 07:10:18 2008
N lines = 10000 	 Sun Oct 12 07:10:31 2008
N lines = 11000 	 Sun Oct 12 07:10:46 2008
N lines = 12000 	 Sun Oct 12 07:11:04 2008
N lines = 13000 	 Sun Oct 12 07:11:25 2008
N lines = 14000 	 Sun Oct 12 07:11:51 2008
N lines = 15000 	 Sun Oct 12 07:12:20 2008
N lines = 16000 	 Sun Oct 12 07:12:54 2008
N lines = 17000 	 Sun Oct 12 07:13:32 2008
N lines = 18000 	 Sun Oct 12 07:14:16 2008
N lines = 19000 	 Sun Oct 12 07:15:04 2008
N lines = 20000 	 Sun Oct 12 07:15:58 2008
N lines = 21000 	 Sun Oct 12 07:16:58 2008
N lines = 22000 	 Sun Oct 12 07:18:04 2008
N lines = 23000 	 Sun Oct 12 07:19:17 2008
N lines = 24000 	 Sun Oct 12 07:20:36 2008
N lines = 25000 	 Sun Oct 12 07:22:02 2008
N lines = 26000 	 Sun Oct 12 07:23:36 2008
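
For concreteness, here is a minimal sketch of the two routes on toy data. The column names, values, and object names (Header, Lines, Repeats, Tmp, Con) are made up for illustration and are not from my real file; Z plays the same role as in the loop above.

```r
# Toy stand-in for the real file: the header line recurs partway through.
# (Hypothetical column names and values, for illustration only.)
Header <- "ID TIME CONC"
Lines <- c(Header, "1 0.0 10.2", "1 1.0 8.7",
           Header, "2 0.0 11.4", "2 1.0 9.1")

# Step b: find the recurrent headers, not including the first one
Repeats <- grep(Header, Lines, fixed = TRUE)[-1]

# Step c: delete them from the character vector
# (guarded, because Lines[-integer(0)] would return an empty vector)
Z <- if (length(Repeats) > 0) Lines[-Repeats] else Lines

# Steps d-f: write to a temp file, read it back, delete the temp file
Tmp <- tempfile()
writeLines(Z, Tmp)
A1 <- read.table(Tmp, header = TRUE)
unlink(Tmp)

# Candidate replacement for steps d-f: read straight from the vector
Con <- textConnection(Z)
A2 <- read.table(Con, header = TRUE)
close(Con)  # textConnection() returns an open connection, so close it explicitly

# Compare the results of the two routes
all.equal(A1, A2)
```

On input this small both routes are effectively instantaneous; the slowdown only shows up as the vector passed to textConnection grows.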

Any clever ideas will be greatly appreciated.

Dennis


Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-415-564-2220
www.PLessThan.com



