[R] Read text file subsetting rows

Charles C. Berry cberry at tajo.ucsd.edu
Fri Apr 11 18:21:23 CEST 2008


On Fri, 11 Apr 2008, Zev Ross wrote:

> Hi All,
>
> Can anyone direct me to a read function in R that will allow me to only
> read in rows of a text file that begin with a particular value such as
> the data below. I would read the entire file in and then limit, but the
> files were constructed such that the first two letters determine how
> many variables are in the row (different letters mean different numbers
> of columns and different column names/types).
>
> I can do this in SAS, but I'd prefer to use R. The approximate SAS code
> is below with the key piece of code being "if rectype='RD'" then do.
>
> Thoughts?

If your data are in 'tmp.dat':

> txt <- readLines( "tmp.dat" ) 
> con <- textConnection( grep( "^RD", txt, value=TRUE ) )
> dat <- read.csv( con, sep='|', header=FALSE)
> close(con)
> summary( dat[ , 1:3 ] )
   V1    V2          V3
  RD:6   I:6   Min.   :1
               1st Qu.:1
               Median :1
               Mean   :1
               3rd Qu.:1
               Max.   :1
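
Since the question mentions that each record type has its own column layout, the same readLines() idea can be extended to read every record type into its own data frame. What follows is only a sketch under some assumptions: the sample records are copied from the question and written to a temporary file so the code runs on its own, the names 'rectype' and 'by_type' are made up for illustration, and the 'text=' argument of read.table() requires a reasonably recent R.

```r
# Sample records copied from the original question, written to a temporary
# file so that this sketch is self-contained.
sample_lines <- c(
  "RD|I|01|073|0023|68103|5|7|017|810|20070103|00:00|0.6||3|||||||||||||",
  "RD|I|01|073|0023|68103|5|7|017|810|20070106|00:00|9.5||3|||||||||||||",
  "RD|I|01|073|0023|68103|5|7|017|810|20070109|00:00|2.5||3|||||||||||||",
  "RD|I|01|073|0023|68103|5|7|017|810|20070112|00:00|13.7||3|||||||||||||",
  "RD|I|01|073|0023|68103|5|7|017|810|20070115|00:00|7.3||3|||||||||||||",
  "RA|I|01|073|0023|A334|5|7|017|810|20070118|00:00|3.7||3|||||||||||||",
  "RD|I|01|073|0023|68103|5|7|017|810|20070121|00:00|6.9||3|||||||||||||",
  "RC|I|01|073|0023|Quer|5|7|017|810|20070124|00:00|1.8||3|||||||||||||"
)
tmp <- tempfile(fileext = ".dat")
writeLines(sample_lines, tmp)

txt <- readLines(tmp)

# The first two characters of each line identify the record type.
rectype <- substr(txt, 1, 2)

# Parse each group of lines separately, so every record type can have its
# own number of columns. Everything is kept as character here; convert
# columns afterwards once the layout of each type is known.
by_type <- lapply(split(txt, rectype), function(x) {
  read.table(text = x, sep = "|", header = FALSE,
             quote = "", comment.char = "",
             colClasses = "character")
})

names(by_type)          # record types found in the file
nrow(by_type[["RD"]])   # number of 'RD' rows
```

Reading everything as character avoids losing leading zeros in codes such as '073'; whether that matters depends on how the fields are used later.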

Alternatively, if you have 'grep' installed on your system and on the search path:

> con2 <- pipe( 'grep "^RD" tmp.dat' )
> dat2 <- read.csv( con2, sep='|', header=FALSE)
>
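
If 'grep' is not available and the file is too large to hold in memory all at once, one possible fallback is to filter the lines in R while reading the file in chunks through a file connection. This is a rough sketch, not a tested recipe for large files: 'read_prefixed' and its arguments are hypothetical names invented here, the sample lines are copied from the question, and startsWith() needs R 3.3.0 or later.

```r
# Hypothetical helper (not part of base R): keep only the lines that start
# with `prefix`, reading `chunk` lines at a time, then parse what was kept.
read_prefixed <- function(path, prefix, chunk = 10000L, sep = "|") {
  con <- file(path, open = "r")
  on.exit(close(con))
  keep <- character(0)
  repeat {
    txt <- readLines(con, n = chunk)
    if (length(txt) == 0L) break              # end of file reached
    keep <- c(keep, txt[startsWith(txt, prefix)])
  }
  # Note: read.table() signals an error if no line matched.
  read.table(text = keep, sep = sep, header = FALSE,
             quote = "", comment.char = "")
}

# Small demonstration file using records copied from the question.
demo <- tempfile(fileext = ".dat")
writeLines(c(
  "RD|I|01|073|0023|68103|5|7|017|810|20070103|00:00|0.6||3|||||||||||||",
  "RA|I|01|073|0023|A334|5|7|017|810|20070118|00:00|3.7||3|||||||||||||",
  "RD|I|01|073|0023|68103|5|7|017|810|20070121|00:00|6.9||3|||||||||||||",
  "RC|I|01|073|0023|Quer|5|7|017|810|20070124|00:00|1.8||3|||||||||||||"
), demo)

# A tiny chunk size is used here only to exercise the loop.
dat3 <- read_prefixed(demo, "RD", chunk = 2L)
nrow(dat3)
```

Only the matching lines are accumulated, so memory use is governed by the number of 'RD' records rather than by the size of the whole file.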


See
 	?connection
 	?textConnection
 	?grep

HTH,

Chuck
>
> Zev
>
>
> RD|I|01|073|0023|68103|5|7|017|810|20070103|00:00|0.6||3|||||||||||||
> RD|I|01|073|0023|68103|5|7|017|810|20070106|00:00|9.5||3|||||||||||||
> RD|I|01|073|0023|68103|5|7|017|810|20070109|00:00|2.5||3|||||||||||||
> RD|I|01|073|0023|68103|5|7|017|810|20070112|00:00|13.7||3|||||||||||||
> RD|I|01|073|0023|68103|5|7|017|810|20070115|00:00|7.3||3|||||||||||||
> RA|I|01|073|0023|A334|5|7|017|810|20070118|00:00|3.7||3|||||||||||||
> RD|I|01|073|0023|68103|5|7|017|810|20070121|00:00|6.9||3|||||||||||||
> RC|I|01|073|0023|Quer|5|7|017|810|20070124|00:00|1.8||3|||||||||||||
>
>
> infile 'C:\junk\RD_501_88101_2006-0.txt'
> dlm='|' firstobs=3 missover;
> rectype $2. @;
> if rectype = 'RD' then do;
>
> -- 
> Zev Ross
> ZevRoss Spatial Analysis
> 303 Fairmount Ave
> Ithaca, NY 14850
> 607-277-0004 (phone)
> 866-877-3690 (fax, toll-free)
> zev at zevross.com
>
>

Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu             UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


