[R] problem with reading data files with different numbers of lines to skip
john seers (IFR)
john.seers at bbsrc.ac.uk
Fri Aug 3 14:41:06 CEST 2007
Hi Tom
It looks as if you are reading in GenePix files. I believe the second line of the header says how many header lines follow it, so the number of lines to skip is that count plus the first two lines. Something like this, where 27 header lines follow the second line (29 lines to skip in all):
ATF 1
27 43
Type=GenePix Results 1.4
DateTime=2003/11/14 17:18:30
If so, here is a function I use to do what you want. If your files have a different format, you will need to modify how the number of lines to skip is set.
# Preprocess the GenePix files: strip off the leading header lines
dopix <- function(genepixfiles, workingdir) {
  pre <- "Pre"
  # Read in each GenePix file, strip unwanted rows and write out again
  for (pixfile in genepixfiles) {
    # Output name is workingdir + "Pre" + file name, so workingdir must end in a path separator
    pixfileout <- paste(workingdir, pre, basename(pixfile), sep="")
    # Second line of the file: its first number is the count of header lines that follow
    secondline <- read.table(pixfile, skip=1, nrows=1)
    # Add 2 for the first two lines themselves
    skiplines <- as.numeric(secondline[1, 1]) + 2
    outdf <- read.table(pixfile, header=TRUE, skip=skiplines, sep="\t")
    write.table(outdf, file=pixfileout, sep="\t", row.names=FALSE)
  }
}
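For anyone who would rather keep the data in memory than write preprocessed copies, the same idea can be sketched as below. The helper name read_gpr is made up for illustration (it is not from any package), the miniature file is invented, and the sketch assumes the second line really does hold the header count as described above.

```r
# Hypothetical helper: read one ATF-style file, skipping the header
# records announced on its second line.
read_gpr <- function(path) {
  # Line 2 holds two numbers: header record count, then column count
  counts <- scan(path, skip = 1, nlines = 1, quiet = TRUE)
  # Skip the two leading lines plus the announced header records
  read.table(path, header = TRUE, skip = counts[1] + 2, sep = "\t",
             quote = "", comment.char = "", check.names = FALSE)
}

# Made-up miniature file: 2 header records, 3 data columns
demo <- tempfile(fileext = ".gpr")
writeLines(c("ATF\t1.0",
             "2\t3",
             "Type=GenePix Results 3",
             "PixelSize=10",
             "Block\tColumn\tName",
             "1\t1\tIgG-human",
             "1\t2\t>PGDR_HUMAN (P09619)"), demo)

# One data frame per file, whatever the header length
arrays <- lapply(demo, read_gpr)
print(arrays[[1]])
```

Because the skip count is read from each file, the same call should cope with a mix of 31-line and 33-line headers, provided every file carries the count on line 2.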
Regards
John Seers
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Tom Cohen
Sent: 03 August 2007 13:04
To: r-help at stat.math.ethz.ch
Subject: Re: [R] problem with reading data files with different numbers of lines to skip
Thanks to Ted and Gabor for your response.
I apologize for not being clear in my previous description of the problem. I tried your suggestions using readLines but couldn't make them work. I will now explain the problem in more detail and hope that you can help me out.
I have 30 data files, some of which have 33 lines that I want to skip while the rest have 31 (examples below with bold text). That is, I only want to keep the lines starting from
Block Column Row Name ID
I read in the data files with a loop like the one below. The problem is: how do I tell the loop to skip 31 lines in some data files and 33 in the rest?
> for (i in 1:num.files) {
> a<-read.table(file=data[i],
> header=T,skip=31,sep='\t',na.strings="NA")
> }
Thanks for your help,
Tom
# 33 lines to skip
Type=GenePix Results 3
DateTime=2006/10/20 13:35:11
Settings=
GalFile=G:\Avdelningar\viv\translational immunologi\Peptide-arrays\Gal-files\742-human-pep2.gal
PixelSize=10
Wavelengths=635
ImageFiles=M:\Peptidearrays\061020\742-2.tif 1
NormalizationMethod=None
NormalizationFactors=1
JpegImage=C:\Documents and Settings\Shahnaz Mahdavifar\Skrivbord\Human pep,742\742-2.s1.jpg
StdDev=Type 1
RatioFormulations=W1/W2 (635/)
FeatureType=Circular
Barcode=
BackgroundSubtraction=LocalFeature
ImageOrigin=560, 1360
JpegOrigin=1940, 3670
Creator=GenePix Pro 6.0.1.25
Scanner=GenePix 4000B [84948]
FocusPosition=0
Temperature=30.2
LinesAveraged=1
Comment=
PMTGain=600
ScanPower=100
LaserPower=3.36
Filters=<Empty>
ScanRegion=56,136,2123,6532
Supplier=Genetix Ltd.
ArrayerSoftwareName=MicroArraying
ArrayerSoftwareVersion=QSoft XP Build 6450 (Revision 131)
Block Column Row Name ID X Y Dia. F635 Median F635 Mean
1 1 1 IgG-human none 2390 4140 200 301 317
1 2 1 >PGDR_HUMAN (P09619) AHASDEIYEIMQK 2630 4140 200 254 250
1 3 1 >ML1X_HUMAN (Q13585) AIAHPVSDDSDLP 2860 4140 200 268 252
1000 more rows....
# 31 lines to skip
ATF 1.0
29 41
Type=GenePix Results 3
DateTime=2006/10/20 13:05:20
Settings=
GalFile=G:\Avdelningar\viv\translational immunologi\Peptide-arrays\Gal-files\742-s2.gal
PixelSize=10
Wavelengths=635
ImageFiles=M:\Peptidearrays\061020\742-4.tif 1
NormalizationMethod=None
NormalizationFactors=1
JpegImage=C:\Documents and Settings\Shahnaz Mahdavifar\Skrivbord\Human pep,742\742-4.s2.jpg
StdDev=Type 1
RatioFormulations=W1/W2 (635/)
FeatureType=Circular
Barcode=
BackgroundSubtraction=LocalFeature
ImageOrigin=560, 1360
JpegOrigin=1950, 24310
Creator=GenePix Pro 6.0.1.25
Scanner=GenePix 4000B [84948]
FocusPosition=0
Temperature=28.49
LinesAveraged=1
Comment=
PMTGain=600
ScanPower=100
LaserPower=3.32
Filters=<Empty>
ScanRegion=56,136,2113,6532
Supplier=
Block Column Row Name ID X Y Dia. F635 Median F635 Mean
1 1 1 IgG-human none 2370 24780 200 133 175
1 2 1 >PGDR_HUMAN (P09619) AHASDEIYEIMQK 2600 24780 200 120 121
1 3 1 >ML1X_HUMAN (Q13585) AIAHPVSDDSDLP 2840 24780 200 120 118
1000 more rows....
ted.harding at nessie.mcc.ac.uk wrote:
On 02-Aug-07 21:14:20, Tom Cohen wrote:
> Dear List,
>
> I have 30 data files with different numbers of lines (31 and 33) that
> I want to skip before reading the files. If I use the skip option I
> can only choose either to skip 31 or 33 lines. The data files with 31
> lines have no blank rows between the lines and the header row. How can
> I read the files without manually checking which files have 31
> respectively 33 lines ? The only text line I want to keep is the header.
>
> Thanks for your help,
> Tom
>
>
> for (i in 1:num.files) {
> a<-read.table(file=data[i],
> header=T,skip=31,sep='\t',na.strings="NA")
>
> }
Apologies, I misunderstood your description in my previous response (I thought that the total number of lines in one of your files was either 31 or 33, and you wanted to know which was which).
I now think you mean that there are either 0 (you want to skip 31) or 2 (you want to skip 33) blank lines in the first 33, and then you want the remainder (as well as the header). Though it's still not really clear ...
You can find out how many blank lines there are in the first 33 with
> sum(cbind(readLines("~/00_junk/temp.tr", 33))=="")
and then choose how many lines to skip.
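The suggestion above could be wired into the reading step roughly as follows. This is only a sketch: the helper name skip_for is made up, the demo file is invented, and the rule "31 lines plus one extra per blank line" is the assumption stated in this message rather than something checked against the real files.

```r
# Hypothetical helper: count blank lines among the first 33 lines and
# add them to the base of 31 lines to skip (assumption from the message).
skip_for <- function(path, base = 31, look = 33) {
  first <- readLines(path, n = look)
  base + sum(first == "")
}

# Made-up file: 31 header lines, 2 blank lines, then header row and data
demo <- tempfile()
writeLines(c(paste0("Key", 1:31, "=x"), "", "",
             "Block\tColumn\tRow",
             "1\t1\t1"), demo)

skip <- skip_for(demo)
a <- read.table(demo, header = TRUE, skip = skip, sep = "\t")
print(skip)
print(a)
```

Inside the original loop, skip=31 would then become skip=skip_for(data[i]).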
Best wishes,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding)
Fax-to-email: +44 (0)870 094 0861
Date: 03-Aug-07 Time: 00:11:21
------------------------------ XFMail ------------------------------