[R] problem with reading data files with different numbers of lines to skip
Gabor Grothendieck
ggrothendieck at gmail.com
Fri Aug 3 14:32:21 CEST 2007
Please read the last line of every message
to r-help:
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
We don't know what you did.
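In the meantime, here is a minimal sketch of one way to avoid hard-coding skip at all: read the top of each file with readLines, find the line that begins with the header, and skip everything before it. The helper name read_after_header, its header_pattern and max_header_lines arguments, and the made-up demo files are all assumptions for illustration, not something from your post; the pattern assumes the header line starts with Block followed by a tab.

```r
# Hypothetical helper (not from the thread): locate the header line, then
# let read.table skip everything above it.
read_after_header <- function(path,
                              header_pattern = "^\"?Block\"?\t",
                              max_header_lines = 50) {
  # Only the top of the file is needed to find the header.
  top <- readLines(path, n = max_header_lines)
  header_line <- grep(header_pattern, top)[1]
  if (is.na(header_line)) {
    stop("no line matching '", header_pattern, "' found in ", path)
  }
  # skip counts physical lines, so skip all lines before the header.
  read.table(path, header = TRUE, skip = header_line - 1,
             sep = "\t", na.strings = "NA")
}

# Demo on two made-up files whose preambles differ in length (31 and 33 lines).
write_demo <- function(n_preamble) {
  path <- tempfile(fileext = ".gpr")
  writeLines(c(paste0("Key", seq_len(n_preamble), "=value"),
               "Block\tColumn\tRow\tName\tID",
               "1\t1\t1\tIgG-human\tnone",
               "1\t2\t1\tpeptide-a\tAHASDEIYEIMQK"),
             path)
  path
}

demo_files <- c(write_demo(31), write_demo(33))
arrays <- lapply(demo_files, read_after_header)
```

With your own files, something like lapply(data, read_after_header) would return a list with one data frame per file, instead of overwriting a on every pass through the loop.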
On 8/3/07, Tom Cohen <tom.cohen78 at yahoo.se> wrote:
> Thanks to Ted and Gabor for your response.
> I apologize for not being clear in my previous description of the problem. I tried your suggestions using readLines but couldn't make them work. I will now explain the problem in more detail and hope that you can help me out.
>
> I have 30 data files; some of them have 33 lines that I want to skip and the rest have 31 (examples below). That is, I only want to keep the lines starting from
> Block Column Row Name ID
>
> I read in the data files with a loop like the one below. The problem is: how do I tell the loop to skip 31 lines in some data files and 33 in the rest?
>
> > for (i in 1:num.files) {
> > a <- read.table(file=data[i], header=TRUE, skip=31, sep='\t', na.strings="NA") }
>
> Thanks for your help,
> Tom
>
> # 33 lines to skip
>
> Type=GenePix Results 3
> DateTime=2006/10/20 13:35:11
> Settings=
> GalFile=G:\Avdelningar\viv\translational immunologi\Peptide-arrays\Gal-files\742-human-pep2.gal
> PixelSize=10
> Wavelengths=635
> ImageFiles=M:\Peptidearrays\061020\742-2.tif 1
> NormalizationMethod=None
> NormalizationFactors=1
> JpegImage=C:\Documents and Settings\Shahnaz Mahdavifar\Skrivbord\Human pep,742\742-2.s1.jpg
> StdDev=Type 1
> RatioFormulations=W1/W2 (635/)
> FeatureType=Circular
> Barcode=
> BackgroundSubtraction=LocalFeature
> ImageOrigin=560, 1360
> JpegOrigin=1940, 3670
> Creator=GenePix Pro 6.0.1.25
> Scanner=GenePix 4000B [84948]
> FocusPosition=0
> Temperature=30.2
> LinesAveraged=1
> Comment=
> PMTGain=600
> ScanPower=100
> LaserPower=3.36
> Filters=<Empty>
> ScanRegion=56,136,2123,6532
> Supplier=Genetix Ltd.
> ArrayerSoftwareName=MicroArraying
> ArrayerSoftwareVersion=QSoft XP Build 6450 (Revision 131)
> Block Column Row Name ID X Y Dia. F635 Median F635 Mean
> 1 1 1 IgG-human none 2390 4140 200 301 317
> 1 2 1 >PGDR_HUMAN (P09619) AHASDEIYEIMQK 2630 4140 200 254 250
> 1 3 1 >ML1X_HUMAN (Q13585) AIAHPVSDDSDLP 2860 4140 200 268 252
> 1000 more rows....
>
>
>
> # 31 lines to skip
>
> ATF 1.0
> 29 41
> Type=GenePix Results 3
> DateTime=2006/10/20 13:05:20
> Settings=
> GalFile=G:\Avdelningar\viv\translational immunologi\Peptide-arrays\Gal-files\742-s2.gal
> PixelSize=10
> Wavelengths=635
> ImageFiles=M:\Peptidearrays\061020\742-4.tif 1
> NormalizationMethod=None
> NormalizationFactors=1
> JpegImage=C:\Documents and Settings\Shahnaz Mahdavifar\Skrivbord\Human pep,742\742-4.s2.jpg
> StdDev=Type 1
> RatioFormulations=W1/W2 (635/)
> FeatureType=Circular
> Barcode=
> BackgroundSubtraction=LocalFeature
> ImageOrigin=560, 1360
> JpegOrigin=1950, 24310
> Creator=GenePix Pro 6.0.1.25
> Scanner=GenePix 4000B [84948]
> FocusPosition=0
> Temperature=28.49
> LinesAveraged=1
> Comment=
> PMTGain=600
> ScanPower=100
> LaserPower=3.32
> Filters=<Empty>
> ScanRegion=56,136,2113,6532
> Supplier=
> Block Column Row Name ID X Y Dia. F635 Median F635 Mean
> 1 1 1 IgG-human none 2370 24780 200 133 175
> 1 2 1 >PGDR_HUMAN (P09619) AHASDEIYEIMQK 2600 24780 200 120 121
> 1 3 1 >ML1X_HUMAN (Q13585) AIAHPVSDDSDLP 2840 24780 200 120 118
> 1000 more rows....
>
>
> ted.harding at nessie.mcc.ac.uk wrote:
> On 02-Aug-07 21:14:20, Tom Cohen wrote:
> > Dear List,
> >
> > I have 30 data files with different numbers of lines (31 and 33) that
> > I want to skip before reading the files. If I use the skip option I can
> > only choose either to skip 31 or 33 lines. The data files with 31 lines
> > have no blank rows between the lines and the header row. How can I read
> > the files without manually checking which files have 31 and which have 33
> > lines? The only text line I want to keep is the header.
> >
> > Thanks for your help,
> > Tom
> >
> >
> > for (i in 1:num.files) {
> > a <- read.table(file=data[i],
> > header=TRUE, skip=31, sep='\t', na.strings="NA")
> >
> > }
>
> Apologies, I misunderstood your description in my previous response
> (I thought that the total number of lines in one of your files was
> either 31 or 33, and you wanted to know which was which).
>
> I now think you mean that there are either 0 (you want to skip 31)
> or 2 (you want to skip 33) blank lines in the first 33, and then you
> want the remainder (as well as the header). Though it's still not
> really clear ...
>
> You can find out how many blank lines there are in the first 33 with
>
> > sum(cbind(readLines("~/00_junk/temp.tr", 33))=="")
>
> and then choose how many lines to skip.
>
> Best wishes,
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding)
> Fax-to-email: +44 (0)870 094 0861
> Date: 03-Aug-07 Time: 00:11:21
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>