[R] problem with reading data files with different numbers of lines to skip

Fri Aug 3 14:32:21 CEST 2007

Please read the last line of every message
to r-help:
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

We don't know what you did.

On 8/3/07, Tom Cohen <tom.cohen78 at yahoo.se> wrote:
> Thanks to Ted and Gabor for your response.
>  I apologize for not being clear in my previous description of the problem. I tried your suggestions using readLines but couldn't make them work. I will now explain the problem in more detail and hope that you can help me out.
>
>   I have 30 data files; some of them have 33 lines that I want to skip and the rest have 31 (examples below with bold text). That is, I only want to keep the lines starting from
>          Block  Column  Row  Name  ID
>
>  I read in the data files with a loop like the one below. The problem is: how do I tell the loop to skip 31 lines in some data files and 33 in the rest?
>
>  for (i in 1:num.files) {
>    a <- read.table(file = data[i], header = TRUE, skip = 31, sep = "\t", na.strings = "NA")
>  }
>
>  Thanks for your help,
>  Tom
>
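One way to avoid hard-coding the skip count is to look for the header row itself and skip whatever precedes it. The following is only a sketch under stated assumptions: the header is the first line that begins with "Block" (optionally in double quotes), it appears within the first 50 lines, and the files are tab-separated as in your loop. The function name read_after_header and its arguments header_start and max_scan are made up for illustration.

```r
# Hypothetical helper: find the header row, then skip everything above it.
read_after_header <- function(path, header_start = "Block", max_scan = 50) {
  # Read only the top of the file; max_scan is an assumed upper bound
  # on where the header can appear.
  top <- readLines(path, n = max_scan, warn = FALSE)

  # First line starting with the header's first column name
  # (leading whitespace and an opening double quote are tolerated).
  hdr <- grep(paste0("^[[:space:]]*\"?", header_start), top)[1]
  if (is.na(hdr)) stop("no header line found in ", path)

  # Skip the hdr - 1 preamble lines, whether that is 31, 33 or anything else.
  read.table(path, header = TRUE, skip = hdr - 1, sep = "\t",
             na.strings = "NA", quote = "\"", comment.char = "",
             check.names = FALSE, stringsAsFactors = FALSE)
}
```

With the vector of file names from your loop, lapply(data, read_after_header) would return a list with one data frame per file, rather than overwriting a on every pass.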
>  # 33 lines to skip
>
>  Type=GenePix Results 3
>  DateTime=2006/10/20 13:35:11
>  Settings=
>  GalFile=G:\Avdelningar\viv\translational immunologi\Peptide-arrays\Gal-files\742-human-pep2.gal
>  PixelSize=10
>  Wavelengths=635
>  ImageFiles=M:\Peptidearrays\061020\742-2.tif 1
>  NormalizationMethod=None
>  NormalizationFactors=1
>  JpegImage=C:\Documents and Settings\Shahnaz Mahdavifar\Skrivbord\Human pep,742\742-2.s1.jpg
>  StdDev=Type 1
>  RatioFormulations=W1/W2 (635/)
>  FeatureType=Circular
>  Barcode=
>  BackgroundSubtraction=LocalFeature
>  ImageOrigin=560, 1360
>  JpegOrigin=1940, 3670
>  Creator=GenePix Pro 6.0.1.25
>  Scanner=GenePix 4000B [84948]
>  FocusPosition=0
>  Temperature=30.2
>  LinesAveraged=1
>  Comment=
>  PMTGain=600
>  ScanPower=100
>  LaserPower=3.36
>  Filters=<Empty>
>  ScanRegion=56,136,2123,6532
>  Supplier=Genetix Ltd.
>  ArrayerSoftwareName=MicroArraying
>  ArrayerSoftwareVersion=QSoft XP Build 6450 (Revision 131)
>  Block  Column  Row  Name  ID  X  Y  Dia.  F635 Median  F635 Mean
>  1  1  1  IgG-human  none  2390  4140  200  301  317
>  1  2  1  >PGDR_HUMAN (P09619)  AHASDEIYEIMQK  2630  4140  200  254  250
>  1  3  1  >ML1X_HUMAN (Q13585)  AIAHPVSDDSDLP  2860  4140  200  268  252
>  1000 more rows....
>
>
>
>  # 31 lines to skip
>
>  ATF  1.0
>  29  41
>  Type=GenePix Results 3
>  DateTime=2006/10/20 13:05:20
>  Settings=
>  GalFile=G:\Avdelningar\viv\translational immunologi\Peptide-arrays\Gal-files\742-s2.gal
>  PixelSize=10
>  Wavelengths=635
>  ImageFiles=M:\Peptidearrays\061020\742-4.tif 1
>  NormalizationMethod=None
>  NormalizationFactors=1
>  JpegImage=C:\Documents and Settings\Shahnaz Mahdavifar\Skrivbord\Human pep,742\742-4.s2.jpg
>  StdDev=Type 1
>  RatioFormulations=W1/W2 (635/)
>  FeatureType=Circular
>  Barcode=
>  BackgroundSubtraction=LocalFeature
>  ImageOrigin=560, 1360
>  JpegOrigin=1950, 24310
>  Creator=GenePix Pro 6.0.1.25
>  Scanner=GenePix 4000B [84948]
>  FocusPosition=0
>  Temperature=28.49
>  LinesAveraged=1
>  Comment=
>  PMTGain=600
>  ScanPower=100
>  LaserPower=3.32
>  Filters=<Empty>
>  ScanRegion=56,136,2113,6532
>  Supplier=
>  Block  Column  Row  Name  ID  X  Y  Dia.  F635 Median  F635 Mean
>  1  1  1  IgG-human  none  2370  24780  200  133  175
>  1  2  1  >PGDR_HUMAN (P09619)  AHASDEIYEIMQK  2600  24780  200  120  121
>  1  3  1  >ML1X_HUMAN (Q13585)  AIAHPVSDDSDLP  2840  24780  200  120  118
>  1000 more rows....
>
>
> ted.harding at nessie.mcc.ac.uk wrote:
>  On 02-Aug-07 21:14:20, Tom Cohen wrote:
> > Dear List,
> >
> > I have 30 data files with different numbers of lines (31 and 33) that
> > I want to skip before reading the files. If I use the skip option I can
> > only choose either to skip 31 or 33 lines. The data files with 31 lines
> > have no blank rows between the lines and the header row. How can I read
> > the files without manually checking which files have 31 and which have 33
> > lines? The only text line I want to keep is the header.
> >
> > Thanks for your help,
> > Tom
> >
> >
> > for (i in 1:num.files) {
> >   a <- read.table(file = data[i], header = TRUE, skip = 31,
> >                   sep = "\t", na.strings = "NA")
> > }
>
> Apologies, I misunderstood your description in my previous response
> (I thought that the total number of lines in one of your files was
> either 31 or 33, and you wanted to know which was which).
>
> I now think you mean that there are either 0 (you want to skip 31)
> or 2 (you want to skip 33) blank lines in the first 33, and then you
> want the remainder (as well as the header). Though it's still not
> really clear ...
>
> You can find out how many blank lines there are in the first 33 with
>
> > sum(cbind(readLines("~/00_junk/temp.tr", 33))=="")
>
> and then choose how many lines to skip.
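Spelled out as a rough sketch, the blank-line count above could feed straight into skip. This assumes, as described in the thread, that every file has 31 non-blank preamble lines plus either 0 or 2 blank ones, all within the first 33 lines. The names skip_for, read_all, base_skip and n_scan are hypothetical, and counting whitespace-only lines as blank is an added assumption.

```r
# Hypothetical helper: number of lines to skip for one file.
skip_for <- function(path, base_skip = 31, n_scan = 33) {
  top <- readLines(path, n = n_scan, warn = FALSE)
  # Count empty or whitespace-only lines among the first n_scan lines
  # and add them to the assumed 31 non-blank preamble lines.
  base_skip + sum(grepl("^[[:space:]]*$", top))
}

# Read every file with its own skip value; returns a list of data frames.
read_all <- function(files) {
  lapply(files, function(f) {
    read.table(f, header = TRUE, skip = skip_for(f),
               sep = "\t", na.strings = "NA")
  })
}
```

read_all(data) would then replace the loop. Note that this relies on the 31-line count being right for every file, so it is more fragile than searching for the header line directly.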
>
> Best wishes,
> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding)
> Fax-to-email: +44 (0)870 094 0861
> Date: 03-Aug-07 Time: 00:11:21
> ------------------------------ XFMail ------------------------------
>
>
>
>
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>