[BioC] how to put together 600 files from the 5.0 Affy SNPchip

laurent lgautier at gmail.com
Sat Nov 8 13:43:55 CET 2008



On Fri, 2008-11-07 at 22:37 +0100, [Ricardo Rodriguez] Your EPEC Network
ICT Team wrote:
> Hi, Laura,
> 
> In the meantime perhaps you could give this a try:
> 
> https://stat.ethz.ch/pipermail/r-help/2006-September/112953.html
> 
> Just use setwd("/this/is/my/folder") to move to the folder holding your
> files. I guess "c:/your/folder/here" should work on a Windows box.
> 
> This doesn't solve the "extra four heading lines" issue, though. But I am
> pretty sure it won't be hard to cut them out before concatenating the files.
> 
> Please, Laurent, do you already have this function for reading files as
> byte streams implemented in R? Thanks!

I don't. At the time, it was done in Perl (wishing it was Python), I
think.
It should be possible to do something similar using the R function
"readChar()".

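For what it is worth, here is a minimal sketch of that kind of script, written in Python rather than R, and not the original Perl code. It assumes tab-separated files, 4 header lines to skip, and the same SNP order in every file; the function name merge_columns and its parameters are made up for the illustration. All input files are kept open at once and only one row per file is held in memory, so the limit to watch is the number of open files the OS allows rather than RAM.

```python
import contextlib
import itertools


def merge_columns(paths, out_path, n_header=4, sep="\t"):
    """Write column 1 (SNP names) once, then column 2 of every input file.

    Hypothetical helper: assumes two-column, sep-delimited files that all
    list the same SNPs in the same order after n_header header lines.
    Returns the number of data rows written.
    """
    n_rows = 0
    with contextlib.ExitStack() as stack:
        # One open handle per input file, all closed together on exit.
        handles = [stack.enter_context(open(p, "r")) for p in paths]
        out = stack.enter_context(open(out_path, "w"))

        # Drop the header lines that are not needed.
        for fh in handles:
            for _ in itertools.islice(fh, n_header):
                pass

        # Read one line from every file per iteration.
        for rows in itertools.zip_longest(*handles):
            if any(r is None for r in rows):
                raise ValueError("input files have different numbers of rows")
            fields = [r.rstrip("\r\n").split(sep) for r in rows]
            snp = fields[0][0]
            if any(f[0] != snp for f in fields):
                raise ValueError("SNP names differ at data row %d" % (n_rows + 1))
            out.write(sep.join([snp] + [f[1] for f in fields]) + "\n")
            n_rows += 1
    return n_rows
```

Something like merge_columns(sorted(glob.glob("calls/*.txt")), "merged.tsv") (after "import glob"; the folder and output names are placeholders) would then write one row per SNP and one column per sample. The chromosome column would still have to be added in a separate step.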
> Cheers,
> 
> Ricardo
> 
> 
> Laura Rodriguez Murillo wrote:
> > Hi,
> >
> > Thank you for your email. As soon as I get some free time I'll try to
> > learn some Perl and see what I can do with these files.
> >
> > Laura
> >
> > 2008/11/7 laurent <lgautier at gmail.com>:
> >   
> >> Such files do exceed the capabilities of most machines.
> >>
> >> One way I helped someone with a similar problem was to hack together a
> >> script (Python or Perl come to mind for such jobs, but it is possible to
> >> implement it in R if you like) that reads the files as byte streams
> >> rather than line by line.
> >>
> >> This way it runs with a minimal memory footprint (and will do so on
> >> Microsoft Windows... if the resulting file size does not exceed the
> >> capabilities of the OS).
> >>
> >>
> >> L.
> >>
> >> On Fri, 2008-11-07 at 13:38 -0500, Laura Rodriguez Murillo wrote:
> >>     
> >>> Benilton,
> >>> Thank you. Unfortunately, the paste command doesn't work with these
> >>> big files. Does this only work with R under Unix, or is it possible on
> >>> Windows? Otherwise, I'll try with R when I get back to the Unix
> >>> machine.
> >>>
> >>> Laura
> >>>
> >>> 2008/11/7 Benilton Carvalho <bcarvalh at jhsph.edu>:
> >>>       
> >>>> Laura,
> >>>>
> >>>> if you're running *NIX, can't you just use the bash command "paste"?
> >>>>
> >>>> if you really want to use R, assume you have names of your files in the
> >>>> variable "files", then something like:
> >>>>
> >>>> ## This goes in R and assumes you're running *NIX
> >>>> ## shQuote() protects file names that contain spaces
> >>>> cmd <- paste("paste", paste(shQuote(files), collapse=" "), "> output.txt")
> >>>> system(cmd)
> >>>>
> >>>> later, you can just get rid of the first 4 lines of output.txt.
> >>>>
> >>>> b
> >>>>
> >>>> On Nov 7, 2008, at 3:04 PM, Laura Rodriguez Murillo wrote:
> >>>>
> >>>>         
> >>>>> David,
> >>>>>
> >>>>> Thank you for your reply. I had tried this software, but it
> >>>>> doesn't recognize my files; it looks as if it doesn't like .txt files.
> >>>>> Any idea?
> >>>>>
> >>>>> Laura
> >>>>>
> >>>>> 2008/11/6 David Carter <dcarter at robarts.ca>:
> >>>>>           
> >>>>>> Hi Laura,
> >>>>>> Affymetrix has a free tool (free and easy to download) called Genotyping
> >>>>>> Console that will export 1 file for all your samples with SNPs on rows
> >>>>>> and samples on columns.  I haven't tried it with 622 samples though...
> >>>>>> Sincerely,
> >>>>>> David Carter
> >>>>>>
> >>>>>>
> >>>>>> Laura Rodriguez Murillo wrote:
> >>>>>>             
> >>>>>>> Hi,
> >>>>>>> I'm new to this mailing list and also to using Bioconductor. I'd
> >>>>>>> appreciate your feedback on this: I have 622 files that correspond to
> >>>>>>> 622 samples genotyped for the SNPs in the 5.0 SNPChip from Affymetrix.
> >>>>>>> Each file consists of two columns of almost 500 K rows (plus 4 lines
> >>>>>>> at the beginning that I won't need). The number of rows is the same in
> >>>>>>> every file. I would need to put all these files together, where the
> >>>>>>> first column is common to all of them (SNP names) (so I just need it
> >>>>>>> once in the big file). Once I have all the columns one after the other
> >>>>>>> I would also need to paste a column with the chromosome number for
> >>>>>>> each SNP (which is in another file, just this info alone). Do you know
> >>>>>>> if there's any way to do this with Bioconductor?
> >>>>>>>
> >>>>>>> Thank you!
> >>>>>>>
> >>>>>>> Laura
> >>>>>>>
> >>>>>>> _______________________________________________
> >>>>>>> Bioconductor mailing list
> >>>>>>> Bioconductor at stat.math.ethz.ch
> >>>>>>> https://stat.ethz.ch/mailman/listinfo/bioconductor
> >>>>>>> Search the archives:
> >>>>>>> http://news.gmane.org/gmane.science.biology.informatics.conductor
> >>>>>>>
> >>>>>>>
> >>>>>>>               
> >>>>>> --
> >>>>>> David Carter
> >>>>>> Facility Manager
> >>>>>> London Regional Genomics Centre
> >>>>>> Robarts Research Institute, Room 4.01
> >>>>>> PO Box 5015, 100 Perth Drive
> >>>>>> London, Ontario, Canada, N6A 5K8
> >>>>>>
> >>>>>> phone:  519-663-3253
> >>>>>> fax:    519-663-3037
> >>>>>>
> >>>>>> dcarter at robarts.ca
> >>>>>> http://www.lrgc.ca
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>             


