[R] Writing a summary file in R

David Winsemius dwinsemius at comcast.net
Thu Jul 28 01:19:44 CEST 2011

On Jul 27, 2011, at 7:02 PM, a217 wrote:

> Hello,
> I have an input file:
> http://r.789695.n4.nabble.com/file/n3700031/testOut.txt testOut.txt
> where col 1 is chromosome, column2 is start of region, column 3 is end of
> region, column 4 and 5 is base position, column 6 is total reads, column 7
> is methylation data, and column 8 is the strand.
> I would like a summary output file such as:
> http://r.789695.n4.nabble.com/file/n3700031/out.summary.txt out.summary.txt
> where column 1 is chromosome, column 2 is start of region, column 3 is end
> of region, column 4 is total reads in general, column 5 is total reads >=1,
> column 6 is (col4/col5) or the percentage, and at the end I'd like to list 6
> more columns based on summary results from summary() function in R.
> The summary() function will be used to analyze all of the methylation data
> (col7 from input) for each region (bounded by col2 and col3).
> For example for chr1 100 159 summary() gives:
>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>  0.0400  0.0425  0.0450  0.0450  0.0475  0.0500
> which is simply the methylation data input into summary() only in the region
> of chr1 100 159.
> I know how to perform all of the required functions line-by-line, but the
> hard part for me is essentially taking the input data with multiple
> positions in each region and assigning all of the summary results to one
> line identified by the region.
> If any of you have any suggestions I would appreciate it.

So essentially you want to drop columns 4:5 and column 8, calculate
the proportion of counts >= 1, and get summary stats within separate
categories of start-of-region. Is that correct?

This is probably a job for aggregate, or for ddply in plyr if I felt
comfortable with it, which in general I don't. Its documentation
in the help pages is not great IMO, but there are those who love
it. And I admit the melt function is a major contributor to human
happiness. Why don't you read up on aggregate, which is a base
function (in the R sense, not in the biological sense)? I will see
what I can come up with in the meantime.
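
As a starting point, here is a minimal sketch of the aggregate route. It
is untested against the real testOut.txt: the toy data frame, its column
names (chr, start, end, pos1, pos2, reads, meth, strand) and the reads
values are all made up for illustration. It also assumes one reading of
the request, namely that "total reads" is the number of positions in a
region and "total reads >= 1" is the number of those positions with at
least one read.

```r
# Hypothetical toy input; the real file would come from read.table().
# Column names are assumptions, not taken from the original file.
dat <- data.frame(
  chr    = c("chr1", "chr1", "chr1", "chr1", "chr1"),
  start  = c(100, 100, 200, 200, 200),
  end    = c(159, 159, 259, 259, 259),
  pos1   = c(110, 120, 210, 220, 230),
  pos2   = c(111, 121, 211, 221, 231),
  reads  = c(25, 20, 0, 10, 4),
  meth   = c(0.04, 0.05, 0.00, 0.10, 0.50),
  strand = c("+", "-", "+", "+", "-"),
  stringsAsFactors = FALSE
)

key <- c("chr", "start", "end")

# Per-region counts: positions, positions with reads >= 1, and their ratio.
counts <- aggregate(reads ~ chr + start + end, data = dat,
                    FUN = function(x) c(n = length(x),
                                        n_ge1 = sum(x >= 1),
                                        prop = mean(x >= 1)))

# Per-region summary() of the methylation values, as a plain named vector.
stats <- aggregate(meth ~ chr + start + end, data = dat,
                   FUN = function(x) {
                     s <- summary(x)
                     setNames(as.numeric(s), names(s))
                   })

# aggregate() stores a vector-valued FUN result as a matrix column;
# cbind() spreads each matrix out into ordinary columns.
counts_df <- cbind(counts[key], counts$reads)
stats_df  <- cbind(stats[key], stats$meth)

# One line per region: key columns, counts, then the six summary stats.
out <- merge(counts_df, stats_df, by = key)

# Written to a temporary directory here; swap in a real path as needed.
out_file <- file.path(tempdir(), "out.summary.txt")
write.table(out, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)
```

Columns 4:5 and 8 drop out simply because they never appear in either
formula. If the intended ratio is something other than positions with
reads >= 1 over all positions, only the first FUN needs to change.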

David Winsemius, MD
West Hartford, CT
