[R] Loops for repetitive task

Wed Aug 10 15:12:56 CEST 2011

Hi:

Try this:

## Function that takes a data frame as input and outputs a data frame:
chrSumm <- function(d) {   # d is a data frame
    colnames(d) <- c("chr","start","end","base1","base2",
                     "totalreads","methylation","strand")
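    TR <- nrow(d)                     # total number of rows in the file
    RG1 <- sum(d$totalreads >= 1)     # rows with at least one read
    percent <- TR/RG1                 # ratio TR/RG1 (not scaled to 100)
    methylSumm <- summary(d$methylation)
    names(methylSumm) <- c('Min', 'Q1', 'Median', 'Mean', 'Q3', 'Max')
    # one-row data frame: counts, ratio, and the six summary statistics
    data.frame(TR, RG1, percent, as.data.frame(as.list(methylSumm)))
}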

# Read the data files into a list and apply the function to each file in turn,
# resulting in a data frame

# vector of file names
files <- c('chr1.out.txt', 'chr2.out.txt')
# use lapply() to read files into a list
filelist <- lapply(files, read.table, header = FALSE)
# Use the ldply() function from the plyr package to
# process the list and return a data frame
library('plyr')
ldply(filelist, chrSumm)
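
For the full set of chromosomes, the vector of file names does not have to be typed by hand. A minimal sketch, assuming the files follow the same 'chrN.out.txt' pattern as your two examples and that the sex chromosomes are labelled X and Y (that labelling, and the names chrs and all_files, are my assumptions, not something from your post):

```r
# Chromosome labels: 1..22 plus X and Y (the X/Y labels are an assumption)
chrs <- c(1:22, "X", "Y")

# Build the file names from the labels, e.g. "chr1.out.txt"
all_files <- sprintf("chr%s.out.txt", chrs)

# Keep only the files that actually exist in the working directory,
# so that read.table() is never called on a missing file
files <- all_files[file.exists(all_files)]
```

The resulting files vector can then be passed to lapply() and ldply() exactly as above.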

# Result from your example:
> ldply(filelist, chrSumm)
  TR RG1 percent  Min     Q1 Median    Mean     Q3  Max
1  4   4     1.0 0.04 0.0475   0.07 0.07500 0.0975 0.12
2  3   2     1.5 0.00 0.0150   0.03 0.03667 0.0550 0.08
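
If you would rather not depend on plyr, do.call(rbind, lapply(...)) does the same stacking in base R. The following is only a sketch under some assumptions: it recreates your two sample files in a temporary directory so that it runs on its own, it uses a compact restatement of chrSumm() rather than the exact function above, and the extra chr label column is my own addition for illustration.

```r
# Recreate the two sample files from the original post in a temporary
# directory so that this sketch is self-contained
tmp <- tempdir()
writeLines(c("chr1 100 159 104 104 1 0.05 +",
             "chr1 100 159 145 145 1 0.04 +",
             "chr1 200 260 205 205 1 0.12 +",
             "chr1 500 750 600 600 1 0.09 +"),
           file.path(tmp, "chr1.out.txt"))
writeLines(c("chr2 100 200 105 105 1 0.03 +",
             "chr2 100 200 110 110 1 0.08 +",
             "chr2 300 400 350 350 0 0 +"),
           file.path(tmp, "chr2.out.txt"))

# Compact restatement of chrSumm(): one summary row per data frame
chrSumm <- function(d) {
    colnames(d) <- c("chr", "start", "end", "base1", "base2",
                     "totalreads", "methylation", "strand")
    TR  <- nrow(d)
    RG1 <- sum(d$totalreads >= 1)
    s   <- as.numeric(summary(d$methylation))  # Min, Q1, Median, Mean, Q3, Max
    data.frame(TR, RG1, percent = TR / RG1,
               Min = s[1], Q1 = s[2], Median = s[3],
               Mean = s[4], Q3 = s[5], Max = s[6])
}

files <- file.path(tmp, c("chr1.out.txt", "chr2.out.txt"))
filelist <- lapply(files, read.table, header = FALSE)

# Name each list element after its chromosome ("chr1", "chr2")
names(filelist) <- sub("\\.out\\.txt$", "", basename(files))

# Summarise each file, stack the one-row results, and add a label column
res <- do.call(rbind, lapply(filelist, chrSumm))
res <- data.frame(chr = names(filelist), res, row.names = NULL)
res
```

Labelling the rows this way makes it easier to see which summary belongs to which chromosome once all the files are processed.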

HTH,
Dennis

On Tue, Aug 9, 2011 at 9:31 PM, a217 <ajn21 at case.edu> wrote:
> Hello,
>
> I have an R script that I use as a template to perform a task for multiple
> files (in this case, multiple chromosomes).
>
> What I would like to do is to utilize a simple loop to parse through each
> chromosome number so that I don't have to type the same code over and over
> again in the R console.
>
> I've tried using:
>
> for(i in 1:22){
> etc..
> }
>
> and replacing each chromosome number with [[i]], but that did not seem to
> work.
>
> Below is the script I have. Basically everywhere you see a '2' I would like
> there to be an 'i' so that the script can be applied in a general sense.
> ################################Code###############################
>
> chr2.data<-read.table(file="chr2.out.txt", header=F)
> colnames(chr2.data)<-c("chr","start","end","base1","base2","totalreads","methylation","strand")
> splc2<-split(chr2.data, paste(chr2.data$chr))
> chr2.df<-as.data.frame(t(sapply(splc2, function(x)
> list(TR=NROW(x[['totalreads']]),    RG1=sum(x[['totalreads']]>=1),
> percent=(NROW(x[['totalreads']]>=1)/sum(x[['totalreads']]))))))
> chr2.df.summ<-as.data.frame(t(sapply(splc2, function(x)
> summary(x$methylation))))
> chr2.summ<-cbind(chr2.df,chr2.df.summ)
>
> ##################################################################
>
>
> Here are some sample input files in case you'd like to test the code:
> ##########
> # chr1.out.txt
> ##########
> chr1    100     159     104     104     1       0.05    +
> chr1    100     159     145     145     1       0.04    +
> chr1    200     260     205     205     1       0.12    +
> chr1    500     750     600     600     1       0.09    +
>
> ##########
> # chr2.out.txt
> ##########
> chr2    100     200     105     105     1       0.03    +
> chr2    100     200     110     110     1       0.08    +
> chr2    300     400     350     350     0       0       +
>
>
> The code works perfectly fine just typing everything out by hand, but that
> is very inefficient given that there are 24 chromosomes for each dataset. I
> am just looking for any suggestions as to how I can write a general version
> of this code.