[R] Loops for repetitive task
Dennis Murphy
djmuser at gmail.com
Wed Aug 10 15:12:56 CEST 2011
Hi:
Try this:
## Function that takes a data frame as input and outputs a data frame:
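chrSumm <- function(d) { # d is a data frame
  colnames(d) <- c("chr","start","end","base1","base2",
                   "totalreads","methylation","strand")
  TR <- nrow(d)                         # number of rows (sites)
  RG1 <- sum(d$totalreads >= 1)         # rows with at least one read
  percent <- TR/RG1                     # TR/RG1; RG1/TR would give the covered fraction
  methylSumm <- summary(d$methylation)  # min, quartiles, median, mean, max
  names(methylSumm) <- c('Min', 'Q1', 'Median', 'Mean', 'Q3', 'Max')
  # one-row data frame: the counts followed by the methylation summary
  data.frame(TR, RG1, percent, as.data.frame(as.list(methylSumm)))
}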
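The last line of chrSumm relies on a small idiom: summary() on a numeric vector returns a named vector, and as.data.frame(as.list(...)) turns it into a one-row data frame whose column names are those names. A minimal sketch on the methylation values from the chr1.out.txt sample below (methyl_row is just an illustrative name):

```r
# Methylation values from the chr1.out.txt sample in the question
methylation <- c(0.05, 0.04, 0.12, 0.09)

# summary() returns six named statistics; rename them to short column names
s <- summary(methylation)
names(s) <- c("Min", "Q1", "Median", "Mean", "Q3", "Max")

# as.list() keeps the names, so each statistic becomes its own column
methyl_row <- as.data.frame(as.list(s))
print(methyl_row)
```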
# Read the data files into a list, then apply the function to each
# element in turn, resulting in a data frame
# vector of file names
files <- c('chr1.out.txt', 'chr2.out.txt')
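Typing the names out is fine for two files, but for all 24 chromosomes the vector can be generated. A sketch that assumes every file follows the chrN.out.txt pattern of your examples and that the sex chromosomes are named chrX.out.txt and chrY.out.txt (that naming is a guess on my part):

```r
# Assumed naming: chr1.out.txt ... chr22.out.txt, chrX.out.txt, chrY.out.txt
chroms <- c(1:22, "X", "Y")            # c() coerces the numbers to character
files  <- paste0("chr", chroms, ".out.txt")
head(files, 3)

# Alternative: take whatever matching files exist in the working directory.
# Note that list.files() sorts alphabetically (chr1, chr10, chr11, ...).
# files <- list.files(pattern = "^chr.*\\.out\\.txt$")
```

On R versions older than 2.15.0, paste("chr", chroms, ".out.txt", sep = "") does the same as paste0().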
# use lapply() to read files into a list
filelist <- lapply(files, read.table, header = FALSE)
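lapply() passes header = FALSE on to read.table() for every file. Naming the list after the files makes it easy to tell later which result came from which chromosome. The sketch below writes your two sample files to a temporary directory (only so it runs anywhere) and reads them back into a named list:

```r
# Setup only: write the question's sample data to temporary files
dir <- tempdir()
writeLines(c("chr1 100 159 104 104 1 0.05 +",
             "chr1 100 159 145 145 1 0.04 +",
             "chr1 200 260 205 205 1 0.12 +",
             "chr1 500 750 600 600 1 0.09 +"),
           file.path(dir, "chr1.out.txt"))
writeLines(c("chr2 100 200 105 105 1 0.03 +",
             "chr2 100 200 110 110 1 0.08 +",
             "chr2 300 400 350 350 0 0 +"),
           file.path(dir, "chr2.out.txt"))

files <- c("chr1.out.txt", "chr2.out.txt")

# Same call for every file; extra arguments go through lapply()'s ...
filelist <- lapply(file.path(dir, files), read.table, header = FALSE)

# Name each element after its chromosome: "chr1", "chr2"
names(filelist) <- sub("\\.out\\.txt$", "", files)

sapply(filelist, nrow)
```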
# Use the ldply() function from the plyr package to
# process the list and return a data frame
library('plyr')
ldply(filelist, chrSumm)
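If you would rather not depend on plyr, do.call(rbind, lapply(filelist, chrSumm)) stacks the one-row data frames in base R. So that the block runs on its own, it uses in-memory data frames and a cut-down stand-in for chrSumm called summarise_chr (a hypothetical name, not part of any package):

```r
# In-memory stand-ins for the two sample files (only the columns used here)
filelist <- list(
  chr1 = data.frame(totalreads = c(1, 1, 1, 1), methylation = c(0.05, 0.04, 0.12, 0.09)),
  chr2 = data.frame(totalreads = c(1, 1, 0),    methylation = c(0.03, 0.08, 0))
)

# Hypothetical stand-in for chrSumm(): one row of statistics per data frame
summarise_chr <- function(d) {
  data.frame(TR = nrow(d),
             RG1 = sum(d$totalreads >= 1),
             meanMethyl = mean(d$methylation))
}

# Base-R counterpart of ldply(filelist, summarise_chr)
result <- do.call(rbind, lapply(filelist, summarise_chr))
result$chr <- names(filelist)   # record which chromosome each row came from
print(result)
```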
# Result from your example:
> ldply(filelist, chrSumm)
  TR RG1 percent  Min     Q1 Median    Mean     Q3  Max
1  4   4     1.0 0.04 0.0475   0.07 0.07500 0.0975 0.12
2  3   2     1.5 0.00 0.0150   0.03 0.03667 0.0550 0.08
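The explicit for (i in 1:22) loop you tried can also work. The trick is to build each file name from i and store each result in a list element, rather than substituting [[i]] into object names like chr2.data. A sketch, assuming the chrN.out.txt naming; the setup writes two short sample files (rows taken from your examples) to a temporary directory so it runs anywhere:

```r
# Setup only: small sample files in a temporary directory
dir <- tempdir()
writeLines(c("chr1 100 159 104 104 1 0.05 +", "chr1 200 260 205 205 1 0.12 +"),
           file.path(dir, "chr1.out.txt"))
writeLines(c("chr2 100 200 105 105 1 0.03 +", "chr2 300 400 350 350 0 0 +"),
           file.path(dir, "chr2.out.txt"))

n_chr <- 2                          # would be 22 for the autosomes
results <- vector("list", n_chr)    # one slot per chromosome

for (i in seq_len(n_chr)) {
  # Build the file name from i instead of hard-coding "chr2"
  fname <- file.path(dir, paste0("chr", i, ".out.txt"))
  d <- read.table(fname, header = FALSE,
                  col.names = c("chr", "start", "end", "base1", "base2",
                                "totalreads", "methylation", "strand"))
  # Store this chromosome's summary in the i-th list element
  results[[i]] <- data.frame(chr = paste0("chr", i),
                             TR = nrow(d),
                             RG1 = sum(d$totalreads >= 1),
                             meanMethyl = mean(d$methylation))
}

summ <- do.call(rbind, results)    # stack the per-chromosome rows
print(summ)
```

To include X and Y, loop over a character vector such as c(1:22, "X", "Y") instead of seq_len(n_chr) and index results by position or by name.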
HTH,
Dennis
On Tue, Aug 9, 2011 at 9:31 PM, a217 <ajn21 at case.edu> wrote:
> Hello,
>
> I have an R script that I use as a template to perform a task for multiple
> files (in this case, multiple chromosomes).
>
> What I would like to do is to utilize a simple loop to parse through each
> chromosome number so that I don't have to type the same code over and over
> again in the R console.
>
> I've tried using:
>
> for(i in 1:22){
> etc..
> }
>
> and replacing each chromosome number with [[i]], but that did not seem to
> work.
>
> Below is the script I have. Basically everywhere you see a '2' I would like
> there to be an 'i' so that the script can be applied in a general sense.
> ################################Code###############################
>
> chr2.data <- read.table(file = "chr2.out.txt", header = F)
> colnames(chr2.data) <- c("chr", "start", "end", "base1", "base2",
>                          "totalreads", "methylation", "strand")
> splc2 <- split(chr2.data, paste(chr2.data$chr))
> chr2.df <- as.data.frame(t(sapply(splc2, function(x)
>     list(TR = NROW(x[['totalreads']]), RG1 = sum(x[['totalreads']] >= 1),
>          percent = (NROW(x[['totalreads']] >= 1) / sum(x[['totalreads']]))))))
> chr2.df.summ <- as.data.frame(t(sapply(splc2, function(x)
>     summary(x$methylation))))
> chr2.summ <- cbind(chr2.df, chr2.df.summ)
>
> ##################################################################
>
>
> Here are some sample input files in case you'd like to test the code:
> ##########
> # chr1.out.txt
> ##########
> chr1 100 159 104 104 1 0.05 +
> chr1 100 159 145 145 1 0.04 +
> chr1 200 260 205 205 1 0.12 +
> chr1 500 750 600 600 1 0.09 +
>
> ##########
> # chr2.out.txt
> ##########
> chr2 100 200 105 105 1 0.03 +
> chr2 100 200 110 110 1 0.08 +
> chr2 300 400 350 350 0 0 +
>
>
> The code works perfectly fine just typing everything out by hand, but that
> is very inefficient given that there are 24 chromosomes for each dataset. I
> am just looking for any suggestions as to how I can write a general version
> of this code.
>