[R] For help in R coding
Bansal, Vikas
vikas.bansal at kcl.ac.uk
Sat Jul 2 03:06:55 CEST 2011
Dear David,
Thanks for your reply. I tried your code and it runs, but as I mentioned in my mail, I am working on a pileup file, so I used this command:
mydf=read.table("Case2.pileup",fill=T,sep="\t")
to read the pileup file into a data frame, i.e. mydf. Now the problem is that it has 10 columns, and I have to count the number of A, C, G and T in the 9th column.
In your mail we input the data like this:
> txt <- " .a,g,,
+ .t,t,,
+ .,c,c,
+ .,a,,,
+ .,t,t,t
+ .c,,g,^!.
+ .g,ggg.^!,
+ .$,,,,,.,
+ a,g,,t,
+ ,,,,,.,^!.
+ ,$,,,,.,."
But how should I input my data (column 9 of the data frame mydf) in place of the txt assignment, given that there are thousands of rows?
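One possible way to do this is sketched here, assuming the file is tab-separated with the read bases in column 9. The column can be pulled out as a character vector and used wherever txtvec is used, so nothing has to be typed in by hand. The temporary file and its three rows are made up for illustration and stand in for Case2.pileup; the quote and comment.char arguments are an added precaution, not part of the original command.

```r
# Hypothetical stand-in for Case2.pileup: 3 rows, 10 tab-separated columns.
tmp <- tempfile(fileext = ".pileup")
bases <- c(".a,g,,", ".t,t,,", ".,t,t,t")
rows <- paste("chr1", 1:3, "A", "A", 10, 10, 10, 5, bases, "IIIII", sep = "\t")
writeLines(rows, tmp)

# quote = "" and comment.char = "" stop read.table from treating quote
# or '#' characters inside the base/quality columns as special.
mydf <- read.table(tmp, sep = "\t", fill = TRUE, quote = "",
                   comment.char = "", stringsAsFactors = FALSE)

# Column 9 as a plain character vector, one element per row.
txtvec <- as.character(mydf[[9]])
```

With thousands of rows the same last line applies unchanged; only the file name passed to read.table differs.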
Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
________________________________________
From: David Winsemius [dwinsemius at comcast.net]
Sent: Friday, July 01, 2011 11:25 PM
To: Bansal, Vikas
Cc: r-help at r-project.org
Subject: Re: [R] For help in R coding
On Jul 1, 2011, at 12:47 PM, Bansal, Vikas wrote:
> Dear all,
>
> I am doing a project on variant calling using R. I am working on a
> pileup file. There are 10 columns in my data frame and I want to
> count the number of A, C, G and T in each row for column 9. An example of
> column 9 is given below:
>
> .a,g,,
> .t,t,,
> .,c,c,
> .,a,,,
> .,t,t,t
> .c,,g,^!.
> .g,ggg.^!,
> .$,,,,,.,
> a,g,,t,
> ,,,,,.,^!.
> ,$,,,,.,.
>
> This is a bit confusing for me, as these characters are in one column.
> How can we scan them for each row to print the number of A, C, G and T
> for each row?
Seems a bit clunky but this does the job (first the data):
> txt <- " .a,g,,
+ .t,t,,
+ .,c,c,
+ .,a,,,
+ .,t,t,t
+ .c,,g,^!.
+ .g,ggg.^!,
+ .$,,,,,.,
+ a,g,,t,
+ ,,,,,.,^!.
+ ,$,,,,.,."
> txtvec <- readLines(textConnection(txt))
Now the clunky solution. Basically it subtracts 1 from the counts of
"fragments" that result from splitting on each letter in turn. It could
be made prettier with a function that did the job.
> data.frame(A = unlist(lapply(lapply(sapply(txtvec, strsplit, split="a"), length), "-", 1)),
+            C = unlist(lapply(lapply(sapply(txtvec, strsplit, split="c"), length), "-", 1)),
+            G = unlist(lapply(lapply(sapply(txtvec, strsplit, split="g"), length), "-", 1)),
+            T = unlist(lapply(lapply(sapply(txtvec, strsplit, split="t"), length), "-", 1)) )
           A C G T
.a,g,,     1 0 1 0
.t,t,,     0 0 0 2
.,c,c,     0 2 0 0
.,a,,,     1 0 0 0
.,t,t,t    0 0 0 2
.c,,g,^!.  0 1 1 0
.g,ggg.^!, 0 0 4 0
.$,,,,,.,  0 0 0 0
a,g,,t,    1 0 1 1
,,,,,.,^!. 0 0 0 0
,$,,,,.,.  0 0 0 0
This has the advantage that the input data ends up as the row names, which
was a surprise.
If you wanted to count "A" and "a" as equivalent, then the split
argument should be "a|A".
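A function-based variant might look like the sketch below. It is not the code used above: instead of splitting, it counts the matches directly with gregexpr(). The reason is that strsplit() does not produce a trailing empty fragment, so a letter at the very end of a string is missed (the row .,t,t,t comes out as 2 for T above, not 3). The names count_letter and count_bases are made up for this example.

```r
# Count occurrences of one letter in each string by matching it directly,
# so a letter at the end of a string is counted as well.
count_letter <- function(x, letter, ignore.case = FALSE) {
  hits <- gregexpr(letter, x, ignore.case = ignore.case)
  # gregexpr() returns -1 when there is no match; count real positions only.
  vapply(hits, function(h) sum(h > 0), integer(1))
}

# Hypothetical wrapper returning one column per base.
count_bases <- function(x, ignore.case = FALSE) {
  data.frame(A = count_letter(x, "a", ignore.case),
             C = count_letter(x, "c", ignore.case),
             G = count_letter(x, "g", ignore.case),
             T = count_letter(x, "t", ignore.case))
}

# First five rows of the example data.
txtvec <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t")
res <- count_bases(txtvec)
print(res)
```

Calling count_bases(txtvec, ignore.case = TRUE) would treat "A" and "a" as the same letter, which plays the role of the "a|A" split argument.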
> Most of the rows have . and , and other symbols,
> but we will ignore them. I just want to run a loop with a counter
> that counts the number of A, C, G and T for each row and
> gives output something like this:
>
>
> A C G T
> 1 0 1 0
> 0 0 0 2
> 0 2 0 0
> 1 0 0 0
> 0 0 0 3
>
> This output is for first 5 rows from the example given above.
>
> I am new to R; can you please help me? I will be very thankful to you.
>
>
>
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> Kings College London
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT