[R] For help in R coding

Bansal, Vikas vikas.bansal at kcl.ac.uk
Sat Jul 2 03:18:23 CEST 2011


Dear David,

It is showing this error:

 data.frame(A = unlist(lapply( lapply( sapply(mydf[,5], strsplit,  
+ split="a|A"), length) , "-", 1)),C = unlist(lapply( lapply( sapply((mydf[,5], strsplit, split="c|C"),  
Error: unexpected ',' in:
"data.frame(A = unlist(lapply( lapply( sapply(mydf[,5], strsplit,  
split="a|A"), length) , "-", 1)),C = unlist(lapply( lapply( sapply((mydf[,5],"
> length) , "-", 1)),G = unlist(lapply( lapply( sapply((mydf[,5], strsplit, split="g|G"),  
Error: unexpected ')' in "length)"
> length) , "-", 1)),T = unlist(lapply( lapply( sapply(mydf[,5], strsplit, split="t|T"),  
Error: unexpected ')' in "length)"

What should I do?
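
The parser stops in the C = ... part because sapply((mydf[,5], opens two parentheses where the A = ... part opens only one; the later "unexpected ')'" messages are leftovers of the same broken statement. A minimal sketch of a balanced version follows. The helper name count_by_split and the toy mydf are assumptions made for illustration and are not from the thread; the counting idea (split on a letter, count the fragments, subtract 1) is the one from David's reply.

```r
# Hypothetical helper: the split-and-count idea written once, so the
# parentheses only have to balance in one place.
count_by_split <- function(x, pattern) {
  pieces <- strsplit(as.character(x), pattern)  # one character vector per row
  # Caveat: strsplit drops a trailing empty fragment, so a matching letter
  # at the very end of a string is not counted by this approach.
  vapply(pieces, length, integer(1)) - 1L
}

# Toy stand-in for the pileup data frame (made-up values)
mydf <- data.frame(bases = c(".a,g,,", ".T,t,,", ".,c,C,"),
                   stringsAsFactors = FALSE)

# Use whichever column holds the read bases in the real data frame
counts <- data.frame(
  A = count_by_split(mydf[, 1], "a|A"),
  C = count_by_split(mydf[, 1], "c|C"),
  G = count_by_split(mydf[, 1], "g|G"),
  T = count_by_split(mydf[, 1], "t|T")
)
print(counts)
```

Writing the nested call once inside a function keeps each column definition short enough to check by eye.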

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
________________________________________
From: David Winsemius [dwinsemius at comcast.net]
Sent: Saturday, July 02, 2011 2:07 AM
To: Bansal, Vikas
Subject: Re: [R] For help in R coding

On Jul 1, 2011, at 8:01 PM, Bansal, Vikas wrote:

> Dear David,
>
> Thanks for your reply. I tried your code and it runs, but as I
> mentioned in my mail, I am working on a pileup file. So I used the command
> mydf=read.table(
> to read the pileup file into a data frame, i.e. mydf. Now the problem is
> that it has 10 columns and I have to count the number of A, C, G and T
> in the 9th column.
> In your mail the data were input like this:
>> txt <- " .a,g,,
> +            .t,t,,
> +            .,c,c,
> +            .,a,,,
> +            .,t,t,t
> +            .c,,g,^!.
> +            .g,ggg.^!,
> +            .$,,,,,.,
> +            a,g,,t,
> +            ,,,,,.,^!.
> +            ,$,,,,.,."
>
> But how should I input my data from the data frame mydf using the txt
> command, given that there are thousands of rows?

Just send mydf[ , 9] as the argument in place of txtvec.
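
As a rough sketch of that substitution: the toy lines below are made up and only mimic a 10-column, tab-separated pileup file. The read.table arguments are a suggestion rather than something stated in the thread. By default read.table treats ', " and # as quote and comment characters, and quality strings can contain them, so quote = "" and comment.char = "" switch that off; stringsAsFactors = FALSE keeps column 9 as character, which strsplit requires.

```r
# Made-up stand-in for the pileup file; real data would be read from a file path
toy <- c(
  "chr1\t10\tA\tA\t30\t0\t60\t5\t.a,g,,\tII#'I",
  "chr1\t11\tT\tT\t30\t0\t60\t5\t.t,t,,\tI\"III",
  "chr1\t12\tC\tC\t30\t0\t60\t5\t.,c,c,\tIIIII"
)

mydf <- read.table(text = toy, sep = "\t", quote = "", comment.char = "",
                   stringsAsFactors = FALSE)

# Column 9 takes the place of txtvec in the earlier expression
bases <- as.character(mydf[, 9])
A <- unlist(lapply(lapply(sapply(bases, strsplit, split = "a|A"),
                          length), "-", 1))
print(A)
```

The same expression can be repeated for "c|C", "g|G" and "t|T" and the results combined with data.frame() as before.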

>
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> Kings College London
> ________________________________________
> From: David Winsemius [dwinsemius at comcast.net]
> Sent: Friday, July 01, 2011 11:25 PM
> To: Bansal, Vikas
> Cc: r-help at r-project.org
> Subject: Re: [R] For help in R coding
>
> On Jul 1, 2011, at 12:47 PM, Bansal, Vikas wrote:
>
>> Dear all,
>>
>> I am doing a project on variant calling using R. I am working on a
>> pileup file. There are 10 columns in my data frame and I want to
>> count the number of A, C, G and T in each row for column 9. An example
>> of column 9 is given below:
>>
>>           .a,g,,
>>           .t,t,,
>>           .,c,c,
>>           .,a,,,
>>           .,t,t,t
>>           .c,,g,^!.
>>           .g,ggg.^!,
>>           .$,,,,,.,
>>           a,g,,t,
>>           ,,,,,.,^!.
>>           ,$,,,,.,.
>>
>> This is a bit confusing for me, as these characters are all in one
>> column. How can we scan them row by row to print the number of A, C, G
>> and T for each row?
>
> Seems a bit clunky but this does the job (first the data):
>> txt <- " .a,g,,
> +            .t,t,,
> +            .,c,c,
> +            .,a,,,
> +            .,t,t,t
> +            .c,,g,^!.
> +            .g,ggg.^!,
> +            .$,,,,,.,
> +            a,g,,t,
> +            ,,,,,.,^!.
> +            ,$,,,,.,."
>
>> txtvec <- readLines(textConnection(txt))
>
> Now the clunky solution. Basically, it subtracts 1 from the counts of
> "fragments" that result from splitting on each letter in turn. It could
> be made prettier with a function that did the job.
>
>> data.frame(A = unlist(lapply(lapply(sapply(txtvec, strsplit,
> +       split="a"), length), "-", 1)),
> +   C = unlist(lapply(lapply(sapply(txtvec, strsplit,
> +       split="c"), length), "-", 1)),
> +   G = unlist(lapply(lapply(sapply(txtvec, strsplit,
> +       split="g"), length), "-", 1)),
> +   T = unlist(lapply(lapply(sapply(txtvec, strsplit,
> +       split="t"), length), "-", 1)) )
>                       A C G T
>  .a,g,,               1 0 1 0
>            .t,t,,     0 0 0 2
>            .,c,c,     0 2 0 0
>            .,a,,,     1 0 0 0
>            .,t,t,t    0 0 0 2
>            .c,,g,^!.  0 1 1 0
>            .g,ggg.^!, 0 0 4 0
>            .$,,,,,.,  0 0 0 0
>            a,g,,t,    1 0 1 1
>            ,,,,,.,^!. 0 0 0 0
>            ,$,,,,.,.  0 0 0 0
>
> Has the advantage that the input data ends up as rownames, which was a
> surprise.
>
> If you wanted to count "A" and "a" as equivalent, then the split
> argument should be "a|A"
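
One caveat with the split-and-subtract approach: strsplit drops a trailing empty fragment, so a letter at the very end of a string (the final t in ".,t,t,t") is not counted. A possible alternative, sketched here under the assumption that every matching character should be counted regardless of case, is to count regular-expression matches directly with gregexpr. The helper name count_letter and the short sample vector are illustrative and not from the thread.

```r
# Hypothetical helper: count case-insensitive occurrences of one letter
# in each element of a character vector.
count_letter <- function(x, letter) {
  hits <- gregexpr(letter, as.character(x), ignore.case = TRUE)
  # gregexpr returns -1 where nothing matches, so count positive positions only
  vapply(hits, function(h) sum(h > 0L), integer(1))
}

# A few strings in the style of the example data
txtvec <- c(".a,g,,", ".,t,t,t", "A,g,,t,", ".$,,,,,.,")

counts <- data.frame(
  A = count_letter(txtvec, "a"),
  C = count_letter(txtvec, "c"),
  G = count_letter(txtvec, "g"),
  T = count_letter(txtvec, "t")
)
print(counts)
# The second row reports T = 3, including the t at the end of the string.
```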
>
>
>> Most of the rows have . and , and other symbols, but we will ignore
>> them. I just want to run a loop with a counter that counts the number
>> of A, C, G and T for each row and gives output something like this:
>>
>>
>> A   C   G  T
>> 1   0   1  0
>> 0   0   0  2
>> 0   2   0  0
>> 1   0   0  0
>> 0   0   0  3
>>
>> This output is for first 5 rows from the example given above.
>>
>> I am new to R. Can you please help me? I will be very thankful to you.
>>
>>
>>
>> Thanking you,
>> Warm Regards
>> Vikas Bansal
>> Msc Bioinformatics
>> Kings College London
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT

David Winsemius, MD
West Hartford, CT


