[R] For help in R coding

David Winsemius dwinsemius at comcast.net
Sat Jul 2 04:39:07 CEST 2011


On Jul 1, 2011, at 9:18 PM, Bansal, Vikas wrote:

> Dear David,
>
> it is showing this error:

Looks like a syntax error rather than a semantic error.

>
> data.frame(A = unlist(lapply( lapply( sapply(mydf[,5], strsplit,
> + split="a|A"), length) , "-", 1)),C =  
> unlist(lapply( lapply( sapply((mydf[,5], strsplit, split="c|C"),
> Error: unexpected ',' in:
> "data.frame(A = unlist(lapply( lapply( sapply(, strsplit,

There seems to be a missing object as the first argument of sapply...?

You should supply str(mydf[,5]), or at least see whether the error occurs on
mydf[1:20, 5] and supply str() on that if the error persists.
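As pasted, the `C =` line contains `sapply((mydf[,5], strsplit, ...`, which has one opening parenthesis too many. R reads `(mydf[,5]` as the start of a parenthesized expression, so the comma after it is unexpected. That alone is enough to trigger `unexpected ','`. Below is a minimal sketch of the same call with balanced parentheses. It assumes a small hypothetical `mydf` whose fifth column holds the read bases, as in the failing command. The counting method itself is unchanged from the earlier reply.

```r
# Hypothetical stand-in for the pileup data frame; only column 5 is used,
# matching the column referenced in the failing command.
mydf <- data.frame(V1 = "chr1", V2 = 1:3, V3 = "A", V4 = "A",
                   V5 = c(".a,g,,", ".t,t,,", ".,c,c,"),
                   stringsAsFactors = FALSE)

# strsplit() needs a character vector, so check the column type first.
str(mydf[, 5])
bases <- as.character(mydf[, 5])

# Same split-and-subtract idea, with exactly one "(" after each sapply.
counts <- data.frame(
  A = unlist(lapply(lapply(sapply(bases, strsplit, split = "a|A"), length), "-", 1)),
  C = unlist(lapply(lapply(sapply(bases, strsplit, split = "c|C"), length), "-", 1)),
  G = unlist(lapply(lapply(sapply(bases, strsplit, split = "g|G"), length), "-", 1)),
  T = unlist(lapply(lapply(sapply(bases, strsplit, split = "t|T"), length), "-", 1))
)
counts
```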

--
David.

> split="a|A"), length) , "-", 1)),C =  
> unlist(lapply( lapply( sapply((mydf[,5],"
>> length) , "-", 1)),G = unlist(lapply( lapply( sapply((mydf[,5],  
>> strsplit, split="g|G"),
> Error: unexpected ')' in "length)"
>> length) , "-", 1)),T = unlist(lapply( lapply( sapply(mydf[,5],  
>> strsplit, split="t|T"),
> Error: unexpected ')' in "length)"
>
> What should I do?
>
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> Kings College London
> ________________________________________
> From: David Winsemius [dwinsemius at comcast.net]
> Sent: Saturday, July 02, 2011 2:07 AM
> To: Bansal, Vikas
> Subject: Re: [R] For help in R coding
>
> On Jul 1, 2011, at 8:01 PM, Bansal, Vikas wrote:
>
>> Dear David,
>>
>> Thanks for your reply. I tried your code and it runs, but as I
>> mentioned in my mail, I am working on a pileup file. So I used the command
>> mydf=read.table(
>> to read the pileup file into a data frame, i.e. mydf. Now the problem is
>> that it has 10 columns and I have to count the number of A, C, G and T,
>> which are in the 9th column.
>> In your mail, the data were entered like this:
>>> txt <- " .a,g,,
>> +            .t,t,,
>> +            .,c,c,
>> +            .,a,,,
>> +            .,t,t,t
>> +            .c,,g,^!.
>> +            .g,ggg.^!,
>> +            .$,,,,,.,
>> +            a,g,,t,
>> +            ,,,,,.,^!.
>> +            ,$,,,,.,."
>>
>> but how should I input my data from the data frame mydf with the txt
>> command, since there are thousands of rows?
>
> Just send mydf[ , 9] as the argument in place of txtvec.
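Here is a minimal sketch of that substitution. It assumes a tab-delimited 10-column pileup read with `read.table()`, and the two input lines are hypothetical. Two options matter for this kind of file: `quote = ""` and `comment.char = ""`. The base and quality columns can contain quote characters and `#`, which `read.table()` would otherwise treat specially. `stringsAsFactors = FALSE` keeps column 9 as character. R versions before 4.0.0 convert strings to factors by default, and `strsplit()` rejects factors.

```r
# Two hypothetical 10-column pileup lines (tab-separated); column 9 holds the read bases.
pileup_txt <- paste(
  "chr1\t100\tA\tA\t30\t0\t60\t6\t.a,g,,\tIIIIII",
  "chr1\t101\tC\tC\t30\t0\t60\t6\t.t,t,,\tIII#II",
  sep = "\n")

# quote = "" and comment.char = "" stop quote characters and '#' in the
# base/quality columns from being interpreted; keep strings as character.
mydf <- read.table(text = pileup_txt, sep = "\t", quote = "", comment.char = "",
                   stringsAsFactors = FALSE)

# Column 9 plays the role that txtvec played in the example.
txtvec <- mydf[, 9]
str(txtvec)
```

With a real file, `text = pileup_txt` would be replaced by the file name, and the `data.frame(...)` call from the example can then be run on `txtvec` unchanged.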
>
>>
>> Thanking you,
>> Warm Regards
>> Vikas Bansal
>> Msc Bioinformatics
>> Kings College London
>> ________________________________________
>> From: David Winsemius [dwinsemius at comcast.net]
>> Sent: Friday, July 01, 2011 11:25 PM
>> To: Bansal, Vikas
>> Cc: r-help at r-project.org
>> Subject: Re: [R] For help in R coding
>>
>> On Jul 1, 2011, at 12:47 PM, Bansal, Vikas wrote:
>>
>>> Dear all,
>>>
>>> I am doing a project on variant calling using R. I am working on a
>>> pileup file. There are 10 columns in my data frame and I want to
>>> count the number of A, C, G and T in each row for column 9. An example
>>> of column 9 is given below:
>>>
>>>          .a,g,,
>>>          .t,t,,
>>>          .,c,c,
>>>          .,a,,,
>>>          .,t,t,t
>>>          .c,,g,^!.
>>>          .g,ggg.^!,
>>>          .$,,,,,.,
>>>          a,g,,t,
>>>          ,,,,,.,^!.
>>>          ,$,,,,.,.
>>>
>>> This is a bit confusing for me, as these characters are all in one
>>> column; how can we scan each row and print the number of A, C, G
>>> and T for that row?
>>
>> Seems a bit clunky but this does the job (first the data):
>>> txt <- " .a,g,,
>> +            .t,t,,
>> +            .,c,c,
>> +            .,a,,,
>> +            .,t,t,t
>> +            .c,,g,^!.
>> +            .g,ggg.^!,
>> +            .$,,,,,.,
>> +            a,g,,t,
>> +            ,,,,,.,^!.
>> +            ,$,,,,.,."
>>
>>> txtvec <- readLines(textConnection(txt))
>>
>> Now the clunky solution. It basically subtracts 1 from the counts of
>> "fragments" that result from splitting on each letter in turn. It could
>> be made prettier with a function that did the job.
>>
>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, split="a"), length) , "-", 1)),
>> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), length) , "-", 1)),
>> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), length) , "-", 1)),
>> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), length) , "-", 1)) )
>>                      A C G T
>> .a,g,,               1 0 1 0
>>           .t,t,,     0 0 0 2
>>           .,c,c,     0 2 0 0
>>           .,a,,,     1 0 0 0
>>           .,t,t,t    0 0 0 2
>>           .c,,g,^!.  0 1 1 0
>>           .g,ggg.^!, 0 0 4 0
>>           .$,,,,,.,  0 0 0 0
>>           a,g,,t,    1 0 1 1
>>           ,,,,,.,^!. 0 0 0 0
>>           ,$,,,,.,.  0 0 0 0
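One thing to check in this output: the row `.,t,t,t` shows 2 T's, whereas the requested output further down expects 3. This is documented behaviour of `strsplit()`. A match at the start of a string yields a leading empty piece, but a match at the end yields no trailing piece, so a letter in the last position is not counted. Below is a sketch of the "prettier" function suggested above. It counts matches directly with `gregexpr()`, which avoids this problem. The name `count_bases` is hypothetical.

```r
# Hypothetical helper: count A, C, G and T (either case) in each string.
count_bases <- function(x, bases = c("A", "C", "G", "T")) {
  x <- as.character(x)
  counts <- sapply(bases, function(b) {
    # gregexpr() returns -1 for a string with no match, so count positive positions.
    m <- gregexpr(b, x, ignore.case = TRUE)
    vapply(m, function(pos) sum(pos > 0), integer(1))
  })
  # sapply() returns a plain vector instead of a matrix when x has length 1.
  matrix(counts, nrow = length(x), dimnames = list(NULL, bases))
}

txtvec <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t")
count_bases(txtvec)
```

For these five rows the result matches the requested output, including 3 T's in the last row.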
>>
>> Has the advantage that the input data ends up as rownames, which was a
>> surprise.
>>
>> If you wanted to count "A" and "a" as equivalent, then the split
>> argument should be "a|A"
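Another way to get the same case-insensitive counts is to delete every character that is not the base of interest and measure what is left with `nchar()`. This is offered as an alternative, not a fix to the code above. A character class such as `[^aA]` covers both cases. This approach also has no trouble with a letter at the end of the string.

```r
txtvec <- c(".a,g,,", ".,t,t,t", "a,g,,t,", ".g,ggg.^!,")

# Remove everything except the two spellings of each base, then count what remains.
counts <- data.frame(
  A = nchar(gsub("[^aA]", "", txtvec)),
  C = nchar(gsub("[^cC]", "", txtvec)),
  G = nchar(gsub("[^gG]", "", txtvec)),
  T = nchar(gsub("[^tT]", "", txtvec))
)
counts
```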
>>
>>
>>> Most of the rows have "." and "," and other symbols, but we will
>>> ignore them. I just want to run a loop with a counter which will
>>> count the number of A, C, G and T for each row and will give output
>>> something like this:
>>>
>>>
>>> A   C   G  T
>>> 1   0   1  0
>>> 0   0   0  2
>>> 0   2   0  0
>>> 1   0   0  0
>>> 0   0   0  3
>>>
>>> This output is for first 5 rows from the example given above.
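Since the question asks for a loop with a counter, here is a sketch of that literal approach. It splits each row into single characters and increments one counter per base. For thousands of rows it is slower than the vectorized versions, but it may be easier to follow when starting out. There is one caveat beyond what the question covers. In pileup files the character after `^` is a mapping-quality character and could itself be a letter, and indels such as `+2AG` also contain bases. This sketch drops the `^` pairs but does not attempt to handle indels.

```r
txtvec <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t")

bases <- c("A", "C", "G", "T")
# One row of counters per input string, one column per base, all starting at 0.
result <- matrix(0L, nrow = length(txtvec), ncol = length(bases),
                 dimnames = list(NULL, bases))

for (i in seq_along(txtvec)) {
  # Drop read-start markers: '^' plus the mapping-quality character after it.
  s <- gsub("\\^.", "", txtvec[i])
  # Upper-case so 'a' and 'A' are counted together, then split into characters.
  chars <- strsplit(toupper(s), "")[[1]]
  for (ch in chars) {
    if (ch %in% bases) {
      result[i, ch] <- result[i, ch] + 1L  # increment the counter for this base
    }
  }
}
result
```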
>>>
>>> I am new to R; can you please help me? I will be very thankful to you.
>>>
>>>
>>>
>>> Thanking you,
>>> Warm Regards
>>> Vikas Bansal
>>> Msc Bioinformatics
>>> Kings College London
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>>
>>
>>
>>
>>
>
> David Winsemius, MD
> West Hartford, CT
>

David Winsemius, MD
West Hartford, CT


