[R] For column values-Quality control

Bansal, Vikas vikas.bansal at kcl.ac.uk
Sat Jul 9 18:45:28 CEST 2011


Dear sir,

I was using different code, which is why you did not get the output I described. Please use this code on the summary file:

I have a file, summary.txt (attached), which can be read
using:

dfa <- read.table("summary.txt", fill = TRUE, colClasses = "character", header = TRUE)

The V10 column contains ASCII-encoded quality characters, which I converted
into decimal numbers using this code:

dfa$V10 <- sapply(dfa$V10, function(a) paste(as.integer(charToRaw(a)), collapse = ' '))

Now you will get this output:

 dfa
    V7 V8  V9      V10
1    0  1   G       96
2    0  1   T       97
3    0  1   C       97
4    0  1   A       97
5    0  1   G       95
6    0  1   G       94
7    0  1   C       94
8    0  1   C       92
9    0  1   A       98
10   0  1   T       97
11   0  1   g       94
12   0  1   A       92
13   0  1   C       95
14   0  1   G       97
15   0  1   C       88
16   0  1   C       96
17   0  1   G       97
18   0  1   G       95
19   0  1   G       97
20   0  1   G       97
21   0  1   A       97
22   0  1   G       97
23   0  1   G       97
24   0  1   C       97
25   0  1   A       97
26   0  1   C       95
27   0  1   A       88
28   0  1   g       96
29   0  2  GG    92 92
30   0  2  GG    91 94
31   0  2  AT    89 94
32   0  2  GG    96 93
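The `paste(..., collapse = ' ')` conversion above stores each row's scores as a single string such as "92 92", which must be split again before it can be compared with a threshold. A minimal sketch of an alternative, assuming numeric vectors are more convenient, keeps V10 as a list column built with base R's `utf8ToInt()` (for plain ASCII input it yields the same numbers as `as.integer(charToRaw(a))`). The data frame `toy` and the column `scores` are hypothetical names made up for illustration.

```r
# Hypothetical rows in the same shape as dfa straight after read.table():
# V9 holds the bases, V10 the ASCII-encoded quality characters
toy <- data.frame(V9  = c("G", "GG", "AT"),
                  V10 = c("`", "\\\\", "Y^"),  # "\\\\" is two backslash characters
                  stringsAsFactors = FALSE)

# utf8ToInt() returns the integer code of every character in a string,
# so each list element is a numeric vector with one score per base
toy$scores <- lapply(toy$V10, utf8ToInt)

str(toy$scores)
```

With the scores held as vectors, a per-base test such as `s > 90` can be applied to each element directly.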

The values in column V10 correspond to the A, C, G, T bases in column V9. I want
to keep only the bases whose score is more than 90, so the output for the above should be:
    V7 V8  V9      V10
1    0  1   G       96
2    0  1   T       97
3    0  1   C       97
4    0  1   A       97
5    0  1   G       95
6    0  1   G       94
7    0  1   C       94
8    0  1   C       92
9    0  1   A       98
10   0  1   T       97
11   0  1   g       94
12   0  1   A       92
13   0  1   C       95
14   0  1   G       97
16   0  1   C       96
17   0  1   G       97
18   0  1   G       95
19   0  1   G       97
20   0  1   G       97
21   0  1   A       97
22   0  1   G       97
23   0  1   G       97
24   0  1   C       97
25   0  1   A       97
26   0  1   C       95
28   0  1   g       96
29   0  2  GG    92 92
30   0  2  GG    91 94
31   0  2   T    89 94
32   0  2  GG    96 93

So in the output, the 15th and 27th rows should be deleted and the 31st row should be:

31   0  2   T    89 94

because 89 is the score for A and 94 is the score for T. A has therefore been deleted because its score is less than 90.
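A minimal sketch of this filter, assuming V10 holds the space-separated scores produced by the `sapply()` step and that the i-th score belongs to the i-th base in V9. The toy rows and the variable `threshold` are hypothetical; `strsplit()`, `mapply()` and `paste()` are base R.

```r
# Hypothetical rows in the same shape as dfa after the sapply() conversion:
# V9 holds the bases, V10 holds the matching space-separated scores
dfa <- data.frame(V7  = c("0", "0", "0", "0"),
                  V8  = c("1", "1", "2", "2"),
                  V9  = c("G", "C", "GG", "AT"),
                  V10 = c("96", "88", "92 92", "89 94"),
                  stringsAsFactors = FALSE)

threshold <- 90  # keep only bases whose score is strictly greater than this

# Split each row into single bases and numeric scores
bases  <- strsplit(dfa$V9, "")
scores <- lapply(strsplit(dfa$V10, " "), as.numeric)
keep   <- lapply(scores, function(s) s > threshold)

# Rebuild V9 and V10 from the bases and scores that passed
dfa$V9  <- mapply(function(b, k) paste(b[k], collapse = ""), bases, keep)
dfa$V10 <- mapply(function(s, k) paste(s[k], collapse = " "), scores, keep)

# Drop rows in which no base passed (e.g. a single base scoring 88)
dfa <- dfa[dfa$V9 != "", ]
print(dfa)
```

Here the "C / 88" row is removed and "AT / 89 94" becomes "T / 94". This sketch also drops the failing score from V10; to leave V10 unchanged, as in the requested output, skip the `dfa$V10 <-` line.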

Can you please help me?

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
King's College London
________________________________________
From: David Winsemius [dwinsemius at comcast.net]
Sent: Saturday, July 09, 2011 12:04 AM
To: Bansal, Vikas
Cc: r-help at r-project.org
Subject: Re: [R] For column values-Quality control

On Jul 8, 2011, at 6:46 PM, Bansal, Vikas wrote:

> Yes sir, you are right. After this I use this code to convert the ASCII
> values in column V10 to decimal numbers:
>
> dfa$V10=lapply(dfa[,4], function(c) as.numeric(charToRaw(c)))
>
> Now you will get output something like this:
>
> V7 V8  V9           V10
>  0  1   G            82
>  0  1 CGT c(90, 92, 96)
>  0  1  GA     c(78, 92)
>  0  1 GAG c(90, 92, 92)
>  0  1   G            88
>  0  1   A            96
>  0  1 ATT c(90, 96, 92)
>  0  1   T            94
>  0  1   C            97
>
> Now, after this, I am facing the following problem:
>

I don't think so. Here's what I get as the top of dfa after that
operation:
 > str(dfa)
'data.frame':   111 obs. of  4 variables:
  $ V7 : chr  "0" "0" "0" "0" ...
  $ V8 : chr  "1" "1" "1" "1" ...
  $ V9 : chr  "G" "T" "C" "A" ...
  $ V10:List of 111
   ..$ : num 96
   ..$ : num 97
   ..$ : num 97
   ..$ : num 97
   ..$ : num 95
   ..$ : num 90
   ..$ : num 94
   ..$ : num 92
   ..$ : num 90
   ..$ : num 97
   ..$ : num 94
   ..$ : num 92
   ..$ : num 95
   ..$ : num 97
   ..$ : num 88
   ..$ : num 96
   ..$ : num 97
   ..$ : num 95
   ..$ : num 97
   ..$ : num 97
   ..$ : num 97
   ..$ : num 97
   ..$ : num 97
   ..$ : num 97
   ..$ : num 97
   ..$ : num 95
   ..$ : num 88
   ..$ : num 96
   ..$ : num  92 92
   ..$ : num  91 94
   ..$ : num  89 94
..., more follows and output was terminated

I say again: read the Posting Guide and use dump() or dput().
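A short sketch of what dput() provides: it prints R code that rebuilds an object exactly, so readers can paste it into their own session instead of guessing from printed output. The small data frame below is a hypothetical stand-in for dfa; for a large object, `dput(head(dfa, 20))` keeps the post short.

```r
# Hypothetical stand-in for dfa with raw ASCII quality characters in V10
dfa <- data.frame(V9  = c("G", "T", "GG"),
                  V10 = c("`", "a", "\\\\"),
                  stringsAsFactors = FALSE)

# dput() prints R code that rebuilds the object; paste that output into the e-mail
dput(dfa)

# Whoever reads the post can recreate the identical object from that text
txt  <- paste(capture.output(dput(dfa)), collapse = "\n")
copy <- eval(parse(text = txt))
stopifnot(identical(copy, dfa))
```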

--
David.


> The values in column V10 correspond to the A, C, G, T bases in column V9. I want
> to keep only those whose score is more than 91, so the output for the above should be:
>
> V7 V8  V9           V10
>  0  1  GT c(90, 92, 96)
>  0  1   A     c(78, 92)
>  0  1  AG c(90, 92, 92)
>  0  1   A            96
>  0  1  TT c(90, 96, 92)
>  0  1   T            94
>  0  1   C            97
>
> The first row should be deleted because it contains 82, which is less
> than 91. In the second row, C should be deleted because its score in
> column V10 is less than 91.
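A minimal sketch for this earlier layout, assuming V10 is the list column created by `lapply(dfa[, 4], function(c) as.numeric(charToRaw(c)))` and that the i-th score belongs to the i-th base. The toy rows and `threshold` are hypothetical. Like the requested output, it rewrites only V9 and leaves the score vectors in V10 untouched.

```r
# Hypothetical rows mimicking dfa after the lapply()/charToRaw() step,
# so V10 is a list column of numeric score vectors
dfa <- data.frame(V9 = c("G", "CGT", "GA", "A"), stringsAsFactors = FALSE)
dfa$V10 <- list(82, c(90, 92, 96), c(78, 92), 96)

threshold <- 91  # keep bases scoring strictly more than this

# Per row, keep only the bases whose matching score exceeds the threshold
dfa$V9 <- mapply(function(b, s) paste(strsplit(b, "")[[1]][s > threshold], collapse = ""),
                 dfa$V9, dfa$V10, USE.NAMES = FALSE)

# Remove rows in which no base survived (here the "G" scoring 82)
dfa <- dfa[dfa$V9 != "", ]
dfa$V9
```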
>
>
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> King's College London
> ________________________________________
> From: David Winsemius [dwinsemius at comcast.net]
> Sent: Friday, July 08, 2011 11:37 PM
> To: Bansal, Vikas
> Cc: r-help at r-project.org
> Subject: Re: [R] For column values-Quality control
>
> I get something entirely different when I execute that input command
> with the attached file:
>
> This is what I see as the first 14 lines of the displayed value of
> dfa:
>
>> dfa
>     V7 V8  V9  V10
> 1    0  1   G    `
> 2    0  1   T    a
> 3    0  1   C    a
> 4    0  1   A    a
> 5    0  1   G    _
> 6    0  1   G    Z
> 7    0  1   C    ^
> 8    0  1   C   \\
> 9    0  1   A    Z
> 10   0  1   T    a
> 11   0  1   g    ^
> 12   0  1   A   \\
> 13   0  1   C    _
> 14   0  1   G    a
>
> If this is different from what you see when you type dfa after reading
> that file in that manner, then you should consider alternative
> methods of communicating an unambiguous representation of your dfa
> object, as I have detailed in prior private messages.
>
> --
>
> David.
>
> On Jul 8, 2011, at 6:10 PM, Bansal, Vikas wrote:
>
>>
>> Dear all,
>>
>> I am really sorry for not providing the input file; in my previous mail I
>> did not explain my problem in the best way.
>>
>> I have a file, summary.txt (attached), which can be read
>> using:
>>
>> dfa <- read.table("summary.txt", fill = TRUE, colClasses = "character", header = TRUE)
>>
>> The V10 column contains ASCII-encoded quality characters, which I converted
>> into decimal numbers using this code:
>>
>> dfa$V10=lapply(dfa[,4], function(c) as.numeric(charToRaw(c)))
>>
>> Now I have a data frame dfa with columns something like this:
>>
>> V7 V8  V9           V10
>>  0  1   G            82
>>  0  1 CGT c(90, 92, 96)
>>  0  1  GA     c(78, 92)
>>  0  1 GAG c(90, 92, 92)
>>  0  1   G            88
>>  0  1   A            96
>>  0  1 ATT c(90, 96, 92)
>>  0  1   T            94
>>  0  1   C            97
>>
>> The values in column V10 correspond to the A, C, G, T bases in column V9. I want
>> to keep only those whose score is more than 91, so the output for the above should be:
>>
>> V7 V8  V9           V10
>>  0  1  GT c(90, 92, 96)
>>  0  1   A     c(78, 92)
>>  0  1  AG c(90, 92, 92)
>>  0  1   A            96
>>  0  1  TT c(90, 96, 92)
>>  0  1   T            94
>>  0  1   C            97
>>
>> Can you please tell me the solution.
>>
>> Thanking you,
>> Warm Regards
>> Vikas Bansal
>> Msc Bioinformatics
>> King's College London
>> <summary.txt>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT
>

David Winsemius, MD
West Hartford, CT

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: summary.txt
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20110709/15f0aa60/attachment.txt>

