[R] For column values-Quality control

David Winsemius dwinsemius at comcast.net
Sat Jul 9 22:38:40 CEST 2011


On Jul 9, 2011, at 12:45 PM, Bansal, Vikas wrote:

> Dear sir,
>
> I was using different code, which is why you did not get the output
> I described. Please use this code on the summary file:
>
> I have a file, summary.txt (attached). We can read it using:
>
> dfa=read.table("summary.txt",fill=TRUE,colClasses = "character",header=TRUE)
>
> The V10 column holds ASCII characters, which I converted to their
> decimal codes using this code:
>
> dfa$V10 <- sapply(dfa$V10, function(a)
> paste(as.integer(charToRaw(a)), collapse = ' '))
>
> Now you will get this output:
>
> dfa
>    V7 V8  V9      V10
> 1    0  1   G       96
snipped
> 26   0  1   C       95
> 27   0  1   A       88
> 28   0  1   g       96
> 29   0  2  GG    92 92
> 30   0  2  GG    91 94
> 31   0  2  AT    89 94
> 32   0  2  GG    96 93
>
> The values in column V10 correspond to the letters A, C, G, T in
> column V9. I want only those whose score is more than 90, so the
> output of the above should be:
> V7 V8  V9      V10
> 1    0  1   G       96

snipped the easy lines
> 29   0  2  GG    92 92
> 30   0  2  GG    91 94
> 31   0  2  T       89 94
> 32   0  2  GG    96 93
>
> So in the output the 15th and 27th rows should be deleted, and the
> 31st row should be:
>
> 31   0  2  T    89 94
>
> because 89 is the score for A and 94 is the score for T. Therefore A
> has been deleted, because its score is less than 90.

At the moment I have a version of dfa that has the original V10 and
another column named 'value' in the fifth position. Since apply
coerces the data frame to a character matrix and hands each row to
the function as a plain character vector, functions written to work
with apply can simply refer to columns by position:

dfa$newcol <-
   apply(dfa, 1, function(x){
          # scores for this row: split 'value' (column 5) on spaces
          vals <- as.numeric( strsplit(x[5], " ")[[1]] )
          # letters in V9 (column 3), one element per character
          lets <- strsplit(x[3], "")[[1]]
          # keep the letters scoring at least 90 and paste them into a
          # single character string so they fit back into a dataframe
          paste( lets[ vals >= 90 ], collapse=", ")} )
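
The same filter can also be sketched without apply(), walking the two
columns in parallel so that no matrix coercion is involved. This is
only an illustration on an invented three-row data frame; the helper
name keep_good and its cutoff argument are hypothetical, and the
layout (letters in V9, space-separated decimal scores in 'value') is
assumed to match the dataframe described above.

```r
# Invented rows: letters in V9, space-separated decimal scores in 'value'.
toy <- data.frame(V9    = c("GG", "AT", "A"),
                  value = c("92 92", "89 94", "88"),
                  stringsAsFactors = FALSE)

# keep_good() is a hypothetical helper: return the letters whose score
# is at least 'cutoff', pasted into a single string.
keep_good <- function(letters_str, score_str, cutoff = 90) {
  lets <- strsplit(letters_str, "")[[1]]             # one element per letter
  vals <- as.numeric(strsplit(score_str, " ")[[1]])  # one score per letter
  paste(lets[vals >= cutoff], collapse = ", ")
}

# mapply() passes the i-th letter string and the i-th score string together.
toy$newcol <- mapply(keep_good, toy$V9, toy$value, USE.NAMES = FALSE)
toy
```

A row in which no letter reaches the cutoff ends up with an empty
string in newcol, because paste() on a zero-length vector with
collapse set returns "".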

Here's the middle of that dataframe:
 > dfa
     V7 V8  V9  V10    value  newcol
snipped
25   0  1   A    a       97       A
26   0  1   C    _       95       C
27   0  1   A    X       88
28   0  1   g    `       96       g
29   0  2  GG \\\\    92 92    G, G
30   0  2  GG   [^    91 94    G, G
31   0  2  AA   Y^    89 94       A
32   0  2  GG   `]    96 93    G, G
33   0  2  AA   a^    97 94    A, A
34   0  2  GG   ]^    93 94    G, G
35   0  2  AA  a\\    97 92    A, A
36   0  2  GG   a]    97 93    G, G
37   0  2  GG   Z]    90 93    G, G
38   0  2  GG   ]^    93 94    G, G
39   0  2  CC  W\\    87 92       C
40   0  2  CC   a]    97 93    C, C
41   0  2  TT   ``    96 96    T, T
42   0  2  GG  a\\    97 92    G, G
43   0  2  GG   ``    96 96    G, G
44   0  2  aa   aa    97 97    a, a
45   0  2  AA   a^    97 94    A, A
46   0  2  CC   b`    98 96    C, C
47   0  2  AA  _\\    95 92    A, A
48   0  2  CC   ]`    93 96    C, C
49   0  2  TT  ^\\    94 92    T, T
50   0  2  CC   Z`    90 96    C, C
51   0  2  Ac   `a    96 97    A, c
52   0  3 AAA  b`a 98 96 97 A, A, A
53   0  3 GGG  aa] 97 97 93 G, G, G
54   0  3 AAA  `[_ 96 91 95 A, A, A
55   0  3 CCC  a`_ 97 96 95 C, C, C
56   0  3 TTT  _]^ 95 93 94 T, T, T
57   0  3 CCC  aaa 97 97 97 C, C, C
58   0  3 CCC  ^a` 94 97 96 C, C, C
59   0  3 CCC  _`` 95 96 96 C, C, C
60   0  3 AAA  Z`] 90 96 93 A, A, A
61   0  3 CCC \\a] 92 97 93 C, C, C
62   0  2  GG   `_    96 95    G, G
63   0  2  GG   `Y    96 89       G
64   0  2  AA   a]    97 93    A, A
65   0  1   G    Z       90       G
66   0  1   G    _       95       G
67   0  1   T    ^       94       T
68   0  1   T    ^       94       T
69   0  1   C    Y       89
70   0  1   A   \\       92       A
71   0  1   G    ]       93       G

snipped
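
Rows where nothing qualified (27 and 69 above) are left with an empty
newcol. One way to delete them, sketched here on invented rows in the
same layout rather than on the real file, is to subset on nzchar().
The object names toy and kept are placeholders.

```r
# Invented rows; newcol is "" where no letter reached the cutoff.
toy <- data.frame(V9     = c("G", "A", "AA", "C"),
                  value  = c("96", "88", "89 94", "89"),
                  newcol = c("G", "", "A", ""),
                  stringsAsFactors = FALSE)

# nzchar() is FALSE for empty strings, so this keeps only the rows in
# which at least one letter survived the filter.
kept <- toy[nzchar(toy$newcol), ]
kept

# The surviving rows keep their original row names.
rownames(kept)
```

Because the original row names are retained, the result can still be
checked against the row numbers of the unfiltered dataframe.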

-- 
David.



>
> Can you help me please.
>
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> Kings College London
> ________________________________________
> From: David Winsemius [dwinsemius at comcast.net]
> Sent: Saturday, July 09, 2011 12:04 AM
> To: Bansal, Vikas
> Cc: r-help at r-project.org
> Subject: Re: [R] For column values-Quality control
>
> On Jul 8, 2011, at 6:46 PM, Bansal, Vikas wrote:
>
>> Yes sir, you are right. After this I use this code to convert the
>> ASCII characters in column V10 to decimal numbers:
>>
>> dfa$V10=lapply(dfa[,4], function(c) as.numeric(charToRaw(c)))
>>
>> Now you will get output something like this:
>>
>> V7 V8   V9            V10
>>  0  1    G             82
>>  0  1  CGT  c(90, 92, 96)
>>  0  1   GA      c(78, 92)
>>  0  1  GAG  c(90, 92, 92)
>>  0  1    G             88
>>  0  1    A             96
>>  0  1  ATT  c(90, 96, 92)
>>  0  1    T             94
>>  0  1    C             97
>>
>> Now, after this, I am facing the problem:
>>
>
> I don't think so. Here's what I get as the top of dfa after that
> operation:
>> str(dfa)
> 'data.frame':   111 obs. of  4 variables:
>  $ V7 : chr  "0" "0" "0" "0" ...
>  $ V8 : chr  "1" "1" "1" "1" ...
>  $ V9 : chr  "G" "T" "C" "A" ...
>  $ V10:List of 111
>   ..$ : num 96
>   ..$ : num 97
>   ..$ : num 97
>   ..$ : num 97
>   ..$ : num 95
>   ..$ : num 90
>   ..$ : num 94
>   ..$ : num 92
>   ..$ : num 90
>   ..$ : num 97
>   ..$ : num 94
>   ..$ : num 92
>   ..$ : num 95
>   ..$ : num 97
>   ..$ : num 88
>   ..$ : num 96
>   ..$ : num 97
>   ..$ : num 95
>   ..$ : num 97
>   ..$ : num 97
>   ..$ : num 97
>   ..$ : num 97
>   ..$ : num 97
>   ..$ : num 97
>   ..$ : num 97
>   ..$ : num 95
>   ..$ : num 88
>   ..$ : num 96
>   ..$ : num  92 92
>   ..$ : num  91 94
>   ..$ : num  89 94
> .... more follows; output was truncated
>
> I say again: read the Posting Guide and use dump() or dput().
>
> --
> David.
>
>
>> The values in column V10 correspond to the letters A, C, G, T in
>> column V9. I want only those whose score is more than 91, so the
>> output of the above should be:
>>
>> V7 V8   V9            V10
>>  0  1   GT  c(90, 92, 96)
>>  0  1    A      c(78, 92)
>>  0  1   AG  c(90, 92, 92)
>>  0  1    A             96
>>  0  1   TT  c(90, 96, 92)
>>  0  1    T             94
>>  0  1    C             97
>>
>> The first row should be deleted because it contains 82, which is
>> less than 91. In the second row C should be deleted because its
>> score in column V10 is less than 91.
>>
>>
>> Thanking you,
>> Warm Regards
>> Vikas Bansal
>> Msc Bioinformatics
>> Kings College London
>> ________________________________________
>> From: David Winsemius [dwinsemius at comcast.net]
>> Sent: Friday, July 08, 2011 11:37 PM
>> To: Bansal, Vikas
>> Cc: r-help at r-project.org
>> Subject: Re: [R] For column values-Quality control
>>
>> I get something entirely different when I execute that input command
>> with the attached file:
>>
>> This is what I see as the first 14 lines for a displayed value for
>> dfa:
>>
>>> dfa
>>    V7 V8  V9  V10
>> 1    0  1   G    `
>> 2    0  1   T    a
>> 3    0  1   C    a
>> 4    0  1   A    a
>> 5    0  1   G    _
>> 6    0  1   G    Z
>> 7    0  1   C    ^
>> 8    0  1   C   \\
>> 9    0  1   A    Z
>> 10   0  1   T    a
>> 11   0  1   g    ^
>> 12   0  1   A   \\
>> 13   0  1   C    _
>> 14   0  1   G    a
>>
>> If this is different than what you see when you type dfa after input
>> of that file in that manner then you should consider alternative
>> methods of communicating an unambiguous representation of your dfa
>> object.... as I have detailed in prior private messages.
>>
>> --
>>
>> David.
>>
>> On Jul 8, 2011, at 6:10 PM, Bansal, Vikas wrote:
>>
>>>
>>> Dear all,
>>>
>>> I am really sorry for not giving the input file; in my mail I
>>> did not explain my problem in the best way.
>>>
>>> I have a file, summary.txt (attached). We can read it using:
>>>
>>> dfa=read.table("summary.txt",fill=TRUE,colClasses = "character",header=TRUE)
>>>
>>> The V10 column holds ASCII characters, which I converted to decimal
>>> numbers using this code:
>>>
>>> dfa$V10=lapply(dfa[,4], function(c) as.numeric(charToRaw(c)))
>>>
>>> Now I have a dataframe dfa with columns something like this:
>>>
>>> V7 V8   V9            V10
>>>  0  1    G             82
>>>  0  1  CGT  c(90, 92, 96)
>>>  0  1   GA      c(78, 92)
>>>  0  1  GAG  c(90, 92, 92)
>>>  0  1    G             88
>>>  0  1    A             96
>>>  0  1  ATT  c(90, 96, 92)
>>>  0  1    T             94
>>>  0  1    C             97
>>>
>>> The values in column V10 correspond to the letters A, C, G, T in
>>> column V9. I want only those whose score is more than 91, so the
>>> output of the above should be:
>>>
>>> V7 V8   V9            V10
>>>  0  1   GT  c(90, 92, 96)
>>>  0  1    A      c(78, 92)
>>>  0  1   AG  c(90, 92, 92)
>>>  0  1    A             96
>>>  0  1   TT  c(90, 96, 92)
>>>  0  1    T             94
>>>  0  1    C             97
>>>
>>> Can you please tell me the solution?
>>>
>>> Thanking you,
>>> Warm Regards
>>> Vikas Bansal
>>> Msc Bioinformatics
>>> Kings College London
>>> <summary.txt>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>
> David Winsemius, MD
> West Hartford, CT
>
> <summary.txt>

David Winsemius, MD
West Hartford, CT


