[R] For column values-Quality control

David Winsemius dwinsemius at comcast.net
Sat Jul 9 22:55:07 CEST 2011


On Jul 9, 2011, at 4:38 PM, David Winsemius wrote:

>
> On Jul 9, 2011, at 12:45 PM, Bansal, Vikas wrote:
>
>> Dear sir,
>>
>> I was working with different code, which is why you did not get the
>> output I described. Please use this code on the summary file:
>>
>> I have a file, summary.txt (attached). We can read
>> this file using:
>>
>> dfa=read.table("summary.txt",fill=T,colClasses = "character",header=T)
>>
>> In column V10 I have ASCII characters, which I converted into their
>> decimal codes using this code:
>>
>> dfa$V10 <- sapply(dfa$V10, function(a)  
>> paste(as.integer(charToRaw(a)), collapse = ' '))
>>
>> now you will get this output.
>>
>> dfa
>>   V7 V8  V9      V10
>> 1    0  1   G       96
> snipped
>> 26   0  1   C       95
>> 27   0  1   A       88
>> 28   0  1   g       96
>> 29   0  2  GG    92 92
>> 30   0  2  GG    91 94
>> 31   0  2  AT    89 94
>> 32   0  2  GG    96 93
>>
>> The values in column V10 correspond to the letters A, C, G, T in column V9.
>> I want only those whose score is more than 90, so the output of the above
>> should be:
>> V7 V8  V9      V10
>> 1    0  1   G       96
>
> snipped the easy lines
>> 29   0  2  GG    92 92
>> 30   0  2  GG    91 94
>> 31   0  2  T       89 94
>> 32   0  2  GG    96 93
>>
>> So in the output the 15th and 27th rows should be deleted, and the 31st
>> row should be:
>>
>> 31   0  2  T    89 94
>>
>> because 89 is the score for A and 94 is the score for T. Therefore A has
>> been deleted because its score is less than 90.
>
> At the moment I have a version of dfa that has the original V10 and  
> another column named 'value' in the fifth position. Since apply  
> removes attributes and names, functions written to work with an  
> apply function need to refer to positions:

I'm not sure where I picked up that incorrect notion. You could use  
x["V9"] where I typed x[3] and x["value"] where I typed x[5].
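A quick check of that point, as a sketch on a made-up three-row data frame (the name `toy` and its contents are invented for illustration, and only the column names follow the dfa in this thread). It suggests that apply() hands each row to the function as a named character vector, so the columns can be picked out by name:

```r
# Toy data, invented for illustration; mirrors the V9 / value columns of dfa
toy <- data.frame(V9    = c("G", "AT", "A"),
                  value = c("96", "89 94", "88"),
                  stringsAsFactors = FALSE)

# Each row arrives in the function as a named character vector
row_names_seen <- apply(toy, 1, function(x) paste(names(x), collapse = ","))

# Name-based version of the filter: keep letters whose score is >= 90
keep_by_name <- apply(toy, 1, function(x) {
    vals <- as.numeric(strsplit(x["value"], " ")[[1]])  # scores for this row
    lets <- strsplit(x["V9"], "")[[1]]                  # one letter per score
    paste(lets[vals >= 90], collapse = " ")
})
```

Indexing by name means the function keeps working if columns are later added or reordered, which the positional x[3] and x[5] would not.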

>
> dfa$newcol <-
>  apply(dfa, 1, function(x){
> # numeric scores for this row, one per letter in the third column
>         vals <- as.numeric(strsplit(x[5], " ")[[1]])
> # split the third column into single letters, keep the qualifying ones,
> # and use paste to make them into a single character string
> # so they will fit back into a dataframe
>      paste( strsplit(x[3], "")[[1]][ which(vals >= 90) ],
>                     collapse=" ")} )

And since I am already correcting a big error up above, I will also  
mention that I used collapse=", " (not collapse=" ") for this output:

>
> Here's the middle of that dataframe:
> > dfa
>    V7 V8  V9  V10    value  newcol
> snipped
> 25   0  1   A    a       97       A
> 26   0  1   C    _       95       C
> 27   0  1   A    X       88
> 28   0  1   g    `       96       g
> 29   0  2  GG \\\\    92 92    G, G
> 30   0  2  GG   [^    91 94    G, G
> 31   0  2  AA   Y^    89 94       A
> 32   0  2  GG   `]    96 93    G, G
> 33   0  2  AA   a^    97 94    A, A
> 34   0  2  GG   ]^    93 94    G, G
> 35   0  2  AA  a\\    97 92    A, A
> 36   0  2  GG   a]    97 93    G, G
> 37   0  2  GG   Z]    90 93    G, G
> 38   0  2  GG   ]^    93 94    G, G
> 39   0  2  CC  W\\    87 92       C
> 40   0  2  CC   a]    97 93    C, C
> 41   0  2  TT   ``    96 96    T, T
> 42   0  2  GG  a\\    97 92    G, G
> 43   0  2  GG   ``    96 96    G, G
> 44   0  2  aa   aa    97 97    a, a
> 45   0  2  AA   a^    97 94    A, A
> 46   0  2  CC   b`    98 96    C, C
> 47   0  2  AA  _\\    95 92    A, A
> 48   0  2  CC   ]`    93 96    C, C
> 49   0  2  TT  ^\\    94 92    T, T
> 50   0  2  CC   Z`    90 96    C, C
> 51   0  2  Ac   `a    96 97    A, c
> 52   0  3 AAA  b`a 98 96 97 A, A, A
> 53   0  3 GGG  aa] 97 97 93 G, G, G

snipped -- getting way too long
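
For what it is worth, the same filtering can probably be done without apply() and without the intermediate 'value' column, by working on the raw quality string directly. This is only a sketch under assumptions: `filter_bases` is a hypothetical helper name, the toy rows are invented to resemble rows 27, 31 and 52 above, and each V9 string is assumed to have exactly as many letters as its V10 string has characters.

```r
# Hypothetical helper (not from the thread): keep the letters of `bases`
# whose matching quality character has an ASCII code of at least `cutoff`.
filter_bases <- function(bases, quals, cutoff = 90) {
    lets   <- strsplit(bases, "")[[1]]   # one letter per position
    scores <- utf8ToInt(quals)           # one integer code per character
    paste(lets[scores >= cutoff], collapse = ", ")
}

# Toy rows invented for illustration, patterned on rows 27, 31 and 52
toy <- data.frame(V9  = c("A", "AA", "AAA"),
                  V10 = c("X", "Y^", "b`a"),
                  stringsAsFactors = FALSE)

# mapply walks the two columns in parallel, one row at a time
toy$newcol <- unname(mapply(filter_bases, toy$V9, toy$V10))
```

The cutoff is an argument, so changing ">= 90" to some other threshold does not require editing the function body.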

David Winsemius, MD
West Hartford, CT


