[R] For column values-Quality control
David Winsemius
dwinsemius at comcast.net
Sat Jul 9 22:55:07 CEST 2011
On Jul 9, 2011, at 4:38 PM, David Winsemius wrote:
>
> On Jul 9, 2011, at 12:45 PM, Bansal, Vikas wrote:
>
>> Dear sir,
>>
>> I was using different code, which is why you did not get the output
>> I described. Please use this code on the summary file:
>>
>> I have a file, summary.txt (attached). We can read this file
>> using:
>>
>> dfa <- read.table("summary.txt", fill=TRUE, colClasses="character", header=TRUE)
>>
>> The V10 column holds ASCII characters, which I converted to their
>> decimal codes using this code:
>>
>> dfa$V10 <- sapply(dfa$V10, function(a)
>> paste(as.integer(charToRaw(a)), collapse = ' '))
>>
>> Now you will get this output:
>>
>> dfa
>> V7 V8 V9 V10
>> 1 0 1 G 96
> snipped
>> 26 0 1 C 95
>> 27 0 1 A 88
>> 28 0 1 g 96
>> 29 0 2 GG 92 92
>> 30 0 2 GG 91 94
>> 31 0 2 AT 89 94
>> 32 0 2 GG 96 93
>>
>> The values in column V10 correspond to the letters (A, C, G, T) in
>> column V9. I want only those letters whose score is more than 90, so
>> the output for the above should be:
>> V7 V8 V9 V10
>> 1 0 1 G 96
>
> snipped the easy lines
>> 29 0 2 GG 92 92
>> 30 0 2 GG 91 94
>> 31 0 2 T 89 94
>> 32 0 2 GG 96 93
>>
>> So in the output the 15th and 27th rows should be deleted, and the
>> 31st row should be:
>>
>> 31 0 2 T 89 94
>>
>> because 89 is the score for A and 94 is the score for T. A has
>> therefore been deleted because its score is less than 90.
>
> At the moment I have a version of dfa that has the original V10 and
> another column named 'value' in the fifth position. Since apply
> removes attributes and names, functions written to work with an
> apply function need to refer to positions:
I'm not sure where I picked up that incorrect notion. You could use
x["V9"] where I typed x[3] and x["value"] where I typed x[5].
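As a quick check that apply() hands each row to the function as a named vector, here is a minimal sketch. The data frame `toy` and its two rows are made up for illustration; only the column names V9 and value come from this thread.

```r
# Toy data frame; column names mirror the thread, values are invented.
toy <- data.frame(V9 = c("GG", "AT"), value = c("92 92", "89 94"),
                  stringsAsFactors = FALSE)

# Inside apply(), each row is a character vector that keeps the column names.
seen_names <- apply(toy, 1, function(x) paste(names(x), collapse = ","))

# So indexing by name and indexing by position pick the same element.
by_name <- apply(toy, 1, function(x) x["V9"])
by_pos  <- apply(toy, 1, function(x) x[1])

print(seen_names)
print(all(by_name == by_pos))
```

Indexing by name has the advantage that the function keeps working if columns are later added or reordered.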
>
> dfa$newcol <-
> apply(dfa, 1, function(x){
> # numeric scores parsed from the space-separated 'value' column (position 5)
> vals <- as.numeric(strsplit(x[5], " ")[[1]])
> # split V9 (position 3) into single letters, keep those scoring >= 90,
> # and paste them into one character string so they fit back into a dataframe
> paste( strsplit(x[3], "")[[1]][ which(vals >= 90) ],
> collapse=" ")} )
And since I want to correct a big error up above, I will mention that
I used collapse=", " for this output:
>
> Here's the middle of that dataframe:
> > dfa
> V7 V8 V9 V10 value newcol
> snipped
> 25 0 1 A a 97 A
> 26 0 1 C _ 95 C
> 27 0 1 A X 88
> 28 0 1 g ` 96 g
> 29 0 2 GG \\\\ 92 92 G, G
> 30 0 2 GG [^ 91 94 G, G
> 31 0 2 AA Y^ 89 94 A
> 32 0 2 GG `] 96 93 G, G
> 33 0 2 AA a^ 97 94 A, A
> 34 0 2 GG ]^ 93 94 G, G
> 35 0 2 AA a\\ 97 92 A, A
> 36 0 2 GG a] 97 93 G, G
> 37 0 2 GG Z] 90 93 G, G
> 38 0 2 GG ]^ 93 94 G, G
> 39 0 2 CC W\\ 87 92 C
> 40 0 2 CC a] 97 93 C, C
> 41 0 2 TT `` 96 96 T, T
> 42 0 2 GG a\\ 97 92 G, G
> 43 0 2 GG `` 96 96 G, G
> 44 0 2 aa aa 97 97 a, a
> 45 0 2 AA a^ 97 94 A, A
> 46 0 2 CC b` 98 96 C, C
> 47 0 2 AA _\\ 95 92 A, A
> 48 0 2 CC ]` 93 96 C, C
> 49 0 2 TT ^\\ 94 92 T, T
> 50 0 2 CC Z` 90 96 C, C
> 51 0 2 Ac `a 96 97 A, c
> 52 0 3 AAA b`a 98 96 97 A, A, A
> 53 0 3 GGG aa] 97 97 93 G, G, G
snipped -- getting way too long
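
For comparison, the same filtering can be sketched without apply() and without the intermediate 'value' column, by walking V9 and V10 in parallel with mapply(). This is an alternative sketch, not the code that produced the output above. The helper name keep_high_quality, its cutoff argument, and the four sample rows are assumptions chosen for illustration; the quality characters were picked so that their codes match scores that appear in this thread (96, 88, 91 94, 89 94).

```r
# Four illustrative rows: letters in V9, one quality character per letter in V10.
dfa <- data.frame(
  V9  = c("G", "A", "GG", "AT"),
  V10 = c("`", "X", "[^", "Y^"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the letters whose quality code is >= cutoff.
keep_high_quality <- function(bases, quals, cutoff = 90) {
  scores      <- as.integer(charToRaw(quals))  # one code per character
  bases_split <- strsplit(bases, "")[[1]]      # one element per letter
  paste(bases_split[scores >= cutoff], collapse = "")
}

# Apply the helper to each (V9, V10) pair; USE.NAMES = FALSE keeps the
# result a plain character vector.
dfa$newcol <- mapply(keep_high_quality, dfa$V9, dfa$V10, USE.NAMES = FALSE)

# Drop rows where no letter survived the cutoff.
dfa_kept <- dfa[nzchar(dfa$newcol), ]
print(dfa_kept)
```

With these rows, newcol comes out as "G", "", "GG" and "T", and the row whose only score is 88 is dropped. The cutoff uses >= 90 to match the code earlier in this message; use > instead if 90 itself should be excluded.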
David Winsemius, MD
West Hartford, CT