[R] Matrix nesting (was Re: Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.)
Allan Kamau
kamauallan at yahoo.com
Mon Jul 30 16:22:30 CEST 2007
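Success, thanks Patrick. Below is the final construction code; the result is a list with one element per variable.
# x.val comes from the earlier step: x.val<-apply(my_data,2,table)
myVariableNames=names(x.val)
x=list()
x[length(myVariableNames)]<-NA
names(x)<-names(x.val)
for (i in myVariableNames){
  # residues seen for variable i, and how often each one occurs
  residues=names(x.val[[i]])
  residuesFrequencies=as.vector(x.val[[i]])
  names(residuesFrequencies)=residues
  # x[i]<-list(...) stores the named vector as element i of x
  someList<-list(frequency=residuesFrequencies)
  x[i]<-someList
}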
#The output
> x[16:18]
$PR12
 I
10

$PR13
K R
8 2

$PR14
I V
2 8

>
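For anyone following along, the same structure can probably be built without an explicit loop. This is only a sketch under stated assumptions, not the code used above: the data frame is retyped from the PR10 to PR14 columns quoted further down in this thread, and lapply() is swapped in for apply() because lapply() always returns a list, whereas apply() simplifies its result to a matrix when every column happens to produce a table of the same length.

```r
# Sample rows retyped from the PR10..PR14 columns quoted later in this thread.
my_data <- data.frame(
  PR10 = rep("V", 10),
  PR11 = c("T", "S", "T", "S", "S", "S", "T", "S", "S", "S"),
  PR12 = rep("I", 10),
  PR13 = c("K", "K", "R", "K", "K", "R", "K", "K", "K", "K"),
  PR14 = c("V", "V", "V", "I", "V", "V", "I", "V", "V", "V"),
  stringsAsFactors = FALSE
)

# lapply() walks the columns and always returns a list, one element per column.
x <- lapply(my_data, function(column) {
  counts <- table(column)
  # Turn the table into a plain named integer vector: residue -> count.
  setNames(as.vector(counts), names(counts))
})

x$PR13               # counts for K and R
x[["PR14"]][["V"]]   # count of "V" at PR14
names(x[["PR11"]])   # residues observed at PR11
```

Because the variable name is an ordinary string here, x[[i]] works inside a loop where x$i would not.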
----- Original Message ----
From: Patrick Burns <pburns at pburns.seanet.com>
To: Allan Kamau <kamauallan at yahoo.com>
Sent: Monday, July 30, 2007 12:01:32 PM
Subject: Re: [R] Matrix nesting (was Re: Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.)
I think you want your main matrix to be of mode
list. S Poetry talks about this some.
Patrick Burns
patrick at burns-stat.com
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")
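To illustrate the suggestion above, here is a minimal sketch of a matrix of mode list. The row names follow the k matrix in the quoted question; the small count matrix stored in one cell is made up for illustration. In a character matrix, assigning more than one value to a single cell is what triggers the "number of items to replace" error, whereas in a list matrix each cell can hold an arbitrary object.

```r
# A 2 x 4 matrix whose cells are list elements (mode "list").
k <- matrix(vector("list", 8), nrow = 2,
            dimnames = list(c("myVariableNames", "x2"), NULL))

# Fill the first row with the variable names, one per cell.
k[1, ] <- list("PR10", "PR11", "PR12", "PR13")

# Use [[ ]] to store a whole object (here a small illustrative matrix) in one cell.
k[[2, 2]] <- matrix(c(7, 3), nrow = 1,
                    dimnames = list("count", c("S", "T")))

mode(k)                      # "list"
dim(k)                       # still 2 rows by 4 columns
k[[2, 2]]["count", "S"]      # reach into the nested matrix
```

Single brackets, as in k[2, 2], return a one-element list; double brackets, as in k[[2, 2]], return the stored object itself.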
Allan Kamau wrote:
>Hi
>
>I would like to nest matrices. Is there a way of doing so? I am
>getting “number of items to replace is not a multiple of replacement
>length” errors (probably R is trying to flatten the matrix into a
>vector and complains if the vector is longer than 1 element during
>the insert).
>
>I have a matrix (see below) in which I would like to place another
>matrix into each k[2,i] position (where i is a value between 1 and 4).
>
>Why: each value in k[1,i] may represent several (1 or more) key-value
>results which I would like to capture in the corresponding k[2,i]
>element.
>
>> k
>                [,1]   [,2]   [,3]   [,4]
>myVariableNames "PR10" "PR11" "PR12" "PR13"
>x2              "0"    "0"    "0"    "0"
>
>Allan.
>
>
>
>----- Original Message ----
>From: Allan Kamau <kamauallan at yahoo.com>
>To: jim holtman <jholtman at gmail.com>
>Cc: r-help at stat.math.ethz.ch
>Sent: Saturday, July 28, 2007 2:48:47 PM
>Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.
>
>Hi Jim,
>The problem description.
>I am trying to identify mutations in a given gene from
>a particular genome (biological genome sequence).
>I have two CSV files consisting of sequences. One file
>consists of reference (documented, curated, accepted as
>standard) sequences. The other consists of sample
>sequences in which I am trying to identify mutations. In
>both files an individual sequence is contained in
>a single record; its amino acid residues (the actual
>sequence of letters, each representing a given amino
>acid, for example “A” stands for “Alanine”, “C” for
>“Cysteine” and so on) are each allocated a single field
>in the CSV file.
>The sequences in both files have been well aligned;
>each contains 115 residues, with the first residue
>in field 5. Fields 1 to 4 are
>allocated for metadata (name of sequence and so on).
>My task is to compile a residue occurrence count for
>each residue present in a given field in the reference
>sequence dataset and use this information when reading
>each sequence in the sample dataset to identify a
>mutation. For example, if a “P” is found at position 9
>of the sample sequence “bb” and, according to our
>reference sequence dataset of summaries, “P” at position 9
>does not exist at all or has an occurrence of
>10% or so, it will be classified as a mutation (I could
>employ a cutoff parameter for mutation
>classification).
>
>
>Allan.
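The classification rule described above might be sketched as follows. Everything here is an assumption made for illustration: the helper name is_mutation, the 10% default cutoff, and the choice to treat a residue at or below the cutoff as a mutation. The counts are the PR13 and PR14 frequencies shown in the output at the top of this message.

```r
# Reference counts per position: named vectors of residue -> count.
reference_counts <- list(
  PR13 = c(K = 8, R = 2),
  PR14 = c(I = 2, V = 8)
)

# Hypothetical helper: TRUE when the residue never occurs at this position in
# the reference set, or when its relative frequency is at or below the cutoff.
is_mutation <- function(residue, position, counts, cutoff = 0.10) {
  column <- counts[[position]]
  if (!(residue %in% names(column))) {
    return(TRUE)
  }
  proportion <- column[[residue]] / sum(column)
  proportion <= cutoff
}

is_mutation("P", "PR14", reference_counts)                 # never seen at PR14
is_mutation("V", "PR14", reference_counts)                 # common residue
is_mutation("R", "PR13", reference_counts, cutoff = 0.25)  # 20% is under 25%
```

Keeping the cutoff as an argument makes it easy to try different thresholds against the same reference summary.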
>
>--- jim holtman <jholtman at gmail.com> wrote:
>
>
>
>>results=character()
>>myVariableNames=names(x.val)
>>results[length(myVariableNames)]<-NA
>>
>>for (i in myVariableNames){
>> results[i]<-names(x.val[[i]]) # this does not work it returns a NULL (how can i convert this to x.val$"somevalue" ? )
>>}
>>
>>
>>
>>On 7/27/07, Allan Kamau <kamauallan at yahoo.com> wrote:
>>
>>>Hi All,
>>>I am having difficulty finding a substitute for the command
>>>"names(x.val$PR14)" so that I could generate the command on the fly
>>>for all of PR14 to PR200 (please see the previous discussion below
>>>to understand what the object x.val contains). I have tried the
>>>following
>>>
>>>>results=character()
>>>>myVariableNames=names(x.val)
>>>>results[length(myVariableNames)]<-NA
>>>>
>>>>for as.vector(unlist(strsplit(str,",")),mode="list")
>>>+ results[i]<-names(x.val$i) # this does not work it returns a NULL (how can i convert this to x.val$"somevalue" ? )
>>>>}
>>>>
>>>Allan.
>>>
>>>
>>>----- Original Message ----
>>>From: Allan Kamau <kamauallan at yahoo.com>
>>>To: r-help at stat.math.ethz.ch
>>>Sent: Thursday, July 26, 2007 10:03:17 AM
>>>Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.
>>>
>>>Thanks so much Jim, Adaikalavan, Gabor and others for the help and
>>>suggestions.
>>>The solution will result in a matrix containing nested matrices to
>>>enable each variable name, each variable's distinct values and the
>>>count of each distinct value to be accessible individually.
>>>The main matrix will contain the variable names, the first-level
>>>nested matrices will consist of the variables' unique values, and
>>>each such entry will contain a one-element vector holding the count
>>>or occurrence frequency.
>>>This matrix can now be used in comparing other similar datasets for
>>>variable values and their frequencies.
>>>Building on the input received so far, a probable solution for
>>>building the matrix will include the following.
>>>1) I read the CSV file (containing column headers)
>>>>my_data=read.table("<path/to/my/data.csv>",header=TRUE,sep=",",dec=".",fill=TRUE)
>>>2) I group the values in each variable, producing an occurrence
>>>count (frequency)
>>>>x.val<-apply(my_data,2,table)
>>>3) I obtain a vector of the names of the variables in the table
>>>>names(x.val)
>>>4) Now I make use of the names (obtained in step 3) to obtain a
>>>vector of distinct values in a given variable (in the example below
>>>the variable name is PR14)
>>>>names(x.val$PR14)
>>>5) I obtain a vector (with one element) of the frequency of a value
>>>obtained from the step above (in our example the value is "V")
>>>>as.vector(x.val$PR14["V"])
>>>Todo:
>>>Now I will need to place the steps above in a script (consisting of
>>>loops) to build the matrix; steps 4 and 5 seem tricky to do
>>>programmatically.
>>>
>>>Allan.
>>>
>>>
>>>----- Original Message ----
>>>From: jim holtman <jholtman at gmail.com>
>>>To: Allan Kamau <kamauallan at yahoo.com>
>>>Cc: Adaikalavan Ramasamy <ramasamy at cancer.org.uk>; r-help at stat.math.ethz.ch
>>>Sent: Wednesday, July 25, 2007 1:50:55 PM
>>>Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.
>>>
>>>Also if you want to access the individual values, you can just leave
>>>it as a list:
>>>
>>>>x.val <- apply(x, 2, table)
>>>># access each value
>>>>x.val$PR14["V"]
>>>V
>>>8
>>>
>>>On 7/25/07, Allan Kamau <kamauallan at yahoo.com> wrote:
>>>
>>>>A subset of the data looks as follows
>>>>
>>>>>df[1:10,14:20]
>>>>   PR10 PR11 PR12 PR13 PR14 PR15 PR16
>>>>1     V    T    I    K    V    G    D
>>>>2     V    S    I    K    V    G    G
>>>>3     V    T    I    R    V    G    G
>>>>4     V    S    I    K    I    G    G
>>>>5     V    S    I    K    V    G    G
>>>>6     V    S    I    R    V    G    G
>>>>7     V    T    I    K    I    G    G
>>>>8     V    S    I    K    V    E    G
>>>>9     V    S    I    K    V    G    G
>>>>10    V    S    I    K    V    G    G
>>>>
>>>>The result I would like is as follows
>>>>
>>>>PR10   PR11      PR12   ...
>>>>[V:10] [S:7,T:3] [I:10]
>>>>
>>>>The result can be in a matrix or a vector and each variable name,
>>>>value and frequency should be accessible so as to be used for
>>>>comparisons with another dataset later.
>>>>The frequency can be a count or a percentage.
>>>>
>>>>Allan.
>>>>
>>>>
>>>>----- Original Message ----
>>>>From: Adaikalavan Ramasamy <ramasamy at cancer.org.uk>
>>>>To: Allan Kamau <kamauallan at yahoo.com>
>>>>Cc: r-help at stat.math.ethz.ch
>>>>Sent: Tuesday, July 24, 2007 10:21:51 PM
>>>>Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.
>>>>
>>>>The name of the table should give you the "value". And if you have a
>>>>matrix, you just need to convert it into a vector first.
>>>>
>>>> > m <- matrix( LETTERS[ c(1:3, 3:5, 2:4) ], nc=3 )
>>>> > m
>>>>     [,1] [,2] [,3]
>>>>[1,] "A"  "C"  "B"
>>>>[2,] "B"  "D"  "C"
>>>>[3,] "C"  "E"  "D"
>>>> > tb <- table( as.vector(m) )
>>>> > tb
>>>>
>>>>A B C D E
>>>>1 2 3 2 1
>>>> > paste( names(tb), ":", tb, sep="" )
>>>>[1] "A:1" "B:2" "C:3" "D:2" "E:1"
>>>>
>>>>If this is not what you want, then please give a simple example.
>>>>
>>>>Regards, Adai
>>>>
>>>>
>>>>
>>>>Allan Kamau wrote:
>>>>
>>>>
>>>>>Hi all,
>>>>>If the question below has been answered before I
>>>>>apologize for the posting.
>>>>>I would like to get the frequencies of occurrence of
>>>>>all values in a given variable in a multivariate
>>>>>dataset. In short for each variable (or field) a
>>>>>summary of values contained within a value:frequency
>>
>=== message truncated ===
>
>______________________________________________
>R-help at stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.