[R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.

jim holtman jholtman at gmail.com
Fri Jul 27 14:13:12 CEST 2007


results <- list()    # a list, because each variable can have several distinct values
myVariableNames <- names(x.val)

for (i in myVariableNames){
    # x.val$i looks for an element literally named "i" and returns NULL;
    # x.val[[i]] uses the value of i as the element name
    results[[i]] <- names(x.val[[i]])
}
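
A minimal, self-contained sketch of the same idea, using toy data modelled on the PR11, PR14 and PR15 columns of the subset quoted below. The use of lapply() in place of apply(), and the long-format data frame named freq, are assumptions added for illustration and were not part of the original exchange.

```r
# Toy data copied from three columns of the 10-row subset quoted below.
my_data <- data.frame(
  PR11 = c("T", "S", "T", "S", "S", "S", "T", "S", "S", "S"),
  PR14 = c("V", "V", "V", "I", "V", "V", "I", "V", "V", "V"),
  PR15 = c("G", "G", "G", "G", "G", "G", "G", "E", "G", "G"),
  stringsAsFactors = FALSE
)

# lapply() always returns a list with one table per column.
# apply(my_data, 2, table) returns a matrix instead whenever every column
# happens to have the same number of distinct values (as is the case here).
x.val <- lapply(my_data, table)

# [[i]] uses the value of i as the element name;
# x.val$i would look for an element literally called "i".
results <- list()
for (i in names(x.val)) {
  results[[i]] <- names(x.val[[i]])
}

# Long format: one row per (variable, value) pair together with its count.
# "freq" and its column names are hypothetical, chosen for this sketch.
freq <- do.call(rbind, lapply(names(x.val), function(v) {
  data.frame(variable = v,
             value    = names(x.val[[v]]),
             count    = as.vector(x.val[[v]]),
             stringsAsFactors = FALSE)
}))
print(freq)
```

Keeping the counts in a long-format data frame like this makes it straightforward to merge() the result with the same summary from another dataset when comparing frequencies.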



On 7/27/07, Allan Kamau <kamauallan at yahoo.com> wrote:
> Hi All,
> I am having difficulty finding a substitute for the command "names(x.val$PR14)" so that I can generate it on the fly for all variables from PR14 to PR200 (please see the previous discussion below to understand what the object x.val contains). I have tried the following:
>
> >results=()#character()
> >myVariableNames=names(x.val)
> >results[length(myVariableNames)]<-NA
>
> >for as.vector(unlist(strsplit(str,",")),mode="list")
> +    results[i]<-names(x.val$i)    # this does not work it returns a NULL (how can i convert this to x.val$"somevalue" ? )
> >}
>
> Allan.
>
>
> ----- Original Message ----
> From: Allan Kamau <kamauallan at yahoo.com>
> To: r-help at stat.math.ethz.ch
> Sent: Thursday, July 26, 2007 10:03:17 AM
> Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.
>
> Thanks so much Jim, Adaikalavan, Gabor and others for the help and suggestions.
> The solution will produce a matrix containing nested matrices, so that each variable name, each of the variable's distinct values, and the count of each distinct value are individually accessible.
> The main matrix will contain the variable names, the first-level nested matrices will hold each variable's unique values, and each such value entry will contain a one-element vector holding its count (occurrence frequency).
> This matrix can then be used to compare variable values and their frequencies with other similar datasets.
>
> Building on the input received so far, a probable approach to building the matrix includes the following steps.
>
>
> 1) I read the CSV file (which contains column headers)
> >my_data=read.table("<path/to/my/data.csv>",header=TRUE,sep=",",dec=".",fill=TRUE)
>
> 2) I group the values in each variable, producing an occurrence count (frequency) for each distinct value
> >x.val<-apply(my_data,2,table)
>
> 3)I obtain a vector of the names of the variables in the table
> >names(x.val)
>
> 4) Now I use the names (obtained in step 3) to obtain a vector of the distinct values in a given variable (in the example below the variable name is PR14)
> >names(x.val$PR14)
>
> 5) I obtain a one-element vector holding the frequency of a value obtained in the step above (in our example the value is "V")
> >as.vector(x.val$PR14["V"])
>
> Todo:
> Now I need to place the steps above in a script (consisting of loops) to build the matrix; steps 4 and 5 seem tricky to do programmatically.
>
> Allan.
>
>
> ----- Original Message ----
> From: jim holtman <jholtman at gmail.com>
> To: Allan Kamau <kamauallan at yahoo.com>
> Cc: Adaikalavan Ramasamy <ramasamy at cancer.org.uk>; r-help at stat.math.ethz.ch
> Sent: Wednesday, July 25, 2007 1:50:55 PM
> Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.
>
> Also if you want to access the individual values, you can just leave
> it as a list:
>
> > x.val <- apply(x, 2, table)
> > # access each value
> > x.val$PR14["V"]
> V
> 8
>
>
>
> On 7/25/07, Allan Kamau <kamauallan at yahoo.com> wrote:
> > A subset of the data looks as follows
> >
> > > df[1:10,14:20]
> >   PR10 PR11 PR12 PR13 PR14 PR15 PR16
> > 1     V    T    I    K    V    G    D
> > 2     V    S    I    K    V    G    G
> > 3     V    T    I    R    V    G    G
> > 4     V    S    I    K    I    G    G
> > 5     V    S    I    K    V    G    G
> > 6     V    S    I    R    V    G    G
> > 7     V    T    I    K    I    G    G
> > 8     V    S    I    K    V    E    G
> > 9     V    S    I    K    V    G    G
> > 10    V    S    I    K    V    G    G
> >
> > The result I would like is as follows
> >
> > PR10        PR11          PR12   ...
> > [V:10]    [S:7,T:3]    [I:10]
> >
> > The result can be a matrix or a vector, and each variable name, value and frequency should be accessible so that they can be compared with another dataset later.
> > The frequency can be a count or a percentage.
> >
> >
> > Allan.
> >
> >
> > ----- Original Message ----
> > From: Adaikalavan Ramasamy <ramasamy at cancer.org.uk>
> > To: Allan Kamau <kamauallan at yahoo.com>
> > Cc: r-help at stat.math.ethz.ch
> > Sent: Tuesday, July 24, 2007 10:21:51 PM
> > Subject: Re: [R] Obtaining summary of frequencies of value occurrences for a variable in a multivariate dataset.
> >
> > The name of the table should give you the "value". And if you have a
> > matrix, you just need to convert it into a vector first.
> >
> >  > m <- matrix( LETTERS[ c(1:3, 3:5, 2:4) ], nc=3 )
> >  > m
> >      [,1] [,2] [,3]
> > [1,] "A"  "C"  "B"
> > [2,] "B"  "D"  "C"
> > [3,] "C"  "E"  "D"
> >  > tb <- table( as.vector(m) )
> >  > tb
> >
> > A B C D E
> > 1 2 3 2 1
> >  > paste( names(tb), ":", tb, sep="" )
> > [1] "A:1" "B:2" "C:3" "D:2" "E:1"
> >
> > If this is not what you want, then please give a simple example.
> >
> > Regards, Adai
> >
> >
> >
> > Allan Kamau wrote:
> > > Hi all,
> > > If the question below has been answered before, I
> > > apologize for the posting.
> > > I would like to get the frequencies of occurrence of
> > > all values in a given variable in a multivariate
> > > dataset. In short, for each variable (or field) I want a
> > > summary of the values it contains as value:frequency
> > > pairs; there can be many such pairs for a given
> > > variable. I would like to do the same for several such
> > > variables.
> > > I have used table() but am unable to extract the
> > > individual value and frequency values.
> > > Please advise.
> > >
> > > Allan.
> > >
> > > ______________________________________________
> > > R-help at stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> > >
> > >
> >
> >
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem you are trying to solve?
>
>


-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


