[R] format.pval () and printCoefmat ()

Sat Dec 15 19:19:43 CET 2012

Hi Arun,

Thank you so much for further clarifications and help.

Pradip

Pradip K. Muhuri, PhD
Statistician
Substance Abuse & Mental Health Services Administration
The Center for Behavioral Health Statistics and Quality
Division of Population Surveys
1 Choke Cherry Road, Room 2-1071
Rockville, MD 20857

Tel: 240-276-1070
Fax: 240-276-1260
e-mail: Pradip.Muhuri at samhsa.hhs.gov

-----Original Message-----
From: arun [mailto:smartpink111 at yahoo.com]
Sent: Saturday, December 15, 2012 11:04 AM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help; David Winsemius
Subject: Re: [R] format.pval () and printCoefmat ()

Hi Pradip,

If this is just a formatting issue, it is possible to do that with formatC() or sprintf() (see ?formatC and ?sprintf), but both change those variables from numeric to character.
One possibility, starting from `res`:
res<-data.frame(dat1[,1:2],read.table(text=Lines2,header=TRUE))

varsNum<-sapply(res,is.numeric)
res[varsNum]<-lapply(res[varsNum],round,digits=3)
# Here, numeric columns with fewer than 3 decimal places are unchanged; those with more than 3 are rounded to 3.

As I mentioned, sprintf() fixes the number of decimal places but returns character columns:
as.data.frame(do.call(cbind,lapply(res[varsNum],function(x) sprintf("%.3f",x))))
#   mean_level1 mean_level2 rel_diff p_mean cohens_d
#1       18.700      11.910    0.574  0.000    0.175
#2       18.700      14.460    0.297  0.000    0.110
#3       18.700      13.540    0.384  0.000    0.134
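
If the numeric columns should stay numeric, one option is to format a separate display copy and leave `res` alone. The following is only a sketch on three rows typed in from the table above; `shown` is a made-up name, and formatC() is used here in place of sprintf().

```r
# Three rows from the table in this thread, typed in by hand for illustration
res <- data.frame(
  contrast_level2 = c("2+hi", "2+rc", "aian"),
  mean_level1 = c(18.7, 18.7, 18.7),
  mean_level2 = c(11.91, 14.46, 13.54),
  stringsAsFactors = FALSE
)

varsNum <- sapply(res, is.numeric)

# `shown` (hypothetical name) is a display-only copy; `res` keeps its numeric columns
shown <- res
shown[varsNum] <- lapply(res[varsNum], formatC, format = "f", digits = 3)

sapply(res, class)    # classes of the original data frame
sapply(shown, class)  # classes of the display copy (formatted columns are character)
print(shown)
```

Any further arithmetic would then be done on `res`, and `shown` would be used only for printing or export.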

A.K.

----- Original Message -----
From: "Muhuri, Pradip (SAMHSA/CBHSQ)" <Pradip.Muhuri at samhsa.hhs.gov>
To: arun <smartpink111 at yahoo.com>
Cc: R help <r-help at r-project.org>; David Winsemius <dwinsemius at comcast.net>
Sent: Saturday, December 15, 2012 10:12 AM
Subject: RE: [R] format.pval () and printCoefmat ()

Dear Arun and David,

I am so grateful to you for all your help with the code.  Thanks and regards, Pradip

Arun - All this is very helpful. In general, I can follow the code. I have only one question:

What changes in the code would be required to show three decimal places for all numeric variables in the "res" data frame?

Thanks,

Pradip

####### below is the display of the data from Lines1, Lines2, and res

> head (data.frame(Lines1))
                                                 Lines1
1    mean_level1 mean_level2 rel_diff p_mean cohens_d
2 1       18.744      11.911    0.574   0.00    0.175
3 2       18.744      14.455    0.297   0.00    0.110
4 3       18.744      13.540    0.384   0.00    0.133
5 4       18.744       6.002    2.123   0.00    0.333
6 5       18.744       5.834    2.213   0.00    0.349
> head (data.frame(Lines2))
                                               Lines2
1    mean_level1 mean_level2 rel_diff p_mean cohens_d
2 1       18.744      11.911    0.574   0.00    0.175
3 2       18.744      14.455    0.297   0.00    0.110
4 3       18.744      13.540    0.384   0.00    0.133
5 4       18.744       6.002    2.123   0.00    0.333
6 5       18.744       5.834    2.213   0.00    0.349
> head (res)
  contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff p_mean cohens_d
1              wh            2+hi        18.7       11.91    0.574      0    0.175
2              wh            2+rc        18.7       14.46    0.297      0    0.110
3              wh            aian        18.7       13.54    0.384      0    0.133
4              wh            asan        18.7        6.00    2.123      0    0.333
5              wh            blck        18.7        5.83    2.213      0    0.349
6              wh            csam        18.7        7.93    1.363      0    0.279

________________________________________
From: arun [smartpink111 at yahoo.com]
Sent: Friday, December 14, 2012 10:12 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help; David Winsemius
Subject: Re: [R] format.pval () and printCoefmat ()

Hi Pradip,

Maybe this helps:
dat1<-read.table(text="
contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff    p_mean cohens_d
1              wh            2+hi        18.7      11.91    0.574  1.64e-05  0.1753
2              wh            2+rc        18.7      14.46    0.297  9.24e-06  0.1101
3              wh            aian        18.7      13.54    0.384  9.01e-05  0.1335
4              wh            asan        18.7        6.00    2.123 2.20e-119  0.3326
5              wh            blck        18.7        5.83    2.213  0.00e+00  0.3490
6              wh            csam        18.7        7.93    1.363  1.27e-47  0.2793
7              wh            cub        18.7      10.85    0.728  6.12e-08  0.2025
8              wh            dmcn        18.7        7.13    1.629  1.59e-15  0.2981
9              wh            hisp        18.7        9.72    0.928 3.27e-125  0.2420
10              wh            mex        18.7        9.60    0.952 8.81e-103  0.2420
11              wh            nhpi        18.7      16.14    0.162  1.74e-01  0.0669
12              wh            othh        18.7          NA      NA        NA      NA
13              wh              pr        18.7      10.47    0.791  3.64e-23  0.2131
14              wh            spn        18.7      15.15    0.237  1.58e-02  0.0922
",sep="",header=TRUE,stringsAsFactors=FALSE)
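# printCoefmat() prints its table, so capture the printed lines as text
Lines1<-capture.output(printCoefmat(dat1[,-c(1:2)],has.Pvalue=TRUE,eps.Pvalue=0.001))
# keep the header plus the 14 data rows; strip trailing "." significance marks and trailing whitespace
Lines2<-gsub("\\s+$","",gsub("\\.$","",Lines1[1:15]))
# read the formatted text back in and reattach the two character columns
res<-data.frame(dat1[,1:2],read.table(text=Lines2,header=TRUE))
#or
# res<-cbind(dat1[,1:2],read.table(text=Lines2,header=TRUE))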

res
#   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff p_mean
#1               wh            2+hi        18.7       11.91    0.574 0.0000
#2               wh            2+rc        18.7       14.46    0.297 0.0000
#3               wh            aian        18.7       13.54    0.384 0.0001
# ...

# cohens_d
#1    0.1753
#2    0.1101
#3    0.1335
# ...

str(res)
#'data.frame':    14 obs. of  7 variables:
# $ contrast_level1: chr  "wh" "wh" "wh" "wh" ...
# $ contrast_level2: chr  "2+hi" "2+rc" "aian" "asan" ...
# $ mean_level1    : num  18.7 18.7 18.7 18.7 18.7 18.7 18.7 18.7 18.7 18.7 ...
# $ mean_level2    : num  11.91 14.46 13.54 6 5.83 ...
# $ rel_diff       : num  0.574 0.297 0.384 2.123 2.213 ...
# $ p_mean         : num  0e+00 0e+00 1e-04 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 ...
# $ cohens_d       : num  0.175 0.11 0.134 0.333 0.349 ...
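
One thing that may be worth checking: as far as I can tell from ?printCoefmat, has.Pvalue=TRUE treats the last column as the p-value, and here the last column is cohens_d, not p_mean. That would explain why p_mean is simply rounded (the 0.0000 and 0.0001 above) rather than shown as "less than eps". A sketch of one way around it, on three rows of dat1 retyped by hand, is to move p_mean to the end before calling printCoefmat(). The names `toy` and `reordered` are made up for this example.

```r
# Three rows of dat1 retyped by hand; `toy` and `reordered` are hypothetical names
toy <- data.frame(
  mean_level1 = c(18.7, 18.7, 18.7),
  mean_level2 = c(11.91, 16.14, 15.15),
  rel_diff    = c(0.574, 0.162, 0.237),
  p_mean      = c(1.64e-05, 1.74e-01, 1.58e-02),
  cohens_d    = c(0.1753, 0.0669, 0.0922)
)

# Put p_mean last so that printCoefmat() formats it as the p-value column
reordered <- toy[c(setdiff(names(toy), "p_mean"), "p_mean")]

# Capture the printed table as text, as in the code above
Lines <- capture.output(
  printCoefmat(reordered, has.Pvalue = TRUE, eps.Pvalue = 0.001)
)
writeLines(Lines)
```

With this ordering the p-value column comes back as text for values below eps.Pvalue, so it would presumably no longer read back as a numeric column with read.table().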

A.K.

----- Original Message -----
From: "Muhuri, Pradip (SAMHSA/CBHSQ)" <Pradip.Muhuri at samhsa.hhs.gov>
To: 'David Winsemius' <dwinsemius at comcast.net>
Cc: R help <r-help at r-project.org>
Sent: Friday, December 14, 2012 5:18 PM
Subject: Re: [R] format.pval () and printCoefmat ()

Hi David,

Thank you so much for helping me with the code.

Your suggested code gives me the following results. Please see below. I don't understand why I am getting two blocks of output (first 5 columns, then 7 columns), with some columns repeated.

Regards,

Pradip
#############################

> cbind(  y0410_1825_mf_alc[ 1:2],
+         printCoefmat(y0410_1825_mf_alc[ -(1:2) ], has.Pvalue=TRUE, eps.Pvalue=0.0001)
+ )
   mean_level1 mean_level2 rel_diff p_mean cohens_d
1       18.744      11.911    0.574   0.00    0.175
2       18.744      14.455    0.297   0.00    0.110
3       18.744      13.540    0.384   0.00    0.133
4       18.744       6.002    2.123   0.00    0.333
5       18.744       5.834    2.213   0.00    0.349
6       18.744       7.933    1.363   0.00    0.279
7       18.744      10.849    0.728   0.00    0.203
8       18.744       7.130    1.629   0.00    0.298
9       18.744       9.720    0.928   0.00    0.242
10      18.744       9.600    0.952   0.00    0.242
11      18.744      16.135    0.162   0.17    0.067 .
12      18.744          NA       NA     NA       NA
13      18.744      10.465    0.791   0.00    0.213
14      18.744      15.149    0.237   0.02    0.092 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
   contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff    p_mean cohens_d
1               wh            2+hi        18.7       11.91    0.574  1.64e-05   0.1753
2               wh            2+rc        18.7       14.46    0.297  9.24e-06   0.1101
3               wh            aian        18.7       13.54    0.384  9.01e-05   0.1335
4               wh            asan        18.7        6.00    2.123 2.20e-119   0.3326
5               wh            blck        18.7        5.83    2.213  0.00e+00   0.3490
6               wh            csam        18.7        7.93    1.363  1.27e-47   0.2793
7               wh             cub        18.7       10.85    0.728  6.12e-08   0.2025
8               wh            dmcn        18.7        7.13    1.629  1.59e-15   0.2981
9               wh            hisp        18.7        9.72    0.928 3.27e-125   0.2420
10              wh             mex        18.7        9.60    0.952 8.81e-103   0.2420
11              wh            nhpi        18.7       16.14    0.162  1.74e-01   0.0669
12              wh            othh        18.7          NA       NA        NA       NA
13              wh              pr        18.7       10.47    0.791  3.64e-23   0.2131
14              wh             spn        18.7       15.15    0.237  1.58e-02   0.0922
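
For what it is worth, the two blocks can probably be explained as follows: printCoefmat() prints its formatted table as a side effect and, as far as I know, invisibly returns the object it was given. So the first block (5 columns) is printCoefmat()'s own printing, and the second block (7 columns) is cbind() combining the two character columns with the unformatted numeric columns. A small sketch on three rows from the table above (`toy`, `printed` and `ret` are made-up names):

```r
# Three rows from the table in this thread, typed in by hand for illustration
toy <- data.frame(
  mean_level1 = c(18.7, 18.7, 18.7),
  mean_level2 = c(11.91, 14.46, 13.54),
  rel_diff    = c(0.574, 0.297, 0.384),
  p_mean      = c(1.64e-05, 9.24e-06, 9.01e-05),
  cohens_d    = c(0.1753, 0.1101, 0.1335)
)

# capture.output() collects what printCoefmat() prints;
# `ret` receives what printCoefmat() returns
printed <- capture.output(
  ret <- printCoefmat(toy, has.Pvalue = TRUE, eps.Pvalue = 0.0001)
)

length(printed) > 0   # was a formatted table printed as text?
identical(ret, toy)   # is the returned object the original, unformatted data?
```

This is why capturing the printed text, rather than using the return value, is needed to get at the formatted numbers.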

-----Original Message-----
From: David Winsemius [mailto:dwinsemius at comcast.net]
Sent: Friday, December 14, 2012 3:22 PM
To: Muhuri, Pradip (SAMHSA/CBHSQ)
Cc: R help
Subject: Re: [R] format.pval () and printCoefmat ()

On Dec 14, 2012, at 11:48 AM, Muhuri, Pradip (SAMHSA/CBHSQ) wrote:

> Hi List,
>
> My goal is to force R not to print in scientific notation in the sixth column (p_mean, the p-value) of my data frame (not a matrix).
>
> I have used the format.pval () and printCoefmat () functions on the data frame. The R script is appended below.
>
> The issue is that using the format.pval () and printCoefmat () functions on the data frame gives me the desired number formatting but turns the two character variables (contrast_level1 and contrast_level2) into NAs, because my object is a data frame, not a matrix. Please see the first output below.
>
> Is there a way I could avoid printing the NAs in the character fields

They are probably factor columns.

> when using the format.pval () and printCoefmat () on the data frame?
>
> I would appreciate receiving your help.
>
> Thanks,
>
> Pradip
> setwd ("F:/PR1/R_PR1")
>
> load (file = "sigtests_overall_withid.rdata")
>
> #format.pval(tt$p.value, eps=0.0001)
>
> # keep only selected columns from the above data frame
> keep_cols1 <- c("contrast_level1", "contrast_level2","mean_level1",
>                "mean_level2", "rel_diff",
>                  "p_mean", "cohens_d")
>
> #subset the data frame
> y0410_1825_mf_alc <- subset (sigtests_overall_withid,
>                          years=="0410" & age_group=="1825"
>                          & gender_group=="all" & drug=="alc"
>                          & contrast_level1=="wh",
>                          select=keep_cols1)
> #change the row.names
> row.names (y0410_1825_mf_alc)= 1:dim(y0410_1825_mf_alc)[1]
>
> #force non-scientific display of the p-values
> format.pval(y0410_1825_mf_alc$p_mean, eps=0.0001)

Presumably that call will produce desired results since it is on only one column. (I'm not sure why you think format.pval contributed to your NA output.)

>
> #print the observations from the sub-data frame
> options (width=120,digits=3 )
> #y0410_1825_mf_alc
>
> printCoefmat(y0410_1825_mf_alc, has.Pvalue=TRUE, eps.Pvalue=0.0001)

Why not use `cbind.data.frame` rather than trying to get `printCoefmat` to do something it (apparently) wasn't designed to do?

cbind(  y0410_1825_mf_alc[ 1:2],
        printCoefmat(y0410_1825_mf_alc[ -(1:2) ], has.Pvalue=TRUE, eps.Pvalue=0.0001)
     )
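
Along the same lines, here is a rough sketch of formatting only the p-value column with format.pval() and leaving the character columns alone. The rows are copied from the table in this thread; `toy` and `shown` are names invented for the example. The result of format.pval() is character, so it is stored in a display copy rather than overwriting the numeric column.

```r
# Three rows from the table in this thread, typed in by hand for illustration
toy <- data.frame(
  contrast_level2 = c("2+hi", "asan", "nhpi"),
  rel_diff = c(0.574, 2.123, 0.162),
  p_mean = c(1.64e-05, 2.20e-119, 1.74e-01),
  stringsAsFactors = FALSE
)

# `shown` (hypothetical name) is a display copy; only p_mean is reformatted
shown <- toy
shown$p_mean <- format.pval(toy$p_mean, digits = 3, eps = 0.0001)

# p-values below eps are displayed as an inequality instead of in scientific notation
print(shown)
```

Because data.matrix() is never applied to the character columns, nothing is coerced to NA.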

--
David.

>
> ####################### When format.pval () and printCoefmat () used
>
>
>    contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff p_mean cohens_d
> 1               NA              NA      18.744      11.911    0.574   0.00    0.175
> 2               NA              NA      18.744      14.455    0.297   0.00    0.110
> 3               NA              NA      18.744      13.540    0.384   0.00    0.133
> 4               NA              NA      18.744       6.002    2.123   0.00    0.333
> 5               NA              NA      18.744       5.834    2.213   0.00    0.349
> 6               NA              NA      18.744       7.933    1.363   0.00    0.279
> 7               NA              NA      18.744      10.849    0.728   0.00    0.203
> 8               NA              NA      18.744       7.130    1.629   0.00    0.298
> 9               NA              NA      18.744       9.720    0.928   0.00    0.242
> 10              NA              NA      18.744       9.600    0.952   0.00    0.242
> 11              NA              NA      18.744      16.135    0.162   0.17    0.067 .
> 12              NA              NA      18.744          NA       NA     NA       NA
> 13              NA              NA      18.744      10.465    0.791   0.00    0.213
> 14              NA              NA      18.744      15.149    0.237   0.02    0.092 .
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> Warning messages:
> 1: In data.matrix(x) : NAs introduced by coercion
> 2: In data.matrix(x) : NAs introduced by coercion
>
> ####################### When format.pval () and printCoefmat () not used
>
>    contrast_level1 contrast_level2 mean_level1 mean_level2 rel_diff    p_mean cohens_d
> 1               wh            2+hi        18.7       11.91    0.574  1.64e-05   0.1753
> 2               wh            2+rc        18.7       14.46    0.297  9.24e-06   0.1101
> 3               wh            aian        18.7       13.54    0.384  9.01e-05   0.1335
> 4               wh            asan        18.7        6.00    2.123 2.20e-119   0.3326
> 5               wh            blck        18.7        5.83    2.213  0.00e+00   0.3490
> 6               wh            csam        18.7        7.93    1.363  1.27e-47   0.2793
> 7               wh             cub        18.7       10.85    0.728  6.12e-08   0.2025
> 8               wh            dmcn        18.7        7.13    1.629  1.59e-15   0.2981
> 9               wh            hisp        18.7        9.72    0.928 3.27e-125   0.2420
> 10              wh             mex        18.7        9.60    0.952 8.81e-103   0.2420
> 11              wh            nhpi        18.7       16.14    0.162  1.74e-01   0.0669
> 12              wh            othh        18.7          NA       NA        NA       NA
> 13              wh              pr        18.7       10.47    0.791  3.64e-23   0.2131
> 14              wh             spn        18.7       15.15    0.237  1.58e-02   0.0922

David Winsemius
Alameda, CA, USA

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.