[R] Interpreting the output of str on a data frame created using aggregate function
@vi@e@gross m@iii@g oii gm@ii@com
Sat Jan 25 02:49:00 CET 2025
John,
Others have helped educate you on your initial guesses and I just want to
add that "str()" is not the only function people use to see their data.
People often want to also look at what class(es) an object has or the names
inside it and so on or what dimensions it contains. There are various
commands that can be helpful.
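As a rough sketch of what I mean, here are a few of those commands applied to a small made-up data frame. The object names toy and agg, and the values in them, are my own invention for illustration and are not your data.

```r
# Hypothetical stand-in data: three census IDs and a factor of categories.
toy <- data.frame(
  CensusID = c(1, 1, 1, 2, 2, 3),
  cats = factor(c("Good", "Good", "Moderate", "Good", "Unhealthy", "Moderate"),
                levels = c("Good", "Moderate", "Unhealthy"))
)

# Same shape of call as in the original question.
agg <- aggregate(toy$cats, list(toy$CensusID), table)

class(agg)          # "data.frame"
names(agg)          # "Group.1" "x"
dim(agg)            # 3 rows, 2 columns
lapply(agg, class)  # class of each column, one list element per column
```

Each call answers one narrow question, which is often easier to digest than the full str() listing.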
One to consider is a function called glimpse(), as in the dplyr package. It
sort of shows a rotated version of the data that may be of use at times.
And, if you are using a GUI such as RStudio, one pane (typically the upper
right, shared with other tabs) lets you click on a variable to expand it and
show its parts as needed. You can even open the data in a separate View tab,
which typically appears in the upper-left pane where you normally edit. In
that environment you can also ask to view or edit a variable such as a data
frame.
One very useful technique I use is to NOT study something complex all at
once. Copy out the part you want and look at it. Your second component,
called $x, can be copied out and examined. It looks in one sense like a
matrix with the same 844 rows as the whole data frame and seven columns. Or
is it a sub-data.frame of some kind? Modern R lets you embed all kinds of
objects, including other lists, within each cell if you do it carefully.
Extracting it as a whole may let you examine it using whatever tools apply.
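Here is a minimal sketch of that copy-it-out approach, using a small made-up data frame (toy and agg are hypothetical names, not your objects). On this toy example the extracted piece turns out to be an ordinary integer matrix with column names and no row names, which is what your str() output also suggests.

```r
# Hypothetical stand-in data, built the same way as in the question.
toy <- data.frame(
  CensusID = c(1, 1, 1, 2, 2, 3),
  cats = factor(c("Good", "Good", "Moderate", "Good", "Unhealthy", "Moderate"),
                levels = c("Good", "Moderate", "Unhealthy"))
)
agg <- aggregate(toy$cats, list(toy$CensusID), table)

# Copy the second component out and look at it on its own.
x <- agg$x

is.matrix(x)      # TRUE: a matrix stored as a single data frame column
is.data.frame(x)  # FALSE: not a nested data frame
dim(x)            # one row per group, one column per category

# dimnames() returns a list of two: row names, then column names.
dimnames(x)[[1]]  # NULL, i.e. no row names
dimnames(x)[[2]]  # the category labels
rownames(x)       # shorthand for dimnames(x)[[1]]
colnames(x)       # shorthand for dimnames(x)[[2]]

# Columns of the matrix can be pulled out by name.
x[, "Good"]
```

So the "List of 2" in the str() output is the dimnames attribute: one element for rows (NULL here) and one for columns.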
And, believe it or not, sometimes it pays to read the damn documentation.
When I typed:
?aggregate
into my session, I looked a bit further down and found this section:
---
Value
For the time series method, a time series of class "ts" or class c("mts",
"ts").
For the data frame method, a data frame with columns corresponding to the
grouping variables in by followed by aggregated columns from x. If the by
has names, the non-empty times are used to label the columns in the results,
with unnamed grouping variables being named Group.i for by[[i]].
---
You seem to have invoked the data.frame method. Perhaps the above makes
sense to you. And, it suggests perhaps a way to get a time series out that
you can investigate further and see if that may be helpful.
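The naming rule in that Value section is easy to check on a small made-up example (toy is a hypothetical data frame, not your data): an unnamed by list gives Group.1, while a named one carries its name through.

```r
# Hypothetical stand-in data.
toy <- data.frame(
  CensusID = c(1, 1, 1, 2, 2, 3),
  cats = factor(c("Good", "Good", "Moderate", "Good", "Unhealthy", "Moderate"),
                levels = c("Good", "Moderate", "Unhealthy"))
)

# Unnamed grouping list: the grouping column is labelled Group.1.
agg_unnamed <- aggregate(toy$cats, list(toy$CensusID), table)
names(agg_unnamed)  # "Group.1" "x"

# Named grouping list: the name is used for the grouping column.
agg_named <- aggregate(toy$cats, list(CensusID = toy$CensusID), table)
names(agg_named)    # "CensusID" "x"
```

Naming the by list is a small change that may make the str() output easier to read.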
OR, and I hesitate to say this, since you do want to master base R methods,
consider whether using aggregate() is a good route to get what you want. I
am sure it is fine, but some may prefer to combine several dplyr verbs from
the tidyverse packages, or use methods from yet other packages, where the
output may be easier to understand.
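As one possible illustration of the dplyr route, and only a sketch: count() tallies rows per combination of the columns you name and returns a plain long data frame. The toy data is made up, and the code is guarded in case dplyr is not installed.

```r
# Hypothetical stand-in data.
toy <- data.frame(
  CensusID = c(1, 1, 1, 2, 2, 3),
  cats = factor(c("Good", "Good", "Moderate", "Good", "Unhealthy", "Moderate"),
                levels = c("Good", "Moderate", "Unhealthy"))
)

# Only run if dplyr is available (an assumption about your setup).
if (requireNamespace("dplyr", quietly = TRUE)) {
  # One row per observed CensusID/cats pair, with the tally in column n.
  counts <- dplyr::count(toy, CensusID, cats)
  print(counts)
  str(counts)  # ordinary columns only, no embedded matrix
}
```

The result is long rather than wide, so it is not a drop-in replacement for the aggregate() output, but its structure is simpler to inspect.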
Another possibility, once you figure out what you have and compare it to what
you want, is to use base R primitives or tidyverse ones, such as unnest()
from tidyr, to transform an embedded data structure into something a tad
different and perhaps suitable for your purposes.
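For a matrix column like yours, one base R way to flatten things is sketched below. This uses as.data.frame() and cbind() rather than unnest(), and the toy data and the name flat are made up for illustration.

```r
# Hypothetical stand-in data, aggregated as in the question.
toy <- data.frame(
  CensusID = c(1, 1, 1, 2, 2, 3),
  cats = factor(c("Good", "Good", "Moderate", "Good", "Unhealthy", "Moderate"),
                levels = c("Good", "Moderate", "Unhealthy"))
)
agg <- aggregate(toy$cats, list(toy$CensusID), table)

# Turn the embedded matrix into ordinary columns next to the grouping column.
flat <- cbind(agg["Group.1"], as.data.frame(agg$x))

names(flat)  # "Group.1" "Good" "Moderate" "Unhealthy"
str(flat)    # one plain column per category
```

After this, each category is a regular column you can reach with flat$Good and so on.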
I am not someone who believes the old ways are best, especially when many
other aspects of the world have moved forward.
-----Original Message-----
From: R-help <r-help-bounces using r-project.org> On Behalf Of Sorkin, John
Sent: Friday, January 24, 2025 2:03 PM
To: r-help using r-project.org (r-help using r-project.org) <r-help using r-project.org>
Subject: [R] Interpreting the output of str on a data frame created using
aggregate function
I ran the following code:
marginalcats <- aggregate(meanbyCensusIDAndDay3$cats,
list(meanbyCensusIDAndDay3$CensusID),table)
followed by
str(marginalcats)
I received the following output:
'data.frame': 844 obs. of 2 variables:
$ Group.1: num 6e+09 6e+09 6e+09 6e+09 6e+09 ...
$ x : int [1:844, 1:7] 14 14 14 14 14 14 14 14 14 14 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:7] "Good" "Moderate" "Unhealthy For Some" "Unhealthy" ...
I am trying to understand the output. I believe it says that marginalcats
(1) is a data frame
(2) the df has two elements (I) Group.1 and (II) x
(3) Group.1 is a ?list? of number
(4) x which is a 844x7 matrix having value "Good", "Moderate", etc.
A few questions:
(A) Is the interpretation given above correct?
(B) Does the .. ..$ : NULL mean that the matrix has no row names?
(C) What does "attr(*, "dimnames")=List of 2" mean?
(D) Does it mean that the dimensions of the matrix are stored as two
separate lists?
(E) If so, how do I access the lists?
When I enter
dimnames(marginalcats$x)
I receive:
[[1]]
NULL
[[2]]
[1] "Good" "Moderate" "Unhealthy For Some"
"Unhealthy" "Very Unhealthy" "Hazardous1"
[7] "Hazardous2"
Thank you,
John
John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical
Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of
Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;
Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382