[R] Interpreting the output of str on a data frame created using aggregate function
Rui Barradas
ru|pb@rr@d@@ @end|ng |rom @@po@pt
Fri Jan 24 21:22:05 CET 2025
At 19:03 on 24/01/2025, Sorkin, John wrote:
> I ran the following code:
> marginalcats <- aggregate(meanbyCensusIDAndDay3$cats,
> list(meanbyCensusIDAndDay3$CensusID),table)
> followed by
> str(marginalcats)
>
> I received the following output:
> 'data.frame': 844 obs. of 2 variables:
> $ Group.1: num 6e+09 6e+09 6e+09 6e+09 6e+09 ...
> $ x : int [1:844, 1:7] 14 14 14 14 14 14 14 14 14 14 ...
> ..- attr(*, "dimnames")=List of 2
> .. ..$ : NULL
> .. ..$ : chr [1:7] "Good" "Moderate" "Unhealthy For Some" "Unhealthy" ...
>
> I am trying to understand the output. I believe it says that marginalcats
> (1) is a data frame
> (2) the df has two elements (I) Group.1 and (II) x
> (3) Group.1 is a ?list? of number
> (4) x which is a 844x7 matrix having value "Good", "Moderate", etc.
>
> A few questions:
> (A) Is the interpretation given above correct?
> (B) Does the .. ..$ : NULL mean that the matrix has no row names?
> (C) What does "attr(*, "dimnames")=List of 2" mean?
> (D) Does it mean that the dimensions of the matrix are stored as two separate lists?
> (E) If so, how do I access the lists?
> When I enter
> dimnames(marginalcatsx$x)
> I receive:
>
> [[1]]
> NULL
>
> [[2]]
> [1] "Good" "Moderate" "Unhealthy For Some" "Unhealthy" "Very Unhealthy" "Hazardous1"
> [7] "Hazardous2"
>
> Thank you,
> John
>
> John David Sorkin M.D., Ph.D.
> Professor of Medicine, University of Maryland School of Medicine;
> Associate Director for Biostatistics and Informatics, Baltimore VA Medical Center Geriatrics Research, Education, and Clinical Center;
> PI Biostatistics and Informatics Core, University of Maryland School of Medicine Claude D. Pepper Older Americans Independence Center;
> Senior Statistician University of Maryland Center for Vascular Research;
>
> Division of Gerontology and Paliative Care,
> 10 North Greene Street
> GRECC (BT/18/GR)
> Baltimore, MD 21201-1524
> Cell phone 443-418-5382
Hello,
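What str is telling you is that the 2nd column is a matrix column: a
single data.frame column that holds a matrix, so it has a dim attribute
with two dimensions. The ".. ..$ : NULL" line means the matrix has no
rownames; the chr vector on the line after it holds the colnames.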
The example below tries to produce a result similar to yours, numbers
will vary.
df1 <- data.frame(x = rep(letters[1:3], 8),
                  y = rep(1:12, each = 2))
agg <- aggregate(df1$x, by = list(df1$y), table)
str(agg)
#> 'data.frame': 12 obs. of 2 variables:
#> $ Group.1: int 1 2 3 4 5 6 7 8 9 10 ...
#> $ x : int [1:12, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:2] "a" "b"
n <- ncol(agg)
cbind(agg[-n], agg[[n]])
#>    Group.1 a b
#> 1        1 1 1
#> 2        2 1 1
#> 3        3 1 1
#> 4        4 1 1
#> 5        5 1 1
#> 6        6 1 1
#> 7        7 1 1
#> 8        8 1 1
#> 9        9 1 1
#> 10      10 1 1
#> 11      11 1 1
#> 12      12 1 1
The 2nd column is a matrix because table() returns a vector of the same
length (2 counts) for every value of y, so aggregate can stack the
results as the rows of a matrix.
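To check this interactively, here is a small sketch that rebuilds the example above and inspects the matrix column directly. It assumes R >= 4.0 (strings stay character in data.frame). The last step is my own addition: as far as I can tell, aggregate labels the matrix columns with the names from the first group only, so the labels should be read with care when groups contain different values.

```r
# Assumes R >= 4.0, where data.frame() keeps strings as character.
df1 <- data.frame(x = rep(letters[1:3], 8),
                  y = rep(1:12, each = 2))
agg <- aggregate(df1$x, by = list(df1$y), table)

is.matrix(agg$x)   # the column itself is a matrix
dim(agg$x)         # one row per group, one column per count
rownames(agg$x)    # NULL, i.e. the ".. ..$ : NULL" line in str()
colnames(agg$x)    # same as dimnames(agg$x)[[2]]

# Every group yields the same number of counts,
# which is what allows aggregate to build a matrix.
lens <- lengths(lapply(split(df1$x, df1$y), table))
unique(lens)

# Caution: the values actually present in group 2
# need not match the column labels of the matrix.
grp2_names <- names(table(df1$x[df1$y == 2]))
grp2_names
```

So dimnames() returns a list with one element per dimension, and the column names are reached with dimnames(agg$x)[[2]] or, equivalently, colnames(agg$x).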
Confusing? I think it is. Another example makes it clearer:
agg2 <- aggregate(cyl ~ gear, mtcars, table)
str(agg2)
#> 'data.frame': 3 obs. of 2 variables:
#> $ gear: num 3 4 5
#> $ cyl :List of 3
#> ..$ : 'table' int [1:3(1d)] 1 2 12
#> .. ..- attr(*, "dimnames")=List of 1
#> .. .. ..$ : chr [1:3] "4" "6" "8"
#> ..$ : 'table' int [1:2(1d)] 8 4
#> .. ..- attr(*, "dimnames")=List of 1
#> .. .. ..$ : chr [1:2] "4" "6"
#> ..$ : 'table' int [1:3(1d)] 2 1 2
#> .. ..- attr(*, "dimnames")=List of 1
#> .. .. ..$ : chr [1:3] "4" "6" "8"
m <- ncol(agg2)
cbind(agg2[-m], agg2[[m]])
#> Error in (function (..., row.names = NULL, check.rows = FALSE,
#>   check.names = TRUE, : arguments imply differing number of rows: 3, 2
agg2[[m]]
#> [[1]]
#>
#>  4  6  8
#>  1  2 12
#>
#> [[2]]
#>
#> 4 6
#> 8 4
#>
#> [[3]]
#>
#> 4 6 8
#> 2 1 2
Now the second vector (gear == 4) has no count for cyl == 8, so the
table() results have different lengths. The vectors cannot be cbind'ed
and there is an error.
The answer to the question, how to interpret str's output, is visible
above. Since the output of table is 3 vectors that are not all of the
same length, aggregate cannot bind them and cannot output a matrix,
as it does in both of these cases:
$ x : int [1:844, 1:7] # OP
$ x : int [1:12, 1:2] # my 1st example
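If a matrix column is what you want, one possible workaround (a sketch, not the only way) is to make the tabulated variable a factor, so that table() reports every level, zeros included, and all groups return vectors of the same length. The name agg2f is just for illustration.

```r
# Convert cyl to a factor so table() always returns one count per level.
agg2f <- aggregate(cyl ~ gear,
                   transform(mtcars, cyl = factor(cyl)),
                   table)

str(agg2f)       # cyl is now a matrix column, not a list
agg2f$cyl[2, ]   # gear == 4: the count for cyl == 8 is an explicit 0
```

This is probably why the original 844x7 result is a matrix: if cats is a factor with 7 levels, every CensusID gets 7 counts.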
This is no different from the *apply functions, which by default
simplify the result if possible and otherwise output a list.
In fact I believe it's exactly the same behavior.
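A short sketch of the same idea with sapply, using the mtcars data again. The object names are only illustrative. Note one difference: sapply puts each result in a column, while aggregate puts each result in a row.

```r
by_gear <- split(mtcars, mtcars$gear)

# Results of lengths 3, 2, 3: cannot be simplified, a list is returned
s_list <- sapply(by_gear, function(d) table(d$cyl))
class(s_list)

# Results all of length 2: simplified to a matrix, one column per gear
s_mat <- sapply(by_gear, function(d) range(d$mpg))
dim(s_mat)
```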
Here is a 3rd example with commented code. I hope it is simple to follow.
need_some_stats <- function(x) {
c(Count = length(x), Mean = mean(x), Var = var(x))
}
agg3 <- aggregate(mpg ~ gear, mtcars, need_some_stats)
# 2nd column is a matrix 3x3
str(agg3)
#> 'data.frame': 3 obs. of 2 variables:
#> $ gear: num 3 4 5
#> $ mpg : num [1:3, 1:3] 15 12 5 16.1 24.5 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:3] "Count" "Mean" "Var"
# ugly output, the matrix column has its colnames
# prefixed with the data.frame's 2nd column name (mpg).
agg3
#>   gear mpg.Count mpg.Mean  mpg.Var
#> 1    3  15.00000 16.10667 11.36781
#> 2    4  12.00000 24.53333 27.84424
#> 3    5   5.00000 21.38000 44.34200
# make it a data.frame with all columns atomic vectors.
# the single `[` is meant to extract a sub-data.frame
# the double `[[` is meant to extract the last vector (a matrix)
p <- ncol(agg3)
cbind(agg3[-p], agg3[[p]])
#>   gear Count     Mean      Var
#> 1    3    15 16.10667 11.36781
#> 2    4    12 24.53333 27.84424
#> 3    5     5 21.38000 44.34200
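An alternative that I believe gives an equivalent flat data.frame is do.call(data.frame, ...), which splits the matrix column into ordinary columns but keeps the "mpg." prefix in the names. This is only a sketch; the name flat is mine.

```r
need_some_stats <- function(x) {
  c(Count = length(x), Mean = mean(x), Var = var(x))
}
agg3 <- aggregate(mpg ~ gear, mtcars, need_some_stats)

# data.frame() expands a matrix argument into one column per matrix column
flat <- do.call(data.frame, agg3)
names(flat)
str(flat)   # all columns are now atomic vectors
```

Use cbind(agg3[-p], agg3[[p]]) if you prefer the short names, or this form if you want to keep track of which variable the statistics came from.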
Hope this helps,
Rui Barradas