[R] getting data from a "vertical" table into a "2-dimensional" grid
Bert Gunter
bgunter@4567 @end|ng |rom gm@||@com
Sat Oct 22 01:20:06 CEST 2022
"As my end result, I want a matrix or data frame, with one row for each
year, and one column for each category."
If I understand you correctly, no reshaping gymnastics are needed --
just use ?tapply:
set.seed(1)
do <- data.frame(year = rep(1990:1999, length = 50),
category = sample(1:5, size = 50, replace = TRUE),
sales = sample(0:99999, size = 50 , replace = TRUE) )
with(do, tapply(sales, list(year, category),sum))
## which gives the matrix:
1 2 3 4 5
1990 13283 NA 55083 87522 64877
1991 NA 80963 NA 30100 28277
1992 9391 202916 NA 55090 NA
1993 29696 167344 NA NA 17625
1994 98015 99521 NA 70536 52252
1995 157003 NA 26875 NA 11366
1996 32986 88683 6562 79475 95282
1997 13601 NA 134757 12398 NA
1998 30537 51117 31333 20204 NA
1999 39240 87845 62479 NA 98804
If this is not what you wanted, you may need to explain further or
await a response from someone more insightful than I.
Cheers,
Bert
On Fri, Oct 21, 2022 at 3:34 PM Kelly Thompson <kt1572757 using gmail.com> wrote:
>
> As my end result, I want a matrix or data frame, with one row for each
> year, and one column for each category.
>
> On Fri, Oct 21, 2022 at 6:23 PM Kelly Thompson <kt1572757 using gmail.com> wrote:
> >
> > # I think this might be a better example.
> >
> > # I have data presented in a "vertical" dataframe as shown below in
> > data_original.
> > # I want this data in a matrix or "grid", as shown below.
> > # What I show below seems like one way this can be done.
> >
> > # My question: Are there easier or better ways to do this, especially
> > in Base R, and also in R packages?
> >
> > #create data
> > set.seed(1)
> > data_original <- data.frame(year = rep(1990:1999, length = 50),
> > category = sample(1:5, size = 50, replace = TRUE), sales =
> > sample(0:99999, size = 50 , replace = TRUE) )
> > dim(data_original)
> >
> > #remove rows where data_original$year == 1990 & data_original$category
> > == 5, to ensure there is at least one NA in the "grid"
> > data_original <- data_original[ (data_original$year == 1990 &
> > data_original$category == 5) == FALSE, ]
> > dim(data_original)
> >
> > #aggregate data
> > data_aggregate_sum_by_year_and_category <- aggregate(x =
> > data_original$sales, by = list(year = data_original$year, category =
> > data_original$category), FUN = sum)
> > colnames(data_aggregate_sum_by_year_and_category) <- c('year',
> > 'category', 'sum_of_sales')
> > dim(data_aggregate_sum_by_year_and_category)
> >
> > data_expanded <- expand.grid(year =
> > unique(data_aggregate_sum_by_year_and_category$year), category =
> > unique(data_aggregate_sum_by_year_and_category$category))
> > dim(data_expanded)
> > data_expanded <- merge(data_expanded,
> > data_aggregate_sum_by_year_and_category, all = TRUE)
> > dim(data_expanded)
> >
> > mat <- matrix(data = data_expanded$sum_of_sales, nrow =
> > length(unique(data_expanded$year)), ncol =
> > length(unique(data_expanded$category)) , byrow = TRUE, dimnames =
> > list( unique(data_expanded$year), unique(data_expanded$category) ) )
> >
> >
> > data_original
> > data_expanded
> > mat
> >
> > On Fri, Oct 21, 2022 at 5:03 PM Kelly Thompson <kt1572757 using gmail.com> wrote:
> > >
> > > ###
> > > #I have data presented in a "vertical" data frame as shown below in
> > > data_original.
> > > #I want this data in a matrix or "grid", as shown below.
> > > #What I show below seems like one way this can be done.
> > >
> > > #My question: Are there easier or better ways to do this, especially
> > > in Base R, and also in R packages?
> > >
> > > #reproducible example
> > >
> > > data_original <- data.frame(year = c('1990', '1999', '1990', '1989'),
> > > size = c('s', 'l', 'xl', 'xs'), n = c(99, 33, 3, 4) )
> > >
> > > data_expanded <- expand.grid(unique(data_original$year),
> > > unique(data_original$size), stringsAsFactors = FALSE )
> > > colnames(data_expanded) <- c('year', 'size')
> > > data_expanded <- merge(data_expanded, data_original, all = TRUE)
> > >
> > > mat <- matrix(data = data_expanded $n, nrow =
> > > length(unique(data_expanded $year)), ncol =
> > > length(unique(data_expanded $size)) , byrow = TRUE, dimnames = list(
> > > unique(data_expanded$year), unique(data_expanded$size) ) )
> > >
> > > data_original
> > > data_expanded
> > > mat
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list