[R] query about counting rows of a dataframe
David Winsemius
dwinsemius at comcast.net
Thu Nov 3 22:40:19 CET 2011
On Nov 3, 2011, at 12:28 PM, Stefano Sofia wrote:
> Dear R users,
> I have got the following data frame, called my_df:
>
>    gender day_birth month_birth year_birth labour
> 1       F        22          10       2001      1
> 2       M        29          10       2001      2
> 3       M         1          11       2001      1
> 4       F         3          11       2001      1
> 5       M         3          11       2001      2
> 6       F         4          11       2001      1
> 7       F         4          11       2001      2
> 8       F         5          12       2001      2
> 9       M        22          14       2001      2
> 10      F        29          13       2001      2
> ...
>
> I need to count data in different ways:
>
> 1. count the births for each day (having 0 when necessary)
> independently from the value of the "labour" column
xtabs sometimes gives better results. If you want all 31 days, then make
day_birth a factor with levels=1:31.
> xtabs( ~ day_birth + month_birth + year_birth, data=dat)
, , year_birth = 2001

         month_birth
day_birth 10 11 12 13 14
       1   0  1  0  0  0
       3   0  2  0  0  0
       4   0  2  0  0  0
       5   0  0  1  0  0
       22  1  0  0  0  1
       29  1  0  0  1  0
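
A minimal sketch of the factor suggestion, assuming `dat` holds just the ten rows posted in the question (the name `day_f` is made up for illustration):

```r
# Reconstruction of the ten posted rows; assumed to be the whole of `dat`
dat <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = 2001,
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# Declaring all 31 levels keeps the days without births as rows of zeros
dat$day_f <- factor(dat$day_birth, levels = 1:31)

tab <- xtabs(~ day_f + month_birth + year_birth, data = dat)
dim(tab)   # days x months x years
```

The months and years could be turned into factors with explicit levels in the same way if empty months or years should also appear.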
>
> 2. count the births for each day (having 0 when necessary), divided
> by the value of "labour" (which can have two values, 1 or 2)
I cannot figure out what is being asked here. What should be done with the
two values? Just count them? This would give a partitioned count:
> xtabs( labour==1 ~ day_birth + month_birth , data=dat)
         month_birth
day_birth 10 11 12 13 14
       1   0  1  0  0  0
       3   0  1  0  0  0
       4   0  1  0  0  0
       5   0  0  0  0  0
       22  1  0  0  0  0
       29  0  0  0  0  0
> xtabs( labour==2 ~ day_birth + month_birth , data=dat)
         month_birth
day_birth 10 11 12 13 14
       1   0  0  0  0  0
       3   0  1  0  0  0
       4   0  1  0  0  0
       5   0  0  1  0  0
       22  0  0  0  0  1
       29  1  0  0  1  0
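
If one table per labour value is what is wanted, a possible alternative is to put labour on the right-hand side so that it becomes a third dimension. This is only a sketch, again assuming `dat` is the ten posted rows (the name `tab_lab` is made up):

```r
# Reconstruction of the ten posted rows; assumed to be the whole of `dat`
dat <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = 2001,
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# One day-by-month slice per value of labour
tab_lab <- xtabs(~ day_birth + month_birth + labour, data = dat)

tab_lab[, , "1"]   # births with labour == 1
tab_lab[, , "2"]   # births with labour == 2
```

The slices are indexed by the labour value as a character string because xtabs converts the classifying variables to factors.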
>
> 3. count the births for each day of all the years (i.e. the 22nd of
> October of all the years present in the data frame) independently
> from the value of "labour"
If I understand correctly:
> xtabs( ~ day_birth + month_birth + year_birth, data=dat)
, , year_birth = 2001

         month_birth
day_birth 10 11 12 13 14
       1   0  1  0  0  0
       3   0  2  0  0  0
       4   0  2  0  0  0
       5   0  0  1  0  0
       22  1  0  0  0  1
       29  1  0  0  1  0
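
If the intent is instead to pool the counts over all years, one possible reading is to leave year_birth out of the formula, or to sum the three-way table over its year dimension. A sketch under the assumption that `dat` is the ten posted rows (which contain only the year 2001, so here the pooled table equals the single-year slice):

```r
# Reconstruction of the ten posted rows; assumed to be the whole of `dat`
dat <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = 2001,
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

# Counts per day and month, pooled over every year in the data
pooled <- xtabs(~ day_birth + month_birth, data = dat)

# Equivalent: build the three-way table, then sum out the year dimension
full    <- xtabs(~ day_birth + month_birth + year_birth, data = dat)
pooled2 <- margin.table(full, c(1, 2))
```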
>
> 4. count the births for each day of all the years (i.e. the 22nd of
> October of all the years present in the data frame), divided by the
> value of "labour"
Again, this is confusing. Do you mean to use separate tables for labour==1
and labour==2? Perhaps add some context to explain what these values
represent. Some of us are "concrete". The results of xtabs are tables and
can be divided like matrices.
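
As a guess at what "divided" might mean, here is a sketch of element-wise division of two xtabs results, giving the share of births with labour == 1 for each day and month. It assumes `dat` is the ten posted rows; the names `n_all`, `n_lab1` and `share` are made up:

```r
# Reconstruction of the ten posted rows; assumed to be the whole of `dat`
dat <- data.frame(
  gender      = c("F", "M", "M", "F", "M", "F", "F", "F", "M", "F"),
  day_birth   = c(22, 29, 1, 3, 3, 4, 4, 5, 22, 29),
  month_birth = c(10, 10, 11, 11, 11, 11, 11, 12, 14, 13),
  year_birth  = 2001,
  labour      = c(1, 2, 1, 1, 2, 1, 2, 2, 2, 2)
)

n_all  <- xtabs(~ day_birth + month_birth, data = dat)               # all births
n_lab1 <- xtabs(labour == 1 ~ day_birth + month_birth, data = dat)   # labour == 1 only

# Both tables have the same shape, so "/" works cell by cell;
# cells with no births at all come out as NaN (0/0)
share <- n_lab1 / n_all
share["3", "11"]   # one of the two births on 3 November had labour == 1
```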
>
> I tried with the command
>
> table(my_df$year_birth, my_df$month_birth, my_df$day_birth)
>
> which satisfies (partially) question number 1 (I am not able to have
> 0 in the not available days).
>
> Is there a smart way to do that without invoking too many loops?
>
> thank you for your help
David Winsemius, MD
Heritage Laboratories
West Hartford, CT