[R] weight median by count for multiple records

Kirsten Beyer kirsten-beyer at uiowa.edu
Thu Jul 30 19:58:47 CEST 2009


Hello everyone,

I have a .csv file with the following format:

uniqueID  SubjectID  Distance_miles  Tag
1         1001       5.5             3
2         1001       7               1
3         1001       6.5             1
4         1001       5               1
5         1002       2               2
6         1002       2               2
7         1002       1.5             2
8         1003       15              2
9         1003       17              2
10        1003       18              2


For each SubjectID, I want to calculate the median distance, where the
Tag variable indicates the number of times that distance was recorded.
My final output table would be:

SubjectID   Median Distance
1001                5.5
1002                2
1003                17

I have used the following script to calculate the median for a data
frame where each recorded distance has its own row. Here temp is a
data frame containing each unique SubjectID, and routes is the data I
describe above.

# For each subject in temp, take the median of that subject's distances in routes.
for (i in seq_len(nrow(temp))) {
  temp$mediandistance[i] <-
    median(routes$Distance_miles[routes$SubjectID == temp$SubjectID[i]])
}
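
For reference, the same per-subject median can probably be computed
without an explicit loop. The sketch below assumes a small made-up
data frame, here called routes_long, with one row per recorded
distance; the name routes_long and its values are illustrative only
and are not my real data.

```r
# Hypothetical long-format data: one row per recorded distance.
routes_long <- data.frame(
  SubjectID      = c(1001, 1001, 1001, 1002, 1002, 1003),
  Distance_miles = c(5.5, 5.5, 7, 2, 1.5, 15)
)

# One median per SubjectID, computed by aggregate() instead of a for loop.
temp <- aggregate(Distance_miles ~ SubjectID, data = routes_long, FUN = median)

# Rename the result column to match the name used in the loop version.
names(temp)[2] <- "mediandistance"
print(temp)
```

aggregate() returns one row per SubjectID, so there is no need to
build temp beforehand.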

I am interested to know...
(1) Is there a way to incorporate a weighted median into this script,
where the weights are the number of times each distance is recorded?
(2) Can I transform my current table into one that gives each
recorded distance its own row?
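
To make the two questions concrete, here is a minimal sketch of one
possible approach, assuming Tag is always a non-negative whole number.
It repeats each row Tag times with rep() and then takes an ordinary
median per subject, which for integer counts is the same as a
count-weighted median. The names expanded, result and MedianDistance
are placeholders chosen for this example.

```r
# The data shown in the table above.
routes <- data.frame(
  uniqueID       = 1:10,
  SubjectID      = c(1001, 1001, 1001, 1001, 1002, 1002, 1002, 1003, 1003, 1003),
  Distance_miles = c(5.5, 7, 6.5, 5, 2, 2, 1.5, 15, 17, 18),
  Tag            = c(3, 1, 1, 1, 2, 2, 2, 2, 2, 2)
)

# (2) Give each recorded distance its own row: repeat row i Tag[i] times.
expanded <- routes[rep(seq_len(nrow(routes)), times = routes$Tag), ]

# (1) The plain median of the expanded distances is the count-weighted median.
result <- aggregate(Distance_miles ~ SubjectID, data = expanded, FUN = median)
names(result)[2] <- "MedianDistance"
print(result)
```

On the example data this should reproduce the output table I am
looking for (5.5, 2 and 17).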

Help is much appreciated,
Kirsten Beyer



