[R] Detect and replace omitted data

David Winsemius dwinsemius at comcast.net
Tue Oct 18 21:19:55 CEST 2011


On Oct 18, 2011, at 2:53 PM, Dennis Murphy wrote:

> Prompted by David's xtabs() suggestion, one way to do what I think the
> OP wants is to
> * define day and unit as factors whose levels comprise the full range
> of desired values;
> * use xtabs();
> * return the result as a data frame.
> Something like
>
> x <- data.frame( day = factor(rep(c(4, 6), each = 8), levels = 4:6),
>                 unit = factor(c(1:8, seq(2,16,2)), levels = 1:16),
>                 value = floor(rnorm(16,25,10)) )
> as.data.frame(with(x, xtabs(value ~ unit + day)))

Oh, ... sometimes I'm "slow". Dennis' code has its virtues, but
sometimes people want to avoid factors. One could also create a numeric
matrix of zeroes that covers the missing day and unit values and rbind
it to the analysis matrix, right in the data= argument to xtabs:

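  # day, unit and x are the objects from the original post.
  # cbind() recycles the shorter 'day' column and warns about it; the
  # warning can be ignored, because every day and every unit only needs
  # to appear at least once for xtabs() to give it a row or column.
  zeroes <- cbind(day   = seq(min(day),  max(day),  by = 1),
                  unit  = seq(min(unit), max(unit), by = 1),
                  value = 0)

  xtabs(value ~ day + unit, data = rbind(x, zeroes))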
    unit
day  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
   4 25 34  3 25 38 18 19 33  0  0  0  0  0  0  0  0
   5  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   6  0 22  0 42  0 37  0  4  0 12  0 31  0 17  0 28
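
A sketch of the same zero-padding idea that avoids the recycling
warning, assuming the day/unit/value vectors from the original post.
The set.seed() call and the names tab and long are my additions for
illustration; expand.grid() builds one all-zero row per day/unit pair:

```r
set.seed(1)  # assumption: fixed seed, only so the example is repeatable
day   <- c(rep(4, 8), rep(6, 8))
unit  <- c(1:8, seq(2, 16, 2))
value <- floor(rnorm(16, 25, 10))
x <- cbind(day, unit, value)

# One zero-valued row for every day/unit combination in the full range.
zeroes <- as.matrix(expand.grid(day   = min(day):max(day),
                                unit  = min(unit):max(unit),
                                value = 0))

# Adding zeroes does not change any sum, but forces every cell to exist.
tab <- xtabs(value ~ day + unit, data = rbind(x, zeroes))

# Long format: one row per day/unit pair, columns day, unit, value.
long <- as.data.frame(tab, responseName = "value")
```

If only the observed days are wanted (no row for day 5), using
unique(day) in place of min(day):max(day) should do it.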


-- 
David.


>
> HTH,
> Dennis
>
> On Tue, Oct 18, 2011 at 11:33 AM, David Winsemius
> <dwinsemius at comcast.net> wrote:
>>
>> On Oct 18, 2011, at 2:24 PM, Sarah Goslee wrote:
>>
>>> Hi Jonny,
>>>
>>> On Tue, Oct 18, 2011 at 1:02 PM, Jonny Armstrong
>>> <jonny5armstrong at gmail.com> wrote:
>>>>
>>>> I am analyzing the spatial distribution of fish in a stream. The stream
>>>> is divided into equally sized units, and the number of fish in each unit
>>>> is counted. My problem is that my dataset is missing rows where the count
>>>> in a unit equals zero. I need to create zero data for the missing units.
>>>>
>>>> For example:
>>>> day<-(c(rep(4,8),rep(6,8)))
>>>> unit<-c(seq(1,8,1),seq(2,16,2))
>>>> value<-floor(rnorm(16,25,10))
>>>> x<-cbind(day,unit,value)
>>>
>>> Thanks for the actual reproducible example.
>>>
>>>> x
>>>>     day unit value
>>>>  [1,]   4    1    19
>>>>  [2,]   4    2    15
>>>>  [3,]   4    3    16
>>>>  [4,]   4    4    20
>>>>  [5,]   4    5    17
>>>>  [6,]   4    6    15
>>>>  [7,]   4    7    14
>>>>  [8,]   4    8    29
>>>>  [9,]   6    2    18
>>>> [10,]   6    4    22
>>>> [11,]   6    6    27
>>>> [12,]   6    8    16
>>>> [13,]   6   10    45
>>>> [14,]   6   12    36
>>>> [15,]   6   14    34
>>>> [16,]   6   16    13
>>>>
>>>> Let's say the stream has 16 units. For each day, I want to fill in rows
>>>> for any missing units (e.g., units 9-16 for day 4, the odd-numbered units
>>>> on day 6) with values of zero.
>>
>> I could not figure out what you wanted precisely. If "day" is the row
>> designator, and you want values by 'unit' and 'day' with zeros for the
>> missing, then that is exactly what `xtabs` delivers:
>>
>>> xtabs(value ~ day+unit, data=x)
>>   unit
>> day  1  2  3  4  5  6  7  8 10 12 14 16
>>  4 25 34  3 25 38 18 19 33  0  0  0  0
>>  6  0 22  0 42  0 37  0  4 12 31 17 28
>>
>> You cannot get much more concise than that.
>>
>> --
>> david.
>>>
>>> Here's one option, though it may not be terribly concise:
>>>
>>> all.samples <- expand.grid(day=unique(x[,"day"]), unit=1:16)
>>> all.samples <- all.samples[order(all.samples[,"day"],
>>> all.samples[,"unit"]),]
>>> x.final <- merge(x, all.samples, all.y=TRUE)
>>> x.final[is.na(x.final[,"value"]), "value"] <- 0
>>>
>>> Sarah
>>>
>>>> Does anyone know a relatively concise way to do this?
>>>> Thank you.
>>>>
>>>>
>>>
>>> --
>>> Sarah Goslee
>>> http://www.functionaldiversity.org
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> David Winsemius, MD
>> West Hartford, CT
>>

David Winsemius, MD
West Hartford, CT


