[R] missing handling

Don MacQueen macq at llnl.gov
Tue Oct 4 22:36:57 CEST 2005


At 8:35 PM +0100 10/4/05, Prof Brian Ripley wrote:
>On Tue, 4 Oct 2005, Weiwei Shi wrote:
>
>>  Hi, Jim:
>>  I tried your code and get the following error:
>>  trn1<-read.table('trn1.svm', header=F, na.string='.', sep='|')
>>  Med<-apply(trn1, 2, median, na.rm=T)
>>  Ind<-which(is.na(trn1), arr.ind=T)
>>  trn1[Ind]<-Med[Ind[,'col']]
>>  Error in "[<-.data.frame"(`*tmp*`, Ind, value = c(1.00802124455,
>>  1.00802124455, :
>>  only logical matrix subscripts are allowed in replacement
>>
>>
>>  I cannot figure out why.
>
>Read the help for "[<-.data.frame" to be told the answer.
>
>A data frame (as given by read.table) is not a matrix, as the example
>presumably was.  Indexing whole matrices at once is efficient, but it
>hides loops for data frames.
>
>You will not do better than looping over columns for a data frame, but you
>certainly do not need to loop over rows which is very inefficient.
>Something like
>
>trn2 <- trn1
>for(i in names(trn2)) {
>      Med <- median(trn2[[i]], na.rm = TRUE)
>      trn2[i, is.na(trn2[[i]])] <- Med
>}
>

But exchange the indices, so that the row selector comes first and the
column name second:

    trn2[ is.na(trn2[[i]]) , i] <- Med
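
For completeness, here is a minimal sketch of the whole loop with the
indices in that order. The data frame below is a small made-up stand-in
for the poster's file, so its column names and values are assumptions for
illustration only:

```r
# Toy stand-in for the data returned by read.table(); values are invented
trn1 <- data.frame(V1 = c(1, NA, 3, 4),
                   V2 = c(NA, 2, 2, 10),
                   V3 = c(5, 6, 7, 8))

trn2 <- trn1
for (i in names(trn2)) {
    # Median of the observed (non-missing) values in column i
    Med <- median(trn2[[i]], na.rm = TRUE)
    # Rows first, column second: fill only the missing rows of column i
    trn2[is.na(trn2[[i]]), i] <- Med
}
```

The loop runs once per column rather than once per row, which is the
point of the suggestion quoted above.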


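If every column is numeric, another option (a sketch only, not something
proposed in this thread) is to convert the data frame to a matrix first.
The which(..., arr.ind = TRUE) approach quoted further down then applies
unchanged. The data here are again invented:

```r
# Invented all-numeric data frame standing in for the real data
trn1 <- data.frame(V1 = c(1, NA, 3, 4),
                   V2 = c(NA, 2, 2, 10))

m   <- as.matrix(trn1)                     # numeric matrix with the same column names
Med <- apply(m, 2, median, na.rm = TRUE)   # per-column medians
Ind <- which(is.na(m), arr.ind = TRUE)     # (row, col) positions of the NAs
m[Ind] <- Med[Ind[, "col"]]                # fill each NA with its column's median
trn2 <- as.data.frame(m)                   # back to a data frame if needed
```

This only makes sense when all columns share one type, since as.matrix()
coerces mixed columns to a common type.
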
>>
>>  Thanks for help,
>>
>>  On 9/27/05, jim holtman <jholtman at gmail.com> wrote:
>>>
>>>  Use 'which(...arr.ind=T)'
>>>  > x.1
>>>        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>>>   [1,]    6   10    3    4   10    7    9    8    4    10
>>>   [2,]    8    7    4    7    4    8    3   NA    3     4
>>>   [3,]    7    7   10   10    3    5    3    2    2     2
>>>   [4,]    3    4    5   10   10    2    6    9    4     5
>>>   [5,]    3    5    9    5    6   NA    3   NA    6     7
>>>   [6,]    9    6   10    5   10    4    2   10   NA     5
>>>   [7,]    5    2    5   10    3    7    6    4    6     8
>>>   [8,]    2    6    1    8    9    2    7    8    3     8
>>>   [9,]    9    1    4    9    8   10    2   NA    1     7
>>>  [10,]    2    4    8    7   NA    4    3   NA    5     5
>>>  > x.4
>>>  [1] 5.5 5.5 5.0 7.5 8.0 5.0 3.0 8.0 4.0 6.0
>>>  > Med <- apply(x.1, 2, median, na.rm=T) # get median
>>>  > Ind <- which(is.na(x.1), arr.ind=T) # determine which are NA
>>>  > x.1[Ind] <- Med[Ind[,'col']] # replace with median
>>>  > x.1
>>>        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>>>   [1,]    6   10    3    4   10    7    9    8    4    10
>>>   [2,]    8    7    4    7    4    8    3    8    3     4
>>>   [3,]    7    7   10   10    3    5    3    2    2     2
>>>   [4,]    3    4    5   10   10    2    6    9    4     5
>>>   [5,]    3    5    9    5    6    5    3    8    6     7
>>>   [6,]    9    6   10    5   10    4    2   10    4     5
>>>   [7,]    5    2    5   10    3    7    6    4    6     8
>>>   [8,]    2    6    1    8    9    2    7    8    3     8
>>>   [9,]    9    1    4    9    8   10    2    8    1     7
>>>  [10,]    2    4    8    7    8    4    3    8    5     5
>>>  >
>>>
>>>
>>>   On 9/27/05, Weiwei Shi <helprhelp at gmail.com> wrote:
>>>
>>>>  Hi,
>>>>  I have the following code to replace missing values with the median,
>>>>  assuming missing values occur only in continuous variables:
>>>>
>>>>  trn1<-read.table('trn1.fv', header=F, na.string='.', sep='|')
>>>>
>>>>  # median
>>>>  m.trn1<-sapply(1:ncol(trn1), function(i) median(trn1[,i], na.rm=T))
>>>>
>>>>  #replace
>>>>  trn2<-trn1
>>>>  for (each in 1:nrow(trn1)){
>>>>  index.missing=which(is.na(trn1[each,]))
>>>>  trn2[each,]<-replace(trn1[each,], index.missing, m.trn1[index.missing])
>>>>  }
>>>>
>>>>
>>>>  Can anyone suggest ways to improve it? Replacing 10 rows takes 1.5 sec:
>>>>  > system.time(for (each in 1:10){
>>>>  +   index.missing=which(is.na(trn1[each,]))
>>>>  +   trn2[each,]<-replace(trn1[each,], index.missing, m.trn1[index.missing])
>>>>  + })
>>>>  [1] 1.53 0.00 1.53 0.00 0.00
>>>>
>>>>
>>>>  Another general question: are there any R packages for handling
>>>>  missing values?
>>>>
>>>>  Thanks,
>>>>
>>>>  --
>>>>  Weiwei Shi, Ph.D
>>>>
>>>>  "Did you always know?"
>>>>  "No, I did not. But I believed..."
>>>>  ---Matrix III
>>>>
>>>>  ______________________________________________
>>>>  R-help at stat.math.ethz.ch mailing list
>>>>  https://stat.ethz.ch/mailman/listinfo/r-help
>>>>  PLEASE do read the posting guide!
>>>>  http://www.R-project.org/posting-guide.html
>>>>
>>>
>>>
>>>
>>>  --
>>>  Jim Holtman
>>>  Cincinnati, OH
>>>  +1 513 247 0281
>>>
>>>  What is the problem you are trying to solve?
>>
>>
>>
>
>--
>Brian D. Ripley,                  ripley at stats.ox.ac.uk
>Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
>University of Oxford,             Tel:  +44 1865 272861 (self)
>1 South Parks Road,                     +44 1865 272866 (PA)
>Oxford OX1 3TG, UK                Fax:  +44 1865 272595


-- 
--------------------------------------
Don MacQueen
Environmental Protection Department
Lawrence Livermore National Laboratory
Livermore, CA, USA



