[R] R help

Jim Lemon jim at bitwrit.com.au
Thu Aug 1 03:07:12 CEST 2013


On 07/31/2013 10:03 PM, Mª Teresa Martinez Soriano wrote:
> Hi
>
> First of all, thanks for this service; it has been very useful to me. I am new to R, so I have a lot of questions.
>
> I need to impute missing values in a data set. Here is a sample of it:
>
>
>   NUMERO      Data1      Data2 IE.2003 IE.2004 IE.2005 IE.2006 IE.2007 IE.2008 IE.2009 IE.2010
> 20    133 30/09/2002 18/06/2013     153     279     289     370     412     262     115      75
> 21    138 11/07/2002 13/05/2009    5460    7863    8365   12009   16763      NA      NA      NA
> 22    146 16/10/2009 18/06/2013      NA      NA      NA      NA      NA      NA      NA      35
> 23    152 27/05/1999 18/06/2013      NA      80      77      60      89     137     144     146
> 24    154 21/12/2004 18/06/2013      NA      NA     148     186     302     233     194     204
> 25    166  8/02/2008 18/06/2013      NA      NA      NA      NA      NA      NA      98     160
> 26    177 20/02/1996 18/06/2013      16       4      NA       3       3      NA       5       5
>
>
> The problem is that some cells have to be empty, depending on Data1 and Data2.
>
> For instance, in the third row Data1 is 16/10/2009, so I shouldn't have any information before 2009. Therefore IE.2003, IE.2004, IE.2005, IE.2006, IE.2007 and IE.2008 have to be completely empty, but this doesn't mean that they are missing values; in fact, they are not. I don't want any imputation in these cells.
>
> IE.2009 and IE.2010 have to be filled. IE.2010 is (35), but IE.2009 is not, so that cell is a missing value and I want an imputed value for it. (I would delete this row, because it is impossible to get any information about it, but it is OK for this example.)
>
> On the other hand, the NAs in the last row are real missing values.
>
>
>
> How can I mark these cells as empty so that they don't get imputed values?
>
> I have tried using NaN, but it causes problems in some functions that I need to run before the imputation.
>
Hi Teresa,
I didn't see an answer to this, so I'll offer a couple of suggestions. 
First, NA is probably the best thing to have in your "empty" cells. If 
you change the NA cells to "", the columns will stop being numeric (they 
become character, or factors if the data are read in with 
stringsAsFactors = TRUE), and if you then convert the values back to 
numeric, the blanks will become NAs again.
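A minimal sketch of that round trip on a toy column (the name x is illustrative; the values are borrowed from IE.2009 in the sample):

```r
# Toy numeric column with one missing value (values taken from IE.2009)
x <- data.frame(IE.2009 = c(115, NA, 144))

# Replace the NA with an empty string: the whole column is coerced to character
x$IE.2009[is.na(x$IE.2009)] <- ""
blank_class <- class(x$IE.2009)
print(blank_class)       # "character"

# Converting back to numeric turns the blank into NA again
x$IE.2009 <- as.numeric(x$IE.2009)
print(x$IE.2009)         # 115  NA 144
```

If the column is a factor instead (for example, read with read.csv() and stringsAsFactors = TRUE, the default before R 4.0.0), as.numeric() returns the internal level codes rather than the numbers, so convert with as.numeric(as.character(f)).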

I would get a set of index vectors (row numbers, from which()) for the 
cells you _don't_ want to impute, i.e. NA cells in years before the 
record's start year in Data1 (say your data frame is tmsdf):

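# Start year of each record: the yyyy part of Data1 ("dd/mm/yyyy").
# as.character() guards against Data1 having been read in as a factor.
startyear<-as.numeric(sapply(strsplit(as.character(tmsdf$Data1),"/"),"[",3))
# Cells that are NA only because the record had not started yet that year
dontimpute2003<-which(startyear > 2003 & is.na(tmsdf$IE.2003))
dontimpute2004<-which(startyear > 2004 & is.na(tmsdf$IE.2004))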
...
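Rather than writing one dontimpute vector per year by hand, the same index vectors can be built for every IE.* column in a loop. This is a sketch assuming the column names follow the IE.yyyy pattern and Data1 is a dd/mm/yyyy string; the sample rows from the question are re-entered as tmsdf, and the names startyear, iecols, ieyears and dontimpute are illustrative:

```r
# Sample rows from the question
tmsdf <- data.frame(
  NUMERO  = c(133, 138, 146, 152, 154, 166, 177),
  Data1   = c("30/09/2002", "11/07/2002", "16/10/2009", "27/05/1999",
              "21/12/2004", "8/02/2008", "20/02/1996"),
  Data2   = c("18/06/2013", "13/05/2009", "18/06/2013", "18/06/2013",
              "18/06/2013", "18/06/2013", "18/06/2013"),
  IE.2003 = c(153, 5460, NA, NA, NA, NA, 16),
  IE.2004 = c(279, 7863, NA, 80, NA, NA, 4),
  IE.2005 = c(289, 8365, NA, 77, 148, NA, NA),
  IE.2006 = c(370, 12009, NA, 60, 186, NA, 3),
  IE.2007 = c(412, 16763, NA, 89, 302, NA, 3),
  IE.2008 = c(262, NA, NA, 137, 233, NA, NA),
  IE.2009 = c(115, NA, NA, 144, 194, 98, 5),
  IE.2010 = c(75, NA, 35, 146, 204, 160, 5),
  stringsAsFactors = FALSE
)

# Start year of each record, from the "dd/mm/yyyy" string in Data1
startyear <- as.numeric(sapply(strsplit(as.character(tmsdf$Data1), "/"), "[", 3))

# The yearly columns and the year each one refers to
iecols  <- grep("^IE\\.[0-9]{4}$", names(tmsdf), value = TRUE)
ieyears <- as.numeric(sub("^IE\\.", "", iecols))

# Named list with one vector of row numbers per column: rows whose NA is
# there only because the record had not started yet in that year
dontimpute <- Map(function(col, yr) which(startyear > yr & is.na(tmsdf[[col]])),
                  iecols, ieyears)

print(dontimpute$IE.2003)   # row numbers 3, 5 and 6
```

Each element of dontimpute plays the role of one of the per-year dontimpute2003, dontimpute2004, ... vectors.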

then do your imputation on the entire data frame and reset the ones you 
don't want imputed to NA:

tmsdf$IE.2003[dontimpute2003]<-NA
...
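Putting the steps together, the sketch below records the "empty by design" cells before imputing, imputes, and then resets them. Two assumptions go beyond the reply: cells in years after the Data2 year are also treated as empty (the question says emptiness depends on both Data1 and Data2), and the Data2 year itself still counts as observed. The column-mean imputation in impute_mean() is a hypothetical placeholder, not a recommendation; substitute whatever imputation method you actually use. Keeping the mask as a logical matrix lets one assignment reset every column at once.

```r
# Sample rows from the question
tmsdf <- data.frame(
  NUMERO  = c(133, 138, 146, 152, 154, 166, 177),
  Data1   = c("30/09/2002", "11/07/2002", "16/10/2009", "27/05/1999",
              "21/12/2004", "8/02/2008", "20/02/1996"),
  Data2   = c("18/06/2013", "13/05/2009", "18/06/2013", "18/06/2013",
              "18/06/2013", "18/06/2013", "18/06/2013"),
  IE.2003 = c(153, 5460, NA, NA, NA, NA, 16),
  IE.2004 = c(279, 7863, NA, 80, NA, NA, 4),
  IE.2005 = c(289, 8365, NA, 77, 148, NA, NA),
  IE.2006 = c(370, 12009, NA, 60, 186, NA, 3),
  IE.2007 = c(412, 16763, NA, 89, 302, NA, 3),
  IE.2008 = c(262, NA, NA, 137, 233, NA, NA),
  IE.2009 = c(115, NA, NA, 144, 194, 98, 5),
  IE.2010 = c(75, NA, 35, 146, 204, 160, 5),
  stringsAsFactors = FALSE
)

# Year part of a "dd/mm/yyyy" string
getyear   <- function(d) as.numeric(sapply(strsplit(as.character(d), "/"), "[", 3))
startyear <- getyear(tmsdf$Data1)
endyear   <- getyear(tmsdf$Data2)

iecols  <- grep("^IE\\.[0-9]{4}$", names(tmsdf), value = TRUE)
ieyears <- as.numeric(sub("^IE\\.", "", iecols))

# Logical matrix (rows x IE columns): TRUE where the column year is before
# the start year or after the end year. Build it BEFORE imputing, because
# afterwards is.na() is FALSE everywhere.
outside    <- outer(startyear, ieyears, ">") | outer(endyear, ieyears, "<")
dontimpute <- outside & is.na(tmsdf[iecols])

# Hypothetical placeholder imputation: fill each column's NAs with its mean
impute_mean <- function(df) {
  df[] <- lapply(df, function(v) { v[is.na(v)] <- mean(v, na.rm = TRUE); v })
  df
}
imputed <- tmsdf
imputed[iecols] <- impute_mean(imputed[iecols])

# Reset the cells that are empty by design back to NA
imputed[iecols][dontimpute] <- NA
```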

Jim


