[R] R help
Jim Lemon
jim at bitwrit.com.au
Thu Aug 1 03:07:12 CEST 2013
On 07/31/2013 10:03 PM, Mª Teresa Martinez Soriano wrote:
> Hi
>
> First of all, thanks for this service; it has been very useful to me. I am new to R, so I have a lot of questions.
>
> I have to do imputation on a data set. Here is a sample of it:
>
>
>    NUMERO      Data1      Data2 IE.2003 IE.2004 IE.2005 IE.2006 IE.2007 IE.2008 IE.2009 IE.2010
> 20    133 30/09/2002 18/06/2013     153     279     289     370     412     262     115      75
> 21    138 11/07/2002 13/05/2009    5460    7863    8365   12009   16763      NA      NA      NA
> 22    146 16/10/2009 18/06/2013      NA      NA      NA      NA      NA      NA      NA      35
> 23    152 27/05/1999 18/06/2013      NA      80      77      60      89     137     144     146
> 24    154 21/12/2004 18/06/2013      NA      NA     148     186     302     233     194     204
> 25    166  8/02/2008 18/06/2013      NA      NA      NA      NA      NA      NA      98     160
> 26    177 20/02/1996 18/06/2013      16       4      NA       3       3      NA       5       5
>
>
> The problem is that I have cells which have to be empty; this depends on Data1 and Data2.
>
> For instance, in the third row you can see that Data1 is equal to 16/10/2009, so I should not
> have any information until the year 2009. Therefore IE.2003, IE.2004, IE.2005, IE.2006, IE.2007
> and IE.2008 have to be totally empty, but this doesn't mean that they are missing values; in
> fact they are not. I don't want to get any imputation in these cells.
>
> IE.2009 and IE.2010 have to be full and they are not, so these cells are missing values and I want to get imputed values for them. (I would delete this row, because it is impossible to get any information about it, but it is OK for this example.)
>
> On the other hand, in the last row NA is a real missing value.
>
> How can I specify that these cells are empty and should not get imputed values?
>
> I have tried to put NaN, but then I have problems with some functions that I need to run before the
> imputation.
>
Hi Teresa,
I didn't see an answer to this, so I'll offer a couple of suggestions.
First, NA is probably the best thing to have in your "empty" cells. If
you change the NA cells to "", the columns will no longer be numeric
(they become character, or factors if the data are read in that way),
and if you then change the values back to numeric, the blanks will
become NAs again.
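A small sketch of that round trip, using a made-up vector x (the values
are the last row of the sample, but x itself is only for illustration).
A plain assignment of "" gives a character vector; a factor is what you
would typically get when such a column is read from a file with
stringsAsFactors = TRUE.

```r
# x is a hypothetical stand-in for one of the IE.* columns
x <- c(16, 4, NA, 3)

# Putting "" into a numeric vector coerces the whole vector to character
x[is.na(x)] <- ""
class(x)

# Converting back to numeric turns the blanks into NA again
y <- as.numeric(x)
is.na(y)
```

So the blanks buy nothing over leaving the cells as NA in the first place.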
I would build a set of index vectors that indicate which cells you
_don't_ want to impute (say your data frame is tmsdf):
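# year in which each record starts, taken from the dd/mm/yyyy string in Data1
startyear<-as.numeric(sapply(strsplit(as.character(tmsdf$Data1),"/"),"[",3))
# rows that start after the given year and so must stay empty for that year
dontimpute2003<-which(startyear > 2003 & is.na(tmsdf$IE.2003))
dontimpute2004<-which(startyear > 2004 & is.na(tmsdf$IE.2004))
...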
then do your imputation on the entire data frame and reset the ones you
don't want imputed to NA:
tmsdf$IE.2003[dontimpute2003]<-NA
...
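The same idea can be written once for all the year columns instead of
one vector per year. What follows is only a sketch under some
assumptions: the toy tmsdf holds three rows copied from your sample,
a cell counts as "empty on purpose" when the start year in Data1 is
later than the column's year (as you describe for the third row), and
the column-median fill is merely a placeholder for whatever imputation
method you actually use. The names iecols, colyears, structural and
imputed are made up for the example. Data2 is not handled here.

```r
# Toy data: three rows copied from the sample in the question
tmsdf <- data.frame(
  NUMERO  = c(146, 152, 177),
  Data1   = c("16/10/2009", "27/05/1999", "20/02/1996"),
  Data2   = c("18/06/2013", "18/06/2013", "18/06/2013"),
  IE.2008 = c(NA, 137, NA),
  IE.2009 = c(NA, 144, 5),
  IE.2010 = c(35, 146, 5),
  stringsAsFactors = FALSE
)

# Start year of each record, from the dd/mm/yyyy string in Data1
startyear <- as.numeric(sapply(strsplit(as.character(tmsdf$Data1), "/"), "[", 3))

# The IE.* columns and the year each one refers to
iecols   <- grep("^IE\\.", names(tmsdf), value = TRUE)
colyears <- as.numeric(sub("^IE\\.", "", iecols))

# TRUE where the record starts after the column's year, so the cell must stay empty
structural <- outer(startyear, colyears, ">")
colnames(structural) <- iecols

# Placeholder imputation (NOT a recommendation): fill every NA with the column median
imputed <- tmsdf
for (col in iecols) {
  miss <- is.na(imputed[[col]])
  imputed[[col]][miss] <- median(tmsdf[[col]], na.rm = TRUE)
}

# Reset the cells that were never supposed to hold a value
for (col in iecols) {
  imputed[[col]][structural[, col]] <- NA
}
imputed
```

In this toy case IE.2008 for NUMERO 146 goes back to NA, while its
IE.2009 and the IE.2008 of NUMERO 177 keep their filled-in values.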
Jim