[R] imputing the numerical columns of a dataframe, returning the rest unchanged
Yihui Xie
xieyihui at gmail.com
Wed Dec 24 06:46:24 CET 2008
Hi,
?sapply will tell you
....
'sapply' is a user-friendly version of 'lapply' by default
returning a vector or matrix if appropriate.
....
so each column 'x' loses its class once sapply() simplifies the result; e.g.
## iris is a data.frame
> str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
## but sapply() will coerce it into a numeric matrix
> str(sapply(iris, function(x)x))
num [1:150, 1:5] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" ...
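For contrast, here is a small sketch (not from the original thread) of what happens with lapply(). It returns a list, so each column keeps its own class, and assigning that list into `out[]` preserves the data.frame structure. The names `m`, `l` and `out` are only illustrative.

```r
# sapply() simplifies to a matrix, so all columns must share one type
m <- sapply(iris, function(x) x)
is.matrix(m)

# lapply() returns a list; each element keeps its own class
l <- lapply(iris, function(x) x)
sapply(l, class)

# Assigning the list into out[] keeps the data.frame and its column classes
out <- iris
out[] <- lapply(iris, function(x) x)
str(out)
```

With a character column in the data, the same simplification gives a character matrix instead, which is what happened in the question quoted below.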
I'd suggest you find the numeric columns first (e.g. with
sapply(DF, is.numeric), which also catches integer columns that
sapply(DF, class) == "numeric" would miss), then apply impute() to
these columns only and assign the new values to the original columns.
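A minimal sketch of that suggestion, assuming a small made-up data frame and a hypothetical helper impute_median() that stands in for impute() (which lives in an add-on package and is not loaded here). It uses is.numeric() rather than a class comparison to pick the columns.

```r
# Made-up data frame mixing factors, a character column and numerics
DF <- data.frame(
  day     = factor(c(4, 4, 4)),
  version = factor(c("A", "A", "B")),
  ID      = c("1a", "2a", "3a"),
  V1      = c(1, NA, 3),
  V2      = c(5, 3, NA),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for impute(): replace NAs with the column median
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Logical vector with one entry per column: TRUE for numeric columns
num <- sapply(DF, is.numeric)

# Impute only the numeric columns and write them back in place
DF[num] <- lapply(DF[num], impute_median)
str(DF)
```

Because only the selected columns are reassigned, the factor and character columns are never touched, so nothing gets coerced.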
Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China
On Mon, Dec 22, 2008 at 11:38 PM, Mark Heckmann <mark.heckmann at gmx.de> wrote:
> Hi R-experts,
>
> how can I apply a function to each numeric column of a data frame and return
> the whole data frame with changes in numeric columns only?
> In my case I want to do a median imputation of the numeric columns and
> retain the other columns. My dataframe (DF) contains factors, characters and
> numerics.
>
> I tried the following but that does not work:
>
> foo <- function(x){
> if(is.numeric(x)==TRUE) return(impute(x))
> else(return(x))
> }
>
> sapply(DF, foo)
>
> day version ID V1 V2 V3
> [1,] "4" "A" "1a" "1" "5" "5"
> [2,] "4" "A" "2a" "2" "3" "5"
> [3,] "4" "B" "3a" "3" "5" "5"
>
> All the variables are coerced to characters now ("day" and "version" were
> factors, "id" a character). I only want imputations on the numerics, but the
> rest to be returned unchanged.
>
> Is there a function available? If not, how can I do it?
>
> TIA and merry x-mas,
> Mark