[R] XGBoost continuous outcome case --- reg:linear in R

Sandeep Rana sunnysingha.analytics at gmail.com
Tue Feb 9 07:18:23 CET 2016

While learning how to use XGBoost in R, I came across the case below and would like to know how to handle it.

Outcome variable: continuous
Independent features: mix of categorical and continuous
nrow(train_set): 8523

Since XGBoost natively supports only numeric features, I applied one-hot encoding to the training data set:

target <- train_set$Outlet_sales
sparsed_train_set <- sparse.model.matrix(~.-1, data=train_set)

nrow(sparsed_train_set)  # 4526; as expected, the row count is reduced.

Note: The target variable is continuous and has as many rows as train_set, i.e. 8523, the count before one-hot encoding is applied.
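The row disparity described above can be reproduced on a small scale. The sketch below is only an illustration and rests on an assumption the post does not confirm: that the rows disappear because sparse.model.matrix builds a model frame with R's default na.action (na.omit), which drops rows containing missing values. The data frame `toy` and its columns `Item_weight` and `Outlet_type` are hypothetical stand-ins for train_set.

```r
library(Matrix)

# Hypothetical stand-in for train_set (column names other than
# Outlet_sales are made up for this illustration).
toy <- data.frame(
  Outlet_sales = c(10.5, 20.1, 15.3, 30.2, 25.0, 18.7),
  Item_weight  = c(9.3, NA, 17.5, NA, 8.9, 12.0),
  Outlet_type  = factor(c("A", "B", "A", "C", "B", "C"))
)

# With the default na.action (na.omit), rows containing NA are dropped.
m_default <- sparse.model.matrix(Outlet_sales ~ . - 1, data = toy)
nrow(m_default)            # 4 here: the two rows with NA are gone
sum(complete.cases(toy))   # the same count, computed directly

# Option 1: subset the label to the rows that survived.
keep <- complete.cases(toy)
target_kept <- toy$Outlet_sales[keep]

# Option 2: keep every row by letting NAs pass through,
# then restore the previous setting.
old <- options(na.action = "na.pass")
m_all <- sparse.model.matrix(Outlet_sales ~ . - 1, data = toy)
options(old)
nrow(m_all)                # all rows retained
```

If the assumption holds, comparing sum(complete.cases(train_set)) with nrow(sparsed_train_set) should show the same number. Note also that the sketch puts Outlet_sales on the left-hand side of the formula so that the target does not end up as a feature column.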

# To build the model (the objective must be passed as a quoted string):
bst <- xgboost(data = sparsed_train_set, label = target, max.depth = 4,
               eta = 1, nthread = 4, nround = 50, objective = "reg:linear")

# The above call fails because the number of rows in sparsed_train_set (4526) does not match the length of target (8523).

My questions:
- How should I handle the above disparity between the sparse training data and the label when building the model?
- How should I use XGBoost to perform regression when the outcome is continuous? Most web resources cover only classification cases.
  If anyone could point me to a source explaining this, I would appreciate it. I have gone through the documentation, but it did not make this case clear to me.
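For reference, a regression fit on a continuous label might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not a definitive recipe: the data are simulated, the column names `Item_weight` and `Outlet_type` are hypothetical, and the objective is written as "reg:squarederror", the name that later xgboost releases use for the squared-error objective called "reg:linear" in this post. The parameter values are arbitrary.

```r
library(Matrix)
library(xgboost)

set.seed(1)
n <- 200

# Simulated, hypothetical data: one numeric and one categorical feature.
toy <- data.frame(
  Item_weight = runif(n, 5, 20),
  Outlet_type = factor(sample(c("A", "B", "C"), n, replace = TRUE))
)
toy$Outlet_sales <- 3 * toy$Item_weight +
  10 * (toy$Outlet_type == "B") + rnorm(n)

# One-hot encode the features only; the target stays out of the matrix.
x <- sparse.model.matrix(Outlet_sales ~ . - 1, data = toy)

# Bundle features and the continuous label together.
dtrain <- xgb.DMatrix(data = x, label = toy$Outlet_sales)

# Squared-error regression objective (assumed name, see note above).
params <- list(objective = "reg:squarederror",
               max_depth = 4, eta = 0.1, nthread = 1)
bst <- xgb.train(params = params, data = dtrain, nrounds = 50)

# Predictions are on the same continuous scale as the label.
pred <- predict(bst, dtrain)
rmse <- sqrt(mean((pred - toy$Outlet_sales)^2))
```

The rmse here is computed on the training data, so it only confirms that the fit ran; a held-out set would be needed to judge predictive quality.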

Sandeep S. Rana


More information about the R-help mailing list