Extreme Gradient Boosting (xgboost) is a very fast, scalable implementation of gradient boosting that has taken the data science world by storm, with xgboost regularly winning online data science competitions and being used at scale across different industries. Xgboost was originally developed by Tianqi Chen and is renowned for its execution speed and model performance. I have recently been conducting some experiments with xgboost for the Renewables AI products. Below I show how to run a simple regression-type model with tree and linear based learners. Thereafter we go on to explore grid search and random search with xgboost.
But first a little background information…
Boosting is what gives xgboost its state-of-the-art performance. Boosting is not a specific machine learning algorithm, but a concept that can be applied to a set of machine learning algorithms, hence boosting is known as a meta-algorithm. Essentially, xgboost is an ensemble method, used to convert many weak learners (models performing only slightly better than chance) into a strong learner. This is achieved via boosting: a set of weak learners is learnt iteratively on subsets of the data, and each weak learner is weighted according to its performance. Each weak learner's predictions are then multiplied by its weight and combined to obtain a final weighted prediction, which is better than any of the individual predictions themselves.
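To make the weighted-combination idea concrete, here is a minimal illustrative sketch (this is not xgboost's actual internals - the learner predictions and weights below are made-up numbers):
import numpy as np
# Hypothetical predictions from three weak learners (rows) for three samples (columns)
weak_preds = np.array([[2.0, 3.0, 4.0],
                       [2.5, 2.5, 3.5],
                       [1.5, 3.5, 4.5]])
# Hypothetical performance-based weights for the three learners (sum to 1)
weights = np.array([0.5, 0.3, 0.2])
# Each learner's predictions are multiplied by its weight and summed
ensemble_pred = weights @ weak_preds
print(ensemble_pred)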
The Python API is capable of running xgboost on regression and classification problems, using decision tree and linear learners. Below we apply xgboost to a regression-type problem using a tree-based learner. Decision trees are an iterative construction of binary decisions (one decision at a time) until a stopping criterion is met (i.e. the majority of one decision split consists of one category/value or another). Individual trees tend to overfit (low bias, high variance), hence they perform well on training data but don't generalise as well, which is where ensemble methods are useful. Notice that as this is a regression-type problem we use the loss function "reg:linear", whereas for a classification problem we would use "reg:logistic" or "binary:logistic" depending on whether you are interested in the class or the probability of the class. A loss function quantifies the difference between the actual and predicted values - we aim to find the model with the lowest loss.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error
import xgboost as xgb
from xgboost import plot_tree
X_train = pd.read_csv("X_train.csv")
Y_train = pd.read_csv("Y_train.csv")
X_test = pd.read_csv("X_test.csv")
Y_test = pd.read_csv("\Y_test.csv")
list(X_train)
list(X_test)
####################################
# XGBoost Decision Tree
###################################
xg_reg = xgb.XGBRegressor(objective='reg:linear', n_estimators=10, seed= 123)
xg_reg.fit(X_train,Y_train)
#XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
# colsample_bytree=1, gamma=0, learning_rate=0.1, max_delta_step=0,
# max_depth=3, min_child_weight=1, missing=None, n_estimators=10,
# n_jobs=1, nthread=None, objective='reg:linear', random_state=0,
# reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=123,
# silent=True, subsample=1)
preds = xg_reg.predict(X_test)
rmse = np.sqrt(mean_squared_error(Y_test, preds))
print("RMSE: %f" % (rmse))#RMSE: 164.866642
r2 = r2_score(Y_test, preds)
# Plot the first tree
xgb.plot_tree(xg_reg,num_trees=0)
plt.show()
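As an aside, the classification objectives mentioned above are used in exactly the same way; a minimal sketch on made-up toy data (X_clf and y_clf here are hypothetical, purely for illustration) could look like this:
# Hypothetical toy classification data
rng = np.random.RandomState(123)
X_clf = rng.rand(100, 5)
y_clf = (X_clf[:, 0] > 0.5).astype(int)
xg_clf = xgb.XGBClassifier(objective='binary:logistic', n_estimators=10, seed=123)
xg_clf.fit(X_clf, y_clf)
class_preds = xg_clf.predict(X_clf)        # predicted classes
class_probs = xg_clf.predict_proba(X_clf)  # predicted class probabilities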
In the following code we apply xgboost to a regression-type problem using linear base learners.
#####################################
# XGBoost Linear Regression
#####################################
DM_train = xgb.DMatrix(data=X_train, label=Y_train)
DM_test = xgb.DMatrix(data=X_test, label=Y_test)
params = {"booster":"gblinear", "objective":"reg:linear"}
xg_reg = xgb.train(params = params, dtrain=DM_train, num_boost_round=10)
preds = xg_reg.predict(DM_test)
rmse = np.sqrt(mean_squared_error(Y_test, preds))
print("RMSE: %f" % (rmse)) #RMSE: 169.731848
r2 = r2_score(Y_test, preds)
Like many other algorithms, performance can be enhanced by tuning the hyperparameters. Below is an example of an xgboost grid search. Grid search can be quite computationally expensive, as we exhaustively search over a given set of hyperparameters and pick the best-performing configuration. For example, if we have 2 hyperparameters to tune and 4 possible values for each, that's 16 possible parameter configurations. An alternative to grid search is random search, where you define how many models/iterations to try before stopping; during each iteration, the algorithm randomly selects a value in the range specified for each hyperparameter (an example of random search follows the grid search code below).
import pandas as pd
import xgboost as xgb
import numpy as np
from sklearn.model_selection import GridSearchCV
housing_data = pd.read_csv("ames_housing_trimmed_processed.csv")
X = housing_data[housing_data.columns.tolist()[:-1]]
y = housing_data[housing_data.columns.tolist()[-1]]
housing_dmatrix = xgb.DMatrix(data=X, label=y)
gbm_param_grid = {'learning_rate': [0.01, 0.1, 0.5, 0.9],
                  'n_estimators': [200],
                  'subsample': [0.3, 0.5, 0.9]}
gbm = xgb.XGBRegressor()
grid_mse = GridSearchCV(estimator=gbm,
                        param_grid=gbm_param_grid,
                        scoring='neg_mean_squared_error', cv=4, verbose=1)
grid_mse.fit(X,y)
print("Best parameters found: ", grid_mse.best_params_)
print("lowest RMSE: ", np.sqrt(np.abs(grid_mse.best_score_)))
A quick overview of the hyperparameters that can be tuned for tree-based models:
- eta/learning rate: how quickly the model fits residual error using additional base learners
- gamma: minimum loss reduction required to create a new tree split
- lambda: L2 regularisation on leaf weights
- alpha: L1 regularisation on leaf weights
- max_depth: how big a tree can grow
- subsample: fraction of the sample that can be used for any given boosting round
- colsample_bytree: the fraction of features that can be called on during any boosting round (ranges from 0-1)
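As an illustration, these tree hyperparameters are passed via the params dict of xgb.train; the values below are arbitrary assumptions (not recommendations), and housing_dmatrix is the DMatrix created in the grid search example above:
tree_params = {"booster": "gbtree",
               "objective": "reg:linear",
               "eta": 0.1,               # learning rate
               "gamma": 1,               # minimum loss reduction for a split
               "lambda": 1,              # L2 regularisation on leaf weights
               "alpha": 0,               # L1 regularisation on leaf weights
               "max_depth": 3,           # maximum tree depth
               "subsample": 0.9,         # fraction of rows per boosting round
               "colsample_bytree": 0.9}  # fraction of features per boosting round
xg_tree = xgb.train(params=tree_params, dtrain=housing_dmatrix, num_boost_round=10)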
An overview of hyperparameters that can be tuned for linear learners:
- lambda: L2 regularisation on weights
- alpha: L1 regularisation on weights
- lambda_bias: L2 regularisation term on bias
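And similarly for the linear booster (again, the values here are arbitrary illustrations):
linear_params = {"booster": "gblinear",
                 "objective": "reg:linear",
                 "lambda": 1,          # L2 regularisation on weights
                 "alpha": 0,           # L1 regularisation on weights
                 "lambda_bias": 0}     # L2 regularisation on the bias term
xg_lin = xgb.train(params=linear_params, dtrain=housing_dmatrix, num_boost_round=10)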
Other useful blog posts related to xgboost can be found here.
Happy experimenting!