Explainable T-learner Deep Learning Uplift Model Using Python Package CausalML

T-learner uplift models using XGBoost, LightGBM, and neural network model with feature importance and model interpretation
T-learner is a meta-learner that uses two machine learning models to estimate the individual-level heterogeneous causal treatment effect. In this tutorial, we will talk about how to use the python package causalml to build a T-learner. We will cover:
How to implement a T-learner using the XGBoost model, the LightGBM model, and the neural network model separately?
How to make individual treatment effect (ITE) and average treatment effect (ATE) estimations using a T-learner?
How to check the T-learner feature importance?
How to interpret a T-learner uplift model using SHAP?
If you are interested in building a T-learner manually, please check out my previous tutorial T Learner Uplift Model for Individual Treatment Effect (ITE) in Python.
Let’s get started!
Step 1: Install and Import Libraries
In step 1, we will install and import the python libraries.
Firstly, let’s install causalml.
Install package
!pip install causalml
After the installation is completed, we can import the libraries.
pandas and numpy are imported for data processing.
synthetic_data is imported for synthetic data creation.
XGBTRegressor, MLPTRegressor, BaseTRegressor, LGBMRegressor, and XGBRegressor are imported for machine learning model training.
Data processing
import pandas as pd
import numpy as np

# Create synthetic data
from causalml.dataset import synthetic_data

# Machine learning models
from causalml.inference.meta import XGBTRegressor, MLPTRegressor, BaseTRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
Step 2: Create Dataset
In step 2, we will create a synthetic dataset for the T-learner uplift model.
Firstly, a random seed is set to make the synthetic dataset reproducible.
Then, using the synthetic_data method from the causalml python package, we create a dataset with five features, one treatment variable, and one continuous outcome variable.
Set a seed for reproducibility
np.random.seed(42)

# Create a synthetic dataset
y, X, treatment, ite, _, _ = synthetic_data(mode=1, n=5000, p=5, sigma=1.0)
feature_names = ['X1', 'X2', 'X3', 'X4', 'X5']
After that, using value_counts on the treatment variable, we can see that out of 5000 samples, 2582 units received treatment and 2418 units did not receive treatment.
Check treatment vs. control counts
pd.Series(treatment).value_counts()
Output:
1 2582
0 2418
dtype: int64
Finally, we get the true average treatment effect (ATE) by taking the mean of the true individual treatment effect (ITE). The true average treatment effect (ATE) is about 0.5.
True ATE
ite.mean()
Output:
0.4988477022092744
Step 3: T-learner Using XGBoost Model
In step 3, we will use the XGBoost model with T-learner to estimate the average treatment effect (ATE) and the individual treatment effect (ITE).
XGBTRegressor is a built-in XGBoost T-learner model that comes with the causalml package.
To estimate the average treatment effect (ATE) using XGBTRegressor, we first initialize the XGBTRegressor, then get the ATE with its lower and upper bounds using the estimate_ate method.
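A minimal sketch of this step (the learner settings are illustrative defaults, so the numbers may differ slightly from the output below):

Use XGBTRegressor
# Initialize the built-in XGBoost T-learner
xgb = XGBTRegressor(random_state=42)

# Estimate the ATE with its lower and upper bounds
te, lb, ub = xgb.estimate_ate(X=X, treatment=treatment, y=y)
print('Average Treatment Effect: {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))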
We can see that the estimated average treatment effect (ATE) is 0.61, which is 0.11 higher than the true ATE. The lower bound is 0.56 and the upper bound is 0.67.
Average Treatment Effect: 0.61 (0.56, 0.67)
The method fit_predict produces the estimated individual treatment effect (ITE).
If the confidence interval for the individual treatment effect (ITE) is needed, we can use bootstrap by specifying the bootstrap number, bootstrap size, and setting return_ci=True.
The output gives us both the estimated individual treatment effect (ITE) and the estimated upper and lower bound for each individual.
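A sketch of both calls (the bootstrap settings here are illustrative, not the article's exact values):

ITE
# Point estimate of the ITE for each unit
xgb_ite = xgb.fit_predict(X=X, treatment=treatment, y=y)

# ITE with bootstrap confidence intervals
xgb_ite, xgb_ite_lb, xgb_ite_ub = xgb.fit_predict(
    X=X, treatment=treatment, y=y,
    return_ci=True, n_bootstraps=100, bootstrap_size=1000)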
Step 4: T-learner Using LightGBM Model
In step 4, we will talk about how to use BaseTRegressor with a LightGBM model for the T-learner.
BaseTRegressor is a generalized class that can take in existing machine learning models from packages such as sklearn and xgboost, and run T-learners with those models. In this step, we will run BaseTRegressor with LGBMRegressor.
If we run BaseTRegressor with xgboost, the result is the same as the XGBTRegressor that comes with the causalml python package.
To estimate the average treatment effect (ATE) using BaseTRegressor, we first initialize the BaseTRegressor with the LGBMRegressor, then get the ATE with its lower and upper bounds using the estimate_ate method.
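A minimal sketch of this step:

Use LGBMRegressor with BaseTRegressor
# Initialize a T-learner with LightGBM as the base learner
lgbm = BaseTRegressor(learner=LGBMRegressor())

# Estimate the ATE with its lower and upper bounds
te, lb, ub = lgbm.estimate_ate(X=X, treatment=treatment, y=y)
print('Average Treatment Effect: {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))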
We can see that the estimated average treatment effect (ATE) is 0.58, which is 0.08 higher than the true ATE. The lower bound is 0.54 and the upper bound is 0.62. This confidence interval is narrower than the one from the built-in XGBTRegressor.
Average Treatment Effect: 0.58 (0.54, 0.62)
The results show that using BaseTRegressor in combination with LGBMRegressor produced a more accurate estimate of the average treatment effect (ATE) than the built-in XGBTRegressor.
To estimate the individual treatment effect (ITE), we use the fit_predict method on the LightGBM T-learner.
We can also use bootstrap by specifying the bootstrap number, bootstrap size, and setting return_ci=True to get the estimated individual treatment effect (ITE) and the estimated upper and lower bound for each individual.
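A sketch of both calls (variable names and bootstrap settings are illustrative):

ITE
# Point estimate of the ITE for each unit
lgbm_ite = lgbm.fit_predict(X=X, treatment=treatment, y=y)

# ITE with bootstrap confidence intervals
lgbm_ite, lgbm_ite_lb, lgbm_ite_ub = lgbm.fit_predict(
    X=X, treatment=treatment, y=y,
    return_ci=True, n_bootstraps=100, bootstrap_size=1000)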
Step 5: T-learner Using Neural Network Model
In step 5, we will talk about how to use a neural network model with the T-learner.
The python package causalml has a built-in class MLPTRegressor that runs multilayer perceptron neural network models with the T-learner.
hidden_layer_sizes specifies the number of hidden layers and the number of neurons in each layer. hidden_layer_sizes=(35, 25, 10, 5) means that there are four hidden layers for the neural network model. The first hidden layer has 35 neurons, the second hidden layer has 25 neurons, the third hidden layer has 10 neurons, and the fourth hidden layer has 5 neurons.
learning_rate_init specifies the initial learning rate of the neural network model. We set the initial value to 0.01.
early_stopping=True means the neural network model stops training early when the validation score does not improve.
random_state gives us reproducible results.
After initializing the neural network model using MLPTRegressor, we name it nn and run estimate_ate on it to get the average treatment effect (ATE) and its lower and upper bounds.
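A minimal sketch using the hyperparameters described above:

Use MLPTRegressor
# Initialize the multilayer perceptron T-learner
nn = MLPTRegressor(hidden_layer_sizes=(35, 25, 10, 5),
                   learning_rate_init=0.01,
                   early_stopping=True,
                   random_state=42)

# Estimate the ATE with its lower and upper bounds
te, lb, ub = nn.estimate_ate(X=X, treatment=treatment, y=y)
print('Average Treatment Effect: {:.2f} ({:.2f}, {:.2f})'.format(te[0], lb[0], ub[0]))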
We can see that the estimated average treatment effect (ATE) is 0.58, which is 0.08 higher than the true ATE and the same point estimate as the LightGBM result.
Average Treatment Effect: 0.58 (0.53, 0.64)
Tuning the hyperparameters such as the number of layers, the number of neurons in each layer, and the initial learning rate can potentially improve the model performance.
Calculating the individual treatment effect (ITE) for the neural network model works the same way as for the other T-learner models: we use the fit_predict method on the neural network model to get the ITE.
We can also use bootstrap by specifying the bootstrap number, bootstrap size, and setting return_ci=True to get the estimated individual treatment effect (ITE) and the estimated upper and lower bound for each individual.
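A sketch of both calls (bootstrap settings are illustrative); nn_ite is reused in the next two steps:

ITE
# Point estimate of the ITE for each unit
nn_ite = nn.fit_predict(X=X, treatment=treatment, y=y)

# ITE with bootstrap confidence intervals
nn_ite, nn_ite_lb, nn_ite_ub = nn.fit_predict(
    X=X, treatment=treatment, y=y,
    return_ci=True, n_bootstraps=100, bootstrap_size=1000)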
Step 6: T-learner Neural Network Model Feature Importance
In step 6, we will talk about how to get feature importance for T-learner.
The syntax for getting the feature importance is the same for any modeling algorithm. We will use the neural network model as an example to illustrate the process.
The feature importance is calculated by building a new machine learning model on the backend, where the dependent variable is the individual treatment effect (ITE) and the independent variables are the features of the model.
get_importance is the function to get the feature importance values.
X takes in the feature matrix.
tau takes in the individual treatment effect (ITE).
features=feature_names prints out the feature names in the outputs.
random_state makes the results reproducible.
method specifies whether to use auto or permutation for the feature importance calculation. auto works on tree-based estimators: it uses the estimator's default feature importance, and if no tree-based estimator is provided, it falls back to LGBMRegressor with gain as the importance type. permutation works with any estimator: it permutes one feature column at a time and measures the resulting decrease in accuracy, and the features are ranked by the magnitude of that decrease. When the sample size is large, downsampling is suggested.
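A minimal sketch of the call (the get_importance line mirrors the plot_importance call shown below and is an assumption, not the article's exact code):

Feature importance using permutation
# Permutation importance of each feature for the estimated ITE
nn_importance = nn.get_importance(X=X, tau=nn_ite, method='permutation',
                                  features=feature_names, random_state=42)
print(nn_importance)

From the output, we can see that X2 is the most important feature and X5 is the least important feature.

Output:
{1: X2 0.975168
X1 0.812584
X3 0.215354
X4 0.097804
X5 0.055254
dtype: float64}

We can also visualize the feature importance using the plot_importance function.

Visualization
nn.plot_importance(X=X, tau=nn_ite, method='permutation', features=feature_names, random_state=42)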
Step 7: T-learner Neural Network Model Interpretation
In step 7, we will interpret the T-learner model using SHAP (SHapley Additive exPlanations).
The syntax for SHAP interpretation is the same for any modeling algorithm. We will use the neural network model as an example to illustrate the process.
The Shapley values are calculated based on a machine learning model, where the dependent variable is the individual treatment effect (ITE) and the independent variables are the features of the model.
plot_shap_values is the function to visualize SHAP values.
X takes in the feature matrix.
tau takes in the individual treatment effect (ITE).
features=feature_names prints out the feature names in the outputs.
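A minimal sketch of the call, reusing nn and nn_ite from the previous steps:

Plot shap values
nn.plot_shap_values(X=X, tau=nn_ite, features=feature_names)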
The SHAP plot includes both the feature importance and the feature impacts.
The y-axis is the list of features ordered from the most important to the least important.
The x-axis is the SHAP value, representing how each feature impacts the model output.
The color of the dots represents the feature values. Blue indicates low values, and red indicates high values.
The overlapping dots are jittered, which helps us to see the distribution of each feature.
For example, from the SHAP plot we can see that X2 is the most important feature. High X2 values affect the predictions in a positive direction and low X2 values affect the predictions in a negative direction. Most samples have high X2 values.