This Project is aimed at predicting geothermal characteristics for Colombia, specifically the geothermal gradient, based on available geological and well data.

jcmefra/Geothermal-Gradient-Machine-Learning

Geothermal Gradient Prediction Project

Table of Contents

  1. About the Project
  2. Methodology
  3. Conclusions
  4. Further Improvements

About the Project

This project aims to predict geothermal characteristics for Colombia, specifically the geothermal gradient, from available geological and geophysical data using machine learning techniques. It focuses on the Apparent Geothermal Gradient (°C/km), an essential factor in geothermal exploration. The code and the results are in Model.ipynb.
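For readers unfamiliar with the target variable, the sketch below shows one common way an apparent geothermal gradient is computed from a single well measurement. The definition used here (temperature difference between the measurement depth and the surface, divided by depth) and the function name `apparent_geothermal_gradient` are assumptions for illustration; the exact calculation used in Model.ipynb may differ.

```python
def apparent_geothermal_gradient(bottom_hole_temp_c, surface_temp_c, depth_m):
    """Return the apparent geothermal gradient in °C/km.

    Assumed definition: temperature increase between the surface and the
    measurement depth, divided by that depth expressed in kilometres.
    """
    if depth_m <= 0:
        raise ValueError("depth_m must be positive")
    return (bottom_hole_temp_c - surface_temp_c) / (depth_m / 1000.0)


# Hypothetical well: 95 °C measured at 2500 m, 25 °C mean surface temperature.
gradient = apparent_geothermal_gradient(95.0, 25.0, 2500.0)
print(f"{gradient:.1f} °C/km")  # prints "28.0 °C/km"
```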

Methodology:

The project utilizes geospatial data, geophysical information, and geothermal measurements. These datasets are located in the data folder of this repository. The data includes information on well depths, temperatures, geological features, and proximity to volcanic structures.

ETL (Extract, Transform, Load)

The ETL phase involves importing the necessary datasets and performing initial preprocessing. This step ensures that the data is in the right format for further analysis.

Feature Engineering

In this phase, we create new relevant features, such as computing distances to volcanoes and estimating Moho depth, which can significantly impact geothermal gradient prediction.
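As an illustration of the distance-to-volcano feature, here is a minimal sketch that computes the great-circle (haversine) distance from a well to the nearest volcano. The helper names, the spherical Earth radius, and the volcano coordinates (approximate, for illustration only) are assumptions; the notebook may compute distances differently, for example with projected coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_nearest_volcano(lat, lon, volcanoes):
    """Return the smallest haversine distance (km) from a point to any volcano."""
    return min(haversine_km(lat, lon, v_lat, v_lon) for v_lat, v_lon in volcanoes)


# Approximate coordinates, for illustration only (not taken from the project data).
volcanoes = [
    (4.89, -75.32),  # Nevado del Ruiz (approximate)
    (1.22, -77.37),  # Galeras (approximate)
]

# Hypothetical well location.
feature = distance_to_nearest_volcano(4.60, -74.08, volcanoes)
print(f"Distance to nearest volcano: {feature:.1f} km")
```

The resulting value would be stored as one column of the feature table, alongside the other engineered features such as Moho depth.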

Data Cleaning

Data cleaning involves removing irrelevant or incomplete records and columns to obtain a clean and structured dataset for machine learning.
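A cleaning step of this kind can be sketched with pandas as follows. The column names and the toy records are hypothetical and only show the pattern of dropping an unused column and removing rows that lack required values; they are not the project's actual schema.

```python
import pandas as pd

# Hypothetical well records; column names are assumptions for illustration.
wells = pd.DataFrame({
    "well_id": ["W1", "W2", "W3", "W4"],
    "depth_m": [2500.0, None, 1800.0, 3100.0],
    "gradient_c_per_km": [28.0, 31.5, None, 24.2],
    "operator_notes": ["ok", "", "recheck", "ok"],
})

target = "gradient_c_per_km"
clean = (
    wells.drop(columns=["operator_notes"])        # column not useful as a model feature
         .dropna(subset=["depth_m", target])      # drop records missing required values
         .reset_index(drop=True)
)
print(clean)
```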

Machine Learning

We use an XGBoost Regressor to train a machine learning model that predicts apparent geothermal gradients from the chosen features. The model's hyperparameters are tuned for optimal performance.

Evaluation

We evaluate the model's performance using standard regression metrics like Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and R-squared (R2), providing insight into its predictive accuracy. We also generate plots for feature importance, actual vs predicted, residuals, etc.
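The three metrics can be written out directly, which makes their definitions explicit. This is a minimal sketch on made-up values; the function name `regression_metrics` is hypothetical, and the notebook may rely on library implementations instead.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R2) for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


# Made-up gradients in °C/km, for illustration only.
y_true = [20.0, 25.0, 30.0, 35.0]
y_pred = [22.0, 24.0, 29.0, 38.0]

mae, rmse, r2 = regression_metrics(y_true, y_pred)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.2f}")  # MAE=1.75  RMSE=1.94  R2=0.88
```

Because RMSE squares the errors before averaging, it penalizes large misses more heavily than MAE does.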

Interpretation

We analyze feature importance, visualize results, and create various plots and graphs to understand the relationship between features and the target variable, facilitating the model's interpretation.

Prediction of new data

We generate a new dataset of 8000 points across the country to predict the geothermal gradient in areas where no data are available.
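One simple way to produce such a set of prediction points is a regular latitude/longitude grid, sketched below. The rough bounding box for mainland Colombia, the 100 x 80 layout, and the function name `regular_grid` are assumptions for illustration; the source does not state how the 8000 points were actually generated.

```python
def regular_grid(lat_min, lat_max, lon_min, lon_max, n_lat, n_lon):
    """Return a list of (lat, lon) tuples on an evenly spaced n_lat x n_lon grid."""
    lat_step = (lat_max - lat_min) / (n_lat - 1)
    lon_step = (lon_max - lon_min) / (n_lon - 1)
    return [
        (lat_min + i * lat_step, lon_min + j * lon_step)
        for i in range(n_lat)
        for j in range(n_lon)
    ]


# Rough bounding box for mainland Colombia (an assumption, not project data).
points = regular_grid(-4.3, 12.5, -79.1, -66.8, n_lat=100, n_lon=80)
print(len(points))  # prints 8000
```

A rectangular grid also covers sea and neighboring countries, so points outside the national border would need to be filtered out, and the same features used in training would have to be computed for each remaining point before calling the model.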

[Figure: Mapa_Final (final map)]

Conclusions

The project explores the use of machine learning to predict the geothermal gradient in areas where wells do not exist or are difficult to access. A favorable result was obtained, with an MAE of 2.68, an RMSE of 3.58, and an R-squared value of 0.55. The major source of dispersion and error lies in the extreme values (mainly the high ones), which are a minority in the training data. This suggests that, with more high geothermal gradient data, the model would likely achieve much higher accuracy.

Further Improvements

We aim to refine our machine learning models for better accuracy in geothermal potential assessment by improving the resolution and quality of the datasets, conducting further variable analysis to reduce overfitting, and replacing features with more significant ones where possible.

Feel free to contribute to the project and help us improve geothermal exploration and energy generation.

Full manuscript:

The full manuscript can be accessed at https://doi.org/10.1016/j.geothermics.2024.103074
