
Can Machine Learning Predict Corporate Carbon Emissions?

New research finds that XGBoost outperforms the other models tested for forecasting corporate carbon emission intensity, and that corporate governance metrics can fill the gap when disclosure data are missing.

Published paper: Yishuang Xu, Peng Wei, Yinze Ji — "Machine learning for predicting corporate carbon emissions: The role of corporate governance," Journal of Digital Economy, Volume 5, 2026, pp. 1–16.

ESG data remains remarkably inconsistent across companies and markets. Some firms report detailed carbon emissions. Others give partial figures. Some barely report anything at all. That creates a real problem for investors who want to compare companies on climate risk, efficiency, and sustainability.

This paper asks a direct question: can machine learning fill the gap? The answer is yes — and the way it does so reveals something important about the relationship between how companies are governed and how they manage their environmental footprint.

The study: what the researchers tested

The authors tested multiple machine learning models to forecast carbon emission intensity for publicly listed firms. The models were evaluated across both pre-pandemic and post-pandemic periods to check whether they hold up when the business environment fundamentally changes.
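One way to picture this regime check is to score the same predictions separately for each period. The sketch below is illustrative only: the records are made up, the 2020 boundary is an assumption (the exact period definitions are not given here), and `rmse_by_period` is a hypothetical helper, not something from the paper.

```python
import math
from collections import defaultdict

# Assumed boundary between the two regimes; not taken from the paper.
PANDEMIC_START_YEAR = 2020

def period_of(year):
    """Label a fiscal year as pre- or post-pandemic (assumed boundary)."""
    return "pre" if year < PANDEMIC_START_YEAR else "post"

def rmse_by_period(records):
    """Return the RMSE of predicted vs. actual intensity for each period.

    Each record is a tuple (year, actual_intensity, predicted_intensity).
    """
    squared_errors = defaultdict(list)
    for year, actual, predicted in records:
        squared_errors[period_of(year)].append((actual - predicted) ** 2)
    return {
        period: math.sqrt(sum(errs) / len(errs))
        for period, errs in squared_errors.items()
    }

# Made-up numbers purely for illustration.
records = [
    (2018, 100.0, 96.0),
    (2019, 110.0, 113.0),
    (2021, 90.0, 96.0),
    (2022, 85.0, 85.0),
]
scores = rmse_by_period(records)
```

A model that holds up across regimes should show a similar error in both periods; a large gap between `scores["pre"]` and `scores["post"]` would be a warning sign.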

The key result: XGBoost, a gradient-boosted decision-tree ensemble, consistently outperformed the other algorithms tested. This is more than a statistical win: it suggests that carbon emissions are shaped by non-linear relationships that linear models miss. The model's advantage also held in both the pre-pandemic and post-pandemic periods, which matters for real-world deployment.
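To see why a boosted tree ensemble can beat a linear model when the underlying relationship is non-linear, the toy sketch below fits both to a made-up step-shaped curve. It is not the XGBoost library and not the paper's setup: it is a minimal gradient-boosting loop over one-split regression stumps with squared-error loss, and the data, the function names, and the parameter values are all illustrative assumptions.

```python
import math

def fit_linear(xs, ys):
    """Ordinary least squares for one feature; returns a predict function."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error."""
    best = None
    candidates = sorted(set(xs))
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm

def fit_boosted_stumps(xs, ys, rounds=50, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + learning_rate * (lm if x <= t else rm)
                 for x, p in zip(xs, preds)]

    def predict(x):
        return base + sum(learning_rate * (lm if x <= t else rm)
                          for t, lm, rm in stumps)
    return predict

def rmse(model, xs, ys):
    """Root-mean-squared error of a predict function on (xs, ys)."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

# Made-up non-linear relationship: flat, then a jump up, then a drop.
xs = [i / 10 for i in range(40)]
ys = [1.0 if x < 1.5 else (4.0 if x < 3.0 else 2.0) for x in xs]

linear_rmse = rmse(fit_linear(xs, ys), xs, ys)
boosted_rmse = rmse(fit_boosted_stumps(xs, ys), xs, ys)
```

Both errors are measured on the training points to keep the example short, so this only shows how well each model can fit the shape. A real comparison, like the one in the paper, would score the models on held-out data.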

Governance as a proxy for emissions

The most striking finding is what happens when direct emissions data are unavailable. The authors found that corporate governance metrics can serve as powerful proxies for environmental performance.

The governance variables that stood out include board characteristics, oversight structures, and management quality. These variables carry meaningful signals about a firm's likely carbon profile. In other words, the way a company is structured and managed tells you something about how it handles its carbon footprint, even before you see the emissions data.

Why this matters for investors

If emissions data are missing or unreliable, you don't have to stop your ESG analysis. Governance data, fed into ML models that predict likely emission intensity, can serve as a meaningful signal. This is especially valuable in markets where disclosure standards are weak or inconsistent.
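The workflow this implies, learning from firms that disclose and estimating for firms that do not, can be sketched as follows. Everything here is hypothetical: the two governance features (`board_independence`, `has_sustainability_committee`) are placeholders rather than the paper's variables, the firm records are made up, and a nearest-neighbour average stands in for the paper's XGBoost model only to keep the example dependency-free.

```python
import math

# Hypothetical firm records; intensity is None when emissions are undisclosed.
firms = [
    {"name": "A", "board_independence": 0.80, "has_sustainability_committee": 1, "intensity": 40.0},
    {"name": "B", "board_independence": 0.75, "has_sustainability_committee": 1, "intensity": 50.0},
    {"name": "C", "board_independence": 0.30, "has_sustainability_committee": 0, "intensity": 120.0},
    {"name": "D", "board_independence": 0.35, "has_sustainability_committee": 0, "intensity": 110.0},
    {"name": "E", "board_independence": 0.78, "has_sustainability_committee": 1, "intensity": None},
    {"name": "F", "board_independence": 0.32, "has_sustainability_committee": 0, "intensity": None},
]

# Placeholder governance features used as the proxy signal.
FEATURES = ["board_independence", "has_sustainability_committee"]

def distance(a, b):
    """Euclidean distance between two firms in governance-feature space."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in FEATURES))

def estimate_intensity(firm, disclosers, k=2):
    """Average the intensity of the k disclosing firms with the closest governance profile."""
    nearest = sorted(disclosers, key=lambda d: distance(firm, d))[:k]
    return sum(d["intensity"] for d in nearest) / k

# Learn only from firms that report emissions, then fill in the rest.
disclosers = [f for f in firms if f["intensity"] is not None]
estimates = {
    f["name"]: estimate_intensity(f, disclosers)
    for f in firms
    if f["intensity"] is None
}
```

In practice the estimator would be validated on disclosing firms first, by hiding their reported figures and checking how close the governance-based estimates come.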

Why management quality predicts carbon performance

Firms with better board oversight and stronger governance structures tend to be more disciplined about monitoring emissions, responding to climate risks, and implementing environmental controls. Management quality isn't just about decision-making — it reflects broader operational discipline that extends to environmental performance.

This supports an intuition many ESG practitioners share: governance is not a box-ticking exercise. It reflects how well a company manages its environmental footprint.

Who should pay attention

Audience | What this means for you
ESG fund managers | Use governance data to screen companies when emissions disclosure is incomplete; don't wait for perfect data
Climate-risk analysts | XGBoost models can benchmark likely emission intensity across portfolios with uneven disclosure
Credit teams | Governance metrics add a predictive layer to credit assessments for climate-exposed sectors
Policymakers | AI-based methods can identify firms needing closer scrutiny and support sustainable finance regulation

The bigger picture

This research points toward a more sophisticated view of sustainability analysis — one that is data-driven, adaptive, and predictive. Instead of treating emissions as a single number, you can think about the systems and structures that shape them.

Machine learning won't solve every disclosure problem. But it can help investors and policymakers make better judgments in a world where carbon data are often incomplete. And as reporting standards improve, these models should become more accurate, because they will have more disclosed emissions data to learn from.

The combination of machine learning, governance data, and carbon analysis is where ESG is heading. Not just reporting better numbers — but building smarter systems that work with imperfect data.

Frequently asked questions

Can machine learning predict corporate carbon emissions?
Yes. Research testing multiple ML models on publicly listed firms shows that XGBoost, a non-linear ensemble model, outperforms other algorithms in predicting carbon emission intensity across both pre-pandemic and post-pandemic periods.

What can investors use when emissions data are missing?
Corporate governance metrics, including board characteristics, oversight structures, and management quality, can serve as meaningful proxies for environmental performance when direct emissions data are unavailable or unreliable.

Why does corporate governance predict carbon performance?
Firms with better board oversight and stronger governance structures tend to be more disciplined about monitoring emissions, responding to climate risks, and implementing environmental controls. Management quality reflects broader operational discipline that includes environmental performance.

Which machine learning model is best for carbon emission forecasting?
In this study, XGBoost consistently outperformed the other algorithms tested, including linear models. This suggests carbon emissions are shaped by complex, non-linear relationships that ensemble models capture better than simpler approaches.
