Zero-Shot LLM Sentiment and Reasoning Feature Extraction for Stock Market Prediction: A Multi-Stock XGBoost Framework with SHAP Explainability

Siddharth Jain; Kamalpreet Kaur

doi:10.2991/978-94-6239-697-5_34

Authors

Siddharth Jain¹,*, Kamalpreet Kaur²

¹Research Scholar, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, 144411, Punjab, India

²Assistant Professor, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, 144411, Punjab, India

*Corresponding author. Email: siddharthjain139@gmail.com

Corresponding Author

Siddharth Jain

Available Online 4 June 2026.

DOI: 10.2991/978-94-6239-697-5_34
Keywords: Zero-shot sentiment analysis; Natural Language Inference; FinBERT; DeBERTa-v3; XGBoost; SHAP explainability; Stock prediction; Walk-forward validation
Abstract: Predicting short-term stock price movements remains a formidable challenge due to the non-stationary, noisy, and high-dimensional nature of financial time series. While large language models (LLMs) have shown strong capabilities in financial sentiment analysis, most existing hybrid methods compress their outputs into a single sentiment score, losing important distributional information and failing to capture the reasoning behind market sentiment. This paper proposes a dual-LLM feature extraction framework that integrates FinBERT-based sentiment analysis with DeBERTa-v3 zero-shot Natural Language Inference (NLI) to extract multi-dimensional reasoning signals from financial news. Each headline is mapped into six reasoning categories (earnings/financial performance, product/innovation, market/macroeconomics, analyst/ratings, regulatory/risk, and growth/expansion), producing a reasoning feature vector that captures not only what the sentiment is but also why it occurs. These LLM-derived features are combined with 26 technical indicators and used to train an XGBoost classifier with SHAP (SHapley Additive exPlanations) explainability. The framework is evaluated on five U.S. stocks (NVDA, AAPL, MSFT, TSLA, and JPM) from January 2019 to December 2024 using five-fold walk-forward cross-validation. The full model achieves a mean accuracy of 58.0%, a mean AUC-ROC of 0.621, and a mean Sharpe ratio of 1.80. Ablation studies show that removing the reasoning features decreases accuracy by 2.1 percentage points (p < 0.05), while removing all LLM features reduces it by 5.9 percentage points (p < 0.01). SHAP analysis attributes 38.7% of the contribution to sentiment features, 35.2% to technical indicators, and 26.1% to reasoning features. The proposed framework consistently outperforms LSTM, Transformer, Random Forest, and Logistic Regression baselines across all evaluated stocks.
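The five-fold walk-forward cross-validation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say whether the training window is expanding or rolling, so the expanding-window layout below is an assumption, and the function name `walk_forward_splits` and its parameters are hypothetical rather than taken from the paper.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_folds: int = 5
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for expanding-window walk-forward validation.

    Assumption (not stated in the abstract): the series is cut into
    n_folds + 1 contiguous blocks, and fold k trains on blocks 0..k-1
    and tests on block k, so training data always precedes test data.
    """
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    block = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * block
        # The last fold absorbs any remainder so no observation is dropped.
        test_end = n_samples if k == n_folds else (k + 1) * block
        yield list(range(train_end)), list(range(train_end, test_end))


# Toy example: 12 time-ordered observations, five folds.
splits = list(walk_forward_splits(n_samples=12, n_folds=5))
for train_idx, test_idx in splits:
    print(f"train={train_idx} test={test_idx}")
```

Unlike shuffled k-fold splitting, every test block here lies strictly after its training data, which avoids look-ahead leakage when evaluating on time-ordered market data.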
Copyright: © 2026 The Author(s)
Open Access: Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title: Proceedings of the Conference on Bridging Engineering Disciplines with AI and Machine Learning (BEDAIML 2026)
Series: Advances in Intelligent Systems Research
Publication Date: 4 June 2026
ISBN: 978-94-6239-697-5
ISSN: 1951-6851

Cite this article

TY  - CONF
AU  - Siddharth Jain
AU  - Kamalpreet Kaur
PY  - 2026
DA  - 2026/06/04
TI  - Zero-Shot LLM Sentiment and Reasoning Feature Extraction for Stock Market Prediction: A Multi-Stock XGBoost Framework with SHAP Explainability
BT  - Proceedings of the Conference on Bridging Engineering Disciplines with AI and Machine Learning (BEDAIML 2026)
PB  - Atlantis Press
SP  - 410
EP  - 423
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-697-5_34
DO  - 10.2991/978-94-6239-697-5_34
ID  - Jain2026
ER  -