Proceedings of the Conference on Bridging Engineering Disciplines with AI and Machine Learning (BEDAIML 2026)

Zero-Shot LLM Sentiment and Reasoning Feature Extraction for Stock Market Prediction: A Multi-Stock XGBoost Framework with SHAP Explainability

Authors
Siddharth Jain [1,*], Kamalpreet Kaur [2]
[1] Research Scholar, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, 144411, Punjab, India
[2] Assistant Professor, Department of Computer Science and Engineering, Lovely Professional University, Phagwara, 144411, Punjab, India
*Corresponding author. Email: siddharthjain139@gmail.com
Corresponding Author
Siddharth Jain
Available Online 4 June 2026.
DOI
10.2991/978-94-6239-697-5_34
Keywords
Zero-shot sentiment analysis; Natural Language Inference; FinBERT; DeBERTa-v3; XGBoost; SHAP explainability; Stock prediction; Walk-forward validation
Abstract

Predicting short-term stock price movements remains a formidable challenge due to the non-stationary, noisy, and high-dimensional nature of financial time series. While large language models (LLMs) have shown strong capabilities in financial sentiment analysis, most existing hybrid methods compress their outputs into single sentiment scores, losing important distributional information and failing to capture the reasoning behind market sentiment. This paper proposes a dual-LLM feature extraction framework that integrates FinBERT-based sentiment analysis with DeBERTa-v3 zero-shot Natural Language Inference (NLI) to extract multi-dimensional reasoning signals from financial news. Each headline is mapped into six reasoning categories (earnings/financial performance, product/innovation, market/macroeconomics, analyst/ratings, regulatory/risk, and growth/expansion), producing a reasoning feature vector that captures not only what the sentiment is but why it occurs. These LLM-derived features are combined with 26 technical indicators and used to train an XGBoost classifier with SHAP (SHapley Additive exPlanations) explainability. The framework is evaluated on five U.S. stocks (NVDA, AAPL, MSFT, TSLA, and JPM) from January 2019 to December 2024 using five-fold walk-forward cross-validation. The full model achieves a mean accuracy of 58.0%, a mean AUC-ROC of 0.621, and a mean Sharpe ratio of 1.80. Ablation studies show that removing reasoning features decreases accuracy by 2.1 percentage points (p < 0.05), while removing all LLM features reduces it by 5.9 percentage points (p < 0.01). SHAP analysis indicates contributions of 38.7% from sentiment features, 35.2% from technical indicators, and 26.1% from reasoning features. The proposed framework consistently outperforms LSTM, Transformer, Random Forest, and Logistic Regression baselines across all evaluated stocks.
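
The five-fold walk-forward cross-validation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say whether the training window expands or rolls, nor how fold boundaries are chosen, so the expanding-window scheme, the equal-sized blocks, and the helper name walk_forward_splits below are all assumptions made for illustration and should not be read as the authors' implementation.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_folds: int = 5
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for expanding-window walk-forward CV.

    Hypothetical helper: the series is cut into n_folds + 1 contiguous
    blocks. Fold k trains on blocks 0..k-1 and tests on block k, so every
    test index lies strictly after every training index in time.
    """
    if n_folds < 1 or n_samples < n_folds + 1:
        raise ValueError("need at least n_folds + 1 samples")
    block = n_samples // (n_folds + 1)
    for k in range(1, n_folds + 1):
        train_end = k * block
        # The last fold absorbs any remainder so no observation is dropped.
        test_end = n_samples if k == n_folds else (k + 1) * block
        yield list(range(train_end)), list(range(train_end, test_end))


if __name__ == "__main__":
    # Toy series of 12 time-ordered observations.
    for fold, (train, test) in enumerate(walk_forward_splits(12), start=1):
        print(f"fold {fold}: train={train} test={test}")
```

Because each test block follows its training data chronologically, a classifier fitted on a fold never sees observations from the period it is evaluated on, which is the property that separates walk-forward validation from shuffled k-fold splitting.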

Copyright
© 2026 The Author(s)
Open Access
Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

Volume Title
Proceedings of the Conference on Bridging Engineering Disciplines with AI and Machine Learning (BEDAIML 2026)
Series
Advances in Intelligent Systems Research
Publication Date
4 June 2026
ISBN
978-94-6239-697-5
ISSN
1951-6851
DOI
10.2991/978-94-6239-697-5_34

Cite this article

TY  - CONF
AU  - Siddharth Jain
AU  - Kamalpreet Kaur
PY  - 2026
DA  - 2026/06/04
TI  - Zero-Shot LLM Sentiment and Reasoning Feature Extraction for Stock Market Prediction: A Multi-Stock XGBoost Framework with SHAP Explainability
BT  - Proceedings of the Conference on Bridging Engineering Disciplines with AI and Machine Learning (BEDAIML 2026)
PB  - Atlantis Press
SP  - 410
EP  - 423
SN  - 1951-6851
UR  - https://doi.org/10.2991/978-94-6239-697-5_34
DO  - 10.2991/978-94-6239-697-5_34
ID  - Jain2026
ER  -