Multi-Agent Deep Reinforcement Learning with Dynamic Portfolio Weighting: A Novel Approach to Algorithmic Trading

Cemal Öztürk

doi:10.35377/saucis...1825313

Abstract

This paper presents a novel ensemble reinforcement learning framework for multi-asset portfolio management, referred to as the Confidence-Weighted Dynamic Ensemble (CWDE). The proposed model integrates five state-of-the-art actor–critic algorithms—PPO, A2C, DDPG, TD3, and SAC—under a dynamic aggregation mechanism that adjusts model weights based on entropy-derived confidence and historical performance. Using a diversified dataset spanning equities, bonds, commodities, and real estate ETFs, CWDE is benchmarked against its constituent DRL agents. Experimental results demonstrate that CWDE outperforms all baselines, achieving the highest risk-adjusted returns and the lowest drawdowns. Statistical analysis confirms the ensemble’s robustness and adaptability to market volatility. The findings highlight CWDE’s potential to serve as a scalable and interpretable framework for trading intelligence. The study concludes with a discussion of computational and practical limitations and outlines future directions for integrating explainability, macroeconomic features, and real-time deployment.
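The aggregation idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the CWDE weighting formula, so everything below is an assumption made for illustration: the scoring rule (confidence taken as exp(-entropy), added to a recent mean return), the softmax normalization, the parameters `alpha` and `beta`, and the function names `agent_weights` and `blend_allocations` are all hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def agent_weights(
    entropies: Sequence[float],
    recent_returns: Sequence[float],
    alpha: float = 1.0,
    beta: float = 1.0,
) -> List[float]:
    """Turn per-agent entropy and recent performance into ensemble weights.

    Assumed scoring rule (not from the paper): a lower policy entropy means
    higher confidence, modeled here as exp(-entropy); the recent mean return
    is added as the performance term. alpha and beta are hypothetical knobs.
    """
    scores = [
        alpha * math.exp(-h) + beta * r
        for h, r in zip(entropies, recent_returns)
    ]
    return softmax(scores)


def blend_allocations(
    allocations: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Weighted average of each agent's portfolio allocation, renormalized."""
    n_assets = len(allocations[0])
    blended = [
        sum(w * alloc[i] for w, alloc in zip(weights, allocations))
        for i in range(n_assets)
    ]
    total = sum(blended)
    return [b / total for b in blended]


# Illustrative inputs only: two agents, two assets.
example_weights = agent_weights(entropies=[0.2, 1.5], recent_returns=[0.01, -0.01])
example_portfolio = blend_allocations([[0.5, 0.5], [1.0, 0.0]], example_weights)
```

In this sketch the agent with lower entropy and a better recent return receives the larger ensemble weight, so its allocation dominates the blended portfolio. Extending it to the five agents named in the abstract only requires passing five entropies, five returns, and five allocation vectors.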

Keywords

Details

Primary Language

English

Subjects

Artificial Intelligence (Other)

Journal Section

Research Article

Authors

Cemal Öztürk*
0000-0003-3850-7416
Türkiye

Early Pub Date

June 15, 2026

Publication Date

June 17, 2026

Submission Date

November 17, 2025

Acceptance Date

February 2, 2026

Published in Issue

Year 2026 Volume: 9 Number: 2

DOI

https://doi.org/10.35377/saucis...1825313

IZ

https://izlik.org/JA27SS74ZT

Cite

APA

Öztürk, C. (2026). Multi-Agent Deep Reinforcement Learning with Dynamic Portfolio Weighting: A Novel Approach to Algorithmic Trading. Sakarya University Journal of Computer and Information Sciences, 9(2), 591-608. https://doi.org/10.35377/saucis...1825313

AMA

1. Öztürk C. Multi-Agent Deep Reinforcement Learning with Dynamic Portfolio Weighting: A Novel Approach to Algorithmic Trading. SAUCIS. 2026;9(2):591-608. doi:10.35377/saucis.1825313

Chicago

Öztürk, Cemal. 2026. “Multi-Agent Deep Reinforcement Learning With Dynamic Portfolio Weighting: A Novel Approach to Algorithmic Trading”. Sakarya University Journal of Computer and Information Sciences 9 (2): 591-608. https://doi.org/10.35377/saucis. 1825313.

EndNote

Öztürk C (June 1, 2026) Multi-Agent Deep Reinforcement Learning with Dynamic Portfolio Weighting: A Novel Approach to Algorithmic Trading. Sakarya University Journal of Computer and Information Sciences 9 2 591–608.

IEEE

[1] C. Öztürk, “Multi-Agent Deep Reinforcement Learning with Dynamic Portfolio Weighting: A Novel Approach to Algorithmic Trading”, SAUCIS, vol. 9, no. 2, pp. 591–608, June 2026, doi: 10.35377/saucis...1825313.

ISNAD

Öztürk, Cemal. “Multi-Agent Deep Reinforcement Learning With Dynamic Portfolio Weighting: A Novel Approach to Algorithmic Trading”. Sakarya University Journal of Computer and Information Sciences 9/2 (June 1, 2026): 591-608. https://doi.org/10.35377/saucis. 1825313.

JAMA

1. Öztürk C. Multi-Agent Deep Reinforcement Learning with Dynamic Portfolio Weighting: A Novel Approach to Algorithmic Trading. SAUCIS. 2026;9:591–608.

MLA

Öztürk, Cemal. “Multi-Agent Deep Reinforcement Learning With Dynamic Portfolio Weighting: A Novel Approach to Algorithmic Trading”. Sakarya University Journal of Computer and Information Sciences, vol. 9, no. 2, June 2026, pp. 591-608, doi:10.35377/saucis. 1825313.

Vancouver

1. Cemal Öztürk. Multi-Agent Deep Reinforcement Learning with Dynamic Portfolio Weighting: A Novel Approach to Algorithmic Trading. SAUCIS. 2026 Jun. 1;9(2):591-608. doi:10.35377/saucis. 1825313