A record of research papers I have read, each with a brief summary and my thoughts. I will update this page regularly as I read more.
Reinforcement learning
RL for inventory management
- B. Rolf, I. Jackson, M. Müller, S. Lang, T. Reggelin, and D. Ivanov, “A review on reinforcement learning algorithms and applications in supply chain management,” Int. J. Prod. Res., vol. 61, no. 20, pp. 7151–7179, Oct. 2023.
- Review paper
- F. Stranieri, F. Stella, and C. Kouki, “Performance of deep reinforcement learning algorithms in two-echelon inventory control systems,” Int. J. Prod. Res., vol. 62, no. 17, pp. 6211–6226, Sept. 2024. Code
- Problem: two-echelon inventory control systems, seasonal demand, multi-products
- Method:
- MDP formulation; DRL algorithms; balance allocation rule
- Bayesian optimization (BO) for heuristic policies
- X. Liu, C. Alexopoulos, and Y. Peng, “A simulation-driven machine learning framework for large-scale inventory management,” Ann. Oper. Res., pp. 1–27, Oct. 2025. Code
- Imitation learning with target heuristic policies
- Real data from JD.com (not public)
- Problem: Multi-product, single and multi-echelon
- Computational complexity; optimality proofs
- T. Temizöz, C. Imdahl, R. Dijkman, D. Lamghari-Idrissi, and W. van Jaarsveld, “Deep controlled learning for inventory control,” Eur. J. Oper. Res., vol. 324, no. 1, pp. 104–117, July 2025. Code written in C++
- Problem: lost sales, perishable inventory, and random lead times
- Methods:
- New algorithm, Deep Controlled Learning (DCL), for input-driven MDPs
- Casts RL as a classification problem
- H. Dehaybe, D. Catanzaro, and P. Chevalier, “Deep Reinforcement Learning for inventory optimization with non-stationary uncertain demand,” Eur. J. Oper. Res., vol. 314, no. 2, pp. 433–445, Apr. 2024. Code
- Problem:
- Single-Item Stochastic Lot-Sizing Problem (SISLSP) with non-stationary uncertain demand
- Methods:
- State Embedding of Forecast Windows
- I. Kaynov, M. van Knippenberg, V. Menkovski, A. van Breemen, and W. van Jaarsveld, “Deep Reinforcement Learning for One-Warehouse Multi-Retailer inventory management,” Int. J. Prod. Econ., vol. 267, Art. no. 109088, Jan. 2024.
- Problem: One-Warehouse Multi-Retailer (OWMR) inventory management
- Methods:
- Sequential allocation rule
- Experiments:
- Shows that the proportional allocation rule performs poorly and that the sequential allocation rule performs better
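For reference, the proportional baseline mentioned above can be sketched as follows. This is a minimal illustration of one common reading of a proportional rule (split scarce warehouse stock in proportion to retailer orders), not the paper's code; the function name `proportional_allocation`, the integer quantities, and the largest-remainder rounding are all assumptions made here.

```python
def proportional_allocation(warehouse_stock, orders):
    """Split integer warehouse stock across retailers in proportion to their orders.

    Hypothetical helper for illustration; the paper's exact rule may differ.
    """
    total = sum(orders)
    # Enough stock: every retailer receives its full order.
    if total <= warehouse_stock:
        return list(orders)
    # Shortage: compute each retailer's fractional proportional share.
    shares = [warehouse_stock * q / total for q in orders]
    alloc = [int(s) for s in shares]  # round down first
    leftover = warehouse_stock - sum(alloc)
    # Hand out the remaining units to the largest fractional remainders.
    by_remainder = sorted(
        range(len(orders)), key=lambda i: shares[i] - alloc[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc
```

For example, `proportional_allocation(10, [6, 6, 8])` returns `[3, 3, 4]`: every retailer is rationed by the same fraction, regardless of its own inventory position.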
RL for operations research problems
- A. Ramanujam et al., “SafeOR-Gym: A benchmark suite for safe reinforcement learning algorithms on practical operations research problems,” arXiv [cs.LG], 02-June-2025. Code
- Problems: 9 OR environments
- Methods: Safe RL algorithms
- Constrained Markov Decision Process (CMDP)
- Constraint-handling methods
- Constrained RL algorithms
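As a reminder, a discounted CMDP is usually written in the following standard form. The notation is an assumption chosen here, not taken from the paper: \(r\) is the reward, \(c_i\) are the cost functions, \(d_i\) the cost budgets, \(\gamma\) the discount factor, and \(m\) the number of constraints.

\[
\max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t)\Big] \le d_i, \qquad i = 1, \dots, m.
\]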
Federated learning
Federated reinforcement learning
- H. Jin, Y. Peng, W. Yang, S. Wang, and Z. Zhang, “Federated Reinforcement Learning with environment heterogeneity,” arXiv [cs.LG], 2022. Code
- Problem setting: \(n\) agents located in \(n\) different environments that share the same state space, action space, and reward function but have different transition dynamics.
- Algorithms: QAvg and PAvg learn a single policy that performs uniformly well across all environments; an embedding-based method adds personalization and is applied to DQNAvg and DDPGAvg.
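To make the averaging idea concrete, here is a tabular sketch in the spirit of QAvg: each agent runs a few local Bellman updates under its own transition dynamics, then a server averages the Q-tables. It is an illustration under simplifying assumptions (known transition matrices, expected updates instead of sampled ones), not the authors' implementation; `local_q_update`, `qavg`, and all hyperparameter values are hypothetical.

```python
import numpy as np

def local_q_update(Q, P, R, gamma, lr, steps):
    """Run a few expected Bellman updates using one agent's own dynamics.

    P has shape (S, A, S) and R has shape (S, A). Assumes P is known.
    """
    Q = Q.copy()
    for _ in range(steps):
        # Expected next-state value under this agent's transition matrix.
        target = R + gamma * (P @ Q.max(axis=1))
        Q += lr * (target - Q)
    return Q

def qavg(P_list, R, gamma=0.9, lr=0.5, rounds=50, local_steps=5):
    """Alternate local updates in each environment with server-side averaging."""
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(rounds):
        # Every agent starts the round from the shared Q-table.
        local_tables = [
            local_q_update(Q, P, R, gamma, lr, local_steps) for P in P_list
        ]
        # The server averages the agents' Q-tables element-wise.
        Q = np.mean(local_tables, axis=0)
    return Q
```

The greedy policy `Q.argmax(axis=1)` from the averaged table is then shared by all agents, which matches the "one uniformly good policy" goal rather than the personalized variant.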
Federated learning for supply chain management
- H. Wang, F. Xie, Q. Duan, and J. Li, “Federated learning for supply chain demand forecasting,” Math. Probl. Eng., vol. 2022, pp. 1–8, Nov. 2022. Code
- Method: vertical federated LSTM model