A record of recent research papers on reinforcement learning for operations research problems.
RL for inventory management
- F. Stranieri, F. Stella, and C. Kouki, “Performance of deep reinforcement learning algorithms in two-echelon inventory control systems,” Int. J. Prod. Res., vol. 62, no. 17, pp. 6211–6226, Sept. 2024. Code
- Problem: two-echelon inventory control systems, seasonal demand, multi-products
- Method:
- MDP formulation; DRL algorithms; Balance allocation rule;
- Bayesian optimization (BO) for heuristic policies;
- X. Liu, C. Alexopoulos, and Y. Peng, “A simulation-driven machine learning framework for large-scale inventory management,” Ann. Oper. Res., pp. 1–27, Oct. 2025. Code
- Imitation learning with target heuristic policies
- Real data from JD.com (not public)
- Problem: Multi-product, single and multi-echelon
- Computational complexity; optimality proofs;
- T. Temizöz, C. Imdahl, R. Dijkman, D. Lamghari-Idrissi, and W. van Jaarsveld, “Deep controlled learning for inventory control,” Eur. J. Oper. Res., vol. 324, no. 1, pp. 104–117, July 2025. Code written in C++
- Problem: lost sales, perishable inventory, and random lead times
- Methods:
- New algorithm, Deep controlled learning, for Input-Driven MDPs
- RL as classification problem
- H. Dehaybe, D. Catanzaro, and P. Chevalier, “Deep Reinforcement Learning for inventory optimization with non-stationary uncertain demand,” Eur. J. Oper. Res., vol. 314, no. 2, pp. 433–445, Apr. 2024. Code
- Problem:
- Single-Item Stochastic Lot-Sizing Problem (SISLSP) with non-stationary uncertain demand
- Methods:
- State Embedding of Forecast Windows
- I. Kaynov, M. van Knippenberg, V. Menkovski, A. van Breemen, and W. van Jaarsveld, “Deep Reinforcement Learning for One-Warehouse Multi-Retailer inventory management,” Int. J. Prod. Econ., vol. 267, Art. no. 109088, Jan. 2024.
- Problem: One-Warehouse Multi-Retailer (OWMR) inventory management
- Methods:
- Sequential allocation rule
- Experiments:
- Shows that the proportional allocation rule performs poorly and that the sequential allocation rule performs better
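For readers unfamiliar with the baseline named in the last entry, a proportional allocation rule can be sketched as below. This is a generic illustration, not the implementation from the paper: the function name `proportional_allocation`, the integer-unit assumption, and the largest-remainder rounding are all assumptions made here. The sequential rule is not sketched, because its exact definition is not given in these notes.

```python
def proportional_allocation(orders, stock):
    """Split limited warehouse stock across retailers in proportion to their orders.

    orders: list of non-negative integer order quantities (one per retailer).
    stock:  non-negative integer number of units available at the warehouse.
    """
    total = sum(orders)
    if total <= stock:
        # Enough stock: every retailer receives its full order.
        return list(orders)

    # Ideal (possibly fractional) share of the stock for each retailer.
    shares = [o * stock / total for o in orders]
    # Start from the floor of each share.
    alloc = [int(s) for s in shares]
    # Hand out the remaining units to the largest fractional remainders
    # (assumed rounding scheme, chosen only so that all stock is shipped).
    leftover = stock - sum(alloc)
    by_remainder = sorted(
        range(len(orders)), key=lambda i: shares[i] - alloc[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc
```

With orders of 10, 20, and 30 units and only 30 units in stock, each retailer receives half of its order.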
RL for operations research problems
- A. Ramanujam et al., “SafeOR-Gym: A benchmark suite for safe reinforcement learning algorithms on practical operations research problems,” arXiv [cs.LG], 02-June-2025. Code
- Problems: 9 OR environments
- Methods: Safe RL algorithms
- Constrained Markov Decision Process (CMDP)
- Constraints handling methods
- Constrained RL algorithms
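As a reminder of what the CMDP item refers to, the standard discounted constrained MDP objective can be written as follows. The notation (reward $r$, cost functions $c_i$, thresholds $d_i$, discount factor $\gamma$) is the common textbook form and is an assumption here; the benchmark paper may use different symbols or a different constraint type.

```latex
\begin{aligned}
\max_{\pi}\quad & J_r(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right] \\
\text{s.t.}\quad & J_{c_i}(\pi) = \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t)\right] \le d_i,
\qquad i = 1, \dots, m .
\end{aligned}
```

An unconstrained MDP is the special case $m = 0$; safe RL algorithms differ mainly in how they enforce the cost constraints during and after training.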