Papers

research
Author

Ziang Liu

Published

February 23, 2026

A record of research papers I have read, each with a brief summary and my thoughts. I will update this page regularly as I read more.

Reinforcement learning

RL for inventory management

  • B. Rolf, I. Jackson, M. Müller, S. Lang, T. Reggelin, and D. Ivanov, “A review on reinforcement learning algorithms and applications in supply chain management,” Int. J. Prod. Res., vol. 61, no. 20, pp. 7151–7179, Oct. 2023.
    • Review paper
  • F. Stranieri, F. Stella, and C. Kouki, “Performance of deep reinforcement learning algorithms in two-echelon inventory control systems,” Int. J. Prod. Res., vol. 62, no. 17, pp. 6211–6226, Sept. 2024. Code
    • Problem: two-echelon inventory control systems, seasonal demand, multi-products
    • Method:
      • MDP formulation; DRL algorithms; balance allocation rule
      • Bayesian optimization (BO) for heuristic policies
  • X. Liu, C. Alexopoulos, and Y. Peng, “A simulation-driven machine learning framework for large-scale inventory management,” Ann. Oper. Res., pp. 1–27, Oct. 2025. Code
    • Imitation learning with target heuristic policies
    • Real data from JD.com (not public)
    • Problem: Multi-product, single and multi-echelon
    • Computational complexity; optimality proofs
  • T. Temizöz, C. Imdahl, R. Dijkman, D. Lamghari-Idrissi, and W. van Jaarsveld, “Deep controlled learning for inventory control,” Eur. J. Oper. Res., vol. 324, no. 1, pp. 104–117, July 2025. Code written in C++
    • Problem: lost sales, perishable inventory, and random lead times
    • Methods:
      • New algorithm, Deep controlled learning, for Input-Driven MDPs
      • RL as classification problem
  • H. Dehaybe, D. Catanzaro, and P. Chevalier, “Deep Reinforcement Learning for inventory optimization with non-stationary uncertain demand,” Eur. J. Oper. Res., vol. 314, no. 2, pp. 433–445, Apr. 2024. Code
    • Problem:
      • Single-Item Stochastic Lot-Sizing Problem (SISLSP) with non-stationary uncertain demand
    • Methods:
      • State Embedding of Forecast Windows
  • I. Kaynov, M. van Knippenberg, V. Menkovski, A. van Breemen, and W. van Jaarsveld, “Deep Reinforcement Learning for One-Warehouse Multi-Retailer inventory management,” Int. J. Prod. Econ., vol. 267, Art. no. 109088, Jan. 2024.
    • Problem: One-Warehouse Multi-Retailer (OWMR) inventory management
    • Methods:
      • Sequential allocation rule
    • Experiments:
      • Shows that the proportional allocation rule performs poorly and that the sequential allocation rule performs better
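To make the allocation question in the Kaynov et al. entry concrete, here is a minimal Python sketch of a proportional allocation rule. It assumes the generic meaning of the term: when warehouse stock cannot cover all retailer orders, each retailer receives a share in proportion to its order size. The function name, the integer rounding by largest remainder, and the example numbers are my own assumptions for illustration; the paper's exact definition may differ.

```python
def proportional_allocation(orders, available):
    """Split `available` warehouse units across retailer orders.

    Assumption: orders and available stock are non-negative integers.
    If stock covers all orders, every order is filled in full; otherwise
    each retailer gets a share proportional to its order size.
    """
    total = sum(orders)
    if total <= available:
        return list(orders)

    # Fractional proportional shares, rounded down to whole units.
    raw = [available * q / total for q in orders]
    alloc = [int(r) for r in raw]

    # Hand out the units lost to rounding, largest fractional part first.
    leftover = available - sum(alloc)
    by_remainder = sorted(
        range(len(orders)), key=lambda i: raw[i] - alloc[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc
```

For example, `proportional_allocation([10, 20, 30], 30)` returns `[5, 10, 15]`: every retailer receives half of what it ordered, regardless of its own inventory position.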

RL for operations research problems

  • A. Ramanujam et al., “SafeOR-Gym: A benchmark suite for safe reinforcement learning algorithms on practical operations research problems,” arXiv [cs.LG], 02-June-2025. Code
    • Problems: 9 OR environments
    • Methods: Safe RL algorithms
      • Constrained Markov Decision Process (CMDP)
      • Constraint-handling methods
      • Constrained RL algorithms
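For reference, a CMDP in its standard discounted form adds cost functions and budgets to an ordinary MDP. The symbols below (reward \(r\), costs \(c_i\), budgets \(d_i\), discount factor \(\gamma\), number of constraints \(m\)) are generic textbook notation that I am assuming here, not notation taken from the SafeOR-Gym paper.

\[
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c_i(s_t, a_t)\right] \le d_i,
\qquad i = 1, \dots, m.
\]

In an OR setting, a cost \(c_i\) would typically measure violation of an operational constraint such as a capacity or safety limit.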

Federated learning

Federated reinforcement learning

  • H. Jin, Y. Peng, W. Yang, S. Wang, and Z. Zhang, “Federated Reinforcement Learning with environment heterogeneity,” arXiv [cs.LG], 2022. Code
    • Problem setting: \(n\) agents located in \(n\) different environments, with the same state space, action space, and reward function, but different transition dynamics.
    • Algorithm: learn a single policy that performs uniformly well across environments (QAvg and PAvg), then personalize it (embedding-based method, applied to DQNAvg and DDPGAvg).
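The averaging idea behind QAvg can be sketched in a tabular setting as follows. This is a simplified illustration under my own assumptions, not the authors' implementation: each agent is assumed to know its transition tensor, local updates are exact Bellman backups rather than sampled ones, and the function name and hyperparameters (`lr`, `rounds`, `local_steps`) are hypothetical.

```python
import numpy as np

def q_avg(P_list, R, gamma=0.9, lr=0.5, rounds=200, local_steps=5):
    """Tabular federated Q-averaging sketch.

    P_list: one transition tensor per environment, each of shape (S, A, S).
    R:      shared reward table of shape (S, A).
    Returns the averaged Q-table of shape (S, A).
    """
    n_states, n_actions = R.shape
    q_global = np.zeros((n_states, n_actions))

    for _ in range(rounds):
        local_qs = []
        for P in P_list:
            # Each agent starts from the current global Q-table.
            q = q_global.copy()
            for _ in range(local_steps):
                v = q.max(axis=1)            # greedy state values, shape (S,)
                target = R + gamma * P @ v   # Bellman backup, shape (S, A)
                q = (1 - lr) * q + lr * target
            local_qs.append(q)
        # Server step: average the local Q-tables across agents.
        q_global = np.mean(local_qs, axis=0)

    return q_global
```

With a single environment, this reduces to damped value iteration on that environment's MDP. With several environments, the averaged table trades off performance across their different dynamics.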

Federated learning for supply chain management

  • H. Wang, F. Xie, Q. Duan, and J. Li, “Federated learning for supply chain demand forecasting,” Math. Probl. Eng., vol. 2022, pp. 1–8, Nov. 2022. Code
    • Method: vertical federated LSTM model