14 動的計画法

学習目標

動的計画法による最短路問題とナップサック問題の解法を理解する。
動的計画法の基本的な考え方を理解する。
*ベルマン方程式の意味を理解する。

Principle of Optimality: An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision.

最適性の原理（Principle of Optimality）：最適な方策は、初期状態と初期決定がどんなものであれ、その結果得られる次の状態に関して、以降の決定が必ず最適方策になっているという性質をもつ。

– Richard Bellman, 1957

動的計画法（dynamic programming, DP）は Richard E. Bellman によって提案された，多段階決定過程（multi-stage decision process）に対する最適化手法である．

ノート

動的計画法を algorithmic paradigm の一つとして扱う場合もあるが，本書では最適化手法として説明する．

大学では，情報，制御工学，経営工学など，様々な分野で動的計画法が教えられている．

多段階決定過程は，複数の段階にわたって行われる一連の意思決定を行う問題である．各段階 \(t\) において，意思決定者は現在の状態を観測し，意思決定を行う．その意思決定は，将来の状態に影響を与える．目的として，状態に関するある目的関数を最小化（または最大化）することである．動的計画法は，多段階決定過程における最適方策（optimal policy）を求めるための一般的な手法である．

最適化，制御，オペレーションズ・リサーチなどの分野において，多数の応用例が知られている．

在庫管理問題
自動運転
ロボット制御
生物学
経済学・金融工学
…

さらに，現在では，強化学習の基礎理論としても重要な役割を果たしている．

ここでは，動的計画法の基本的な考え方を理解するために，最短路問題（shortest path problem）とナップサック問題（knapsack problem）を取り上げる．

動的計画法の名前の由来

I spent the Fall quarter (of 1950) at RAND. My first task was to find a name for multistage decision processes. An interesting question is, “Where did the name, dynamic programming, come from?” The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word “research”. I’m not using the term lightly; I’m using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term research in his presence. You can imagine how he felt, then, about the term mathematical. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose? In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various reasons. I decided therefore to use the word “programming”. I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying. I thought, let’s kill two birds with one stone. Let’s take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense. It also has a very interesting property as an adjective, and that is it’s impossible to use the word dynamic in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It’s impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities.

— Richard Bellman, Eye of the Hurricane: An Autobiography (1984, page 159)

14.1 最短路問題

多段階の有向グラフ \(G = (V, E)\) における最短路問題を考える。各辺 \((v, u) \in E\) に非負の重み \(w(v, u)\) が与えられているとする。始点 \(s \in V\) から終点 \(t \in V\) への最短路を求める問題である。\(s\) と \(t\) 以外の頂点を段階 \(D = \{ 1, 2, \ldots, n \}\) に分けることができるとする。結果として，\(n\) 回の意思決定を行い，最適解は \((s, v_1, v_2, \ldots, v_n, t)\) の形になる。

\(P^* = (s, v_1, v_2, \ldots, v_n, t)\) は最短路であれば，任意の \(1 \leq k \leq n\) に対して，パス \((v_k, v_{k+1}, \ldots, v_n, t)\) も \(v_k\) から \(t\) への最短路である。これは，もし \((v_k, v_{k+1}, \ldots, v_n, t)\) が最短路でないとすると，より短いパスが存在し，\(P^*\) も最短路でなくなるためである。この性質は最適性の原理（Principle of Optimality）と呼ばれる。

点 \(v\) から点 \(t\) までの最短路の長さ（またはコスト）を \(d(v)\) と定義する。最適性の原理により，点 \(v \in V \setminus \{ t \}\) に対して，

\[ d(v) = \min_{u \in \mathcal{S}(v)} \{ w(v, u) + d(u) \} \]

が成り立つ。ここで，\(\mathcal{S}(v) = \{ u \mid (v, u) \in E \}\) とする．

例 14.1 下のグラフにおいて，\(d(a)\)，\(d(b)\)，\(d(c)\) が与えられたとき，\(d(s)\) を計算せよ。

\(d(a) = 8\)
\(d(b) = 7\)
\(d(c) = 7\)

コード

import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
edges = [
    ("s", "a", 5),
    ("s", "b", 6),
    ("s", "c", 4),
    ("a", "d", 2),
    ("a", "e", 5),
    ("b", "d", 1),
    ("b", "e", 3),
    ("c", "e", 4),
    ("c", "f", 1),
    ("d", "g", 7),
    ("d", "h", 2),
    ("e", "h", 1),
    ("e", "i", 3),
    ("f", "h", 2),
    ("f", "i", 2),
    ("g", "j", 4),
    ("g", "k", 2),
    ("h", "k", 3),
    ("h", "l", 2),
    ("i", "l", 4),
    ("i", "m", 3),
    ("j", "t", 5),
    ("k", "t", 1),
    ("l", "t", 4),
    ("m", "t", 2),
]
G.add_weighted_edges_from(edges)

for layer, nodes in enumerate(nx.topological_generations(G)):
    for node in nodes:
        G.nodes[node]["layer"] = layer

pos = {
    "s": (0, 0),
    "a": (1, 1),
    "b": (1, 0),
    "c": (1, -1),
    "d": (2, 1),
    "e": (2, 0),
    "f": (2, -1),
    "g": (3, 1),
    "h": (3, 0),
    "i": (3, -1),
    "j": (4, 1.5),
    "k": (4, 0.5),
    "l": (4, -0.5),
    "m": (4, -1.5),
    "t": (5, 0),
}

nx.draw(G, pos, with_labels=True, node_size=700, node_color="lightblue", font_size=10)
edge_labels = nx.get_edge_attributes(G, "weight")
nx.draw_networkx_edge_labels(
    G, pos, edge_labels=edge_labels, label_pos=0.3, font_size=8, rotate=False
)
plt.show()

コード

# find shortest path from s to t
source = "s"
target = "t"
shortest_path = nx.shortest_path(G, source=source, target=target, weight="weight")
shortest_path_length = nx.shortest_path_length(
    G, source=source, target=target, weight="weight"
)
shortest_path, shortest_path_length

(['s', 'c', 'f', 'h', 'k', 't'], 11)

\[\begin{align*} d(s) &= \min \{ w(s, a) + d(a), w(s, b) + d(b), w(s, c) + d(c) \} \\ &= \min \{ 5 + 8, 6 + 7, 4 + 7 \} \\ &= \min \{ 13, 13, 11 \} \\ &= 11 \end{align*}\]

動的計画法による最短路問題の解法を以下に示す。

Algorithm 14.1

Input: 有向グラフ \(G = (V, E)\)，重み関数 \(w: E \to \mathbb{R}_{\geq 0}\)，始点 \(s \in V\)，終点 \(t \in V\)
Output: \(s\) から \(t\) への最短路の長さ \(d(s)\)

\(d(t) \leftarrow 0\)
\(d(v) \leftarrow \infty\) for all \(v \in V \setminus \{ t \}\)
\(k \leftarrow n-1\) (段階数)
while \(k \geq 0\) do
1. for each \(v \in D_k\) do
  1. \(d(v) \leftarrow \min_{u \in \mathcal{P}(v)} \{ w(v, u) + d(u) \}\)
2. \(k \leftarrow k - 1\)
return \(d(s)\)

def dp_shortest_path(G):
    # Initialize distances
    d = {v: float("inf") for v in G.nodes()}
    d["t"] = 0  # Distance to target is 0

    # Get layers in topological order
    layers = list(nx.topological_generations(G))

    # Process layers in reverse order
    for layer in reversed(layers[:-1]):  # Exclude the last layer (target)
        for v in layer:
            d[v] = min(
                (G[v][u]["weight"] + d[u] for u in G.successors(v)),
            )
    return d["s"]


if __name__ == "__main__":

    shortest_path_length_dp = dp_shortest_path(G)
    print(shortest_path_length_dp)

まず，\(d(t) = 0\) とし，第 4 段の点 \(j\)，\(k\)，\(l\)，\(m\) について，\(d(j)\)，\(d(k)\)，\(d(l)\)，\(d(m)\) を計算する。

\[\begin{align*} d(j) &= \min \{ w(j, t) + d(t) \} = \min \{ 5 + 0 \} = 5 \\ d(k) &= \min \{ w(k, t) + d(t) \} = \min \{ 1 + 0 \} = 1 \\ d(l) &= \min \{ w(l, t) + d(t) \} = \min \{ 4 + 0 \} = 4 \\ d(m) &= \min \{ w(m, t) + d(t) \} = \min \{ 2 + 0 \} = 2 \\ \end{align*}\]

次に，第 3 段の点 \(g\)，\(h\)，\(i\) について，\(d(g)\)，\(d(h)\)，\(d(i)\) を計算する。

\[\begin{align*} d(g) &= \min \{ w(g, j) + d(j), w(g, k) + d(k) \} = \min \{ 4 + 5, 2 + 1 \} = \min \{ 9, 3 \} = 3 \\ d(h) &= \min \{ w(h, k) + d(k), w(h, l) + d(l) \} = \min \{ 3 + 1, 2 + 4 \} = \min \{ 4, 6 \} = 4 \\ d(i) &= \min \{ w(i, l) + d(l), w(i, m) + d(m) \} = \min \{ 4 + 4, 3 + 2 \} = \min \{ 8, 5 \} = 5 \\ \end{align*}\]

第 2 段の点 \(d\)，\(e\)，\(f\) について，\(d(d)\)，\(d(e)\)，\(d(f)\) を計算する。

\[\begin{align*} d(d) &= \min \{ w(d, g) + d(g), w(d, h) + d(h) \} = \min \{ 7 + 3, 2 + 4 \} = \min \{ 10, 6 \} = 6 \\ d(e) &= \min \{ w(e, h) + d(h), w(e, i) + d(i) \} = \min \{ 1 + 4, 3 + 5 \} = \min \{ 5, 8 \} = 5 \\ d(f) &= \min \{ w(f, h) + d(h), w(f, i) + d(i) \} = \min \{ 2 + 4, 2 + 5 \} = \min \{ 6, 7 \} = 6 \\ \end{align*}\]

第 1 段の点 \(a\)，\(b\)，\(c\) について，\(d(a)\)，\(d(b)\)，\(d(c)\) を計算する。

\[\begin{align*} d(a) &= \min \{ w(a, d) + d(d), w(a, e) + d(e) \} = \min \{ 2 + 6, 5 + 5 \} = \min \{ 8, 10 \} = 8 \\ d(b) &= \min \{ w(b, d) + d(d), w(b, e) + d(e) \} = \min \{ 1 + 6, 3 + 5 \} = \min \{ 7, 8 \} = 7 \\ d(c) &= \min \{ w(c, e) + d(e), w(c, f) + d(f) \} = \min \{ 4 + 5, 1 + 6 \} = \min \{ 9, 7 \} = 7 \\ \end{align*}\]

最後に，始点 \(s\) について，\(d(s)\) を計算する。

\[\begin{align*} d(s) &= \min \{ w(s, a) + d(a), w(s, b) + d(b), w(s, c) + d(c) \} \\ &= \min \{ 5 + 8, 6 + 7, 4 + 7 \} \\ &= \min \{ 13, 13, 11 \} \\ &= 11 \end{align*}\]

最短路 \(P^*\) は，\(d(s)\) の計算過程を逆にたどることで求めることができる。点 \(v\) に対して，次の点 \(u\) を

\[ u = \arg\min_{u' \in \mathcal{P}(v)} \{ w(v, u') + d(u') \} \]

と定義する。始点 \(s\) から始めて，次の点を順にたどることで，最短路 \(P^*\) を構築できる。この問題において，最短路は \(P^* = (s, c, f, h, k, t)\) であり，その長さは 11 である。

例 14.2 動的計画法による最長路問題のアルゴリズムを考え，例題の最長路を求めよ。点 \(v\) から点 \(t\) までの最長路の長さ \(d(v)\) は， \[ d(v) = \max_{u \in \mathcal{P}(v)} \{ w(v, u) + d(u) \} \] と与えられる。

14.2 0/1 ナップサック問題

0/1 ナップサック問題は，有限個の品物からなる集合 \(I = \{ 1, 2, \ldots, n \}\) と，ナップサックの容量 \(W\) が与えられたとき，以下の最適化問題を解くことである。

\[\begin{align*} \text{maximize} \quad & \sum_{i \in I} v_i x_i \\ \text{subject to} \quad & \sum_{i \in I} w_i x_i \leq W \\ & x_i \in \{ 0, 1 \} \quad \forall i \in I \end{align*}\]

ここで，\(v_i\) は品物 \(i\) の価値，\(w_i\) は品物 \(i\) の重量，\(x_i\) は品物 \(i\) をナップサックに入れるかどうかを示す 0/1 変数である。目的は，ナップサックの容量を超えないように品物を選び，その価値の総和を最大化することである。

例 14.3 ここでは，0/1 ナップサック問題の例題を示す．ナップサックの容量 \(W = 15\) のとき，最適解を求めよ。

品物 \(i\)	重量 \(w_i\)	価値 \(v_i\)
1	12	4
2	2	2
3	1	1
4	1	2

\(\mathbf{x} = (x_1, x_2, x_3, x_4)\) とすると，最適解は \(\mathbf{x^*} = (1, 1, 0, 1)\) であり，価値の総和は 8 である。

品物 \(1, 2, \ldots, n\) を取るか取らないかを順に決定していく多段階決定過程として 0/1 ナップサック問題を定式化できる。段階 \(i\) において，品物 \(i\) を取るか取らないかを決定する。\(v_i(w)\) を，品物 \(i, i+1, \ldots, n\) から選んで，利用可能な容量が \(w\) のときの最大価値と定義する。すなわち，\(v_i(w)\) は，以下の最適化問題の最適値である。

\[\begin{align*} \text{maximize} \quad & \sum_{j = i}^{n} v_j x_j \\ \text{subject to} \quad & \sum_{j = i}^{n} w_j x_j \leq w \\ & x_j \in \{ 0, 1 \} \quad \forall j = i, i+1, \ldots, n \end{align*}\]

\(\mathbf{x^*} = (x_1, x_2, \ldots, x_n)\) は最適解であれば，\((x_i, x_{i+1}, \ldots, x_n)\) は，利用可能な容量が \(W - \sum_{j=1}^{i-1} w_j x_j\) のときの品物 \(i, i+1, \ldots, n\) に対する最適解である。これは，もし \((x_i, x_{i+1}, \ldots, x_n)\) が最適解でないとすると，より価値の高い解が存在し，\(\mathbf{x^*}\) も最適解でなくなるためである。

以上の性質により，\(v_i(w)\) は以下の再帰式で与えられる。

\[ v_i(w) = \begin{cases} v_{i+1}(w) & \text{if } w_i > w \\ \max \{ v_{i+1}(w), v_{i+1}(w - w_i) + v_i \} & \text{if } w_i \leq w \end{cases} \]

また，\(n+1\) 段階目では品物が存在しないため，すべての容量 \(w\) に対して \(v_{n+1}(w) = 0\) となる。

ヒント

\(v_{i+1}(w)\) は，すべての \(w = 0, 1, \ldots, W\) に対して，品物 \(i+1, i+2, \ldots, n\) から選んで利用可能な容量が \(w\) のときの最大価値を表す。これを用いて，\(v_i(w), \forall w = 0, 1, \ldots, W\) を計算する。最適性の原理により，\(v_i(w)\) は部分問題の最適値である。

品物 \(i\) の重量 \(w_i\) が利用可能な容量 \(w\) を超える場合，品物 \(i\) は取れないため，\(v_i(w) = v_{i+1}(w)\) となる。

\(w_i \leq w\) の場合，品物 \(i\) を取らない場合と取る場合のうち，価値の高い方を選ぶ．ただし，品物 \(i\) を取る場合は，利用可能な容量が \(w - w_i\) になり，価値が \(v_i\) 増加する．

Algorithm 14.2

Input: 品物の集合 \(I = \{ 1, 2, \ldots, n \}\)，品物 \(i\) の重量 \(w_i\)，品物 \(i\) の価値 \(v_i\)，ナップサックの容量 \(W\)
Output: 最適値 \(v_1(W)\)

for all \(w = 0, 1, \ldots, W\) do
1. \(v_{n+1}(w) \leftarrow 0\)
for \(i = n, n-1, \ldots, 1\) do
1. for all \(w = 0, 1, \ldots, W\) do
  1. if \(w_i > w\) then
    1. \(v_i(w) \leftarrow v_{i+1}(w)\)
  2. else
    1. \(v_i(w) \leftarrow \max \{ v_{i+1}(w), v_{i+1}(w - w_i) + v_i \}\)
return \(v_1(W)\)

def dp_knapsack(weights, values, W):
    n = len(weights)
    # Initialize value table
    v = [[0 for _ in range(W + 1)] for _ in range(n + 2)]

    # Fill the table in reverse order
    for i in range(n, 0, -1):
        for w in range(W + 1):
            if weights[i - 1] > w:
                v[i][w] = v[i + 1][w]
            else:
                v[i][w] = max(v[i + 1][w], v[i + 1][w - weights[i - 1]] + values[i - 1])
    return v[1][W]


if __name__ == "__main__":
    weights = [12, 2, 1, 1]
    values = [4, 2, 1, 2]
    W = 15
    optimal_value = dp_knapsack(weights, values, W)
    print(optimal_value)  # Output: 8

容量 \(W = 2\) のときの計算過程を以下に示す。

\(v_5(w) = 0, \forall w\) であり，次に \(i = 4\) について，

\[\begin{align*} v_4(0) &= 0 \\ v_4(1) &= \max \{ v_5(1), v_5(0) + 2 \} = \max \{ 0, 0 + 2 \} = 2 \\ v_4(2) &= \max \{ v_5(2), v_5(1) + 2 \} = \max \{ 0, 0 + 2 \} = 2 \\ \end{align*}\]

が得られる。次に \(i = 3\) について，次の結果が得られる。

\[\begin{align*} v_3(0) &= 0 \\ v_3(1) &= \max \{ v_4(1), v_4(0) + 1 \} = \max \{ 2, 0 + 1 \} = 2 \\ v_3(2) &= \max \{ v_4(2), v_4(1) + 1 \} = \max \{ 2, 2 + 1 \} = 3 \\ \end{align*}\]

次に \(i = 2\) について，次の結果が得られる。 \[\begin{align*} v_2(0) &= 0 \\ v_2(1) &= v_3(1) = 2 \\ v_2(2) &= \max \{ v_3(2), v_3(0) + 2 \} = \max \{ 3, 0 + 2 \} = 3 \\ \end{align*}\]

最後に \(i = 1\) について，次の結果が得られる。 \[\begin{align*} v_1(0) &= 0 \\ v_1(1) &= v_2(1) = 2 \\ v_1(2) &= v_2(2) = 3 \\ \end{align*}\]

したがって，最適値は \(v_1(2) = 3\) である。最適解は，計算過程を逆にたどることで求めることができ，\(\mathbf{x^*} = (0, 0, 1, 1)\) となる。

\(v_1(2) = v_2(2)\) より，品物 1 は取らない。\(x_1^* = 0\)。
\(v_2(2) = v_3(2)\) より，品物 2 は取らない。\(x_2^* = 0\)。
\(v_3(2) = v_4(1) + 1\) より，品物 3 は取る。\(x_3^* = 1\)。利用可能な容量は \(2 - 1 = 1\) となる。
\(v_4(1) = v_5(0) + 2\) より，品物 4 は取る。\(x_4^* = 1\)。

例 14.4 容量 \(W = 3\) のときの最適値と最適解を求めよ。

14.3 ベルマン方程式（確定的場合）*

動的計画法における最適性の原理は，ベルマン方程式（Bellman equation）として知られる再帰方程式により表現される。状態 \(s\) におけるの価値関数 \(v(s)\) は，以下のように定義される。

\[ v(s) = \max_{a \in A(s)} \{ r(s, a) + \gamma v(s') \} \]

ここで，\(A(s)\) は状態 \(s\) における可能な行動の集合，\(R(s, a)\) は状態 \(s\) で行動 \(a\) を取ったときの報酬，\(\gamma\) は割引率，\(s'\) は行動 \(a\) を取った後の次の状態である。

記号	最短路問題	0/1 ナップサック問題
\(s\)	現在の頂点	現在の品物と利用可能な容量
\(a\)	次に移動する頂点	品物を取るか取らないか
\(A(s)\)	現在の頂点から到達可能な頂点の集合	現在の品物を取るか取らないかの選択肢
\(R(s, a)\)	負の辺の重み	品物の価値
\(s'\)	次の頂点	次の品物と新しい利用可能な容量