17 勾配降下法

関数 \(y = f(x)\) を最小化する問題を考える．ここで，\(x\), \(y\) は実数であるとする．\(f(x)\) の導関数は \(f'(x)\) や \(\frac{df}{dx}\) と書く．導関数 \(f'(x)\) は，点 \(x\) における関数 \(f(x)\) の傾きを表す．

Example 17.1 (導関数の例) 関数 \(f(x) = x^2\) の導関数は \(f'(x) = 2x\) である．例えば，点 \(x = 1\) における導関数の値は \(f'(1) = 2\) であり，これは点 \((1, 1)\) における関数 \(f(x)\) の傾きが \(2\) であることを意味する．下記の図は，関数 \(f(x) = x^2\) と点 \((1, 1)\) における接線を示している．

Code

import numpy as np
import matplotlib.pyplot as plt


def f(x):
    return x**2


def df(x):
    return 2 * x


# point (1, 1)
def tangent_line(x):
    return 2 * (x - 1) + 1


x = np.linspace(-4, 4, 100)
plt.plot(x, f(x), label="f(x) = x^2")
plt.plot(x, tangent_line(x), label="Tangent line at x=1", ls="--")
plt.scatter(1, f(1), color="red", label="Point (1, 1)")
plt.title("Function and its Derivative")
plt.xlabel("x")
plt.ylabel("y")
plt.axhline(0, color="black", lw=0.5, ls="--")
plt.axvline(0, color="black", lw=0.5, ls="--")
plt.legend()
plt.grid()
plt.show()

17.1 アルゴリズム

\(\mathbf{x} \in \mathbb{R}^n\)の関数\(f(\mathbf{x})\)を最小化する問題を考える。ここで、\(f\) は微分可能であるとする。

勾配降下法は、以下の式で与えられる更新式を繰り返すことで、\(f\)の極小値を求めるアルゴリズムである。

\[ \mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - \alpha \nabla f(\mathbf{x}^{(k)}) \]

ここで、\(\nabla f(\mathbf{x})\)は\(f\)の勾配を表し、\(\alpha\)はステップサイズを表す。

勾配\(\nabla f(\mathbf{x})\)は、以下のように定義される。

\[ \nabla f(\mathbf{x}) = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right] \]

\(\alpha\)は、ステップサイズを表すハイパーパラメータであり、適切な値を選択することでアルゴリズムの収束性能を向上させることができる。

勾配降下法は、勾配の逆方向に進むことで、関数\(f\)の極小値を探索するアルゴリズムである。

勾配降下法の停止条件として、よく使われるのは以下の式である。

\(\|\nabla f(\mathbf{x})\| < \text{tol}\)

\(\|\cdot\|\) はベクトルのノルムを表し、\(\text{tol}\)はあらかじめ与えられた許容誤差である。

Algorithm 17.1 (Gradient descent)

Inputs: function \(f\), gradient \(\nabla f\), initial guess \(\mathbf{x}^{(0)}\), step size \(\alpha\), tolerance \(\text{tol}\)
Output: estimate of the minimum \(\mathbf{x}\)

Initialize \(\mathbf{x} \leftarrow \mathbf{x}^{(0)}\)
While \(\|\nabla f(\mathbf{x})\| > \text{tol}\):
1. \(\mathbf{x} \leftarrow \mathbf{x} - \alpha \nabla f(\mathbf{x})\)
Return \(\mathbf{x}\)

17.2 Pythonによる実装

import numpy as np

def gradient_descent(f, grad, x0, alpha=1e-3, tol=1e-6):
    """
    Gradient descent (optimization algorithm)

    Parameters
    ----------
    f : function
        The function to minimize
    grad : function
        The gradient of the function
    x0 : np.ndarray
        Initial guess
    alpha : float
        Step size
    tol : float
        Tolerance

    Returns
    -------
    x : np.ndarray
        The estimate of the minimum
    """
    x = x0
    while np.linalg.norm(grad(x)) > tol:
        x = x - alpha * grad(x)
    return x


def f(x):
    return x[0]**2 + x[1]**2

def grad(x):
    return np.array([2*x[0], 2*x[1]])

x0 = np.array([1.0, 1.0])
x = gradient_descent(f, grad, x0)
print(x)