2025-09-11

Lipschitzness of the Softmax Function

Definition

Let $e_i \in \mathbb{R}^d$ be the $i$-th standard basis vector.

Let $\Delta_d$ be the probability simplex in $\mathbb{R}^d$, i.e.,

\Delta_{d} = \left\{ s \in\mathbb{R}^d \middle| \sum_{i=1}^d s_{i} = 1,\: \text{and } s_{i}\ge 0 \text{ for all } i \right\}.

Define the softmax with inverse temperature $\lambda>0$, $\sigma_{\lambda}:\mathbb{R}^d\to\Delta_d$, by

\sigma_{\lambda}(x) = \frac{1}{\sum\limits_{i=1}^d\exp(\lambda x_i)} \begin{bmatrix} \exp(\lambda x_1) \\ \vdots \\ \exp(\lambda x_d) \end{bmatrix}.
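A minimal Python sketch of $\sigma_\lambda$ follows. The function name `softmax` and the parameter `lam` are our own choices, not a library API. The code subtracts $\max_i \lambda x_i$ before exponentiating, which leaves the ratio unchanged but avoids overflow for large inputs.

```python
import math


def softmax(x, lam=1.0):
    """sigma_lambda(x): exp(lam * x_i) / sum_j exp(lam * x_j), for lam > 0."""
    if lam <= 0:
        raise ValueError("inverse temperature lam must be positive")
    scaled = [lam * xi for xi in x]
    m = max(scaled)
    # Shifting every exponent by the same constant m cancels in the ratio,
    # so the result is unchanged while exp() can no longer overflow.
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


s = softmax([1.0, 2.0, 3.0], lam=2.0)
print(s)
# A naive math.exp(1000.0) would raise OverflowError; the shifted version does not.
print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
```

Because $\sigma_\lambda(x)=\sigma_1(\lambda x)$, the inverse temperature only rescales the input. This is where the factor $\lambda$ in the constant below comes from.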

Proposition

For any $1\le p,q\le\infty$ and all $x,y\in\mathbb{R}^d$, the following holds:

\|\sigma_{\lambda}(x)-\sigma_{\lambda}(y)\|_{p} \le L_{p,q}\|x-y\|_{q},

where, with the convention $1/\infty=0$,

L_{p,q} = \lambda 2^{-1 + \frac{1}{p} - \frac{1}{q}}.
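The constant can be tabulated with a small helper. The name `lipschitz_constant` is hypothetical. An infinite exponent is passed as `math.inf` and handled through the convention $1/\infty=0$.

```python
import math


def lipschitz_constant(lam, p, q):
    """L_{p,q} = lam * 2**(-1 + 1/p - 1/q), with 1/inf read as 0."""
    def inv(r):
        if r < 1:
            raise ValueError("norm exponents must satisfy 1 <= r <= inf")
        return 0.0 if math.isinf(r) else 1.0 / r

    return lam * 2.0 ** (-1.0 + inv(p) - inv(q))


for p, q in [(2, 2), (1, 1), (1, math.inf), (math.inf, 1)]:
    print(p, q, lipschitz_constant(1.0, p, q))
```

Since $1/p-1/q\in[-1,1]$, the constant always lies between $\lambda/4$ (at $p=\infty$, $q=1$) and $\lambda$ (at $p=1$, $q=\infty$).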

Proof

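Let $s=\sigma_{\lambda}(z)$. Since $\partial s_i/\partial z_j = \lambda(s_i\delta_{ij}-s_is_j)$, where $\delta_{ij}$ is the Kronecker delta, the Jacobian matrix can be written as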

J(z) = \nabla \sigma_{\lambda}(z) = \lambda(\mathrm{diag}(s)- ss^T).
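The closed form can be sanity-checked against central finite differences. The following is a minimal pure-Python sketch. The helper names are hypothetical, and the step size `h = 1e-6` is an arbitrary choice assumed to be small enough.

```python
import math


def softmax(x, lam):
    m = max(lam * v for v in x)
    e = [math.exp(lam * v - m) for v in x]
    t = sum(e)
    return [v / t for v in e]


def jacobian(z, lam):
    """Closed form J(z) = lam * (diag(s) - s s^T) with s = softmax(z, lam)."""
    s = softmax(z, lam)
    d = len(z)
    return [[lam * ((s[i] if i == j else 0.0) - s[i] * s[j]) for j in range(d)]
            for i in range(d)]


def jacobian_fd(z, lam, h=1e-6):
    """Central differences: entry [i][j] approximates d sigma_i / d z_j."""
    d = len(z)
    J = [[0.0] * d for _ in range(d)]
    for j in range(d):
        zp, zm = list(z), list(z)
        zp[j] += h
        zm[j] -= h
        sp, sm = softmax(zp, lam), softmax(zm, lam)
        for i in range(d):
            J[i][j] = (sp[i] - sm[i]) / (2 * h)
    return J


z, lam = [0.3, -1.2, 2.0, 0.5], 1.7
J, J_fd = jacobian(z, lam), jacobian_fd(z, lam)
err = max(abs(J[i][j] - J_fd[i][j]) for i in range(4) for j in range(4))
print(f"max |closed form - finite difference| = {err:.2e}")
```

Each row of $J(z)$ sums to zero. This reflects the shift invariance $\sigma_\lambda(z+c\mathbf{1})=\sigma_\lambda(z)$ for every scalar $c$.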

By the mean value inequality,

\|\sigma_{\lambda}(x)-\sigma_{\lambda}(y)\|_{p} \le \left( \sup_{z} \sup_{\|u\|_{q}= 1} \|J(z) u\|_{p} \right) \|x-y\|_{q}.

Hence it suffices to bound $\sup_{z}\|J(z)\|_{q\to p}$, where $\|A\|_{q\to p}=\sup_{\|u\|_q=1}\|Au\|_p$ is the operator norm from $\ell_q$ to $\ell_p$.

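First, for the standard basis $\{e_i\}_{i=1}^d$, the identity $\mathrm{diag}(s)-ss^T = \sum_{i<j}s_{i}s_{j}(e_{i}-e_{j})(e_{i}-e_{j})^T$ holds. Because $\sum_k s_k=1$, both sides have diagonal entries $s_i(1-s_i)=s_i\sum_{k\neq i}s_k$ and off-diagonal entries $-s_is_j$. Since $(e_i-e_j)^Tu=u_i-u_j$, for any $u\in\mathbb{R}^d$ we get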

J(z)u = \lambda \sum_{i<j} s_{i}s_{j}(u_{i}-u_{j})(e_{i}-e_{j}).
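The rank-one decomposition of $\mathrm{diag}(s)-ss^T$ relies on $\sum_i s_i=1$. The following is a quick numerical check on a random point of the simplex, written in pure Python with hypothetical helper names. It also shows that the identity fails for unnormalized weights.

```python
import random


def lhs(s):
    """diag(s) - s s^T."""
    d = len(s)
    return [[(s[i] if i == j else 0.0) - s[i] * s[j] for j in range(d)]
            for i in range(d)]


def rhs(s):
    """sum_{i<j} s_i s_j (e_i - e_j)(e_i - e_j)^T, accumulated entrywise."""
    d = len(s)
    M = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            w = s[i] * s[j]
            # (e_i - e_j)(e_i - e_j)^T: +1 at (i,i), (j,j); -1 at (i,j), (j,i)
            M[i][i] += w
            M[j][j] += w
            M[i][j] -= w
            M[j][i] -= w
    return M


rng = random.Random(0)
w = [rng.random() for _ in range(5)]
total = sum(w)
s = [v / total for v in w]  # a point of the probability simplex

A, B = lhs(s), rhs(s)
gap = max(abs(A[i][j] - B[i][j]) for i in range(5) for j in range(5))

# Without normalization the diagonals differ by w_i * (1 - sum_k w_k).
A_raw, B_raw = lhs(w), rhs(w)
raw_gap = max(abs(A_raw[i][j] - B_raw[i][j]) for i in range(5) for j in range(5))
print(f"normalized gap = {gap:.2e}, unnormalized gap = {raw_gap:.2e}")
```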

By the triangle inequality and $\|e_i-e_j\|_{p}=2^{1/p}$ for $i\neq j$,

\begin{align*} \|J(z)u\|_{p} &\le \lambda \sum_{i<j} s_{i}s_{j} |u_{i}-u_{j}| \|e_{i}-e_{j}\|_{p} \\ &= \lambda 2^{1/p} \sum_{i<j} s_{i}s_{j} |u_{i}-u_{j}| \end{align*}

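As $z$ ranges over $\mathbb{R}^d$, $s=\sigma_\lambda(z)$ ranges over exactly the relative interior of the simplex, i.e., the points of $\Delta_d$ with all $s_i>0$. Any such $s$ is attained by $z=\lambda^{-1}\log s$, taken componentwise. Since the function $s\mapsto\sum_{i<j}s_is_j|u_i-u_j|$ is continuous, its supremum over the relative interior equals its supremum over the closed simplex $\Delta_d$, which includes the boundary. Therefore,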

\begin{align*} \sup_{z} \sup_{\|u\|_{q}= 1} \|J(z) u\|_{p} &= \sup_{\|u\|_{q}=1} \sup_{z} \|J(z)u\|_{p} \\ &\le \lambda 2^{1/p} \sup_{\|u\|_{q}=1} \sup_{s \in\Delta_{d}} \sum_{i<j} s_{i}s_{j}|u_{i}-u_{j}|. \end{align*}

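Fix $u$ and let $i_{min}\in\operatorname{arg\,min}_{i}u_{i}$ and $i_{max}\in\operatorname{arg\,max}_{i}u_{i}$. To bound the supremum over $s$, let $F(t)=\sum_{i:\,u_i\le t}s_i$ be the distribution function of the probability vector $s$ placed on the points $u_1,\dots,u_d$. Then $\sum_{i<j}s_is_j|u_i-u_j|=\int_{u_{i_{min}}}^{u_{i_{max}}}F(t)\bigl(1-F(t)\bigr)\,dt$. To see this, note that each pair contributes $s_is_j$ times the length of the interval between $u_i$ and $u_j$. For each $t$, the pairs with one point at or below $t$ and the other above $t$ have total weight $F(t)(1-F(t))$. Since $F(t)(1-F(t))\le\max_{S\in[0,1]}S(1-S)$ for every $t$, we get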

\sup_{s \in \Delta_{d}} \sum_{i<j} s_{i}s_{j}|u_{i}-u_{j}| = |u_{i_{max}}-u_{i_{min}} | \max_{S \in[0,1]} S(1-S) = |u_{i_{max}}-u_{i_{min}} | /4.
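The value $|u_{i_{max}}-u_{i_{min}}|/4$ can be probed numerically. Random points of $\Delta_d$ should never exceed it, while the two-point distribution with mass $\tfrac12$ on $i_{min}$ and $i_{max}$ reaches it. The sketch also evaluates $\int F(t)(1-F(t))\,dt$, where $F$ is the distribution function of $s$ placed on the points $u_i$. This integral equals the pair sum, and since $F(1-F)\le\tfrac14$ it gives one way to see the bound. The test vector `u` and the sampling via normalized exponentials are our own choices.

```python
import random


def pair_sum(s, u):
    """sum_{i<j} s_i s_j |u_i - u_j|."""
    d = len(u)
    return sum(s[i] * s[j] * abs(u[i] - u[j])
               for i in range(d) for j in range(i + 1, d))


def cdf_integral(s, u):
    """Integral of F(t)(1 - F(t)) dt, F = CDF of mass s_i at point u_i."""
    order = sorted(range(len(u)), key=lambda i: u[i])
    total, F = 0.0, 0.0
    for a, b in zip(order, order[1:]):
        F += s[a]  # F is constant on [u_a, u_b)
        total += F * (1.0 - F) * (u[b] - u[a])
    return total


def random_simplex_point(d, rng):
    # Normalized i.i.d. exponentials are uniformly distributed on the simplex.
    w = [rng.expovariate(1.0) for _ in range(d)]
    t = sum(w)
    return [v / t for v in w]


rng = random.Random(1)
u = [0.4, -1.3, 2.2, 0.0, 1.1]
bound = (max(u) - min(u)) / 4

samples = [random_simplex_point(len(u), rng) for _ in range(20000)]
best = max(pair_sum(s, u) for s in samples)
identity_gap = max(abs(pair_sum(s, u) - cdf_integral(s, u)) for s in samples[:100])

s_star = [0.0] * len(u)  # mass 1/2 on the argmin and the argmax of u
s_star[u.index(min(u))] = 0.5
s_star[u.index(max(u))] = 0.5
print(best, pair_sum(s_star, u), bound)
```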

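The supremum over $\Delta_d$ is attained at $s_{i_{min}}=s_{i_{max}}=\frac{1}{2}$. Next, for $1\le q<\infty$, convexity of $t\mapsto|t|^q$ gives $|a-b|^q \le 2^{q-1} (|a|^q + |b|^q)$ for all real $a,b$. Applying this to $a=u_{i_{max}}$, $b=u_{i_{min}}$ and using $\|u\|_q=1$ yields the following. For $q=\infty$, read the middle term as $2\max(|u_{i_{max}}|,|u_{i_{min}}|)$, which follows from $|a-b|\le|a|+|b|$.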

|u_{i_{max}}-u_{i_{min}}| \le 2^{1-1/q} (|u_{i_{max}}|^q + |u_{i_{min}}|^q)^{1/q} \le 2^{1-1/q}.
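The following is a randomized check of $|u_{i_{max}}-u_{i_{min}}|\le 2^{1-1/q}$ over unit vectors $\|u\|_q=1$, together with the extremal vector $(2^{-1/q},-2^{-1/q},0,\dots,0)$. This is only a sketch. The Gaussian sampling of directions, the dimension $4$, and the list of exponents are arbitrary choices.

```python
import math
import random


def lp_norm(v, r):
    """||v||_r for 1 <= r <= inf."""
    if math.isinf(r):
        return max(abs(x) for x in v)
    return sum(abs(x) ** r for x in v) ** (1.0 / r)


def inv(r):
    """1/r with the convention 1/inf = 0."""
    return 0.0 if math.isinf(r) else 1.0 / r


rng = random.Random(2)
results = {}
for q in [1.0, 1.5, 2.0, 3.0, math.inf]:
    worst = 0.0
    for _ in range(5000):
        u = [rng.gauss(0.0, 1.0) for _ in range(4)]
        n = lp_norm(u, q)
        u = [x / n for x in u]  # rescale so that ||u||_q = 1
        worst = max(worst, max(u) - min(u))
    c = 2.0 ** (-inv(q))
    u_star = [c, -c, 0.0, 0.0]  # extremal vector from the equality condition
    bound = 2.0 ** (1.0 - inv(q))
    results[q] = (worst, max(u_star) - min(u_star), lp_norm(u_star, q), bound)
    print(q, results[q])
```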

Equality in $|u_{i_{max}}-u_{i_{min}}|\le 2^{1-1/q}$ holds when $u_{i_{max}}=2^{-1/q}$, $u_{i_{min}}=-2^{-1/q}$, and all other coordinates of $u$ are zero. Putting everything together, we obtain

\sup_{z} \sup_{\|u\|_{q}= 1} \|J(z) u\|_{p} \le \lambda 2^{1/p} \cdot \frac{1}{4} \cdot 2^{1-1/q} = \lambda 2^{-1+1/p-1/q} =: L_{p,q}.

$\square$
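The two equality conditions from the proof can be checked together. Take $z=(0,0,-M,\dots,-M)$, so that $s\to(\tfrac12,\tfrac12,0,\dots,0)$ as $M\to\infty$, and take $u=(2^{-1/q},-2^{-1/q},0,\dots,0)$. The sketch below evaluates $\|J(z)u\|_p/L_{p,q}$ using $J(z)u=\lambda\,s\odot(u-\langle s,u\rangle\mathbf{1})$. The helpers are hypothetical, and $\lambda=1.5$ and $d=4$ are arbitrary. The ratio approaches but does not exceed $1$, because $s$ reaches the boundary of $\Delta_d$ only in the limit.

```python
import math


def softmax(x, lam):
    m = max(lam * v for v in x)
    e = [math.exp(lam * v - m) for v in x]
    t = sum(e)
    return [v / t for v in e]


def lp_norm(v, r):
    if math.isinf(r):
        return max(abs(x) for x in v)
    return sum(abs(x) ** r for x in v) ** (1.0 / r)


def inv(r):
    return 0.0 if math.isinf(r) else 1.0 / r


def jvp(z, u, lam):
    """J(z) u = lam * s * (u - <s, u>), computed without forming J(z)."""
    s = softmax(z, lam)
    su = sum(si * ui for si, ui in zip(s, u))
    return [lam * si * (ui - su) for si, ui in zip(s, u)]


lam, d = 1.5, 4
ratios = {}
for p, q in [(2, 2), (1, 1), (1, math.inf), (math.inf, 1), (3, 1.5)]:
    L = lam * 2.0 ** (-1.0 + inv(p) - inv(q))
    c = 2.0 ** (-inv(q))
    u = [c, -c] + [0.0] * (d - 2)  # ||u||_q = 1
    row = []
    for M in [1.0, 5.0, 30.0]:
        z = [0.0, 0.0] + [-M] * (d - 2)
        row.append(lp_norm(jvp(z, u, lam), p) / L)
    ratios[(p, q)] = row
    print(p, q, [f"{r:.12f}" for r in row])
```

For this family with $d=4$, the ratio works out to $2s_1=1/(1+e^{-\lambda M})$ for every $(p,q)$, which tends to $1$ as $M\to\infty$.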

Examples

For $(p,q)=(2,2)$: $L_{2,2} = \lambda/2$
For $(p,q)=(1,1)$: $L_{1,1} = \lambda/2$
For $(p,q)=(1,\infty)$: $L_{1,\infty} = \lambda$
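The following is a rough empirical check of these three constants. It samples random pairs $(x,y)$ and records the largest observed ratio $\|\sigma_\lambda(x)-\sigma_\lambda(y)\|_p/\|x-y\|_q$, which should stay at or below $L_{p,q}$. The sampling distribution, $\lambda=2$, and $d=5$ are arbitrary choices for this sketch. The observed maximum is only a lower estimate of the best constant.

```python
import math
import random


def softmax(x, lam):
    m = max(lam * v for v in x)
    e = [math.exp(lam * v - m) for v in x]
    t = sum(e)
    return [v / t for v in e]


def lp_norm(v, r):
    if math.isinf(r):
        return max(abs(a) for a in v)
    return sum(abs(a) ** r for a in v) ** (1.0 / r)


def lipschitz_constant(lam, p, q):
    def inv(r):
        return 0.0 if math.isinf(r) else 1.0 / r
    return lam * 2.0 ** (-1.0 + inv(p) - inv(q))


rng = random.Random(3)
lam, d = 2.0, 5
worst = {}
for p, q in [(2, 2), (1, 1), (1, math.inf)]:
    best = 0.0
    for _ in range(10000):
        x = [rng.uniform(-3.0, 3.0) for _ in range(d)]
        y = [xi + rng.gauss(0.0, 0.5) for xi in x]
        diff_out = [a - b for a, b in zip(softmax(x, lam), softmax(y, lam))]
        diff_in = [a - b for a, b in zip(x, y)]
        best = max(best, lp_norm(diff_out, p) / lp_norm(diff_in, q))
    worst[(p, q)] = best
    print(p, q, round(best, 4), "<=", lipschitz_constant(lam, p, q))
```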
