Lipschitzness of the Softmax Function


Definition

Let eiRde_i \in \mathbb{R}^d be the ii-th standard basis vector.

Let Δd\Delta_d be the dd-dimensional probability simplex, i.e.,

Δd={sRd|i=1dsi=1,and si0 for all i}.\Delta_{d} = \left\{ s \in\mathbb{R}^d \middle| \sum_{i=1}^d s_{i} = 1,\: \text{and } s_{i}\ge 0 \text{ for all } i \right\}.

Define the softmax with inverse temperature λ>0\lambda>0, σλ:RdΔd\sigma_{\lambda}:\mathbb{R}^d\to\Delta_d, by

σλ(x)=1i=1dexp(λxi)[exp(λx1)exp(λxd)].\sigma_{\lambda}(\mathbf x) = \frac{1}{\sum\limits_{i=1}^d\exp(\lambda x_i)} \begin{bmatrix} \exp(\lambda x_1) \\ \dots \\ \exp(\lambda x_d) \end{bmatrix}.

Proposition

For any 1p,q1\le p,q\le\infty and σλ\sigma_{\lambda}, the following holds:

σλ(x)σλ(y)pLp,qxyq,\|\sigma_{\lambda}(x)-\sigma_{\lambda}(y)\|_{p} \le L_{p,q}\|x-y\|_{q},

where

Lp,q=λ21+1p1q.L_{p,q} = \lambda 2^{-1 + \frac{1}{p} - \frac{1}{q}}.

Proof

Let s=σλ(z)s=\sigma_{\lambda}(z). Then the Jacobian matrix can be written as

J(z)=σλ(z)=λ(diag(s)ssT).J(z) = \nabla \sigma_{\lambda}(z) = \lambda(\mathrm{diag}(s)- ss^T).

By the mean value inequality,

σλ(x)σλ(y)p(supzsupuq=1J(z)up)xyq.\|\sigma_{\lambda}(x)-\sigma_{\lambda}(y)\|_{p} \le \left( \sup_{z} \sup_{\|u\|_{q}= 1} \|J(z) u\|_{p} \right) \|x-y\|_{q}.

Hence it suffices to bound supzJ(z)qp\sup_{z}\|J(z)\|_{q\to p}.

First, for the standard basis {ei}i=1d\{e_i\}_{i=1}^d, the identity diag(s)ssT=i<jsisj(eiej)(eiej)T\mathrm{diag}(s)-ss^T = \sum_{i<j}s_{i}s_{j}(e_{i}-e_{j})(e_{i}-e_{j})^T holds, so for any uRdu\in\mathbb{R}^d,

J(z)u=λi<jsisj(uiuj)(eiej).J(z)u = \lambda \sum_{i<j} s_{i}s_{j}(u_{i}-u_{j})(e_{i}-e_{j}).

By the triangle inequality and eiejp=21/p\|e_i-e_j\|_{p}=2^{1/p} (for iji\neq j),

J(z)upλi<jsisjuiujeiejp=λ21/pi<jsisjuiuj\begin{align*} \|J(z)u\|_{p} &\le \lambda \sum_{i<j} s_{i}s_{j} |u_{i}-u_{j}| \|e_{i}-e_{j}\|_{p} \\ &= \lambda 2^{1/p} \sum_{i<j} s_{i}s_{j} |u_{i}-u_{j}| \end{align*}

Here s=σλ(z)s=\sigma_\lambda(z) ranges over the interior of the simplex. Since the function si<jsisjuiujs\mapsto\sum_{i<j}s_is_j|u_i-u_j| is continuous, its supremum over the interior coincides with its supremum over the closed simplex Δd\Delta_d, which includes the boundary. Therefore,

supzsupuq=1J(z)up=supuq=1supzJ(z)up=λ21/psupuq=1supsΔdi<jsisjuiuj.\begin{align*} \sup_{z} \sup_{\|u\|_{q}= 1} \|J(z) u\|_{p} &= \sup_{\|u\|_{q}=1} \sup_{z} \|J(z)u\|_{p} \\ &= \lambda 2^{1/p} \sup_{\|u\|_{q}=1} \sup_{s \in\Delta_{d}} \sum_{i<j} s_{i}s_{j}|u_{i}-u_{j}|. \end{align*}

Let imin=arg miniuii_{min}=\argmin_{i}u_{i} and imax=arg maxiuii_{max}=\argmax_{i}u_{i}. The right-hand side is maximized when ss concentrates on imini_{min} and imaxi_{max}, hence

supsΔdi<jsisjuiuj=uimaxuiminmaxS[0,1]S(1S)=uimaxuimin/4.\sup_{s \in \Delta_{d}} \sum_{i<j} s_{i}s_{j}|u_{i}-u_{j}| = |u_{i_{max}}-u_{i_{min}} | \max_{S \in[0,1]} S(1-S) = |u_{i_{max}}-u_{i_{min}} | /4.

The equality condition is simin=simax=12s_{i_{min}}=s_{i_{max}}=\frac{1}{2}. Next, since for any real numbers a,ba,b we have abq2q1(aq+bq)|a-b|^q \le 2^{q-1} (|a|^q + |b|^q), applying this to a=uimax,b=ujmina=u_{i_{max}},b=u_{j_{min}} yields:

uimaxuimin211/q(uimaxq+uiminq)1/q211/q.|u_{i_{max}}-u_{i_{min}}| \le 2^{1-1/q} (|u_{i_{max}}|^q + |u_{i_{min}}|^q)^{1/q} \le 2^{1-1/q}.

The equality condition is uimax=21/q,uimin=21/qu_{i_{max}}=2^{-1/q}, u_{i_{min}}=-2^{-1/q}. Putting everything together, we obtain

supzsupuq=1J(z)upλ21/p14211/q=λ21+1/p1/q=:Lp,q.\sup_{z} \sup_{\|u\|_{q}= 1} \|J(z) u\|_{p} \le \lambda 2^{1/p} \cdot \frac{1}{4} \cdot 2^{1-1/q} = \lambda 2^{-1+1/p-1/q} =: L_{p,q}.

\square

Examples

  • For (2,2)(2,2), L2,2=λ/2L_{2,2} = \lambda/2
  • For (1,1)(1,1), L1,1=λ/2L_{1,1} = \lambda/2
  • For (1,)(1,\infty), L1,=λL_{1,\infty} = \lambda

References

softmax 関数のリプシッツ連続性