Some facts concerning the von Neumann entropy and quantum mutual information

We prove three facts concerning the von Neumann entropy and the quantum mutual information.

Let $\X$ be an $n$-dimensional complex Euclidean space, and let $\rho\in\Density(\X)$ be a density operator. Recall that the von Neumann entropy of $\rho$ is defined as
\[
S(\rho):=-\tr(\rho\log(\rho)),
\]
or equivalently as
\[
S(\rho)=H(\lambda(\rho)),
\]
where $\lambda(\rho)=(\lambda_1(\rho),\lambda_2(\rho),\dots,\lambda_n(\rho))$ is the vector of eigenvalues of $\rho$, and
\[
H(p):=-\sum_{a\in\Sigma}p(a)\log(p(a))
\]
is the classical Shannon entropy of a vector $p\in\mathbb{R}^{\Sigma}$ over some alphabet $\Sigma$.

Theorem:

For every choice of complex Euclidean spaces $\X$ and $\Y$, and every vector $u \in \X\otimes\Y$, it holds that $S(\tr_{\X}(u u^{\ast})) = S(\tr_{\Y}(u u^{\ast}))$.
   
Proof:

Identify $u\in\X\otimes\Y$ with the unique operator $A$ satisfying $u=\operatorname{vec}(A)$. A singular value decomposition of $A$ then yields a Schmidt decomposition of $u$,
\[
 u=\sum_{k=1}^{r}s_k\,x_k\otimes y_k,
\]
where $r=\rank(A)$, $s_1,\dots,s_r>0$ are the nonzero singular values of $A$, and $\{x_1,\dots,x_r\}\subset\X$ and $\{y_1,\dots,y_r\}\subset\Y$ are orthonormal sets. Then
\[
uu^\ast=\sum_{j,k=1}^{r}s_js_kx_jx_k^\ast\otimes y_jy_k^\ast,
\]
and therefore, since $\tr(x_jx_k^\ast)=\tr(y_jy_k^\ast)=\delta_{jk}$ by orthonormality,
\[
\tr_{\X}(uu^\ast)=\sum_{k=1}^{r}s_k^2\,y_ky_k^\ast \quad\text{and}\quad \tr_{\Y}(uu^\ast)=\sum_{k=1}^{r}s_k^2\,x_kx_k^\ast.
\]
Now let $\lambda=(s_1^2,\dots,s_r^2)$, and observe that $\lambda$ is the vector of nonzero eigenvalues of both $\tr_{\X}(uu^\ast)$ and $\tr_{\Y}(uu^\ast)$, since the expressions above are spectral decompositions of these operators.

Zero eigenvalues contribute nothing to the entropy, so by definition the von Neumann entropy of each is
\[
S(\tr_{\X}(u u^{\ast})) =H(\lambda) = S(\tr_{\Y}(u u^{\ast})).
\]
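
As a quick sanity check, consider a small worked instance. The choice $\X=\Y=\mathbb{C}^2$, the standard basis vectors $e_0,e_1$, and base-2 logarithms are assumptions made only for this illustration. Take
\[
u=\tfrac{\sqrt{3}}{2}\,e_0\otimes e_0+\tfrac{1}{2}\,e_1\otimes e_1,
\]
which is already in Schmidt form with $s_1=\sqrt{3}/2$ and $s_2=1/2$. Both reduced operators then have the same matrix representation,
\[
\tr_{\X}(uu^\ast)=\tfrac{3}{4}\,e_0e_0^\ast+\tfrac{1}{4}\,e_1e_1^\ast
\quad\text{and}\quad
\tr_{\Y}(uu^\ast)=\tfrac{3}{4}\,e_0e_0^\ast+\tfrac{1}{4}\,e_1e_1^\ast,
\]
acting on $\Y$ and $\X$ respectively, so that
\[
S(\tr_{\X}(uu^\ast))=S(\tr_{\Y}(uu^\ast))=H\!\left(\tfrac{3}{4},\tfrac{1}{4}\right)=2-\tfrac{3}{4}\log_2 3\approx 0.811 .
\]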


Theorem:

For every choice of registers $\reg{X}$ and $\reg{Y}$, and for every state $\rho\in\Density(\X\otimes\Y)$ of these registers, it holds that $S(\reg{X}) \leq S(\reg{Y}) + S(\reg{X},\reg{Y})$.

Proof:
    
Choose a complex Euclidean space $\Z$ with $\dim(\Z)\geq\rank(\rho)$, so that there exists a unit vector $u\in\X\otimes\Y\otimes\Z$ for which $\rho'=uu^\ast\in\Density(\X\otimes\Y\otimes\Z)$ is a purification of $\rho$, meaning that $\tr_{\Z}(\rho')=\rho$. Let $\rho'$ be the joint state of the registers $(\reg{X},\reg{Y},\reg{Z})$; the entropies $S(\reg{X})$, $S(\reg{Y})$, and $S(\reg{X},\reg{Y})$ are then the same as for $\rho$. Since $\rho'$ is a pure state, $S(\reg{X},\reg{Y},\reg{Z})=0$. Moreover, $\rho'[\reg{X},\reg{Z}]=\tr_{\Y}(\rho')$ and $\rho'[\reg{Y}]=\tr_{\X\otimes\Z}(\rho')$, so the first theorem, applied to the bipartition of $\X\otimes\Y\otimes\Z$ into $\Y$ and $\X\otimes\Z$, implies that $S(\tr_{\Y}(\rho'))=S(\tr_{\X\otimes\Z}(\rho'))$, or equivalently that $S(\reg{Y}) = S(\reg{X},\reg{Z})$.
    
By strong sub-additivity, for any possible state of the registers $\reg{X},\reg{Y}, \reg{Z}$,
\[
S(\reg{X},\reg{Y}, \reg{Z})+S(\reg{X})\leq S(\reg{X}, \reg{Z})+S(\reg{X}, \reg{Y}).
\]
However, by previous considerations we have that $S(\reg{X},\reg{Y}, \reg{Z})=0$ and $S(\reg{Y}) = S(\reg{X},\reg{Z})$, which after substituting implies that
\[
S(\reg{X})\leq S(\reg{Y})+S(\reg{X}, \reg{Y}).
\]
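
For orientation, two hypothetical two-qubit instances show that this bound can be either tight or slack. They assume $\X=\Y=\mathbb{C}^2$, standard basis vectors $e_0,e_1$, and base-2 logarithms, none of which is part of the argument above. If $(\reg{X},\reg{Y})$ is in a pure state $uu^\ast$, then $S(\reg{X},\reg{Y})=0$ and the first theorem gives $S(\reg{X})=S(\reg{Y})$, so the inequality holds with equality. If instead
\[
\rho=\sigma\otimes\sigma
\quad\text{with}\quad
\sigma=\tfrac{1}{2}\,e_0e_0^\ast+\tfrac{1}{2}\,e_1e_1^\ast,
\]
then $S(\reg{X})=S(\reg{Y})=1$ and $S(\reg{X},\reg{Y})=2$, so the inequality reads $1\leq 1+2$ and is far from tight.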


Theorem:

Let $\reg{X}$ and $\reg{Y}$ be registers, let $\Sigma$ be an alphabet, let $p\in\P(\Sigma)$ be a probability vector, and let $\{\sigma_a\,:\,a\in\Sigma\}\subset\Density(\X)$ and $\{\xi_a\,:\,a\in\Sigma\}\subset\Density(\Y)$ be arbitrary collections of density operators. If the pair $(\reg{X},\reg{Y})$ is in the state
\[
      \rho = \sum_{a\in\Sigma} \, p(a) \sigma_a\otimes\xi_a,
\]
it holds that $S(\reg{X} : \reg{Y}) \leq H(p)$.
   
Proof:

In this case, the reduced states of the two registers are given by
\[
\rho[\reg{X}]=\tr_\Y(\rho)=\sum_{a\in\Sigma}p(a) \sigma_a \quad\text{and}\quad \rho[\reg{Y}]=\tr_\X(\rho)=\sum_{a\in\Sigma}p(a) \xi_a,
\]
so that
\[\begin{aligned}
\rho[\reg{X}]\otimes\rho[\reg{Y}]&=\left(\sum_{a\in\Sigma}p(a) \sigma_a\right)\otimes\left(\sum_{b\in\Sigma}p(b) \xi_b\right) \\
&=\sum_{a\in\Sigma}\sum_{b\in\Sigma}p(a)p(b)\, \sigma_a\otimes \xi_b.
\end{aligned}\]

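The mutual information $S(\reg{X} : \reg{Y})$ can be expressed as a quantum relative entropy and bounded as follows. The inequality uses the property $S(P_0+P_1\,\|\,Q_0+Q_1)\leq S(P_0\,\|\,Q_0)+S(P_1\,\|\,Q_1)$ for positive semidefinite operators, and the final equality uses the identity $S(P_0\otimes P_1\,\|\,Q_0\otimes Q_1)=\tr(P_1)S(P_0\,\|\,Q_0)+\tr(P_0)S(P_1\,\|\,Q_1)$:
\[\begin{aligned}
S(\reg{X} : \reg{Y})&=S\left(\rho\,\|\,\rho[\reg{X}]\otimes\rho[\reg{Y}]\right) \\
&=S\left( \sum_{a\in\Sigma} p(a)\, \sigma_a\otimes\xi_a \,\Big\|\, \sum_{a\in\Sigma}\sum_{b\in\Sigma}p(a)p(b)\, \sigma_a\otimes \xi_b\right) \\
&\leq \sum_{a\in\Sigma}S\left(  p(a)\, \sigma_a\otimes\xi_a \,\Big\|\, \sum_{b\in\Sigma}p(a)p(b)\, \sigma_a\otimes \xi_b\right) \\
&=\sum_{a\in\Sigma}S\left(  p(a) \sigma_a\otimes\xi_a \,\Big\|\, p(a) \sigma_a\otimes \sum_{b\in\Sigma}p(b)\xi_b\right) \\
&=\sum_{a\in\Sigma}\left(\tr(\xi_a)\,S\left(p(a)\sigma_a \,\|\, p(a)\sigma_a\right) + \tr(p(a)\sigma_a)\,S\left(\xi_a \,\Big\|\, \sum_{b\in\Sigma}p(b)\xi_b\right)  \right),
\end{aligned}\]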
but since $S(p(a)\sigma_a \,\|\, p(a)\sigma_a)=0$ and $\tr(p(a)\sigma_a)=p(a)$ for every $a\in\Sigma$, it follows that
\[\begin{aligned}
S(\reg{X} : \reg{Y})&\leq \sum_{a\in\Sigma} p(a)\,S\left(\xi_a \,\Big\|\, \sum_{b\in\Sigma}p(b)\xi_b\right) \\
&=\sum_{a\in\Sigma}p(a)\tr\left( \xi_a\log\left(\xi_a\right) - \xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right) \\
&=\sum_{a\in\Sigma}p(a)\tr\left(-\log(p(a))\,\xi_a+ \xi_a\log\left(p(a)\xi_a\right) - \xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right) \\
&=-\sum_{a\in\Sigma}p(a)\log(p(a))\tr(\xi_a)+\sum_{a\in\Sigma}p(a)\tr\left( \xi_a\log\left(p(a)\xi_a\right) - \xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right) \\
&=H(p)+c.
\end{aligned}\]
The third line uses $\tr(\xi_a\log(\xi_a))=\tr(\xi_a\log(p(a)\xi_a))-\log(p(a))\tr(\xi_a)$, which is valid whenever $p(a)>0$ (terms with $p(a)=0$ vanish), and the last line uses $\tr(\xi_a)=1$.
Here, the Shannon entropy is by definition
\[
H(p)=-\sum_{a\in\Sigma}p(a)\log(p(a)),
\]
and the value $c$ has been introduced for convenience to represent the remaining quantity
\[
c:=\sum_{a\in\Sigma}p(a)\tr\left( \xi_a\log\left(p(a)\xi_a\right) - \xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right).
\]

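For every $a\in\Sigma$ we have the operator inequality $p(a)\xi_a\leq\sum_{b\in\Sigma}p(b)\xi_b$, because the difference $\sum_{b\neq a}p(b)\xi_b$ is positive semidefinite. The logarithm is operator monotone, so (by a standard limiting argument when the operators involved are not invertible)
\[
\tr\left( \xi_a\log\left(p(a)\xi_a\right)\right) \leq \tr\left(\xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right)
\]
for every $a\in\Sigma$ with $p(a)>0$. Each term of the sum defining $c$ is therefore nonpositive, so that $c\leq 0$.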

 Hence $S(\reg{X} : \reg{Y})\leq H(p)+c\leq H(p)$.
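
Two extreme cases, offered only as a hedged illustration, show how the correction term $c$ behaves. They assume $\Sigma=\{0,1\}$, $\X=\Y=\mathbb{C}^2$, standard basis vectors $e_0,e_1$, base-2 logarithms, and the standard identity $S(\reg{X}:\reg{Y})=S(\reg{X})+S(\reg{Y})-S(\reg{X},\reg{Y})$. First, let $p(0)=p(1)=\tfrac{1}{2}$ and $\sigma_a=\xi_a=e_ae_a^\ast$, so that
\[
\rho=\tfrac{1}{2}\,e_0e_0^\ast\otimes e_0e_0^\ast+\tfrac{1}{2}\,e_1e_1^\ast\otimes e_1e_1^\ast .
\]
Then $S(\reg{X})=S(\reg{Y})=S(\reg{X},\reg{Y})=1$, hence $S(\reg{X}:\reg{Y})=1=H(p)$. Here $\tr(\xi_a\log(p(a)\xi_a))=-1$ and $\tr\bigl(\xi_a\log\bigl(\sum_{b}p(b)\xi_b\bigr)\bigr)=-1$ for each $a$, so $c=0$ and the bound is tight.

Second, suppose all the $\xi_a$ coincide, say $\xi_a=\xi$ for every $a\in\Sigma$. Then $\rho=\bigl(\sum_{a}p(a)\sigma_a\bigr)\otimes\xi$ is a product state and $S(\reg{X}:\reg{Y})=0$. In this case
\[
c=\sum_{a\in\Sigma}p(a)\tr\left(\xi\log\left(p(a)\xi\right)-\xi\log\left(\xi\right)\right)=\sum_{a\in\Sigma}p(a)\log(p(a))=-H(p),
\]
so $H(p)+c=0$, which matches the mutual information exactly.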