### Entanglement destroying channels

In a previous post, we were concerned with channels of the form $\Phi\in\Channel(\X,\Y)$ such that  $\bigl(\Phi\otimes \I_{\Lin(\Z)}\bigr)(\rho) \in \Sep(\Y:\Z)$ for every complex Euclidean space $\Z$ and every density operator $\rho\in\Density(\X\otimes\Z)$. Channels of this form have the effect of destroying entanglement that exists between the register they act on and any other registers.

Theorem:
There exist two channels $\Phi_0,\Phi_1\in\Channel(\X,\Y)$, both having the property described above, such that
$\bigtriplenorm{\Phi_0 - \Phi_1}_1 > \bignorm{\Phi_0(\rho) - \Phi_1(\rho)}_1$
for every $\rho\in\Density(\X)$. (Channels like this have the strange property that they destroy entanglement, and yet evaluating them on an entangled state helps to distinguish them.)

Proof:

Let $n=\dim(\X)\geq 2$. For $\lambda\in(0,1]$, consider the two channels $\Phi_0,\Phi_1\in\Channel(\X)$ defined by
\begin{align*} \Phi_0(X)&=\frac{\lambda}{n+1}(\tr(X)\I_\X+X^T)+\frac{(1-\lambda)}{n}\tr(X)\I_\X \\ \Phi_1(X)&=\frac{\lambda}{n-1}(\tr(X)\I_\X-X^T)+\frac{(1-\lambda)}{n}\tr(X)\I_\X \end{align*}
Then for sufficiently small $\lambda$, both of the Choi representations $J(\Phi_0)$ and $J(\Phi_1)$ are close to $\frac{1}{n}\I_\X\otimes\I_\X$, which lies in the interior of the cone of separable operators, so both are separable. Therefore, from the results in the previous post, we have that $\Phi_0$ and $\Phi_1$ are entanglement destroying as described in the problem statement.

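Now, for every $\rho\in\Density(\X)$, a direct calculation gives
$\Phi_0(\rho) - \Phi_1(\rho)=\frac{2\lambda}{(n+1)(n-1)}\bigl(n\rho^T-\I_\X\bigr).$
If $\lambda_1(\rho),\dots,\lambda_n(\rho)$ are the eigenvalues of $\rho$, then $\bignorm{n\rho^T-\I_\X}_1=\sum_{k=1}^{n}\abs{n\lambda_k(\rho)-1}$. This quantity is convex in $\rho$, so it is maximized at a pure state, where it equals $(n-1)+(n-1)=2(n-1)$. It follows that
$\bignorm{\Phi_0(\rho) - \Phi_1(\rho)}_1\leq\frac{2\lambda}{(n+1)(n-1)}\cdot 2(n-1)=\frac{4\lambda}{n+1}$
for every $\rho\in\Density(\X)$.

On the other hand,
\begin{align*} \bigtriplenorm{\Phi_0 - \Phi_1}_1&=\max\{\bignorm{((\Phi_0 - \Phi_1)\otimes\I_{\Lin(\X)})(xx^\ast)}_1 \ : \ x\in S(\X\otimes\X)\}. \end{align*}
Taking $x=\frac{1}{\sqrt{n}}\vec(\I_\X)$, the partial transpose of $xx^\ast$ is $\frac{1}{n}W$, where $W$ denotes the swap operator on $\X\otimes\X$, and therefore
$((\Phi_0 - \Phi_1)\otimes\I_{\Lin(\X)})(xx^\ast)=\frac{2\lambda}{(n+1)(n-1)}\Bigl(W-\frac{1}{n}\I_\X\otimes\I_\X\Bigr).$
The operator $W$ has eigenvalue $1$ with multiplicity $\binom{n+1}{2}$ and eigenvalue $-1$ with multiplicity $\binom{n}{2}$, so
$\bignorm{W-\tfrac{1}{n}\I_\X\otimes\I_\X}_1=\binom{n+1}{2}\frac{n-1}{n}+\binom{n}{2}\frac{n+1}{n}=(n+1)(n-1).$
Therefore $\bigtriplenorm{\Phi_0 - \Phi_1}_1\geq 2\lambda$, which implies that
$\bigtriplenorm{\Phi_0 - \Phi_1}_1 \geq 2\lambda > \frac{4\lambda}{n+1}\geq\bignorm{\Phi_0(\rho) - \Phi_1(\rho)}_1$
for every $\rho\in\Density(\X)$, provided that $n\geq 2$ and $\lambda>0$.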

### Bounding the norm of the Choi representation of a channel in terms of its operator norm

Theorem:

Let $\X$ and $\Y$ be complex Euclidean spaces with $\dim(\X) = n$ and let $\Phi\in\Trans(\X,\Y)$. Let $\norm{\cdot}_1$ denote the usual trace norm of an operator, and $\triplenorm{\cdot}_1$ the completely bounded trace norm of a map. Then
$\triplenorm{\Phi}_1 \leq \norm{J(\Phi)}_1 \leq n \triplenorm{\Phi}_1.$

Proof:

Since the Choi representation of $\Phi$ is given by $J(\Phi)=(\Phi\otimes\I_{\Lin(\X)})(\vec(\I_\X)\vec(\I_\X)^\ast)$, its trace norm satisfies
\begin{align*} \norm{J(\Phi)}_1&=\norm{(\Phi\otimes\I_{\Lin(\X)})(\vec(\I_\X)\vec(\I_\X)^\ast)}_1 \\ &\leq \norm{(\Phi\otimes\I_{\Lin(\X)})}_1\norm{\vec(\I_\X)\vec(\I_\X)^\ast}_1 \\ &=\norm{(\Phi\otimes\I_{\Lin(\X)})}_1n \\ &=n \triplenorm{\Phi}_1, \end{align*}
where the last two lines follow from $\norm{\vec(\I_\X)\vec(\I_\X)^\ast}_1=n$ and the definition of the completely bounded trace norm $\triplenorm{\Phi}_1 :=\norm{(\Phi\otimes\I_{\Lin(\X)})}_1$.

Now consider an alternate characterization of the completely bounded trace norm:
$\triplenorm{\Phi}_1=\max\{\norm{(\I_{\Y}\otimes\sqrt{\rho_0})J(\Phi)(\I_{\Y}\otimes\sqrt{\rho_1})}_1 : \rho_0,\rho_1\in\Density(\X)\}.$
The trace norm satisfies the property
$\norm{(\I_{\Y}\otimes\sqrt{\rho_0})J(\Phi)(\I_{\Y}\otimes\sqrt{\rho_1})}_1\leq\norm{\I_{\Y}\otimes\sqrt{\rho_0}}_\infty\norm{J(\Phi)}_1\norm{\I_{\Y}\otimes\sqrt{\rho_1}}_\infty,$
and the spectral norm $\norm{A}_\infty$ of an operator $A$ is given by the largest singular value of $A$, so it follows that $\norm{\I_{\Y}\otimes\sqrt{\rho_a}}_\infty\leq1$ for $a\in\{0,1\}$, with equality holding exactly when $\rho_a$ is a pure state. Therefore,
$\norm{(\I_{\Y}\otimes\sqrt{\rho_0})J(\Phi)(\I_{\Y}\otimes\sqrt{\rho_1})}_1\leq\norm{J(\Phi)}_1$
for every choice of $\rho_0,\rho_1\in\Density(\X)$, implying that
$\triplenorm{\Phi}_1=\max\{\norm{(\I_{\Y}\otimes\sqrt{\rho_0})J(\Phi)(\I_{\Y}\otimes\sqrt{\rho_1})}_1 : \rho_0,\rho_1\in\Density(\X)\}\leq \norm{J(\Phi)}_1.$
Putting these two bounds together then gives
$\triplenorm{\Phi}_1 \leq \norm{J(\Phi)}_1 \leq n \triplenorm{\Phi}_1.$
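
As a quick sanity check on the upper bound: when $\Phi$ is a channel, $J(\Phi)$ is positive semidefinite with trace $n$, so $\norm{J(\Phi)}_1=n$, while $\triplenorm{\Phi}_1=1$, and the upper bound is attained. The sketch below builds $J(\Phi)$ from Kraus operators and confirms the trace. The amplitude damping channel and the helper names (`vec`, `choi_from_kraus`, `amplitude_damping_kraus`) are illustrative assumptions, not part of the problem.

```python
import math

def vec(A):
    """Row-major vectorization, so that vec(E_{a,b}) = e_a (x) e_b."""
    return [entry for row in A for entry in row]

def choi_from_kraus(kraus_ops):
    """J(Phi) = sum_k vec(A_k) vec(A_k)^* for Phi(X) = sum_k A_k X A_k^*."""
    dim = len(vec(kraus_ops[0]))
    J = [[0j] * dim for _ in range(dim)]
    for A in kraus_ops:
        v = vec(A)
        for i in range(dim):
            for j in range(dim):
                J[i][j] += v[i] * complex(v[j]).conjugate()
    return J

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def amplitude_damping_kraus(gamma):
    """Kraus operators of a qubit amplitude damping channel (hypothetical example)."""
    return [
        [[1.0, 0.0], [0.0, math.sqrt(1.0 - gamma)]],
        [[0.0, math.sqrt(gamma)], [0.0, 0.0]],
    ]

n = 2
J = choi_from_kraus(amplitude_damping_kraus(0.3))
# J is positive semidefinite, so its trace norm equals its trace.
choi_trace_norm = trace(J).real
```

For maps that are not channels the two bounds can of course be far from tight in either direction; this only illustrates the channel case.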

### Separable channels decrease the entanglement of formation

The entanglement of formation of a density operator $\rho\in\Density(\X^{A}\otimes\X^{B})$ is defined as
$E_{f}(\rho) = \inf\Biggl\{\sum_{a\in\Sigma} p(a) E(u_a u_a^{\ast}) \,:\, \rho = \sum_{a\in\Sigma} p(a) u_a u_a^{\ast} \Biggr\},$
where $E(u u^{\ast}) = S(\tr_{\X^{B}}(u u^{\ast}))$ denotes the entanglement entropy of the pure state $u u^{\ast}$ and the infimum is over all expressions of $\rho$ of the given form, where $\Sigma$ is any alphabet, $p\in\P(\Sigma)$ is a probability vector, and $\{u_a\,:\,a\in\Sigma\} \subset \X^{A}\otimes\X^{B}$ is a collection of unit vectors.

Theorem:

For every choice of complex Euclidean spaces $\X^{A}$, $\X^{B}$, $\Y^{A}$, and $\Y^{B}$, every density operator $\rho\in\Density(\X^{A}\otimes\X^{B})$, and every separable channel $\Phi\in\SepC(\X^{A},\Y^{A}: \X^{B},\Y^{B})$, it holds that
$E_{f}(\Phi(\rho)) \leq E_{f}(\rho).$

Proof:

Assuming that $\Phi\in\SepC(\X^{A},\Y^{A}: \X^{B},\Y^{B})$ allows $\Phi$ to be expressed as
$\Phi(X)=\sum_{b\in\Gamma}(A_b\otimes B_b)X(A_b^\ast\otimes B_b^\ast),$
where $\Gamma$ is some alphabet and $\{A_b : b\in \Gamma\}\subset \Lin(\X^A,\Y^A)$ and $\{B_b : b\in \Gamma\}\subset \Lin(\X^B,\Y^B)$. For $\rho = \sum_{a\in\Sigma} p(a) u_a u_a^{\ast}$, linearity implies that the action of $\Phi$ on $\rho$ is determined by the action of $\Phi$ on each $u_au_a^\ast$ as
$\Phi(\rho)=\sum_{a\in\Sigma} p(a) \Phi(u_a u_a^{\ast})=\sum_{a\in\Sigma} p(a)\sum_{b\in\Gamma}(A_b\otimes B_b)u_au_a^*(A_b^\ast\otimes B_b^\ast).$
Therefore, represent $\Phi(u_au_a^\ast)$ as
$\Phi(u_au_a^\ast)=\sum_{b\in\Gamma}(A_b\otimes B_b)u_au_a^\ast(A_b^\ast\otimes B_b^\ast)=\sum_{b\in\Gamma}q_a(b) v_{ab} v_{ab}^{\ast},$
where $(A_b\otimes B_b)u_a=\sqrt{q_a(b)}v_{ab}$ for a unit vector $v_{ab}$ and $q_a(b)=\norm{(A_b\otimes B_b)u_a}^2$. Now, for each pair $(a,b)$ with $q_a(b)>0$, let
$C_b=\frac{1}{\sqrt{q_a(b)}}(A_b\otimes B_b),$
so that $C_bu_au_a^\ast C_b^\ast=v_{ab}v_{ab}^\ast$.

Consider the channel $\Psi_{ab}\in\SepC(\X^{A},\Y^{A}: \X^{B},\Y^{B})$ defined by
$\Psi_{ab}(X)=C_bXC_b^\ast+(\tr(X)-\tr(C_bXC_b^\ast))\sigma,$
for some arbitrary $\sigma\in\Density(\Y^A\otimes\Y^B)$. Then $\Psi_{ab}$ is indeed a channel since it is completely positive because it is defined in terms of $C_b$ and $\Phi$ is assumed to be completely positive. Likewise, $\Psi_{ab}$ is separable since $\Phi$ is separable. Moreover, $\Psi_{ab}$ is trace preserving since
\begin{align*} \tr(\Psi_{ab}(X))&=\tr(C_bXC_b^\ast)+(\tr(X)-\tr(C_bXC_b^\ast))\tr(\sigma) \\ &=\tr(C_bXC_b^\ast)+\tr(X)-\tr(C_bXC_b^\ast) \\ &=\tr(X). \end{align*}
By construction $\Psi_{ab}(u_au_a^\ast)=v_{ab}v_{ab}^\ast$.
Therefore, by a corollary (6.36) to Nielsen's theorem it follows that for every $a\in\Sigma$ and $b\in\Gamma$, with $\rho_a^A=\tr_{\X^B}(u_au_a^\ast)$, $\sigma_a^A=\tr_{\Y^B}(v_{ab}v_{ab}^\ast)$, and $r=\min\{\rank(\rho_a^A),\rank(\sigma^A_a)\}$, it holds that
$\lambda_1(\rho_a^A)+\dots+\lambda_m(\rho_a^A)\leq \lambda_1(\sigma_a^A)+\dots+\lambda_m(\sigma_a^A)$
for every $m\in\{1,\dots, r\}$.

Thus, since the von Neumann entropy is Schur-concave, it satisfies $S(\sigma_a^A)\leq S(\rho_a^A)$, which implies that the entanglement entropy also satisfies $E(v_{ab}v_{ab}^\ast)\leq E(u_au_a^\ast)$ for all $a\in\Sigma$ and $b\in\Gamma$. Taking the weighted average of these inequalities with the weights $p(a)q_a(b)$, and using the fact that $\sum_{b\in\Gamma}q_a(b)=1$ for every $a\in\Sigma$, it follows that
\begin{align*} \sum_{a\in\Sigma} p(a)\sum_{b\in\Gamma}q_a(b) E(v_{ab} v_{ab}^{\ast}) \leq \sum_{a\in\Sigma} p(a) E(u_a u_a^{\ast}). \end{align*}

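The left-hand side is the average entanglement entropy of the decomposition $\Phi(\rho)=\sum_{a\in\Sigma}\sum_{b\in\Gamma}p(a)q_a(b)v_{ab}v_{ab}^\ast$, so it is at least $E_f(\Phi(\rho))$. Taking the infimum of the right-hand side over all decompositions of $\rho$ therefore gives $E_{f}(\Phi(\rho)) \leq E_{f}(\rho)$.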

### The SWAP operator and separable measurements

Let $\Sigma$ be an alphabet, let $n = \abs{\Sigma}$, and assume $n\geq 2$. Also let $\X^{A} = \mathbb{C}^{\Sigma}$ and $\X^{B} = \mathbb{C}^{\Sigma}$,  and recall that the swap operator $W\in\Lin(\X^{A}\otimes\X^{B})$ may be defined as
$W = \sum_{a,b\in\Sigma} E_{a,b} \otimes E_{b,a}.$
Define $\Pi_0,\,\Pi_1\in\Proj(\X^{A}\otimes\X^{B})$ and $\sigma_0,\sigma_1\in\Density(\X^{A}\otimes\X^{B})$ as follows:
$\Pi_0 = \frac{1}{2} \I\otimes\I + \frac{1}{2} W,\qquad \Pi_1 = \frac{1}{2} \I\otimes\I - \frac{1}{2} W,\qquad \sigma_0 = \frac{1}{\binom{n+1}{2}}\Pi_0,\qquad \sigma_1 = \frac{1}{\binom{n}{2}}\Pi_1.$

Theorem:

If $\mu:\{0,1\}\rightarrow\Pos(\X^{A}\otimes\X^{B})$ is a separable measurement, then
$\frac{1}{2} \ip{\mu(0)}{\sigma_0} + \frac{1}{2} \ip{\mu(1)}{\sigma_1} \leq \frac{1}{2} + \frac{1}{n+1}.$

Proof:

Assuming that $\mu$ is a separable measurement allows $\mu(0)$ to be expressed as
$\mu(0)=\sum_{a\in\Gamma}P_a\otimes Q_a,$
where $\{P_a : a\in \Gamma\}\subset \Pos(\X^A)$ and $\{Q_a : a\in \Gamma\}\subset \Pos(\X^B)$. Moreover, since $\mu$ is a measurement it must satisfy the completeness condition that $\mu(0)+\mu(1)=\I\otimes\I$ implying that $\mu(1)$ can be expressed in terms of $\mu(0)$ as
$\mu(1)=\I\otimes\I-\mu(0)=\I\otimes\I-\sum_{a\in\Gamma}P_a\otimes Q_a.$
Write $\sigma_0$ and $\sigma_1$ more explicitly as
\begin{align*} \sigma_0 &= \frac{1}{\binom{n+1}{2}}\Pi_0=\frac{1}{(n+1)n}(\I\otimes\I + W), \\ \sigma_1 &= \frac{1}{\binom{n}{2}}\Pi_1=\frac{1}{(n-1)n}(\I\otimes\I - W). \end{align*}
Then,
\begin{align*} \ip{\mu(0)}{\sigma_0} + \ip{\mu(1)}{\sigma_1} &= \ip{\mu(0)}{\sigma_0} + \ip{\I\otimes\I-\mu(0)}{\sigma_1} \\ &=\frac{1}{(n+1)n}\ip{\mu(0)}{\I\otimes\I + W} + \frac{1}{(n-1)n}\ip{\I\otimes\I-\mu(0)}{\I\otimes\I - W} \\ &=\frac{1}{(n+1)n}(\ip{\mu(0)}{\I\otimes\I }+\ip{\mu(0)}{ W}) \\ & \ \ \ + \frac{1}{(n-1)n}(\ip{\I\otimes\I}{\I\otimes\I} -\ip{\I\otimes\I}{W} ) \\ & \ \ \ + \frac{1}{(n-1)n}(\ip{\mu(0)}{W}-\ip{\mu(0)}{\I\otimes\I} ). \end{align*}
Now observe that
$\ip{\mu(0)}{\I\otimes\I }=\tr\left(\mu(0)^\ast\I\otimes\I\right)=\tr(\sum_{a\in\Gamma}(P_a\otimes Q_a)(\I\otimes\I))=\tr(\sum_{a\in\Gamma}P_a\otimes Q_a)=\sum_{a\in\Gamma}\tr(P_a)\tr(Q_a),$
$\ip{\I\otimes\I}{\I\otimes\I}=\tr((\I\otimes\I)(\I\otimes\I))=\tr(\I\otimes\I)=n^2,$
\begin{align*} \ip{\mu(0)}{ W}=\tr(\sum_{a\in\Gamma}(P_a\otimes Q_a)W)&=\sum_{a\in\Gamma}\sum_{i,j\in\Sigma}(e_i^\ast\otimes e_j^\ast)(P_a\otimes Q_a)W(e_i\otimes e_j) \\ &=\sum_{a\in\Gamma}\sum_{i,j\in\Sigma}(e_i^\ast\otimes e_j^\ast)(P_a\otimes Q_a)(e_j\otimes e_i) \\ &=\sum_{a\in\Gamma}\sum_{i,j\in\Sigma}(e_i^\ast P_a e_j)\otimes(e_j^\ast Q_a e_i) \\ &=\sum_{a\in\Gamma}\sum_{i,j\in\Sigma}e_i^\ast P_a e_je_j^\ast Q_a e_i \\ &=\sum_{a\in\Gamma}\sum_{i\in\Sigma}e_i^\ast P_a Q_a e_i \\ &=\sum_{a\in\Gamma}\tr(P_a Q_a), \\ \end{align*}
and by similar arguments used in the previous calculation $\ip{\I\otimes\I}{W}=\tr(\I\I)=\tr(\I)=n$.
Therefore, the original expression of interest can be simplified as
\begin{align*} \ip{\mu(0)}{\sigma_0} + \ip{\mu(1)}{\sigma_1}&=\frac{1}{(n+1)n}(\sum_{a\in\Gamma}\tr(P_a)\tr(Q_a)+\sum_{a\in\Gamma}\tr(P_a Q_a)) \\ & \ \ \ + \frac{1}{(n-1)n}(n^2 -n ) \\ & \ \ \ + \frac{1}{(n-1)n}(\sum_{a\in\Gamma}\tr(P_a Q_a)-\sum_{a\in\Gamma}\tr(P_a)\tr(Q_a) ) \\ &=1+\frac{2}{(n+1)(n-1)n}\left(n\sum_{a\in\Gamma}\tr(P_a Q_a)-\sum_{a\in\Gamma}\tr(P_a)\tr(Q_a)\right) \\ &\leq 1+\frac{2}{(n+1)(n-1)n}(n^2-n) \\ &=1+\frac{2}{(n+1)}. \end{align*}
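Here, the inequality follows from two observations. First, $\tr(P_aQ_a)\leq\tr(P_a)\tr(Q_a)$ for positive semidefinite operators $P_a$ and $Q_a$, so that
$n\sum_{a\in\Gamma}\tr(P_a Q_a)-\sum_{a\in\Gamma}\tr(P_a)\tr(Q_a)\leq (n-1)\sum_{a\in\Gamma}\tr(P_a Q_a).$
Second, $\mu(1)$ is also separable, so the same calculation as for $\mu(0)$ shows that $\ip{\mu(1)}{W}\geq 0$, and therefore $\sum_{a\in\Gamma}\tr(P_a Q_a)=\ip{\mu(0)}{W}=\ip{\I\otimes\I}{W}-\ip{\mu(1)}{W}\leq n$.
Combining the two bounds gives $n\sum_{a\in\Gamma}\tr(P_a Q_a)-\sum_{a\in\Gamma}\tr(P_a)\tr(Q_a)\leq (n-1)n=n^2-n$.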

Therefore, dividing both sides of the inequality by $2$ gives
$\frac{1}{2} \ip{\mu(0)}{\sigma_0} + \frac{1}{2} \ip{\mu(1)}{\sigma_1} \leq \frac{1}{2} + \frac{1}{n+1}.$
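
The bound can be checked against a concrete separable measurement. The sketch below uses $\mu(0)=\sum_{a\in\Sigma}E_{a,a}\otimes E_{a,a}$ and $\mu(1)=\I\otimes\I-\mu(0)$, which is one illustrative choice (an assumption of this example, not something derived above), and evaluates the success probability exactly with rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction

def swap_operator(n):
    """Swap operator W on C^n (x) C^n, with e_a (x) e_b stored at index a*n + b."""
    dim = n * n
    W = [[0] * dim for _ in range(dim)]
    for a in range(n):
        for b in range(n):
            W[b * n + a][a * n + b] = 1  # W (e_a (x) e_b) = e_b (x) e_a
    return W

def inner(A, B):
    """Inner product <A, B> = tr(A^* B), specialized to real matrices."""
    return sum(x * y for row_a, row_b in zip(A, B) for x, y in zip(row_a, row_b))

def success_probability(n):
    dim = n * n
    W = swap_operator(n)
    eye = [[1 if i == j else 0 for j in range(dim)] for i in range(dim)]
    sigma0 = [[Fraction(eye[i][j] + W[i][j], n * (n + 1)) for j in range(dim)]
              for i in range(dim)]
    sigma1 = [[Fraction(eye[i][j] - W[i][j], n * (n - 1)) for j in range(dim)]
              for i in range(dim)]
    # mu(0) = sum_a E_{a,a} (x) E_{a,a}; mu(1) = sum_{a != b} E_{a,a} (x) E_{b,b}.
    mu0 = [[0] * dim for _ in range(dim)]
    for a in range(n):
        mu0[a * n + a][a * n + a] = 1
    mu1 = [[eye[i][j] - mu0[i][j] for j in range(dim)] for i in range(dim)]
    return Fraction(1, 2) * inner(mu0, sigma0) + Fraction(1, 2) * inner(mu1, sigma1)
```

For this particular measurement the value equals $\frac{1}{2}+\frac{1}{n+1}$, so the bound in the theorem is attained by a separable measurement.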

### Singular values before and after the action of a unital channel

Let $\X$ be a complex Euclidean space having dimension $n$, let $\Phi\in\Channel(\X)$ be a unital channel, let $X\in\Lin(\X)$ be an operator, and let $Y = \Phi(X)$. Following our usual conventions, let $s_1(X) \geq \cdots \geq s_n(X)$ and $s_1(Y) \geq \cdots \geq s_n(Y)$ denote the singular values of $X$ and $Y$, respectively, ordered from largest to smallest, and where we take $s_k(X) = 0$ when $k > \rank(X)$ and $s_k(Y) = 0$ when $k > \rank(Y)$.

Theorem:

$s_1(X) + \cdots + s_m(X) \geq s_1(Y) + \cdots + s_m(Y)$ for every $m \in \{1,\ldots,n\}$.

Proof:

Consider the space $\X\oplus\X$ and let
$\overline{X}:= \begin{pmatrix} 0 & X\\ X^{\ast} & 0 \end{pmatrix}.$
Then it holds that $\overline{X}=\overline{X}^\ast$ so that $\overline{X}\in\Herm(\X\oplus\X)$. In addition, consider the channel $\overline{\Phi}\in\Channel(\X\oplus\X)$ defined as
$\overline{\Phi} \begin{pmatrix} A & B\\ C & D \end{pmatrix}= \begin{pmatrix} \Phi(A) & \Phi(B)\\ \Phi(C) & \Phi(D) \end{pmatrix}.$
Then it follows that
$\overline{\Phi}(\I_{\X\oplus\X})= \begin{pmatrix} \Phi(\I_{\X}) & 0\\ 0 & \Phi(\I_{\X}) \end{pmatrix}= \begin{pmatrix} \I_{\X} & 0\\ 0 & \I_{\X} \end{pmatrix}= \I_{\X\oplus\X},$
which implies that $\overline{\Phi}$ is unital. Moreover, letting $\Phi(X)=Y$ with $\Phi(X^\ast)=Y^\ast$ yields
$\overline{\Phi}(\overline{X})= \begin{pmatrix} 0&\Phi(X)\\ \Phi(X^\ast) & 0 \end{pmatrix}= \begin{pmatrix} 0 & Y\\ Y^\ast & 0 \end{pmatrix}=:\overline{Y}.$
It has now been shown that there exists a unital channel $\overline{\Phi}\in \Channel(\X\oplus\X)$ such that $\overline{\Phi}(\overline{X})=\overline{Y}$, where $\overline{X},\overline{Y}\in\Herm(\X\oplus\X)$. Therefore, by Uhlmann's theorem, it holds that $\lambda(\overline{Y}) \prec \lambda(\overline{X})$, where $\lambda(\overline{Y})$ and $\lambda(\overline{X})$ are the vectors of eigenvalues of $\overline{Y}$ and $\overline{X}$, respectively.

In order to determine the eigenvalues of $\overline{Y}$ and $\overline{X}$, consider the singular value decompositions of $Y$ and $X$:
$X=\sum_{k=1}^{r_X}s_k(X)x'_kx_k^\ast \ \ \ \text{and} \ \ \ Y=\sum_{k=1}^{r_Y}s_k(Y)y'_ky_k^\ast,$
where $r_X=\rank(X)$, $r_Y=\rank(Y)$, $s(X)=(s_1(X),\dots,s_{r_X}(X))$ and $s(Y)=(s_1(Y),\dots,s_{r_Y}(Y))$ are the vectors of the non-zero singular values of $X$ and $Y$ (assumed to be written in decreasing order as the index increases), and
$\{x_1,\dots,x_{r_X}\}, \{x'_1,\dots,x'_{r_X}\}\subseteq\X \ \ \ \text{and} \ \ \ \{y_1,\dots,y_{r_Y}\}, \{y'_1,\dots,y'_{r_Y}\}\subseteq\X$
are orthonormal sets of vectors.

Since $Xx_k=s_k(X)x'_k$ and $X^\ast x'_k=s_k(X)x_k$ for each $k\in\{1,\dots,r_X\}$, the unit vectors
$\frac{1}{\sqrt{2}}\begin{pmatrix} x'_k \\ x_k \end{pmatrix} \ \ \ \text{and} \ \ \ \frac{1}{\sqrt{2}}\begin{pmatrix} x'_k \\ -x_k \end{pmatrix}$
are eigenvectors of $\overline{X}$ with eigenvalues $s_k(X)$ and $-s_k(X)$, respectively, and the remaining $2n-2r_X$ eigenvalues of $\overline{X}$ are zero.
The eigenvalues of $\overline{X}$ are therefore given by
$\lambda(\overline{X})=(s_1(X),\dots,s_{r_X}(X),0,\dots,0,-s_{r_X}(X),\dots,-s_1(X)),$
where here they have been arranged in decreasing order. An equivalent argument shows that the eigenvalues of $\overline{Y}$ are similarly given by
$\lambda(\overline{Y})=(s_1(Y),\dots,s_{r_Y}(Y),0,\dots,0,-s_{r_Y}(Y),\dots,-s_1(Y)).$

By the convention that $s_k(X)=0$ when $k>\rank(X)$ and $s_k(Y)=0$ when $k>\rank(Y)$, the $n$ largest eigenvalues of $\overline{X}$ and $\overline{Y}$ are
\begin{align*} \lambda_k(\overline{X})&=s_k(X) \\ \lambda_k(\overline{Y})&=s_k(Y) \end{align*}
for $k\in\{1,\dots,n\}$.

The majorization relation $\lambda(\overline{Y}) \prec \lambda(\overline{X})$ states in particular that the sum of the $m$ largest eigenvalues of $\overline{Y}$ is at most the sum of the $m$ largest eigenvalues of $\overline{X}$. It follows that for all $m\in\{1,\dots, n\}$,
$s_1(X)+\dots+s_{m}(X)\geq s_1(Y)+\dots+s_{m}(Y).$
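
A small numerical illustration of the theorem, under assumptions chosen for convenience: $n=2$, real matrices, and the unital channel $\Phi(X)=\frac{1}{2}X+\frac{1}{2}UXU^\ast$ with $U$ the bit flip. Neither the channel nor the helper names come from the problem; the closed form for $2\times 2$ singular values is used only to keep the sketch free of external libraries.

```python
import math

def singular_values_2x2(M):
    """Singular values of a real 2x2 matrix, largest first (closed form)."""
    (a, b), (c, d) = M
    t = a * a + b * b + c * c + d * d      # tr(M^T M) = s_1^2 + s_2^2
    det = abs(a * d - b * c)               # |det M| = s_1 * s_2
    disc = math.sqrt(max(t * t - 4 * det * det, 0.0))
    return [math.sqrt((t + disc) / 2), math.sqrt(max((t - disc) / 2, 0.0))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mixed_unitary_channel(X):
    """Phi(X) = X/2 + U X U^T / 2 with U the bit flip (unital and trace preserving)."""
    U = [[0.0, 1.0], [1.0, 0.0]]
    UXU = matmul(matmul(U, X), U)
    return [[(X[i][j] + UXU[i][j]) / 2 for j in range(2)] for i in range(2)]

X = [[0.0, 3.0], [1.0, 0.0]]
Y = mixed_unitary_channel(X)
sX, sY = singular_values_2x2(X), singular_values_2x2(Y)
# Partial sums of singular values should not increase under the channel.
partial_sums_ok = all(sum(sX[:m]) >= sum(sY[:m]) - 1e-12 for m in (1, 2))
```

Here $X$ has singular values $3$ and $1$, while $Y=\Phi(X)$ has singular values $2$ and $2$, so the first partial sum drops and the second is preserved.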

### When the Choi representation of a channel is separable

Let $\X$ and $\Y$ be complex Euclidean spaces, and let $\Phi\in\Channel(\X,\Y)$ be a channel. A positive operator $P\in \Pos(\X\otimes\Y)$ is separable if and only if there exists a positive integer $m$ and positive semidefinite operators
$Q_1,Q_2, \dots, Q_m\in\Pos(\X) \ \ \text{and} \ \ R_1,R_2, \dots, R_m\in\Pos(\Y)$
such that
$P=\sum_{j=1}^{m}Q_j\otimes R_j.$
Denote by $\Sep(\X : \Y)$ the collection of all such separable operators.

Theorem:

The following two properties are equivalent:
1. For every complex Euclidean space $\Z$ and every density operator $\rho\in\Density(\X\otimes\Z)$, it holds that $\bigl(\Phi\otimes \I_{\Lin(\Z)}\bigr)(\rho) \in \Sep(\Y:\Z)$.
2. $J(\Phi) \in \Sep(\Y:\X)$.

Proof:

Recall that the Choi representation $J(\Phi)$ can be expressed as
$J(\Phi)=(\Phi\otimes\I_{\Lin(\X)})(\vec(\I_{\X})\vec(\I_{\X})^\ast).$
First assume that property 1 holds, let $\Z=\X=\mathbb{C}^\Sigma$, and consider the density operator $\rho\in\Density(\X\otimes\X)$ given by
$\rho=\frac{1}{|\Sigma|}(\vec(\I_\X)\vec(\I_\X)^\ast).$
Property 1 applied to $\rho$ then reads
$\frac{1}{|\Sigma|}(\Phi\otimes\I_{\Lin(\X)})(\vec(\I_\X)\vec(\I_\X)^\ast)\in\Sep(\Y:\X),$
and since $\Sep(\Y:\X)$ is closed under multiplication by positive scalars, this implies that $J(\Phi)\in\Sep(\Y : \X)$, which is the claim of property 2.

Conversely, assume that property 2 holds, so that $J(\Phi) \in \Sep(\Y:\X)$. Then $J(\Phi)=\sum_{j=1}^{m}R_j\otimes Q_j$ for some $R_1,\dots,R_m\in\Pos(\Y)$ and $Q_1,\dots,Q_m\in\Pos(\X)$, and therefore for every complex Euclidean space $\Z$ and every positive map $\Xi\in\Trans(\Y,\Z)$
$(\Xi\otimes\I_{\Lin(\X)})(J(\Phi))=\sum_{j=1}^{m}\Xi(R_j)\otimes Q_j\in\Pos(\Z\otimes\X).$

Substituting the expression recalled above for $J(\Phi)$ then gives
\begin{align*} (\Xi\otimes\I_{\Lin(\X)})(J(\Phi))&=(\Xi\otimes\I_{\Lin(\X)})(\Phi\otimes\I_{\Lin(\X)})(\vec(\I_{\X})\vec(\I_{\X})^\ast) \\ &=(\Xi\Phi\otimes\I_{\Lin(\X)})(\vec(\I_{\X})\vec(\I_{\X})^\ast)\\ &=J(\Xi\Phi). \end{align*}

Hence, $J(\Xi\Phi)\in\Pos(\Z\otimes\X)$, which implies that the composition $\Xi\Phi\in\Trans(\X,\Z)$ is completely positive. Now consider any $\rho\in\Density(\X\otimes\Z)$. By complete positivity,
$(\Xi\otimes\I_{\Lin(\Z)})\bigl((\Phi\otimes\I_{\Lin(\Z)})(\rho)\bigr)=(\Xi\Phi\otimes\I_{\Lin(\Z)})(\rho)\in\Pos(\Z\otimes\Z).$
As this holds for every positive map $\Xi\in\Trans(\Y,\Z)$, the Woronowicz-Horodecki criterion implies that $(\Phi\otimes\I_{\Lin(\Z)})(\rho)\in\Sep(\Y:\Z)$, which is the claim of property 1.

### Some more facts concerning the von Neumann entropy

Let $\reg{X}$, $\reg{Y}$, and $\reg{Z}$ be registers, assume that the classical state set of $\reg{X}$ is $\Sigma$, and let $n = \abs{\Sigma}$.

Theorem:
For every state $\rho\in\Density(\X\otimes\Y\otimes\Z)$ of $(\reg{X},\reg{Y},\reg{Z})$ it holds that
$S(\reg{X},\reg{Y} : \reg{Z}) \leq S(\reg{Y}:\reg{X},\reg{Z}) + 2\log(n).$

Proof:

From the result proved in a previous post, it holds that for every choice of registers $\reg{X}$ and $\reg{Z}$, and for any state of $\Density(\X\otimes\Z)$, $S(\reg{Z})\leq S(\reg{X})+S(\reg{X}, \reg{Z})$, or equivalently that
$0\leq S(\reg{X})+S(\reg{X}, \reg{Z})-S(\reg{Z}).$
Also, by sub-additivity $S(\reg{X},\reg{Y})\leq S(\reg{X})+S(\reg{Y})$, or equivalently
$0\leq S(\reg{X})+S(\reg{Y})-S(\reg{X},\reg{Y}).$
Then by adding these two inequalities, it must also hold that
$0\leq S(\reg{X})+S(\reg{X}, \reg{Z})-S(\reg{Z})+S(\reg{X})+S(\reg{Y})-S(\reg{X},\reg{Y}),$
and since in general $S(\reg{X})\leq \log(n)$, and hence $2S(\reg{X})\leq 2\log(n)$, it follows that
$0\leq S(\reg{X}, \reg{Z})-S(\reg{Z})+S(\reg{Y})-S(\reg{X},\reg{Y})+2\log(n).$
Therefore,
$S(\reg{Z})+S(\reg{X},\reg{Y})\leq S(\reg{X}, \reg{Z})+S(\reg{Y})+2\log(n).$
Adding $-S(\reg{X},\reg{Y}, \reg{Z})$ to both sides of this inequality yields
$S(\reg{Z})+S(\reg{X},\reg{Y})-S(\reg{X},\reg{Y}, \reg{Z})\leq S(\reg{X}, \reg{Z})+S(\reg{Y})-S(\reg{X},\reg{Y}, \reg{Z})+2\log(n),$
or equivalently
$S(\reg{X},\reg{Y} : \reg{Z})\leq S(\reg{Y}:\reg{X},\reg{Z}) + 2\log(n).$

Here is an example, for $\Sigma = \{0,1\}$, of a state $\rho$ for which this inequality becomes an equality.

Consider the three qubit pure state
$\left|\psi\right>_{\reg{X},\reg{Y},\reg{Z}}=\frac{1}{\sqrt{2}}(\left|0\right>_{\reg{X}}\left|0\right>_{\reg{Y}}\left|0\right>_{\reg{Z}}+\left|1\right>_{\reg{X}}\left|0\right>_{\reg{Y}}\left|1\right>_{\reg{Z}}).$
Then the states of the following particular subsystems are also pure:
\begin{align*} \left|\psi\right>_{\reg{Y}}&=\left|0\right>_{\reg{Y}}\\ \left|\psi\right>_{\reg{X},\reg{Z}}&=\frac{1}{\sqrt{2}}(\left|0\right>_{\reg{X}}\left|0\right>_{\reg{Z}}+\left|1\right>_{\reg{X}}\left|1\right>_{\reg{Z}}). \end{align*}

However, the following subsystems are in the maximally mixed state:
\begin{align*} \rho_{\reg{X}}=\frac{1}{2}(\left|0\right>\left<0\right|+\left|1\right>\left<1\right|) \\ \rho_{\reg{Z}}=\frac{1}{2}(\left|0\right>\left<0\right|+\left|1\right>\left<1\right|). \end{align*}
Moreover, the state of the subsystem $\reg{X},\reg{Y}$ is in the tensor product state
\begin{align*} \rho_{\reg{X},\reg{Y}}&=\rho_{\reg{X}}\otimes \rho_{\reg{Y}}\\ &=\frac{1}{2}\bigl(\left|0\right>\left<0\right|+\left|1\right>\left<1\right|\bigr)\otimes \left|0\right>\left<0\right| \end{align*}

The entropy of a pure state is zero and the entropy of the maximally mixed state in this case is $\log(n)=\log(2)$. Then the entropies of the states listed above are
\begin{align*} S(\reg{Y})=S(\reg{X},\reg{Z})&=0 \\ S(\reg{X})=S(\reg{Z})&=\log(2) \\ S(\reg{X},\reg{Y})=S(\reg{X})+S(\reg{Y})&=\log(2). \end{align*}

Therefore,
\begin{align*} S(\reg{X},\reg{Y} : \reg{Z}) - S(\reg{Y}:\reg{X},\reg{Z})&=S(\reg{X},\reg{Y})-S(\reg{Y})-S(\reg{X},\reg{Z})+S(\reg{Z})+\left(S(\reg{X},\reg{Y},\reg{Z})-S(\reg{X},\reg{Y},\reg{Z})\right) \\ &=S(\reg{X},\reg{Y})-S(\reg{Y})-S(\reg{X},\reg{Z})+S(\reg{Z}) \\ &=S(\reg{X})+S(\reg{Y})-S(\reg{Y})-S(\reg{X},\reg{Z})+S(\reg{Z}) \\ &=S(\reg{X})-S(\reg{X},\reg{Z})+S(\reg{Z}) \\ &=\log(2)-0+\log(2) \\ &=2\log(2), \end{align*}
which implies that
$S(\reg{X},\reg{Y} : \reg{Z}) = S(\reg{Y}:\reg{X},\reg{Z})+2\log(2).$
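
The entropies in this example can be reproduced numerically. The sketch below is a minimal check under stated assumptions: entropies are measured in bits (so $\log(2)=1$), the state has real amplitudes, and the entropies of the two-qubit subsystems are obtained from the fact that complementary subsystems of a pure state have equal entropy. The helper names `qubit_entropy` and `reduced_qubit` are hypothetical.

```python
import itertools
import math

def qubit_entropy(rho):
    """von Neumann entropy (in bits) of a real symmetric 2x2 density matrix."""
    tr = rho[0][0] + rho[1][1]
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    eigenvalues = [(tr + disc) / 2, (tr - disc) / 2]
    return -sum(lam * math.log2(lam) for lam in eigenvalues if lam > 1e-12)

def reduced_qubit(psi, keep):
    """Reduced state of qubit `keep` for a real 3-qubit pure state psi[x][y][z]."""
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for idx in itertools.product(range(2), repeat=3):
        for j in range(2):
            other = list(idx)
            other[keep] = j
            rho[idx[keep]][j] += (psi[idx[0]][idx[1]][idx[2]]
                                  * psi[other[0]][other[1]][other[2]])
    return rho

amp = 1 / math.sqrt(2)
psi = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
psi[0][0][0] = amp   # |0>_X |0>_Y |0>_Z
psi[1][0][1] = amp   # |1>_X |0>_Y |1>_Z

S_X, S_Y, S_Z = (qubit_entropy(reduced_qubit(psi, k)) for k in range(3))
# The joint state is pure, so S(X,Y) = S(Z), S(X,Z) = S(Y), and S(X,Y,Z) = 0.
S_XY, S_XZ, S_XYZ = S_Z, S_Y, 0.0
lhs = S_XY + S_Z - S_XYZ   # S(X,Y : Z)
rhs = S_Y + S_XZ - S_XYZ   # S(Y : X,Z)
```

With these conventions `lhs` evaluates to approximately $2$ and `rhs` to approximately $0$, matching the equality $S(\reg{X},\reg{Y} : \reg{Z}) = S(\reg{Y}:\reg{X},\reg{Z})+2\log(2)$.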

Theorem:

Let $p\in\P(\Sigma)$ be a probability vector, let $\{\sigma_a\,:\,a\in\Sigma\} \subset \Density(\Y\otimes\Z)$ be a collection of density operators, and let
$\rho = \sum_{a\in \Sigma} p(a) E_{a,a}\otimes \sigma_a.$
In other words, $\rho$ is a state of $(\reg{X},\reg{Y},\reg{Z})$ in which we view $\reg{X}$ as a classical register. With respect to the state $\rho$, it holds that
$S(\reg{X},\reg{Y} : \reg{Z}) \leq S(\reg{Y}:\reg{X},\reg{Z}) + \log(n).$

Proof:

First, observe that
\begin{align*} S(\reg{X}|\reg{Y})-S(\reg{X}|\reg{Z})&=S(\reg{X},\reg{Y})-S(\reg{Y})-S(\reg{X},\reg{Z})+S(\reg{Z})\\ &=S(\reg{X},\reg{Y})-S(\reg{Y})-S(\reg{X},\reg{Z})+S(\reg{Z})+\left(S(\reg{X},\reg{Y},\reg{Z})-S(\reg{X},\reg{Y},\reg{Z})\right) \\ &=S(\reg{X},\reg{Y} : \reg{Z}) - S(\reg{Y}:\reg{X},\reg{Z}). \end{align*}

Now consider the individual bounds on the quantities $S(\reg{X}|\reg{Y})$ and $S(\reg{X}|\reg{Z})$ in order to infer a bound on the difference $S(\reg{X}|\reg{Y})-S(\reg{X}|\reg{Z})$. By sub-additivity and the bound $S(\reg{X})\leq\log(n)$, the conditional entropies satisfy $S(\reg{X}|\reg{Y})\leq \log(n)$ and likewise $S(\reg{X}|\reg{Z})\leq \log(n)$. In general a conditional entropy can be negative in the presence of entanglement, but since the register $\reg{X}$ is classical in the state $\rho$ it holds that $S(\reg{Y})\leq S(\reg{X},\reg{Y})$ and $S(\reg{Z})\leq S(\reg{X},\reg{Z})$, so that $S(\reg{X}|\reg{Y})\geq 0$ and $S(\reg{X}|\reg{Z})\geq 0$. Therefore, the difference of the two is largest when $S(\reg{X}|\reg{Y})=\log(n)$ and $S(\reg{X}|\reg{Z})=0$. Hence, $S(\reg{X}|\reg{Y})-S(\reg{X}|\reg{Z})\leq \log(n)$, or equivalently $S(\reg{X},\reg{Y} : \reg{Z})-S(\reg{Y}:\reg{X},\reg{Z}) \leq \log(n)$, implying that $S(\reg{X},\reg{Y} : \reg{Z}) \leq S(\reg{Y}:\reg{X},\reg{Z}) + \log(n)$.

### Some facts concerning the von Neumann entropy and quantum mutual information

Here we'll prove some facts concerning the von Neumann entropy and quantum mutual information.

Let $\X$ be an $n$-dimensional complex Euclidean space, and let $\rho\in\Density(\X)$ be a density operator. Recall that the von Neumann entropy of $\rho$ is defined as
$S(\rho):=-\tr(\rho \ \text{log}(\rho)),$
or equivalently as
$S(\rho):=H(\lambda(\rho)),$
where $\lambda(\rho)=(\lambda_1(\rho),\lambda_2(\rho),\dots,\lambda_n(\rho))$ is the vector of eigenvalues of $\rho$, and
$H(p):=\sum_{a\in\Sigma}-p(a)\log(p(a)),$
is the classical Shannon entropy of a vector $p\in\mathbb{R}^{\Sigma}$ over some alphabet $\Sigma$.

Theorem:

For every choice of complex Euclidean spaces $\X$ and $\Y$, and every vector $u \in \X\otimes\Y$, it holds that $S(\tr_{\X}(u u^{\ast})) = S(\tr_{\Y}(u u^{\ast}))$.

Proof:

The vector $u\in\X\otimes\Y$ can be expressed in its Schmidt decomposition, after making the identification $u=\vec(A)$ for a unique operator $A\in\Lin(\Y,\X)$, as
$u=\sum_{k=1}^{r}s_kx_k\otimes y_k,$
where $r=\rank(A)$, $s_1,\dots, s_r>0$ are the singular values of $A$, and $\{x_1,\dots,x_r\}\subset\X$ and $\{y_1,\dots,y_r\}\subset\Y$ are orthonormal sets. Then
$uu^\ast=\sum_{j,k=1}^{r}s_js_kx_jx_k^\ast\otimes y_jy_k^\ast,$
and therefore
$\tr_{\X}(uu^\ast)=\sum_{k=1}^{r}s_k^2y_ky_k^\ast \ \ \ \ \text{and} \ \ \ \tr_{\Y}(uu^\ast)=\sum_{k=1}^{r}s_k^2x_kx_k^\ast.$
Now let $\lambda=(s_1^2,\dots,s_r^2)$, and observe that $\lambda$ is the vector of non-zero eigenvalues of both $\tr_{\X}(uu^\ast)$ and $\tr_{\Y}(uu^\ast)$, since the expressions above are spectral decompositions of these operators.

Hence, (by definition) the von Neumann entropy of each is
$S(\tr_{\X}(u u^{\ast})) =H(\lambda) = S(\tr_{\Y}(u u^{\ast})).$
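
The equality of the two spectra can be seen numerically for a two-qubit vector. This is a minimal sketch assuming real amplitudes and the identification $u=\vec(A)$ with $u=\sum_{a,b}A(a,b)\,e_a\otimes e_b$; the particular matrix `A` and the helper names are illustrative choices, and entropies are reported in bits.

```python
import math

def eigenvalues_sym_2x2(M):
    """Eigenvalues (largest first) of a real symmetric 2x2 matrix."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    return [(tr + disc) / 2, (tr - disc) / 2]

def entropy(eigenvalues):
    """Shannon entropy (in bits) of a vector of eigenvalues, skipping zeros."""
    return -sum(lam * math.log2(lam) for lam in eigenvalues if lam > 1e-12)

# Amplitudes of a real unit vector u = sum_{a,b} A[a][b] e_a (x) e_b (illustrative values).
A = [[0.6, 0.0], [0.48, 0.64]]

# Reduced state of the first register (second register traced out): A A^T.
reduced_first = [[sum(A[i][k] * A[j][k] for k in range(2)) for j in range(2)]
                 for i in range(2)]
# Reduced state of the second register (first register traced out): A^T A.
reduced_second = [[sum(A[k][i] * A[k][j] for k in range(2)) for j in range(2)]
                  for i in range(2)]

spec_first = eigenvalues_sym_2x2(reduced_first)
spec_second = eigenvalues_sym_2x2(reduced_second)
S_first, S_second = entropy(spec_first), entropy(spec_second)
```

The two reduced states are different matrices, yet they share the same eigenvalues (the squared singular values of `A`), so their entropies coincide.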

Theorem:

For every choice of registers $\reg{X}$ and $\reg{Y}$, and for every state $\rho\in\Density(\X\otimes\Y)$ of these registers, it holds that $S(\reg{X}) \leq S(\reg{Y}) + S(\reg{X},\reg{Y})$.

Proof:

Choose a complex Euclidean space $\Z$ such that $\dim(\Z)\geq\rank(\rho)$ so that there exists a purification $\rho'=uu^\ast\in \Density(\X\otimes\Y\otimes\Z)$ of $\rho$, and then let $\rho'$ be the joint state of the registers $\reg{X},\reg{Y},\reg{Z}$. Since $\rho'$ is a pure state, $S(\reg{X},\reg{Y}, \reg{Z})=0$. Moreover, $\rho'[\reg{X},\reg{Z}]=\tr_{\Y}(\rho')$ and $\rho'[\reg{Y}]=\tr_{\X\otimes\Z}(\rho')$, and since $\rho'=uu^\ast$ is a pure state the previous theorem implies that $S(\tr_{\Y}(\rho'))=S(\tr_{\X\otimes\Z}(\rho'))$, or equivalently that $S(\reg{Y}) = S(\reg{X},\reg{Z})$.

By strong sub-additivity, for any possible state of the registers $\reg{X},\reg{Y}, \reg{Z}$,
$S(\reg{X},\reg{Y}, \reg{Z})+S(\reg{X})\leq S(\reg{X}, \reg{Z})+S(\reg{X}, \reg{Y}).$
However, by previous considerations we have that $S(\reg{X},\reg{Y}, \reg{Z})=0$ and $S(\reg{Y}) = S(\reg{X},\reg{Z})$, which after substituting implies that
$S(\reg{X})\leq S(\reg{Y})+S(\reg{X}, \reg{Y}).$

Theorem:

Let $\reg{X}$ and $\reg{Y}$ be registers, let $\Sigma$ be an alphabet, let $p\in\P(\Sigma)$ be a probability vector, and let $\{\sigma_a\,:\,a\in\Sigma\}\subset\Density(\X)$ and $\{\xi_a\,:\,a\in\Sigma\}\subset\Density(\Y)$ be arbitrary collections of density operators. For $(\reg{X},\reg{Y})$ being in the state
$\rho = \sum_{a\in\Sigma} \, p(a) \sigma_a\otimes\xi_a,$
it holds that $S(\reg{X} : \reg{Y}) \leq H(p)$.

Proof:

In this case, the reduced states of the two registers are given by
$\rho[\reg{X}]=\tr_\Y(\rho)=\sum_{a\in\Sigma}p(a) \sigma_a \ \ \ \text{and} \ \ \ \rho[\reg{Y}]=\tr_\X(\rho)=\sum_{a\in\Sigma}p(a) \xi_a,$
so that
\begin{align*} \rho[\reg{X}]\otimes\rho[\reg{Y}]&=\left(\sum_{a\in\Sigma}p(a) \sigma_a\right)\otimes\left(\sum_{b\in\Sigma}p(b) \xi_b\right) \\ &=\sum_{a\in\Sigma}\sum_{b\in\Sigma}p(a)p(b) \sigma_a\otimes \xi_b. \end{align*}

Then the mutual information $S(\reg{X} : \reg{Y})$ can be expressed as
\begin{align*} S(\reg{X} : \reg{Y})&=S(\rho||\rho[\reg{X}]\otimes\rho[\reg{Y}]) \\ &=S\left( \sum_{a\in\Sigma} \, p(a) \sigma_a\otimes\xi_a || \sum_{a\in\Sigma}\sum_{b\in\Sigma}p(a)p(b) \sigma_a\otimes \xi_b\right) \\ &\leq \sum_{a\in\Sigma}S\left( p(a) \sigma_a\otimes\xi_a || \sum_{b\in\Sigma}p(a)p(b) \sigma_a\otimes \xi_b\right) \\ &=\sum_{a\in\Sigma}S\left( p(a) \sigma_a\otimes\xi_a || \, p(a) \sigma_a\otimes \sum_{b\in\Sigma}p(b)\xi_b\right) \\ &=\sum_{a\in\Sigma}\left(\tr(\xi_a)S(p(a)\sigma_a || p(a)\sigma_a) + \tr(p(a)\sigma_a)S(\xi_a || \sum_{b\in\Sigma}p(b)\xi_b) \right), \end{align*}
but since $S(p(a)\sigma_a || p(a)\sigma_a)=0$ and for $\sigma_a\in\Density(\X)$ it is always the case that $\tr(\sigma_a)=1$, it follows that
\begin{align*} S(\reg{X} : \reg{Y})\leq &\sum_{a\in\Sigma} p(a)S\left(\xi_a || \sum_{b\in\Sigma}p(b)\xi_b\right) \\ =&\sum_{a\in\Sigma} p(a)S\left(\frac{p(a)}{p(a)}\xi_a || \sum_{b\in\Sigma}p(b)\xi_b\right) \\ =&\sum_{a\in\Sigma}p(a)\tr\left( \xi_a\log\left(\frac{p(a)}{p(a)}\xi_a\right) - \xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right) \\ =&\sum_{a\in\Sigma}p(a)\tr\left(-\xi_a\log(p(a))+ \xi_a\log\left(p(a)\xi_a\right) - \xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right) \\ =&\sum_{a\in\Sigma}p(a)\tr(-\xi_a\log(p(a)))+p(a)\tr\left( \xi_a\log\left(p(a)\xi_a\right) - \xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right) \\ =&\sum_{a\in\Sigma}-p(a)\log(p(a))+p(a)\tr\left( \xi_a\log\left(p(a)\xi_a\right) - \xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right) \\ =&H(p)+c. \end{align*}
Here, the Shannon entropy is by definition
$H(p)=\sum_{a\in\Sigma}-p(a)\log(p(a)),$
and the value $c$ has been introduced for convenience to represent the remaining quantity
$c:=\sum_{a\in\Sigma}p(a)\tr\left( \xi_a\log\left(p(a)\xi_a\right) - \xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right).$

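The logarithm is operator monotone, which implies that if $P,Q\in\Pos(\Y)$ satisfy $P\leq Q$, then $\tr(R\log(P))\leq\tr(R\log(Q))$ for every $R\in\Pos(\Y)$ with $\im(R)\subseteq\im(P)$. Since $p(a)\xi_a\leq\sum_{b\in\Sigma}p(b)\xi_b$ for every $a\in\Sigma$, this implies that

$\sum_{a\in\Sigma}p(a)\tr\left( \xi_a\log\left(p(a)\xi_a\right)\right) \leq \sum_{a\in\Sigma}p(a)\tr\left(\xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right),$
so that $c\leq 0$.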

Hence $S(\reg{X} : \reg{Y})\leq H(p)+c\leq H(p)$.
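
The bound can be explored numerically in the special case where every $\sigma_a$ and $\xi_a$ is diagonal, so that $\rho$ is diagonal and all von Neumann entropies reduce to Shannon entropies. The sketch below assumes this commuting case only; the function names and the example ensembles are illustrative, and entropies are in bits.

```python
import math

def shannon(probabilities):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    return -sum(x * math.log2(x) for x in probabilities if x > 0)

def mutual_information(p, sigmas, xis):
    """S(X:Y) for rho = sum_a p(a) sigma_a (x) xi_a with diagonal sigma_a and xi_a.

    Each sigma_a and xi_a is given by its diagonal (a probability vector).
    """
    dx, dy = len(sigmas[0]), len(xis[0])
    joint = [[sum(pa * s[i] * x[j] for pa, s, x in zip(p, sigmas, xis))
              for j in range(dy)] for i in range(dx)]
    marginal_x = [sum(row) for row in joint]
    marginal_y = [sum(col) for col in zip(*joint)]
    flat = [entry for row in joint for entry in row]
    return shannon(marginal_x) + shannon(marginal_y) - shannon(flat)

p = [0.5, 0.5]
# Perfectly distinguishable states on both sides: the bound is attained.
perfectly_correlated = mutual_information(p, [[1, 0], [0, 1]], [[1, 0], [0, 1]])
# Overlapping states on the second register: the mutual information is strictly smaller.
noisy = mutual_information(p, [[1, 0], [0, 1]], [[0.75, 0.25], [0.25, 0.75]])
```

In the first case $S(\reg{X}:\reg{Y})=H(p)$, which shows that the bound cannot be improved in general; in the second case the inequality is strict.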

### Bounding the quantum relative entropy in terms of the classical relative entropy

Theorem:

Let $\X$ be a complex Euclidean space, let $\Sigma$ be an alphabet, let $p,q\in\P(\Sigma)$ be probability vectors, and let $\{\rho_a\,:\,a\in\Sigma\}\subset\Density(\X)$ and $\{\sigma_a\,:\,a\in\Sigma\}\subset\Density(\X)$ be collections of density operators indexed by $\Sigma$. Assume that $\im(\rho_a)\subseteq\im(\sigma_a)$, $p(a)>0$, and $q(a) > 0$ for all $a\in\Sigma$. For two positive semidefinite operators $P$ and $Q$ acting on $\X$ with $\im(P)\subseteq\im(Q)$, denote the quantum relative entropy as
$S(P || Q )=\tr(P \ \text{log}(P))-\tr(P \ \text{log}(Q)).$

Then
$S\Biggl(\sum_{a\in\Sigma} p(a) \rho_a \Bigg\| \sum_{a\in\Sigma} q(a) \sigma_a \Biggr) \leq \sum_{a\in\Sigma} p(a) S(\rho_a \| \sigma_a) + D(p \| q),$
where
$D(p \| q):=\sum_{a\in\Sigma}p(a)\text{log}\Bigl(\frac{p(a)}{q(a)}\Bigr)$
is the classical relative entropy of two probability vectors $p,q\in\P(\Sigma)$.

Proof:

Consider the following fact, which states that for a complex Euclidean space $\X$ and operators $P_0,P_1,Q_0,Q_1\in \Pos(\X)$,
$S(P_0+P_1\| Q_0+Q_1)\leq S(P_0 \|Q_0)+S( P_1 \| Q_1).$
Therefore,
$S\Biggl(\sum_{a\in\Sigma} p(a) \rho_a \Bigg\| \sum_{a\in\Sigma} q(a) \sigma_a \Biggr) \leq \sum_{a\in\Sigma}S\left( p(a) \rho_a \| q(a) \sigma_a \right).$
Moreover, for $P,Q\in\Pos(\X)$ and scalars $\alpha,\beta\in(0,\infty)$, it holds that
$S(\alpha P \| \beta Q)=\alpha S(P\|Q)+\alpha \text{log}(\alpha/\beta)\tr(P).$
Thus,
\begin{align*} S\Biggl(\sum_{a\in\Sigma} p(a) \rho_a \Bigg\| \sum_{a\in\Sigma} q(a) \sigma_a \Biggr) & \leq \sum_{a\in\Sigma}S\left( p(a) \rho_a \| q(a) \sigma_a \right) \\ &= \sum_{a\in\Sigma} \Bigl(p(a)S(\rho_a \| \sigma_a)+p(a)\text{log}\Bigl(\frac{p(a)}{q(a)}\Bigr)\tr(\rho_a) \Bigr) \\ &=\sum_{a\in\Sigma} \Bigl(p(a)S(\rho_a \| \sigma_a)\Bigr)+\sum_{a\in\Sigma}\Bigl(p(a)\text{log}\Bigl(\frac{p(a)}{q(a)}\Bigr) \Bigr), \end{align*}
since $\rho_a\in\Density(\X)$ implies that $\tr(\rho_a)=1$. Also, by definition of the relative entropy of two probability vectors $p,q\in\P(\Sigma)$,
$D(p \| q):=\sum_{a\in\Sigma}p(a)\log\Bigl(\frac{p(a)}{q(a)}\Bigr).$
Hence,
$\sum_{a\in\Sigma} \Bigl(p(a)S(\rho_a \| \sigma_a)\Bigr)+\sum_{a\in\Sigma}\Bigl(p(a)\text{log}\Bigl(\frac{p(a)}{q(a)}\Bigr) \Bigr)=\sum_{a\in\Sigma} \Bigl(p(a)S(\rho_a \| \sigma_a)\Bigr)+ D(p \| q),$
which implies
$S\Biggl(\sum_{a\in\Sigma} p(a) \rho_a \Bigg\| \sum_{a\in\Sigma} q(a) \sigma_a \Biggr) \leq \sum_{a\in\Sigma} p(a) S(\rho_a \| \sigma_a) + D(p \| q).$
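
The bound can also be sanity-checked numerically. The following is only a rough sketch, not part of the proof: it assumes NumPy, uses the natural logarithm throughout, and samples random full-rank density operators so that the image condition holds automatically. The helper names (`random_density`, `herm_log`, `rel_entropy`, `check_bound`) are made up for this illustration.

```python
import numpy as np

def random_density(dim, rng):
    """Sample a random density matrix (full rank with probability 1)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def herm_log(a):
    """Logarithm of a positive definite Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def rel_entropy(p_op, q_op):
    """Quantum relative entropy S(P || Q) = tr(P log P) - tr(P log Q)."""
    return np.trace(p_op @ (herm_log(p_op) - herm_log(q_op))).real

def classical_rel_entropy(p, q):
    """Classical relative entropy D(p || q) for strictly positive p and q."""
    return float(np.sum(p * np.log(p / q)))

def check_bound(dim=3, k=4, seed=0):
    """Return (left-hand side, right-hand side) of the inequality for random data."""
    rng = np.random.default_rng(seed)
    p = rng.dirichlet(np.ones(k))
    q = rng.dirichlet(np.ones(k))
    rhos = [random_density(dim, rng) for _ in range(k)]
    sigmas = [random_density(dim, rng) for _ in range(k)]
    mix_rho = sum(pa * r for pa, r in zip(p, rhos))
    mix_sigma = sum(qa * s for qa, s in zip(q, sigmas))
    lhs = rel_entropy(mix_rho, mix_sigma)
    rhs = sum(pa * rel_entropy(r, s) for pa, r, s in zip(p, rhos, sigmas))
    rhs += classical_rel_entropy(p, q)
    return lhs, rhs

lhs, rhs = check_bound()
print(lhs <= rhs + 1e-9)
```

Changing `seed`, `dim`, or `k` gives other random instances; a check like this can expose a typo in a formula, but it is of course no substitute for the argument above.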

### When a channel is "optimal"

Let $\X$ and $\Y$ be complex Euclidean spaces and let $H\in\Herm(\Y\otimes\X)$ be an arbitrary Hermitian operator. Consider the problem of maximizing the value
$\ip{H}{J(\Phi)}$
over all choices of a channel $\Phi\in\Channel(\X,\Y)$.

One may observe that there must always exist at least one choice of a channel $\Psi\in\Channel(\X,\Y)$ such that
$\ip{H}{J(\Psi)} = \sup\{\ip{H}{J(\Phi)}\,:\,\Phi\in\Channel(\X,\Y)\},$
by virtue of the fact that $\Channel(\X,\Y)$ is a compact set and $\Phi\mapsto\ip{H}{J(\Phi)}$ is a continuous function. For any channel $\Psi\in\Channel(\X,\Y)$ satisfying the identity above, let us say that $\Psi$ is optimal with respect to $H$.

Theorem:

$\Phi\in\Channel(\X,\Y)$ is optimal with respect to $H$ if and only if
$\I_{\Y} \otimes \tr_{\Y} ( H J(\Phi)) - H \in \Pos(\Y\otimes\X).$

Proof:

Let $\Z=\Y\otimes\X$. If $\Phi\in\Channel(\X,\Y)$, then the Choi representation $J(\Phi)$ satisfies
$J(\Phi)\in\Pos(\Z) \ \text{and} \ \tr_{\Y}(J(\Phi))=\I_{\X}.$
Consider the semidefinite program defined by the triple $(\Omega, H, \I_{\X})$, where $H\in\Herm(\Z)$, $\I_{\X}\in\Herm(\X)$, and $\Omega\in\Channel(\Z,\X)$ is the partial trace $\Omega(Z)=\tr_{\Y}(Z)$, whose adjoint map $\Omega^*$ from $\Lin(\X)$ to $\Lin(\Z)$ is given by $\Omega^*(X)=\I_{\Y}\otimes X$. Then the primal and dual problems can be expressed as
\begin{align*} \text{Primal} & & & &\text{Dual}& \\ &\max\ip{H}{J(\Phi)} & & & &\min\ip{\I_{\X}}{X} \\ \text{subject to:} \ & \tr_{\Y}(J(\Phi))=\I_{\X}, & & &\text{subject to:} \ & \I_{\Y}\otimes X \geq H, \\ & J(\Phi)\in\Pos(\Z) &&&& X\in\Herm(\X) \end{align*}
Define the primal and dual feasible sets $\mathcal{A}$ and $\mathcal{B}$, respectively, as
$\mathcal{A}:=\{Z\in\Pos(\Z) : \Omega(Z)=\I_{\X}\} \ \text{and} \ \mathcal{B}:=\{ X\in\Herm(\X) : \Omega^*(X)\geq H\}.$
Also define the optimal values associated with the primal and dual problems as
$\alpha:=\sup\{\ip{H}{Z} : Z\in\mathcal{A}\} \ \text{and} \ \beta:=\inf\{\ip{\I_{\X}}{X} : X\in\mathcal{B}\}.$

The primal feasible set $\mathcal{A}$ is nonempty, since it contains $J(\Psi)$ for every channel $\Psi\in\Channel(\X,\Y)$, and it is compact, so $\alpha$ is finite. Now let $\lambda$ be any real number strictly larger than the largest eigenvalue of $H$, and consider the operator $\lambda\I_{\X}\in \Herm(\X)$. Then $\Omega^*(\lambda\I_{\X})=\I_{\Y}\otimes\lambda\I_{\X}>H$, so the dual problem is strictly feasible. Therefore, strong duality holds by Slater's theorem (Theorem 1.11). This implies that $\alpha=\beta$ and there exists $Z\in\mathcal{A}$ such that $\ip{H}{Z}=\alpha$. The primal problem is strictly feasible as well, because $\I_{\Y}\otimes\I_{\X}/\dim(\Y)$ is positive definite and belongs to $\mathcal{A}$, so by the same theorem there also exists $X\in\mathcal{B}$ such that $\ip{\I_{\X}}{X}=\beta$. Finally, by complementary slackness (Proposition 1.12), if $Z\in\mathcal{A}$ and $X\in\mathcal{B}$ satisfy $\ip{H}{Z}=\ip{\I_{\X}}{X}$, it holds that $\Omega^*(X)Z=HZ$.

Now suppose that $\Phi\in\Channel(\X,\Y)$ is optimal with respect to $H$, so that $\ip{H}{J(\Phi)}=\alpha=\beta$, and let $X\in\mathcal{B}$ be a dual optimal solution, so that $\ip{H}{J(\Phi)}=\ip{\I_{\X}}{X}$. Then by complementary slackness it follows that
\begin{align*} \Omega^*(X)J(\Phi)&=HJ(\Phi) \\ (\I_{\Y}\otimes X)J(\Phi)&=HJ(\Phi) \\ \tr_{\Y}((\I_{\Y}\otimes X)J(\Phi))&=\tr_{\Y}(HJ(\Phi)) \\ X&=\tr_{\Y}(HJ(\Phi)), \end{align*}
since $\tr_{\Y}((\I_{\Y}\otimes X)J(\Phi))=X\tr_{\Y}(J(\Phi))$ and $\tr_{\Y}(J(\Phi))=\I_{\X}$. Because $X\in\mathcal{B}$, it satisfies $X\in\Herm(\X)$ and $\Omega^*(X)\geq H$. This implies that $\Omega^*(\tr_{\Y}(HJ(\Phi)))=\I_{\Y}\otimes \tr_{\Y}(HJ(\Phi))\geq H$. That is, $\I_{\Y}\otimes \tr_{\Y}(HJ(\Phi))-H\geq 0$, or in other words $\I_{\Y}\otimes \tr_{\Y}(HJ(\Phi))-H\in\Pos(\Z)=\Pos(\Y\otimes\X)$.

Suppose instead that $\I_{\Y}\otimes \tr_{\Y}(HJ(\Phi))-H\in\Pos(\Z)=\Pos(\Y\otimes\X)$ holds for some $\Phi\in\Channel(\X,\Y)$. This is equivalent to writing $\I_{\Y}\otimes \tr_{\Y}(HJ(\Phi))\geq H$ or $\Omega^*(\tr_{\Y}(HJ(\Phi)))\geq H$. Moreover, since the difference $\I_{\Y}\otimes \tr_{\Y}(HJ(\Phi))-H$ is positive semidefinite, and therefore Hermitian, and $H\in\Herm(\Z)$, the operator $\I_{\Y}\otimes \tr_{\Y}(HJ(\Phi))$ is Hermitian, which implies that $\tr_{\Y}(HJ(\Phi))\in\Herm(\X)$. Hence, $\tr_{\Y}(HJ(\Phi))\in\mathcal{B}$ as it satisfies the conditions for being dual feasible. By weak duality, the quantity $\ip{\I_{\X}}{\tr_{\Y}(HJ(\Phi))}$ therefore places an upper bound on the possible values of $\ip{H}{Z}$ for any primal feasible $Z\in\mathcal{A}$. Thus, $\ip{H}{Z}\leq\ip{\I_{\X}}{\tr_{\Y}(HJ(\Phi))}$ for all $Z\in\mathcal{A}$. However, observe that
\begin{align*} \ip{\I_{\X}}{\tr_{\Y}(HJ(\Phi))}=\tr(\tr_{\Y}(HJ(\Phi)))=\tr(HJ(\Phi))=\ip{H}{J(\Phi)}, \end{align*}
so the upper bound is attained at $Z=J(\Phi)$, which is primal feasible by virtue of $\Phi\in\Channel(\X,\Y)$. Hence, it must be the case that $\tr_{\Y}(HJ(\Phi))$ is a solution to the dual problem and that $J(\Phi)$ is a solution to the primal problem. Thus, $\Phi\in\Channel(\X,\Y)$ is optimal with respect to $H$.

It has now been shown that $\Phi\in\Channel(\X,\Y)$ is optimal with respect to $H$ if and only if $\I_{\Y}\otimes \tr_{\Y}(HJ(\Phi))-H\in\Pos(\Y\otimes\X)$.
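
As a small illustration of how the criterion can be tested in practice, here is a sketch assuming NumPy. The example is chosen for this illustration and is not from the problem statement: $\X=\Y=\mathbb{C}^2$ and $H=\operatorname{vec}(\I)\operatorname{vec}(\I)^\ast$, with the identity channel and the completely depolarizing channel as candidates. The function names `partial_trace_y` and `is_optimal` are hypothetical helpers.

```python
import numpy as np

def partial_trace_y(m, dy, dx):
    """Trace out the first tensor factor Y of an operator on Y (x) X."""
    return np.einsum("ajak->jk", m.reshape(dy, dx, dy, dx))

def is_optimal(h, j, dy, dx, tol=1e-9):
    """Check whether I_Y (x) tr_Y(H J) - H is positive semidefinite."""
    x = partial_trace_y(h @ j, dy, dx)
    m = np.kron(np.eye(dy), x) - h
    # A positive semidefinite operator must in particular be Hermitian.
    if not np.allclose(m, m.conj().T, atol=tol):
        return False
    return bool(np.linalg.eigvalsh(m).min() >= -tol)

d = 2
v = np.eye(d).reshape(-1)            # vec(I) as a vector in Y (x) X
h = np.outer(v, v)                   # H = vec(I) vec(I)^*
j_identity = np.outer(v, v)          # Choi operator of the identity channel
j_depolarizing = np.eye(d * d) / d   # Choi operator of X -> tr(X) I / d

print(is_optimal(h, j_identity, d, d))      # True
print(is_optimal(h, j_depolarizing, d, d))  # False
```

For this $H$ the identity channel gives $\tr_{\Y}(HJ(\Phi))=2\I_{\X}$, and $2\I_{\Y}\otimes\I_{\X}-H$ is positive semidefinite because the largest eigenvalue of $H$ is 2, while the completely depolarizing channel gives $\I_{\X}/2$ and fails the test.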

### A lower bound on the trace distance of tensor copies of states

Theorem:
Let $\X$ be a complex Euclidean space, let $\rho_0,\rho_1\in\Density(\X)$ be density operators satisfying
$\bignorm{\rho_0 - \rho_1}_1 \geq \varepsilon$
for $\varepsilon > 0$, and let $n$ be an arbitrary positive integer.
Then
$\Bignorm{\rho_0^{\otimes n} - \rho_1^{\otimes n}}_1 \geq 2 - 2 \exp\biggl(-\frac{n\varepsilon^2}{8}\biggr).$
(The notation $\rho^{\otimes n}$ means $\rho$ tensored with itself $n$ times. For example, $\rho^{\otimes 4} = \rho\otimes\rho\otimes\rho\otimes\rho$.)

Proof:
By the Fuchs-van de Graaf inequalities (Theorem 3.34), the following two equivalent chains of inequalities hold:
$1-\frac{1}{2}\bignorm{\rho_0-\rho_1}_1\leq\fid(\rho_0,\rho_1)\leq\sqrt{1-\frac{1}{4}\bignorm{\rho_0-\rho_1}_1^2},$
$2-2\fid(\rho_0,\rho_1)\leq\bignorm{\rho_0-\rho_1}_1\leq 2\sqrt{1-\fid(\rho_0,\rho_1)^2}.$
Also, by Proposition 3.16, it follows that
$\fid(\rho_0^{\otimes n},\rho_1^{\otimes n})=\fid(\rho_0,\rho_1)^{n}.$
Therefore,
$\fid(\rho_0^{\otimes n},\rho_1^{\otimes n})^2=\fid(\rho_0,\rho_1)^{2n}\leq\left(1-\frac{1}{4}\bignorm{\rho_0-\rho_1}_1^2\right)^n\leq\left(1-\frac{1}{4}\varepsilon^2\right)^n,$
since $\varepsilon\leq\bignorm{\rho_0 - \rho_1}_1$ by assumption.

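Since $1-x\leq\exp(-x)$ for every $x\in\mathbb{R}$, it holds for $0\leq x\leq1$ and any positive integer $n$ that
$(1-x)^n\leq \exp(-nx).$
Taking $x=\varepsilon^2/4$, which lies in $[0,1]$ because $\varepsilon\leq\bignorm{\rho_0 - \rho_1}_1\leq 2$, gives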
$\left(1-\frac{1}{4}\varepsilon^2\right)^n\leq \exp\biggl(-\frac{n\varepsilon^2}{4}\biggr).$
This implies that
$\fid(\rho_0^{\otimes n},\rho_1^{\otimes n})\leq\left(1-\frac{1}{4}\varepsilon^2\right)^{n/2}\leq\exp\biggl(-\frac{n\varepsilon^2}{8}\biggr).$
Finally, the Fuchs-van de Graaf inequalities applied to $\rho_0^{\otimes n}$ and $\rho_1^{\otimes n}$ give
$2-2\fid(\rho_0^{\otimes n},\rho_1^{\otimes n})\leq\bignorm{\rho_0^{\otimes n}-\rho_1^{\otimes n}}_1,$
so the previous bound implies
$2-2\exp\biggl(-\frac{n\varepsilon^2}{8}\biggr)\leq2-2\fid(\rho_0^{\otimes n},\rho_1^{\otimes n})\leq\bignorm{\rho_0^{\otimes n}-\rho_1^{\otimes n}}_1,$
which completes the proof.
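
To get a feel for how loose or tight the bound is, one can compare both sides numerically for small $n$. The sketch below assumes NumPy and uses two qubit states picked only for illustration, $\rho_0=|0\rangle\langle 0|$ and $\rho_1=|+\rangle\langle +|$, with $\varepsilon$ set equal to $\bignorm{\rho_0-\rho_1}_1$. The helper names are hypothetical.

```python
from functools import reduce
import numpy as np

def trace_norm(a):
    """Trace norm of a Hermitian matrix: the sum of |eigenvalues|."""
    return float(np.abs(np.linalg.eigvalsh(a)).sum())

def tensor_power(rho, n):
    """Return rho tensored with itself n times (n >= 1)."""
    return reduce(np.kron, [rho] * n)

def compare(rho0, rho1, n):
    """Return (trace distance of the n-fold copies, lower bound from the theorem)."""
    eps = trace_norm(rho0 - rho1)
    lhs = trace_norm(tensor_power(rho0, n) - tensor_power(rho1, n))
    bound = 2 - 2 * np.exp(-n * eps**2 / 8)
    return lhs, bound

rho0 = np.array([[1.0, 0.0], [0.0, 0.0]])   # |0><0|
rho1 = np.array([[0.5, 0.5], [0.5, 0.5]])   # |+><+|

for n in range(1, 6):
    lhs, bound = compare(rho0, rho1, n)
    print(n, round(lhs, 4), round(bound, 4))
```

Both columns approach 2 as $n$ grows, with the computed norm staying above the bound; the matrices have dimension $2^n$, so this brute-force check is only practical for small $n$.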