Entanglement destroying channels

In a previous post, we were concerned with channels of the form $\Phi\in\Channel(\X,\Y)$ such that  $\bigl(\Phi\otimes \I_{\Lin(\Z)}\bigr)(\rho) \in \Sep(\Y:\Z)$ for every complex Euclidean space $\Z$ and every density operator $\rho\in\Density(\X\otimes\Z)$. Channels of this form have the effect of destroying entanglement that exists between the register they act on and any other registers.

Theorem:
There exist two channels $\Phi_0,\Phi_1\in\Channel(\X,\Y)$, both having the property described above, such that
  \[
  \bigtriplenorm{\Phi_0 - \Phi_1}_1
  > \bignorm{\Phi_0(\rho) - \Phi_1(\rho)}_1
  \]
  for every $\rho\in\Density(\X)$. (Channels like this have the strange property that they destroy entanglement, and yet evaluating them on an entangled state helps to distinguish them.)


Proof:

Let $n=\dim(\X)\geq 2$. For $\lambda\in[0,1]$, consider the two channels $\Phi_0,\Phi_1\in\Channel(\X)$ defined by
\[\begin{align*}
\Phi_0(X)&=\frac{\lambda}{n+1}(\tr(X)\I_\X+X^T)+\frac{(1-\lambda)}{n}\tr(X)\I_\X \\
\Phi_1(X)&=\frac{\lambda}{n-1}(\tr(X)\I_\X-X^T)+\frac{(1-\lambda)}{n}\tr(X)\I_\X
\end{align*}\]
Then, for sufficiently small $\lambda>0$, both of the Choi representations $J(\Phi_0)$ and $J(\Phi_1)$ lie in a neighborhood of $\frac{1}{n}\I_\X\otimes\I_\X$ (a multiple of the maximally mixed state) that consists entirely of separable operators, so both are separable. Therefore, from the results in the previous post, $\Phi_0$ and $\Phi_1$ are entanglement destroying in the sense described above.

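Now, for every $\rho\in\Density(\X)$, a direct calculation gives
\[
\Phi_0(\rho) - \Phi_1(\rho)=\frac{\lambda}{n+1}(\I_\X+\rho^T)-\frac{\lambda}{n-1}(\I_\X-\rho^T)=\frac{2\lambda}{(n+1)(n-1)}\bigl(n\rho^T-\I_\X\bigr).
\]
If $\rho$ is pure, then $n\rho^T-\I_\X$ has eigenvalue $n-1$ with multiplicity $1$ and eigenvalue $-1$ with multiplicity $n-1$, so $\bignorm{n\rho^T-\I_\X}_1=2(n-1)$. Every density operator is a convex combination of pure states, so the triangle inequality gives
\[
\bignorm{\Phi_0(\rho) - \Phi_1(\rho)}_1\leq\frac{2\lambda}{(n+1)(n-1)}\cdot 2(n-1)=\frac{4\lambda}{n+1}
\]
for every $\rho\in\Density(\X)$.

Moreover,
\[\begin{align*}
\bigtriplenorm{\Phi_0 - \Phi_1}_1&=\max\{\bignorm{((\Phi_0 - \Phi_1)\otimes\I_{\Lin(\X)})(xx^\ast)}_1 \ : \ x\in S(\X\otimes\X)\} .
\end{align*}\]
Take $x=\vec(\I_\X)/\sqrt{n}$. The partial transpose of $xx^\ast$ is $\frac{1}{n}W$, where $W$ is the swap operator on $\X\otimes\X$, and the reduced state of $xx^\ast$ on the second copy of $\X$ is $\frac{1}{n}\I_\X$, so
\[
((\Phi_0 - \Phi_1)\otimes\I_{\Lin(\X)})(xx^\ast)=\frac{2\lambda}{(n+1)(n-1)}\Bigl(W-\frac{1}{n}\I_\X\otimes\I_\X\Bigr).
\]
The operator $W-\frac{1}{n}\I_\X\otimes\I_\X$ has eigenvalue $1-\frac{1}{n}$ on the symmetric subspace, which has dimension $\binom{n+1}{2}$, and eigenvalue $-1-\frac{1}{n}$ on the antisymmetric subspace, which has dimension $\binom{n}{2}$. Its trace norm is therefore
\[
\frac{n-1}{n}\cdot\frac{n(n+1)}{2}+\frac{n+1}{n}\cdot\frac{n(n-1)}{2}=(n+1)(n-1).
\]
Therefore $\bigtriplenorm{\Phi_0 - \Phi_1}_1\geq 2\lambda$, which implies that
  \[
  \bigtriplenorm{\Phi_0 - \Phi_1}_1\geq 2\lambda
  > \frac{4\lambda}{n+1}\geq\bignorm{\Phi_0(\rho) - \Phi_1(\rho)}_1
  \]
for every $\rho\in\Density(\X)$, whenever $n\geq 2$ and $\lambda>0$.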

Bounding the norm of the Choi representation of a channel in terms of its operator norm

Theorem:

Let $\X$ and $\Y$ be complex Euclidean spaces with $\dim(\X) = n$ and let $\Phi\in\Trans(\X,\Y)$. Let $\norm{\cdot}_1$ denote the usual trace norm of an operator, and $\triplenorm{\cdot}_1$ the completely bounded trace norm of a map. Then
  \[
  \triplenorm{\Phi}_1 \leq \norm{J(\Phi)}_1 \leq n
  \triplenorm{\Phi}_1.  \]

Proof:

Since the Choi representation of $\Phi$ is given by $J(\Phi)=(\Phi\otimes\I_{\Lin(\X)})(\vec(\I_\X)\vec(\I_\X)^\ast)$, its trace norm satisfies
\[\begin{align*}
\norm{J(\Phi)}_1&=\norm{(\Phi\otimes\I_{\Lin(\X)})(\vec(\I_\X)\vec(\I_\X)^\ast)}_1 \\
&\leq \norm{(\Phi\otimes\I_{\Lin(\X)})}_1\norm{\vec(\I_\X)\vec(\I_\X)^\ast}_1 \\
&=\norm{(\Phi\otimes\I_{\Lin(\X)})}_1n \\
&=n  \triplenorm{\Phi}_1,
\end{align*}\]
where the last two lines follow from $\norm{\vec(\I_\X)\vec(\I_\X)^\ast}_1=n$ and the definition of the completely bounded trace norm $\triplenorm{\Phi}_1 :=\norm{\Phi\otimes\I_{\Lin(\X)}}_1$.

Now consider an alternate characterization of the completely bounded trace norm:
\[
\triplenorm{\Phi}_1=\max\{\norm{(\I_{\Y}\otimes\sqrt{\rho_0})J(\Phi)(\I_{\Y}\otimes\sqrt{\rho_1})}_1 : \rho_0,\rho_1\in\Density(\X)\}.
\]
The trace norm satisfies the property
\[
\norm{(\I_{\Y}\otimes\sqrt{\rho_0})J(\Phi)(\I_{\Y}\otimes\sqrt{\rho_1})}_1\leq\norm{\I_{\Y}\otimes\sqrt{\rho_0}}_\infty\norm{J(\Phi)}_1\norm{\I_{\Y}\otimes\sqrt{\rho_1}}_\infty,
\]
and the spectral norm $\norm{A}_\infty$ of an operator $A$ is given by the largest singular value of $A$, so it follows that $\norm{\I_{\Y}\otimes\sqrt{\rho_a}}_\infty\leq1$ for $a\in\{0,1\}$, with equality holding exactly when $\rho_a$ is a pure state. Therefore,
\[
\norm{(\I_{\Y}\otimes\sqrt{\rho_0})J(\Phi)(\I_{\Y}\otimes\sqrt{\rho_1})}_1\leq\norm{J(\Phi)}_1,
\]
implying that
\[
\triplenorm{\Phi}_1=\max\{\norm{(\I_{\Y}\otimes\sqrt{\rho_0})J(\Phi)(\I_{\Y}\otimes\sqrt{\rho_1})}_1 : \rho_0,\rho_1\in\Density(\X)\}\leq \norm{J(\Phi)}_1.
\]
Putting these two bounds together then gives
\[
  \triplenorm{\Phi}_1 \leq \norm{J(\Phi)}_1 \leq n
  \triplenorm{\Phi}_1.
\]
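
As a quick sanity check, both inequalities can be attained. The following is only an illustration: it assumes $\X=\mathbb{C}^\Sigma$ with $n=\abs{\Sigma}$, and the map $\Psi$, the symbol $a\in\Sigma$, and the state $\sigma\in\Density(\Y)$ are names introduced here rather than taken from the theorem. For the identity channel $\I_{\Lin(\X)}$, the upper bound holds with equality:
\[
\norm{J(\I_{\Lin(\X)})}_1=\norm{\vec(\I_\X)\vec(\I_\X)^\ast}_1=n
\qquad\text{and}\qquad
\triplenorm{\I_{\Lin(\X)}}_1=1.
\]
For the map $\Psi(X)=\ip{E_{a,a}}{X}\sigma$, the lower bound holds with equality:
\[
J(\Psi)=\sum_{b,c\in\Sigma}\Psi(E_{b,c})\otimes E_{b,c}=\sigma\otimes E_{a,a},
\qquad
\norm{J(\Psi)}_1=1=\triplenorm{\Psi}_1 .
\]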

Separable channels decrease the entanglement of formation

The entanglement of formation of a density operator $\rho\in\Density(\X^{A}\otimes\X^{B})$ is defined as
  \[
  E_{f}(\rho) = \inf\Biggl\{\sum_{a\in\Sigma} p(a) E(u_a u_a^{\ast})
  \,:\, \rho = \sum_{a\in\Sigma} p(a) u_a u_a^{\ast} \Biggr\},
  \]
  where $E(u u^{\ast}) = S(\tr_{\X^{B}}(u u^{\ast}))$ denotes the entanglement entropy of the pure state $u u^{\ast}$ and the infimum is over all expressions of $\rho$ of the given form, where $\Sigma$ is any alphabet, $p\in\P(\Sigma)$ is a probability vector, and $\{u_a\,:\,a\in\Sigma\} \subset \X^{A}\otimes\X^{B}$ is a collection of unit vectors.

Theorem:


For every choice of complex Euclidean spaces $\X^{A}$, $\X^{B}$, $\Y^{A}$, and $\Y^{B}$, every density operator $\rho\in\Density(\X^{A}\otimes\X^{B})$, and every separable channel $\Phi\in\SepC(\X^{A},\Y^{A}: \X^{B},\Y^{B})$, it holds that
  \[
  E_{f}(\Phi(\rho)) \leq E_{f}(\rho).
  \]

Proof:

Assuming that $\Phi\in\SepC(\X^{A},\Y^{A}: \X^{B},\Y^{B})$ allows $\Phi$ to be expressed as
\[
\Phi(X)=\sum_{b\in\Gamma}(A_b\otimes B_b)X(A_b^\ast\otimes B_b^\ast),
\]
where $\Gamma$ is some alphabet and $\{A_b : b\in \Gamma\}\subset \Lin(\X^A,\Y^A)$ and $\{B_b : b\in \Gamma\}\subset \Lin(\X^B,\Y^B)$. For $\rho = \sum_{a\in\Sigma} p(a) u_a u_a^{\ast}$, the action of $\Phi$ on $\rho$ is specified by the action of $\Phi$ on each $u_au_a^\ast$ as
\[
\Phi(\rho)=\sum_{a\in\Sigma} p(a) \Phi(u_a u_a^{\ast})=\sum_{a\in\Sigma} p(a)\sum_{b\in\Gamma}(A_b\otimes B_b)u_au_a^*(A_b^\ast\otimes B_b^\ast).
\]
Therefore, represent $\Phi(u_au_a^\ast)$ as
\[
\Phi(u_au_a^\ast)=\sum_{b\in\Gamma}(A_b\otimes B_b)u_au_a^\ast(A_b^\ast\otimes B_b^\ast)=\sum_{b\in\Gamma}q_a(b) v_{ab} v_{ab}^{\ast},
\]
where $q_a(b)=\norm{(A_b\otimes B_b)u_a}^2$ and $v_{ab}$ is a unit vector satisfying $(A_b\otimes B_b)u_a=\sqrt{q_a(b)}v_{ab}$. Now, for fixed $a$ and each $b$ with $q_a(b)>0$, let
\[
C_b=\frac{1}{\sqrt{q_a(b)}}(A_b\otimes B_b),
\]
so that $C_bu_au_a^\ast C_b^\ast=v_{ab}v_{ab}^\ast$.

Consider the channel $\Psi_{ab}\in\SepC(\X^{A},\Y^{A}: \X^{B},\Y^{B})$ defined by
\[
\Psi_{ab}(X)=C_bXC_b^\ast+(\tr(X)-\tr(C_bXC_b^\ast))\sigma,
\]
for some arbitrary $\sigma\in\Density(\Y^A\otimes\Y^B)$. Then $\Psi_{ab}$ is indeed a channel since it is completely positive because it is defined in terms of $C_b$ and $\Phi$ is assumed to be completely positive. Likewise, $\Psi_{ab}$ is separable since $\Phi$ is separable. Moreover, $\Psi_{ab}$ is trace preserving since
\[\begin{align*}
\tr(\Psi_{ab}(X))&=\tr(C_bXC_b^\ast)+(\tr(X)-\tr(C_bXC_b^\ast))\tr(\sigma) \\
&=\tr(C_bXC_b^\ast)+\tr(X)-\tr(C_bXC_b^\ast) \\
&=\tr(X).
\end{align*}\]
By construction $\Psi_{ab}(u_au_a^\ast)=v_{ab}v_{ab}^\ast$.
 Therefore, by a corollary (6.36) to Nielsen's theorem, it follows that for every $a\in\Sigma$ and $b\in\Gamma$, with $\rho_a^A=\tr_{\X^B}(u_au_a^\ast)$, $\sigma_{ab}^A=\tr_{\Y^B}(v_{ab}v_{ab}^\ast)$, and $r=\min\{\rank(\rho_a^A),\rank(\sigma^A_{ab})\}$, it holds that
\[
\lambda_1(\rho_a^A)+\dots+\lambda_m(\rho_a^A)\leq \lambda_1(\sigma_{ab}^A)+\dots+\lambda_m(\sigma_{ab}^A)
\]
for every $m\in\{1,\dots, r\}$.

Thus, the von Neumann entropy satisfies $S(\sigma_{ab}^A)\leq S(\rho_a^A)$, which implies that the entanglement entropy also satisfies $E(v_{ab}v_{ab}^\ast)\leq E(u_au_a^\ast)$ for all $a\in\Sigma$ and $b\in\Gamma$. Averaging over $b$ with the weights $q_a(b)$, which sum to $1$ for each $a$, and then over $a$ with the weights $p(a)$ from the decomposition of the original state $\rho$, it follows that
\[\begin{align*}
\sum_{a\in\Sigma} p(a)\sum_{b\in\Gamma}q_a(b) E(v_{ab} v_{ab}^{\ast})
\leq \sum_{a\in\Sigma} p(a) E(u_a u_a^{\ast}).
\end{align*}\]

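 The operators $v_{ab}v_{ab}^\ast$ with weights $p(a)q_a(b)$ form a decomposition of $\Phi(\rho)$ into pure states, so the left-hand side is at least $E_{f}(\Phi(\rho))$. Taking the infimum of the right-hand side over all decompositions of $\rho$ then gives, by definition, $E_{f}(\Phi(\rho)) \leq E_{f}(\rho)$.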
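
A small example may help to see the statement in action. It is only a sketch under assumptions made here: $\X^A=\X^B=\Y^A=\Y^B=\mathbb{C}^2$, and the vector $u$ and the channel $\Omega$ are names introduced for this illustration. Let $u=\frac{1}{\sqrt{2}}(e_0\otimes e_0+e_1\otimes e_1)$, $\rho=uu^\ast$, and $\Omega(X)=\frac{1}{2}\tr(X)\I$, and consider the separable channel $\Phi=\Omega\otimes\I_{\Lin(\X^B)}$. Then
\[
E_f(\rho)=S\Bigl(\frac{1}{2}\I\Bigr)=\log(2),
\qquad
\Phi(\rho)=\frac{1}{2}\I\otimes\frac{1}{2}\I=\frac{1}{4}\sum_{a,b\in\{0,1\}}(e_a\otimes e_b)(e_a\otimes e_b)^\ast ,
\]
and every vector $e_a\otimes e_b$ is a product vector with $E\bigl((e_a\otimes e_b)(e_a\otimes e_b)^\ast\bigr)=0$, so that
\[
E_f(\Phi(\rho))=0\leq\log(2)=E_f(\rho).
\]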

The SWAP operator and separable measurements

Let $\Sigma$ be an alphabet, let $n = \abs{\Sigma}$, and assume $n\geq 2$. Also let $\X^{A} = \mathbb{C}^{\Sigma}$ and $\X^{B} = \mathbb{C}^{\Sigma}$,  and recall that the swap operator $W\in\Lin(\X^{A}\otimes\X^{B})$ may be defined as
\[
    W = \sum_{a,b\in\Sigma} E_{a,b} \otimes E_{b,a}.
\]
Define $\Pi_0,\,\Pi_1\in\Proj(\X^{A}\otimes\X^{B})$ and $\sigma_0,\sigma_1\in\Density(\X^{A}\otimes\X^{B})$ as follows:
  \[
  \Pi_0 = \frac{1}{2} \I\otimes\I + \frac{1}{2} W,\qquad
  \Pi_1 = \frac{1}{2} \I\otimes\I - \frac{1}{2} W,\qquad
  \sigma_0 = \frac{1}{\binom{n+1}{2}}\Pi_0,\qquad
  \sigma_1 = \frac{1}{\binom{n}{2}}\Pi_1.
  \]

Theorem:

If $\mu:\{0,1\}\rightarrow\Pos(\X^{A}\otimes\X^{B})$ is a separable measurement, then
  \[
  \frac{1}{2} \ip{\mu(0)}{\sigma_0}
  + \frac{1}{2} \ip{\mu(1)}{\sigma_1}
  \leq \frac{1}{2} + \frac{1}{n+1}.
  \]


Proof:

Assuming that $\mu$ is a separable measurement allows $\mu(0)$ to be expressed as
\[
\mu(0)=\sum_{a\in\Gamma}P_a\otimes Q_a,
\]
where $\{P_a : a\in \Gamma\}\subset \Pos(\X^A)$ and $\{Q_a : a\in \Gamma\}\subset \Pos(\X^B)$. Moreover, since $\mu$ is a measurement it must satisfy the completeness condition that $\mu(0)+\mu(1)=\I\otimes\I$ implying that $\mu(1)$ can be expressed in terms of $\mu(0)$ as
\[
\mu(1)=\I\otimes\I-\mu(0)=\I\otimes\I-\sum_{a\in\Gamma}P_a\otimes Q_a.
\]
Write $\sigma_0$ and $\sigma_1$ more explicitly as
\[\begin{align*}
 \sigma_0 &= \frac{1}{\binom{n+1}{2}}\Pi_0=\frac{1}{(n+1)n}(\I\otimes\I + W), \\
 \sigma_1 &= \frac{1}{\binom{n}{2}}\Pi_1=\frac{1}{(n-1)n}(\I\otimes\I - W).
\end{align*}\]
Then,
\[\begin{align*}
\ip{\mu(0)}{\sigma_0}
+ \ip{\mu(1)}{\sigma_1}
&= \ip{\mu(0)}{\sigma_0}
+ \ip{\I\otimes\I-\mu(0)}{\sigma_1} \\
&=\frac{1}{(n+1)n}\ip{\mu(0)}{\I\otimes\I + W}
+ \frac{1}{(n-1)n}\ip{\I\otimes\I-\mu(0)}{\I\otimes\I - W} \\
&=\frac{1}{(n+1)n}(\ip{\mu(0)}{\I\otimes\I }+\ip{\mu(0)}{ W}) \\
& \ \ \ + \frac{1}{(n-1)n}(\ip{\I\otimes\I}{\I\otimes\I} -\ip{\I\otimes\I}{W} ) \\
& \ \ \ + \frac{1}{(n-1)n}(\ip{\mu(0)}{W}-\ip{\mu(0)}{\I\otimes\I} ).
\end{align*}\]
Now observe that
\[
\ip{\mu(0)}{\I\otimes\I }=\tr\left(\mu(0)^\ast\I\otimes\I\right)=\tr(\sum_{a\in\Gamma}(P_a\otimes Q_a)(\I\otimes\I))=\tr(\sum_{a\in\Gamma}P_a\otimes Q_a)=\sum_{a\in\Gamma}\tr(P_a)\tr(Q_a),
\]
\[
\ip{\I\otimes\I}{\I\otimes\I}=\tr((\I\otimes\I)(\I\otimes\I))=\tr(\I\otimes\I)=n^2,
\]
\[\begin{align*}
\ip{\mu(0)}{ W}=\tr(\sum_{a\in\Gamma}(P_a\otimes Q_a)W)&=\sum_{a\in\Gamma}\sum_{i,j\in\Sigma}(e_i^\ast\otimes e_j^\ast)(P_a\otimes Q_a)W(e_i\otimes e_j) \\
&=\sum_{a\in\Gamma}\sum_{i,j\in\Sigma}(e_i^\ast\otimes e_j^\ast)(P_a\otimes Q_a)(e_j\otimes e_i) \\
&=\sum_{a\in\Gamma}\sum_{i,j\in\Sigma}(e_i^\ast P_a e_j)\otimes(e_j^\ast Q_a e_i) \\
&=\sum_{a\in\Gamma}\sum_{i,j\in\Sigma}e_i^\ast P_a e_je_j^\ast Q_a e_i \\
&=\sum_{a\in\Gamma}\sum_{i\in\Sigma}e_i^\ast P_a Q_a e_i \\
&=\sum_{a\in\Gamma}\tr(P_a Q_a), \\
\end{align*}\]
and by similar arguments used in the previous calculation $\ip{\I\otimes\I}{W}=\tr(\I\I)=\tr(\I)=n$.
Therefore, the original expression of interest can be simplified as
\[\begin{align*}
\ip{\mu(0)}{\sigma_0}
+ \ip{\mu(1)}{\sigma_1}&=\frac{1}{(n+1)n}(\sum_{a\in\Gamma}\tr(P_a)\tr(Q_a)+\sum_{a\in\Gamma}\tr(P_a Q_a)) \\
& \ \ \ + \frac{1}{(n-1)n}(n^2 -n ) \\
& \ \ \ + \frac{1}{(n-1)n}(\sum_{a\in\Gamma}\tr(P_a Q_a)-\sum_{a\in\Gamma}\tr(P_a)\tr(Q_a) ) \\
&=1+\frac{2}{(n+1)(n-1)n}\left(n\sum_{a\in\Gamma}\tr(P_a Q_a)-\sum_{a\in\Gamma}\tr(P_a)\tr(Q_a)\right) \\
&\leq 1+\frac{2}{(n+1)(n-1)n}(n^2-n) \\
&=1+\frac{2}{(n+1)}.
\end{align*}\]
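Here, the inequality follows from two observations. First, $0\leq\tr(P_a Q_a)\leq\tr(P_a)\tr(Q_a)$ for positive semidefinite operators $P_a$ and $Q_a$, so $\sum_{a\in\Gamma}\tr(P_a Q_a)\leq\sum_{a\in\Gamma}\tr(P_a)\tr(Q_a)$. Second, $\mu(1)$ is also separable, so the same calculation gives $\ip{\mu(1)}{W}\geq 0$, and therefore $\sum_{a\in\Gamma}\tr(P_a Q_a)=\ip{\mu(0)}{W}=\ip{\I\otimes\I}{W}-\ip{\mu(1)}{W}\leq n$.
Together these give $n\sum_{a\in\Gamma}\tr(P_a Q_a)-\sum_{a\in\Gamma}\tr(P_a)\tr(Q_a)\leq (n-1)\sum_{a\in\Gamma}\tr(P_a Q_a)\leq n^2-n$.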

Therefore, dividing both sides of the inequality by $2$ gives
\[
  \frac{1}{2} \ip{\mu(0)}{\sigma_0}
  + \frac{1}{2} \ip{\mu(1)}{\sigma_1}
  \leq \frac{1}{2} + \frac{1}{n+1}.
\]
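
The bound is attained, as the following worked example suggests. The particular measurement is a choice made here for illustration and is not part of the theorem. Take
\[
\mu(0)=\sum_{a\in\Sigma}E_{a,a}\otimes E_{a,a},
\qquad
\mu(1)=\I\otimes\I-\mu(0)=\sum_{\substack{a,b\in\Sigma\\ a\neq b}}E_{a,a}\otimes E_{b,b},
\]
which is a separable measurement. With $P_a=Q_a=E_{a,a}$ one finds $\sum_{a\in\Sigma}\tr(P_a)\tr(Q_a)=n$ and $\sum_{a\in\Sigma}\tr(P_aQ_a)=n$, and the expressions derived in the proof give
\[
\ip{\mu(0)}{\sigma_0}=\frac{n+n}{(n+1)n}=\frac{2}{n+1},
\qquad
\ip{\mu(1)}{\sigma_1}=\frac{(n^2-n)+(n-n)}{(n-1)n}=1,
\]
so that
\[
\frac{1}{2}\ip{\mu(0)}{\sigma_0}+\frac{1}{2}\ip{\mu(1)}{\sigma_1}=\frac{1}{2}+\frac{1}{n+1}.
\]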

Singular values before and after the action of a unital channel

Let $\X$ be a complex Euclidean space having dimension $n$, let $\Phi\in\Channel(\X)$ be a unital channel, let $X\in\Lin(\X)$ be an operator, and let $Y = \Phi(X)$. Following our usual conventions, let $s_1(X) \geq \cdots \geq s_n(X)$ and $s_1(Y) \geq \cdots \geq s_n(Y)$ denote the singular values of $X$ and $Y$, respectively, ordered from largest to smallest, and where we take $s_k(X) = 0$ when $k > \rank(X)$ and $s_k(Y) = 0$ when $k > \rank(Y)$.

Theorem:

  $s_1(X) + \cdots + s_m(X) \geq s_1(Y) + \cdots + s_m(Y)$ for every $m \in \{1,\ldots,n\}$. 

Proof:

Consider the space $\X\oplus\X$ and let
\[
 \overline{X}:=
 \begin{pmatrix}
      0 & X\\
      X^{\ast} & 0
    \end{pmatrix}.
\]
Then it holds that $\overline{X}=\overline{X}^\ast$ so that $\overline{X}\in\Herm(\X\oplus\X)$. In addition, consider the channel $\overline{\Phi}\in\Channel(\X\oplus\X)$ defined as
\[
\overline{\Phi} \begin{pmatrix}
      A & B\\
      C & D
    \end{pmatrix}=
    \begin{pmatrix}
      \Phi(A) & \Phi(B)\\
      \Phi(C) & \Phi(D)
    \end{pmatrix}.
\]
Then it follows that
\[
\overline{\Phi}(\I_{\X\oplus\X}) =    \begin{pmatrix}
      \Phi(\I_{\X}) & 0\\
      0 & \Phi(\I_{\X})
    \end{pmatrix}=
        \begin{pmatrix}
      \I_{\X} & 0\\
      0 & \I_{\X}
    \end{pmatrix}=
    \I_{\X\oplus\X},
\]
 which implies that $\overline{\Phi}$ is unital. Moreover, since $\Phi$ is Hermiticity preserving, $\Phi(X^\ast)=\Phi(X)^\ast=Y^\ast$, which yields
\[
 \overline{\Phi}(\overline{X})=
 \begin{pmatrix}
      0&\Phi(X)\\
      \Phi(X^\ast) & 0
    \end{pmatrix}=
        \begin{pmatrix}
      0 & Y\\
      Y^\ast & 0
    \end{pmatrix}=:\overline{Y}.
\]
 It has now been shown that there exists a unital channel $\overline{\Phi}\in \Channel(\X\oplus\X)$ such that $\overline{\Phi}(\overline{X})=\overline{Y}$, where $\overline{X},\overline{Y}\in\Herm(\X\oplus\X)$. Therefore, by Uhlmann's theorem, $\lambda(\overline{Y}) \prec \lambda(\overline{X})$, where $\lambda(\overline{Y})$ and  $\lambda(\overline{X})$ are the vectors of eigenvalues of $\overline{Y}$ and $\overline{X}$, respectively.

 In order to determine the eigenvalues of $\overline{Y}$ and $\overline{X}$, consider the singular value decompositions of $Y$ and $X$:
\[
X=\sum_{k=1}^{r_X}s_k(X)x'_kx_k^\ast \ \ \ \text{and} \ \ \ Y=\sum_{k=1}^{r_Y}s_k(Y)y'_ky_k^\ast,
\]
 where $r_X=\rank(X)$, $r_Y=\rank(Y)$, $s(X)=(s_1(X),\dots,s_{r_X}(X))$ and $s(Y)=(s_1(Y),\dots,s_{r_Y}(Y))$ are the vectors of the non-zero singular values of $X$ and $Y$ (assumed to be written in decreasing order as the index increases), and
\[
 \{x_1,\dots,x_{r_X}\}, \{x'_1,\dots,x'_{r_X}\}\subseteq\X  \ \ \ \text{and} \ \ \ \{y_1,\dots, y_{r_Y}\}, \{y'_1,\dots, y'_{r_Y}\}\subseteq\X
\]
 are orthonormal sets of vectors in their respective spaces.

 The eigenvalues of the block matrix $\overline{X}$ can now be read off. For each $k\in\{1,\dots,r_X\}$ it holds that $Xx_k=s_k(X)x'_k$ and $X^\ast x'_k=s_k(X)x_k$, so
\[
 \begin{pmatrix}
      0 & X\\
      X^{\ast} & 0
    \end{pmatrix}
 \begin{pmatrix}
      x'_k\\
      \pm x_k
    \end{pmatrix}
 =\pm s_k(X)
 \begin{pmatrix}
      x'_k\\
      \pm x_k
    \end{pmatrix}.
\]
 These $2r_X$ eigenvectors are pairwise orthogonal, and $\overline{X}$ has rank $2r_X$, so the remaining $2n-2r_X$ eigenvalues are zero. The eigenvalues of $\overline{X}$ are therefore given by
\[
 \lambda(\overline{X})=(s_1(X),\dots,s_{r_X}(X),0,\dots,0,-s_{r_X}(X),\dots,-s_1(X)),
\]
 where here they have been arranged in decreasing order. An equivalent argument shows that the eigenvalues of $\overline{Y}$ are similarly given by
\[
 \lambda(\overline{Y})=(s_1(Y),\dots,s_{r_Y}(Y),0,\dots,0,-s_{r_Y}(Y),\dots,-s_1(Y)).
\]

With the convention from the problem statement that $s_k(X)=0$ for $k>r_X$ and $s_k(Y)=0$ for $k>r_Y$, the $n$ largest eigenvalues of $\overline{X}$ are $s_1(X)\geq\cdots\geq s_n(X)$, and the $n$ largest eigenvalues of $\overline{Y}$ are $s_1(Y)\geq\cdots\geq s_n(Y)$. The majorization relation $\lambda(\overline{Y})\prec\lambda(\overline{X})$ states in particular that, for every $k\in\{1,\dots,2n\}$, the sum of the $k$ largest eigenvalues of $\overline{Y}$ is at most the sum of the $k$ largest eigenvalues of $\overline{X}$.

Then it follows that for all $k\in\{1,\dots, n\}$,
\[
s_1(X)+\dots+s_{k}(X)\geq s_1(Y)+\dots+s_{k}(Y).
\]
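
To see the inequality in a concrete case, here is a small worked example. It assumes $\X=\mathbb{C}^2$ and uses the completely dephasing channel, written $\Delta$ here; both are choices made for illustration only. The channel $\Delta(X)=E_{0,0}XE_{0,0}+E_{1,1}XE_{1,1}$ is unital, and for
\[
X=\frac{1}{2}\begin{pmatrix}
      1 & 1\\
      1 & 1
    \end{pmatrix}
\qquad\text{one obtains}\qquad
Y=\Delta(X)=\frac{1}{2}\begin{pmatrix}
      1 & 0\\
      0 & 1
    \end{pmatrix}.
\]
The singular values are $(s_1(X),s_2(X))=(1,0)$ and $(s_1(Y),s_2(Y))=(\frac{1}{2},\frac{1}{2})$, so
\[
s_1(X)=1\geq\frac{1}{2}=s_1(Y)
\qquad\text{and}\qquad
s_1(X)+s_2(X)=1\geq 1=s_1(Y)+s_2(Y).
\]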

When the Choi representation of a channel is separable

Let $\X$ and $\Y$ be complex Euclidean spaces, and let $\Phi\in\Channel(\X,\Y)$ be a channel. A positive operator $P\in \Pos(\X\otimes\Y)$ is separable if and only if there exists a positive integer $m$ and positive semidefinite operators
\[
Q_1,Q_2, \dots, Q_m\in\Pos(\X) \ \ \text{and} \ \ R_1,R_2, \dots, R_m\in\Pos(\Y)
\]
such that
\[
P=\sum_{j=1}^{m}Q_j\otimes R_j.
\]
Denote by $\Sep(\X : \Y)$ the collection of all such separable operators.

Theorem:

The following two properties are equivalent:
  1. For every complex Euclidean space $\Z$ and every density operator $\rho\in\Density(\X\otimes\Z)$, it holds that $\bigl(\Phi\otimes \I_{\Lin(\Z)}\bigr)(\rho) \in \Sep(\Y:\Z)$.
  2. $J(\Phi) \in \Sep(\Y:\X)$.

Proof:

Recall that the Choi representation $J(\Phi)$ can be expressed as
\[
J(\Phi)=(\Phi\otimes\I_{\Lin(\X)})(\vec(\I_{\X})\vec(\I_{\X})^\ast).
\]
Now first assume that property 1 holds, and let $\Z=\X=\mathbb{C}^\Sigma$ and consider the density operator $\rho\in\Density(\X\otimes\X)$ given by
\[
\rho=\frac{1}{|\Sigma|}(\vec(\I_\X)\vec(\I_\X)^\ast).
\]
Property 1 then gives
\[
\frac{1}{|\Sigma|}(\Phi\otimes\I_{\Lin(\X)})(\vec(\I_\X)\vec(\I_\X)^\ast)\in\Sep(\Y:\X),
\]
and since $\Sep(\Y:\X)$ is closed under multiplication by positive scalars, it follows that $J(\Phi)\in\Sep(\Y : \X)$, which is the claim of property 2.

Conversely, assume that property 2 holds, so that $J(\Phi) \in \Sep(\Y:\X)$. By the Woronowicz-Horodecki criterion, this is equivalent to the statement that for every complex Euclidean space $\W$ and every positive map $\Xi\in\Trans(\Y,\W)$,
\[
(\Xi\otimes\I_{\Lin(\X)})(J(\Phi))\in\Pos(\W\otimes\X).
\]

Substituting the expression recalled above for $J(\Phi)$ then gives
\[\begin{align*}
(\Xi\otimes\I_{\Lin(\X)})(J(\Phi))&=(\Xi\otimes\I_{\Lin(\X)})(\Phi\otimes\I_{\Lin(\X)})(\vec(\I_{\X})\vec(\I_{\X})^\ast) \\
&=(\Xi\Phi\otimes\I_{\Lin(\X)})(\vec(\I_{\X})\vec(\I_{\X})^\ast)\\
&=J(\Xi\Phi).
\end{align*}\]

Hence, $J(\Xi\Phi)\in\Pos(\W\otimes\X)$, which implies that the composition $\Xi\Phi$ is completely positive. Now consider any complex Euclidean space $\Z$ and any $\rho\in\Density(\X\otimes\Z)$. Complete positivity of $\Xi\Phi$ gives
\[
(\Xi\otimes\I_{\Lin(\Z)})\bigl((\Phi\otimes\I_{\Lin(\Z)})(\rho)\bigr)=(\Xi\Phi\otimes\I_{\Lin(\Z)})(\rho)\in\Pos(\W\otimes\Z).
\]
 Since this holds for every positive map $\Xi$, the Woronowicz-Horodecki criterion implies that $(\Phi\otimes\I_{\Lin(\Z)})(\rho)\in\Sep(\Y:\Z)$, which is property 1.
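
A standard family of examples makes the equivalence concrete. The following is a sketch that assumes $\X=\mathbb{C}^\Sigma$; the states $\sigma_a\in\Density(\Y)$ are introduced here for illustration. Consider the measure-and-prepare channel
\[
\Phi(X)=\sum_{a\in\Sigma}\ip{E_{a,a}}{X}\sigma_a .
\]
Its Choi representation is visibly separable,
\[
J(\Phi)=\sum_{a\in\Sigma}\sigma_a\otimes E_{a,a}\in\Sep(\Y:\X),
\]
and, in agreement with property 1, for every $\rho\in\Density(\X\otimes\Z)$ it holds that
\[
\bigl(\Phi\otimes \I_{\Lin(\Z)}\bigr)(\rho)=\sum_{a\in\Sigma}\sigma_a\otimes(e_a^\ast\otimes\I_\Z)\rho(e_a\otimes\I_\Z)\in\Sep(\Y:\Z),
\]
since each operator $(e_a^\ast\otimes\I_\Z)\rho(e_a\otimes\I_\Z)$ is positive semidefinite.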

Some more facts concerning the von Neumann entropy

Let $\reg{X}$, $\reg{Y}$, and $\reg{Z}$ be registers, assume that the classical state set of $\reg{X}$ is $\Sigma$, and let $n = \abs{\Sigma}$.

Theorem:
For every state $\rho\in\Density(\X\otimes\Y\otimes\Z)$ of $(\reg{X},\reg{Y},\reg{Z})$ it holds that
\[
      S(\reg{X},\reg{Y} : \reg{Z})
      \leq S(\reg{Y}:\reg{X},\reg{Z}) + 2\log(n).
\]
     

Proof:
 
    From the result proved in a previous post, it holds that for every choice of registers $\reg{X}$ and $\reg{Z}$, and for any state of $\Density(\X\otimes\Z)$, $S(\reg{Z})\leq S(\reg{X})+S(\reg{X}, \reg{Z})$, or equivalently that
\[
     0\leq S(\reg{X})+S(\reg{X}, \reg{Z})-S(\reg{Z}).
\]
 Also, by sub-additivity $S(\reg{X},\reg{Y})\leq S(\reg{X})+S(\reg{Y})$, or equivalently
\[
 0\leq S(\reg{X})+S(\reg{Y})-S(\reg{X},\reg{Y}).
\]
 Then by adding these two inequalities, it must also hold that
\[
  0\leq S(\reg{X})+S(\reg{X}, \reg{Z})-S(\reg{Z})+S(\reg{X})+S(\reg{Y})-S(\reg{X},\reg{Y}),
\]
and since in general $S(\reg{X})\leq \log(n)$ or $2S(\reg{X})\leq 2\log(n)$,
\[
   0\leq S(\reg{X}, \reg{Z})-S(\reg{Z})+S(\reg{Y})-S(\reg{X},\reg{Y})+2\log(n).
\]
 Therefore,
\[
S(\reg{Z})+S(\reg{X},\reg{Y})\leq S(\reg{X}, \reg{Z})+S(\reg{Y})+2\log(n).
\]
 Adding $-S(\reg{X},\reg{Y}, \reg{Z})$ to both sides of this inequality yields
\[
S(\reg{Z})+S(\reg{X},\reg{Y})-S(\reg{X},\reg{Y}, \reg{Z})\leq S(\reg{X}, \reg{Z})+S(\reg{Y})-S(\reg{X},\reg{Y}, \reg{Z})+2\log(n),
\]
 or equivalently
\[
 S(\reg{X},\reg{Y} : \reg{Z})\leq S(\reg{Y}:\reg{X},\reg{Z}) + 2\log(n).
\]



Here is an example, for $\Sigma = \{0,1\}$, of a state $\rho$ for which this inequality becomes an equality.

Consider the three qubit pure state
\[
\left|\psi\right>_{\reg{X},\reg{Y},\reg{Z}}=\frac{1}{\sqrt{2}}(\left|0\right>_{\reg{X}}\left|0\right>_{\reg{Y}}\left|0\right>_{\reg{Z}}+\left|1\right>_{\reg{X}}\left|0\right>_{\reg{Y}}\left|1\right>_{\reg{Z}}).
\]
Then the states of the following subsystems are also pure:
\[\begin{align*}
\left|\psi\right>_{\reg{Y}}&=\left|0\right>_{\reg{Y}}\\
\left|\psi\right>_{\reg{X},\reg{Z}}&=\frac{1}{\sqrt{2}}(\left|0\right>_{\reg{X}}\left|0\right>_{\reg{Z}}+\left|1\right>_{\reg{X}}\left|1\right>_{\reg{Z}}).
\end{align*}\]

However, the following subsystems are in the maximally mixed state:
\[\begin{align*}
\rho_{\reg{X}}&=\frac{1}{2}(\left|0\right>\left<0\right|+\left|1\right>\left<1\right|) \\
\rho_{\reg{Z}}&=\frac{1}{2}(\left|0\right>\left<0\right|+\left|1\right>\left<1\right|).
\end{align*}\]
Moreover, the subsystem $(\reg{X},\reg{Y})$ is in the tensor product state
\[\begin{align*}
\rho_{\reg{X},\reg{Y}}&=\rho_{\reg{X}}\otimes \rho_{\reg{Y}}\\
&=\frac{1}{2}\bigl(\left|0\right>\left<0\right|+\left|1\right>\left<1\right|\bigr)\otimes \left|0\right>\left<0\right|
\end{align*}\]

The entropy of a pure state is zero and the entropy of a maximally mixed state in this case is $\log(n)=\log(2)$. Then the entropies of the states listed above are
\[\begin{align*}
S(\reg{Y})=S(\reg{X},\reg{Z})&=0 \\
S(\reg{X})=S(\reg{Z})&=\log(2) \\
S(\reg{X},\reg{Y})=S(\reg{X})+S(\reg{Y})&=\log(2).
\end{align*} \]

Therefore,
\[\begin{align*}
S(\reg{X},\reg{Y} : \reg{Z}) - S(\reg{Y}:\reg{X},\reg{Z})&=S(\reg{X},\reg{Y})-S(\reg{Y})-S(\reg{X},\reg{Z})+S(\reg{Z})+\left(S(\reg{X},\reg{Y},\reg{Z})-S(\reg{X},\reg{Y},\reg{Z})\right) \\
&=S(\reg{X},\reg{Y})-S(\reg{Y})-S(\reg{X},\reg{Z})+S(\reg{Z}) \\
&=S(\reg{X})+S(\reg{Y})-S(\reg{Y})-S(\reg{X},\reg{Z})+S(\reg{Z}) \\
&=S(\reg{X})-S(\reg{X},\reg{Z})+S(\reg{Z}) \\
&=\log(2)-0+\log(2) \\
&=2\log(2),
\end{align*}\]
which implies that
\[
S(\reg{X},\reg{Y} : \reg{Z}) = S(\reg{Y}:\reg{X},\reg{Z})+2\log(2).
\]

Theorem:

Let $p\in\P(\Sigma)$ be a probability vector, let $\{\sigma_a\,:\,a\in\Sigma\} \subset \Density(\Y\otimes\Z)$ be a collection of density operators, and let
\[
      \rho = \sum_{a\in \Sigma} p(a) E_{a,a}\otimes \sigma_a.
\]
In other words, $\rho$ is a state of $(\reg{X},\reg{Y},\reg{Z})$ in which we view $\reg{X}$ as a classical register. With respect to the state $\rho$, it holds that
\[
      S(\reg{X},\reg{Y} : \reg{Z})
      \leq S(\reg{Y}:\reg{X},\reg{Z}) + \log(n).
\]
   

Proof:

First, observe that
\[\begin{align*}
S(\reg{X}|\reg{Y})-S(\reg{X}|\reg{Z})&=S(\reg{X},\reg{Y})-S(\reg{Y})-S(\reg{X},\reg{Z})+S(\reg{Z})\\
&=S(\reg{X},\reg{Y})-S(\reg{Y})-S(\reg{X},\reg{Z})+S(\reg{Z})+\left(S(\reg{X},\reg{Y},\reg{Z})-S(\reg{X},\reg{Y},\reg{Z})\right) \\
&=S(\reg{X},\reg{Y} : \reg{Z}) - S(\reg{Y}:\reg{X},\reg{Z}).
\end{align*}\]

Now consider the individual bounds on the quantities $S(\reg{X}|\reg{Y})$ and $S(\reg{X}|\reg{Z})$ in order to infer a bound on the difference $S(\reg{X}|\reg{Y})-S(\reg{X}|\reg{Z})$. On the one hand, sub-additivity gives $S(\reg{X},\reg{Y})\leq S(\reg{X})+S(\reg{Y})$, so $S(\reg{X}|\reg{Y})\leq S(\reg{X})\leq \log(n)$. On the other hand, because $\reg{X}$ is a classical register, the reduced state of $(\reg{X},\reg{Z})$ is $\sum_{a\in\Sigma}p(a)E_{a,a}\otimes\tr_{\Y}(\sigma_a)$, and therefore
\[
S(\reg{X},\reg{Z})=H(p)+\sum_{a\in\Sigma}p(a)S(\tr_{\Y}(\sigma_a))\geq S\Bigl(\sum_{a\in\Sigma}p(a)\tr_{\Y}(\sigma_a)\Bigr)=S(\reg{Z}),
\]
so that $S(\reg{X}|\reg{Z})\geq 0$. A conditional entropy can be negative only in the presence of entanglement, which the classical register $\reg{X}$ rules out here.

Hence, $S(\reg{X}|\reg{Y})-S(\reg{X}|\reg{Z})\leq \log(n)$, or equivalently $S(\reg{X},\reg{Y} : \reg{Z})-S(\reg{Y}:\reg{X},\reg{Z}) \leq \log(n)$, implying that $S(\reg{X},\reg{Y} : \reg{Z}) \leq S(\reg{Y}:\reg{X},\reg{Z}) + \log(n)$.
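
The bound $\log(n)$ cannot be improved in general, as the following example suggests. The state below is a choice made here for illustration: it assumes $\Z=\mathbb{C}^\Sigma$, a uniform probability vector $p(a)=1/n$, and a fixed pure state $\xi\in\Density(\Y)$, with $\sigma_a=\xi\otimes E_{a,a}$. Then
\[
\rho=\frac{1}{n}\sum_{a\in\Sigma}E_{a,a}\otimes\xi\otimes E_{a,a},
\]
and the relevant entropies are
\[
S(\reg{Y})=0,
\qquad
S(\reg{Z})=S(\reg{X},\reg{Y})=S(\reg{X},\reg{Z})=S(\reg{X},\reg{Y},\reg{Z})=\log(n).
\]
Consequently
\[
S(\reg{X},\reg{Y}:\reg{Z})=\log(n)+\log(n)-\log(n)=\log(n)
\qquad\text{and}\qquad
S(\reg{Y}:\reg{X},\reg{Z})=0+\log(n)-\log(n)=0,
\]
so the inequality holds with equality for this state.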

Some facts concerning the von Neumann entropy and quantum mutual information

Here we'll prove some facts concerning the von Neumann entropy and quantum mutual information.

Let $\X$ be an $n$-dimensional complex Euclidean space, and let $\rho\in\Density(\X)$ be a density operator. Recall that the von Neumann entropy of $\rho$ is defined as
\[
S(\rho):=-\tr(\rho \ \text{log}(\rho)),
\]
or equivalently as
\[
S(\rho):=H(\lambda(\rho)),
\]
where $\lambda(\rho)=(\lambda_1(\rho),\lambda_2(\rho),\dots,\lambda_n(\rho))$ is the vector of eigenvalues of $\rho$, and
\[
H(p):=\sum_{a\in\Sigma}-p(a)\log(p(a)),
\]
is the classical Shannon entropy of a vector $p\in\mathbb{R}^{\Sigma}$ over some alphabet $\Sigma$.

Theorem:

For every choice of complex Euclidean spaces $\X$ and $\Y$, and every vector $u \in \X\otimes\Y$, it holds that $S(\tr_{\X}(u u^{\ast})) = S(\tr_{\Y}(u u^{\ast}))$.
   
Proof:

 The vector $u\in\X\otimes\Y$ can be expressed in its Schmidt decomposition, after making the identification $u=\vec(A)$, as
\[
 u=\sum_{k=1}^{r}s_kx_k\otimes y_k,
\]
 where $r=\rank(A)$, $s_1,\dots, s_r>0$ are the non-zero singular values of $A$, and $\{x_1,\dots,x_r\}\subset\X$ and $\{y_1,\dots, y_r\}\subset\Y$ are orthonormal sets. Then
\[
uu^\ast=\sum_{j,k=1}^{r}s_js_kx_jx_k^\ast\otimes y_jy_k^\ast,
\]
and therefore
\[
\tr_{\X}(uu^\ast)=\sum_{k=1}^{r}s_k^2y_ky_k^\ast \ \  \ \ \text{and} \ \ \ \tr_{\Y}(uu^\ast)=\sum_{k=1}^{r}s_k^2x_kx_k^\ast.
\]
Now let $\lambda=(s_1^2,\dots,s_r^2)$, and observe that $\lambda$ is the vector of non-zero eigenvalues of both $\tr_{\X}(uu^\ast)$ and $\tr_{\Y}(uu^\ast)$, since the expressions above are spectral decompositions of these two operators.

Hence, (by definition) the von Neumann entropy of each is 
\[
S(\tr_{\X}(u u^{\ast})) =H(\lambda) = S(\tr_{\Y}(u u^{\ast})).
\]


Theorem:

For every choice of registers $\reg{X}$ and $\reg{Y}$, and for every state $\rho\in\Density(\X\otimes\Y)$ of these registers, it holds that $S(\reg{X}) \leq S(\reg{Y}) + S(\reg{X},\reg{Y})$.

Proof:
    
Choose a complex Euclidean space $\Z$ such that $\dim(\Z)\geq\rank(\rho)$, so that there exists a purification $\rho'=uu^\ast\in\Density(\X\otimes\Y\otimes\Z)$ of $\rho$, and let $\rho'$ be the joint state of the registers $\reg{X},\reg{Y},\reg{Z}$. Because $\tr_{\Z}(\rho')=\rho$, the entropies $S(\reg{X})$, $S(\reg{Y})$, and $S(\reg{X},\reg{Y})$ are the same with respect to $\rho'$ as with respect to $\rho$. Since $\rho'$ is a pure state, $S(\reg{X},\reg{Y}, \reg{Z})=0$. Moreover, $\rho'[\reg{X},\reg{Z}]=\tr_{\Y}(\rho')$ and $\rho'[\reg{Y}]=\tr_{\X\otimes\Z}(\rho')$, so the previous theorem implies that $S(\tr_{\Y}(\rho'))=S(\tr_{\X\otimes\Z}(\rho'))$, or equivalently that $S(\reg{Y}) = S(\reg{X},\reg{Z})$.
    
By strong sub-additivity, for any possible state of the registers $\reg{X},\reg{Y}, \reg{Z}$,
\[
S(\reg{X},\reg{Y}, \reg{Z})+S(\reg{X})\leq S(\reg{X}, \reg{Z})+S(\reg{X}, \reg{Y}).
\]
However, by previous considerations we have that $S(\reg{X},\reg{Y}, \reg{Z})=0$ and $S(\reg{Y}) = S(\reg{X},\reg{Z})$, which after substituting implies that
\[
S(\reg{X})\leq S(\reg{Y})+S(\reg{X}, \reg{Y}).
\]
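
For a quick check that this inequality can be tight, consider an example chosen here for illustration, with $\X=\Y=\mathbb{C}^2$ and the pure state $\rho=uu^\ast$ for $u=\frac{1}{\sqrt{2}}(e_0\otimes e_0+e_1\otimes e_1)$. Both reduced states equal $\frac{1}{2}\I$, so
\[
S(\reg{X})=\log(2),
\qquad
S(\reg{Y})=\log(2),
\qquad
S(\reg{X},\reg{Y})=0,
\]
and the inequality reads $\log(2)\leq\log(2)+0$, which holds with equality.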


Theorem:

Let $\reg{X}$ and $\reg{Y}$ be registers, let $\Sigma$ be an alphabet, let $p\in\P(\Sigma)$ be a probability vector, and let $\{\sigma_a\,:\,a\in\Sigma\}\subset\Density(\X)$ and $\{\xi_a\,:\,a\in\Sigma\}\subset\Density(\Y)$ be arbitrary collections of density operators. For $(\reg{X},\reg{Y})$ being in the state
\[
      \rho = \sum_{a\in\Sigma} \, p(a) \sigma_a\otimes\xi_a,
\]
it holds that $S(\reg{X} : \reg{Y}) \leq H(p)$.
   
Proof:

In this case, the reduced states of the two registers are given by
\[
\rho[\reg{X}]=\tr_\Y(\rho)=\sum_{a\in\Sigma}p(a) \sigma_a \ \ \ \text{and} \ \ \ \rho[\reg{Y}]=\tr_\X(\rho)=\sum_{a\in\Sigma}p(a) \xi_a,
\]
so that
\[\begin{align*}
\rho[\reg{X}]\otimes\rho[\reg{Y}]&=\left(\sum_{a\in\Sigma}p(a) \sigma_a\right)\otimes\left(\sum_{b\in\Sigma}p(b) \xi_b\right) \\
&=\sum_{a\in\Sigma}\sum_{b\in\Sigma}p(a)p(b) \sigma_a\otimes \xi_b.
\end{align*}\]

Then the mutual information $S(\reg{X} : \reg{Y})$ can be expressed as
\[\begin{align*}
S(\reg{X} : \reg{Y})&=S(\rho||\rho[\reg{X}]\otimes\rho[\reg{Y}]) \\
&=S\left( \sum_{a\in\Sigma} \, p(a) \sigma_a\otimes\xi_a || \sum_{a\in\Sigma}\sum_{b\in\Sigma}p(a)p(b) \sigma_a\otimes \xi_b\right) \\
&\leq \sum_{a\in\Sigma}S\left(  p(a) \sigma_a\otimes\xi_a || \sum_{b\in\Sigma}p(a)p(b) \sigma_a\otimes \xi_b\right) \\
&=\sum_{a\in\Sigma}S\left(  p(a) \sigma_a\otimes\xi_a || \, p(a) \sigma_a\otimes \sum_{b\in\Sigma}p(b)\xi_b\right) \\
&=\sum_{a\in\Sigma}\left(\tr(\xi_a)S(p(a)\sigma_a || p(a)\sigma_a) + \tr(p(a)\sigma_a)S(\xi_a || \sum_{b\in\Sigma}p(b)\xi_b)  \right),
\end{align*}\]
but since $S(p(a)\sigma_a || p(a)\sigma_a)=0$ and for $\sigma_a\in\Density(\X)$ it is always the case that $\tr(\sigma_a)=1$, it follows that
\[\begin{align*}
S(\reg{X} : \reg{Y})\leq &\sum_{a\in\Sigma} p(a)S\left(\xi_a || \sum_{b\in\Sigma}p(b)\xi_b\right) \\
=&\sum_{a\in\Sigma} p(a)S\left(\frac{p(a)}{p(a)}\xi_a || \sum_{b\in\Sigma}p(b)\xi_b\right) \\
=&\sum_{a\in\Sigma}p(a)\tr\left( \xi_a\log\left(\frac{p(a)}{p(a)}\xi_a\right) - \xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right) \\
=&\sum_{a\in\Sigma}p(a)\tr\left(-\xi_a\log(p(a))+ \xi_a\log\left(p(a)\xi_a\right) - \xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right) \\
=&\sum_{a\in\Sigma}\left[p(a)\tr(-\xi_a\log(p(a)))+p(a)\tr\left( \xi_a\log\left(p(a)\xi_a\right) - \xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right)\right] \\
=&\sum_{a\in\Sigma}\left[-p(a)\log(p(a))+p(a)\tr\left( \xi_a\log\left(p(a)\xi_a\right) - \xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right)\right] \\
=&H(p)+c.
\end{align*}\]
Here, the Shannon entropy is by definition
\[
H(p)=\sum_{a\in\Sigma}-p(a)\log(p(a)),
\]
and the value $c$ has been introduced for convenience to represent the remaining quantity
\[
c:=\sum_{a\in\Sigma}p(a)\tr\left( \xi_a\log\left(p(a)\xi_a\right) - \xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right).
\]

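Since $p(a)\xi_a\leq\sum_{b\in\Sigma}p(b)\xi_b$ for every $a\in\Sigma$ and the logarithm is operator monotone, it holds that $\log(p(a)\xi_a)\leq\log\bigl(\sum_{b\in\Sigma}p(b)\xi_b\bigr)$ (with the usual conventions when $\xi_a$ is not invertible). This then implies that

\[
\sum_{a\in\Sigma}p(a)\tr\left( \xi_a\log\left(p(a)\xi_a\right)\right) \leq \sum_{a\in\Sigma}p(a)\tr\left(\xi_a\log\left(\sum_{b\in\Sigma}p(b)\xi_b\right)\right),
\]
 so that $c\leq 0$.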

 Hence $S(\reg{X} : \reg{Y})\leq H(p)+c\leq H(p)$.
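
The bound $H(p)$ is attained by perfectly correlated classical registers. The following example is an illustration that assumes $\X=\Y=\mathbb{C}^\Sigma$ and takes $\sigma_a=\xi_a=E_{a,a}$, a choice made here and not part of the theorem. In this case
\[
\rho=\sum_{a\in\Sigma}p(a)E_{a,a}\otimes E_{a,a},
\qquad
\rho[\reg{X}]=\rho[\reg{Y}]=\sum_{a\in\Sigma}p(a)E_{a,a},
\]
so that $S(\reg{X})=S(\reg{Y})=S(\reg{X},\reg{Y})=H(p)$ and
\[
S(\reg{X}:\reg{Y})=S(\reg{X})+S(\reg{Y})-S(\reg{X},\reg{Y})=H(p).
\]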




Bounding the quantum relative entropy in terms of the classical relative entropy

Theorem:

Let $\X$ be a complex Euclidean space, let $\Sigma$ be an alphabet, let $p,q\in\P(\Sigma)$ be probability vectors, and let $\{\rho_a\,:\,a\in\Sigma\}\subset\Density(\X)$ and $\{\sigma_a\,:\,a\in\Sigma\}\subset\Density(\X)$ be collections of density operators indexed by $\Sigma$. Assume that $\im(\rho_a)\subseteq\im(\sigma_a)$, $p(a)>0$, and $q(a) > 0$ for all $a\in\Sigma$. For two positive semidefinite operators $P$ and $Q$ acting on $\X$ with $\im(P)\subseteq\im(Q)$, denote the quantum relative entropy as
\[
S(P ||   Q )=\tr(P \ \text{log}(P))-\tr(P \ \text{log}(Q)).
\]

Then
\[
    S\Biggl(\sum_{a\in\Sigma} p(a) \rho_a \Bigg\|
    \sum_{a\in\Sigma} q(a) \sigma_a \Biggr)
    \leq \sum_{a\in\Sigma} p(a) S(\rho_a \| \sigma_a) + D(p \| q),\]
where
 \[
  D(p \| q):=\sum_{a\in\Sigma}p(a)\text{log}\Bigl(\frac{p(a)}{q(a)}\Bigr)
\]
is the classical relative entropy of two probability vectors $p,q\in\P(\Sigma)$.


Proof:

Consider the following fact, which states that for a complex Euclidean space $\X$ and operators $P_0,P_1,Q_0,Q_1\in \Pos(\X)$,
\[
S(P_0+P_1\| Q_0+Q_1)\leq S(P_0 \|Q_0)+S( P_1 \| Q_1).
\]
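Applying this inequality repeatedly (by induction on the size of $\Sigma$) to the operators $p(a)\rho_a$ and $q(a)\sigma_a$ gives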
\[
    S\Biggl(\sum_{a\in\Sigma} p(a) \rho_a \Bigg\|
    \sum_{a\in\Sigma} q(a) \sigma_a \Biggr)
    \leq \sum_{a\in\Sigma}S\left( p(a) \rho_a \|
  q(a) \sigma_a \right).
\]
Moreover, it follows directly from the definition of the quantum relative entropy that, for $P,Q\in\Pos(\X)$ and scalars $\alpha,\beta\in(0,\infty)$,
\[
S(\alpha P \| \beta Q)=\alpha S(P\|Q)+\alpha \log(\alpha/\beta)\tr(P).
\]
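The scaling identity can be checked directly from the definition. The following is a short verification, under the assumption that $P$ and $Q$ are positive definite, so that $\log(\alpha P)=\log(\alpha)\I_\X+\log(P)$ and likewise for $\beta Q$:
\[\begin{align*}
S(\alpha P \| \beta Q)&=\tr(\alpha P\log(\alpha P))-\tr(\alpha P\log(\beta Q)) \\
&=\alpha\tr\bigl(P(\log(\alpha)\I_\X+\log(P))\bigr)-\alpha\tr\bigl(P(\log(\beta)\I_\X+\log(Q))\bigr) \\
&=\alpha\bigl(\tr(P\log(P))-\tr(P\log(Q))\bigr)+\alpha\bigl(\log(\alpha)-\log(\beta)\bigr)\tr(P) \\
&=\alpha S(P\|Q)+\alpha\log(\alpha/\beta)\tr(P).
\end{align*}\]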
Thus,
\[ \begin{align*}
    S\Biggl(\sum_{a\in\Sigma} p(a) \rho_a \Bigg\|
    \sum_{a\in\Sigma} q(a) \sigma_a \Biggr)
   & \leq \sum_{a\in\Sigma}S\left( p(a) \rho_a \|
  q(a) \sigma_a \right) \\
  &= \sum_{a\in\Sigma} \Bigl(p(a)S(\rho_a \| \sigma_a)+p(a)\log\Bigl(\frac{p(a)}{q(a)}\Bigr)\tr(\rho_a) \Bigr) \\
  &=\sum_{a\in\Sigma} p(a)S(\rho_a \| \sigma_a)+\sum_{a\in\Sigma}p(a)\log\Bigl(\frac{p(a)}{q(a)}\Bigr),
  \end{align*}\]
  since $\rho_a\in\Density(\X)$ implies that $\tr(\rho_a)=1$. Also, by definition of the relative entropy of two probability vectors $p,q\in\P(\Sigma)$,
 \[
  D(p \| q):=\sum_{a\in\Sigma}p(a)\log\Bigl(\frac{p(a)}{q(a)}\Bigr).
\]
  Hence,
\[
 \sum_{a\in\Sigma} p(a)S(\rho_a \| \sigma_a)+\sum_{a\in\Sigma}p(a)\log\Bigl(\frac{p(a)}{q(a)}\Bigr)=\sum_{a\in\Sigma} p(a)S(\rho_a \| \sigma_a)+D(p \| q),
\]
  which implies
\[
    S\Biggl(\sum_{a\in\Sigma} p(a) \rho_a \Bigg\|
    \sum_{a\in\Sigma} q(a) \sigma_a \Biggr)
    \leq \sum_{a\in\Sigma} p(a) S(\rho_a \| \sigma_a) + D(p \| q).
\]
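
As a sanity check on the bound (an illustrative special case chosen here, not part of the original argument), take $\X=\mathbb{C}^{\Sigma}$ and $\rho_a=\sigma_a=E_{a,a}$, the diagonal operator with a single $1$ in the diagonal entry indexed by $a$. The symbols $E_{a,a}$ and $\text{Diag}$ are notation assumed only for this example. Since $S(\rho\|\rho)=0$ for any density operator $\rho$, every term $S(\rho_a\|\sigma_a)$ vanishes, while the two mixtures are the diagonal operators $\text{Diag}(p)$ and $\text{Diag}(q)$, which are positive definite because $p(a),q(a)>0$. Therefore
\[
S\bigl(\text{Diag}(p)\,\big\|\,\text{Diag}(q)\bigr)=\sum_{a\in\Sigma}p(a)\log(p(a))-\sum_{a\in\Sigma}p(a)\log(q(a))=D(p\|q),
\]
so in this case the inequality holds with equality, which suggests that the term $D(p\|q)$ cannot be improved in general.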

When a channel is "optimal"

  Let $\X$ and $\Y$ be complex Euclidean spaces and let $H\in\Herm(\Y\otimes\X)$ be an arbitrary Hermitian operator. Consider the problem of maximizing the value
\[
    \ip{H}{J(\Phi)}
\]
  over all choices of a channel $\Phi\in\Channel(\X,\Y)$.

  One may observe that there must always exist at least one choice of a channel $\Psi\in\Channel(\X,\Y)$ such that
\[
    \ip{H}{J(\Psi)} = \sup\{\ip{H}{J(\Phi)}\,:\,\Phi\in\Channel(\X,\Y)\},
\]
  by virtue of the fact that $\Channel(\X,\Y)$ is a compact set and $\Phi\mapsto\ip{H}{J(\Phi)}$ is a continuous function. For any channel $\Psi\in\Channel(\X,\Y)$ satisfying the identity above, let us say that $\Psi$ is optimal with respect to $H$.

Theorem:

 $\Phi\in\Channel(\X,\Y)$ is optimal with respect to $H$ if and only if
\[
    \I_{\Y} \otimes \tr_{\Y} ( H J(\Phi)) - H \in \Pos(\Y\otimes\X).
\]


Proof:

Let $\Z=\Y\otimes\X$. If $\Phi\in\Channel(\X,\Y)$, then the Choi representation $J (\Phi)$ satisfies
\[
 J(\Phi)\in\Pos(\Z) \ \text{and} \ \tr_{\Y} (J(\Phi))=\I_{\X}.
\]
 Conversely, every operator satisfying these two conditions is the Choi representation of some channel in $\Channel(\X,\Y)$. Consider the semidefinite program defined by the triple $(\Omega, H, \I_{\X})$, where $H\in\Herm(\Z)$, $\I_{\X}\in\Herm(\X)$, and $\Omega\in\Channel(\Z,\X)$ is defined as $\Omega(Z)=\tr_{\Y}(Z)$, so that the adjoint map $\Omega^*$, which takes operators on $\X$ to operators on $\Z$, is given by $\Omega^*(X)=\I_{\Y}\otimes X$. Then the primal and dual problems can be expressed as
\[ \begin{align*}
 \text{Primal} & & & &\text{Dual}& \\
 &\max\ip{H}{J(\Phi)} & & & &\min\ip{\I_{\X}}{X} \\
 \text{subject to:} \ &  \tr_{\Y}(J(\Phi))=\I_{\X}, & & &\text{subject to:} \ & \I_{\Y}\otimes X \geq H, \\
 & J(\Phi)\in\Pos(\Z)  &&&& X\in\Herm(\X)
 \end{align*}\]
 Define the primal and dual feasible sets $\mathcal{A}$ and $\mathcal{B}$, respectively, as
\[
\mathcal{A}:=\{Z\in\Pos(\Z) : \Omega(Z)=\I_{\X}\} \ \text{and} \ \mathcal{B}:=\{ X\in\Herm(\X) : \Omega^*(X)\geq H\}.
\]
 Also define the optimal values associated to the primal and dual problems as
\[
 \alpha:=\sup\{\ip{H}{Z} : Z\in\mathcal{A}\} \ \text{and} \ \beta:=\inf\{\ip{\I_{\X}}{X} : X\in\mathcal{B}\}.
\]

 Since channels in $\Channel(\X,\Y)$ always exist (for instance, a channel $\Psi$ that is optimal with respect to $H$), the primal feasible set $\mathcal{A}$ is nonempty, and since $\mathcal{A}$ is compact, $\alpha$ is finite. In fact, the operator $\I_{\Y}\otimes\I_{\X}/\dim(\Y)$ is positive definite and satisfies $\Omega(\I_{\Y}\otimes\I_{\X}/\dim(\Y))=\I_{\X}$, so the primal problem is strictly feasible. Now let $\lambda$ be any real number strictly larger than the largest eigenvalue of $H$, and consider the operator $\lambda\I_{\X}\in \Herm(\X)$. Then $\Omega^*(\lambda\I_{\X})=\lambda\,\I_{\Y}\otimes\I_{\X}>H$, so the dual problem is strictly feasible as well. Therefore, strong duality holds by Slater's theorem (Theorem 1.11): $\alpha=\beta$, and there exist $Z\in\mathcal{A}$ and $X\in\mathcal{B}$ such that $\ip{H}{Z}=\alpha=\beta=\ip{\I_{\X}}{X}$. Moreover, by complementary slackness (Proposition 1.12), whenever $Z\in\mathcal{A}$ and $X\in\mathcal{B}$ satisfy $\ip{H}{Z}=\ip{\I_{\X}}{X}$, it holds that $\Omega^*(X)Z=HZ$.

 Now suppose that $\Phi\in\Channel(\X,\Y)$ is optimal with respect to $H$, so that $\ip{H}{J(\Phi)}=\alpha$, and let $X\in\mathcal{B}$ be an optimal dual solution, so that $\ip{H}{J(\Phi)}=\ip{\I_{\X}}{X}$. Then by complementary slackness it follows that
\[ \begin{align*}
 \Omega^*(X)J(\Phi)&=HJ(\Phi) \\
 (\I_{\Y}\otimes X)J(\Phi)&=HJ(\Phi) \\
 \tr_{\Y}((\I_{\Y}\otimes X)J(\Phi))&=\tr_{\Y}(HJ(\Phi)) \\
 X&=\tr_{\Y}(HJ(\Phi)),
 \end{align*}\]
 since $\tr_{\Y}(J(\Phi))=\I_{\X}$. Because $X\in\mathcal{B}$, it holds that $X\in\Herm(\X)$ and $\Omega^*(X)\geq H$, which implies that $\Omega^*(\tr_{\Y}(HJ(\Phi)))=\I_{\Y}\otimes \tr_{\Y}(HJ(\Phi))\geq H$. That is, $\I_{\Y}\otimes \tr_{\Y}(HJ(\Phi))-H\geq 0$, or in other words $\I_{\Y}\otimes \tr_{\Y}(HJ(\Phi))-H\in\Pos(\Z)=\Pos(\Y\otimes\X)$.

 Suppose instead that $\I_{\Y}\otimes \tr_{\Y}(HJ(\Phi))-H\in\Pos(\Z)=\Pos(\Y\otimes\X)$ holds for some $\Phi\in\Channel(\X,\Y)$. This is equivalent to writing $\I_{\Y}\otimes \tr_{\Y}(HJ(\Phi))\geq H$, or $\Omega^*(\tr_{\Y}(HJ(\Phi)))\geq H$. Moreover, every positive semidefinite operator is Hermitian, so $\I_{\Y}\otimes \tr_{\Y}(HJ(\Phi))-H$ is Hermitian, and since $H\in\Herm(\Z)$ the operator $\I_{\Y}\otimes \tr_{\Y}(HJ(\Phi))$ is Hermitian as well, which implies that $\tr_{\Y}(HJ(\Phi))\in\Herm(\X)$. Hence, $\tr_{\Y}(HJ(\Phi))\in\mathcal{B}$, as it satisfies the conditions for being dual feasible. By weak duality, the quantity $\ip{\I_{\X}}{\tr_{\Y}(HJ(\Phi))}$ therefore places an upper bound on the possible values of $\ip{H}{Z}$ for any primal feasible $Z\in\mathcal{A}$; that is, $\ip{H}{Z}\leq\ip{\I_{\X}}{\tr_{\Y}(HJ(\Phi))}$. However, observe that
\[ \begin{align*}
 \ip{\I_{\X}}{\tr_{\Y}(HJ(\Phi))}=\tr(\tr_{\Y}(HJ(\Phi)))=\tr(HJ(\Phi))=\ip{H}{J(\Phi)},
 \end{align*}\]
 so this upper bound is attained by the operator $Z=J(\Phi)$, which belongs to $\mathcal{A}$ by virtue of $\Phi\in\Channel(\X,\Y)$. Hence, $J(\Phi)$ is an optimal solution to the primal problem and $\tr_{\Y}(HJ(\Phi))$ is an optimal solution to the dual problem. Thus, $\Phi\in\Channel(\X,\Y)$ is optimal with respect to $H$.

 It has now been shown that $\Phi\in\Channel(\X,\Y)$ is optimal with respect to $H$ if and only if $\I_{\Y}\otimes \tr_{\Y}(HJ(\Phi))-H\in\Pos(\Y\otimes\X)$.
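
 As an illustration (a special case chosen here for concreteness, not taken from the original argument), suppose that $\X=\mathbb{C}$ is one-dimensional. A channel $\Phi\in\Channel(\X,\Y)$ is then determined by the density operator $\rho=J(\Phi)\in\Density(\Y)$, the partial trace $\tr_{\Y}$ is the ordinary trace, and $H\in\Herm(\Y)$. Writing $\lambda_{\max}(H)$ for the largest eigenvalue of $H$ (notation assumed for this example), the optimality condition reads
\[
\tr(H\rho)\,\I_{\Y}-H\in\Pos(\Y)
\iff \tr(H\rho)\geq\lambda_{\max}(H)
\iff \tr(H\rho)=\lambda_{\max}(H),
\]
 where the last equivalence uses the fact that $\tr(H\rho)\leq\lambda_{\max}(H)$ for every density operator $\rho$. In this case the theorem reduces to the familiar statement that $\ip{H}{\rho}$ is maximized exactly when $\rho$ is supported on the eigenspace of $H$ corresponding to its largest eigenvalue.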

A lower bound on the trace distance of tensor copies of states

Theorem:
  Let $\X$ be a complex Euclidean space, let $\rho_0,\rho_1\in\Density(\X)$ be density operators satisfying
\[
    \bignorm{\rho_0 - \rho_1}_1 \geq \varepsilon
\]
  for $\varepsilon > 0$, and let $n$ be an arbitrary positive integer.
  Then
\[
    \Bignorm{\rho_0^{\otimes n} - \rho_1^{\otimes n}}_1
    \geq 2 - 2 \exp\biggl(-\frac{n\varepsilon^2}{8}\biggr).
\]
  (The notation $\rho^{\otimes n}$ means $\rho$ tensored with itself $n$ times. For example, $\rho^{\otimes 4} = \rho\otimes\rho\otimes\rho\otimes\rho$.)

Proof:
The Fuchs-van de Graaf inequalities (Theorem 3.34) state that
\[
1-\frac{1}{2}\bignorm{\rho_0-\rho_1}_1\leq\fid(\rho_0,\rho_1)\leq\sqrt{1-\frac{1}{4}\bignorm{\rho_0-\rho_1}_1^2},
\]
or equivalently, after solving for the trace norm,
\[
2-2\fid(\rho_0,\rho_1)\leq\bignorm{\rho_0-\rho_1}_1\leq 2\sqrt{1-\fid(\rho_0,\rho_1)^2}.
\]
Also, by Proposition 3.16, it follows that
\[
\fid(\rho_0^{\otimes n},\rho_1^{\otimes n})=\fid(\rho_0,\rho_1)^{n}.
\]
Therefore,
\[
\fid(\rho_0^{\otimes n},\rho_1^{\otimes n})^2=\fid(\rho_0,\rho_1)^{2n}\leq\left(1-\frac{1}{4}\bignorm{\rho_0-\rho_1}_1^2\right)^n\leq\left(1-\frac{1}{4}\varepsilon^2\right)^n,
\]
since $ \varepsilon\leq\bignorm{\rho_0 - \rho_1}_1$ by assumption.

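For every real number $x$ with $0\leq x\leq1$ and every positive integer $n$, it holds that
\[
(1-x)^n\leq \exp(-nx),
\]
 because $0\leq 1-x\leq\exp(-x)$. Taking $x=\varepsilon^2/4$, which lies in $[0,1]$ because $\varepsilon\leq\bignorm{\rho_0-\rho_1}_1\leq 2$, gives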
\[
\left(1-\frac{1}{4}\varepsilon^2\right)^n\leq  \exp\biggl(-\frac{n\varepsilon^2}{4}\biggr).
\]
This implies that
\[
\fid(\rho_0^{\otimes n},\rho_1^{\otimes n})\leq\left(1-\frac{1}{4}\varepsilon^2\right)^{n/2}\leq\exp\biggl(-\frac{n\varepsilon^2}{8}\biggr).
\]
Finally, since the Fuchs-van de Graaf inequalities give
\[
2-2\fid(\rho_0^{\otimes n},\rho_1^{\otimes n})\leq\bignorm{\rho_0^{\otimes n}-\rho_1^{\otimes n}}_1,
\]
the previous result implies
\[
2-2\exp\biggl(-\frac{n\varepsilon^2}{8}\biggr)\leq2-2\fid(\rho_0^{\otimes n},\rho_1^{\otimes n})\leq\bignorm{\rho_0^{\otimes n}-\rho_1^{\otimes n}}_1,
\]
which completes the proof.
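
To get a feel for the bound (an illustrative calculation with values chosen here, not part of the original proof), take $\varepsilon=1$. Then
\[\begin{align*}
n=8:&\quad 2-2\exp(-1)\approx 1.264, \\
n=16:&\quad 2-2\exp(-2)\approx 1.729,
\end{align*}\]
so the guaranteed trace distance approaches its maximum possible value of $2$ exponentially fast in $n$. Rearranging the bound, for any $\delta\in(0,2)$ it guarantees that $\bignorm{\rho_0^{\otimes n}-\rho_1^{\otimes n}}_1\geq 2-\delta$ whenever
\[
n\geq\frac{8}{\varepsilon^2}\ln\Bigl(\frac{2}{\delta}\Bigr),
\]
where $\ln$ denotes the natural logarithm.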