The entanglement of formation of a density operator $\rho\in\Density(\X^{A}\otimes\X^{B})$ is defined as
\[
E_{f}(\rho) = \inf\Biggl\{\sum_{a\in\Sigma} p(a) E(u_a u_a^{\ast})
\,:\, \rho = \sum_{a\in\Sigma} p(a) u_a u_a^{\ast} \Biggr\},
\]
where $E(u u^{\ast}) = S(\tr_{\X^{B}}(u u^{\ast}))$ denotes the entanglement entropy of the pure state $u u^{\ast}$ and the infimum is over all expressions of $\rho$ of the given form, where $\Sigma$ is any alphabet, $p\in\P(\Sigma)$ is a probability vector, and $\{u_a\,:\,a\in\Sigma\} \subset \X^{A}\otimes\X^{B}$ is a collection of unit vectors.
Theorem:
For every choice of complex Euclidean spaces $\X^{A}$, $\X^{B}$, $\Y^{A}$, and $\Y^{B}$, every density operator $\rho\in\Density(\X^{A}\otimes\X^{B})$, and every separable channel $\Phi\in\SepC(\X^{A},\Y^{A}: \X^{B},\Y^{B})$, it holds that
\[
E_{f}(\Phi(\rho)) \leq E_{f}(\rho).
\]
Proof:
Assuming that $\Phi\in\SepC(\X^{A},\Y^{A}: \X^{B},\Y^{B})$ allows $\Phi$ to be expressed as
\[
\Phi(X)=\sum_{b\in\Gamma}(A_b\otimes B_b)X(A_b^\ast\otimes B_b^\ast),
\]
where $\Gamma$ is some alphabet and $\{A_b : b\in \Gamma\}\subset \Pos(\X^A)$ and $\{B_b : b\in \Gamma\}\subset \Pos(\X^B)$. For $\rho = \sum_{a\in\Sigma} p(a) u_a u_a^{\ast}$, the action of $\Phi$ on $\rho$ is specified by the action of $\Phi$ on each $u_au_a^\ast$ as
\[
\Phi(\rho)=\sum_{a\in\Sigma} p(a) \Phi(u_a u_a^{\ast})=\sum_{a\in\Sigma} p(a)\sum_{b\in\Gamma}(A_b\otimes B_b)u_au_a^*(A_b^\ast\otimes B_b^\ast).
\]
Therefore, represent $\Phi(u_au_a*)$ as
\[
\Phi(u_au_a*)=\sum_{b\in\Gamma}(A_b\otimes B_b)u_au_a^*(A_b^\ast\otimes B_b^\ast)=\sum_{b\in\Gamma}q_a(b) v_{ab} v_{ab}^{\ast},
\]
where $(A_b\otimes B_b)u_a=\sqrt{q_a(b)}v_{ab}$. Now let
\[
C_b=\frac{1}{\sqrt{q_a(b)}}(A_b\otimes B_b),
\]
so that $C_bu_au_a^\ast C_b^\ast=v_{ab}v_{ab}$.
Consider the channel $\Psi_{ab}\in\SepC(\X^{A},\Y^{A}: \X^{B},\Y^{B})$ defined by
\[
\Psi_{ab}(X)=C_bXC_b^\ast+(\tr(X)-\tr(C_bXC_b^\ast))\sigma,
\]
for some arbitrary $\sigma\in\Density(\Y^A\otimes\Y^B)$. Then $\Psi_{ab}$ is indeed a channel since it is completely positive because it is defined in terms of $C_b$ and $\Phi$ is assumed to be completely positive. Likewise, $\Psi_{ab}$ is separable since $\Phi$ is separable. Moreover, $\Psi_{ab}$ is trace preserving since
\[\begin{align*}
\tr(\Psi_{ab}(X))&=\tr(C_bXC_b^\ast)+(\tr(X)-\tr(C_bXC_b^\ast))\tr(\sigma) \\
&=\tr(C_bXC_b^\ast)+\tr(X)-\tr(C_bXC_b^\ast) \\
&=\tr(X).
\end{align*}\]
By construction $\Psi_{ab}(u_au_a^\ast)=v_{ab}v_{ab}^\ast$.
Therefore, by a corollary (6.36) to Nielsen's theorem it follows that for every $a\in\Sigma$ with $\rho_a^A=\tr_{\X^B}(u_au_a^\ast)$ and $\sigma_a^A=\tr_{\X^B}(v_{ab}v_{ab}^\ast)$ and $r=min\{rank(\rho_a^A),rank(\sigma^A_a)\}$ it holds that
\[
\lambda_1(\rho_a^A)+\dots+\lambda_1(\rho_m^A)\leq \lambda_1(\sigma_a^A)+\dots+\lambda_1(\sigma_m^A)
\]
for every $m\in\{1,\dots, r\}$.
Thus, the von Neummann entropy satisfies $S(\sigma_a^A)\leq S(\rho_a^A)$, which implies that the entanglement entropy also satisfies $E(v_{ab}v_{ab}^\ast)\leq E(u_au_a)$ for all $a\in\Sigma$ and $b\in\Gamma$. Then by tracing out system $B$, and taking the weighted average that is described the original state $\rho$ and using the joint convexity of the von Neumann entropy it follows that
\[\begin{align*}
\sum_{a\in\Sigma} p(a)\sum_{b\in\Gamma}q_a(b) E(v_{ab} v_{ab}^{\ast})
\leq \sum_{a\in\Sigma} p(a) E(u_a u_a^{\ast}).
\end{align*}\]
Therefore, by definition $E_{f}(\Phi(\rho)) \leq E_{f}(\rho)$.
\[
E_{f}(\rho) = \inf\Biggl\{\sum_{a\in\Sigma} p(a) E(u_a u_a^{\ast})
\,:\, \rho = \sum_{a\in\Sigma} p(a) u_a u_a^{\ast} \Biggr\},
\]
where $E(u u^{\ast}) = S(\tr_{\X^{B}}(u u^{\ast}))$ denotes the entanglement entropy of the pure state $u u^{\ast}$ and the infimum is over all expressions of $\rho$ of the given form, where $\Sigma$ is any alphabet, $p\in\P(\Sigma)$ is a probability vector, and $\{u_a\,:\,a\in\Sigma\} \subset \X^{A}\otimes\X^{B}$ is a collection of unit vectors.
Theorem:
For every choice of complex Euclidean spaces $\X^{A}$, $\X^{B}$, $\Y^{A}$, and $\Y^{B}$, every density operator $\rho\in\Density(\X^{A}\otimes\X^{B})$, and every separable channel $\Phi\in\SepC(\X^{A},\Y^{A}: \X^{B},\Y^{B})$, it holds that
\[
E_{f}(\Phi(\rho)) \leq E_{f}(\rho).
\]
Proof:
Assuming that $\Phi\in\SepC(\X^{A},\Y^{A}: \X^{B},\Y^{B})$ allows $\Phi$ to be expressed as
\[
\Phi(X)=\sum_{b\in\Gamma}(A_b\otimes B_b)X(A_b^\ast\otimes B_b^\ast),
\]
where $\Gamma$ is some alphabet and $\{A_b : b\in \Gamma\}\subset \Pos(\X^A)$ and $\{B_b : b\in \Gamma\}\subset \Pos(\X^B)$. For $\rho = \sum_{a\in\Sigma} p(a) u_a u_a^{\ast}$, the action of $\Phi$ on $\rho$ is specified by the action of $\Phi$ on each $u_au_a^\ast$ as
\[
\Phi(\rho)=\sum_{a\in\Sigma} p(a) \Phi(u_a u_a^{\ast})=\sum_{a\in\Sigma} p(a)\sum_{b\in\Gamma}(A_b\otimes B_b)u_au_a^*(A_b^\ast\otimes B_b^\ast).
\]
Therefore, represent $\Phi(u_au_a*)$ as
\[
\Phi(u_au_a*)=\sum_{b\in\Gamma}(A_b\otimes B_b)u_au_a^*(A_b^\ast\otimes B_b^\ast)=\sum_{b\in\Gamma}q_a(b) v_{ab} v_{ab}^{\ast},
\]
where $(A_b\otimes B_b)u_a=\sqrt{q_a(b)}v_{ab}$. Now let
\[
C_b=\frac{1}{\sqrt{q_a(b)}}(A_b\otimes B_b),
\]
so that $C_bu_au_a^\ast C_b^\ast=v_{ab}v_{ab}$.
Consider the channel $\Psi_{ab}\in\SepC(\X^{A},\Y^{A}: \X^{B},\Y^{B})$ defined by
\[
\Psi_{ab}(X)=C_bXC_b^\ast+(\tr(X)-\tr(C_bXC_b^\ast))\sigma,
\]
for some arbitrary $\sigma\in\Density(\Y^A\otimes\Y^B)$. Then $\Psi_{ab}$ is indeed a channel since it is completely positive because it is defined in terms of $C_b$ and $\Phi$ is assumed to be completely positive. Likewise, $\Psi_{ab}$ is separable since $\Phi$ is separable. Moreover, $\Psi_{ab}$ is trace preserving since
\[\begin{align*}
\tr(\Psi_{ab}(X))&=\tr(C_bXC_b^\ast)+(\tr(X)-\tr(C_bXC_b^\ast))\tr(\sigma) \\
&=\tr(C_bXC_b^\ast)+\tr(X)-\tr(C_bXC_b^\ast) \\
&=\tr(X).
\end{align*}\]
By construction $\Psi_{ab}(u_au_a^\ast)=v_{ab}v_{ab}^\ast$.
Therefore, by a corollary (6.36) to Nielsen's theorem it follows that for every $a\in\Sigma$ with $\rho_a^A=\tr_{\X^B}(u_au_a^\ast)$ and $\sigma_a^A=\tr_{\X^B}(v_{ab}v_{ab}^\ast)$ and $r=min\{rank(\rho_a^A),rank(\sigma^A_a)\}$ it holds that
\[
\lambda_1(\rho_a^A)+\dots+\lambda_1(\rho_m^A)\leq \lambda_1(\sigma_a^A)+\dots+\lambda_1(\sigma_m^A)
\]
for every $m\in\{1,\dots, r\}$.
Thus, the von Neummann entropy satisfies $S(\sigma_a^A)\leq S(\rho_a^A)$, which implies that the entanglement entropy also satisfies $E(v_{ab}v_{ab}^\ast)\leq E(u_au_a)$ for all $a\in\Sigma$ and $b\in\Gamma$. Then by tracing out system $B$, and taking the weighted average that is described the original state $\rho$ and using the joint convexity of the von Neumann entropy it follows that
\[\begin{align*}
\sum_{a\in\Sigma} p(a)\sum_{b\in\Gamma}q_a(b) E(v_{ab} v_{ab}^{\ast})
\leq \sum_{a\in\Sigma} p(a) E(u_a u_a^{\ast}).
\end{align*}\]
Therefore, by definition $E_{f}(\Phi(\rho)) \leq E_{f}(\rho)$.