$\DeclareMathOperator{\erfc}{erfc}$ $\DeclareMathOperator{\var}{var}$ $\DeclareMathOperator{\mean}{mean}$
Noise spectral density: $N_0$ Watts/Hz (equivalently Joules), one-sided
Symbol rate: $r_s$ symbols/sec.
Information rate: $r_b$ bits/sec.
Average signal power: $S$ Watts.
Code rate: $R = r_b/r_s$ (for binary symbols).
Baseband signalling: $B = r_s/2$.
Passband signalling: $B = r_s$.
Bandwidth efficiency, or spectral efficiency: $r_b/B$ bits/sec/Hz
Noise power $N = N_0 B$
Passband noise power: $N = N_0 r_s$
Baseband noise power: $N = N_0 r_s/2$
Signal energy per symbol: $E_s = S/r_s$ Joules
Signal energy per info bit: $E_b = S/r_b$ Joules
Baseband: $$ \frac{S}{N} = \frac{2S}{ N_0 r_s} = \frac{2 E_s r_s}{ N_0 r_s} = \frac{2E_s}{N_0} = \frac{2RE_b}{N_0} $$ Passband: $$ \frac{S}{N} = \frac{S}{ N_0 r_s} = \frac{ E_s r_s}{ N_0 r_s} = \frac{E_s}{N_0} = \frac{RE_b}{N_0} $$
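As a sanity check on these relations, here is a minimal Python sketch (the code rate and $E_b/N_0$ values are purely illustrative):

```python
def snr(eb_n0_db, code_rate, passband):
    """S/N from Eb/N0 in dB: R*Eb/N0 passband, 2*R*Eb/N0 baseband."""
    eb_n0 = 10 ** (eb_n0_db / 10)      # dB to linear ratio
    factor = 1 if passband else 2      # baseband noise bandwidth is r_s/2
    return factor * code_rate * eb_n0

# Illustrative values: a rate-1/2 code at Eb/N0 = 4 dB
print(snr(4.0, 0.5, passband=True))   # S/N = R*Eb/N0, about 1.26
print(snr(4.0, 0.5, passband=False))  # S/N = 2*R*Eb/N0, about 2.51
```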
Shannon's channel coding theorem gives a minimum S/N above which reliable communication can theoretically be achieved by suitable coding.
In terms of code rate $R = r_b/r_s = k/n$ for some block code $(n,k)$:
Baseband: $R < C = \frac{1}{2} \log_2\left( 1 + \frac{S}{N} \right)$
Passband: $R < C = \log_2\left( 1 + \frac{S}{N} \right)$
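Rearranging gives the minimum S/N at a given code rate: $S/N > 2^{2R} - 1$ baseband, $S/N > 2^R - 1$ passband. A minimal sketch, again with an illustrative rate-1/2 code:

```python
import math

def min_snr_db(code_rate, passband):
    """Minimum S/N in dB for reliable coding at this code rate."""
    bits_per_symbol = code_rate if passband else 2 * code_rate
    snr = 2 ** bits_per_symbol - 1     # invert R < C
    return 10 * math.log10(snr)

print(min_snr_db(0.5, passband=True))   # about -3.83 dB
print(min_snr_db(0.5, passband=False))  # 0 dB: S/N > 2^1 - 1 = 1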
In terms of info rate $r_b$ bits per second and communication bandwidth $B$ Hertz, both passband and baseband:
$$
r_b < C^* = r_s C = B \log_2\left( 1 + \frac{S}{N} \right)
= B \log_2\left( 1 + \frac{E_b}{N_0} \frac{r_b}{B} \right)
$$
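At the limit $r_b = C^*$, writing $\eta = r_b/B$ for the spectral efficiency, this rearranges to $E_b/N_0 > (2^\eta - 1)/\eta$. A minimal sketch evaluating the bound (the efficiency values are just examples):

```python
import math

def min_eb_n0_db(eta):
    """Minimum Eb/N0 in dB to achieve spectral efficiency eta bits/s/Hz."""
    return 10 * math.log10((2 ** eta - 1) / eta)

for eta in (0.5, 1, 2, 4):
    print(f"eta = {eta}: Eb/N0 > {min_eb_n0_db(eta):.2f} dB")
```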
Taking the limit as $B \to \infty$, using $B \log_2\left(1 + a/B\right) \to a \log_2 e$:
$$
\lim_{B \to \infty} C^* = \log_2 \exp \left( \frac{r_b E_b}{N_0} \right)
= \frac{1}{\ln 2} \frac{r_b E_b}{N_0}
$$
So
$$
\begin{split}
r_b &< \frac{1}{\ln 2} \frac{r_b E_b}{N_0} \\
\ln 2 &< \frac{E_b}{N_0}
\end{split}
$$
Reliable communication therefore requires $E_b/N_0 > \ln 2 \approx 0.693$, or about $-1.59$ dB: the Shannon limit.
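A quick numerical check, reusing the rearrangement above, that the required $E_b/N_0$ falls towards $\ln 2$ as $\eta = r_b/B \to 0$ (i.e. as $B \to \infty$):

```python
import math

# Minimum Eb/N0 at spectral efficiency eta, from the rearrangement above
for eta in (1.0, 0.1, 0.01, 0.001):
    eb_n0 = (2 ** eta - 1) / eta
    print(f"eta = {eta}: {10 * math.log10(eb_n0):.3f} dB")

print(10 * math.log10(math.log(2)))    # the limit: about -1.592 dB
```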
The probability that a normal random variable exceeds the mean by more than $x$ standard deviations: $$ Q(x) = \frac{1}{2} \erfc \left( \frac{x}{\sqrt{2}} \right) $$
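Python's standard library includes `math.erfc`, so a minimal sketch of $Q(x)$ is:

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

print(q_func(0.0))   # 0.5: half the distribution lies above the mean
print(q_func(3.0))   # about 1.35e-3: the classic "3 sigma" tail
```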
In general, for binary antipodal signalling, $P_e = Q\left(\sqrt{ \frac{S}{N}}\right)$ where $S/N$ is measured at the decision point.
Baseband signalling: $P_e = Q\left(\sqrt{ \frac{2 E_s}{N_0}}\right)$
Passband, coherent BPSK: The received signal has $S/N = E_s/N_0$ in a bandwidth $B$ equal to the symbol rate $r_s$. After quadrature decomposition, the noise power $N$ is split equally between the in-phase and quadrature components, so the effective noise density in each component is $N_0/2$. After rotation to allow for the reference phase, all the signal power is in one channel, so the S/N seen by the decoder is $2E_s/N_0$ and the symbol error rate $P_e$ is therefore the same as for baseband signalling.
Non-coherent (differentially coherent) BPSK, i.e. binary DPSK: $P_e = \frac{1}{2} \exp\left(-\frac{E_s}{N_0}\right)$
Orthogonal coherent FSK: $P_e = Q\left(\sqrt{ \frac{E_s}{N_0}}\right)$
Non-coherent orthogonal FSK: $P_e = \frac{1}{2} \exp\left(-\frac{E_s}{2N_0}\right)$
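A minimal sketch comparing the four error-rate expressions above at an illustrative $E_s/N_0$ of 8 dB:

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

es_n0 = 10 ** (8 / 10)                 # Es/N0 = 8 dB, linear

print(q_func(math.sqrt(2 * es_n0)))    # coherent BPSK / baseband
print(0.5 * math.exp(-es_n0))          # binary DPSK
print(q_func(math.sqrt(es_n0)))        # coherent orthogonal FSK
print(0.5 * math.exp(-es_n0 / 2))      # non-coherent orthogonal FSK
```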
Here I show how cross-correlation (an inner product) can implement the Euclidean distance metric in the codebook search.
Consider a noisy received word: an array of $n$ symbol amplitudes, $x[1] \dots x[n]$.
The decoder is going to compare $x[]$ with a codeword defined by an array
of $n$ amplitudes $A[1] \dots A[n]$, each of value $+1$ or $-1$.
The squared Euclidean distance between received word and codeword is just Pythagoras in $n$ dimensions: $$ d^2 = \sum_{i=1}^{n} (x[i] - A[i])^2 $$ which multiplies out to $$ d^2 = \sum_{i=1}^{n} x[i]^2 + \sum_{i=1}^{n} A[i]^2 - 2 \sum_{i=1}^{n} x[i]A[i] $$ The first term is the total energy of the received word, the second term is the total energy of the codeword, and the third term is twice the inner product of the received word and codeword. When the decoder comes to look for the smallest $d^2$ over all the codewords, the first term can be dropped since it is the same for every codeword. Similarly the second term can be dropped: the $A[i]$ all have values $+1$ or $-1$, so all the codewords have the same total energy $n$: $$ \sum_{i=1}^{n} A[i]^2 = n $$ The factor of 2 is also constant, so it is the remaining term, the inner product, on which the decoder bases its decision. Minimising $d^2$ is equivalent to choosing the codeword with the maximum value of $$ \sum_{i=1}^{n} x[i]A[i] $$ Note that we don't have to scale the received amplitude to any particular signal level by adjusting the gain: scaling $x[]$ by a positive constant doesn't change which codeword maximises the inner product.
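A minimal sketch of the resulting decoder in Python, using a hypothetical toy codebook (the two codewords of a length-3 repetition code; any list of $\pm 1$ codewords would do):

```python
def decode(x, codebook):
    """Return the codeword with maximum inner product against x.

    Equivalent to minimum Euclidean distance when all codewords
    have equal energy, as shown above.
    """
    return max(codebook, key=lambda A: sum(xi * ai for xi, ai in zip(x, A)))

# Hypothetical toy codebook: length-3 repetition code
codebook = [(+1, +1, +1), (-1, -1, -1)]

received = (0.9, -0.2, 0.4)        # noisy amplitudes, arbitrary scaling
print(decode(received, codebook))  # (+1, +1, +1): inner product 1.1 vs -1.1
```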