When writing about statistical machine learning, it is useful to have notation that clearly differentiates between
- random variables,
- realisations of random variables,
- matrices / vectors / scalars, and
- their distributions.
In my articles, I use roman (upright) font to denote random variables, e.g. \(\rmX, \rvy\).
Realisations of random variables (as well as all other variables) are italicised, e.g. \(\mX, \vy\).
Bold upper-case characters denote matrices, while bold lower-case characters denote vectors.
Elements of matrices and vectors are written with normal (i.e. not bold) font weight.
Probability densities are written with lower-case letters (e.g. \(p\), \(q\)), and the distribution with which each is associated is left to be inferred from the name of its argument.
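For concreteness, the macros used throughout these articles could be defined along the following lines. This is a minimal sketch only: the macro names are the ones that appear here, but the particular definitions (and the choice of the `bm` package for bold italics) are an assumption, not necessarily the preamble actually used to build this site.

```latex
% A sketch of how the notation macros could be defined. These definitions
% are an assumption, not necessarily the source behind this site.
\documentclass{article}
\usepackage{amsmath}
\usepackage{bm}

% Random variables: bold roman (upright)
\newcommand{\rmX}{\mathbf{X}}   % random matrix
\newcommand{\rvx}{\mathbf{x}}   % random vector
\newcommand{\rvy}{\mathbf{y}}   % random vector

% Realisations (and all other matrices/vectors): bold italic
\newcommand{\mX}{\bm{X}}        % matrix
\newcommand{\vx}{\bm{x}}        % vector
\newcommand{\vy}{\bm{y}}        % vector

% Elements: normal weight; roman if random, italic if realised
\newcommand{\ermX}{\mathrm{X}}  % element of a random matrix
\newcommand{\ervx}{\mathrm{x}}  % element of a random vector
\newcommand{\evx}{x}            % element of a realised vector

\begin{document}
A random matrix $\rmX$ has realisation $\mX$; its $j$th column
$\rvx = \rmX_j$ has realisation $\vx$, with elements $\evx_i = X_{ij}$.
\end{document}
```

If the macros are instead registered with MathJax, `\boldsymbol` can stand in for `\bm`.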
An Example
A random matrix \(\rmX\) with elements \(\ermX_{ij}\) has a realisation \(\mX\), itself with elements \(X_{ij}\).
The random matrix’s \(j\)th column is the column vector \(\rvx = \rmX_j\), with elements \(\ervx_{i}\). Correspondingly, a realisation of this random vector is \(\vx\), with elements \(\evx_{i}\).
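Collected as displayed equations (and assuming the macro definitions sketched above), these relationships are

```latex
% The column and element relationships from the example. The second line
% is implied rather than stated in the prose: realising the random matrix
% and taking its jth column commute.
\begin{align*}
  \rvx &= \rmX_j, & \ervx_i &= \ermX_{ij},\\
  \vx  &= \mX_j,  & \evx_i  &= X_{ij}.
\end{align*}
```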
For probabilities, \(p(\mX, \vy)\) denotes the density of the joint distribution over the random variables \(\rmX\) and \(\rvy\), evaluated at \(\rmX = \mX\) and \(\rvy = \vy\).
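Were the random variables made explicit with subscripts, the shorthand would unpack as below; the subscripted form is shown only for comparison and is not used elsewhere in these articles.

```latex
% Explicit form of the shorthand p(\mX, \vy); the subscripts naming the
% random variables are added here for comparison only.
\begin{equation*}
  p(\mX, \vy) \equiv p_{\rmX, \rvy}(\mX, \vy).
\end{equation*}
```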