Basic Concepts in Linear Algebra
Vector Spaces, Vectors, and Matrices
In mathematics, an algebraic structure is an abstraction consisting of (i) a set of elements, (ii) operations that manipulate those elements, and (iii) axioms that the operations must satisfy. The power of this abstraction is that once the core properties of the structure are formalized in general, they can be applied to any specific system — mathematical or real-world — that shares the same structure. For example, a field \(F\) is an algebraic structure consisting of elements called scalars with operations of addition and multiplication that satisfy a number of axioms. A ubiquitous field is the set of real numbers \(\mathbb{R}\).
Linear algebra is the study of vector spaces \(V\), algebraic structures defined over a field \(F\). The elements of a vector space are called vectors. For any two vectors \(\boldsymbol u,\boldsymbol v \in V\), the operation of vector addition creates a third vector \(\boldsymbol u + \boldsymbol v \in V\); this is known as closure under vector addition. For any scalar \(c \in F\) and vector \(\boldsymbol u \in V\), the operation of scalar multiplication creates another vector \(c \boldsymbol u \in V\); this is known as closure under scalar multiplication. The 8 axioms that govern these two operations are listed here.
Any set of elements equipped with vector addition and scalar multiplication that satisfies the closure properties and the 8 axioms is a vector space. Of particular interest are \(n\)-tuples of the form
\[ \boldsymbol u = (u_1, u_2, \ldots, u_n), \]
where the components \(u_1, \ldots, u_n\) are scalars from a field \(F\). The set of all such \(n\)-tuples is denoted by \(F^n\). For example, \(\mathbb{R}^3\) is the set of all 3-tuples of real numbers. Here, vector addition is defined as the component-wise operation:
\[ \begin{aligned} \boldsymbol u = (u_1, u_2, &\ldots, u_n) \in F^n \quad\text{and}\quad \boldsymbol v = (v_1, v_2, \ldots, v_n) \in F^n \\ \\ \boldsymbol u + \boldsymbol v &= (u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n) \in F^n. \end{aligned} \]
Similarly, scalar multiplication is defined as the component-wise operation:
\[ \begin{aligned} c \in F \quad&\text{and}\quad \boldsymbol u = (u_1,u_2, \ldots, u_n) \in F^n \\ \\ c \boldsymbol u &= (c u_1, c u_2, \ldots, c u_n) \in F^n. \end{aligned} \]
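The two component-wise operations above can be sketched directly in code. This is a minimal illustration using plain Python tuples; the helper names `vec_add` and `scalar_mul` and the sample vectors are this sketch's choices, not from the text.

```python
# Component-wise vector addition and scalar multiplication in F^n,
# modeling n-tuples as Python tuples of floats (F = R here).

def vec_add(u, v):
    """Component-wise addition: (u + v)_i = u_i + v_i."""
    assert len(u) == len(v), "vectors must have the same length"
    return tuple(ui + vi for ui, vi in zip(u, v))

def scalar_mul(c, u):
    """Scalar multiplication: (c u)_i = c * u_i."""
    return tuple(c * ui for ui in u)

u = (1.0, 2.0, 3.0)
v = (4.0, 5.0, 6.0)

print(vec_add(u, v))       # (5.0, 7.0, 9.0)
print(scalar_mul(2.0, u))  # (2.0, 4.0, 6.0)
```

Note that both operations return another \(n\)-tuple, matching the closure properties described above.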
A natural generalization of \(n\)-tuples is the \(m \times n\) array called the matrix:
\[ \mathbf A = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}, \quad a_{ij} \in F. \]
The set of all \(m \times n\) matrices with components in a field \(F\) is denoted by \(F^{m \times n}\). Vector addition and scalar multiplication are defined analogously to the component-wise operations for \(n\)-tuples. Specifically, for any two matrices \(\mathbf A, \mathbf B \in F^{m \times n}\), vector addition creates a third matrix \(\mathbf A + \mathbf B \in F^{m \times n}\) whose components are given by
\[ (\mathbf{A} + \mathbf B)_{ij} = \mathbf A_{ij} + \mathbf B_{ij}. \]
For any scalar \(c \in F\), scalar multiplication creates another matrix \(c \mathbf A \in F^{m \times n}\) whose components are given by
\[ (c \mathbf A)_{ij} = c\, \mathbf A_{ij}. \]
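These component-wise matrix rules can be checked numerically. The sketch below uses NumPy arrays to stand in for elements of \(F^{m \times n}\); the particular matrices are chosen for illustration.

```python
# Component-wise matrix addition and scalar multiplication in F^{m x n},
# with NumPy arrays over F = R.
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])

S = A + B    # (A + B)_{ij} = A_{ij} + B_{ij}
P = 2.0 * A  # (cA)_{ij} = c * A_{ij}

print(S)  # [[ 6.  8.] [10. 12.]]
print(P)  # [[2. 4.] [6. 8.]]
```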
Basis and Dimension
A vector space \(V\) typically contains infinitely many vectors. We are interested in finding a smaller set of vectors that captures the entire structure in a much more tractable manner. For example, consider the vector space \(\mathbb{R}^3\) in Figure 1. It seems like any point (i.e. vector or 3-tuple) on the blue lattice structure can be described by how far it extends along the \(x\), \(y\), and \(z\) axes. This section formalizes this idea.
Linear Combination, Span, and Linear Independence
Let \(\mathcal{A} = \{\boldsymbol{u}_1, \ldots, \boldsymbol{u}_n\}\) be some finite subset of vectors in \(V\). One way to formalize the idea of \(\mathcal{A}\) capturing the entire structure of \(V\) is if any vector \(\boldsymbol{v} \in V\) can be expressed as a linear combination of the vectors in \(\mathcal{A}\):
\[ \boldsymbol v = c_1 \boldsymbol u_1 + c_2 \boldsymbol u_2 + \ldots + c_n \boldsymbol u_n = \sum_{i=1}^n c_i \boldsymbol u_i \in V, \]
where \(c_1, \ldots, c_n \in F\). If this is the case, then we say that the set \(\mathcal{A}\) spans the vector space \(V\). An equivalent characterization is that the set of all linear combinations of the vectors in \(\mathcal{A}\) — the span of \(\mathcal{A}\) — is equal to \(V\).
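As a small numerical illustration of a linear combination, the sketch below builds a vector in the span of two vectors in \(\mathbb{R}^3\); the vectors and coefficients are this sketch's choices.

```python
# Forming the linear combination v = c1*u1 + c2*u2 in R^3.
import numpy as np

u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])
c1, c2 = 2.0, -3.0

v = c1 * u1 + c2 * u2  # an element of span{u1, u2}
print(v)               # [ 2. -3.  0.]
```

Any vector reachable this way lies in the span of \(\{\boldsymbol u_1, \boldsymbol u_2\}\), here the \(x\)-\(y\) plane.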
We are also interested in efficiency. That is to say, we want \(\mathcal{A}\) to be as small as possible while still spanning \(V\). We call a set of vectors linearly dependent if one of the vectors can be expressed as a linear combination of the others. More formally, the vectors \(\boldsymbol{u}_1, \ldots, \boldsymbol{u}_n\) in \(\mathcal{A}\) are linearly dependent if there exist scalars \(c_1, \ldots, c_n\), not all zero, such that
\[ c_1 \boldsymbol u_1 + c_2 \boldsymbol u_2 + \ldots + c_n \boldsymbol u_n = \boldsymbol 0. \]
Any such vector is redundant: it lies in the span of the other vectors in \(\mathcal{A}\), so removing it does not shrink the span. Conversely, we say that the vectors \(\boldsymbol{u}_1, \ldots, \boldsymbol{u}_n\) in \(\mathcal{A}\) are linearly independent if
\[ c_1 \boldsymbol u_1 + c_2 \boldsymbol u_2 + \ldots + c_n \boldsymbol u_n = \boldsymbol 0 \]
holds only when \(c_1 = c_2 = \ldots = c_n = 0\).
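One common numerical way to test linear independence (an assumption of this sketch, not a method stated in the text) is to stack the vectors as the rows of a matrix and compare its rank to the number of vectors.

```python
# Linear independence check via matrix rank: n vectors are independent
# exactly when the matrix with those vectors as rows has rank n.
import numpy as np

def independent(vectors):
    """True if the given vectors are linearly independent."""
    M = np.array(vectors, dtype=float)
    return np.linalg.matrix_rank(M) == len(vectors)

print(independent([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # True
print(independent([[1, 0, 0], [2, 0, 0], [0, 1, 0]]))  # False: (2,0,0) = 2*(1,0,0)
```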
Taken together, these two ideas — spanning and linear independence — give us exactly what we were looking for: a minimal yet complete description of the vector space \(V\). More formally, we call a set of vectors a basis \(\mathcal{B}\) for the vector space \(V\) if it is a linearly independent subset that spans \(V\). The vectors in a basis are called basis vectors.
Dimension
Notice that in the above definition of the basis set \(\mathcal{B}\), we did not assume that the number of basis vectors is finite. In fact, there exist vector spaces that require infinitely many basis vectors. For example, a polynomial of degree \(n\) is defined as
\[ f(x) = a_nx^n + a_{n-1}x^{n-1} + \ldots + a_1 x + a_0, \] where \(x\) is a variable, \(a_i \in F\), and \(n \geq 0\) is an integer. The vector space of all polynomials with coefficients in a field \(F\), denoted \(P(F)\), has the infinite basis set
\[ \mathcal{B}_{P(F)} = \{1, x, x^2, x^3, \ldots\}. \] Nevertheless, note that the definition of linear combination requires a finite number of vectors. Thus, even if a basis set \(\mathcal{B}\) for a vector space \(V\) is infinite, any vector \(\boldsymbol v \in V\) can be represented as a linear combination of a finite subset of vectors in \(\mathcal{B}\).
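The vector-space structure of \(P(F)\) can be made concrete by representing a polynomial as its finite list of coefficients, so that vector addition is component-wise addition of coefficients. The representation and helper name below are this sketch's choices.

```python
# Polynomials as coefficient lists [a0, a1, a2, ...], so that
# poly_add implements the vector addition of P(F) with F = R.

def poly_add(p, q):
    """Add two polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))  # pad the shorter list with zeros
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

# (1 + 2x) + (3x^2) = 1 + 2x + 3x^2
print(poly_add([1, 2], [0, 0, 3]))  # [1, 2, 3]
```

Each coefficient list is a finite linear combination of the basis vectors \(1, x, x^2, \ldots\), consistent with the remark that any single vector only ever uses finitely many basis vectors.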
To distinguish between vector spaces with finite and infinite basis sets, we introduce the notion of dimension, which is simply the number of basis vectors in a basis set \(\mathcal{B}\) for the vector space \(V\). This number is well-defined: every basis of a given vector space contains the same number of vectors. If \(V\) has a finite basis, then we say that \(V\) is finite-dimensional; otherwise, it is infinite-dimensional.
Basis Sets Are Not Unique
An important property of basis sets is that they are not unique. Let us revisit the vector space \(\mathbb{R}^3\) to illustrate this point. At the start of this section, I mentioned that it seems possible to describe any point in \(\mathbb{R}^3\) by how far it extends along the \(x\), \(y\), and \(z\) axes. With this intuition, we can define the following basis set
\[ \mathcal{B}_{\mathbb{R}^3} = \{(1,0,0), (0,1,0), (0,0,1)\} = \{\boldsymbol e_1, \boldsymbol e_2, \boldsymbol e_3\}. \]
This is known as the standard basis for \(\mathbb{R}^3\). We can verify that it is a valid basis by confirming that (i) the three vectors are linearly independent and (ii) any vector \(\boldsymbol u = (u_1,u_2,u_3) \in \mathbb{R}^3\) can be expressed as a linear combination of the basis vectors. As an example, the vectors \(\boldsymbol w = (2,1.5,3)\) and \(\boldsymbol v=(-1.5,0.6, -1.2)\) are plotted below.
However, any set of three linearly independent vectors that spans \(\mathbb{R}^3\) can serve as a basis. For example, the following set is also a valid basis for \(\mathbb{R}^3\):
\[ \mathcal{B}'_{\mathbb{R}^3} = \{(1,0,0), (1,1,0), (1,1,1)\}. \] The figure below provides visual intuition for this basis.
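To see how coordinates change with the basis, one can solve a small linear system: putting the basis vectors into the columns of a matrix \(B\), the coordinates \(c\) of a vector \(v\) satisfy \(B c = v\). The sketch below does this for the non-standard basis above with a sample vector of this sketch's choosing.

```python
# Coordinates of a vector in the basis {(1,0,0), (1,1,0), (1,1,1)},
# found by solving B c = v with the basis vectors as columns of B.
import numpy as np

B = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])  # basis vectors as columns

v = np.array([2.0, 1.5, 3.0])
c = np.linalg.solve(B, v)        # coordinates of v in this basis

print(c)                         # [ 0.5 -1.5  3. ]
print(np.allclose(B @ c, v))     # True: the combination reproduces v
```

The same vector thus has different coordinates in different bases, which is exactly what non-uniqueness of bases amounts to in practice.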
Subspaces
It is often useful to consider a lower-dimensional vector space that still preserves the properties of the original vector space. Formally, for a vector space \(V\), a subspace \(W\) is any nonempty subset of the vectors in \(V\) that is itself a vector space under the same operations of vector addition and scalar multiplication defined on \(V\).
This definition can seem abstract at first, and so it’s valuable to walk through a concrete example. Consider the following subset of \(\mathbb{R}^3\)
\[ W = \{(x,y,0): x, y \in \mathbb{R}\} \subseteq \mathbb{R}^3, \]
which is visualized as the turquoise \(x-y\) plane in the figure below.
The subset \(W\) is a subspace of \(\mathbb{R}^3\). To see this, note that for any \(\boldsymbol u = (u_1, u_2, 0) \in W\), \(\boldsymbol v = (v_1, v_2, 0) \in W\), and \(c \in \mathbb{R}\), the component-wise definitions of vector addition and scalar multiplication for \(\mathbb{R}^3\) are closed in \(W\)
\[ \begin{aligned} \boldsymbol u + \boldsymbol v &= (u_1 + v_1, u_2 + v_2, 0) \in W \\ \\ c \boldsymbol u &= (c u_1, c u_2, 0) \in W, \end{aligned} \] and thus \(W\) is a valid vector space.
Importance of Origin
Notice that the origin \((0,0,0)\) — the additive identity for \(\mathbb{R}^3\) — is contained in \(W\). This is not a coincidence: every subspace of \(\mathbb{R}^3\) must contain the origin. To see this, consider the \(x-y\) plane shifted up by one unit:
\[ W' = \{(x,y,1): x, y \in \mathbb{R}\} \subseteq \mathbb{R}^3. \]
This set does not include the origin, and it fails to be a subspace because vector addition and scalar multiplication are not closed in \(W'\). Specifically, for any \(\boldsymbol u = (u_1, u_2, 1) \in W'\), \(\boldsymbol v = (v_1, v_2, 1) \in W'\), and \(c \in \mathbb{R} \setminus \{1\}\), we have that
\[ \begin{aligned} \boldsymbol u + \boldsymbol v &= (u_1 + v_1, u_2 + v_2, 2) \notin W' \\ \\ c \boldsymbol u &= (c u_1, c u_2, c) \notin W', \end{aligned} \]
and so \(W'\) is not a valid vector space. Importantly, this is a general property not limited to \(\mathbb{R}^3\): any subspace must contain the additive identity (also called the zero vector) of the parent vector space.
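Both closure arguments can be checked numerically. The sketch below verifies that sums and scalar multiples of sample vectors (chosen for this sketch) keep a zero third component in \(W\) but leave \(W'\).

```python
# Closure check for W = {(x, y, 0)} versus W' = {(x, y, 1)} in R^3:
# membership in each set is decided by the third component.
import numpy as np

u_w  = np.array([1.0, 2.0, 0.0])   # in W
v_w  = np.array([-3.0, 4.0, 0.0])  # in W
u_wp = np.array([1.0, 2.0, 1.0])   # in W'
v_wp = np.array([3.0, 4.0, 1.0])   # in W'

# W: both results keep third component 0, so they remain in W.
in_w = (u_w + v_w)[2] == 0.0 and (5.0 * u_w)[2] == 0.0

# W': the sum has third component 2 and the scaling has 5, so
# neither result lands back in W'.
in_wp = (u_wp + v_wp)[2] == 1.0 or (5.0 * u_wp)[2] == 1.0

print(in_w)   # True
print(in_wp)  # False
```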
Basis of Subspaces
A general property of subspaces of a finite-dimensional vector space is that their dimension is at most the dimension of the parent vector space, with equality exactly when the subspace equals the parent space. In the example above, a basis for the \(x-y\) plane in \(\mathbb{R}^3\) is
\[ \mathcal{B}_W = \{(1,0,0), (0,1,0)\}. \]
Thus, the dimension of \(W\) is 2.