For almost a period I’ve had intense Linear Algebra (some days 4-6 hours), a subject that does not need that much prior learning to be able to understand. You need to know addition and multiplication, that’s it. Yet, the power of linear algebra in real world problems (and highly theoretical problems) is great. The aim of this post is to give a short overview of the subject, summarizing basic concepts.

Introduction

To translate problems into matrix equations can be quite useful. It can allows us to perform simple things as balancing chemical equations (use of Ax=b), take derivatives of polynomials, approximate solutions to overdetermined systems, rotate objects in any space, find volumes, interpret shapes, and so much more. If we think about the very basic concept – matrix multiplication – we simply follow certain rules (a definition) to find the answer. It is quite interesting that we can use this fact to “store” information in matrices, that is, a certain set of actions that have to be performed given a rule. An example of this is differentiation of polynomials (see the ToK marked with blue). Below, I’ve included some examples I constructed using MATLAB.

Euclidean Vector Spaces

Since this is simply a subset of General Vector Space and Inner Product Spaces, the interested reader is advised to read those sections instead.

General Vector Spaces

In this section we were introduced to the concept of vector spaces; basically generalizing euclidean vector spaces (3d). Below, some of the questions and focuses:

How do we find the shortest distance between two lines in n space? Answer, shortest distance is the perpendicular one (by Pythagoras thm). When I am concerned with such problems, I simply pick a general point on the lines, find the line that goes through these points (this expression will contain both \(s\) and \(t\)), and use the power of calculus to find the shortest distance. Note, the calculus approach requires knowledge of multivariable calculus. Otherwise, remember that when dot product is zero, vectors are perpendicular (by definition).
What is a vector space? It might come as a surprise, but vector spaces are not only about vectors. We can have a polynomial vector space, matrix vector space, etc. The good thing here is that if we can prove that “something” is a vector space, then we are able to apply theorems for general vector spaces on these “something” vector spaces. We should prove that
1. closed under addition (\(\vec{u},\vec{v} \in V \implies (\vec{(u+v)}\in V)\)), i.e. by adding vectors we never leave the vector space V, and
2. that it is closed under scalar multiplication, (\(\vec{v}\in V \implies k\vec{v}\in V, k \in \mathbb{R}\)).
A trick is to set the scalar to zero to show that something isn’t a vector space. Lastly, the beautiful thing about mathematics is that many times it’s all about definitions. If I want, I can come up with addition that behaves as multiplication, etc. It’s up to you to decide (we will later see in Inner product spaces that we can define the “dot” product also).
A subspace? As the name suggests, it’s a vector space that is inside another vector space. Same applies to these also; they should be closed under addition and scalar multiplication.
Linear dependent/independent? This is all about being able to express a vector as a linear combination of other vectors. If no vector in a vector space \(V\) is expressible as a linear combination of other vectors in \(V\), then these vectors are linearly independent. Ex. If a vector space is spanned by vectors \(\{v_1,v_2,v_3\}\), then if we are unable to express these vectors in terms of each other, they are said to be linearly independent. Otherwise, they are linearly dependent. Recall cross product. We used it in “normal 3d space” (or more academically, orthonormal system with 3 basis vectors) to find a vector that is perpendicular to both vectors. NOTE: Linear independent vectors don’t need to be perpendicular, which is the case with cross product.
Basis? Above, we saw the use of basis without a proper definition. Not good. Basically, a basis is what makes up a vector space. It’s a set of vectors that are required to be a) linearly independent and b) they should span V (all vectors in V should be expressible as a linear combination of the basis vectors). To test if a basis is linearly independent, simply thrown them in into a matrix (in any direction), and evaluate the determinant. If it isn’t equal to zero, the vectors are said to be linearly independent.
Column space/row space/null space? Ok, we are getting into highly theoretical grounds (at least at this stage, it might seem so). Given \(A\) is \(m\times n\) matrix. Column space is the subspace of \(\mathbb{R^m}\) that is spanned by the column vectors of \(A\). The row space, as the name suggests, is the sub space of \(\mathbb{R^n}\) that is spanned by the row vectors of \(A\). The null space is the solution space (recurring definition) of \(Ax=0\). A great theorem states that a system of linear equations \(Ax=b\) is consistent if and only if \(b\) is in the column space of \(A\) (\(b\) is expressible as a linear combination of column vectors of \(A\)). “The proof is similar to [..] and is left as an exercise to the reader“(book).
Theorems? Yes, it turns out that there is a formula that relates the dimension of the column space (\(rank(A)\)) and the dimension of the null space (\(nullity(A)\)). For a matrix with \(n\) columns, we have that \(rank(A)+nullity(A)=n\). I would like to point out that dimension of the column space is equal to the dimension of the row space.
Orthogonal complements? A good relationship exists between the row space and the null space, and that is that they are orthogonal complements of each other. Orthogonal is a fancy way of saying perpendicular.
Is it possible to change basis vectors? Yes, this is perfectly fine. We use a matrix \(T\) that will function as a converter (ToK: Discuss the importance of representing of a set of instructions in a more abstract form). Here’s the relationship,
$\vec{v})_{B’}= {}_{B’}{T_B}(\vec{v})_{B}$

If we know the basis vectors of \(B’\) and \(B\), we can get the transition matrix by putting the basis vectors as columns in \([B’|B]\) and perform elementary row operations until we get to \([I|{}_{B’}{T_B}]\). In general,

$[new|old]\sim \text{el. row op.}\sim [I|old\to new]$
How does the notion of functions apply to vector spaces? From high school, many of us should be familiar with the fact that a function maps one value to another value. This can be applied to vector spaces. Again, it’s always good to have a solid definition. We say that \(T:V\to W\) is a function from vector space V to W, then T is a linear transformation if the following criteria is fulfilled: a) \(T(k\vec{v})=kT(\vec{v}\) and b) \(T(\vec{u}+\vec{v}) = T(\vec{u})+T(\vec{v}\). The good thing about being able to transform from one vector to another is that when this is put into a computer, we can do all sorts of cool things. For example, we can reflect, project, and rotate objects. We can also contract and dilate vectors, and this can be expressed in matrix form. Sometimes, we might want to perform two things at once, reflect and rotate maybe, then, we simply multiply the transformation matrices together in this order: rotate*reflect. Always think from right to left.

Inner Product Spaces

The fact that it is possible to generalize the notion of vector spaces suggests that the same can be done for operations that are performed inside a vector space.

What is an Inner product space (real vector space)? By definition, this is a vector space where we have an inner product with following properties.
1. \(\langle\vec{u}|\vec{v}\rangle=\langle\vec{v}|\vec{u}\rangle\)
2. \(\langle\vec{u}|\lambda\vec{v}\rangle=\lambda \langle\vec{u}|\vec{v}\rangle\)
3. \(\langle\vec{u}|\vec{v} +\vec{w}\rangle= \langle\vec{u}|\vec{v}\rangle \langle\vec{u}|\vec{w}\rangle\)
4. \(\langle\vec{u}|\vec{u}\rangle \ge 0\) and \(\langle\vec{u}|\vec{u}\rangle = 0 \iff \vec{u}=0\)
This is essentially the dot product in \(\mathbb{R^n}\). The good thing about generalizing it is that “something” does not necessarily need to be in \(\mathbb{R^n}\) to be an inner product space. For example, we can have an inner product space of all continuous functions (denoted by \(C[a,b]\)). Then, the inner product is defined as:

$\langle f(t)|g(t)\langle\int_a^b f(t)g(t)dt$

Since polynomials are continuous functions, this definition applies to polynomial inner product spaces.
What is perpendicularity (orthogonality)? In a a vector space with an inner product \(\langle\vec{u}|\vec{v}\rangle\), the angle between \(vec{u}\) and \(vec{v}\) is defined as
$cos \theta = \frac{{ \langle \vec u|\vec v \rangle }}{{||u||||v||}}$

Cauchy-Schwarz identity is quite useful and is given by: \(\langle\vec{u}|\vec{v}\rangle^2\le ||\vec{u}||^2||\vec{v}||^2\)
How to project a vector on a subspace of an inner product space? Since we have a generalized version of the dot product, it’s possible to generalize the projection of a vector on a subspace. Here’s how: Given an orthogonal (this is crucial) basis for \(W\in V\), say \(\{ {{\vec v}_1} \ldots {{\vec v}_n}\}\)and that \(\vec{u} \in V\), then:
${{\mathop{\rm Proj}\nolimits} _W}\vec u = \frac{{ \langle \vec u|{{\vec v}_1} \rangle }}{{||{{\vec v}_1}|{|^2}}}{{\vec v}_1} + \ldots + \frac{{ \langle \vec u|{{\vec v}_n} \rangle }}{{||{{\vec v}_n}|{|^2}}}{{\vec v}_n}$
How to find an orthogonal basis? Note that the projection on “formula” only works if the basis is orthogonal. In order to find an orthogonal basis, we can use Gram-Schmidt process.
$\begin{array}{l} {\rm{Step 1: }}{{\vec v}_1} = {{\vec u}_1}\\ {\rm{Step 2: }}{{\vec v}_2} = {{\vec u}_2} – \frac{{ \langle {{\vec u}_2}|{v_1} \rangle }}{{||{{\vec v}_1}|{|^2}}}{{\vec v}_1}\\ {\rm{Step 3: }}{{\vec v}_3} = {u_3} – \frac{{ \langle {{\vec u}_3}|{{\vec v}_1} \rangle }}{{||{{\vec v}_1}|{|^2}}}{{\vec v}_1} – \frac{{ \langle {{\vec u}_3}|{{\vec v}_2} \rangle }}{{||{{\vec v}_2}||}}{{\vec v}_2}\\ \vdots \\ {\text{r times}} \end{array}$

Remember that

$ \vec u = {{\mathop{\rm Proj}\nolimits} _W}\vec u + {{\mathop{\rm Proj}\nolimits} _{{W^{\bot}}}}\vec{u}$
How to find the “best” solution in an over-determined system? A good approach is to apply Least Square method. Over-determined systems occur in scientific measurements, and so if the error is assumed to be normally distributed, we can use the method of Least Squares. The idea is to think about the system’s column space and then try to project the measurements on the plane that the column space spans. It turns out that the best solution is
$x = {({A^T}A)^{ – 1}}{A^T}b$

Keep in mind that it’s not required to have an orthogonal basis (i.e. the column space does not have to be orthogonal). Using this, we can express the projection formula as a simple matrix multiplication

${{\mathop{\rm Proj}\nolimits} _L}\vec u = A{({A^T}A)^{ – 1}}{A^T}b$

An interesting property that can be obtained is that A^TA is invertible if and only if the column space of A is linearly independent.

Linear Transformations

I think the Swedish term linjär avbildning conveys more information rather than just stating linear transformation. The term basically means linear depiction, which is what this topic is about. At the first sight, this might seem as converting from one base to another. This isn’t entirely true, I’ve realized. It’s more appropriate to view this as a function that maps one value to another. As functions can be injective, surjective, bijective, it’s not implied that we should always be able to map a transformed point back to the original point. In contrast, when changing bases, it’s always possible to get from one basis to another; you never really introduce/remove more information.

How to define Linear Transformation? A linear transformation \(A: V\to W\) should have the following properties:
1. \( A(\vec {u} + \vec {v})) = A(\vec {u}) + A(\vec {v} \)
2. \( A(\lambda \vec u) = (\lambda A \vec u) \)
What is Kernel and Image space(range)? If you’ve read what I wrote in General Vector Spaces, the Kernel = Null space and Image space = Column space.
Example? Let’s define the following linear transformation
$\begin{array}{l} A(1,2,0) = (1,1,1)\\ A(0,1,2) = (0,1,1)\\ A(1,3,3) = (0,0,1) \end{array}$

Unfortunately, this is not so useful if we want to transform a vector with the matrix A. It is easier if we would have the standard basis vectors on the left hand side. So, similar to the change in basis example, we put this into a matrix try to get an identity matrix on the left hand side. This method (Martin’s method) was discovered by Martin Wennerstein 2003. See the formal explanation of the method.
Relation between column space of composite linear transformations? Given that \(B\) has full rank, i.e. \(rank(B)=n\), then \(rank(BA)=rank(A\). Motivation: This is obvious if we consider the transformation \(BA\) means. First, we perform transformation \(A\), which will lead us to the image of A (\(range(A)\)). That is, all vectors will be depicted on the image space of \(A\). When transformation \(B\) is performed, all vectors in the image space of \(A\) will be transformed using \(B\). Note, \(B\) kernel is trivial, i.e. the zero vector transforms to zero vector.
What’s special about the kernel? It can be useful to think of kernel as “the information that gets lost during a transformation”(KTH student). For example, if the kernel contains more than just a zero vector (i.e. it’s non-trivial), when we perform a transformation, some vectors will be depicted on the zero vector; thus “information” disappears.
What is injectivite, surjective, and bijective? Let \(A:V \to W\). Injective (one-to-one) is if \(\vec x \neq \vec x’ \implies A(\vec x) \neq A (\vec x’)\), i.e. each vector is represented by a unique vector (so we can map back to the original vector after a transformation). Surjective (onto) is if \(\forall \vec y \in W\exists \vec x \in V:\vec y = A\vec x\), i.e. for all vectors in the image of \(A\), we can map it to the original vector. Bijective is when it’s both injective and surjective.
Properties? For something to be injective, the dimension of the kernel has to be zero. For \(A: V\to W\) to be surjective, \(\dim \ker(A) = \dim(W) \). This can be proved by the dimension theorem and the definition of surjectivity.

Eigenvalues

An eigenvalue is the value lambda in \(A\vec x = \lambda \vec x\). It can be, for instance, applied in problems such as a) Find

$ {A^{1000}}\left( \begin{array}{l} 2\\ 1 \end{array} \right) $

given that we know A, or b) express \(2x_1^2+ 2x_2^2+2x_3^2+4x_1x_2\) (see this) without the cross product terms. The latter is particularly good when identifying shapes that have been rotated/transformed.

How to find eigenvalues? Simply solve \(\det (A – \lambda I) = 0\), the characteristic equation.
How to find the eigenvectors corresponding to eigenvalues? Solve \((A – \lambda I) (\vec x) = 0\).
Express a matrix using eigenvalues and eigenvectors? We can always express a matrix A as \(A = PD{P^{ – 1}}\). If P happens to be orthogonal (i.e. the rows/columns space forms an orthonormal base), then we can express it as \(A = PD{P^T}\) because of a property of orthogonal matrices that states that \(A{A^T} = I\) or equivalently that \({A^{ – 1}} = {A^T}\).
Raising matrices to a certain power? It can be shown that \({A^k} = P{D^k}{P^{ – 1}}\).
Applying orthogonal deagonalization on quadratic form?
$\begin{array}{l} a{x^2} + b{y^2} + c{z^2} + d{x_1}{x_2} + e{x_1}{x_3} + f{x_2}{x_3} = \\ = ({x_1},{x_2},{x_3})\left( {\begin{array}{*{20}{c}} a&{d/2}&{e/2}\\ {d/2}&b&{f/2}\\ {e/2}&{f/2}&c \end{array}} \right)\left( \begin{array}{l} {x_1}\\ {x_2}\\ {x_3} \end{array} \right) \end{array}$

Now, this can be seen as \(x^TAx\). If we make a substitution \(\vec x = P\vec y\) such that \(P\) orthogonaly diagonalizes \(A\), then

$ {x^T}Ax = {(P\vec y)^T}A(P\vec y) = {y^T}({P^T}AP)y$
Orthogonal matrices? In an orthogonal matrix, the row vectors form an orthonormal base (same for column vectors). An interesting property that exists when multiplying a vector \(x\) is
$||A\vec x ||=||\vec x ||$
and
$A\vec x * A \vec y=\vec x* \vec y$

Then, as we mentioned previously, the inverse of an orthogonal matrix is simply the transpose (please see my answer for a possible application).

The Beauty of Linear Algebra – Summary