0. Complex Numbers

$$\mathbb{C} = \{a + bi \mid a, b \in \mathbb{R}\}, \qquad \mathrm{Re}(a+bi) = a, \qquad \mathrm{Im}(a+bi) = b$$

Why is i on the other side??

Graphically, $z = a + bi \in \mathbb{C}$ corresponds to the point $(a, b) \in \mathbb{R}^2$

Definition The modulus of a complex number $a + bi$ is

$$\mid z \mid = \sqrt{a^2 + b^2}$$

Think of it like vectors and distances from origin

Definition Argument of a complex number is the angle between z and the positive x axis measured in radians anti-clockwise

$\mathrm{Arg}(z)$ is the principal value of the argument, while $\arg(z)$ is multivalued, obtained by adding $2\pi n$, $n \in \mathbb{Z}$

The principal argument lies in the interval $(-\pi, \pi]$ : not including $-\pi$ but including $\pi$

$$e^{i\theta} = \cos\theta + i\sin\theta, \quad \theta \in \mathbb{C}$$

Polar form

Every non-zero complex number can be written in the form of

$$z = r(\cos\theta + i\sin\theta) = re^{i\theta}$$

Where

$r = \mid z \mid$, $\quad \theta = \arg(z)$

Let $z_1 = r_1 e^{i\theta_1}$ and $z_2 = r_2 e^{i\theta_2}$

Multiplication

$$z_1 z_2 = (r_1 r_2)\, e^{i(\theta_1 + \theta_2)}$$

Division

$$\frac{z_1}{z_2} = \frac{r_1}{r_2}\, e^{i(\theta_1 - \theta_2)}$$

Powers (De Moivre's)

$$(r(\cos\theta + i\sin\theta))^n = r^n(\cos n\theta + i\sin n\theta)$$

$$z^n = (re^{i\theta})^n = r^n e^{in\theta}$$
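None of this code is from the course; it's just a quick Python sanity check of De Moivre's formula above, with arbitrary values of $r$, $\theta$, $n$:

```python
import cmath

# De Moivre: (r e^{i theta})^n = r^n e^{i n theta}
r, theta, n = 2.0, 0.7, 5
z = cmath.rect(r, theta)            # builds r(cos theta + i sin theta)
lhs = z ** n                        # power computed directly
rhs = cmath.rect(r ** n, n * theta) # power computed via the formula
assert abs(lhs - rhs) < 1e-9
```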

Roots of unity

$$z^{1/n} = r^{1/n}\, e^{i(\theta + 2k\pi)/n}, \quad k = 0, 1, \dots, n-1$$

Example

Take z=1+i
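The example stops short above, so here is a hedged Python sketch of it: computing the two square roots of $z = 1+i$ from the formula (where $r = \sqrt{2}$, $\theta = \pi/4$) and checking that each root squares back to $z$:

```python
import cmath
import math

z = 1 + 1j
r, theta = abs(z), cmath.phase(z)   # r = sqrt(2), theta = pi/4
n = 2
# z^{1/n} = r^{1/n} e^{i(theta + 2k pi)/n} for k = 0, ..., n-1
roots = [r ** (1 / n) * cmath.exp(1j * (theta + 2 * k * math.pi) / n)
         for k in range(n)]
for w in roots:
    assert abs(w ** n - z) < 1e-9   # every root raised to n recovers z
```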

Conjugates and Reciprocals

Why do they matter?


If $z = a + bi$, the conjugate is defined as: $\bar{z} = a - bi$

To think about the point above geometrically, $z$ and $\bar{z}$ are just two vectors which have the same magnitude, but one is the reflection of the other in the real axis. So if we multiply those two vectors, the arguments cancel and we are essentially squaring the magnitude of the vector: $z\bar{z} = \mid z \mid^2$. This also makes intuitive sense in polar form: $$z \cdot \frac{1}{z} = 1$$

The reciprocal of $z$ is by definition the number that, when multiplied by $z$, gives the result 1
In terms of geometry, $z$ essentially scales a vector by $r$ and rotates it by $\theta$, and what $z^{-1}$ does is scale the vector by $\frac{1}{r}$ and rotate it by $-\theta$

Reciprocals: $z^{-1} = \frac{\bar{z}}{\mid z \mid^2}$

Proof:

Let z=a+bi

We want z1=1z

Therefore:

$$\frac{1}{z} = \frac{1}{a+bi} \times \frac{a-bi}{a-bi} = \frac{a-bi}{a^2+b^2}$$

The conjugate is $\bar{z} = a - bi$

And the modulus is $\mid z \mid = \sqrt{a^2 + b^2}$
Therefore $$ z^{-1} = \frac{\bar{z}}{\mid z \mid^2}, \space z \neq 0$$

Reciprocals in polar form

Any non-zero complex number can be written as:

z=reiθ

The reciprocal z1 can be written as:

$$z^{-1} = \frac{1}{z} = \frac{1}{re^{i\theta}} = \frac{1}{r} \times \frac{1}{e^{i\theta}}$$

But $\frac{1}{e^{i\theta}} = e^{-i\theta}$

So we have

$$z^{-1} = \frac{1}{r}\, e^{-i\theta}$$

This form can also be derived from the earlier formula: $z^{-1} = \frac{\bar{z}}{\mid z \mid^2}$

We have $\bar{z} = re^{-i\theta}$ and $\mid z \mid^2 = r^2$

$$\frac{\bar{z}}{\mid z \mid^2} = \frac{re^{-i\theta}}{r^2} = \frac{1}{r}\, e^{-i\theta}$$

Polar form is very useful when converting numbers from their cartesian form to mod-arg form

If we had z=a+bi, we would have to expand a product, deal with conjugates and end up with fractions for both real and imaginary parts.

But if we had reciprocals in polar form with z=reiθ, then we just need to invert the modulus and flip the angle

Consider this example: z=3+4i

$$z^{-1} = \frac{3-4i}{3^2+4^2} = \frac{3}{25} - \frac{4}{25}i$$

And now consider the polar way:

$r = 5$, $\quad \theta = \arctan\left(\frac{4}{3}\right)$

$$z^{-1} = \frac{1}{5}\, e^{-i\theta}$$
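A quick Python check (not from the notes) that the Cartesian and polar routes for $z = 3+4i$ agree, and that both really give the reciprocal:

```python
import cmath

z = 3 + 4j
# Cartesian route: conjugate over |z|^2
cart = z.conjugate() / abs(z) ** 2
# polar route: invert the modulus, flip the angle
r, theta = abs(z), cmath.phase(z)
polar = (1 / r) * cmath.exp(-1j * theta)

assert abs(cart - polar) < 1e-12    # both routes agree
assert abs(z * cart - 1) < 1e-12    # z times its reciprocal is 1
```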

Proof of fundamental theorem of algebra here

Roots and conjugate pairs

If a polynomial has real coefficients, then any non-real complex root must appear with its complex conjugate as another root

So if $a + bi$, $b \neq 0$, is a root, then $a - bi$ is also a root
Why?

Suppose the polynomial is

$$P(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0, \quad a_k \in \mathbb{R}$$

Now assume that zC is a root:

$$P(z) = a_n z^n + a_{n-1} z^{n-1} + \dots + a_1 z + a_0 = 0$$

This is essentially saying that if we plug in $z \in \mathbb{C}$ into $P$, the result is 0, which means $z$ is a solution to the polynomial

Then if we take the conjugate of both sides:

$$\overline{P(z)} = \bar{0}$$

However, $\bar{0} = 0$

And because the coefficients are real:

$$\overline{P(z)} = a_n \bar{z}^n + a_{n-1} \bar{z}^{n-1} + \dots + a_1 \bar{z} + a_0 = P(\bar{z})$$

So we've shown that $$ P(\bar{z}) =0 $$
Hence, if z is a root, so is z¯

Solving roots in polar form


1. Real n space Rn

Definition The real n-space $\mathbb{R}^n$ is the collection of n-tuples of real numbers

$$\vec{v} = (x_1, x_2, \dots, x_n), \quad x_i \in \mathbb{R}$$

Eg:
R0 = a single point
R1 = the real number line
R2 = the plane
Etc…

Rn can also be viewed as set of points

Scalar/Dot product

A scalar product (aka dot product) of two vectors returns a scalar (i.e a real or a complex number)

For $\vec{u} = (u_1, u_2, \dots, u_n)$ and $\vec{v} = (v_1, v_2, \dots, v_n)$ in $\mathbb{R}^n$, we define

$$\vec{u} \cdot \vec{v} = u_1 v_1 + u_2 v_2 + \dots + u_n v_n = \sum_{i=1}^{n} u_i v_i$$

Properties of scalar product

Definition Orthogonal vectors - vectors at right angles

$$\vec{a} \cdot \vec{b} = 0$$

Norm of a vector

Norm is basically the magnitude or length of a vector

$$||\vec{v}|| = \sqrt{\vec{v} \cdot \vec{v}}$$

Unit vector - A vector with the magnitude of 1

$$\hat{a} = \frac{\vec{a}}{||\vec{a}||}$$

Generalized Pythagoras theorem:

$$|| \vec{a} + \vec{b} ||^2 = ||\vec{a}||^2 + ||\vec{b}||^2 \quad \text{for orthogonal } \vec{a}, \vec{b}$$

Projections

Intuitively, imagine we have two vectors: a and b, and we want to know how much of a lies in the direction of b .

The projection of a vector a onto another vector b is the "shadow" of a in the direction of b. It tells you how much of a lies along b.

Now, the problem is, a might not lie perfectly along b, it might be slanted, so we want to find the part of a that does lie along it, and that's what projection is.

We know that any point on the line through $\vec{b}$ must look like a multiple of $\vec{b}$. So every point on that line can be written as:

$$\lambda \vec{b}, \quad \lambda \in \mathbb{R}$$

We need to find a λ multiple of b gives the point on that line where a drops a perpendicular, so we need a scalar λ such that the point P on the line through b satisfies the equation:

$$\vec{OP} = \lambda \vec{b}$$

and the vector from $P$ to the tip of $\vec{a}$ (which is $\vec{a} - \lambda\vec{b}$) is perpendicular to $\vec{b}$

Since perpendicular vectors have dot product = 0, we have

$$(\vec{a} - \lambda\vec{b}) \cdot \vec{b} = 0 \implies \vec{a} \cdot \vec{b} - \lambda\, \vec{b} \cdot \vec{b} = 0 \implies \lambda = \frac{\vec{a} \cdot \vec{b}}{\vec{b} \cdot \vec{b}}$$

Therefore, the component of $\vec{a}$ along $\vec{b}$ is the scalar $\frac{\vec{a} \cdot \vec{b}}{|\vec{b}|^2}$, and the projection is the vector $\lambda\vec{b}$
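The derivation above can be sketched in Python (hypothetical helper names `dot` and `proj`, not from the notes); the residual $\vec{a} - \lambda\vec{b}$ should come out perpendicular to $\vec{b}$:

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def proj(a, b):
    # lambda = (a . b) / (b . b); projection is lambda * b
    lam = dot(a, b) / dot(b, b)
    return [lam * x for x in b]

a, b = [3.0, 4.0], [1.0, 0.0]
p = proj(a, b)                       # shadow of a on the x-axis
perp = [x - y for x, y in zip(a, p)] # a minus its projection
assert abs(dot(perp, b)) < 1e-12     # residual is perpendicular to b
```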

Cauchy-Schwarz Inequality

Using the dot product definition, we have

$$\vec{a} \cdot \vec{b} = |\vec{a}||\vec{b}|\cos\theta$$

Since $|\cos\theta| \leq 1$, we immediately get:

$$|\vec{a} \cdot \vec{b}| \leq |\vec{a}||\vec{b}|$$

This is the Cauchy-Schwarz inequality, and it essentially means that a projection (shadow) can never be longer than the actual vector.

Triangle inequality

The triangle inequality here below can be proved using the Cauchy-Schwarz inequality:

$$|\vec{a} + \vec{b}|^2 = |\vec{a}|^2 + 2(\vec{a} \cdot \vec{b}) + |\vec{b}|^2$$

After expanding the LHS, we can now apply the Cauchy-Schwarz inequality

$$\vec{a} \cdot \vec{b} \leq |\vec{a}||\vec{b}|$$

So we have

$$|\vec{a} + \vec{b}|^2 \leq |\vec{a}|^2 + 2|\vec{a}||\vec{b}| + |\vec{b}|^2 = (|\vec{a}| + |\vec{b}|)^2$$

After taking square roots, we have:

$$|\vec{a} + \vec{b}| \leq |\vec{a}| + |\vec{b}|$$
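Both inequalities can be spot-checked numerically; a small Python sketch (my own, with random test vectors, not from the notes):

```python
import math
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

random.seed(0)
for _ in range(100):
    a = [random.uniform(-5, 5) for _ in range(3)]
    b = [random.uniform(-5, 5) for _ in range(3)]
    # Cauchy-Schwarz: |a . b| <= |a||b|
    assert abs(dot(a, b)) <= norm(a) * norm(b) + 1e-9
    # triangle inequality: |a + b| <= |a| + |b|
    s = [x + y for x, y in zip(a, b)]
    assert norm(s) <= norm(a) + norm(b) + 1e-9
```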

Equation of a Line in Rn

To define a line, we need two main things: a point on the line, and a direction vector.

There are a few ways to describe lines.

Vector equation

The vector equation of a line is the most geometric form, if the line passes through the point A and is parallel to the direction of vector d, then every r on the line can be reached by:

$$\vec{r} = \vec{a} + \lambda\vec{d}$$

where λ is a scalar that moves along the line. Intuitively, we're starting at a and then moving in direction d for some distance λ

Parametric equation

This is just a component form of the vector equation, but it's useful when working with intersections or substituting into other equations for λ.

λR is essentially the parameter controlling position along the line (and yes that's why it's parametric)

If we write vectors in components:

$$\vec{a} = (x_1, y_1, z_1), \quad \vec{d} = (a, b, c), \quad \vec{r} = (x, y, z)$$

Then

$$(x, y, z) = (x_1, y_1, z_1) + \lambda(a, b, c)$$

This gives the parametric equations of the line:

$$x = x_1 + a\lambda, \quad y = y_1 + b\lambda, \quad z = z_1 + c\lambda$$

Cartesian equation

The cartesian form eliminates λ and expresses the relation between x,y,z. It's also useful when finding where two lines/planes intersect.

Converting from parametric to Cartesian equation example

When converting from parametric to cartesian, our goal is to eliminate λ

So given our three parametric equations above, we can solve each of them for λ to get our cartesian equation of a line, which in general form is:

$$\frac{x - x_1}{a} = \frac{y - y_1}{b} = \frac{z - z_1}{c}$$

Geometrically speaking, we get a ratio of how much we've moved along the line (scaled by λ) in each coordinate direction

Parallel, Intersection, Skew

In $\mathbb{R}^2$, lines are either parallel or they intersect at a point. But in $\mathbb{R}^3$, lines can be parallel, intersecting, skew, or coincident.

It's useful to outline what each of them mean and what their test would look like:

Definition

Forming and solving systems of equations to determine the arrangement of lines in R3

Let the two lines L1 and L2 be given by:

$$L_1: \vec{r} = \vec{a}_1 + \lambda\vec{d}_1, \qquad L_2: \vec{r} = \vec{a}_2 + \mu\vec{d}_2$$

where

In component form, this gives a system of 3 linear simultaneous equations in the two unknowns λ and μ

$$x_1 + a_1\lambda = x_2 + a_2\mu, \quad y_1 + b_1\lambda = y_2 + b_2\mu, \quad z_1 + c_1\lambda = z_2 + c_2\mu$$

And we can either

Once we have our result, we can use our definitions to check the arrangement of lines in R3

Equation of a plane

Let's start with the intuition, what is a plane? (Yes it is something that flies..)

A plane is a flat 2D surface that extends infinitely in three dimensions. We can think of it as :

Definition A plane can be defined by a point a that lies on it, and a normal vector n that is perpendicular to the plane

Vector equation

So a plane is a set of all points r=(x,y,z) such that the vector (ra) lies orthogonal (or perpendicular) to n

so if r=(x,y,z) is a general point on the plane, then we have

$$\vec{n} \cdot (\vec{r} - \vec{a}) = 0$$

Check that this makes sense, the dot product being zero means that the vectors are perpendicular

Cartesian equation

Say the normal vector $\vec{n} = (A, B, C)$ and $\vec{a} = (x_0, y_0, z_0)$; then after computing the dot product and simplifying

$$Ax + By + Cz + D = 0$$

where $D = -(Ax_0 + By_0 + Cz_0)$
Also note that $$ \vec{r} \cdot \vec{n} = \vec{a} \cdot \vec{n} $$

Intuitively, this equation defines all the points (x,y,z) that live in a plane tilted such that its perpendicular points in the direction of (A,B,C)

There are pretty good 3b1b videos on Linear Algebra for some visual understanding. I often imagine solving problems in Grant Sanderson's voice to give myself some confidence when approaching questions

Parametric equation

As with lines, planes can also be defined using a point and two non-parallel direction vectors

So if a is a point on the plane, and b and c are two independent direction vectors lying in the plane, then any point on the plane can be written as:

$$\vec{r} = \vec{a} + \lambda\vec{b} + \mu\vec{c}$$

And every point on the plane is reachable by some combination of λ and μ

Vector product

To find perpendicular/orthogonal vectors, we need to have something that lets us calculate that, and this is where the vector product comes in. The ’trick’ so to speak of calculating this is similar to finding determinants in matrices

Let

$$\vec{a} = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} \quad \text{and} \quad \vec{b} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}$$

be two vectors in R3.

The vector product a×b is the vector in R3 with coordinates

$$\vec{a} \times \vec{b} = \begin{pmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{pmatrix}.$$
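The component formula is easy to get wrong by hand, so here is a small Python sketch (my own helper names, not from the notes) that computes it and checks orthogonality to both inputs:

```python
def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    # component formula for a x b
    return (a2 * b3 - a3 * b2,
            a3 * b1 - a1 * b3,
            a1 * b2 - a2 * b1)

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

a, b = (1, 2, 3), (4, 5, 6)
c = cross(a, b)
assert dot(c, a) == 0 and dot(c, b) == 0  # orthogonal to both a and b
```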

Vector product properties

When we take the cross product a×b, we get a new vector that's orthogonal to both a and b . It's magnitude is given by:

$$|\vec{a} \times \vec{b}| = |\vec{a}||\vec{b}|\sin\theta$$

Parallelograms...

Where θ is the angle between a and b. This is also the area of a parallelogram formed by the two vectors. And the direction of the cross product is orthogonal to the parallelogram.

Given that there are two directions to a plane, up and down, we need to know which direction the cross product points towards. And this is where the right hand rule comes in.

Now, if we flip the order, and instead do b×a, the magnitude remains the same, but the direction reverses

So we get:

$$\vec{a} \times \vec{b} = -\,\vec{b} \times \vec{a}$$

Geometrically speaking, the cross product represents the oriented area of the parallelogram spanned by a and b.

Intersection of Lines and Planes in R3

There are 3 cases for an intersection between a line and a plane:

The first two cases are pretty straightforward. And the final case can be checked by computing the normal vector to π and seeing if it’s orthogonal to the direction of L.

Now, the intersection of 3 planes in R3 has 4 possibilities:

Distances in R3

Point and a Plane

Given a point P and a plane π in R3, we want to find the shortest distance between a plane and a point

Say if we've got a plane:

n(ra)=0

and a point P with position vector p

Then the distance from P to the plane is the projection of (pa) onto the normal n :

$$d = \frac{|\vec{n} \cdot (\vec{p} - \vec{a})|}{|\vec{n}|}$$

Imagine standing above a plane with a flashlight pointing along the normal vector. Then the shadow of the position vector onto that normal is the perpendicular, i.e the shortest path.
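The distance formula above, as a Python sketch (the function name is mine, not from the notes), checked on an easy case where the plane is $z = 0$:

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def point_plane_distance(p, a, n):
    # d = |n . (p - a)| / |n|
    diff = [pi - ai for pi, ai in zip(p, a)]
    return abs(dot(n, diff)) / math.sqrt(dot(n, n))

# plane z = 0: a = origin lies on it, normal points along z
d = point_plane_distance((1.0, 2.0, 5.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
assert abs(d - 5.0) < 1e-12  # distance is just the z-coordinate
```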

Point and a line

Let's say we want to find the distance between the line $L: \vec{r} = \vec{A} + \lambda\vec{a}$ and the point $P$, both in $\mathbb{R}^3$

Using Pythagoras, we can show that the distance between $P$ and $L$ is equal to $|\vec{PN}|$, where $N$ is the point such that $\vec{PN}$ is perpendicular to $\vec{a}$

We can do this by considering projections. The projections of PA along a is

$$\mathrm{proj}_{\vec{a}}(\vec{PA}) = \frac{(\vec{PA} \cdot \vec{a})}{(\vec{a} \cdot \vec{a})}\,\vec{a}$$

The PA here is made up of two parts, the parallel part (projection) and the perpendicular part. So to get the perpendicular component, we have to subtract the projection:

$$\vec{PN} = \vec{PA} - \mathrm{proj}_{\vec{a}}(\vec{PA})$$

The distance is therefore given by the norm or the magnitude of both sides

$$||\vec{PN}|| = \left|\left| \vec{PA} - \frac{(\vec{PA} \cdot \vec{a})}{(\vec{a} \cdot \vec{a})}\,\vec{a} \right|\right|$$
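The subtract-the-projection recipe, as a Python sketch (my own function name), checked on the x-axis where the answer is obvious:

```python
import math

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def point_line_distance(p, A, a):
    # PA minus its projection onto the direction a = the perpendicular part
    PA = [pi - Ai for pi, Ai in zip(p, A)]
    lam = dot(PA, a) / dot(a, a)
    perp = [x - lam * y for x, y in zip(PA, a)]
    return math.sqrt(dot(perp, perp))

# distance from (0, 3, 0) to the x-axis through the origin is 3
d = point_line_distance((0.0, 3.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
assert abs(d - 3.0) < 1e-12
```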

2. Matrix Algebra

Definitions

Definition Suppose that m,nN are natural numbers. 
An m×n matrix A is a rectangular array with m rows and n columns:

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}.$$

The coefficients $a_{11}, a_{12}, \dots, a_{mn}$ are called the entries of the matrix.
For any $i \in \{1, \dots, m\}$ and $j \in \{1, \dots, n\}$, the entry $a_{ij}$ is said to be the $(i,j)$-th element (or the $(i,j)$-th entry) of $A$.

$(A)_{ij}$ is used to denote $a_{ij}$.

Definition Zero matrix
Size m×n matrix in which all entries are 0.

Definition Diagonal matrix
A square matrix A is diagonal if aij=0 whenever ij 
(i.e., all non-diagonal entries are zero):

$$A = \begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{pmatrix}$$

Definition Identity matrix
An n×n diagonal matrix with 1's on the diagonals. Denoted by In

$$I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}_{n \times n}$$

Thus $I_n = (\delta_{ij})$, where

$$\delta_{ij} = \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{if } i \neq j. \end{cases}$$

Addition and subtraction of matrices are only defined for matrices of the same size, but multiplication, as we will see below, has different requirements

Why can’t we add or subtract matrices that aren’t the same size??

Matrix multiplication

Let A be an m×n matrix and B be a n×p matrix.

then C=AB is the m×p matrix defined as:

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$

The matrix product is essentially built from the row vectors of $A$ and the column vectors of $B$: each entry works like a dot product.

$$C = AB = \begin{pmatrix} \vec{a}_1 \cdot \vec{b}_1 & \vec{a}_1 \cdot \vec{b}_2 & \cdots & \vec{a}_1 \cdot \vec{b}_p \\ \vec{a}_2 \cdot \vec{b}_1 & \vec{a}_2 \cdot \vec{b}_2 & \cdots & \vec{a}_2 \cdot \vec{b}_p \\ \vdots & & & \vdots \\ \vec{a}_m \cdot \vec{b}_1 & \vec{a}_m \cdot \vec{b}_2 & \cdots & \vec{a}_m \cdot \vec{b}_p \end{pmatrix}_{m \times p}$$

So, cij=aibj, the (i,j)-th entry of AB is equal to the scalar product of the i-th row of A with the j-th column of B
So the product AB is only defined if the number of columns in A equals the number of rows in B
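The row-times-column rule, as a short Python sketch (a naive triple loop via comprehensions, not from the course):

```python
def matmul(A, B):
    m, n, p = len(A), len(B), len(B[0])
    # columns of A must equal rows of B
    assert len(A[0]) == n
    # c_ij = sum over k of a_ik * b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert matmul(A, B) == [[19, 22], [43, 50]]
```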

Properties:

Transposition of a Matrix

Let $A = (a_{ij})$ be an $m \times n$ matrix. The transpose of $A$ is the $n \times m$ matrix $A^T$ defined by

$$(A^T)_{ij} = (A)_{ji} = a_{ji}, \quad \text{for all } i, j, \ 1 \leq i \leq n, \ 1 \leq j \leq m.$$

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}_{m \times n} \quad \text{then} \quad A^T = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & & & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix}_{n \times m}.$$

In other words, rows of A become columns of AT, and columns of A become rows of AT.

Example:

$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}^T = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}^T = \begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{pmatrix}.$$

If A=AT, then the matrix A is said to be symmetric

Properties

Inverse of a Matrix

For numbers, the inverse "undoes" the effect. Eg: $$ 3 \times \frac{1}{3} = 1 $$
For matrices, the inverse works in the same way: it undoes the transformation that the matrix does:
So $$ AA^{-1} = I$$
Where I is the identity (the "do nothing") matrix

An n×n matrix A is said to be invertible if there exists an n×n matrix B such that

$$AB = I_n \quad \text{and} \quad BA = I_n,$$

where

$$I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$

is the n×n identity matrix. In this case, B is called the inverse of A, and is denoted B=A1.
Let A be a 2×2 matrix with adbc0.

A=(abcd)

Then A is invertible and

$$A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$
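The $2 \times 2$ formula, as a Python sketch (function name mine); it swaps the diagonal, negates the off-diagonal, and divides by the determinant:

```python
def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    assert det != 0, "matrix is not invertible"
    s = 1 / det
    # swap a and d, negate b and c, scale by 1/det
    return [[ s * d, -s * b],
            [-s * c,  s * a]]

A = [[2, 1], [1, 1]]            # det = 2*1 - 1*1 = 1
Ainv = inverse_2x2(A)
assert Ainv == [[1.0, -1.0], [-1.0, 2.0]]
```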

Properties

Let A and B be invertible n×n matrices. Then the following are true:

Powers of a Matrix

If A is a square matrix (n×n), then we can multiply it by itself:

$$A^2 = AA, \quad A^3 = AAA, \quad \dots$$

3. Systems of Linear Equations

A system of m equations in n unknowns can be written as:
And in matrix form:

$$A\vec{x} = \vec{b}$$

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n \end{pmatrix}.$$

Where:

Augmented Matrix

The augmented matrix of system is

$$(A \mid \vec{b}) = \left(\begin{array}{ccc|c} a_{11} & \cdots & a_{1n} & b_1 \\ \vdots & & \vdots & \vdots \\ a_{m1} & \cdots & a_{mn} & b_m \end{array}\right)$$

Row Operations

There are 3 allowed row operations that let us simplify systems without changing the solution set, i.e. the resulting matrices are row equivalent

Row Scaling

$$R_i \mapsto \lambda R_i, \quad \lambda \neq 0$$

Row interchange (swap)

$$R_i \leftrightarrow R_j$$

Row replacement

$$R_i \mapsto R_i + \mu R_j, \quad i \neq j, \quad \mu \in \mathbb{R}$$

Gaussian Elimination

Consider we're trying to solve a system of simultaneous linear equations like

$$\begin{cases} x + 2y - z = 3 \\ 2x + y + z = 8 \\ 3x + y + 2z = 5 \end{cases}$$

Instead of solving one variable at a time using substitution, we can turn this into an augmented matrix.

So we instead have:

$$\left(\begin{array}{ccc|c} 1 & 2 & -1 & 3 \\ 2 & 1 & 1 & 8 \\ 3 & 1 & 2 & 5 \end{array}\right)$$

So each row is an equation, and each column, except the last one, is one variable's coefficients.

The idea now is to use the row operations listed above to reduce the augmented matrix to row echelon form (REF) or reduced row echelon form (RREF)

A pivot is the first non-zero entry in a row of a matrix

A matrix is in REF if:

A matrix is in RREF if:

Definition Gaussian Elimination is the method of solving linear equations by starting with $(A \mid \vec{b})$ and performing row operations to put $A$ into RREF, then reading off the solutions.

Each row in the matrix represents a plane in 3D (or a line in 2D or a hyperplane in higher dimensions). So Gaussian elimination is essentially rotating and scaling these planes until they're "aligned with the axes". The intersection point doesn't change, we just manipulate it to make it easier to understand

Reading solutions

Once our matrix is in REF, the augmented matrix will correspond to the solutions of the equation

$$\left(\begin{array}{ccc|c} 1 & 0 & 0 & \alpha \\ 0 & 1 & 0 & \beta \\ 0 & 0 & 1 & \gamma \end{array}\right) \implies x = \alpha, \quad y = \beta, \quad z = \gamma$$
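The whole elimination loop can be sketched in Python. This is my own toy Gauss-Jordan routine (not the course's algorithm statement), run on a hypothetical invertible system $x + y = 3$, $2x - y = 0$; each commented step is one of the three row operations:

```python
def solve(aug):
    """Gauss-Jordan on an augmented matrix [A | b] for invertible A."""
    n = len(aug)
    M = [row[:] for row in aug]
    for col in range(n):
        # row interchange: bring the largest pivot candidate up
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        # row scaling: make the pivot 1
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # row replacement: clear the column everywhere else
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    # left half is now the identity; read solutions off the last column
    return [row[-1] for row in M]

x = solve([[1.0, 1.0, 3.0],
           [2.0, -1.0, 0.0]])   # x = 1, y = 2
assert all(abs(v - e) < 1e-9 for v, e in zip(x, [1.0, 2.0]))
```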

Reducing a matrix to RREF

Our goal is to transform a given matrix A into a version where:

$$A_{RREF} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \\ 0 & 0 & \cdots & 0 \end{pmatrix}$$

This means that:

To reduce a matrix to RREF:

Matrix inverses using row operations

Definition A system of linear equations is consistent iff the augmented matrix, when put into REF, has no rows of the form $(0 \cdots 0 \mid b)$ with $b \neq 0$

Each matrix has a unique matrix in RREF

If A is invertible, then there is a unique solution given by x=A1b

Given any x with Ax=b

$$\vec{x} = I_n\vec{x} = (A^{-1}A)\vec{x} = A^{-1}(A\vec{x}) = A^{-1}\vec{b}$$

Here, the key idea is that row operations that turn A into I will also turn I into A1.

Definition Elementary matrix - A matrix obtained from the identity matrix In by doing ONE row operation. Doing row operations is the same as multiplying by elementary matrices. One of the problem sheets goes through this proof

Each elementary matrix is invertible, and its inverse is an elementary matrix of the same type. I.e. if we multiply any matrix $A$ by $E$, we perform that same row operation on $A$.

If A is an n×n matrix, then:

If A and B are n×n matrices and AB=In, then both A and B are invertible and A1=B and B1=A

Algorithm for finding inverses using row operations

Suppose we have an n×n matrix A and we want to find A1 or show that it is not invertible

We first form an augmented matrix:

$$(A \mid I_n)_{n \times 2n}$$

Then we apply row operations to this matrix to bring the left half $A$ to $I_n$. If this works, then we get $(I_n \mid A^{-1})$, so $A$ is invertible. If it doesn't work, then $A$ is not invertible

Since we can get from $A$ to $I_n$ by applying finitely many row operations, there are elementary $n \times n$ matrices $E_1, E_2, \dots, E_k$ s.t.

$$E_k(\cdots E_2(E_1 A)) = I_n \implies (E_k \cdots E_1)A = I_n \implies BA = I_n \implies B = A^{-1}$$

where $B = E_k \cdots E_1$

Why the RHS equals A1

As each row operation can be represented by an elementary matrix E, when we do all of them in sequence, we're multiplying their product:

$$E_k E_{k-1} \cdots E_1 A = I \implies E_k E_{k-1} \cdots E_1 = A^{-1}$$

But in the augmented matrix form, we've been applying those same exact operations to the identity matrix on the RHS, so the RHS ends up being:

$$E_k E_{k-1} \cdots E_1 I = A^{-1}$$

And therefore:

$$(A \mid I_n) \to (I_n \mid A^{-1})$$

Example

Suppose we want to find the inverse of the matrix:

$$A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$$

First we write it in the augmented form:

$$\left(\begin{array}{cc|cc} 2 & 1 & 1 & 0 \\ 1 & 1 & 0 & 1 \end{array}\right)$$

We want to scale so that the first pivot is 1, so we do $R_1 \mapsto R_1 - R_2$. Remember to perform the same row operations on the RHS as well

$$\left(\begin{array}{cc|cc} 1 & 0 & 1 & -1 \\ 1 & 1 & 0 & 1 \end{array}\right)$$

Next, we want to eliminate below the pivot: $R_2 \mapsto R_2 - R_1$

$$\left(\begin{array}{cc|cc} 1 & 0 & 1 & -1 \\ 0 & 1 & -1 & 2 \end{array}\right)$$

Now, as the LHS is I2, this means the RHS is by definition A1. And we can check this by computing:

$$AA^{-1} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I_2$$
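That final check is easy to script; a quick Python verification of the worked example (not from the notes):

```python
# multiply A by the inverse found in the example and check we get I_2
A    = [[2, 1], [1, 1]]
Ainv = [[1, -1], [-1, 2]]

product = [[sum(A[i][k] * Ainv[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
assert product == [[1, 0], [0, 1]]
```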

Rank of a Matrix

Definition Given a m×n matrix A, the rank of A is the number of non-zero rows in any REF of A

The rank essentially measures how many independent rows or columns a matrix has. So for an $n \times n$ matrix, having rank $n$ means that all its columns and rows are linearly independent. So in this case:

Therefore, if $A$ is an $n \times n$ matrix, then $A$ is invertible $\iff \mathrm{rank}(A) = n$
This is also why if there is a zero row in an RREF form, the inverse of a matrix does not exist

Properties of a Rank

Let A be a m×n matrix and B be any matrix:

4. Determinants

Mn(R) - the set of all n×n matrices with real entries. We don't start by just saying what the determinant is, but what kind of function it must be. So the axiomatic definitions are given below. Note that these are slightly less formal than what's in the official lecture notes, but that's mostly down to making it easier and more intuitive to understand.

Definition
A function D:Mn(R)R is a determinant if the following 3 conditions hold:

1. Linearity in each row

If we multiply a row by a scalar $\lambda$, then the determinant scales by $\lambda$.
And the determinant is additive in each row: if one row is a sum of two rows (with all other rows fixed), the determinant splits as the sum of the two corresponding determinants.

$$D(\dots, \lambda\vec{r}_i, \dots) = \lambda D(\dots, \vec{r}_i, \dots)$$
$$D(\dots, \vec{r}_i + \vec{r}_i{}', \dots) = D(\dots, \vec{r}_i, \dots) + D(\dots, \vec{r}_i{}', \dots)$$

2. Alternating

If $A \in M_n(\mathbb{R})$ has two equal rows, then $D(A) = 0$

So if two rows are equal, then the determinant is 0
Swapping two rows flips the sign of the determinant

3. Normalisation

$$D(I_n) = 1, \quad I_n = n \times n \text{ identity matrix}$$

Using the axioms above, we can further prove the row operations and their effects on matrices

So

$$D(B) = \alpha D(A)$$

where $\alpha$ depends on the type of row operation.

Determinants of Upper Triangular Matrices

Definition

An upper triangular matrix is a matrix where all entries below the diagonal are zero:

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & 0 & a_{33} & \cdots & a_{3n} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} \end{pmatrix}$$

So if A is an upper triangular matrix, then the determinant of A is the product of the diagonal entries of A:

$$D(A) = a_{11} a_{22} a_{33} \cdots a_{nn}$$

Intuitively, the matrix A represents a transformation that:

Each diagonal entry $a_{ii}$ is essentially a scaling of one axis direction. So if we started with the identity matrix (which has $D = 1$) and just scaled each axis by those numbers, the volume (which is basically what a determinant is) would be the product of all the diagonal entries. And shearing doesn't change the determinant, as it's the same idea as adding a multiple of one row to another.

A more formal proof can be done by using row operations and transforming the above matrix into In, and all those transformations involve row replacement options only.

Determinant of any square matrix

Once we have the facts above, we now have an algorithm to compute the determinant of any square matrix:

To compute D(A):

Notation wise, determinant of A = |A|=detA

Proof for a 2×2 matrix

For

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \quad \det(A) = ad - bc$$

Geometrically speaking, in $\mathbb{R}^2$, the two column vectors of $A$ span a parallelogram, and the area of that parallelogram is $|ad - bc|$

Now, for the more formal proof, we can compute det(A) by performing row operations until we make A upper triangular, and then take the product of the diagonal entries

We can do the row operation $R_2 \mapsto R_2 - \frac{c}{a}R_1$ (assuming $a \neq 0$), which gives us:

$$B = \begin{pmatrix} a & b \\ 0 & d - \frac{c}{a}b \end{pmatrix}$$

From the property of determinants, the row replacement does not change the determinant, so det(B)=det(A). Now as B is upper triangular, we can calculate the determinant by:

$$\det(B) = a\left(d - \frac{c}{a}b\right) = ad - bc$$

Therefore, det(A)=adbc

The determinant of a matrix with integer entries is always an integer

Existence and Uniqueness

$\forall n \in \mathbb{N}$, $\exists$ a unique determinant $D: M_n(\mathbb{R}) \to \mathbb{R}$ satisfying the axiomatic definitions of determinants

This will be proved in Linear Algebra II by giving a different definition of determinant. So the lesson is to never trust anything.

Determinants and Invertibility

An $n \times n$ matrix $A$ is invertible $\iff \det(A) \neq 0$

Proof

Starting with the $\implies$ direction:

If $A$ is invertible, then $\exists A^{-1}$ s.t. $AA^{-1} = I_n$

Now we can take the determinant of both sides

$$\det(AA^{-1}) = \det(I_n) \implies \det(A)\det(A^{-1}) = 1 \implies \det(A) = \frac{1}{\det(A^{-1})}$$

This means that det(A)0 as 1/0 is not defined

Now for the $\impliedby$ direction.
For a bit of intuition, recall that the determinant is geometrically the scaling factor of volume when $A$ acts as a linear transformation. So if $\det(A) = 0$, then the volume collapses to a lower dimension, and that transformation is not reversible.

Now for the algebraic proof. We know that when we apply Gaussian elimination to turn A into an upper triangular matrix (U):

$$A = E_k \cdots E_2 E_1 U$$

Each Ei is an elementary matrix which is invertible. So we have that:

$$\det(A) = (\det(E_k) \cdots \det(E_1))\det(U)$$

Now, if $\det(A) \neq 0$, then $\det(U) \neq 0$.
But $\det(U)$ is the product of the diagonal entries, so none of them can be zero. This means that all pivot positions in Gaussian elimination are non-zero, which means we can reduce $A$ all the way down to $I_n$. Hence $A$ is invertible

There is another proof by contrapositive, given in the lectures notes.
So if we can prove that A is not invertible, then det(A)=0, that will be same as proving the original statement

So if $A$ is not invertible, it means $\mathrm{rank}(A) < n$. This means that any REF form $B$ of $A$ has a row of zeroes. But $A$ can be reduced to this REF form via a finite sequence of row operations, so $\det(B) = \beta \det(A)$ for some $\beta \neq 0$.

$$\det(A) = \frac{1}{\beta}\det(B) = 0 \quad \text{since } B \text{ has a zero row}$$

And if a matrix has a zero row, the determinant will be 0, as it's the product of the diagonals of an upper triangular matrix.

Determinant using Cofactors

Now that we know how to calculate the determinant of a $2 \times 2$ matrix, we want a formula that still captures the same alternating and linear properties. The cofactor expansion can be thought of as a recursive way to do this: it breaks a big determinant down into smaller ones (minors)

So if A is an n×n matrix, then det(A) is defined in terms of (n1)×(n1) determinants.

For some definitions, consider an n×n matrix A=(aij)
Then:

$$C_{ij} = (-1)^{i+j} M_{ij}$$

So to find the determinant, we pick a row/column, say row 1, then for each element a1j in that row:

So formally, if

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

Then

$$\det(A) = \sum_{j=1}^{n} a_{1j} C_{1j} = a_{11} C_{11} + a_{12} C_{12} + \dots + a_{1n} C_{1n}$$

Where

$$C_{1j} = (-1)^{1+j} \det(M_{1j})$$

And M1j is the minor obtained by deleting row 1 and column j

So the formula for any n×n matrix A=(aij), expand along the i-th row:

$$\det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} \det(M_{ij})$$

The sign pattern comes from one of the determinant axiom which tells us that if we swap two rows, the determinant changes the sign.

As we're expanding the determinant along a row, we're sort of "isolating" one element aij and its minor. Intuitively, we're bringing aij up to position by moving rows/columns around, and to account for that movement, we multiply by (1)i+j

This is quite efficient when we have a matrix with lots of zero entries
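The recursion is short enough to write out; a Python sketch of Laplace expansion along the first row (my own naive implementation, exponential-time and only for small matrices), checked against the $2 \times 2$ formula and the diagonal-product rule:

```python
def det(A):
    """Cofactor (Laplace) expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # minor M_1j: delete row 1 and column j
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # (-1)^(1+j) in 1-indexed notation is (-1)^j for 0-indexed j
        total += (-1) ** j * A[0][j] * det(minor)
    return total

assert det([[1, 2], [3, 4]]) == -2                     # ad - bc
assert det([[2, 0, 0], [0, 3, 0], [0, 0, 4]]) == 24    # product of diagonal
```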

#explainfuture

Determinant of a Transpose

For any square matrix A,

$$\det(A^T) = \det(A)$$

What this means is that transposing a matrix (i.e. flipping rows and columns) does not change its determinant

Intuitively, if we think determinant as the volume scaling factor, then transposing it just changes our point of view, so instead of looking at how A acts on row vectors, you look at how it acts on column vectors. Hence, the volume scaling remains the same.

The other way to think about it uses the fact that (for a triangular matrix) the determinant is the product of the diagonal entries, and when we transpose a square matrix, the diagonal stays the same, hence the determinant is the same.

Determinant of a Product

If A,B are n×n matrices, then det(AB)=det(A)det(B)

Geometrically again, scaling a volume by a factor of det(A) and scaling it by det(B) is the same as scaling it by det(A)det(B)

More formally speaking, we can think of every invertible matrix A as a product of elementary matrices:

A=EkEk1E1

And as each elementary matrix corresponds to a single row operation, and because we know how determinant changes under row operations:

$$\det(E_i A) = \det(E_i)\det(A)$$

and since every invertible matrix can be built from the elementary matrices, we have that:

$$\det(AB) = \det(A)\det(B)$$

The proof in the notes is bit more detailed and uses the 3 cases of the elementary matrices, but the idea is the same.

Powers of a matrix

$$\det(A^k) = (\det A)^k, \quad k \in \mathbb{Z} \qquad \det(A^{-1}) = \frac{1}{\det(A)}$$

The proof is rather trivial, and can be proved by induction if wanted as an exercise to the reader

Inverting a Matrix using Cofactors

This section is skipped as it will not be examined

5. Linear Transformations

This is where the proper linear algebra starts. A linear transformation is just taking a function from one vector space to another, following a couple of specific rules, namely additivity and scaling.

Definition Let $m, n \in \mathbb{N}$, and let $T: \mathbb{R}^n \to \mathbb{R}^m$ be a function. Then $T$ is a linear transformation if $\forall \vec{u}, \vec{v} \in \mathbb{R}^n$ and $\lambda \in \mathbb{R}$ it satisfies two properties, additivity and homogeneity:

$$T(\vec{u} + \vec{v}) = T(\vec{u}) + T(\vec{v}), \qquad T(\lambda\vec{u}) = \lambda T(\vec{u})$$

Intuitively, a linear transform never bends, curves, or shifts space, it can only scale, rotate, shear, reflect, etc. So essentially, if the origin doesn't stay fixed, then the transformation is not linear.

The most important thing is probably this: every linear transformation is a matrix multiplication.
So if $A$ is an $m \times n$ matrix, then

$T: \mathbb{R}^n \to \mathbb{R}^m$ defined by $T(\vec{x}) = A\vec{x}$, $\forall \vec{x} \in \mathbb{R}^n$, is a linear transformation

So every linear transform, is a multiplication by some matrix A. So if we know what the transformation does to the basis vectors, we know what it does to every vector in the space. The proof of this is trivial, and is explained in the notes.

Now that we know that T:RnRm, is a linear transform, then standard basis vectors of Rn can help us determine the whole transformation.

Consider the standard basis vectors in Rn :

$$\vec{e}_i = (0, \dots, 0, 1, 0, \dots, 0)$$

where the 1 is in the $i$-th entry

And we know that any vectors x can be written in terms of the basis vectors:

$$\vec{x} = x_1\vec{e}_1 + x_2\vec{e}_2 + \dots + x_n\vec{e}_n$$

So if $T$ is linear, then

$$T(\vec{x}) = x_1 T(\vec{e}_1) + x_2 T(\vec{e}_2) + \dots + x_n T(\vec{e}_n) = \begin{pmatrix} T(\vec{e}_1) & \cdots & T(\vec{e}_n) \end{pmatrix} \vec{x} = A\vec{x}$$

Hence just showing the basis vectors, is enough to define the entire transformation. It's like taking the unit square and seeing how it changes under a matrix.

Linear Transformation in R2

Every linear transformation in R2 is completely determined by: T(e1),T(e2), where

$$\vec{e}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \vec{e}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

So to understand a transformation, we need to think about what it does to e1,e2

Rotation about the origin

Let T:R2R2 be the rotation by θ[0,2π) anti-clockwise about the origin:

Now think about what happens to e1=(1,0), when rotating, it lands at (cosθ,sinθ)
And when $\vec{e}_2 = (0, 1)$ is rotated, it lands at $(\cos(\theta + \pi/2), \sin(\theta + \pi/2))$, which simplified is $(-\sin\theta, \cos\theta)$.
Therefore the matrix of rotation is given by:

$$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

We can check that rotation preserves area: the determinant of this matrix is in fact $\cos^2\theta + \sin^2\theta = 1$

And therefore the matrix of linear transformation is given by:

$$T\begin{pmatrix} x \\ y \end{pmatrix} = A\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{pmatrix}, \quad \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2$$
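The rotation matrix in a quick Python sketch (my own helper, not from the notes): rotate $\vec{e}_1$ by $\pi/2$ and confirm it lands at $(0, 1)$, and confirm the determinant is 1:

```python
import math

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s,  c]]

theta = math.pi / 2
R = rotation(theta)
# e1 = (1, 0) lands at (cos theta, sin theta); for theta = pi/2 that's (0, 1)
x, y = 1.0, 0.0
Tx = (R[0][0] * x + R[0][1] * y, R[1][0] * x + R[1][1] * y)
assert abs(Tx[0]) < 1e-12 and abs(Tx[1] - 1.0) < 1e-12
# determinant is cos^2 + sin^2 = 1, so area is preserved
assert abs(R[0][0] * R[1][1] - R[0][1] * R[1][0] - 1.0) < 1e-12
```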

Reflection in a line through origin

Note that a reflection keeps points on the line fixed, and flips points perpendicular to the line. There is a diagram in the notes which makes it easy to see why it's $2\theta$, but another way to think about it is to consider the basis vectors and how they change.

A reflection in the line at angle $\theta$ is the same as a rotation by $-\theta$ to align the line with the x axis, then a reflection across the x axis, then a rotation back by $\theta$

The matrix is therefore given by:

$$T = \begin{pmatrix} \cos(2\theta) & \sin(2\theta) \\ \sin(2\theta) & -\cos(2\theta) \end{pmatrix}$$

The determinant of this matrix, when calculated, is 1

Stretch/shrink

A scaling just has the effect of multiplying axes. So the matrix here is self-explanatory:

$$T = \begin{pmatrix} \lambda & 0 \\ 0 & \mu \end{pmatrix}$$

T stretches/shrinks the plane by λ in the direction of the x axis and by μ in the y axis

Shear

$$\begin{pmatrix} 1 & \lambda \\ 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 \\ \mu & 1 \end{pmatrix}$$

It keeps one of the axes fixed, and moves all other points parallel to that axis.

Composition of Linear Transformation

Composition of transformations is the same as what it's like in functions:

$$(S \circ T)(\vec{x}) = S(T(\vec{x}))$$

So we apply T first, then S. It's just like functions, because transformation is a function.

Matrix multiplication is the same as composing linear transformations. So if T(x)=Ax and S(x)=Bx, then

$$(S \circ T)(\vec{x}) = S(T(\vec{x})) = B(A\vec{x}) = (BA)\vec{x}$$

So the matrix of the composite transformation $S \circ T$ is $BA$. And that's why matrix multiplication is defined the way it is!

The composition of a linear transformation is a linear transformation. It can be proved using the definition of composition and the transformation definitions above

Finding linear transforms

Inverses of Linear Transformations

Definition A linear transformation T:RnRn is invertible if there exists a linear transformation S s.t.:

$$S \circ T = T \circ S = \mathrm{Id}$$

Like any inverse, an inverse transformation just undoes the original transformation.

Proof of the inverse of linear transforms

Think back to calculus: an inverse exists $\iff T$ is a bijection.

So suppose T is a bijection, and now we want to show that it has an inverse linear transformation.

So let S=T1

And now we need to show that S is linear so that satisfies the definition of a linear transformation, ie. we need to show that:

$$S(\vec{u} + \vec{v}) = S(\vec{u}) + S(\vec{v}), \qquad S(\lambda\vec{u}) = \lambda S(\vec{u})$$

So let a=S(u) and b=S(v), which lets us write T(a)=u and T(b)=v, and now we use this substitution.

Consider $T(a+b)$; using the linearity of $T$:

$$T(a+b)=T(a)+T(b)=u+v \qquad T(S(u+v))=u+v$$

As S is the inverse of T, we have:

T(S(u+v))=T(a+b)

As $T$ is a bijection, we know that it's also an injection, so we have that $T(x)=T(y)\implies x=y$. And by that logic:

$$S(u+v)=a+b=S(u)+S(v)$$

Now... for the scalar multiplication, consider T(λa)
As T is linear:

$$T(\lambda a)=\lambda T(a)=\lambda u \qquad T(S(\lambda u))=\lambda u=T(\lambda a)$$

Using the injectivity of T we have:

$$S(\lambda u)=\lambda a=\lambda S(u)$$

Invertibility

Let $T:\mathbb{R}^n\to\mathbb{R}^n$ with matrix $A$. Then $T$ is invertible $\iff$ $A$ is invertible, and if $T$ is invertible, then the matrix of $T^{-1}$ is $A^{-1}$

As this is an if-and-only-if statement, we need to prove both directions

For the forward direction, suppose $T$ is invertible; then there exists $S$ s.t.

$$S\circ T=\text{Id},\qquad T\circ S=\text{Id}$$

So let B be the matrix of S, and A be the matrix of T

By composition of linear transformations, the matrix of $S\circ T$ is $BA$, and the matrix of Id is $I_n$

So since $S\circ T=\text{Id}$:

BA=In

And since $T\circ S=\text{Id}$:

AB=In

Therefore we have that:

AB=BA=In

And by definition A is invertible, and its inverse is B

Now for the converse direction:

We assume that A is invertible, and now we need to show that T is invertible

Let S(x)=A1x, and now we compute the composition ST:

$$(S\circ T)(x)=S(T(x))=S(Ax)=A^{-1}(Ax)=(A^{-1}A)x=I_nx=x$$

So we know that $S\circ T=\text{Id}$

Now to compute the other composition to check for invertibility:

$$(T\circ S)(x)=T(S(x))=T(A^{-1}x)=A(A^{-1}x)=(AA^{-1})x=I_nx=x$$

So $T\circ S=\text{Id}$

And since both compositions give the identity, $S$ is the inverse of $T$; and since $S$ is linear and given by $A^{-1}$, the matrix of $T^{-1}$ is $A^{-1}$
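The two compositions above can be checked on a concrete matrix. A small pure-Python sketch (the matrix and its inverse here are illustrative, not from the notes):

```python
def mat_vec(A, x):
    """Apply a matrix (nested list of rows) to a vector."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

A = [[2, 1], [1, 1]]        # matrix of T; det = 2*1 - 1*1 = 1, so invertible
A_inv = [[1, -1], [-1, 2]]  # its inverse: the matrix of S = T^{-1}
x = [3, 4]

st = mat_vec(A_inv, mat_vec(A, x))  # (S∘T)(x) = A^{-1}(Ax)
ts = mat_vec(A, mat_vec(A_inv, x))  # (T∘S)(x) = A(A^{-1}x)
```

Both compositions return the original vector `x`, matching $S\circ T=T\circ S=\text{Id}$.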

6. Subspaces of Rn

Vector subspaces

Intuitively speaking, a vector subspace of $\mathbb{R}^n$ is just a subset of $\mathbb{R}^n$ that behaves like a vector space on its own: adding vectors in it, or scaling them, never takes you outside the subset. In particular, it must include the origin.

It has three properties:

- it contains the zero vector
- it is closed under addition: if $u,v$ are in the subspace, so is $u+v$
- it is closed under scalar multiplication: if $u$ is in the subspace and $\lambda\in\mathbb{R}$, then $\lambda u$ is too

So anytime we're given a subset, we just need to check the properties above to see if it's a subspace of $\mathbb{R}^n$ or not

Null Spaces

Definition

Let T:RnRm be a linear transformation. Then the null space (kernel) of T is the subspace Rn defined by:

$$N(T)=\{x\in\mathbb{R}^n \mid T(x)=0_{\mathbb{R}^m}\}$$

I.e. it's basically the set of all vectors that get mapped to the zero vector. Similarly, for an $m\times n$ matrix $A$:

$$N(A)=\{x\in\mathbb{R}^n \mid Ax=0_{\mathbb{R}^m}\}$$

The null space is a subspace of Rn, and to show that, we just need to show that it satisfies the three properties:

And by linearity of T and definition of a linear transformation, we can show that the null space/kernel is always a subspace

Invertibility and Null spaces

$$A\text{ is invertible}\iff\ker(A)=\{0\}$$

If the only solution to $Ax=0$ is $x=0$, then $A$ is invertible. Another way to think about it: if there is any non-zero vector $x\neq0$ with $Ax=0$, then $A$ collapses that direction to zero and cannot be invertible.
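A concrete instance of the second point, as a pure-Python sketch (the matrix is an illustrative singular example):

```python
A = [[1, 2], [2, 4]]  # second row is twice the first, so det(A) = 0
v = [2, -1]           # a non-zero vector in the kernel

Av = [A[0][0] * v[0] + A[0][1] * v[1],
      A[1][0] * v[0] + A[1][1] * v[1]]
# Av is the zero vector, so ker(A) ≠ {0} and A cannot be invertible
```

Since the non-zero direction `v` is sent to $0$, no transformation can undo $A$: two different inputs ($0$ and `v`) land on the same output.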

Linear Span

Linear span is just the set of all linear combinations of vectors, and this is always a subspace because linear combination rules automatically satisfy the subspace axioms

$$\text{span}\{v_1,\dots,v_k\}=\{c_1v_1+\cdots+c_kv_k \mid c_1,\dots,c_k\in\mathbb{R}\}$$

Range and Column Space

The column space of a matrix $A$ is the span of its columns $v_1,\dots,v_k$:

$$\text{col}(A)=\text{span}\{v_1,\dots,v_k\}$$

And the range is the subset of Rm of a transformation T:RnRm:

$$R(T)=\{T(x) \mid x\in\mathbb{R}^n\}$$

And the range is also a subspace, which can be tested using the properties of subspace again

Linear Independence

Definition

Vectors are linearly independent if:

$$c_1v_1+\cdots+c_kv_k=0 \implies c_1=c_2=\cdots=c_k=0$$

What this means is that no vector can be written as a combination of the others. So if the columns of a matrix are linearly dependent, at least one column is a combination of the other columns.

The trivial solution just means the all-zero solution, so a non-trivial solution means at least one coefficient is non-zero. Vectors are linearly independent $\iff$ only the trivial solution exists. If the only way to get $0$ is to make all the coefficients $0$, then the vectors all point in genuinely different directions.

Conversely, if we can combine the vectors with some non-zero coefficients to make $0$, then one of them must be a combination of the others.

So for a square matrix $A$, all of the conditions below are equivalent:

- $A$ is invertible
- the columns of $A$ are linearly independent
- $\ker(A)=\{0\}$
- $\det(A)\neq0$

Test for linear independence

To see if vectors are linearly independent or not, put them as the rows of a matrix, reduce it to REF, and count the non-zero rows: the vectors are independent $\iff$ every row is non-zero (equivalently, the rank equals the number of vectors)
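The REF test can be sketched in pure Python: row-reduce and count the pivots. This is an illustrative implementation, not the algorithm as stated in the lecture notes:

```python
def rank(rows, eps=1e-9):
    """Reduce a copy of the matrix to row echelon form and count
    the non-zero rows (= number of pivots)."""
    M = [row[:] for row in rows]
    r = 0  # next pivot row
    for c in range(len(M[0])):
        # find a row at or below r with a non-zero entry in column c
        piv = next((i for i in range(r, len(M)) if abs(M[i][c]) > eps), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        # eliminate the entries below the pivot
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

dependent = [[1, 0, 2], [0, 1, 1], [1, 1, 3]]    # third row = first + second
independent = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # standard basis vectors
```

For `dependent`, elimination produces a zero row, so the rank (2) is less than the number of vectors (3) and the set is linearly dependent.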

Bases

A basis is just a minimal set of vectors that can generate (span) the whole space with no redundancy. So they give you all possible directions in your space, with the smallest possible set of directions. A bit like choosing the axes of the space.

So in R2, the standard basis would be e1=(1,0) and e2=(0,1). And that means any vector (x,y) can be made by a combination of these two basis vectors

Definition
A collection of vectors $v_1,\dots,v_k$ forms a basis of a vector space $V$ if:

- they span $V$
- they are linearly independent

Bases are like the coordinate system for a vector space. The other important thing about a basis is that representations in it are unique: every vector $x$ has a unique expression as a linear combination of the basis vectors

7. Eigenvectors, Eigenvalues and their applications

A bit of a diversion, as in my notes I will try and focus on the intuitive idea and then the definition.

So consider a linear transformation T. Most vectors are rotated, stretched, squished, etc. So when we apply a matrix, most vectors change direction. But some vectors lie on their own span and don't change directions. I.e. they may stretch, shrink, flip, but they don't turn. And those no turning directions are what we call eigenvectors. And the amount they stretch/shrink by is the eigenvalue.

Definition
A number $\lambda\in\mathbb{R}$ is said to be an eigenvalue of $A$ if there is a non-zero vector $v\in\mathbb{R}^n$ s.t.

Av=λv

$A$ is the matrix representation of the transformation, $v$ is the eigenvector, and $\lambda$ is the eigenvalue

So eigenvectors are the directions a transformation preserves, and eigenvalues are the scaling factors for those directions.

Determinant criteria

Notice how the left-hand side of the equation is a matrix-vector product, but the right side is scalar multiplication. To turn the RHS into a matrix-vector product too, we can write $\lambda v=\lambda I_n v$, where $I_n$ is the identity matrix.

So we have:

Av=λIv

And we can rearrange and factor out v to get:

$$(A-\lambda I)v=0$$

When solving the equation above, all we're doing is looking for a non-zero vector that gets sent to zero by the linear transformation $A-\lambda I$

The reason for this is that the kernel of $A-\lambda I$ is just the set of directions that get sent to $0$, i.e. the directions in which the transformation collapses into a lower dimension. It's answering the question: "for what values of $\lambda$ does the equation have a non-zero solution?"

And this happens when the determinant is 0:

$$\lambda\in\mathbb{R}\text{ is an eigenvalue of }A\iff\det(A-\lambda I_n)=0$$

So an eigenvalue exists when the shifted matrix $A-\lambda I$ loses invertibility

Characteristic polynomial

$p_A(\lambda)=0$ (i.e. $\det(A-\lambda I_n)=0$) is the characteristic equation of $A$, where $p_A(\lambda)=\det(A-\lambda I_n)$ is the characteristic polynomial

There's a reason it's called the characteristic polynomial: it captures characteristic data of the matrix, since its roots are exactly the eigenvalues.

This basically helps us solve for the values of $\lambda$ for which the transformation collapses, which is all an eigenvalue is. The roots of this polynomial are exactly the values of $\lambda$ where $A-\lambda I$ is not invertible. So $p_A(\lambda)=0\iff\lambda$ is an eigenvalue
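For a $2\times2$ matrix the characteristic polynomial is $\lambda^2-\operatorname{tr}(A)\lambda+\det(A)$, so the eigenvalues come straight from the quadratic formula. A pure-Python sketch (the example matrix is illustrative):

```python
import math

def eigenvalues_2x2(A):
    """Roots of the characteristic polynomial
    det(A - λI) = λ² - tr(A)·λ + det(A) of a 2x2 matrix.
    Assumes the roots are real (always true for symmetric A)."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = math.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

A = [[2, 1], [1, 2]]
# characteristic polynomial: (2-λ)² - 1 = λ² - 4λ + 3 = (λ-3)(λ-1)
lams = eigenvalues_2x2(A)
```

The roots $3$ and $1$ are exactly the values of $\lambda$ at which $A-\lambda I$ becomes singular.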

Upper triangular matrices

Say we have an upper triangular matrix:

$$A=\begin{pmatrix}a_1 & * & *\\ 0 & a_2 & *\\ 0 & 0 & a_3\end{pmatrix}$$

And now consider the matrix AλI

$$A-\lambda I=\begin{pmatrix}a_1-\lambda & * & *\\ 0 & a_2-\lambda & *\\ 0 & 0 & a_3-\lambda\end{pmatrix}$$

This is still an upper triangular matrix! And the determinant of this matrix is the product of its diagonal entries.

$$\det(A-\lambda I)=(a_1-\lambda)(a_2-\lambda)\cdots(a_n-\lambda)$$

And if we set this equal to 0, the roots will be just: $\lambda=a_1,a_2,\dots,a_n$

Therefore, the diagonal entries of an upper triangular matrix are the eigenvalues of that matrix!
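This is easy to confirm numerically, since the determinant of a triangular matrix is just the product of its diagonal. A pure-Python sketch on an illustrative matrix:

```python
def det_triangular(U):
    """Determinant of a triangular matrix = product of its diagonal entries."""
    p = 1
    for i in range(len(U)):
        p *= U[i][i]
    return p

A = [[2, 5, 1],
     [0, 3, 4],
     [0, 0, 7]]

# det(A - λI) should vanish exactly at the diagonal entries 2, 3, 7
results = []
for lam in (2, 3, 7):
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(3)]
               for i in range(3)]  # A - λI is still upper triangular
    results.append(det_triangular(shifted))
```

Each shifted determinant is $0$, so the diagonal entries are precisely the eigenvalues.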

Applications in Google’s Page Ranking system

Read the MathsRant blog page!

Symmetric Matrices

A matrix $A$ is symmetric if $A=A^T$. Symmetric matrices correspond to transformations that don't twist space, only stretch, squash or reflect it

Let $A$ be an $n\times n$ symmetric matrix with real entries. Then:

- all the eigenvalues of $A$ are real
- $\mathbb{R}^n$ has a basis consisting of eigenvectors of $A$ (an eigenbasis)

We can also think of the dot product as a matrix product: $v\cdot w=v^Tw$, a scalar

This is a result of Spectral Theorem, but the proof was omitted.

So as $I_n$ is a symmetric matrix, the standard basis vectors are all eigenvectors of it (in fact every non-zero vector is). An eigenbasis is just a basis of the space consisting entirely of eigenvectors of $A$. So if $A$ has an eigenbasis, then every vector can be written as a combination of eigenvectors

We want to find linearly independent eigenvectors because a basis must by definition be linearly independent, and geometrically, we need enough directions to describe the whole space. If the eigenvectors are dependent, they don't span enough directions.

Multiplicity

Multiplicity just tells us how many times an eigenvalue appears as a root of the characteristic polynomial. This is algebraic multiplicity.
E.g. $(\lambda-2)^3(\lambda+1)$ has eigenvalue $2$ with algebraic multiplicity $3$

Then there is geometric multiplicity, which tells us the dimension of the eigenspace (number of independent eigenvectors).

The key rule is that $1\le$ geometric multiplicity $\le$ algebraic multiplicity

However for symmetric matrices, the geometric multiplicity = algebraic multiplicity

Orthogonality of eigenvectors - Thm 7.27

For symmetric matrices - eigenvectors corresponding to distinct eigenvalues are orthogonal.

Recall that two vectors being orthogonal means they meet at right angles and point in completely independent directions, so movement in one direction has no component in the other. For symmetric matrices, the transformation therefore stretches space along perpendicular directions.
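The theorem can be seen on a small symmetric matrix. A pure-Python check (the matrix and eigenvectors are an illustrative example, not from the notes):

```python
A = [[2, 1], [1, 2]]  # symmetric, with eigenvalues 3 and 1
v1 = [1, 1]           # eigenvector for λ = 3
v2 = [1, -1]          # eigenvector for λ = 1

# confirm Av = λv for each
Av1 = [A[0][0] * v1[0] + A[0][1] * v1[1], A[1][0] * v1[0] + A[1][1] * v1[1]]
Av2 = [A[0][0] * v2[0] + A[0][1] * v2[1], A[1][0] * v2[0] + A[1][1] * v2[1]]

dot = v1[0] * v2[0] + v1[1] * v2[1]  # 0: the eigenvectors are orthogonal
```

The eigenvalues $3\neq1$ are distinct, and indeed $v_1\cdot v_2=0$.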

Diagonalisation of a Matrix - Thm 7.30

At its core, diagonalisation is trying to answer a fairly simple question: can we find a coordinate system where the linear transformation acts in the simplest possible way? i.e. just stretching along axes without rotating or shearing?

So a diagonal matrix D is a matrix s.t.

$$D=\begin{pmatrix}\lambda_1 & 0 & 0\\ 0 & \lambda_2 & 0\\ 0 & 0 & \lambda_3\end{pmatrix}\qquad D\begin{pmatrix}x\\y\\z\end{pmatrix}=\begin{pmatrix}\lambda_1x\\\lambda_2y\\\lambda_3z\end{pmatrix}$$

Geometrically, each coordinate direction moves independently, and no direction interferes with another. Most matrices are not in this form, so diagonalisation is about changing to a basis that the transformation works well with: the eigenbasis

Recall that $Av=\lambda v$.
Applying $A$ to $v$ just stretches $v$, so eigenvectors are the directions where the matrix already behaves diagonally. So if we choose all our basis vectors to be eigenvectors, then the matrix must be diagonal in that basis.

Change of basis

Suppose that v1,...vn are all eigenvectors of A and they form a basis

Let P be a matrix with these eigenvectors as columns:

$$P=\begin{pmatrix}v_1 & v_2 & \cdots & v_n\end{pmatrix}$$

Then all $P$ does is convert coordinates in the eigenbasis into standard coordinates. So $P^{-1}$ converts standard coordinates into eigenbasis coordinates

Now consider applying $A$ to a vector in standard coordinates: convert into eigenbasis coordinates with $P^{-1}$, scale each coordinate with $D$, then convert back with $P$

Writing this as a composite transformation, we have that:

$$A=PDP^{-1}\iff D=P^{-1}AP$$

So finding a diagonal matrix just means finding independent directions (perpendicular, in the symmetric case) where the transformation acts by pure scaling.

Powers

Diagonalising also helps us compute large powers:

$$A=PDP^{-1}\implies A^k=PD^kP^{-1},\qquad D^k=P^{-1}A^kP=\begin{pmatrix}\lambda_1^k & 0 & 0\\ 0 & \lambda_2^k & 0\\ 0 & 0 & \lambda_3^k\end{pmatrix}$$

This is also why symmetric matrices always diagonalise: they have enough linearly independent eigenvectors
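The power trick can be verified in pure Python on the symmetric matrix $A=\begin{pmatrix}2&1\\1&2\end{pmatrix}$ (eigenvalues $3$ and $1$, eigenvectors $(1,1)$ and $(1,-1)$); this is a sketch with an illustrative matrix, not an example from the notes:

```python
def mat_mul(A, B):
    """Multiply two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 1], [1, 2]]               # symmetric, eigenvalues 3 and 1
P = [[1, 1], [1, -1]]              # columns are the eigenvectors
P_inv = [[0.5, 0.5], [0.5, -0.5]]  # inverse of P

k = 5
Dk = [[3**k, 0], [0, 1**k]]        # D^k: just raise the diagonal to the k

Ak = mat_mul(P, mat_mul(Dk, P_inv))  # A^k = P D^k P^{-1}

# compare with k-1 direct matrix multiplications
direct = A
for _ in range(k - 1):
    direct = mat_mul(direct, A)
```

Only the two diagonal entries get raised to the power $k$; the change of basis does the rest.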

8. Orthogonal sets and Quadratic forms

Orthogonal vectors are essentially directions that don't overlap. So they're independent in the geometric sense. Think of the x and y axis for example.

Definition A set of vectors {v1,...vn}Rn is orthogonal if

$$v_i\cdot v_j=0\quad\forall\, i\neq j$$

Therefore, we can see that non-zero orthogonal vectors are automatically linearly independent. This will help us compute projections and eigenvectors, and simplify quadratic forms

Definition An orthonormal set is the same as orthogonal, but now each vector has a unit length, i.e. we have normalised the vectors

So a set is orthonormal if:

$$v_i\cdot v_j=0\ \ \forall\, i\neq j \quad\text{and}\quad \|v_i\|=1\ \ \forall\, i$$

Orthonormal is more useful than just orthogonal because the coordinate of $x$ along $v_i$ is simply $x\cdot v_i$,
so we just have to deal with dot products instead of worrying about scaling

Orthogonal matrices

Intuitively, an orthogonal matrix (thinking in terms of transformations) is a rotation or a reflection of space

So the lengths and angles are preserved

Definition
A matrix PRn×n is orthogonal if:

PTP=I

Then the following are all equivalent:

- $P$ is orthogonal
- $P$ is invertible with $P^{-1}=P^T$
- the columns of $P$ form an orthonormal set
- the rows of $P$ form an orthonormal set

The reason why $P^TP=I$ is that the $(i,j)$ entry of $P^TP$ is the dot product of columns $i$ and $j$ of $P$, which for orthonormal columns is $1$ when $i=j$ and $0$ otherwise

Gram-Schmidt Orthogonalisation

Before diving into the algorithm, it's good to think about the problem this process is trying to solve.

Consider vectors spanning a space, most of them are messy and angled, and what we want is the same space, but having all directions be perpendicular.

So the intuition behind Gram-Schmidt, given the vectors $v_1,\dots,v_n$: keep $v_1$; from each subsequent vector, subtract its projections onto the directions already built, leaving only the genuinely new perpendicular component; then normalise if an orthonormal set is wanted.

So we remove the overlap of vectors step by step without changing the span.

I will not be covering the proof as that is given in the lecture notes, and my notes are just to cover a bit of the intuition that I find isn't in the notes, but it's also worth watching some YT videos to understand the intuition behind it. Tom Crawford has a nice video on it.
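The process itself is short enough to sketch in pure Python. This is an illustrative implementation of the idea above (subtract projections, then normalise), assuming the input vectors are linearly independent:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalise linearly independent vectors: subtract from each
    vector its projections onto the directions already built, then
    normalise the perpendicular part that is left."""
    basis = []
    for v in vectors:
        w = v[:]
        for u in basis:
            c = dot(w, u)  # coordinate of w along u (u has unit length)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis

u1, u2 = gram_schmidt([[3, 1], [2, 2]])
```

The output vectors are mutually perpendicular, have unit length, and span the same space as the messy inputs.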

Orthonormal basis

A basis, as talked about briefly earlier, is just a coordinate system. Recall that for a set of vectors to form a basis of $\mathbb{R}^n$, the vectors must span $\mathbb{R}^n$ and be linearly independent.

With that, an orthogonal basis is just a basis where all directions are perpendicular

Now for an important Theorem: Every subspace of Rn has an orthonormal basis

What this means is that no matter how tilted or messy a subspace is, it has a coordinate system of mutually perpendicular unit vectors

This isn't obvious, but Gram-Schmidt process constructs it

Orthogonally Diagonalising Symmetric Matrices

Recall from previous chapter that diagonalisation is simply finding a coordinate system where the matrix acts simply.

For a symmetric matrix $A=A^T$, there is an orthogonal matrix $P$ and a diagonal matrix $D$ with $A=PDP^T$; that is, $A$ can be diagonalised by an orthonormal basis of eigenvectors

Symmetric matrices also respect the dot product.

Quadratic Forms

Definition

A quadratic form is a function:

$$Q(x)=x^TAx,\quad x\in\mathbb{R}^n$$

They are used to measure curvature and distance-like behaviour, so they describe shapes instead of transformations.

So, after diagonalising, every quadratic form becomes a weighted sum of squares

Michael Penn has a good video on quadratic forms under the Number Theory playlist. Most of these notes, especially for the final two chapters were completed during the holidays, so the priority is given to doing past papers and building sufficient intuition instead of copying proofs into my own notes.

Definiteness

When working with quadratic forms, definiteness is a property of the sign of xTAx

We want to think of $x^TAx$ as a function that takes a vector $x$ and outputs a single number, and definiteness is about whether the sign of that number is consistent.

There are four kinds of definiteness. So we let $A$ be a symmetric matrix and define:

Positive/Negative definite

$$x^TAx>0\ \ \forall x\neq0 \qquad\text{or}\qquad x^TAx<0\ \ \forall x\neq0$$

This means that no matter which direction we go in, it is strictly positive/negative

Positive/Negative semi-definite

$$x^TAx\ge0\ \ \forall x\neq0 \qquad\text{or}\qquad x^TAx\le0\ \ \forall x\neq0$$

Never negative (respectively, never positive), but can be zero in some directions

Indefinite

xTAx can be positive for some x, negative for others

So the sign depends on the direction

This is also a nice reason for why diagonalisation is useful:

For a symmetric matrix, we have:

$$A=PDP^T\implies x^TAx=\lambda_1y_1^2+\cdots+\lambda_ny_n^2$$

where $y=P^Tx$ is $x$ written in the orthonormal eigenbasis.

Now $y_i^2\ge0$, so the sign of each term comes from $\lambda_i$: the eigenvalues control definiteness
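So classifying a quadratic form reduces to checking eigenvalue signs. A pure-Python sketch for the $2\times2$ symmetric case (the example matrices are illustrative):

```python
import math

def definiteness(A):
    """Classify a symmetric 2x2 quadratic form by the signs of the
    eigenvalues of A, found via the characteristic polynomial
    λ² - tr(A)·λ + det(A)."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = math.sqrt(tr * tr - 4 * det)  # real for symmetric A
    l1, l2 = (tr + disc) / 2, (tr - disc) / 2  # l1 >= l2
    if l1 > 0 and l2 > 0:
        return "positive definite"
    if l1 < 0 and l2 < 0:
        return "negative definite"
    if l2 >= 0:
        return "positive semi-definite"
    if l1 <= 0:
        return "negative semi-definite"
    return "indefinite"

pos = definiteness([[2, 1], [1, 2]])    # eigenvalues 3, 1
indef = definiteness([[1, 2], [2, 1]])  # eigenvalues 3, -1
semi = definiteness([[1, 0], [0, 0]])   # eigenvalues 1, 0
```

For $\begin{pmatrix}1&2\\2&1\end{pmatrix}$ the form $x^2+4xy+y^2$ is positive along $(1,1)$ but negative along $(1,-1)$, matching the mixed eigenvalue signs.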

The lectures skipped the section on conic sections as that will not be assessed, and that concludes Linear Algebra I