Linear Algebra
These notes were taken while reviewing linear algebra through MIT’s online course. When I began studying machine learning, my shaky understanding of linear algebra and matrices became a problem: many important concepts in machine learning, such as column spaces and eigenvectors, require a solid grasp of this material. The notes are still under construction, so there may be typos and unclear explanations. If you find any typos or want to contribute, please let me know by sending an email to mairuizhen1998@gmail.com with [linear algebra] in the subject, or just submit a pull request with your fixes to the GitHub repository. I really hope these notes can help people who find linear algebra boring or hard to grasp. Thanks to Stanford’s CS228 notes on probabilistic models, which are very concise and insightful, and thanks for the modified Tufte template. If the use of this template violates any copyright, please let me know and I will remove it.
MIT 18.06
YouTube Video Playlist
Preliminaries
I: Ax = b and the Four Subspaces
- The Geometry of Linear Equations: A major application of linear algebra is solving systems of linear equations. This lecture presents three ways of thinking about these systems. The “row method” focuses on the individual equations, the “column method” focuses on combining the columns, and the “matrix method” is an even more compact and powerful way of describing systems of linear equations. (These introductory descriptions are copied from the course page.)
- Elimination with Matrices: This session introduces the method of elimination, an essential tool for working with matrices. The method follows a simple algorithm. To help make sense of material presented later, we describe this algorithm in terms of matrix multiplication.
- Multiplication and Inverse Matrices: This lecture looks at matrix multiplication from five different points of view. We then learn how to find the inverse of a matrix using elimination, and why the Gauss-Jordan method works.
- Factorization into A = LU: This session explains inverses, transposes and permutation matrices. We also learn how elimination leads to a useful factorization A = LU and how hard a computer will work to invert a very large matrix. (A small elimination-to-LU sketch appears in code after this list.)
- Transposes, Permutations, Vector Spaces: To account for row exchanges in Gaussian elimination, we include a permutation matrix P in the factorization PA = LU. Then we learn about vector spaces and subspaces; these are central to linear algebra.
- Column Space and Nullspace: The column space of a matrix A tells us when the equation Ax = b will have a solution x. The null space of A tells us which values of x solve the equation Ax = 0.
- Solving Ax = 0: Pivot Variables, Special Solutions: We apply the method of elimination to all matrices, invertible or not. Counting the pivots gives us the rank of the matrix. Further simplifying the matrix puts it in reduced row echelon form R and improves our description of the null space. (See the SymPy sketch after this list.)
- Solving Ax = b: Row Reduced Form R: We describe all solutions to Ax = b based on the free variables and special solutions encoded in the reduced form R.
- Independence, Basis and Dimension: A basis is a set of vectors, as few as possible, whose combinations produce all vectors in the space. The number of basis vectors for a space equals the dimension of that space.
- The Four Fundamental Subspaces: For some vectors b the equation Ax = b has solutions and for others it does not. Some vectors x are solutions to the equation Ax = 0 and some are not. To understand these equations we study the column space, nullspace, row space and left nullspace of the matrix A.
- Matrix Spaces; Rank 1; Small World Graphs: As we learned last session, vectors don’t have to be lists of numbers. In this session we explore important new vector spaces while practicing the skills we learned in the old ones. Then we begin the application of matrices to the study of networks.
- Graphs, Networks, Incidence Matrices: This session explores the linear algebra of electrical networks and the Internet, and sheds light on important results in graph theory. (A small incidence-matrix sketch appears in code after this list.)
- Review: The video goes through the review questions thoroughly. I will write down a few that I think the professor did not explain quite clearly, and a few that might be insightful.
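Before moving on to Part II, here are a few small code sketches for the Part I material. They are my own illustrative examples, not code from the course. First, elimination recorded as A = LU: a minimal sketch without row exchanges, assuming every pivot encountered is nonzero (the 3×3 matrix is an arbitrary example).

```python
import numpy as np

def lu_no_pivot(A):
    """Gaussian elimination without row exchanges: returns L, U with A = L @ U.
    Assumes every pivot encountered is nonzero."""
    U = A.astype(float).copy()
    n = U.shape[0]
    L = np.eye(n)
    for k in range(n - 1):                    # eliminate below pivot U[k, k]
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]       # multiplier l_ik
            U[i, :] -= L[i, k] * U[k, :]      # row_i <- row_i - l_ik * row_k
    return L, U

A = np.array([[ 2.0, 1.0, 1.0],
              [ 4.0, -6.0, 0.0],
              [-2.0, 7.0, 2.0]])
L, U = lu_no_pivot(A)
print(np.allclose(L @ U, A))   # True: elimination gives A = LU
```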
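For Solving Ax = 0 and Ax = b, SymPy’s `rref` and `nullspace` make the pivot columns, rank, and special solutions easy to inspect. The 3×4 matrix below is just a rank-2 example chosen for illustration.

```python
from sympy import Matrix

# A 3x4 example with rank 2 (chosen for illustration)
A = Matrix([[1, 2, 2, 2],
            [2, 4, 6, 8],
            [3, 6, 8, 10]])

R, pivot_cols = A.rref()   # reduced row echelon form R and its pivot columns
print(R)                   # the matrix R described in the lectures
print(pivot_cols)          # (0, 2): the first and third columns are pivot columns
print(A.rank())            # 2 pivots, so rank = 2
print(A.nullspace())       # 4 - 2 = 2 special solutions spanning N(A)
```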
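For the graphs lecture, here is a sketch of the incidence matrix of a small made-up directed graph: each row describes one edge, constant potentials lie in the nullspace, and the rank is (number of nodes) − 1 for a connected graph.

```python
import numpy as np

# Incidence matrix of a hypothetical directed graph with 4 nodes and 5 edges:
# each row is one edge, with -1 at its start node and +1 at its end node.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
A = np.zeros((len(edges), 4))
for row, (start, end) in enumerate(edges):
    A[row, start] = -1.0
    A[row, end] = 1.0

ones = np.ones(4)
print(A @ ones)                      # all zeros: constant potentials give no flow
print(np.linalg.matrix_rank(A))      # 3 = (number of nodes) - 1 for a connected graph
```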
II: Least Squares, Determinants and Eigenvalues
- Orthogonal Vectors and Subspaces: Vectors are easier to understand when they’re described in terms of orthogonal bases. In addition, the Four Fundamental Subspaces are orthogonal to each other in pairs. If A is a rectangular matrix, Ax = b is often unsolvable. The matrix AᵀA will help us find a vector x̂ that comes as close as possible to solving Ax = b.
- Projections onto Subspaces: We often want to find the line (or plane, or hyperplane) that best fits our data. This amounts to finding the best possible approximation to some unsolvable system of linear equations Ax = b. The algebra of finding these best-fit solutions begins with the projection of a vector onto a subspace.
- Projection Matrices and Least Squares: Linear regression is commonly used to fit a line to a collection of data. The method of least squares can be viewed as finding the projection of a vector. Linear algebra provides a powerful and efficient description of linear regression in terms of the matrix AᵀA. (A least-squares sketch appears in code after this list.)
- Orthogonal Matrices and Gram-Schmidt: Many calculations become simpler when performed using orthonormal vectors or orthogonal matrices. In this session, we learn a procedure for converting any basis to an orthonormal one. (Sketched in code after this list.)
- Properties of Determinants: The determinant of a matrix is a single number which encodes a lot of information about the matrix. Three simple properties completely describe the determinant. In this lecture we also list seven more properties, like det(AB) = (det A)(det B), that can be derived from the first three.
- Determinant Formulas and Cofactors: One way to compute the determinant is by elimination. In this lecture we derive two related formulas for the determinant using the properties from last lecture.
- Cramer’s Rule, Inverse Matrix and Volume: Now we start to use the determinant. Understanding the cofactor formula allows us to show that A⁻¹ = (1/det A)Cᵀ, where C is the matrix of cofactors of A. Combining this formula with the equation x = A⁻¹b gives us Cramer’s rule for solving Ax = b. Also, the absolute value of the determinant gives the volume of a box. (See the short determinant sketch after this list.)
- Eigenvalues and Eigenvectors: If the product Ax points in the same direction as the vector x, we say that x is an eigenvector of A. Eigenvalues and eigenvectors describe what happens when a matrix is multiplied by a vector. In this session we learn how to find the eigenvalues and eigenvectors of a matrix.
- Diagonalization and Powers of A: If A has n independent eigenvectors, we can write A = SΛS⁻¹, where Λ is a diagonal matrix containing the eigenvalues of A. This allows us to easily compute powers of A, which in turn allows us to solve difference equations uₖ₊₁ = Auₖ. (Sketched in code after this list.)
- Differential Equations and exp(At): We can copy Taylor’s series for eˣ to define exp(At) for a matrix A. If A is diagonalizable, we can use Λ to find the exact value of exp(At). This allows us to solve systems of differential equations du/dt = Au the same way we solved equations like dy/dt = ky. (See the matrix-exponential sketch after this list.)
- Markov Matrices; Fourier Series: Like differential equations, Markov matrices describe changes over time. Once again, the eigenvalues and eigenvectors describe the long-term behavior of the system. In this session we also learn about Fourier series, which describe periodic functions as points in an infinite-dimensional vector space. (A steady-state sketch appears in code after this list.)
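Here are the Part II sketches, again my own illustrative examples rather than course code. First, projections and least squares: fitting a line b ≈ C + Dt to three made-up data points by solving the normal equations AᵀAx̂ = Aᵀb.

```python
import numpy as np

# Fit a line b ~= C + D*t through three made-up data points (t, b)
t = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 2.0])

A = np.column_stack([np.ones_like(t), t])     # columns: [1, t]

# Normal equations: A^T A xhat = A^T b
xhat = np.linalg.solve(A.T @ A, A.T @ b)
print(xhat)                                   # [C, D] of the best-fit line

# Projection of b onto the column space of A, and the projection matrix P
p = A @ xhat
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(np.allclose(P @ b, p))                  # True
print(np.allclose(A.T @ (b - p), 0))          # the error b - p is orthogonal to C(A)

# np.linalg.lstsq solves the same least-squares problem
print(np.linalg.lstsq(A, b, rcond=None)[0])   # same [C, D]
```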
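Next, a minimal classical Gram-Schmidt, assuming the columns of A are independent; NumPy’s QR factorization gives the same orthonormal columns up to signs.

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt on the columns of A (assumed independent).
    Returns Q with orthonormal columns spanning the same column space."""
    Q = np.zeros_like(A, dtype=float)
    for j in range(A.shape[1]):
        v = A[:, j].astype(float)
        for i in range(j):                     # subtract components along earlier q's
            v -= (Q[:, i] @ A[:, j]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)        # normalize
    return Q

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [1.0, 2.0]])
Q = gram_schmidt(A)
print(np.allclose(Q.T @ Q, np.eye(2)))         # columns are orthonormal
print(np.linalg.qr(A)[0])                      # NumPy's Q matches up to signs
```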
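For the determinant lectures, a quick numerical check of det(AB) = (det A)(det B) and a direct implementation of Cramer’s rule (fine for small matrices, far slower than elimination for large ones); the matrices are arbitrary examples.

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 2.0],
              [1.0, 0.0, 0.0]])     # an arbitrary invertible 3x3 example
B = np.random.rand(3, 3)
b = np.array([4.0, 5.0, 6.0])

# Property: det(AB) = (det A)(det B)
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))

# Cramer's rule: x_i = det(B_i) / det(A), where B_i is A with column i replaced by b
def cramer(A, b):
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b
        x[i] = np.linalg.det(Ai) / d
    return x

print(np.allclose(cramer(A, b), np.linalg.solve(A, b)))   # True
```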
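Diagonalization and powers of A, using a 2×2 example with distinct eigenvalues (so S is invertible).

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 1.0]])           # a 2x2 example with distinct eigenvalues

lam, S = np.linalg.eig(A)            # eigenvalues, and eigenvectors as columns of S
Lambda = np.diag(lam)

print(np.allclose(A, S @ Lambda @ np.linalg.inv(S)))      # A = S Lambda S^-1

# Powers: A^k = S Lambda^k S^-1
k = 10
Ak = S @ np.diag(lam**k) @ np.linalg.inv(S)
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))      # True
```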
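exp(At) via diagonalization, checked against SciPy’s matrix exponential; the symmetric 2×2 matrix is an arbitrary diagonalizable example.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-2.0, 1.0],
              [1.0, -2.0]])          # an arbitrary diagonalizable (symmetric) example
t = 0.5

# exp(At) via diagonalization: exp(At) = S exp(Lambda t) S^-1
lam, S = np.linalg.eig(A)
eAt = S @ np.diag(np.exp(lam * t)) @ np.linalg.inv(S)

print(np.allclose(eAt, expm(A * t)))   # matches SciPy's matrix exponential

# u(t) = exp(At) u(0) solves du/dt = A u
u0 = np.array([1.0, 0.0])
print(eAt @ u0)
```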
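Finally, a made-up 2×2 Markov matrix: the eigenvector for eigenvalue 1 gives the steady state, and repeated multiplication approaches the same vector.

```python
import numpy as np

# A hypothetical Markov matrix: columns are nonnegative and sum to 1
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])

# The eigenvector for eigenvalue 1 gives the steady state
lam, S = np.linalg.eig(A)
steady = S[:, np.argmax(np.isclose(lam, 1.0))]
steady = steady / steady.sum()       # scale so the entries sum to 1
print(steady)                        # [2/3, 1/3]

# Repeated multiplication approaches the same steady state
u = np.array([1.0, 0.0])
for _ in range(100):
    u = A @ u
print(u)                             # also close to [2/3, 1/3]
```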
III: Positive Definite Matrices and Applications