M. Anthony, M. Harvey
MT1173
2015
Undergraduate study in
Economics, Management,
Finance and the Social Sciences
This subject guide is for a 100 course offered as part of the University of London
International Programmes in Economics, Management, Finance and the Social Sciences.
This is equivalent to Level 4 within the Framework for Higher Education Qualifications in
England, Wales and Northern Ireland (FHEQ).
For more information about the University of London International Programmes
undergraduate study in Economics, Management, Finance and the Social Sciences, see:
www.londoninternational.ac.uk
This guide was prepared for the University of London International Programmes by:
M. Anthony, Professor of Mathematics, Department of Mathematics, London School of
Economics and Political Science.
M. Harvey, Course Leader, Department of Mathematics, London School of Economics and
Political Science.
This is one of a series of subject guides published by the University. We regret that due
to pressure of work the authors are unable to enter into any correspondence relating to,
or arising from, the guide. If you have any comments on this subject guide, favourable or
unfavourable, please use the form at the back of this guide.
The University of London asserts copyright over all material in this subject guide except where
otherwise indicated. All rights reserved. No part of this work may be reproduced in any form,
or by any means, without permission in writing from the publisher. We make every effort to
respect copyright. If you think we have inadvertently used your copyright material, please let
us know.
Contents
Preface 1
1 Introduction 3
1.1 This subject . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1 Aims of the course . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Route map to the guide . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Online study resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.1 The VLE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.2 Making use of the Online Library . . . . . . . . . . . . . . . . . . 7
1.4 Using the guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Examination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.6 The use of calculators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2 Preliminaries 9
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1 Some basic set theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.1 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.2 Subsets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.3 Union and intersection . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.4 Showing two sets are equal . . . . . . . . . . . . . . . . . . . . . . 11
2.2 Numbers, algebra and equations . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.1 Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.2 Basic notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.3 Simple algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.4 Powers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.5 Quadratic equations . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.6 Polynomial equations . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Mathematical statements and proof . . . . . . . . . . . . . . . . . . . . . 16
2.4 Introduction to proving statements . . . . . . . . . . . . . . . . . . . . . 18
2.5 Some basic logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5.1 Negation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.5.2 Conjunction and disjunction . . . . . . . . . . . . . . . . . . . . . 22
2.6 Implications, converse and contrapositive . . . . . . . . . . . . . . . . . . 24
2.6.1 ‘If-then’ statements . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.6.2 Converse statements . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.6.3 Contrapositive statements . . . . . . . . . . . . . . . . . . . . . . 26
2.7 Proof by contradiction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.8 Some terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 28
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Feedback on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Comments on exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3 Matrices 31
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.1 Definitions and terminology . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.2 Matrix addition and scalar multiplication . . . . . . . . . . . . . . . . . . 33
3.3 Matrix multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.4 Matrix algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.5 Matrix inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.5.1 Definition of a matrix inverse . . . . . . . . . . . . . . . . . . . . 38
3.5.2 Properties of the inverse . . . . . . . . . . . . . . . . . . . . . . . 40
3.6 Powers of a matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.7 Transpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4 Vectors 49
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.1 Vectors in Rⁿ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.1.1 Definition of vector and Euclidean space . . . . . . . . . . . . 50
4.1.2 The inner product of two vectors . . . . . . . . . . . . . . . . 51
4.1.3 Vectors and matrices . . . . . . . . . . . . . . . . . . . . . . . 53
4.2 Developing geometric insight – R² and R³ . . . . . . . . . . . . . . . . . 53
4.2.1 Vectors in R² . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.2.2 Inner product . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.2.3 Vectors in R³ . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.3 Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3.1 Lines in R² . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.3.2 Lines in R³ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.4 Planes in R³ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.5 Lines and hyperplanes in Rⁿ . . . . . . . . . . . . . . . . . . . . . . . . . 71
4.5.1 Vectors and lines in Rⁿ . . . . . . . . . . . . . . . . . . . . . . 71
4.5.2 Hyperplanes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 72
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.1 Systems of linear equations . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.2 Row operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.3 Gaussian elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.3.1 The algorithm — reduced row echelon form . . . . . . . . . . . . 81
5.3.2 Consistent and inconsistent systems . . . . . . . . . . . . . . . . . 85
5.3.3 Linear systems with free variables . . . . . . . . . . . . . . . . . . 86
5.3.4 Solution sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 90
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 90
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
7.1 Elementary matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
7.2 Row equivalence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.3 The main theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
7.4 Using row operations to find the inverse matrix . . . . . . . . . . . . . . 110
7.5 Verifying an inverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 113
Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 113
Comment on exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
8 Determinants 115
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Essential reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
8.1 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
8.2 Results on determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
8.2.1 Determinant using row operations . . . . . . . . . . . . . . . . . . 119
8.2.2 The determinant of a product . . . . . . . . . . . . . . . . . . . . 123
8.3 Matrix inverse using cofactors . . . . . . . . . . . . . . . . . . . . . . . . 123
8.4 Cramer’s rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 128
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
Synopsis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
11.1 First-order difference equations . . . . . . . . . . . . . . . . . . . . . . . 158
11.2 Solving first-order difference equations . . . . . . . . . . . . . . . . . . . 159
11.3 Long-term behaviour of solutions . . . . . . . . . . . . . . . . . . . . . . 161
11.4 The cobweb model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
11.5 Financial applications of first-order difference equations . . . . . . . . . . 163
11.6 Homogeneous second-order difference equations . . . . . . . . . . . . . . 164
11.7 Non-homogeneous second-order equations . . . . . . . . . . . . . . . . . . 168
11.8 Behaviour of solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
11.9 Economic applications of second-order difference equations . . . . . . . . 169
Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Learning outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Test your knowledge and understanding . . . . . . . . . . . . . . . . . . . . . 171
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Comments on selected activities . . . . . . . . . . . . . . . . . . . . . . . . . . 173
Comments on exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
17 Diagonalisation 233
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
Preface
This subject guide is not a course text. It sets out a logical sequence in which to study
the topics in this subject. The subject guide alone is not sufficient to gain a thorough
understanding of the subject; you will need to do the essential reading recommended for
each chapter and any further reading you find helpful.
We are very grateful to James Ward and Keith Martin for their careful readings of an
earlier edition of this guide and for their many helpful comments.
Chapter 1
Introduction
In this very brief introduction, we aim to give you an idea of the nature of this subject
and to advise on how best to approach it. We give general information about the
contents and use of this subject guide, and on recommended reading and how to use the
textbooks.
to enable students to acquire skills in the methods of algebra, as required for their
use in further mathematics subjects and economics-based subjects
to prepare students for further courses in mathematics and/or related disciplines.
As emphasised above, however, we do also want you to understand why certain
methods work: this is one of the ‘skills’ that you should aim to acquire. The examination
will test not simply your ability to perform routine calculations, but will probe your
knowledge and understanding of the fundamental principles underlying the area.
used the concepts, terminology, methods and conventions covered in the course to
solve mathematical problems in this subject
seen how algebra can be used to solve problems in economics and related subjects
1.2 Reading
The guide closely follows the following textbook (which is also available as an e-book):
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. (Cambridge
University Press, 2012) [ISBN 9780521279482].
You will need a copy of this book. The guide repeats some of the content of the book,
but the book has much more extensive discussion and explanation, which will help you
understand better. The guide will make frequent references to the book, asking you to
read various sections of it as you work through the guide. We will often abbreviate it, in
such references, to ‘A-H’. The guide contains some exercises for you to attempt, but the
main source of exercises for most chapters will be the textbook (which contains full
solutions to its exercises). We will also regularly ask you to attempt some Problems
from the book, and discussion of how to approach these will be found on the VLE.
The textbook by Anthony and Harvey covers almost all the material needed for this
course. (It also contains much more, in Chapters 10 to 13, that is not needed for this
course but which will be useful if you continue the study of linear algebra after this
course). There is one reasonably small exception: the book does not cover Sequences,
Series and Difference Equations. For that, we recommend the following textbook (also available as an e-book):
Anthony, M. and N. Biggs. Mathematics for Economics and Finance: Methods
and Modelling. (Cambridge: Cambridge University Press, 1996) [ISBN
9780521551137 (hardback); 9780521559138 (paperback)].
many, many others will be useful too.
Anton, H. and C. Rorres. Elementary Linear Algebra with Supplemental
Applications (International Student Version). (John Wiley & Sons (Asia) Plc Ltd,
Self-testing activities: Doing these allows you to test your own understanding of
subject material.
Electronic study materials: The printed materials that you receive from the
University of London are available to download, including updated reading lists
and references.
Past examination papers and Examiners’ commentaries: These provide advice on
how each examination question might best be answered.
A student discussion forum: This is an open space for you to discuss interests and
experiences, seek support from your peers, work collaboratively to solve problems
and discuss subject material.
Videos: There are recorded academic introductions to the subject, interviews and
debates and, for some courses, audio-visual tutorials and conclusions.
Recorded lectures: For some courses, where appropriate, the sessions from previous
years’ Study Weekends have been recorded and made available.
¹ There are many editions and variants of this book, such as the ‘Applications version’. Any one is equally useful and you will not need more than one of them. You can find the relevant sections cited in this guide in any edition by using the index.
² Any earlier edition of this text will also be useful.
Study skills: Expert advice on preparing for examinations and developing your
digital literacy skills.
Feedback forms.
Some of these resources are available for certain courses only, but we are expanding our
provision all the time and you should check the VLE regularly for updates.
1.5 Examination
Important: the information and advice given here are based on the examination
structure used at the time this guide was written. Please note that subject guides may
be used for several years. Because of this we strongly advise you to always check both
the current Regulations for relevant information about the examination, and the virtual
learning environment (VLE) where you should be advised of any forthcoming changes.
You should also carefully check the rubric/instructions on the paper you actually sit
and follow those instructions. Remember, it is important to check the VLE for:
where available, past examination papers and Examiners’ commentaries for the
course which give advice on how each question might best be answered.
There are no optional topics in this subject: you should do them all. This is reflected in
the structure of the examination paper. There are five questions (each worth 20 marks)
and all questions are compulsory.
Please do not think that the questions in a real examination will necessarily be very
similar to those in the Sample examination paper. An examination is designed (by
definition) to test you. You will get examination questions unlike questions in this
guide. The whole point of examining is to see whether you can apply knowledge in
familiar and unfamiliar settings. The Examiners (nice people though they are) have an
obligation to surprise you! For this reason, it is important that you try as many
examples as possible, from the guide and from the textbooks. This is not so that you
can cover any possible type of question the Examiners can think of! It’s so that you get
used to confronting unfamiliar questions, grappling with them, and finally coming up
with the solution.
Do not panic if you cannot completely solve an examination question. There are many
marks to be awarded for using the correct approach or method.
Chapter 2
Preliminaries
Introduction
Before we embark on the main material, we need quickly to point out some basic
mathematical skills that you need, and to introduce some concepts of mathematical
proof. Do not worry if you do not feel comfortable or familiar with this material. You should not let it detain you too long: proceed with the rest of the guide, and you will
pick up what you need as you go.
Aims
This chapter has two aims:
Briefly discuss some basic mathematics and mathematical notation you probably
already know.
Reading
For the first part of the chapter (concerning sets, notation and functions), you might
well find that you have studied most of these topics in previous mathematics courses
and that nearly all of the material is revision. But don’t worry if a topic is new to you.
We will mention the main results which you will need to know. If you are unfamiliar
with a topic, or if you find any of the topics difficult, then you should look up that topic
in any basic mathematics text. There are many textbooks you could consult. For
example, Chapter 2 of the Anthony and Biggs book might be useful.
For the material on proof, the discussion is relatively self-contained, so no additional
reading should be required.
(For full publication details, see Chapter 1.)
Synopsis
We discuss the basic notation and ideas associated with sets. We then look at the
standard number systems and some associated important notation. Algebraic
manipulation is an important skill you should possess, and we review that briefly, and
look at how to solve quadratic and simple polynomial equations. Then we gently start
to explore mathematical proof. A proof is a way of establishing that a mathematical
statement is true and you will encounter proofs at various points in your study of this
subject. We give some examples of different proof methods and explore some of the
underlying mathematical logic.
2.1.1 Sets
A set may be thought of as a collection of objects.¹ A set is usually described by
listing or describing its members inside curly brackets. For example, when we write
A = {1, 2, 3}, we mean that the objects belonging to the set A are the numbers 1, 2, 3
(or, equivalently, the set A consists of the numbers 1, 2 and 3). Equally (and this is
what we mean by ‘describing’ its members), this set could have been written as

A = {n | n is a whole number and 1 ≤ n ≤ 3}.

Here, the symbol | stands for ‘such that’. Often, the symbol ‘:’ is used instead, so that
we might write
A = {n : n is a whole number and 1 ≤ n ≤ 3}.
For example, the set of all students taking this course has as its members all of you (and nothing else).
write x ∈ A and say ‘x belongs to A’ or ‘x is a member of A’.
The set which has no members is called the empty set and is denoted by ∅. The empty
set may seem like a strange concept, but it has its uses.
2.1.2 Subsets
We say that the set S is a subset of the set T , and we write S ⊆ T , or S ⊂ T , if every
member of S is a member of T . For example, {1, 2, 5} ⊆ {1, 2, 4, 5, 6, 40}. The difference
between the two symbols is that S ⊂ T literally means that S is a proper subset of T ,
meaning not all of T , and S ⊆ T means that S is a subset of T and possibly (but not
necessarily) all of T . So in the example just given we could have also written,
{1, 2, 5} ⊂ {1, 2, 4, 5, 6, 40}.
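If you happen to have access to Python, its built-in sets make a quick, purely illustrative experiment (Python is not part of this course); the `<` operator checks for a proper subset, exactly the distinction between ⊂ and ⊆ described above:

```python
# Subset checks with Python's built-in set type.
S = {1, 2, 5}
T = {1, 2, 4, 5, 6, 40}

print(S.issubset(T))   # S is a subset of T: every member of S is in T
print(S < T)           # S is a *proper* subset of T (S != T)
print(T < T)           # a set is never a proper subset of itself
```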
¹ See Anthony and Biggs, Section 2.1.
2.1.3 Union and intersection
Given two sets A and B, the union A ∪ B is the set whose members belong to A or B
(or both A and B): that is,
A ∪ B = {x | x ∈ A or x ∈ B}.
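These operations can also be tried out directly in Python (again, purely as an illustration; the course itself does not use Python):

```python
A = {1, 2, 3}
B = {3, 4}

# A ∪ B: the set of elements belonging to A or B (or both)
print(A | B)          # the union, also available as A.union(B)

# A ∩ B: the set of elements belonging to both A and B
print(A & B)          # the intersection, also available as A.intersection(B)
```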
2.2 Numbers, algebra and equations

2.2.1 Numbers
There are some standard notations for important sets of numbers.³ The set R of real numbers may be thought of as the points on a line. Each such number can be
described by a decimal representation.
The set of real numbers R includes the following subsets.
N, the set of natural numbers: N = {1, 2, 3, . . . }, also referred to as the positive
integers.
Z, the set of integers: {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}.
Q, the set of rational numbers: p/q with p, q ∈ Z, q ≠ 0; for example, 2/5, −9/2, 4/1 = 4.
The set of irrational numbers, that is, real numbers which are not rational; for example, √2, π.
These sets are related by: N ⊂ Z ⊂ Q ⊂ R.
Given two real numbers a and b, we define intervals such as,
and combinations of these. For example, [a, b) = {x| a ≤ x < b}. The numbers a and b
are called the endpoints of the interval. You should notice that when a square bracket,
‘[’ or ‘]’, is used to denote an interval, the number beside the bracket is included in the
interval, whereas if a round bracket, ‘(’ or ‘)’, is used, the adjacent number is not in the
interval. For example, [2, 3] contains the number 2, but (2, 3] does not. We can also
indicate unbounded intervals, such as
The symbol ∞ means ‘infinity’, but it is not a real number, merely a notational
convenience.
The absolute value of a real number a is defined by:

|a| = a if a ≥ 0, and |a| = −a if a ≤ 0.
So the absolute value of a equals a if a is non-negative (that is, if a ≥ 0), and equals −a
otherwise. For instance, |6| = 6 and | − 2.5| = 2.5. (This is sometimes called the
modulus of a). Roughly speaking, the absolute value of a number is obtained just by
ignoring any minus sign the number has. Note that

√(a²) = |a|,

since by √x we always mean the positive square root to avoid ambiguity. So the two solutions of the equation x² = 4 are x = ±2 (meaning x = 2 or x = −2), but √4 = 2.
The absolute value of real numbers satisfies the following inequality,
|a + b| ≤ |a| + |b|, a, b ∈ R.
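These facts about the absolute value are easy to experiment with; a short Python sketch (illustrative only; `abs` and `math.sqrt` are Python's built-ins, not course notation):

```python
import math

# abs() implements |a|: drop any minus sign
print(abs(6), abs(-2.5))

# sqrt(a**2) == |a|, since sqrt always returns the non-negative root
for a in [3.0, -3.0]:
    assert math.sqrt(a**2) == abs(a)

# the inequality |a + b| <= |a| + |b|, checked at one sample point
a, b = -7, 4
print(abs(a + b) <= abs(a) + abs(b))   # True
```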
We denote the sum

x1 + x2 + · · · + xn

of the numbers x1, x2, . . . , xn by

Σ_{i=1}^{n} xi.

⁴ You may consult any of a large number of basic maths texts for further information on basic notations.
The ‘Σ’ indicates that numbers are being summed, and the ‘i = 1’ and n below and
above the Σ show that it is the numbers xi , as i runs from 1 to n, that are being
summed together. Sometimes we will be interested in adding up only some of the
numbers. For example,
Σ_{i=2}^{n−1} xi
would denote the sum x2 + x3 + · · · + xn−1 , which is the sum of all the numbers except
the first and last.
For a positive whole number n, n! (read as n factorial) is the product of all the integers
from 1 up to n. For example, 4! = 1 · 2 · 3 · 4 = 24. By convention 0! is taken to be 1.
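Both the Σ notation and the factorial translate directly into code. The following Python sketch is purely illustrative (note that Python lists count from 0, so the index is shifted by one):

```python
import math

x = [5, 2, 7, 1, 9]           # the numbers x_1, ..., x_5
n = len(x)

# sum_{i=1}^{n} x_i  (x[i - 1] is x_i, because lists are 0-indexed)
total = sum(x[i - 1] for i in range(1, n + 1))
print(total)                  # 24

# sum_{i=2}^{n-1} x_i: every term except the first and the last
inner = sum(x[i - 1] for i in range(2, n))
print(inner)                  # 10

# n! = 1 * 2 * ... * n, with 0! = 1 by convention
print(math.factorial(4), math.factorial(0))   # 24 1
```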
Finally, we often use the symbol □ to denote the end of a proof, where we have finished
explaining why a particular result is true. This is just to make it clear where the proof
ends and the following text begins.
2.2.3 Simple algebra

(2x − 3y)(x + 4y) = 2x² − 3xy + 8xy − 12y² = 2x² + 5xy − 12y².
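An identity like this can be spot-checked numerically: evaluate both sides at a few values of x and y and confirm they agree. (A finite check like this illustrates the identity; it is not a proof. The Python below is only a sketch, not part of the course.)

```python
# Spot-check (2x - 3y)(x + 4y) == 2x^2 + 5xy - 12y^2 at sample points.
for x, y in [(1, 2), (-3, 5), (0.5, -1.5)]:
    lhs = (2 * x - 3 * y) * (x + 4 * y)
    rhs = 2 * x**2 + 5 * x * y - 12 * y**2
    assert abs(lhs - rhs) < 1e-9
print("identity holds at all sample points")
```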
Activity 2.2 Expand (x − 1)(x + 1). Then use this to expand (x − 1)(x + 1)(x + 2).
2.2.4 Powers
When n is a positive integer, the nth power⁵ of the number a, denoted aⁿ, is simply the product of n copies of a, that is,

aⁿ = a × a × a × · · · × a  (n times).
The number n is called the power, exponent, or index. We have the power rules (or
rules of exponents):
aʳaˢ = aʳ⁺ˢ,    (aʳ)ˢ = aʳˢ,
whenever r and s are positive integers.
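Since the rules hold for all positive integers r and s, no finite computation can prove them (Activity 2.3 asks you for the proof), but you can build confidence with a quick check; a Python sketch, illustrative only:

```python
# Check a^r * a^s == a^(r+s) and (a^r)^s == a^(r*s)
# for one base and a small range of exponents.
a = 3
for r in range(1, 5):
    for s in range(1, 5):
        assert a**r * a**s == a**(r + s)
        assert (a**r)**s == a**(r * s)
print("power rules verified for a = 3 and r, s in 1..4")
```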
Activity 2.3 Prove the power rules above using the definition of aⁿ for n ∈ N.
49x⁻²/(35y) − 4xy²/(2xy)³.
so the equation x² − 6x + 5 = 0 becomes (x − 1)(x − 5) = 0. Now the only way that two numbers can multiply to give 0 is if at least one of the numbers is 0, so we can conclude that x − 1 = 0 or x − 5 = 0; that is, the equation has two solutions, 1 and 5.
Activity 2.5
Use factorisation to find the solutions of each of these equations:
Although factorisation may be difficult, there is a general method for determining the
solutions to a quadratic equation using the quadratic formula,7 as follows. Suppose
we have the quadratic equation ax² + bx + c = 0, where a ≠ 0. Then the solutions of
this equation are:
x1 = (−b − √(b² − 4ac))/(2a),    x2 = (−b + √(b² − 4ac))/(2a).
The term b² − 4ac is called the discriminant.
If b² − 4ac > 0, the equation has two real solutions as given above.
If b² − 4ac = 0, the equation has exactly one solution, x = −b/(2a). (In this case we say that this is a solution of multiplicity two.)
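The formula translates directly into a short function. The following Python sketch (the function name is ours, and Python itself is not part of the course) branches on the sign of the discriminant, returning no real solutions when it is negative:

```python
import math

def quadratic_roots(a, b, c):
    """Real solutions of ax^2 + bx + c = 0 (a != 0), via the quadratic formula."""
    disc = b**2 - 4 * a * c           # the discriminant
    if disc > 0:                       # two real solutions
        r = math.sqrt(disc)
        return ((-b - r) / (2 * a), (-b + r) / (2 * a))
    if disc == 0:                      # one solution of multiplicity two
        return (-b / (2 * a),)
    return ()                          # no real solutions

print(quadratic_roots(1, -6, 5))   # x^2 - 6x + 5 = 0  ->  (1.0, 5.0)
print(quadratic_roots(1, -4, 4))   # (x - 2)^2 = 0     ->  (2.0,)
print(quadratic_roots(1, 0, 1))    # x^2 + 1 = 0       ->  ()
```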
2.2.6 Polynomial equations
A polynomial of degree n in x is an expression of the form,

Pn(x) = a0 + a1x + a2x² + · · · + anxⁿ
x³ − 7x + 6 = (x − 1)(x² + λx − 6)

for some number λ, as the coefficient of x² must be 1 for the product to give x³, and the constant term must be −6 so that (−1)(−6) = 6, the constant term in the cubic. It only remains to find λ. This is accomplished by comparing the coefficients of either x² or x in the cubic polynomial and the product. The coefficient of x² in the cubic is 0, and in the product the coefficient of x² is obtained from the terms (−1)(x²) + (x)(λx), so that we must have λ − 1 = 0, that is, λ = 1. Then

x³ − 7x + 6 = (x − 1)(x² + x − 6),

and the quadratic factor is easily factored into (x − 2)(x + 3), that is,

x³ − 7x + 6 = (x − 1)(x − 2)(x + 3).
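You can check a claimed factorisation numerically: evaluate both sides at many points, and confirm that the roots are exactly where the linear factors say they should be. A Python sketch (illustrative only):

```python
# Check x^3 - 7x + 6 == (x - 1)(x - 2)(x + 3) at several integer points,
# and confirm that 1, 2 and -3 are exactly the integer roots found.
def cubic(x):
    return x**3 - 7 * x + 6

for x in range(-5, 6):
    assert cubic(x) == (x - 1) * (x - 2) * (x + 3)

print([x for x in range(-10, 11) if cubic(x) == 0])   # [-3, 1, 2]
```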
and principles. Our discussion of this topic is limited because this course is not a course
in logic or proof, as such. What we do need is enough logic to understand what
mathematical statements mean and how we might prove or disprove them.
Consider the following statements (in which, you should recall that the natural numbers
are the positive integers):
(a) 20 is divisible by 4.
(c) 21 is divisible by 4.
(d) 21 is divisible by 3 or 5.
(f) n² is even.
is even if n is even, and n is even if n² is even. Equivalently, it means that n² is even if n is even and that n² is odd if n is odd. So statement (k) will be true precisely if (i) and (j) are true.
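A computer check over finitely many values of n can illustrate, though never prove, an equivalence like this; a quick Python sketch (not part of the course):

```python
# A finite check that n^2 is even exactly when n is even.
# This illustrates the claim for n = 1..100; the proof in the text
# must argue for *all* natural numbers.
for n in range(1, 101):
    assert (n**2 % 2 == 0) == (n % 2 == 0)
print("n^2 and n have the same parity for n = 1..100")
```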
(a) 20 is divisible by 4.
This statement is true. Yes, yes, we know it’s ‘obvious’, but stay with us. To give a
proper proof, we need first to understand exactly what the word ‘divisible’ means.
You will most likely think that this means that when we divide 20 by 4
we get no remainder. This is correct: in general, for natural numbers n and d, to
say that n is divisible by d (or, equivalently, that n is a multiple of d) means
precisely that there is some natural number m for which n = md. Since 20 = 5 × 4,
we see that 20 is divisible by 4. And that’s a proof! It’s utterly convincing,
watertight, and not open to debate.
(c) 21 is divisible by 4.
This is false, as can be established in a number of ways. First, we note that if the
natural number m satisfies m ≤ 5, then m × 4 will be no more than 20. And if
m ≥ 6 then m × 4 will be at least 24. Well, any natural number m is either at most
5 or at least 6 so, for all possible m, we do not have m × 4 = 21 and hence there is
no natural number m for which m × 4 = 21. In other words, 21 is not divisible by 4.
Another argument (which is perhaps more straightforward, but which relies on
properties of rational numbers rather than just simple properties of natural
numbers) is to note that 21/4 = 5.25, and this is not a natural number, so 21 is not
divisible by 4. (This second approach is the same as showing that 21 has remainder
1, not 0, when we divide by 4.)
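The remainder criterion for divisibility is exactly what a remainder operator computes; in Python, for instance (purely as an illustration of the definition, not a substitute for the proofs above):

```python
# n is divisible by d exactly when some natural number m gives n == m * d,
# i.e. when n leaves remainder 0 on division by d.
def divisible(n, d):
    return n % d == 0

print(divisible(20, 4))   # True:  20 = 5 * 4
print(divisible(21, 4))   # False: 21 = 5 * 4 + 1
print(21 % 4)             # the remainder, 1
```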
(d) 21 is divisible by 3 or 5.
As we noted above, this is a compound statement and it will be true precisely when
one (or both) of the following statements is true:
(i) 21 is divisible by 3
(ii) 21 is divisible by 5.
Statement (i) is true, because 21 = 7 × 3. Statement (ii) is false. Because at least
one of these two statements is true, statement (d) is true.
(e) 50 is divisible by 2 and 5.
This is true. Again, this is a compound statement and it is true precisely if both of
the following statements are true:
(i) 50 is divisible by 2
(ii) 50 is divisible by 5.
Statements (i) and (ii) are indeed true because 50 = 25 × 2 and 50 = 10 × 5. So
statement (e) is true.
(f) n² is even.
As mentioned above, whether this is true or false depends on the value of n. For
example, if n = 2 then n² = 4 is even, but if n = 3 then n² = 9 is odd. So, unlike
the other statements (which are propositions), this is a predicate P(n). The
predicate becomes a proposition when we assign a particular value to n, and the
truth or falsity of that proposition can then be established. Statements (i),
(j) and (k) below do this comprehensively.
which is, again, even, because (2k + 1)(k + 1) is an integer.
Right, we’re really proving things now. This is a very general statement, asserting
something about all natural numbers, and we have managed to prove it.
(h) There is a natural number n such that 2ⁿ = 2n.
This is an existential statement, asserting that there exists n with 2ⁿ = 2n. Before
diving in, let's pause for a moment and think about how we might deal with such
statements. If an existential statement like this is true, we need only show
that its conclusion (which in this case is 2ⁿ = 2n) holds for some particular n. That
is, we need only find an n that works. If the statement is false, we have a lot more
work to do in order to prove that it is false: we would need to show that the
conclusion holds for no value of n or, equivalently, that for every n the conclusion
fails. So we'd need to prove a universal statement and, as we saw in the previous
example, that would require a suitably general argument.
In fact, this statement is true, because when n = 1 we have 2ⁿ = 2¹ = 2 = 2 × 1 = 2n.
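Such a witness can also be found by brute-force search. A minimal sketch, checking 2ⁿ = 2n for small n (note that n = 2 happens to work as well as n = 1):

```python
# Search for witnesses to the existential statement 2**n == 2*n.
witnesses = [n for n in range(1, 50) if 2**n == 2 * n]
print(witnesses)  # [1, 2]
```

A single witness suffices to prove the statement; the search simply finds one (here, two).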
(i) If n is even, then n² is even.
This is true. The most straightforward way to prove this is to assume that n is
some (that is, any) even number and then show that n² is even. So suppose n is
even. Then n = 2k for some integer k and hence n² = (2k)² = 4k². This is even
because it is 2(2k²) and 2k² is an integer.
(j) For all odd numbers n, n² is odd.
This is true. The most straightforward way to prove this is to assume that n is any
odd number and then show that n² is also odd. So suppose n is odd. Then
n = 2k + 1 for some integer k and hence n² = (2k + 1)² = 4k² + 4k + 1. To establish
that this is odd, we need to show that it can be written in the form 2K + 1 for
some integer K. Well, 4k² + 4k + 1 = 2(2k² + 2k) + 1. This is indeed of the form
2K + 1, where K is the integer 2k² + 2k. Hence n² is odd.
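A numerical check is not a proof, of course, but statements (i) and (j) are easy to confirm over a range of values. A quick sketch:

```python
# Confirm, for each n up to a limit, that n**2 has the same parity as n
# (even squares come from even n, odd squares from odd n).
checks = all((n**2 % 2 == 0) == (n % 2 == 0) for n in range(1, 10_001))
print(checks)  # True
```

The general proofs above are what establish the statements for all n; the check merely gives reassurance on a finite range.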
Another possible way to prove this result is to prove that if n² is even then n must
be even. We won't do that, but let's think about why it would be a possible
strategy. Suppose we were able to prove the following statement, which we'll call Q:
Q: If n² is even then n is even.
Why would that establish what we want (namely, that if n is odd then n² is odd)?
Suppose we have proved statement Q and suppose that n is odd. Then it must be
the case that n² is odd. For, if n² were not odd, it would be even, and then Q would
tell us that n is even. But we have assumed n is odd. It cannot be both
even and odd, so we have reached a contradiction. By assuming that the opposite
conclusion holds (n² even) we have shown that something impossible happens. This
type of argument is known as a proof by contradiction and it is often very
powerful. We will see more about this later.
(k) For natural numbers n, n² is even if and only if n is even.
This is true. What we have shown in proving (i) and (j) is that if n is even then n²
is even, and if n is odd then n² is odd. The first of these (statement (i)) establishes that if
n is even, then n² is even. The second (statement (j)) establishes that n² is
even only if n is even. This is because it shows that n² is odd if n is odd, from
which it follows that if n² is even, n cannot have been odd, and therefore must
have been even. 'If and only if' statements of this type are very important. As we
see here, the proof of such statements breaks down into the proofs of two 'if-then'
statements.
These examples hopefully demonstrate that there is a wide range of statements and
proof techniques, and in the rest of this chapter we will explore these further.
For now, one thing we hope comes out very clearly from these examples is that to
prove a mathematical statement, you need to know precisely what it means. That
sounds obvious, but you can see how detailed we had to be about the meanings (that is,
the definitions) of the terms ‘divisible’, ‘even’ and ‘odd’. Definitions are very important.
2.5.1 Negation
The simplest way to take a statement and form another statement is to negate the
statement. The negation of a statement P is the statement ¬P (sometimes just
denoted ‘not P ’), which is defined to be true exactly when P is false. This can be
described in the very simple truth table, Table 2.1:

    P   ¬P
    T    F
    F    T
What does the table signify? Quite simply, it tells us that if P is true then ¬P is false
and if P is false then ¬P is true.
We hope the examples earlier in this chapter have indicated that to disprove
a universal statement about natural numbers amounts to proving an existential
statement. That is, if we want to disprove a statement of the form 'for all natural
numbers n, property p(n) holds' (where p(n) is some predicate, such as 'n² is even') we
need only produce some N for which p(N ) fails. Such an N is called a
counterexample. Equally, to disprove an existential statement of the form ‘there is
some n such that property p(n) holds’, one would have to show that for every n, p(n)
fails. That is, to disprove an existential statement amounts to proving a universal one.
But, now that we have the notion of the negation of a statement we can phrase this a
little more formally. Proving that a statement P is false is equivalent to proving that
the negation ¬P is true. In the language of logic, therefore, we have the following:
The negation of the universal statement ‘for all n, property p(n) holds’ is the
existential statement ‘there is n such that property p(n) does not hold’.
The negation of the existential statement ‘there is n such that property p(n) holds’
is the universal statement ‘for all n, property p(n) does not hold’.
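In computational terms, disproving a universal statement is a search for a counterexample. A small sketch (the function name is ours), applied to the predicate 'n² is even' from earlier:

```python
def find_counterexample(p, candidates):
    """Return some N in candidates with p(N) false (a witness for the
    negation of 'for all n, p(n)'), or None if no counterexample is found."""
    for n in candidates:
        if not p(n):
            return n
    return None

# Disprove 'for all natural numbers n, n**2 is even':
print(find_counterexample(lambda n: n**2 % 2 == 0, range(1, 100)))  # 1
```

Finding one such N settles the matter; by contrast, failing to find one in a finite range proves nothing about all natural numbers.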
We could be a little more formal about this, by defining the negation of a predicate p(n)
(which, recall, only has a definitive true or false value once n is specified) to be the
predicate ¬p(n) which is true (for any particular n) precisely when p(n) is false. Then
we might say that
The negation of the universal statement ‘for all n, p(n) is true’ is the existential
statement ‘there is n such that ¬p(n) is true’.
The negation of the existential statement ‘there is n such that p(n) is true’ is the
universal statement ‘for all n, ¬p(n) is true’.
Now, let’s not get confused here. None of this is really difficult or new. We meet such
logic in everyday life. If we say ‘It rains every day in London’ then either this statement
is true or it is false. If it is false, it is because on (at least) one day it does not rain. The
negation (or disproof) of the statement ‘On every day, it rains in London’ is simply
‘There is a day on which it does not rain in London’. The former is a universal
statement (‘On every day, . . .’) and the latter is an existential statement (‘there is . . .’).
Or, consider the statement ‘There is a student who enjoys reading this guide’. This is
an existential statement (‘There is . . .’). This is false if ‘No student enjoys reading this
guide’. Another way of phrasing this last statement is ‘Every student reading this guide
does not enjoy it’. This is a more awkward expression, but it emphasises that the
negation of the initial, existential statement, is a universal one (‘Every student . . .’).
We hope these examples illustrate the point that much of logic is simple common sense.
Suppose that P and Q are two mathematical statements. Then 'P and Q', also denoted
P ∧ Q, and called the conjunction of P and Q, is the statement that is true precisely
when both P and Q are true. For example, statement (e) above, which is
'50 is divisible by 2 and 5'
is the conjunction of the two statements
'50 is divisible by 2'
'50 is divisible by 5'.
Statement (e) is true because both of these two statements are true. Table 2.2 gives the
truth table for the conjunction P ∧ Q.
    P   Q   P ∧ Q
    T   T     T
    T   F     F
    F   T     F
    F   F     F
What Table 2.2 says is simply that P ∧ Q is true precisely when both P and Q are true
(and in no other circumstances).
Suppose that P and Q are two mathematical statements. Then ‘P or Q’, also denoted
P ∨ Q, and called the disjunction of P and Q, is the statement that is true precisely
when P , or Q, or both, are true. For example, statement (d) above, which is
‘21 is divisible by 3 or 5’
is the disjunction of the two statements
‘21 is divisible by 3’
‘21 is divisible by 5’
Statement (d) is true because at least one (namely the first) of these two statements is
true.
Note one important thing about the mathematical interpretation of the word ‘or’. It is
always used in the ‘inclusive-or’ sense. So P ∨ Q is true in the case when P is true, or Q
is true, or both. In some ways, this use of the word ‘or’ contrasts with its use in normal
everyday language, where it is often used to specify a choice between mutually exclusive
alternatives. (For example ‘You’re either with us or against us’.) But if someone said
‘Tomorrow I will wear brown trousers or I will wear a yellow shirt’ then, in the
mathematical way in which the word ‘or’ is used, the statement would be true if they
wore brown trousers and any shirt, any trousers and a yellow shirt, and also if they
wore brown trousers and a yellow shirt. You might have your doubts about their dress
sense in this last case, but, logically, it makes the statement true.
Table 2.3 gives the truth table for the disjunction P ∨ Q.

    P   Q   P ∨ Q
    T   T     T
    T   F     T
    F   T     T
    F   F     F
What Table 2.3 says is simply that P ∨ Q is true precisely when at least one of P and Q
is true.
The implication 'if P, then Q', denoted P =⇒ Q, has the following truth table:

    P   Q   P =⇒ Q
    T   T      T
    T   F      F
    F   T      T
    F   F      T
Note that the statement P =⇒ Q is false only when P is true but Q is false. (To go
back to the earlier example, the statement 'If it rains, I wear a raincoat' is false
precisely if it does rain but I do not wear a raincoat.) This is tricky, so you may
have to spend a little time understanding it. As we've suggested, perhaps the easiest
way is to think about when a statement 'if P, then Q' is false.
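One way to see this is to note that P =⇒ Q has the same truth table as (¬P) ∨ Q: it fails only in the row where P is true and Q is false. A short sketch:

```python
def implies(p, q):
    # P => Q is equivalent to (not P) or Q.
    return (not p) or q

for p in (True, False):
    for q in (True, False):
        print(p, q, implies(p, q))
# Only the row p = True, q = False gives False.
```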
The statement P =⇒ Q can also be written as Q ⇐= P . There are different ways of
describing P =⇒ Q, such as:
if P then Q
P implies Q
P is sufficient for Q
Q if P
P only if Q
Q whenever P
Q is necessary for P .
All these mean the same thing. The first two are the ones we will use most frequently.
If P =⇒ Q and Q =⇒ P, then this means that Q will be true precisely when P is. That
is, Q is true if and only if P is. We use the single piece of notation P ⇐⇒ Q instead of
the two separate statements P =⇒ Q and Q ⇐= P. There are several phrases for describing what
P ⇐⇒ Q means, such as:
P is equivalent to Q
    P   Q   P =⇒ Q   Q =⇒ P   P ⇐⇒ Q
    T   T      T        T        T
    T   F      F        T        F
    F   T      T        F        F
    F   F      T        T        T
What the table shows is that P ⇐⇒ Q is true precisely when P and Q are either both
true or both false.
Activity 2.6 What is the converse of the statement 'if the natural number n
divides 4 then n divides 12'? Is the converse true? Is the original statement true?
2.6.3 Contrapositive statements
The contrapositive of an implication P =⇒ Q is the statement ¬Q =⇒ ¬P. The
contrapositive is equivalent to the implication, as Table 2.6 shows. (The columns
for P =⇒ Q and ¬Q =⇒ ¬P are identical.)

    P   Q   P =⇒ Q   ¬P   ¬Q   ¬Q =⇒ ¬P
    T   T      T      F    F       T
    T   F      F      F    T       F
    F   T      T      T    F       T
    F   F      T      T    T       T
If you think about it, the equivalence of the implication and its contrapositive makes
sense: ¬Q =⇒ ¬P says that if Q is false, P is false also. So it tells us that we
cannot have Q false and P true, which is precisely the same information as is given by
P =⇒ Q.
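The equivalence can also be verified exhaustively by machine, checking every assignment of truth values to P and Q:

```python
from itertools import product

def implies(p, q):
    # P => Q is equivalent to (not P) or Q.
    return (not p) or q

# P => Q agrees with its contrapositive (not Q) => (not P) in every row.
agree = all(implies(p, q) == implies(not q, not p)
            for p, q in product((True, False), repeat=2))
print(agree)  # True
```

Because there are only four rows, this exhaustive check is itself a legitimate proof of the equivalence.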
So what’s the point of this? Well, sometimes you might want to prove P =⇒ Q and it
will, in fact, be easier to prove instead the equivalent (contrapositive) statement
¬Q =⇒ ¬P . See Anthony and Biggs, section 3.5 for an example.
This sort of argument can be a bit perplexing when you first meet it. What’s going on
in the example just given? Well, what we show is that if such m, n exist, then something
impossible happens: namely the number 1099 is both even and odd. Well, this can’t be.
If supposing something leads to a conclusion you know to be false, then the initial
supposition must be false. So the conclusion is that such integers m, n do not exist.
2.8 Some terminology
At this point, it’s probably worth introducing some important terminology. When, in
mathematics, we prove a true statement, we often say we are proving a theorem, or a
proposition. (Usually the word ‘proposition’ is used if the statement does not seem
quite so significant as to merit the description 'theorem'.) A preliminary result
leading up to a theorem is often called a lemma, and a minor result that is a fairly
direct consequence of, or a special case of, a theorem is called a corollary, if it is
not itself significant enough to merit the title 'theorem'. For your purposes, it is
important just to know that these words all refer to true mathematical statements. You
should realise that these terms are used subjectively: for instance, the person writing
the mathematics has to decide whether a particular result merits the title 'theorem'
or is, instead, merely to be called a 'proposition'.
Overview
This chapter has explored some of the basics of algebra, together with an introduction
to mathematical proof. It is a preliminary chapter, and you should not let it detain you
from proceeding with the rest of the course. As we mentioned earlier, if you’re not
entirely comfortable with it, it is best to proceed anyway: particularly when it comes to
proving things, you can pick up the key ideas in context in what follows.
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
Test your knowledge and understanding
Attempt the following exercises.
Exercises
Exercise 2.1
Simplify, then solve for a:

    6ab − (a/b)(b² − 4bc) = 1.
Exercise 2.2
Given that the polynomial P(x) = x³ + 3x² + 4x + 4 has an integer root, find it and
hence show that the polynomial can be expressed as a product P(x) = (x − r)Q(x),
where Q(x) is an irreducible quadratic polynomial.
Exercise 2.3
Is the following statement about natural numbers n true or false? Justify your answer
by giving a proof or a counterexample:
What are the converse and contrapositive of this statement? Is the converse true? Is the
contrapositive true?
Exercise 2.4
Is the following statement about natural numbers n true or false? Justify your answer
by giving a proof or a counterexample:
What are the converse and contrapositive of this statement? Is the converse true? Is the
contrapositive true?
Exercise 2.5
Prove by contradiction that there is no largest natural number.
Feedback to activity 2.3
We will show the first, and leave the second to you. We have

    aʳaˢ = (a × a × ⋯ × a)(a × a × ⋯ × a),

where the first bracket contains r factors and the second contains s factors.
Removing the brackets, we have the product of a with itself a total of r + s times;
that is,

    aʳaˢ = a × a × ⋯ × a  (r + s factors) = aʳ⁺ˢ.
Comments on exercises
Solution to exercise 2.1
We have 6ab − (a/b)(b² − 4bc) = 6ab − ab + 4ac = 5ab + 4ac = a(5b + 4c), so the
equation becomes a(5b + 4c) = 1 and, solving for a,

    a = 1/(5b + 4c),  provided 5b + 4c ≠ 0.

Note that it is an important part of the solution to declare that it is only valid if
5b + 4c ≠ 0; otherwise there is no solution.
Solution to exercise 2.2
Trying small integer values, P(−2) = −8 + 12 − 8 + 4 = 0, so r = −2 is an integer
root. Dividing P(x) by x + 2 gives

    P(x) = (x + 2)(x² + x + 2),

where Q(x) = x² + x + 2 is an irreducible quadratic polynomial (its discriminant is
1 − 8 = −7 < 0).
Chapter 3
Matrices
Introduction
Matrices will be the main tool in our study of linear algebra, so we begin by learning
what they are and how to use them. This chapter contains a lot of definitions with
which you should become familiar, including terminology associated with a matrix and
the operations defined on matrices. All of the operations are defined purposefully to
ensure matrices are a useful tool, as we shall see in later chapters. In particular, the
definition of matrix multiplication may seem strange at first, but it turns out to be
exactly what we need.
Aims
The aims of this chapter are to:
Define a matrix, the terminology associated with a matrix, and the operations
defined on matrices.
Become familiar with the properties and rules of matrix operations, how they
combine and interact.
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 1,
Sections 1.1–1.7
This chapter of the subject guide closely follows the first half of Chapter 1 of the
textbook. You should read the corresponding sections of the textbook and work through
all the activities there while working through the sections of this subject guide.
Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.
Synopsis
We define a matrix and the basic terminology associated with a matrix (entry, row,
column, size, square matrix, diagonal matrix, equality of matrices) and then look at the
operations of addition, scalar multiplication and matrix multiplication. We show how to
algebraically manipulate matrices using these operations, and state what is meant by a
zero matrix and an identity matrix. We define the inverse of a square matrix, when it
exists, and its properties. Next we define what is meant by powers of a square matrix,
look at its properties and how it interacts with the inverse of a matrix. Then we define
the transpose of a matrix and what is meant by a symmetric matrix, and look at the
properties of the transpose of a matrix and how it interacts with the inverse of a matrix.
A square matrix is a matrix with the same number of rows as columns. The diagonal
of an n × n square matrix S is the list of entries s11, s22, ..., snn.
A diagonal matrix is a square matrix in which all the entries not on the diagonal are
equal to zero. So D = (dij) is diagonal if it is n × n and dij = 0 whenever i ≠ j:

    D = ( d11   0   ⋯    0  )
        (  0   d22  ⋯    0  )
        (  ⋮    ⋮    ⋱    ⋮  )
        (  0    0   ⋯   dnn )
Activity 3.2 Write down the diagonal matrix with the same diagonal as the matrix
A in the previous example.
Definition 3.2 (Equality of matrices) Two matrices are equal if they are the same
size and if corresponding entries are equal. That is, if A = (aij) and B = (bij) are both
m × n matrices, then

    A = B ⇐⇒ aij = bij,  1 ≤ i ≤ m, 1 ≤ j ≤ n.

Definition 3.3 (Matrix addition) If A = (aij) and B = (bij) are both m × n
matrices, then their sum A + B is the m × n matrix formed by adding corresponding
entries:

    A + B = (aij + bij),  1 ≤ i ≤ m, 1 ≤ j ≤ n.
We can also multiply a matrix of any size by a real number, which we call a scalar in
this context. If λ is a scalar and A is a matrix, then λA is the matrix whose entries are
λ times each of the entries of A.
Definition 3.4 (Scalar multiplication) If A = (aij) is an m × n matrix and λ ∈ R,
then

    λA = (λaij),  1 ≤ i ≤ m, 1 ≤ j ≤ n.
Example 3.3 If

    A = (  1  2  −1 )        B = (  1  0   3 )
        ( −2  3   5 )            ( −4  2  −1 )
We write −B for the matrix (−1)B and if A and B are m × n matrices, then A − B is
defined by A − B = A + (−1)B.
3
Activity 3.3 Find the matrix A − B for the matrices A and B in the above
example.
Activity 3.4 Write down the missing entries in the matrices below:
1 2 5 −1
+ = .
4 2 3 4
Read sections 1.1 and 1.2 of the text A-H, working through the activities there.
If A is an m × n matrix and B is an n × p matrix, then the product C = AB is
defined, and its (i, j) entry is

    cij = ai1 b1j + ai2 b2j + ⋯ + ain bnj.

Although this formula looks daunting, it is quite easy to use in practice. What it says is
that the entry in row i and column j of the product is obtained by taking each entry
of row i of A and multiplying it by the corresponding entry of column j of B, then
adding these n products together.
    row i of A →  ( ai1  ai2  ⋯  ain )   ( b1j )
                                         ( b2j )
                                         (  ⋮  )
                                         ( bnj )
                                            ↑
                                      column j of B
What size is C = AB? The matrix C must be m × p since it will have one entry for
each of the m rows of A and each of the p columns of B.
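The entry formula translates directly into code. A sketch in Python (matrices as lists of rows, no libraries, and a small made-up example of our own):

```python
def mat_mul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B, both given as
    lists of rows, using c_ij = a_i1*b_1j + a_i2*b_2j + ... + a_in*b_nj."""
    m, n, p = len(A), len(B), len(B[0])
    assert all(len(row) == n for row in A), "inner sizes must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

# A 2 x 2 matrix times a 2 x 1 matrix gives a 2 x 1 product:
print(mat_mul([[1, 2], [3, 4]], [[5], [6]]))  # [[17], [39]]
```

Note that the result has one row per row of A and one column per column of B, exactly as stated above.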
Example 3.4 In the following product, the entry in row 2 and column 1 of the
product matrix is found, as described above, by using row 2 of A and column 1 of B:

    AB = ( 1  1   1 ) ( 3  0 )   ( 3   4 )
         ( 2  0   1 ) ( 1  1 ) = ( 5   3 )
         ( 1  2   4 ) (−1  3 )   ( 1  14 )
         ( 2  2  −1 )            ( 9  −1 )

Here the (2, 1) entry of the product is 2 × 3 + 0 × 1 + 1 × (−1) = 5.
We shall see in later chapters that this definition of matrix multiplication is exactly
what is needed for applying matrices in our study of linear algebra.
It is an important consequence of this definition that matrix multiplication is not
commutative: in general, AB ≠ BA.
To see just how non-commutative matrix multiplication is, let’s look at some examples,
starting with the two matrices A and B in the example above. The product AB is
defined, but the product BA is not even defined. Since A is 4 × 3 and B is 3 × 2 it is
not possible to multiply the matrices in the order BA.
Now consider the matrices

    A = ( 2  1  3 )    and    B = ( 3  1 )
        ( 1  2  1 )               ( 1  0 )
                                  ( 1  1 )

Both products AB and BA are defined, but they are different sizes, so they cannot be
equal. What sizes are they?
Activity 3.5 Answer the question just posed concerning the sizes of AB and BA.
Multiply the matrices to find the two product matrices, AB and BA.
Even if both products are defined and have the same size, it is still generally the case
that AB ≠ BA.
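Here is a small sketch (using NumPy, and two matrices of our own choosing) exhibiting square matrices of the same size with AB ≠ BA:

```python
import numpy as np

A = np.array([[0, 1],
              [0, 0]])
B = np.array([[0, 0],
              [1, 0]])

# AB has its only non-zero entry in the (1, 1) position;
# BA has its only non-zero entry in the (2, 2) position.
print(A @ B)
print(B @ A)
print(np.array_equal(A @ B, B @ A))  # False
```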
Activity 3.6 Try this for any two 2 × 2 matrices. Write down two different matrices
A and B and find the products AB and BA. For example, you could use

    A = ( 1  2 )    B = ( 1  1 )
        ( 3  4 )        ( 0  1 )
For example, given a matrix equation such as

    3A + 2B = 2(B − A + C),

we can solve this for the matrix C using the rules of algebra. You must always bear in
mind that for the operations to be performed, they must be defined. In this equation it
is understood that all the matrices A, B and C are the same size, say m × n.
We list the rules of algebra satisfied by the operations of addition, scalar multiplication
and matrix multiplication. The sizes of the matrices are dictated by the operations
being defined.
One such rule is the commutativity of addition: A + B = B + A. This is easily shown
to be true, and we will carry out the proof as an example. The matrices
A and B must be of the same size, say m × n, for the operation to be defined, so both
A + B and B + A are also m × n matrices. They also have the same entries. The (i, j)
entry of A + B is aij + bij and the (i, j) entry of B + A is bij + aij , but aij + bij = bij + aij
by the properties of real numbers. So the matrices A + B and B + A are equal.
On the other hand, as we have seen, matrix multiplication is not commutative:
AB ≠ BA in general.
We have the following ‘associative’ laws:
(A + B) + C = A + (B + C)
(AB)C = A(BC)
These rules allow us to remove brackets. For example the last rule says that we will get
the same result if we first multiply AB and then multiply by C on the right, as we will
if we first multiply BC and then multiply by A on the left, so the choice is ours.
All these rules follow from the definitions of the operations in the same way as we
showed the commutativity of addition. We need to know that the matrices on the left
and on the right of the equals sign have the same size and that corresponding entries
are equal. Only the associativity of multiplication presents any complications; it is
tedious, but it can be done.
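These laws can also be spot-checked numerically (again, not a proof, but a useful sanity check). A sketch using NumPy with randomly chosen integer matrices, where the sizes are picked so that every product is defined:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 5, size=(2, 3))
B = rng.integers(-5, 5, size=(3, 4))
C = rng.integers(-5, 5, size=(4, 2))
D = rng.integers(-5, 5, size=(3, 4))

# Associativity of multiplication: (AB)C == A(BC).
print(np.array_equal((A @ B) @ C, A @ (B @ C)))  # True

# Commutativity of addition: B + D == D + B (same sizes).
print(np.array_equal(B + D, D + B))  # True
```

Integer matrices are used so the comparisons are exact, with no floating-point rounding.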
Activity 3.7 Think about these rules. What size is each of the matrices?
Write down the (i, j) entry for each of the matrices λ(AB) and (λA)(B) and prove
that the matrices are equal.
We also have the following 'distributive' laws:

    A(B + C) = AB + AC
    (B + C)A = BA + CA
    λ(A + B) = λA + λB.
Why do we need both of the first two rules (which state that matrix multiplication
distributes through addition)? Since matrix multiplication is not commutative, we
cannot conclude the second distributive rule from the first; we have to prove it is true
separately. All these statements can be proved from the definitions using the same
technique as used earlier, but we will not take the time to do this here.
If A is an m × n matrix, what is the result of A − A? We obtain an m × n matrix all of
whose entries are 0. This is an ‘additive identity’: that is, it plays the same role for
matrices as the number 0 does for numbers, in the sense that A + 0 = 0 + A = A. There
is a zero matrix of any size m × n.
Definition 3.6 (Zero matrix) A zero matrix, denoted 0, is an m × n matrix with all
entries zero,
    0 = ( 0  0  ⋯  0 )
        ( 0  0  ⋯  0 )
        ( ⋮  ⋮  ⋱  ⋮ )
        ( 0  0  ⋯  0 )
Then

    A + 0 = A,    A − A = 0,    0A = 0,    A0 = 0,
where the sizes of the zero matrices above must be compatible with the size of the
matrix A for the operations to be defined.
Activity 3.8 If A is a 2 × 3 matrix, write down the zero matrix for each of the rules
involving addition. What sizes are the zero matrices for the rules involving matrix
multiplication?
What about matrix multiplication? Is there a ‘multiplicative identity’, which acts like
the number 1 does for multiplication of numbers? The answer is ‘yes’.
Definition 3.7 (Identity matrix) The n × n identity matrix, denoted In or simply I,
is the diagonal matrix with aii = 1, 1 ≤ i ≤ n,
    I = ( 1  0  ⋯  0 )
        ( 0  1  ⋯  0 )
        ( ⋮  ⋮  ⋱  ⋮ )
        ( 0  0  ⋯  1 )
If A is any m × n matrix, then

    AI = A    and    IA = A,

where it is understood that the identity matrix is the appropriate size for the product
to be defined.
Activity 3.10 You can apply these rules to solve the matrix equation,
3A + 2B = 2(B − A + C)
for the matrix C. Do this; solve the equation for C, stating what rule of matrix
algebra you are using for each step of the solution.
You should now read sections 1.3 and 1.4 of the text A-H, working through the
activities there. You will find the solution of the last activity in the text.
Example 3.5 If

    A = ( 0  0 ),    B = ( 1  −1 ),    C = (  8  0 ),
        ( 1  1 )         ( 3   5 )         ( −4  4 )

then

    AB = AC = ( 0  0 ),
              ( 4  4 )

even though B ≠ C. So from AB = AC we cannot, in general, conclude that B = C.
On the other hand, if A + 5B = A + 5C, then we can conclude that B = C because the
operations of addition and scalar multiplication have inverses. If we have a matrix A,
then the matrix −A = (−1)A is an additive inverse because it satisfies A + (−A) = 0. If
we multiply a matrix A by a non-zero scalar c we can ‘undo’ this by multiplying cA
by 1/c.
What about matrix multiplication, is there a multiplicative inverse? The answer is
‘sometimes’.
Notice that the matrix A must be square, and that both I and B = A−1 must also be
square n × n matrices for the products to be defined.
Activity 3.12 Check this. Multiply the matrices to show that AB = I and
BA = I, where I is the 2 × 2 identity matrix.
To show that a matrix B is equal to A−1 , find the matrix products AB and BA and
show that each product is equal to the identity matrix I.
You might have noticed that we have said that B is the inverse of A. This is because an
invertible matrix has only one inverse. We will prove this.
Theorem 3.1 If A is an n × n invertible matrix, then the matrix A−1 is unique.
Proof
Assume the matrix A has two inverses, B and C, so that AB = BA = I and
AC = CA = I. You need to show that B and C must actually be the same matrix, that
is, you need to show that B = C. Begin by the statement
    B = BI = ⋯

and substitute an appropriate product for I until you obtain that B = C.
You can check your proof with the details in the text A-H, where this theorem is
labelled as Theorem 1.23.
Not all square matrices will have an inverse. We say that A is invertible or non-singular
if it has an inverse. We say that A is non-invertible or singular if it has no inverse.
Example 3.7 The matrix

    A = ( 0  0 )
        ( 1  1 )

(used in Example 3.5) is not invertible: it is not possible to have

    ( 0  0 ) ( a  b )   ( 1  0 )
    ( 1  1 ) ( c  d ) = ( 0  1 ),

since the first row of the product is ( 0  0 ) whatever a, b, c, d are.
On the other hand, if

    A = ( a  b ),    where ad − bc ≠ 0,
        ( c  d )

then A is invertible and

    A⁻¹ = 1/(ad − bc) (  d  −b )
                      ( −c   a )
Activity 3.13 Check that this is indeed the inverse of A, by showing that if you
multiply A on the left or on the right by this matrix, then you obtain the identity
matrix I.
The scalar ad − bc is called the determinant of the matrix A, denoted |A|. We shall see
more about the determinant in Chapter 8. So if |A| = ad − bc ≠ 0, then to construct
A⁻¹ we take the matrix A, switch the main diagonal entries, put minus signs in
front of the other two entries, and then multiply by the scalar 1/|A|.
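This recipe for the 2 × 2 inverse is easy to implement. A sketch (the function name is ours; it works on plain nested lists):

```python
def inverse_2x2(A):
    """Inverse of A = [[a, b], [c, d]], assuming ad - bc != 0."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix: ad - bc = 0")
    # Swap the diagonal entries, negate the other two, divide by det.
    return [[d / det, -b / det], [-c / det, a / det]]

print(inverse_2x2([[1, 2], [3, 4]]))  # [[-2.0, 1.0], [1.5, -0.5]]
```

Multiplying the result by A, on either side, gives the identity matrix, which is exactly the defining property of the inverse.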
If AB = AC, and A is invertible, can we conclude that B = C? This time the answer is
‘yes’, because we can multiply each side of the equation on the left by A−1 :
    A⁻¹AB = A⁻¹AC =⇒ IB = IC =⇒ B = C.
(A⁻¹)⁻¹ = A.
It is important to understand the definition of an inverse matrix and to be able to use it.
Basically, if we can find a matrix that works in the definition, then that matrix is the
inverse, and the matrices are invertible. For example, if A is an invertible n × n matrix
and λ ∈ R with λ ≠ 0, then

    (λA)⁻¹ = (1/λ)A⁻¹.
This statement says that the matrix λA is invertible, and its inverse is given by the
matrix C = (1/λ)A⁻¹. To prove this is true, we just need to show that the matrix C
satisfies (λA)C = C(λA) = I. This is straightforward using matrix algebra:

    (λA)((1/λ)A⁻¹) = λ(1/λ)AA⁻¹ = I    and    ((1/λ)A⁻¹)(λA) = (1/λ)λA⁻¹A = I.
If A and B are invertible n × n matrices, then, using the definition of the inverse, you
can show that

    (AB)⁻¹ = B⁻¹A⁻¹.

This last statement says that if A and B are invertible matrices of the same size, then
the product AB is invertible and its inverse is the product of the inverses in the reverse
order. The proof of this statement is left as an exercise at the end of this chapter.
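Both inverse rules are easy to sanity-check numerically; a sketch with NumPy and invertible matrices of our own choosing:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # det = -2, so A is invertible
B = np.array([[2.0, 0.0],
              [1.0, 1.0]])   # det = 2, so B is invertible

lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))  # True

# The order matters: inv(A) @ inv(B) is, in general, a different matrix.
print(np.allclose(lhs, np.linalg.inv(A) @ np.linalg.inv(B)))  # False
```

`np.allclose` is used rather than exact equality because matrix inversion in floating point introduces tiny rounding errors.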
If A is an n × n matrix and r ∈ N, then the usual rules of exponents hold: for integers
r, s,

    AʳAˢ = Aʳ⁺ˢ    and    (Aʳ)ˢ = Aʳˢ.

Moreover, if A is invertible, then so is Aʳ, and

    (Aʳ)⁻¹ = (A⁻¹)ʳ.

This follows immediately from the definition of the inverse matrix and the associativity
of matrix multiplication. Think about what it says: the inverse of the product of A
with itself r times is the product of A⁻¹ with itself r times.
Example 3.8 If

    A = ( 1  2 )    and    B = ( 1  5  3 ),
        ( 3  4 )

then

    Aᵀ = ( 1  3 ),    Bᵀ = ( 1 )
         ( 2  4 )          ( 5 )
                           ( 3 )
Activity 3.16 Let D be a diagonal matrix,

    D = ( d11   0   ⋯    0  )
        (  0   d22  ⋯    0  )
        (  ⋮    ⋮    ⋱    ⋮  )
        (  0    0   ⋯   dnn )

Show that Dᵀ = D.
Properties of transpose
If we take the transpose of a matrix A by switching the rows and columns, and then do
it again, we get back to the original matrix A. This is summarised in the following
equation:
(Aᵀ)ᵀ = A.
We also have (A + B)ᵀ = Aᵀ + Bᵀ and (λA)ᵀ = λAᵀ. These follow immediately from
the definition. In particular, the (i, j) entry of (λA)ᵀ is λaji, which is also the (i, j)
entry of λAᵀ.
The next property tells you what happens when you take the transpose of a product of
matrices:

    (AB)ᵀ = BᵀAᵀ.

This can be stated as: the transpose of the product of two matrices is the product of the
transposes in the reverse order.
Showing that this is true is slightly more complicated, since it involves matrix
multiplication. You can see why the product of the transposes must be in the reverse
order by carrying out the following activity.
If A is an m × n matrix and B is n × p, from the above activity you know that (AB)ᵀ
and BᵀAᵀ are the same size. To prove that (AB)ᵀ = BᵀAᵀ you need to show that the
(i, j) entries are equal.
The (i, j) entry of (AB)ᵀ is the (j, i) entry of AB, which is obtained by taking row j of
A and multiplying each of its n entries by the corresponding entry of column i of B and
then summing the terms.
Activity 3.19 How can you similarly describe in words the (i, j) entry of BᵀAᵀ?
The final property in this section states that the inverse of the transpose of an invertible
matrix is the transpose of the inverse; that is, if A is invertible, then
This follows from the previous property and the definition of inverse. We have

A^T (A^{−1})^T = (A^{−1} A)^T = I^T = I, and in the same way (A^{−1})^T A^T = (A A^{−1})^T = I.

Therefore, by the definition of the inverse of a matrix, (A^{−1})^T must be the inverse of A^T.
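As a quick numerical illustration (not part of the guide), the following Python/NumPy sketch checks both transpose properties on example matrices chosen here:

```python
import numpy as np

# Check (AB)^T = B^T A^T and (A^T)^(-1) = (A^(-1))^T on small examples.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])          # 2 x 2, invertible
B = np.array([[1.0, 5.0, 3.0],
              [0.0, 2.0, 1.0]])     # 2 x 3

# The transpose of a product is the product of transposes in reverse order.
assert np.allclose((A @ B).T, B.T @ A.T)

# For the invertible matrix A, the inverse of the transpose is the
# transpose of the inverse.
assert np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T)
```

Note that A^T B^T would not even be conformable here (2 × 2 times 3 × 2), which is exactly why the order must be reversed.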
A square matrix A is symmetric if A^T = A. Only square matrices can be symmetric. If A is symmetric, then a_ij = a_ji. That is, entries diagonally opposite to each other must be equal: the matrix is symmetric about its diagonal. By Activity 3.16, D^T = D; that is, all diagonal matrices are symmetric.

R
Read the remaining parts of Chapter 1, Sections 1.1–1.7.
Overview
In this chapter we have looked at the terminology associated with matrices and the
operations defined for matrices, when they are defined and the properties they satisfy.
We have seen how to manipulate matrices algebraically.
You will be working with matrices throughout this course so it is important for you to
gain a facility with these definitions and operations – you should be able to use them
with ease.
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
define a matrix and explain the terminology used with matrices, such as row,
column, size, square matrix, diagonal matrix, equality of matrices, transpose of a
matrix, symmetric matrix
define and use matrix addition, scalar multiplication and matrix multiplication
appropriately (know when and how these operations are defined)
manipulate matrices algebraically
define what is meant by the inverse of a square matrix, know and use the
properties of the inverse of a matrix and find the inverse of a 2 × 2 matrix
define what is meant by A^n where A is a square matrix and n is an integer;
demonstrate and use the fact that (A^n)^{−1} = (A^{−1})^n
state and use the properties of the transpose, use transpose in combination with
the other operations defined on matrices
Test your knowledge and understanding
Exercises
Exercise 3.1
For fixed real numbers a, b, c, d and real numbers x, y, z, w assume that

AB = [a b; c d][x y; z w] = [1 0; 0 1] = I.

Write down the four linear equations in x, y, z, w that you obtain by first multiplying the matrices on the left and then equating the entries of the product to the entries of the identity matrix. Then solve these equations for x, y, z, w.
You should find that solution is only possible if ad − bc ≠ 0, and that the solution is

B = 1/(ad − bc) [d −b; −c a].
Compare this with the result you were given on page 40.
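The formula in Exercise 3.1 can be verified numerically. This Python/NumPy sketch is an illustration only (the values of a, b, c, d are examples chosen here), not a substitute for doing the exercise by hand:

```python
import numpy as np

# The 2x2 inverse formula: if ad - bc != 0 then
# A^(-1) = (1/(ad - bc)) * [d -b; -c a].
a, b, c, d = 1.0, 2.0, 3.0, 4.0
A = np.array([[a, b], [c, d]])

det = a * d - b * c
assert det != 0            # the formula only applies when ad - bc != 0
B = (1.0 / det) * np.array([[d, -b], [-c, a]])

# B really is the inverse: both products give the identity.
assert np.allclose(A @ B, np.eye(2))
assert np.allclose(B @ A, np.eye(2))
```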
Exercise 3.2
What is meant by the statement that A is a symmetric matrix?
If B is an m × k matrix, show that the matrix B T B is a k × k symmetric matrix.
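A numerical check of the claim in Exercise 3.2 (an illustration with an example matrix, using Python/NumPy; the exercise itself asks for a proof):

```python
import numpy as np

# B is m x k; then B^T B is k x k and symmetric, since
# (B^T B)^T = B^T (B^T)^T = B^T B.
B = np.array([[1.0, 5.0, 3.0],
              [2.0, 0.0, 1.0]])    # a 2 x 3 example, so B^T B is 3 x 3

M = B.T @ B
assert M.shape == (3, 3)           # k x k
assert np.allclose(M, M.T)         # symmetric
```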
Comments on selected activities

[1 2; 1 4] + [−2 5; 2 0] = [−1 7; 3 4].
AB = [1 3; 3 7],  BA = [4 6; 3 4].
AB = [1 2; 3 4][−2 1; 3/2 −1/2] = [1 0; 0 1]

and

BA = [−2 1; 3/2 −1/2][1 2; 3 4] = [1 0; 0 1].

Therefore A^{−1} = [−2 1; 3/2 −1/2].
We have

A^r (A^{−1})^r = (A A ··· A)(A^{−1} A^{−1} ··· A^{−1}), with r factors in each bracket.

Removing the brackets (matrix multiplication is associative) and replacing each central product A A^{−1} with I, the result will eventually be A I A^{−1} = A A^{−1} = I. In the same way,

(A^{−1})^r A^r = (A^{−1} A^{−1} ··· A^{−1})(A A ··· A) = I,

again with r factors in each bracket.
Since multiplication of real numbers is commutative, these two expressions are the same real number.
Feedback to activity 3.20
The matrix is

A = [1 4 5; 4 2 −7; 5 −7 3] = A^T.
Comments on exercises
Solution to exercise 3.1
The equations are

ax + bz = 1,  cx + dz = 0

and

ay + bw = 0,  cy + dw = 1.

To begin you can solve the first set by multiplying the top equation by d and the bottom equation by b and then subtracting one equation from the other to eliminate the terms in z. You will obtain (ad − bc)x = d. Then provided ad − bc ≠ 0,

x = d/(ad − bc).

Repeat the steps, this time eliminating the terms in x, and solve for z = −c/(ad − bc). Then solve the second set of equations in the same way.
Chapter 4
Vectors
Introduction
Matrices lead us to a study of vectors, which can be viewed as n × 1 matrices, but which have far-reaching applications viewed as elements of a Euclidean space, Rn. To
understand this, we develop our geometric intuition by looking at R2 and R3 , and use
vectors to obtain equations of familiar geometric objects, namely lines and planes.
Aims
The aims of this chapter are to:
Define the inner product of two vectors and establish the properties satisfied by
this operation
Become familiar with forming lines and planes in R2 and R3 using linear
combinations of vectors
R
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 1,
Sections 1.8–1.12
This chapter of the subject guide closely follows the second half of Chapter 1 of the
textbook. You should read the corresponding sections of the textbook and work through
all the activities there while working through the sections of this subject guide.
Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.
Synopsis
We define a vector, what we mean by a linear combination of vectors and say what we
mean by Euclidean n-space, Rn . Then we define the inner product of two vectors in Rn
and look at its properties. We pause to establish a fundamental relationship between
the vector Ax, where A is an m × n matrix and x ∈ Rn , and the column vectors of A.
We then focus on developing geometric insight, beginning with R2 , looking at what we
mean by the length and direction of a vector, and the angle between two vectors, and
extend these concepts to R3 . We then study lines in R2 , learning how to describe them
using Cartesian equations or vector equations and how to switch from one description
to the other. We then extend these ideas to vectors in R3 , where the extra dimension
increases the possibilities of how lines can interact. We next extend the idea of linear
combinations of vectors to planes in R3 and look at vector equations and Cartesian
equations of planes. We give several examples of determining the interactions of planes
and of lines and planes. Finally we extend these concepts to Rn , to lines and
hyperplanes.
4.1 Vectors in Rn
A vector v is an n × 1 matrix, a column of n real numbers,

v = (v1, v2, . . . , vn)^T,  vi ∈ R.
We can also define a row vector to be a 1 × n matrix. However, in this text, by the
term vector we shall always mean a column vector.
The numbers v1 , v2 , . . . , vn , are known as the components (or entries) of the vector, v.
In order to distinguish vectors from scalars, and to emphasise that they are vectors and
not general matrices, in this text vectors are written in lowercase boldface type. (When
writing by hand, vectors should be underlined to avoid confusion with scalars.)
Addition and scalar multiplication are defined for vectors as for n × 1 matrices:
v + w = (v1 + w1, v2 + w2, . . . , vn + wn)^T,  λv = (λv1, λv2, . . . , λvn)^T.
For a fixed positive integer n, the set of vectors together with the operations of addition
and scalar multiplication form Rn , usually called Euclidean n-space.
We will often write a column vector in the text as the transpose of a row vector.
Although a column vector is the transpose of a row vector, written without commas,

x = ( x1 x2 · · · xn )^T,
we will usually write x = (x1 , x2 , · · · , xn )T , with commas separating the entries. A
matrix does not have commas; however, we will use the commas in order to clearly
distinguish the separate components of the vector.
For vectors v1 , v2 , . . . , vk in Rn and scalars α1 , α2 , . . . , αk in R, the vector

v = α1 v1 + · · · + αk vk ∈ Rn

is known as a linear combination of the vectors v1 , v2 , . . . , vk .
The 1 × 1 matrix v^T w can be identified with the real number, or scalar, which is its unique entry. This turns out to be particularly useful, and is known as the inner product or scalar product or dot product of v and w.

If

v = (v1, v2, . . . , vn)^T and w = (w1, w2, . . . , wn)^T,

the inner product, denoted ⟨v, w⟩, is the real number given by

⟨v, w⟩ = v1 w1 + v2 w2 + · · · + vn wn.
The inner product, ⟨v, w⟩, is also known as the scalar product of v and w, or as the dot product. In the latter case it is denoted by v · w.
The inner product of v and w is precisely the scalar quantity given by

v^T w = ( v1 v2 · · · vn ) (w1, w2, . . . , wn)^T = v1 w1 + v2 w2 + · · · + vn wn,

so that we can write

⟨v, w⟩ = v^T w.
It is important to realise that the inner product is just a number, a scalar, not another
vector or a matrix.
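The agreement between the component-by-component sum and the matrix product v^T w can be seen numerically. A small Python/NumPy illustration (the two vectors are examples chosen here, not from the guide):

```python
import numpy as np

v = np.array([1.0, 2.0, 2.0])
w = np.array([-1.0, 1.0, 4.0])

# The inner product as a sum of products of components...
ip = sum(vi * wi for vi, wi in zip(v, w))

# ...agrees with the matrix product v^T w (for 1-D arrays, v @ w is v.w).
assert np.isclose(ip, v @ w)
assert np.isclose(ip, 9.0)   # (1)(-1) + (2)(1) + (2)(4) = 9
```

The result is a single scalar, never a vector or matrix, just as the text emphasises.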
The inner product on Rn satisfies certain basic properties, as shown in the next theorem.

Theorem 4.1 The inner product

⟨x, y⟩ = x1 y1 + x2 y2 + · · · + xn yn,  x, y ∈ Rn,

satisfies the following properties for all x, y, z ∈ Rn and all α ∈ R:

(i) ⟨x, y⟩ = ⟨y, x⟩
(ii) ⟨αx, y⟩ = α⟨x, y⟩ = ⟨x, αy⟩
(iii) ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩
(iv) ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 if and only if x = 0.
Proof
We have

⟨x, y⟩ = x1 y1 + x2 y2 + · · · + xn yn = y1 x1 + y2 x2 + · · · + yn xn = ⟨y, x⟩,

which proves (i). We leave the proofs of (ii) and (iii) as an exercise. For (iv), note that

⟨x, x⟩ = x1² + x2² + · · · + xn²

is a sum of squares, so ⟨x, x⟩ ≥ 0, and ⟨x, x⟩ = 0 if and only if each term xi² is equal to zero, that is, if and only if each xi = 0, so x is the zero vector, x = 0.
Activity 4.2 Prove properties (ii) and (iii). Show, also, that these two properties are equivalent to the single property

⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩ for all x, y, z ∈ Rn and all α, β ∈ R.
From the definitions, it is clear that it is not possible to combine vectors in different
Euclidean spaces, either by addition or by taking the inner product. If v ∈ Rn and
w ∈ Rm , with m ≠ n, then these vectors live in different ‘worlds’, or more precisely, in
different ‘vector spaces’.
Theorem 4.2 Let A be an m × n matrix with columns c1 , c2 , . . . , cn , so that ci is the m × 1 vector

ci = (a1i, a2i, . . . , ami)^T,  1 ≤ i ≤ n.

Then for any x = (x1, x2, . . . , xn)^T ∈ Rn,

Ax = x1 c1 + x2 c2 + · · · + xn cn.
This theorem states that the matrix product Ax, which is a vector in Rm , can be
expressed as a linear combination of the column vectors of A.
Activity 4.3 Prove this theorem; derive expressions for both the left-hand side and the right-hand side of the equality as a single m × 1 vector and compare the components to prove the equality.

R
Read section 1.8 of the text A-H, working through the activities there. You will find the solution of the last activity in the text.
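The column-combination view of Ax is easy to confirm numerically. A Python/NumPy sketch (the matrix and vector below are examples chosen here, not from the guide):

```python
import numpy as np

# Ax = x1*c1 + ... + xn*cn, where c1, ..., cn are the columns of A.
A = np.array([[ 1.0, -2.0],
              [ 2.0,  1.0],
              [-1.0,  7.0]])      # 3 x 2, columns c1 and c2
x = np.array([3.0, -1.0])

# Build the same vector as a linear combination of the columns of A.
combination = x[0] * A[:, 0] + x[1] * A[:, 1]
assert np.allclose(A @ x, combination)
```

The product Ax lands in R3 (here, R^m with m = 3), expressed as a linear combination of the two columns of A.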
4.2 Developing geometric insight – R2 and R3

4.2.1 Vectors in R2
The set R can be represented as points along a horizontal line, called a real-number line.
In order to represent pairs of real numbers, (a1 , a2 ), we use a Cartesian plane, a plane
with both a horizontal axis and a vertical axis, each axis being a copy of the
real-number line, and we mark A = (a1 , a2 ) as a point in this plane. We associate this
point with the vector a = (a1 , a2 )T , as representing a displacement from the origin (the
point (0, 0)) to the point A. In this context, a is the position vector of the point A.
This displacement is illustrated by an arrow, or directed line segment, with initial point
at the origin and terminal point at A.
[Figure: the position vector a of the point A = (a1 , a2 ), drawn as an arrow from the origin (0, 0) to the point (a1 , a2 ).]
Even if a displacement does not begin at the origin, two displacements of the same
length and the same direction are considered to be equal. So, for example, the two
arrows below represent the same vector, v = (1, 2)T .
[Figure: two arrows of the same length and direction, each representing the displacement vector v = (1, 2)^T.]
If an object is displaced from a point, say O, the origin, to a point P by the
displacement p, and then displaced from P to Q, by the displacement v, then the total
displacement is given by the vector from O to Q, which is the position vector q. So we
would expect vectors to satisfy q = p + v, both geometrically (in the sense of a
displacement) and algebraically (by the definition of vector addition). This is certainly
true in general, as illustrated below.
[Figure: the position vector p of the point P, followed by the displacement v from P to Q, gives the position vector q = p + v of the point Q. Performing the displacements in the other order reaches the same point, illustrating that]

p + v = v + p.
From q = p + v, we have v = q − p. This is the displacement from P to Q. To help you
determine in which direction the vector v points, think of v = q − p as the vector which
is added to the vector p in order to obtain the vector q.
If v represents a displacement, then 2v must represent a displacement in the same
direction, but twice as far, and −v represents an equal displacement in the opposite
direction. This interpretation is compatible with the definition of scalar multiplication.
Activity 4.4 Sketch the vector v = (1, 2)T in a coordinate system. Then sketch 2v
and −v. Looking at the coordinates on your sketch, what are the components of 2v
and −v?
We have stated that a vector has both a length and a direction. Given a vector
a = (a1 , a2 )T , its length, denoted by kak, can be calculated using Pythagoras’ theorem
applied to the right triangle shown below:
[Figure: the right triangle with horizontal side of length a1 , vertical side of length a2 , and the vector a from (0, 0) to (a1 , a2 ) as hypotenuse.]

‖a‖ = √(a1² + a2²).

A vector also has a direction. If a and b are non-zero vectors with

a = λb,  λ ∈ R (λ ≠ 0),

then a and b are parallel. If λ > 0 then a and b have the same direction. If λ < 0
then we say that a and b have opposite directions.
The zero vector, 0, has length 0 and has no direction. For any other vector, v ≠ 0, there is one unit vector in the same direction as v, namely

u = (1/‖v‖) v.
Activity 4.6 Write down a unit vector, u, which is parallel to the vector a = (4, 3)^T. Then write down a vector, w, of length 2 which is in the opposite direction to a.
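The normalisation step used in activities like this can be sketched in Python/NumPy. To avoid giving away the activity, the vector below is a different example chosen here:

```python
import numpy as np

# A unit vector in the direction of v is u = v / ||v||; a vector of
# length L in the opposite direction is -L * u.
v = np.array([1.0, 2.0])
u = v / np.linalg.norm(v)
assert np.isclose(np.linalg.norm(u), 1.0)   # u has length 1

w = -2.0 * u                                # length 2, opposite direction
assert np.isclose(np.linalg.norm(w), 2.0)
```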
Let a, b be two vectors in R2 , and let θ denote the angle between them. (Note that
angles are always measured in radians, not degrees, here. So, for example 45 degrees is
π/4 radians.) By the angle between two vectors we shall always mean the angle, θ, such
that 0 ≤ θ ≤ π. If θ < π, the vectors a, b, and c = b − a form a triangle, where c is the
side opposite the angle θ, as, for example, in the figure below.
[Figure: the triangle formed by the vectors a and b, with the angle θ between them and the third side c = b − a opposite θ.]
The law of cosines (which you may or may not know — don’t worry if you don’t)
applied to this triangle gives us the important relationship stated in the following
theorem.
Theorem 4.3 Let a, b ∈ R2 and let θ denote the angle between them. Then

⟨a, b⟩ = ‖a‖ ‖b‖ cos θ.
Proof
The law of cosines states that c² = a² + b² − 2ab cos θ where c = ‖b − a‖, a = ‖a‖, b = ‖b‖. That is,

‖b − a‖² = ‖a‖² + ‖b‖² − 2‖a‖ ‖b‖ cos θ.   (1)
Expanding the inner product and using its properties, we have

‖b − a‖² = ⟨b − a, b − a⟩ = ⟨b, b⟩ − ⟨a, b⟩ − ⟨b, a⟩ + ⟨a, a⟩.

That is,

‖b − a‖² = ‖a‖² + ‖b‖² − 2⟨a, b⟩.   (2)
Comparing equations (1) and (2) above, we conclude that

cos θ = ⟨a, b⟩ / (‖a‖ ‖b‖).
Since

⟨a, b⟩ = ‖a‖ ‖b‖ cos θ,

and −1 ≤ cos θ ≤ 1 for any real number θ, the maximum value of the inner product is ⟨a, b⟩ = ‖a‖ ‖b‖. This occurs precisely when cos θ = 1, that is, when θ = 0. In this case the vectors a and b are parallel and in the same direction. If they point in opposite directions, then θ = π and we have ⟨a, b⟩ = −‖a‖ ‖b‖. The inner product will be positive if and only if the angle between the vectors is acute, meaning that 0 ≤ θ < π/2. It will be negative if the angle is obtuse, meaning that π/2 < θ ≤ π.
The non-zero vectors a and b are orthogonal (or perpendicular or, sometimes, normal) when the angle between them is θ = π/2. Since cos(π/2) = 0, this is precisely when their inner product is zero. We restate this important fact:

Non-zero vectors a and b are orthogonal if and only if ⟨a, b⟩ = 0.
4.2.3 Vectors in R3
Everything we have said so far about the inner product and its geometric interpretation
in R2 extends to R3 .
If a = (a1, a2, a3)^T, then

‖a‖ = √(a1² + a2² + a3²).

[Figure: the point (a1 , a2 , a3 ) in R3, together with its projection (a1 , a2 , 0) onto the xy-plane.]
The vectors a, b and c = b − a in R3 lie in a plane and the law of cosines can still be applied to establish the result that

⟨a, b⟩ = ‖a‖ ‖b‖ cos θ.
Activity 4.8 Calculate the angles of the triangle with sides a, b, c and show it is an isosceles right triangle, where

a = (1, 2, 2)^T,  b = (−1, 1, 4)^T,  c = b − a.
4.3 Lines
4.3.1 Lines in R2
In R2 , a line is given by a single Cartesian equation, such as y = ax + b, and as such, we
can draw a graph of the line in the xy-plane. This line can also be expressed as a single
vector equation with one parameter. To see this, look at the following examples.
Example 4.4 Consider the line y = 2x. Any point (x, y) on this line must satisfy
this equation, and all points that satisfy the equation are on this line.
[Figure: the line y = 2x. The vector shown is v = (1, 2)^T.]
Another way to describe the points on the line is by giving their position vectors. We
can let x = t where t is any real number. Then y is determined by y = 2x = 2t. So if
x = (x, y)T is the position vector of a point on the line, then
x = (t, 2t)^T = t (1, 2)^T = t v,  t ∈ R.
For example, if t = 2, we get the position vector of the point (2, 4) on the line, and if
t = −1 we obtain the point (−1, −2). As the parameter t runs through all real
numbers, this vector equation gives the position vectors of all the points on the line.
Starting with the vector equation

x = (x, y)^T = t v = t (1, 2)^T,  t ∈ R,
we can retrieve the Cartesian equation using the fact that the two vectors are equal
if and only if their components are equal. This gives us the two equations x = t and
y = 2t. Eliminating the parameter t between these two equations yields y = 2x.
The line in the above example is a line through the origin. What about a line which
does not contain (0, 0)?
y6
-
(0, 0)
x
The line y = 2x + 1. The vector shown is v = (1, 2)T .
We can interpret this as follows. To locate any point on the line, first locate one
particular point which is on the line, for example the y intercept, (0, 1). Then the
position vector of any point on the line is a sum of two displacements, first going to
the point (0, 1) and then going along the line, in a direction parallel to the vector
v = (1, 2)T . It is important to notice that in this case the actual position vector of a
point on the line does not lie along the line. Only if the line goes through the origin
will that happen.
Activity 4.9 Sketch the line y = 2x + 1 and the position vector q of the point (3, 7)
which is on this line. Then express q as the sum of two vectors, q = p + tv where
p = (0, 1)T and v = (1, 2)T for some t ∈ R and add these vectors to your sketch.
In the vector equation, any point on the line can be used to locate the line, and any
vector parallel to the direction vector, v, can be used to give the direction. So, for
example,
x = (x, y)^T = (1, 3)^T + s (−2, −4)^T,  s ∈ R,
is also a vector equation of this line.
As before, we can retrieve the Cartesian equation of the line by equating components of
the vector and eliminating the parameter.
Activity 4.11 Do this for each of the vector equations given above for the line
y = 2x + 1.
In general, any line in R2 is given by a vector equation with one parameter of the form
x = p + tv
where x is the position vector of a point on the line, p is any particular point on the
line and v is the direction of the line.
Activity 4.12 Write down a vector equation of the line through the points
P = (−1, 1) and Q = (3, 2). What is the direction of this line? Find a value for c
such that the point (7, c) is on the line.
Example 4.6 Let us find the point of intersection, if any, of the two lines

x = (1, 3)^T + t (1, 2)^T and x = (5, 6)^T + s (−2, 1)^T

for some t ∈ R and for some s ∈ R. We need to use different symbols (s and t) in the equations because they are unlikely to be the same number for each line. We are looking for values of s and t which will give us the same point. Equating components of the position vectors of points on the lines, we have
1 + t = 5 − 2s        2s + t = 4        2s + t = 4
3 + 2t = 6 + s   ⇒   −s + 2t = 3   ⇒   −2s + 4t = 6.
Adding these last two equations, we obtain t = 2, and therefore s = 1. Therefore the
point of intersection is (3, 7):
(1, 3)^T + 2 (1, 2)^T = (3, 7)^T = (5, 6)^T + 1 (−2, 1)^T.
4.3.2 Lines in R3
How can you describe a line in R3 ? Think about this. How do you describe the set of
points (x, y, z) which are on a given line?
Because there are three variables involved, the natural way is to use a vector equation.
To describe a line you locate one point on the line by its position vector, and then
travel along from that point in a given direction, or in the opposite direction.
[Figure: a line in R3, located by the position vector q of a point on it.]
For example, the two equations

x = (1, 3, 0)^T + t (1, 2, −1)^T and x = (3, 7, −2)^T + t (−3, −6, 3)^T,  t ∈ R,

describe the same line. This is not obvious, so how do we show it?
The lines represented by these equations are parallel since their direction vectors are parallel:
(−3, −6, 3)^T = −3 (1, 2, −1)^T,
so they either have no points in common and are parallel, or they have all points in common, and are really the same line. Since

(3, 7, −2)^T = (1, 3, 0)^T + 2 (1, 2, −1)^T,
the point (3, 7, −2) is on both lines, so they must have all points in common. We say that the lines are collinear.
On the other hand, the lines represented by the equations

x = (1, 3, 0)^T + t (1, 2, −1)^T and x = (3, 7, 1)^T + t (−3, −6, 3)^T,  t ∈ R,
are parallel, with no points in common, since there is no value of t for which

(3, 7, 1)^T = (1, 3, 0)^T + t (1, 2, −1)^T.
Activity 4.14 Write down a vector equation of the line through the points
P = (−1, 1, 2) and Q = (3, 2, 1). What is the direction of this line?
Is the point (7, 1, 3) on this line? Suppose you want a point on this line of the form
(c, d, 3). Find one such point. How many choices do you actually have for the values
of c and d?
We can also describe a line in R3 by Cartesian equations, but this time we need two such equations because there are three variables. Equating components in the vector equation x = p + tv above, we have

x = p1 + tv1 ,  y = p2 + tv2 ,  z = p3 + tv3 .
Solving each of these equations for the parameter t and equating the results, we have
the two equations
(x − p1)/v1 = (y − p2)/v2 = (z − p3)/v3 ,  provided vi ≠ 0, i = 1, 2, 3.
Example 4.8 To find Cartesian equations of the line

x = (1, 2, 3)^T + t (−1, 0, 5)^T,  t ∈ R,
we equate components,
x = 1 − t, y = 2, z = 3 + 5t,
and then solve for t in the first and third equations. The Cartesian equations are

1 − x = (z − 3)/5 and y = 2.
This is a line parallel to the xz-plane in R3 . The direction vector has a 0 in the
second component, so there is no change in the y direction, the y coordinate has the
constant value y = 2.
In R2 , two lines are either parallel or intersect in a unique point. In R3 more can
happen. Two lines in R3 either intersect in a unique point, are parallel, or are skew,
which means that they lie in parallel planes and are not parallel.
Try to imagine what skew lines look like. If you are in a room with a ceiling parallel to
the floor, imagine a line drawn in the ceiling. It is possible for you to draw a parallel
line in the floor, but instead it is easier to draw a line in the floor which is not parallel
to the one in the ceiling. These lines will be skew. They lie in parallel planes (the ceiling
and the floor). If you could move the skew line in the floor onto the ceiling, then the
lines would intersect in a unique point.
Two lines are said to be coplanar if they lie in the same plane, in which case they are
either parallel or intersecting.
Example 4.9 Consider the two lines

L1 : x = (1, 3, 4)^T + t (1, 2, −1)^T,  L2 : x = (5, 6, 1)^T + s (−2, 1, 7)^T,  s, t ∈ R.

Equating components of the position vectors gives the three equations

1 + t = 5 − 2s,  3 + 2t = 6 + s,  4 − t = 1 + 7s.

We have already seen in Example 4.6 on page 61, that the first two equations have the unique solution, s = 1, t = 2. Substituting these values into the third equation (rewritten as 7s + t = 3),

7s + t = 7(1) + 2 = 9 ≠ 3,

we see that the system has no solution. Therefore the lines do not intersect and must be skew.
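The intersection test for the lines L1 and L2 above can be carried out numerically: the three equations in (s, t) are overdetermined, and a least-squares solve exposes their inconsistency. A Python/NumPy sketch (an illustration, not part of the guide):

```python
import numpy as np

# Lines p + t*v and q + s*w intersect iff q + s*w = p + t*v has a
# solution, i.e. t*v - s*w = q - p is consistent (3 equations, 2 unknowns).
p, v = np.array([1.0, 3.0, 4.0]), np.array([ 1.0, 2.0, -1.0])
q, w = np.array([5.0, 6.0, 1.0]), np.array([-2.0, 1.0,  7.0])

M = np.column_stack([v, -w])          # unknowns are (t, s)
sol, residual, rank, _ = np.linalg.lstsq(M, q - p, rcond=None)

# A nonzero residual means no exact solution exists: the lines do not meet.
assert residual[0] > 1e-9

# Their direction vectors are not parallel either, so the lines are skew.
assert np.linalg.matrix_rank(np.column_stack([v, w])) == 2
```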
Example 4.17 On the other hand, if we take a new line L3 , which is parallel to L2 but which passes through the point (5, 6, −5), then the lines

L1 : x = (1, 3, 4)^T + t (1, 2, −1)^T,  L3 : x = (5, 6, −5)^T + t (−2, 1, 7)^T,  t ∈ R,

do intersect in the unique point (3, 7, 2).
Activity 4.16 Check this. Find the point of intersection of the two lines L1 and L3 .
4.4 Planes in R3
On a line, there is essentially one direction in which a point can move (forwards or
backwards) given as all possible scalar multiples of a given direction, but on a plane
there are more possibilities. A point can move in two different directions, and in any
linear combination of these two directions. So how do we describe a plane in R3 ?
The vector parametric equation

x = p + sv + tw,  s, t ∈ R,
describes the position vectors of points on a plane in R3 provided that the vectors v and
w are non-zero and are not parallel. The vector p is the position vector of any
particular point on the plane and the vectors v and w are displacement vectors which
lie in the plane. By taking all possible linear combinations x = p + sv + tw, for s, t ∈ R,
we obtain all the points on the plane.
The equation

x = sv + tw,  s, t ∈ R,
describes a plane through the origin. In this case the position vector, x, of any point on
the plane lies in the plane.
Example 4.18 You have shown that the lines L1 and L3 given in Example 4.17
intersect in the point (3, 7, 2). (See Activity 4.16 on page 65.) Two intersecting lines
determine a plane. A vector equation of the plane containing the two lines is given by
(x, y, z)^T = (3, 7, 2)^T + s (1, 2, −1)^T + t (−2, 1, 7)^T,  s, t ∈ R.
Why? We know that (3, 7, 2) is a point on the plane, and the directions of each of
the lines must lie in the plane. As s and t run through all real numbers, this
equation gives the position vector of all points on the plane. Since the point (3, 7, 2)
is on both lines, if t = 0 we have the equation of L1 , and if s = 0 we get L3 .
Any point which is on the plane can take the place of the vector (3, 7, 2)^T, and any non-parallel vectors which are linear combinations of v and w can replace these in the equation. So, for example,

(x, y, z)^T = (1, 3, 4)^T + t (1, 2, −1)^T + s (−3, −1, 8)^T,  s, t ∈ R,

is a vector equation of the same plane.
Activity 4.18 Verify this. Show that (1, 3, 4) is a point on the plane given by each
equation, and show that (−3, −1, 8)T is a linear combination of (1, 2, −1)T and
(−2, 1, 7)T .
Now consider a plane through the origin. (Think of a pencil standing perpendicular to a table top: the pencil points in the direction of a vector n, the table top is the plane, and the point where the pencil touches the table is the origin.) If x is the position vector of a point on the plane, then x is orthogonal to n; that is,

⟨n, x⟩ = 0,

so this equation gives the position vectors, x, of points on the plane. If n = (a, b, c)^T and x = (x, y, z)^T, then this equation can be written as
⟨n, x⟩ = ⟨(a, b, c)^T, (x, y, z)^T⟩ = 0

or

ax + by + cz = 0.
This is a Cartesian equation of a plane through the origin in R3 . The vector n is called
a normal vector to the plane. Any vector which is parallel to n will also be a normal
vector and will lead to the same Cartesian equation.
On the other hand, given a Cartesian equation,
ax + by + cz = 0
then this equation represents a plane through the origin in R3 with normal vector
n = (a, b, c)T .
To describe a plane which does not go through the origin, we choose a normal vector n
and one point P on the plane with position vector p. We then consider all displacement
vectors which lie in the plane with initial point at P . If x is the position vector of any
point on the plane, then the displacement vector x − p lies in the plane, and x − p is
orthogonal to n. Conversely, if the position vector x of a point satisfies hx − p, ni = 0,
then the vector x − p lies in the plane, so the point (with position vector x) is on the plane.
(Again, think about the pencil perpendicular to the table top, only this time the point
where the pencil is touching the table is a point, P , on the plane, and the origin of your
coordinate system is somewhere else, say, in the corner on the floor.)
The orthogonality condition means that the position vector of any point on the plane is
given by the equation
⟨n, x − p⟩ = 0.
Using properties of the inner product, we can rewrite this as
⟨n, x⟩ = ⟨n, p⟩.

If n = (a, b, c)^T and we write d for the constant ⟨n, p⟩, then

ax + by + cz = d
is a Cartesian equation of a plane in R3 . The plane goes through the origin if and only if
d = 0.
For example, the equation

2x − 3y − 5z = 2
represents a plane which does not go through the origin, since (x, y, z) = (0, 0, 0)
does not satisfy the equation. To find a point on the plane we can choose any two of
the coordinates, say y = 0 and z = 0, and then the equation tells us that x = 1. So
the point (1, 0, 0) is on this plane. The components of a normal vector to the plane
can be read from this equation as the coefficients of x, y, z: n = (2, −3, −5)T .
How does the Cartesian equation of a plane relate to the vector parametric equation of
a plane? A Cartesian equation can be obtained from the vector equation algebraically,
by eliminating the parameters in the vector equation, and vice versa, as the following
example shows.
Example 4.20 Consider the plane given by the vector equation

x = s (1, 2, −1)^T + t (−2, 1, 7)^T,  s, t ∈ R,

which is a plane through the origin parallel to the plane in Example 4.18 on page 65. The direction vectors v = (1, 2, −1)^T and w = (−2, 1, 7)^T lie in the plane. Equating components, we have

x = s − 2t,  y = 2s + t,  z = −s + 7t.

Solving the first two equations for s and t in terms of x and y gives s = (x + 2y)/5 and t = (y − 2x)/5.
Finally, we substitute for s and t in the third equation, z = −s + 7t, and simplify to
obtain a Cartesian equation of the plane,
3x − y + z = 0.
Activity 4.19 Carry out this last step to obtain the Cartesian equation of the
plane.
The vector n = (3, −1, 1)^T is a normal vector to the plane. We can check that n is, indeed, orthogonal to the plane by taking the inner product with the vectors v and w, which lie in the plane.
Activity 4.20 Do this. Calculate ⟨n, v⟩ and ⟨n, w⟩, and verify that both inner products are equal to zero.
Activity 4.21 Using the properties of inner product, show that this last statement is true. That is, if ⟨n, v⟩ = 0 and ⟨n, w⟩ = 0, then ⟨n, sv + tw⟩ = 0, for any s, t ∈ R.
Can we do the same for a plane which does not pass through the origin? Consider the
following example.
Example 4.21 The plane we just considered in Example 4.20 is parallel to the plane with vector equation

x = (3, 7, 2)^T + s (1, 2, −1)^T + t (−2, 1, 7)^T = p + sv + tw,  s, t ∈ R,
which passes through the point (3, 7, 2). Since the planes are parallel, they will have
the same normal vectors. So the Cartesian equation of this plane is of the form
3x − y + z = d.
Since (3, 7, 2) is a point on the plane, it must satisfy the equation of the plane. Substituting into the equation we find d = 3(3) − (7) + (2) = 4 (which is the same as finding d using d = ⟨n, p⟩). So the Cartesian equation we obtain is
3x − y + z = 4.
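The relation d = ⟨n, p⟩, and the fact that every point p + sv + tw then satisfies the Cartesian equation, can be checked numerically with the data from this example. A Python/NumPy sketch (an illustration, not part of the guide):

```python
import numpy as np

# Plane through p with normal n: its Cartesian equation is <n, x> = d
# where d = <n, p>, as in Example 4.21.
n = np.array([3.0, -1.0, 1.0])
p = np.array([3.0,  7.0, 2.0])
d = n @ p
assert np.isclose(d, 4.0)      # 3(3) - 7 + 2 = 4, giving 3x - y + z = 4

# Any point p + s*v + t*w, with v and w lying in the plane, satisfies it.
v = np.array([ 1.0, 2.0, -1.0])
w = np.array([-2.0, 1.0,  7.0])
point = p + 2.0 * v - 1.0 * w
assert np.isclose(n @ point, d)
```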
Activity 4.22 Do this. You should be able to immediately spot the values of s and
t which work.
Using the examples we have just done, you should now be able to tackle the following
question.
Activity 4.23 The two lines, L1 and L2 ,

L1 : x = (1, 3, 4)^T + t (1, 2, −1)^T,  L2 : x = (5, 6, 1)^T + t (−2, 1, 7)^T,  t ∈ R,
in Example 4.9 on page 64 are skew, and therefore are contained in parallel planes.
Find vector equations and Cartesian equations for these two planes.
Two planes in R3 are either parallel or intersect in a line. Considering such questions, it
is usually easier to use the Cartesian equations of the planes. If the planes are parallel,
then this will be obvious from looking at their normal vectors. If they are not parallel,
then the line of intersection can be found by solving the two Cartesian equations
simultaneously.
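This simultaneous-solution procedure can be sketched numerically. In the Python/NumPy illustration below the two planes are examples chosen here (not from the guide): setting z = t and solving the remaining 2 × 2 system in x and y traces out their line of intersection.

```python
import numpy as np

# Planes x + y + z = 1 and x - y + z = 3 (example, non-parallel normals).
def intersection_point(t):
    """Point on the line of intersection with z = t."""
    M = np.array([[1.0,  1.0],
                  [1.0, -1.0]])
    rhs = np.array([1.0 - t, 3.0 - t])   # move the z-terms to the right
    x, y = np.linalg.solve(M, rhs)
    return np.array([x, y, t])

# Every point produced lies on both planes, whatever the parameter t.
for t in (0.0, 1.0, -2.5):
    pt = intersection_point(t)
    assert np.isclose(pt @ np.array([1.0,  1.0, 1.0]), 1.0)
    assert np.isclose(pt @ np.array([1.0, -1.0, 1.0]), 3.0)
```

Here the line of intersection works out to (2 − t, −1, t), t ∈ R; the same substitution idea applies to any pair of non-parallel planes.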
4.5 Lines and hyperplanes in Rn
Activity 4.24 Carry out the calculations in the above example and verify that the
line is in both planes.
4.5.1 Lines in Rn

A line in Rn is the set of all points (x1 , x2 , . . . , xn ) whose position vectors x satisfy a
vector equation of the form
x = p + tv, t ∈ R,
where p is the position vector of one particular point on the line and v is the direction
of the line. If we can write x = tv, t ∈ R, then the line goes through the origin.
4.5.2 Hyperplanes
The set of all points (x1 , x2 , . . . , xn ) which satisfy one Cartesian equation,
a1 x1 + a2 x2 + · · · + an xn = d,
is called a hyperplane in Rn .
In R2 , a hyperplane is a line; in R3 it is a plane. For n > 3, we use the term hyperplane.
The vector

a = (a1, a2, . . . , an)^T
is a normal vector to the hyperplane. Writing the Cartesian equation in vector form, a hyperplane is the set of all vectors x ∈ Rn such that

⟨n, x − p⟩ = 0,

where the normal vector n = a and the position vector p of a point on the hyperplane are given.
Activity 4.25 How many Cartesian equations would you need to describe a line in Rn? How many parameters would there be in a vector equation of a hyperplane?

R
Read the remaining parts of Chapter 1, Sections 1.8–1.12.
Overview
In this chapter we have defined and looked at vectors and Euclidean n-space, Rn ,
together with the definition and properties of the inner product in Rn . We have worked
with lines in R2 and lines and planes in R3 in order to gain geometric insight into the
possibilities that arise from linear combinations of vectors, so that we may be able to
apply this intuition to Rn . Vectors are the fundamental building blocks of linear
algebra, as we shall see in the next chapters.
Learning outcomes

At the end of this chapter and the relevant reading you should be able to:

define a vector and Euclidean n-space, Rn, and use vector addition and scalar multiplication
define the inner product of two vectors and state and use its properties
calculate the length of a vector and the angle between two vectors, and determine when two vectors are orthogonal
find vector equations and Cartesian equations of lines in R2 and R3 and of planes in R3, and pass from one form to the other
determine whether lines and planes intersect, are parallel or, for lines in R3, are skew, and find points of intersection
write down equations of lines and hyperplanes in Rn
Comments on selected activities
[Figure: the point (a1 , a2 , a3 ) in R3, with its projection (a1 , a2 , 0) in the xy-plane.]
The line from the origin to the point (a1 , a2 , 0) lies in the xy-plane and, by Pythagoras’ theorem, it has length √(a1² + a2²). Applying Pythagoras’ theorem again to the right triangle shown, we have

‖a‖ = √( (√(a1² + a2²))² + a3² ) = √(a1² + a2² + a3²).
⟨a, c⟩/(‖a‖ ‖c‖) = (2 + 2 − 4)/(√9 √9) = 0;

⟨b, c⟩/(‖b‖ ‖c‖) = (2 − 1 + 8)/(√18 √9) = 1/√2.

Thus the triangle has a right-angle, and two angles of π/4.
Alternatively, as the vectors a and c are orthogonal, and have the same length, it
follows immediately that the triangle is right-angled and isosceles.
Feedback to activity 4.9
If t = 3, then q = (3, 7)T . You are asked to sketch the position vector q as this sum to
illustrate that q locates a point on the line, even though the vector q itself does not lie
along the line.
Feedback to activity 4.10
Here s = −1.
The point (7, 1, 3) is not on this line, but the point (−5, 0, 3) is on the line. The value
t = −1 will then satisfy all three component equations. There is, of course, only one
possible choice for the values of c and d.
Chapter 5
Linear systems I: Gaussian
elimination
Introduction
Being able to solve systems of many linear equations in many unknowns is a vital part
of linear algebra. This is where we begin to use matrices and vectors as essential
elements of obtaining and expressing the solutions. In this chapter we investigate linear
systems and present a useful method known as Gaussian elimination.
Aims
The aims of this chapter are to:
Learn how to solve linear systems by using the method of Gaussian elimination.
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 2.
Sections 2.1–2.3.
This chapter of the guide closely follows the first half of Chapter 2 of the textbook. You
should read the corresponding sections of the textbook and work through all the
activities there while working through the sections of this subject guide.
Further reading
The material of this chapter is also discussed in the following book:
Anthony, M. and N. Biggs. Mathematics for Economics and Finance: Methods
and Modelling. Chapters 16, 17.
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.
Synopsis
We begin by expressing a system in matrix form and defining elementary row
operations on an augmented matrix. These operations mimic standard operations on
systems of equations. We then learn a precise algorithm to apply these operations in
order to put the matrix in a form called reduced echelon form, from which the general
solution to the system is readily obtained. The method of manipulating matrices in this
way to obtain the solution is known as Gaussian elimination.
5.1 Systems of linear equations
A system of m linear equations in n unknowns x1 , x2 , . . . , xn is a set of m
equations of the form

a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
...
am1 x1 + am2 x2 + · · · + amn xn = bm .

A solution of the system is an assignment of values x1 = s1 , x2 = s2 , . . . , xn = sn which
satisfies every equation simultaneously. If A = (aij ) denotes the m × n matrix of
coefficients and x = (x1 , x2 , . . . , xn )T the column vector of unknowns, then Ax is the
m × 1 column vector
whose entries are the left-hand sides of our system of linear equations.
If we define another column vector b, whose m components are the right-hand sides bi ,
the system is equivalent to the matrix equation
Ax = b.
Example 5.1 Consider the following system of three linear equations in the three
unknowns, x1 , x2 , x3 :
x1 + x2 + x3 = 3
2x1 + x2 + x3 = 4
x1 − x2 + 2x3 = 5
The entries of the matrix A are the coefficients of the xi . If we perform the matrix
multiplication of Ax,
( 1  1  1 ) ( x1 )     ( x1 + x2 + x3  )
( 2  1  1 ) ( x2 )  =  ( 2x1 + x2 + x3 )
( 1 −1  2 ) ( x3 )     ( x1 − x2 + 2x3 )
and these two 3 × 1 matrices are equal if and only if their components are equal.
This gives precisely the three linear equations.
We can eliminate x1 from the second equation by taking twice the first equation and
then subtracting it from the second equation. Let's do this. Twice the first equation
gives the equation 2x1 + 2x2 + 2x3 = 6. Subtracting this from the second equation,
2x1 + x2 + x3 = 4, yields the equation −x2 − x3 = −2. We can now replace the second
equation in the original system by this new equation,
x1 + x2 + x3 = 3
−x2 − x3 = −2
x1 − x2 + 2x3 = 5
and the new system will have the same set of solutions as the original system.
We can continue in this manner to obtain a simpler set of equations with the same
solution set as the original system. So what are the operations that we can perform on
the equations of a linear system without altering the set of solutions? We can:
O1 multiply both sides of an equation by a non-zero constant
O2 interchange two equations
O3 add a multiple of one equation to another.
These operations do not alter the set of solutions since the restrictions on the variables
x1 , x2 , . . . , xn given by the new equations imply the restrictions given by the old ones
(that is, we can undo the manipulations made on the old system).
At the same time, we observe that these operations really only involve the coefficients of
the variables and the right sides of the equations.
For example, using the same system as above expressed in matrix form, Ax = b, then
the matrix
          ( 1  1  1  3 )
(A|b) =   ( 2  1  1  4 )
          ( 1 −1  2  5 )
which is the coefficient matrix A together with the constants b as the last column,
contains all the information we need to use, and instead of manipulating the equations,
we can instead manipulate the rows of this matrix. For example, subtracting twice
equation 1 from equation 2 is executed by taking twice row 1 from row 2.
These observations form the motivation behind a method to solve systems of linear
equations, known as Gaussian elimination. To solve a linear system Ax = b we first
form the augmented matrix, denoted (A|b) which is A with column b tagged on.
Definition 5.2 (Augmented matrix) If Ax = b is a system of linear equations, where

      ( a11  a12  · · ·  a1n )
A =   ( a21  a22  · · ·  a2n )
      ( ...  ...         ... )
      ( am1  am2  · · ·  amn )

x = (x1 , x2 , . . . , xn )T and b = (b1 , b2 , . . . , bm )T , then the matrix

          ( a11  a12  · · ·  a1n  b1 )
(A|b) =   ( a21  a22  · · ·  a2n  b2 )
          ( ...  ...         ...  .. )
          ( am1  am2  · · ·  amn  bm )

is called the augmented matrix of the linear system.
From the operations listed above for manipulating the equations of the linear system,
we define corresponding operations on the rows of the augmented matrix.
Definition 5.3 (Elementary row operations) These are:
RO1 multiply a row by a non-zero constant
RO2 interchange two rows
RO3 add a multiple of one row to another.
5.3 Gaussian elimination
We will describe a systematic method for solving systems of linear equations by an
algorithm which uses row operations to put the augmented matrix into a form from
which the solution of the linear system can be easily read. To illustrate the algorithm,
we will use two examples: the augmented matrix (A|b) of the example in the previous
section and the augmented matrix (B|b) of a second system of linear equations.
          ( 1  1  1  3 )              ( 0  0  2  3 )
(A|b) =   ( 2  1  1  4 ) ,  (B|b) =   ( 0  2  3  4 ) .
          ( 1 −1  2  5 )              ( 0  0  1  5 )
(1) Find the leftmost column which is not all zeros.
(2) By interchanging two rows if necessary, obtain a non-zero entry at the top of this
column.
(3) Make this entry 1; multiply the first row by a suitable number or interchange two
rows. This is called a leading one.
The left-hand matrix already has a 1 in this position. For the second matrix, the
leftmost non-zero column is column 2; we interchange rows 1 and 2, then multiply
row 1 by 1/2.

( 1  1  1  3 )      ( 0  1  3/2  2 )
( 2  1  1  4 )      ( 0  0   2   3 )
( 1 −1  2  5 )      ( 0  0   1   5 )
(4) Add suitable multiples of the top row to rows below to make all entries below the
leading one become zero.
For the matrix on the left, we add −2 times row 1 to row 2, then we add −1 times row 1
to row 3. The first operation is the same as the one we performed earlier on the example
using the equations. The matrix on the right already has zeros under the leading one.
( 1  1  1  3 )      ( 0  1  3/2  2 )
( 0 −1 −1 −2 )      ( 0  0   2   3 )
( 0 −2  1  2 )      ( 0  0   1   5 )
At any stage we can read the modified system of equations from the new augmented
matrix, remembering that column 1 gives the coefficients of x1 , column 2 the coefficients
of x2 and so on, and that the last column represents the right-hand side of the equations.
For example the matrix on the left is now the augmented matrix of the system
x1 + x2 + x3 = 3
−x2 − x3 = −2
−2x2 + x3 = 2
(5) Now cover up the top row, and apply steps (1) to (4) to the rows below. Repeat
this until there are no more non-zero rows to work on. Carrying this out on the matrix
on the left brings (A|b) to the form

( 1  1  1  3 )
( 0  1  1  2 )
( 0  0  1  2 )
Definition 5.4 (Row echelon form) A matrix is said to be in row echelon form,
(or echelon form) if it has the following three properties:
(1) Every non-zero row begins with a leading one.
(2) A leading one in a lower row is further to the right.
(3) Zero rows are at the bottom of the matrix.
Activity 5.1 Check that the above matrix satisfies these three properties.
The term echelon form takes its name from the form of the equations at this stage.
Reading from the matrix, these equations are
x1 + x2 + x3 = 3
x2 + x3 = 2
x3 = 2
We could now use a method called back substitution to find the solution of the
system. The last equation tells us that x3 = 2. We can then substitute this into the
second equation to obtain x2 , and then use these two values to obtain x1 . This is an
acceptable approach, but we can effectively do the same calculations by continuing with
row operations. So we continue with one final step of our algorithm.
(6) Begin with the last row and add suitable multiples to each row above to get zeros
above the leading ones.
Continuing from the row echelon form and using row 3, we replace row 2 with row
2−row 3, and at the same time we replace row 1 with row 1−row 3.
          ( 1  1  1  3 )        ( 1  1  0  1 )
(A|b) −→  ( 0  1  1  2 )  −→    ( 0  1  0  0 )
          ( 0  0  1  2 )        ( 0  0  1  2 )
We now have zeros above the leading one in column 3. There is only one more step to
do, and that is to get a zero above the leading one in column 2. So the final step is row
1−row 2,
      ( 1  0  0  1 )
−→    ( 0  1  0  0 ) .
      ( 0  0  1  2 )
This final matrix is now in reduced (row) echelon form. It has the additional property
that every column with a leading one has zeros elsewhere.
Definition 5.5 (Reduced row echelon form) A matrix is said to be in reduced
row echelon form (or reduced echelon form) if it has the following four properties:
(1) Every non-zero row begins with a leading one.
(2) A leading one in a lower row is further to the right.
(3) Zero rows are at the bottom of the matrix.
(4) Every column with a leading one has zeros elsewhere.
We now return to the example (B|b) which we left after the first round of steps 1 to 4,
and apply step 5. We cover up the top row and apply steps 1 to 4 again. We need to
have a leading one in the second row, which we achieve by switching row 2 and row 3:
          ( 0  1  3/2  2 )        ( 0  1  3/2  2 )
(B|b) −→  ( 0  0   2   3 )  −→    ( 0  0   1   5 )
          ( 0  0   1   5 )        ( 0  0   2   3 )

We obtain a zero under this leading one by replacing row 3 with row 3 + (−2) times
row 2,

      ( 0  1  3/2  2 )
−→    ( 0  0   1   5 )
      ( 0  0   0  −7 )

and then finally multiply row 3 by −1/7,

      ( 0  1  3/2  2 )
−→    ( 0  0   1   5 ) .
      ( 0  0   0   1 )
This matrix is now in row echelon form, but we shall see that there is no point in going
on to reduced row echelon form. This last matrix is equivalent to the system
( 0  1  3/2 ) ( x1 )     ( 2 )
( 0  0   1  ) ( x2 )  =  ( 5 )
( 0  0   0  ) ( x3 )     ( 1 )
What is the bottom equation of this system? Row 3 says 0x1 + 0x2 + 0x3 = 1, that is
0 = 1 which is impossible! This system has no solution.
If the row echelon form (REF) of the augmented matrix (A|b) contains a row
(0 0 · · · 0 1), then the system is inconsistent.
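This consistency test is mechanical, and easy to automate; a small sketch (pure Python; the function name is ours):

```python
def is_inconsistent(aug_ref):
    """Check a row echelon form of an augmented matrix (A|b) for a row
    whose coefficients are all zero but whose last entry is not,
    i.e. an impossible equation 0 = c with c non-zero."""
    return any(
        all(entry == 0 for entry in row[:-1]) and row[-1] != 0
        for row in aug_ref
    )

# Row echelon form of (B|b) from the text: the last row reads 0 = 1.
ref_B = [[0, 1, 1.5, 2],
         [0, 0, 1,   5],
         [0, 0, 0,   1]]
print(is_inconsistent(ref_B))  # True: the system Bx = b has no solution
```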
It is instructive to look at the original systems represented by these augmented matrices,
          ( 1  1  1  3 )              ( 0  0  2  3 )
(A|b) =   ( 2  1  1  4 )    (B|b) =   ( 0  2  3  4 )
          ( 1 −1  2  5 )              ( 0  0  1  5 )

x1 + x2 + x3 = 3               2x3 = 3
2x1 + x2 + x3 = 4        2x2 + 3x3 = 4 .
x1 − x2 + 2x3 = 5               x3 = 5
We see immediately that the system Bx = b is inconsistent since it is not possible for
both the top and the bottom equation to hold.
Since these are systems of three equations in three variables, we can interpret these
results geometrically. Each of the equations above represents a plane in R3 . The system
Ax = b represents three planes which intersect in the point (1, 0, 2). This is the only
point which lies on all three planes. The system Bx = b represents three planes, two of
which are parallel (the horizontal planes 2x3 = 3 and x3 = 5), so there is no point which
lies on all three planes.
This method of reducing the augmented matrix to reduced row echelon form is known
as Gaussian elimination or Gauss-Jordan elimination.
We have been very careful in illustrating this method to explain what the row
operations were for each step of the algorithm, but in solving a system with this method
it is not necessary to include all this detail. The aim is to use row operations (following
the algorithm) to put the augmented matrix into reduced row echelon form, and then
read off the solutions from this form. Where it is useful to indicate the operations, you
can do so by writing, for example, R2 − 2R1 , where we always write down the row we
are replacing first, so that R2 − 2R1 indicates ‘replace row 2 (R2 ) with row 2 plus −2
times row 1 (R2 − 2R1 )’. Otherwise, you can just write down the sequence of matrices
linked by arrows. It is important to realise that once you have performed a row
operation on a matrix, the new matrix obtained is not equal to the previous one; this is
why you must use arrows between the steps and not equal signs.
Example 5.2 We repeat the reduction of (A|b) to illustrate this for the system
x1 + x2 + x3 = 3
2x1 + x2 + x3 = 4
x1 − x2 + 2x3 = 5
Begin by writing down the augmented matrix, then apply the row operations to
carry out the algorithm. Here we will indicate the row operations.
          ( 1  1  1  3 )
(A|b) =   ( 2  1  1  4 )  →
          ( 1 −1  2  5 )

R2 − 2R1, R3 − R1:
( 1  1  1  3 )
( 0 −1 −1 −2 )  →
( 0 −2  1  2 )

(−1)R2:
( 1  1  1  3 )
( 0  1  1  2 )  →
( 0 −2  1  2 )

R3 + 2R2:
( 1  1  1  3 )
( 0  1  1  2 )  →
( 0  0  3  6 )

(1/3)R3:
( 1  1  1  3 )
( 0  1  1  2 ) .
( 0  0  1  2 )

The matrix is now in row echelon form; continue to reduced row echelon form:

R1 − R3, R2 − R3:
( 1  1  0  1 )
( 0  1  0  0 )  →
( 0  0  1  2 )

R1 − R2:
( 1  0  0  1 )
( 0  1  0  0 ) .
( 0  0  1  2 )

The augmented matrix is now in reduced row echelon form.
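The whole algorithm is mechanical enough to express in code. The routine below is our own minimal pure-Python illustration of reduction to reduced row echelon form, not an excerpt from the textbook; it follows the steps above (find a leading one, clear below, clear above), and uses exact fractions so that no rounding occurs.

```python
from fractions import Fraction

def rref(matrix):
    """Reduce a matrix to reduced row echelon form by row operations."""
    m = [[Fraction(entry) for entry in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    lead_row = 0
    for col in range(cols):
        # Steps (1)-(2): find a non-zero entry in this column, at or below lead_row.
        pivot = next((r for r in range(lead_row, rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[lead_row], m[pivot] = m[pivot], m[lead_row]            # RO2: interchange
        # Step (3): make the leading entry 1 (RO1).
        m[lead_row] = [entry / m[lead_row][col] for entry in m[lead_row]]
        # Steps (4) and (6): zeros below and above the leading one (RO3).
        for r in range(rows):
            if r != lead_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[lead_row])]
        lead_row += 1
        if lead_row == rows:
            break
    return m

# The augmented matrix (A|b) of Example 5.2:
aug = [[1, 1, 1, 3], [2, 1, 1, 4], [1, -1, 2, 5]]
for row in rref(aug):
    print([int(entry) for entry in row])
# prints [1, 0, 0, 1] then [0, 1, 0, 0] then [0, 0, 1, 2]:
# the solution x1 = 1, x2 = 0, x3 = 2
```

Exact fractions are a deliberate choice here: with floating-point entries, a pivot that should be exactly zero can come out as a tiny non-zero number and derail the algorithm.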
Activity 5.3 Use Gaussian elimination to solve the following system of equations,
x1 + x2 + x3 = 6
2x1 + 4x2 + x3 = 5
2x1 + 3x2 + x3 = 6.
Be sure to follow the algorithm to put the augmented matrix into reduced row
echelon form using row operations.
As a further example, consider the following system of four equations in five unknowns:

x1 + x2 + x3 + x4 + x5 = 3
2x1 + x2 + x3 + x4 + 2x5 = 4
x1 − x2 − x3 + x4 + x5 = 5
x1 + x4 + x5 = 4.
There are only three leading ones in the reduced row echelon form of this matrix. These
appear in columns 1, 2 and 4. Since the last row gives no information, but merely states
that 0 = 0, the matrix is equivalent to the system of equations
x1 + x5 = 1
x2 + x3 = −1
x4 = 3.
The form of these equations tells us that we can assign any values to x3 and x5 , and
then the values of x1 , x2 and x4 will be determined.
Definition 5.7 (Leading variables) The variables corresponding to the columns with
leading ones in the reduced row echelon form of an augmented matrix are called
leading variables. The other variables are called non-leading variables.
In this example the variables x1 , x2 and x4 are leading variables, x3 and x5 are
non-leading variables. We assign x3 , x5 the arbitrary values s, t, where s, t represent any
real numbers, and then solve for the leading variables in terms of these. We get
x4 = 3 x2 = −1 − s x1 = 1 − t.
Observe that there are infinitely many solutions, because any values of s ∈ R and t ∈ R
will give a solution.
The solution given above is called a general solution of the system, because it gives a
solution for any values of s and t. For any particular assignment of values to s and t,
such as s = 0, t = 1, we obtain a particular solution of the system.
Activity 5.4 Let s = 0 and t = 0 and show (by substituting it into the equation)
that x0 = (1, −1, 0, 3, 0)T is a solution of Ax = b. Then let s = 1 and t = 2 and show
that the new vector x1 you obtain is also a solution.
With practice, you will be able to read the general solution directly from the reduced
row echelon form of the augmented matrix. We have
          ( 1  0  0  0  1   1 )
(A|b) −→  ( 0  1  1  0  0  −1 )
          ( 0  0  0  1  0   3 ) .
          ( 0  0  0  0  0   0 )
Locate the leading ones, and note which are the leading variables. Then locate the
non-leading variables and assign each an arbitrary parameter. So, as above, we note
that the leading ones correspond to x1 , x2 and x4 and we assign arbitrary parameters to
the non-leading variables; that is, values such as x3 = s and x5 = t where s and t
represent any real number. Then write down the vector x = (x1 , x2 , x3 , x4 , x5 )T (as a
column) and fill in the values starting with x5 and working up. We have x5 = t. Then
the third row tells us that x4 = 3. We have x3 = s. Now look at the second row, which
says x2 + x3 = −1, or x2 = −1 − s. Then the top row tells us that x1 = 1 − t. In this
way we obtain the solution in vector form.
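A general solution can always be checked by substituting it back into the original system; a sketch (pure Python, applied to the four-equation example above):

```python
def solve_general(s, t):
    """General solution of the 4-equation, 5-unknown system above,
    read from the reduced row echelon form: x3 = s and x5 = t are free."""
    return [1 - t, -1 - s, s, 3, t]   # (x1, x2, x3, x4, x5)

A = [[1, 1, 1, 1, 1],
     [2, 1, 1, 1, 2],
     [1, -1, -1, 1, 1],
     [1, 0, 0, 1, 1]]
b = [3, 4, 5, 4]

# Every choice of the parameters s and t gives a solution of Ax = b.
for s, t in [(0, 0), (1, 2), (-3, 7)]:
    x = solve_general(s, t)
    lhs = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    print(lhs == b)  # True each time
```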
Activity 5.5 Write down the system of three linear equations in three unknowns
represented by the matrix equation Ax = b, where

      ( 1  2  1 )         ( x )         ( 3 )
A =   ( 2  2  0 ) ,  x =  ( y ) ,  b =  ( 2 ) .
      ( 3  4  1 )         ( z )         ( 5 )
Use Gaussian elimination to solve the system. Express your solution in vector form.
If each equation represents the Cartesian equation of a plane in R3 , describe the
intersection of these three planes.
If p and q are distinct solutions of Ax = b, then every vector v = p + t(q − p) is also a
solution for any t ∈ R, so there are infinitely many of them.
Notice that in this proof, the vector w = q − p satisfies the equation Ax = 0.
Overview
In this chapter, we have seen how the method of Gaussian elimination can be used to
solve linear systems. We will see some applications of linear systems shortly, but we will
also see that the basic method of Gaussian elimination (the use of elementary row
operations) is a crucial tool in many areas of linear algebra.
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
Chapter 6
Linear systems II: an application and
homogeneous systems
Introduction
In this chapter we give an economic application of linear systems. We also study some
general properties of the set of solutions to a linear system.
Aims
The aims of this chapter are to:
Explain what is meant by a homogeneous linear system and what is meant by the
null space of a matrix
Explain how the general solution to any linear system is related, through the
‘Principle of Linearity’ to that of a related homogeneous system.
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 3,
Section 3.5 and Chapter 2, Section 2.4.
Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.
Input-output analysis can also be found in the Anthony and Biggs book.
Synopsis
We begin by describing a model developed by Leontief known as input-output analysis.
This is an economic application of linear systems.
We then examine the forms of solutions to systems of linear equations and look at their
properties, defining what is meant by a homogeneous system and the null space of a
matrix. We explain how the solution set of a linear system is related to the null space of
the coefficient matrix (or, equivalently, to the solution set of the related homogeneous
linear system).
6.1 Application: Leontief input-output analysis
In 1973 Wassily Leontief was awarded the Nobel prize in Economics for work he did
analysing an economy with many interdependent industries using a system of linear
equations. We present a brief outline of his method here.
Suppose an economy has n interdependent production processes; the outputs of the n
industries are used to run the industries and to satisfy an outside demand. We will
assume that prices are fixed so that they can be used to measure the output. The
problem we wish to solve is to determine the level of output of each industry which will
satisfy all demands exactly; that is, both the demands of the other industries and the
outside demand. The problem can be described as a system of linear equations, as we
shall see by considering the following simple example.
Example 6.1 Suppose there are two industries: water and electricity. Let
x1 = total output of water ($ value)
x2 = total output of electricity ($ value)
We can express this as a vector,

      ( x1 )
x =   ( x2 ) ,   called a production vector.
Suppose we know that

water uses $0.01 water and $0.15 electricity to produce $1.00 water output;

electricity uses $0.21 water and $0.05 electricity to produce $1.00 electricity.
What is the total water used by the industries? Water is using $0.01 for each unit
output, so a total of 0.01x1 , and electricity is using $0.21 water for each unit of its
output, so a total of 0.21x2 . The total amount of water used by the industries is
therefore 0.01x1 + 0.21x2 . In the same way, the total amount of electricity used by
the industries is 0.15x1 + 0.05x2 . The totals can be expressed as
( total water       )     ( 0.01  0.21 ) ( x1 )
( total electricity )  =  ( 0.15  0.05 ) ( x2 )  = Cx.
After the industries have used water and electricity to produce their outputs, how
much water and electricity are left to satisfy the outside demand?
Activity 6.1 Think about this before continuing. Write down an expression for the
total amount of water which is left after the industries have each used what they
need to produce their output. Do the same for electricity.
In matrix notation,

( x1 )     ( 0.01  0.21 ) ( x1 )     ( d1 )
( x2 )  −  ( 0.15  0.05 ) ( x2 )  =  ( d2 ) ,

or, x − Cx = d, where

      ( d1 )
d =   ( d2 )   is the outside demand vector.
If we use the fact that Ix = x, where I is the 2 × 2 identity matrix, then we can
rewrite this system in matrix form as
Ix − Cx = d or (I − C)x = d.
This is now in the usual matrix form for a system of linear equations. A solution, x,
to this system of equations will determine the output levels of each industry required
to satisfy all demands exactly.
Now let's look at the general case. Suppose we have an economy with n interdependent
industries. If cij denotes the amount of industry i used by industry j to produce $1.00
of industry j, then the consumption or technology matrix is the n × n matrix C = (cij ).
If, as before, we denote by d the n × 1 outside demand vector, then in matrix form the
problem we wish to solve is to find the production vector x such that
(I − C)x = d,
a system of n linear equations in n unknowns.
Activity 6.2 Return to Example 6.1 and assume that the public demand for water
is $627 and for electricity is $4,955. Find the levels of output which satisfy all
demands exactly. (You should find that x1 = 1, 800 and x2 = 5, 500.)
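The activity can be checked numerically; a sketch (pure Python; for a 2 × 2 system Cramer's rule is the quickest route, though Gaussian elimination works equally well):

```python
def solve_2x2(M, d):
    """Solve Mx = d for a 2x2 matrix M by Cramer's rule,
    assuming the determinant of M is non-zero."""
    (a, b), (c, e) = M
    det = a * e - b * c
    return [(d[0] * e - b * d[1]) / det, (a * d[1] - c * d[0]) / det]

C = [[0.01, 0.21],   # consumption matrix from Example 6.1
     [0.15, 0.05]]
I_minus_C = [[1 - C[0][0], -C[0][1]],
             [-C[1][0], 1 - C[1][1]]]
d = [627, 4955]      # outside demand for water and electricity

x = solve_2x2(I_minus_C, d)
print([round(v) for v in x])  # [1800, 5500]
```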
6.2 Homogeneous systems and null space
A linear system of the form Ax = 0, in which the right-hand side of every equation is
0, is called a homogeneous system. A homogeneous system is always consistent, since
the zero vector x = 0 satisfies it; this solution is called the trivial solution.
Note that if Ax = 0 has a unique solution, then it must be the trivial solution,
x = 0.
If we form the augmented matrix, (A | 0), of a homogeneous system, then the last
column will consist entirely of zeros. This column will remain a column of zeros
throughout the entire row reduction, so there is no point in writing it. Instead, we use
Gaussian elimination on the coefficient matrix A, remembering that we are solving
Ax = 0.
For example, consider the homogeneous system

x + y + 3z + w = 0
x−y+z+w = 0
y + 2z + 2w = 0
Activity 6.3 Work through the above calculation and state what row operation is
being done at each stage. For example, the first operation is R2 − R1 .
Then write down the solution from the reduced row echelon form of the matrix.
The solution is

      ( x )        (  3 )
      ( y )        (  2 )
x =   ( z )  = t   ( −2 ) ,   t ∈ R,
      ( w )        (  1 )

which is a line through the origin, x = tv, with v = (3, 2, −2, 1)T . There are infinitely
many solutions, one for every t ∈ R.
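That every multiple tv satisfies the homogeneous system can be confirmed directly; a quick sketch (pure Python):

```python
A = [[1, 1, 3, 1],    # coefficient matrix of the homogeneous system above
     [1, -1, 1, 1],
     [0, 1, 2, 2]]
v = [3, 2, -2, 1]     # direction vector of the solution line

def times(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# v itself, and every scalar multiple tv, satisfies Ax = 0.
print(times(A, v))                      # [0, 0, 0]
print(times(A, [7 * vi for vi in v]))   # [0, 0, 0]
```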
This example illustrates the following fact.
Theorem 6.1 If A is an m × n matrix with m < n, then Ax = 0 has infinitely many
solutions.
Read the proof of this theorem in the textbook A-H, where it is labelled as
Theorem 2.21. But before you do so, think about why the theorem is true and try
to prove it yourself. Why were there infinitely many solutions in the above example?
What about a linear system Ax = b? If A is m × n with m < n, does Ax = b have
infinitely many solutions? The answer is, that provided the system is consistent, then
there are infinitely many solutions, as the following examples show.
The system

x+y+z = 6
x+y+z = 1

is inconsistent, since there are no values of x, y, z which can satisfy both equations.
These equations represent parallel planes in R3 .
On the other hand, the system

x + y + 3z + w = 2
x−y+z+w = 4
y + 2z + 2w = 0
is consistent and will have infinitely many solutions. Notice that the coefficient
matrix of this linear system is the same matrix A as that used in the previous
example of a homogeneous system.
The augmented matrix is
          ( 1  1  3  1  2 )
(A|b) =   ( 1 −1  1  1  4 ) .
          ( 0  1  2  2  0 )
Activity 6.4 Show that the reduced row echelon form of the augmented matrix is

( 1  0  0  −3   1 )
( 0  1  0  −2  −2 ) .
( 0  0  1   2   1 )

Setting the non-leading variable x4 = t, the general solution is

      (  1 )        (  3 )
      ( −2 )        (  2 )
x =   (  1 )  + t   ( −2 ) ,   t ∈ R,
      (  0 )        (  1 )

which is a line which does not go through the origin. It is parallel to the line of solutions
of the homogeneous system, Ax = 0, and goes through the point determined by
p = (1, −2, 1, 0)T .
This should come as no surprise, since the coefficient matrix forms the first four
columns of the augmented matrix. Compare the solution sets:

Ax = 0 :  x = t (3, 2, −2, 1)T        Ax = b :  x = (1, −2, 1, 0)T + t (3, 2, −2, 1)T
The reduced row echelon form of the augmented matrix of a system Ax = b will always
contain the information for the solution of Ax = 0, since the matrix A is the first part
of (A|b). We therefore have the following definition.
Definition 6.2 (Associated homogeneous system) Given a system of linear
equations, Ax = b, the linear system Ax = 0 is called the associated homogeneous
system.
The solutions of the associated homogeneous system form an important part of the
solution of the system Ax = b, as we shall see in the next section.
Activity 6.5 Look at the reduced row echelon form of A in Example 6.13,
( 1  0  0  −3 )
( 0  1  0  −2 ) .
( 0  0  1   2 )
Explain why you can tell from this matrix that for all b ∈ R3 , the linear system
Ax = b is consistent with infinitely many solutions.
Consider, for example, the system

x1 + 2x2 + x3 = 1
2x1 + 2x2 = 2
3x1 + 4x2 + x3 = 2.
We now formalise the connection between the solution set of a consistent linear system,
and the null space of the coefficient matrix of the system.
Theorem 6.2 Suppose that A is an m × n matrix, that b ∈ Rm , and that the system
Ax = b is consistent. Suppose that p is any solution of Ax = b. Then the set of all
solutions of Ax = b consists precisely of the vectors p + z for z ∈ N (A); that is,
{x | Ax = b} = {p + z | z ∈ N (A)}.
Read the proof of this in the textbook A-H, where it is labelled as Theorem 2.29.
Note that it uses the strategy (Section 2.1.4) of proving that two sets are equal by
showing that each is a subset of the other.
The above result is the ‘Principle of Linearity’. It says that the general solution of a
consistent linear system Ax = b is equal to any one particular solution p, where
Ap = b, plus the general solution of the associated homogeneous system.
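The Principle of Linearity can be seen in action on the example above; a sketch (pure Python, with p the particular solution and z the null-space vector from the earlier discussion):

```python
A = [[1, 1, 3, 1],
     [1, -1, 1, 1],
     [0, 1, 2, 2]]
b = [2, 4, 0]

p = [1, -2, 1, 0]     # one particular solution of Ax = b
z = [3, 2, -2, 1]     # a vector in the null space: Az = 0

def times(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

print(times(A, p))                                  # [2, 4, 0]
print(times(A, z))                                  # [0, 0, 0]
# p + z is again a solution, as the Principle of Linearity promises:
print(times(A, [pi + zi for pi, zi in zip(p, z)]))  # [2, 4, 0]
```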
which represents the intersection of two planes, since the equations 2z = 0 and z = 0
each represent the xy-plane. To find the solution, we continue to reduce the matrix B to
reduced row echelon form.
      ( 0  1  3/2 )        ( 0  1  0 )
B −→  ( 0  0   1  )  −→    ( 0  0  1 ) .
      ( 0  0   0  )        ( 0  0  0 )
The non-leading variable is x, so we set x = t, and the solution is

      ( t )        ( 1 )
x =   ( 0 )  = t   ( 0 ) ,   t ∈ R,
      ( 0 )        ( 0 )
which is a line through the origin, namely the x-axis. So the plane 2y + 3z = 0
intersects the xy-plane along the x-axis.
We summarise what we have noticed so far:
Activity 6.7
Look at the example we solved in section 5.3.3 on page 86.
x1 + x2 + x3 + x4 + x5 = 3
2x1 + x2 + x3 + x4 + 2x5 = 4
x 1 − x2 − x3 + x4 + x5 = 5
x1 + x4 + x5 = 4.
Overview
We have explored an economic application of linear systems, known as input-output
analysis. We have defined the null space of a matrix and expanded our understanding of
linear systems by looking at solutions of homogeneous systems, and showing how the
solution set of a consistent linear system is related to the solution set of the associated
homogeneous system.
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
Chapter 7
Matrix inversion
Introduction
How do we know if a matrix A is invertible, and how do we find the inverse if it is? In
this chapter we will answer these two questions using row operations. The ‘main
theorem’ which accomplishes this will feature throughout the course.
Only a square matrix can have an inverse, so in this chapter all matrices will be square
unless explicitly stated otherwise.
Aims
The aims of this chapter are to:
State and prove (using elementary matrices) the main result answering the
question, “When is a matrix A invertible?”
Deduce and demonstrate a method to find the inverse of a matrix using row
operations
Establish the result that if A and B are square matrices with AB = I, then A and
B are invertible and one is the inverse of the other.
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 3,
Section 3.1
Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.
Synopsis
We examine the effects of a row operation on a matrix and on the product of two
matrices. This leads us to the definition of an elementary matrix and we observe how
multiplying a matrix A on the left by an elementary matrix performs a single row
operation on A. This observation enables us to prove the main theorem, which states
that A is invertible if and only if any one of three other conditions holds. From the
proof of this theorem we deduce the method of finding the inverse of a matrix using row
operations. We then use the theorem to prove that if A and B are square matrices with
AB = I, then A and B are invertible and are inverses of each other.
7.1. Elementary matrices
Now consider the effect of a row operation on a product AB. For example, the first
matrix below is the product AB after the row operation ‘add 4 times row 1 of AB to
row 2 of AB’.
( A1 B          )     ( A1 B          )     ( A1        )
( A2 B + 4A1 B  )  =  ( (A2 + 4A1 )B  )  =  ( A2 + 4A1  )  B
(      ...      )     (      ...      )     (    ...    )
( An B          )     ( An B          )     ( An        )
For example,

( 1  0  0 )      ( 0  1  0 )      ( 1  0  0 )
( 0  3  0 )      ( 1  0  0 )      ( 4  1  0 )
( 0  0  1 )      ( 0  0  1 )      ( 0  0  1 )
are elementary matrices. The first has had row 2 multiplied by 3, the second had row 1
and row 2 interchanged, and the last matrix had 4 times row 1 added to row 2.
Elementary matrices provide a useful tool to relate a matrix to its reduced row echelon
form. We have shown above that the matrix obtained from a matrix B after performing
one row operation is equal to a product EB, where E is the elementary matrix obtained
from I by that same row operation.
We now want to look at the invertibility of elementary matrices and row operations.
Any elementary row operation can be undone by an elementary row operation.
RO1 is multiply a row by a non-zero constant.
To undo RO1, multiply the row by 1/(constant).
RO2 is interchange two rows.
To undo RO2 interchange the rows again.
RO3 is add a multiple of one row to another.
To undo RO3 subtract the multiple of one row from the other.
If we obtain an elementary matrix by performing one row operation on the identity, and
another elementary matrix from the row operation which ‘undoes’ it, then multiplying
these matrices together will return the identity matrix. That is, they are inverses of one
another. This argument establishes the following theorem.
Theorem 7.1 Any elementary matrix is invertible, and the inverse is also an
elementary matrix.
Activity 7.2 For the matrix E below, write down E −1 , and then show that
EE −1 = I and E −1 E = I.
      (  1  0  0 )
E =   ( −4  1  0 ) .
      (  0  0  1 )
7.2. Row equivalence
We can undo this row operation and return the matrix B by multiplying on the left
by E1−1 ,
$$
\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 4 \\ 0 & 1 & 2 \\ -1 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 2 & 4 \\ 1 & 3 & 6 \\ -1 & 0 & 1 \end{pmatrix}.
$$
reflexive: A ∼ A
symmetric: A ∼ B ⇒ B ∼ A
transitive: A ∼ B and B ∼ C ⇒ A ∼ C
Activity 7.3 Argue why this is true: that is, explain why row equivalence as defined
above satisfies these three conditions.
The algorithm for putting a matrix A into reduced row echelon form by a sequence of
row operations means that every matrix is row equivalent to a matrix in reduced row
echelon form. This fact is stated in the following theorem.
Theorem 7.2 Every matrix is row equivalent to a matrix in reduced row echelon form.
Theorem 7.3 If A is an n × n matrix, then the following statements are equivalent.
(1) A−1 exists.
(2) Ax = b has a unique solution for any b ∈ Rn .
(3) Ax = 0 has only the trivial solution, x = 0.
(4) A is row equivalent to I.
It is particularly important for you to appreciate how one statement of this theorem
implies the next. As you read through the proof stop and think about this.
Proof
If we show that (1) ⇒ (2) ⇒ (3) ⇒ (4) ⇒ (1), then any one statement will imply all
the others, so the statements are equivalent.
(1) =⇒ (2). We assume that A−1 exists, and consider the system of linear equations
Ax = b where x is the vector of unknowns and b is any vector in Rn . We use the
matrix A−1 to solve for x by multiplying the equation on the left by A−1 ,
A−1 Ax = A−1 b =⇒ Ix = A−1 b =⇒ x = A−1 b.
This shows that x = A−1 b is a solution, and that it is the only possible solution. So
Ax = b has a unique solution for any b ∈ Rn .
(2) =⇒ (3). If Ax = b has a unique solution for all b ∈ Rn , then this is true for b = 0.
The unique solution of Ax = 0 must be the trivial solution, x = 0.
(3) =⇒ (4). Suppose Ax = 0 has only the trivial solution. Then the reduced row
echelon form of A can have no free variables, so every one of its n columns must
contain a leading one. Since A is n × n, this forces the reduced row echelon form of A
to be the identity matrix I; that is, A is row equivalent to I.
(4) =⇒ (1). We now make use of elementary matrices. If A is row equivalent to I, then
there is a sequence or row operations which reduce A to I, so there must exist
elementary matrices E1 , . . . , Er such that
Er Er−1 · · · E1 A = I.
Each elementary matrix has an inverse. We use these to solve the above equation for A,
by first multiplying the equation on the left by $E_r^{-1}$, then by $E_{r-1}^{-1}$, and so on, to obtain
$$
A = E_1^{-1} \cdots E_{r-1}^{-1} E_r^{-1} I.
$$
This says that A is a product of invertible matrices, and therefore A is invertible.
(Recall from Chapter 3 that if A and B are invertible matrices of the same size, then
the product AB is invertible and its inverse is the product of the inverses in the reverse
order, (AB)−1 = B −1 A−1 .)
This proves the theorem.
7.4. Using row operations to find the inverse matrix
This tells us that if we apply the same row operations to the matrix I that we use to
reduce A to I, then we will obtain the matrix A−1 . That is,
Er Er−1 · · · E1 A = I,    A−1 = Er Er−1 · · · E1 I.
This gives us a method to find the inverse of a matrix A. We start with the matrix A
and we form a new, larger matrix by placing the identity matrix to the right of A,
obtaining the matrix denoted (A|I). We then use row operations to reduce this to
(I|B). If this is not possible (which will become apparent) then the matrix is not
invertible. If it can be done, then A is invertible and B = A−1 .
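This (A|I) → (I|B) procedure translates directly into code. Below is a minimal Python sketch (the function name `invert` and the use of exact `Fraction` arithmetic are our own choices, not from the guide):

```python
from fractions import Fraction

def invert(A):
    """Row reduce (A | I) to (I | B); return B = A^(-1), or None if A is not invertible."""
    n = len(A)
    # Form the augmented matrix (A | I) using exact rational arithmetic.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # RO2: find a row at or below the diagonal with a non-zero pivot entry.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # no pivot available: A cannot be row reduced to I
        M[col], M[pivot] = M[pivot], M[col]
        # RO1: multiply the pivot row by a non-zero constant to get a leading one.
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # RO3: add multiples of the pivot row to clear the rest of the column.
        for r in range(n):
            if r != col:
                k = M[r][col]
                M[r] = [x - k * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]  # the right-hand block is now A^(-1)
```

Applied to the matrix of Example 7.2 below, this reproduces the inverse found by hand, and it returns `None` for a matrix that is not row equivalent to I.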
Example 7.2 We use this method to find the inverse of the matrix
$$
A = \begin{pmatrix} 1 & 2 & 4 \\ 1 & 3 & 6 \\ -1 & 0 & 1 \end{pmatrix}.
$$
In order to determine if the matrix is invertible and, if so, to determine the inverse,
we form the matrix
$$
(A \mid I) = \left(\begin{array}{ccc|ccc} 1 & 2 & 4 & 1 & 0 & 0 \\ 1 & 3 & 6 & 0 & 1 & 0 \\ -1 & 0 & 1 & 0 & 0 & 1 \end{array}\right).
$$
(We have separated A from I by a vertical line just to emphasise how this matrix is
formed. It is also helpful in the calculations.) Then we carry out elementary row
operations.
The operations R2 − R1 and R3 + R1 give
$$
\left(\begin{array}{ccc|ccc} 1 & 2 & 4 & 1 & 0 & 0 \\ 0 & 1 & 2 & -1 & 1 & 0 \\ 0 & 2 & 5 & 1 & 0 & 1 \end{array}\right),
$$
then R3 − 2R2 gives
$$
\left(\begin{array}{ccc|ccc} 1 & 2 & 4 & 1 & 0 & 0 \\ 0 & 1 & 2 & -1 & 1 & 0 \\ 0 & 0 & 1 & 3 & -2 & 1 \end{array}\right),
$$
then R1 − 4R3 and R2 − 2R3 give
$$
\left(\begin{array}{ccc|ccc} 1 & 2 & 0 & -11 & 8 & -4 \\ 0 & 1 & 0 & -7 & 5 & -2 \\ 0 & 0 & 1 & 3 & -2 & 1 \end{array}\right),
$$
and finally R1 − 2R2 gives
$$
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 3 & -2 & 0 \\ 0 & 1 & 0 & -7 & 5 & -2 \\ 0 & 0 & 1 & 3 & -2 & 1 \end{array}\right).
$$
This is now in the form (I|B), so we deduce that A is invertible and that
$$
A^{-1} = \begin{pmatrix} 3 & -2 & 0 \\ -7 & 5 & -2 \\ 3 & -2 & 1 \end{pmatrix}.
$$
It is very easy to make mistakes when row reducing a matrix, so the next thing you
should do is check that AA−1 = I.
Activity 7.4 Do this. Check that when you multiply AA−1 , you get the identity
matrix I.
(In order to establish that this is the inverse matrix, you should also show
A−1 A = I, but we will forgo that here.)
If the matrix A is not invertible, what will happen? By the theorem, if A is not
invertible, then the reduced row echelon form of A cannot be I, so there will be a row of
zeros in the row echelon form of A.
Activity 7.5 Find the inverse, if it exists, of each of the following matrices:
$$
A = \begin{pmatrix} -2 & 1 & 3 \\ 0 & -1 & 1 \\ 1 & 2 & 0 \end{pmatrix}, \qquad
B = \begin{pmatrix} 2 & 1 & 3 \\ 0 & -1 & 1 \\ 1 & 2 & 0 \end{pmatrix}.
$$
Theorem 7.4 If A and B are n × n matrices and AB = I, then A and B are
invertible matrices, and A = B −1 and B = A−1 .
Read the proof of this theorem in the textbook A-H, where it is labelled as
Theorem 3.12. Before you do so, think about how you might prove it yourself. If
you can show that the homogeneous system of equations Bx = 0 has only the
trivial solution, x = 0, then by Theorem 7.3 this will prove that B is invertible.
Then you can use B −1 to complete the proof. Try it, and then read the textbook.
Overview
In this chapter we defined and used elementary matrices in order to establish the main
theorem concerning the invertibility of a matrix. The main theorem and its proof are of
fundamental importance, linking the concepts we have studied so far, and we shall be
using and adding to it in later chapters. Indeed, we have already used it to establish the
fact that if A and B are square matrices then it is sufficient to just show that AB = I
in order to conclude that B = A−1 . We no longer have to show that also BA = I.
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
7.5. Test your knowledge and understanding
For the matrix A of Activity 7.5, we form (A | I) and row reduce:
$$
(A \mid I) = \left(\begin{array}{ccc|ccc} -2 & 1 & 3 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 & 1 & 0 \\ 1 & 2 & 0 & 0 & 0 & 1 \end{array}\right)
\xrightarrow{R_1 \leftrightarrow R_3}
\left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & 0 & 1 \\ 0 & -1 & 1 & 0 & 1 & 0 \\ -2 & 1 & 3 & 1 & 0 & 0 \end{array}\right)
$$
$$
\xrightarrow{R_3 + 2R_1}
\left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & 0 & 1 \\ 0 & -1 & 1 & 0 & 1 & 0 \\ 0 & 5 & 3 & 1 & 0 & 2 \end{array}\right)
\xrightarrow{(-1)R_2}
\left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & 0 & 1 \\ 0 & 1 & -1 & 0 & -1 & 0 \\ 0 & 5 & 3 & 1 & 0 & 2 \end{array}\right)
$$
$$
\xrightarrow{R_3 - 5R_2}
\left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & 0 & 1 \\ 0 & 1 & -1 & 0 & -1 & 0 \\ 0 & 0 & 8 & 1 & 5 & 2 \end{array}\right)
\xrightarrow{\frac{1}{8}R_3}
\left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & 0 & 1 \\ 0 & 1 & -1 & 0 & -1 & 0 \\ 0 & 0 & 1 & \frac{1}{8} & \frac{5}{8} & \frac{1}{4} \end{array}\right)
$$
$$
\xrightarrow{R_2 + R_3}
\left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & \frac{1}{8} & -\frac{3}{8} & \frac{1}{4} \\ 0 & 0 & 1 & \frac{1}{8} & \frac{5}{8} & \frac{1}{4} \end{array}\right)
\xrightarrow{R_1 - 2R_2}
\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -\frac{1}{4} & \frac{3}{4} & \frac{1}{2} \\ 0 & 1 & 0 & \frac{1}{8} & -\frac{3}{8} & \frac{1}{4} \\ 0 & 0 & 1 & \frac{1}{8} & \frac{5}{8} & \frac{1}{4} \end{array}\right).
$$
So
$$
A^{-1} = \frac{1}{8}\begin{pmatrix} -2 & 6 & 4 \\ 1 & -3 & 2 \\ 1 & 5 & 2 \end{pmatrix}.
$$
Now check that AA−1 = I.
When you carry out the row reduction, it is not necessary to always indicate the
separation of the two matrices by a line as we have done so far. You just need to keep
track of what you are doing.
In the calculation for the inverse of B, we have omitted the line but added a bit of
space to make it easier for you to read.
$$
(B \mid I) = \left(\begin{array}{ccc|ccc} 2 & 1 & 3 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 & 1 & 0 \\ 1 & 2 & 0 & 0 & 0 & 1 \end{array}\right)
\xrightarrow{R_1 \leftrightarrow R_3}
\left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & 0 & 1 \\ 0 & -1 & 1 & 0 & 1 & 0 \\ 2 & 1 & 3 & 1 & 0 & 0 \end{array}\right)
$$
$$
\xrightarrow{R_3 - 2R_1}
\left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & 0 & 1 \\ 0 & -1 & 1 & 0 & 1 & 0 \\ 0 & -3 & 3 & 1 & 0 & -2 \end{array}\right)
\xrightarrow{(-1)R_2}
\left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & 0 & 1 \\ 0 & 1 & -1 & 0 & -1 & 0 \\ 0 & -3 & 3 & 1 & 0 & -2 \end{array}\right)
$$
$$
\xrightarrow{R_3 + 3R_2}
\left(\begin{array}{ccc|ccc} 1 & 2 & 0 & 0 & 0 & 1 \\ 0 & 1 & -1 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 & -3 & -2 \end{array}\right),
$$
which indicates that the matrix B is not invertible; it is not row equivalent to the
identity matrix.
Comment on exercise
Solution to exercise 7.1
To prove this using Theorem 7.4, write (AB)(AB)−1 = I. By the associativity of matrix
multiplication, this says that A(B(AB)−1 ) = I, and by the theorem this implies that
A−1 exists. Multiplying in the opposite order, (AB)−1 (AB) = I shows in the same way
that B is invertible.
Chapter 8
Determinants
Introduction
The determinant provides an alternative and efficient way to answer the question of
invertibility of a square matrix. In this chapter we will establish another, often more
practical, way to find the inverse of a matrix. This will lead to another method, known
as Cramer’s rule, to find the solution of a system of n linear equations in n unknowns
with a unique solution.
The determinant is only defined for a square matrix, so in this chapter all matrices are
square unless indicated otherwise.
Aims
The aims of this chapter are to:
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 3,
Sections 3.2–3.4
Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.
Synopsis
We define the terms minor and cofactor for a square matrix in order to define the
determinant of an n × n matrix as a cofactor expansion. We find that the same real
number, the determinant, can be obtained as a cofactor expansion by any row or any
column of the matrix. We then use this fact to discover properties of the determinant,
in particular we find how the determinant of a matrix is affected by changing the
matrix using row operations in order to facilitate its calculation. We next prove the
result that a matrix is invertible if and only if the determinant is non-zero, and see how
to find the inverse of a matrix using cofactors and the adjoint matrix. Finally we apply
this to finding a solution of a system of n linear equations in n unknowns if the
coefficient matrix has non-zero determinant, a method known as Cramer’s Rule.
8.1 Determinants
The determinant of a square matrix A is a particular real number intrinsically
associated with A, written |A| or det A. (You can think of it as a function from the set
of all square matrices to the real numbers.) How do we find this number and what is its
purpose?
The determinant will provide a quick way to determine whether or not a matrix A is
invertible. In view of this, suppose A is a 2 × 2 matrix, and that we wish to find A−1
using row operations. Then we form the matrix (A | I) and attempt to row reduce A to
I. We assume a ≠ 0, otherwise we would begin by switching rows,
$$
(A \mid I) = \left(\begin{array}{cc|cc} a & b & 1 & 0 \\ c & d & 0 & 1 \end{array}\right)
\xrightarrow{(1/a)R_1}
\left(\begin{array}{cc|cc} 1 & b/a & 1/a & 0 \\ c & d & 0 & 1 \end{array}\right)
$$
$$
\xrightarrow{R_2 - cR_1}
\left(\begin{array}{cc|cc} 1 & b/a & 1/a & 0 \\ 0 & d - cb/a & -c/a & 1 \end{array}\right)
\xrightarrow{aR_2}
\left(\begin{array}{cc|cc} 1 & b/a & 1/a & 0 \\ 0 & ad - bc & -c & a \end{array}\right),
$$
which shows that A−1 exists if and only if ad − bc ≠ 0.
For a 2 × 2 matrix, the determinant is given by the formula
$$
\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.
$$
For example,
$$
\begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} = (1)(4) - (2)(3) = -2.
$$
So the cofactor is equal to the minor if i + j is even, and it is equal to −1 times the
minor if i + j is odd.
There is a simple way to associate the cofactor Cij with the entry aij of the matrix.
Locate the entry aij and cross out the row and the column containing aij . Then
evaluate the determinant of the (n − 1) × (n − 1) matrix which remains. This is the
minor, Mij . Then give it a ‘+’ or ‘−’ sign according to the position of aij on the
following pattern:
$$
\begin{pmatrix} + & - & + & - & \cdots \\ - & + & - & + & \cdots \\ + & - & + & - & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}
$$
Activity 8.1 Write down the cofactor C13 for the matrix A above using this
method.
This is called the cofactor expansion of |A| by row one. It is a recursive definition,
meaning that the determinant of an n × n matrix is given in terms of (n − 1) × (n − 1)
determinants.
Note that if A is a 2 × 2 matrix, then the determinant as defined earlier is just the
cofactor expansion:
$$
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}C_{11} + a_{12}C_{12} = a_{11}a_{22} - a_{12}a_{21}.
$$
You might ask, ‘Why is the cofactor expansion given by row 1, rather than any other
row?’ In fact it turns out that using a cofactor expansion by any row or column of A
will give the same number |A|, as the following theorem states.
Theorem 8.1 If A is an n × n matrix, then the determinant of A can be computed by
multiplying the entries of any row (or column) by their cofactors and summing the
resulting products:
$$
|A| = a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in} \quad \text{(cofactor expansion by row } i\text{)}
$$
$$
|A| = a_{1j}C_{1j} + a_{2j}C_{2j} + \cdots + a_{nj}C_{nj} \quad \text{(cofactor expansion by column } j\text{)}.
$$
Before we look into a proof, note that this result allows you to choose any row or any
column of a matrix to find its determinant using a cofactor expansion. So you should
choose a row or column which gives the simplest calculations.
Obtaining the correct value for |A| is important, so it is a good idea to check your result
by calculating the determinant by another row or column.
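The recursive definition translates directly into code. The following is a minimal Python sketch (the helper name `det` is ours, not from the guide); it always expands along row 1, although by Theorem 8.1 any row or column would give the same value:

```python
def det(A):
    """Determinant of a square matrix by cofactor expansion along row 1."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Minor M_{1,j+1}: delete row 1 and column j+1 of A.
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        # The cofactor sign alternates +, -, +, - ... across the row: (-1)**j.
        total += (-1) ** j * A[0][j] * det(minor)
    return total
```

For instance, `det([[1, 2], [3, 4]])` gives −2, agreeing with the 2 × 2 formula above.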
Example 8.3 In the example we have been using (see page 117), instead of using
the cofactor expansion by row 1 as shown above, we can choose to evaluate the
determinant of the matrix A by row 3 or column 3, which will involve fewer
calculations since a33 = 0. To check the result |A| = 34, we will evaluate the
determinant again using column three. Remember the correct cofactor signs.
$$
|A| = \begin{vmatrix} 1 & 2 & 3 \\ 4 & 1 & 1 \\ -1 & 3 & 0 \end{vmatrix}
= 3\begin{vmatrix} 4 & 1 \\ -1 & 3 \end{vmatrix} - 1\begin{vmatrix} 1 & 2 \\ -1 & 3 \end{vmatrix} + 0
= 3(13) - (5) = 34.
$$
8.2. Results on determinants
Read Section 3.2.2 of the text A-H, which gives an informal proof of
Theorem 8.1. The purpose of this section is to show that the determinant is a
number intrinsically defined by the matrix as a sum of signed elementary products
and that this very same number can be obtained as a cofactor expansion by any
row or column of the matrix.
Theorem 8.2 If A is an n × n matrix, then |AT | = |A|.
Proof
This theorem follows immediately from Theorem 8.1. The cofactor expansion by row i
of |AT | is precisely the same, number for number, as the cofactor expansion by column i
of |A|.
Each of the following three statements follows from Theorem 8.1. By Theorem 8.2, it
follows that each is true if the word row is replaced by column. We will need these
results in the next section. In all of them we assume that A is an n × n matrix.
Corollary 3 If the cofactors of one row are multiplied by the entries of a different row,
then the result is 0.
Read the proofs of these Corollaries in Section 3.3 of the text A-H where they
are labelled Corollary 3.28, Corollary 3.29 and Corollary 3.30.
Activity 8.4 Write down any 3 × 3 matrix and use it to illustrate Corollary 3:
multiply the cofactors of one row by the entries of a different row. Then find the
determinant of your matrix using the correct row entries and verify it using a
cofactor expansion by a different row or column.
For an upper triangular matrix, which row or column should we use for the cofactor
expansion? Clearly the calculations are simplest if we expand by column 1 or row n,
since each contains at most one non-zero entry. Expansion by column 1 gives us
$$
|A| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & a_{nn} \end{vmatrix}
= a_{11}\begin{vmatrix} a_{22} & \cdots & a_{2n} \\ \vdots & \ddots & \vdots \\ 0 & \cdots & a_{nn} \end{vmatrix},
$$
and continuing in this way, |A| = a11 a22 · · · ann , the product of the diagonal entries.
A square matrix in row echelon form is upper triangular. If we know how a determinant
is affected by a row operation, then this observation will give us an easier way to
calculate large determinants. We can use row operations to put the matrix into row
echelon form, keep track of any changes, and then easily calculate the determinant of
the reduced matrix. So how does each row operation affect the value of the determinant?
RO1 multiply a row by a non-zero constant
Suppose the matrix B is obtained from a matrix A by multiplying row i by a non-zero
constant α. For example, with i = 2,
$$
|A| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}, \qquad
|B| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \alpha a_{21} & \alpha a_{22} & \cdots & \alpha a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}.
$$
Expanding |B| by row i, whose cofactors are the same as those of row i of A,
$$
|B| = \alpha a_{i1}C_{i1} + \alpha a_{i2}C_{i2} + \cdots + \alpha a_{in}C_{in}
= \alpha(a_{i1}C_{i1} + a_{i2}C_{i2} + \cdots + a_{in}C_{in}) = \alpha|A|.
$$
8.2. Results on determinants
When we actually need this, we will use it to factor out a constant α from the
determinant, as
$$
\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \alpha a_{21} & \alpha a_{22} & \cdots & \alpha a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}
= \alpha \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}.
$$
so |B| = −|A|.
Now let A be a 3 × 3 matrix and let B be a matrix obtained from A by interchanging
two rows. Then if we expand |B| using a different row, each cofactor contains the
determinant of a 2 × 2 matrix which is a cofactor of A with two rows interchanged, so
each will be multiplied by −1, and |B| = −|A|. To visualise this, consider for example
$$
|A| = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix}, \qquad
|B| = \begin{vmatrix} g & h & i \\ d & e & f \\ a & b & c \end{vmatrix}.
$$
$$
|B| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} + 4a_{11} & a_{22} + 4a_{12} & \cdots & a_{2n} + 4a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}.
$$
In general, in a situation like this, we can expand |B| by row j:
$$
\begin{aligned}
|B| &= (a_{j1} + ka_{i1})C_{j1} + (a_{j2} + ka_{i2})C_{j2} + \cdots + (a_{jn} + ka_{in})C_{jn} \\
&= a_{j1}C_{j1} + a_{j2}C_{j2} + \cdots + a_{jn}C_{jn} + k(a_{i1}C_{j1} + a_{i2}C_{j2} + \cdots + a_{in}C_{jn}) \\
&= |A| + 0.
\end{aligned}
$$
The last expression in brackets is 0 because it consists of the cofactors of one row
multiplied by the entries of another row. So this row operation does not change the
value of |A|.
There is no change in the value of the determinant if a multiple of one row is added
to another.
The final steps all use RO3, so there is no change in the value of the determinant.
Finally we evaluate the determinant of the upper triangular matrix
$$
3\begin{vmatrix} 1 & 2 & -1 & 4 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 4 & -4 \\ 0 & 0 & 0 & -1 \end{vmatrix} = 3(1)(1)(4)(-1) = -12.
$$
A word of caution with row operations! What is the change in the value of |A|
(1) if R2 is replaced by R2 − 3R3 or
(2) if R2 is replaced by 3R1 − R2 ?
For (1) there is no change, but for (2) the determinant will change sign. Why? 3R1 − R2
is actually two elementary row operations: first multiply row 2 by −1 and then add
three times row 1 to it. When performing row operation RO3, you should always add a
multiple of another row to the row you are replacing.
Activity 8.5 You can shorten the writing in the above example by expanding the
4 × 4 determinant using the first column as soon as you have obtained the
determinant with zeros under the leading one. You will then be left with a 3 × 3
determinant to evaluate. Do this. Without looking at the example above, work
through the calculations in this way to evaluate
$$
|A| = \begin{vmatrix} 1 & 2 & -1 & 4 \\ -1 & 3 & 0 & 2 \\ 2 & 1 & 1 & 2 \\ 1 & 4 & 1 & 3 \end{vmatrix}.
$$
Theorem 8.4 If A and B are n × n matrices, then |AB| = |A||B|.
Read the proof of this theorem in the text A-H where it is labelled Theorem 3.37.
We will give two proofs of the theorem that a square matrix A is invertible if and only
if |A| ≠ 0. The first proof follows easily from Theorem 7.3 and establishes |A| ≠ 0 as
another equivalent condition for a matrix A to be invertible.
The second proof follows from our results on determinants and gives us another method
to calculate the inverse of a matrix.
Proof 1: We have already established this theorem indirectly by our arguments in the
previous section; we will repeat and collect them here.
By Theorem 7.3 on page 109, A is invertible if and only if the reduced row echelon form
of A is the identity matrix. Let R be the reduced row echelon form of A. Since A is a
square matrix, R is either the identity matrix, or a matrix with a row of zeros. (Indeed,
if R has a leading one in every row, then it must also have a leading one in every
column, and since it is n × n it must be the identity matrix. Otherwise, there is a row of
R without a leading one, and this must, therefore, be a row of zeros.)
So either R = I, which is the case if and only if A is invertible, with |R| = 1 ≠ 0; or
|R| = 0 because it has a row of zeros, which is the case if and only if A is not invertible.
As we have seen, row operations cannot alter the fact that a determinant is zero or
non-zero. By performing a row operation we might be multiplying the determinant by a
non-zero constant, or by −1, or not changing the determinant at all. Therefore we can
conclude that |A| = 0 if and only if the determinant of its reduced row echelon form,
|R| = 0, which is if and only if A is not invertible. Or, put the other way, |A| ≠ 0 if and
only if |R| = 1, if and only if the matrix A is invertible.
Proof 2: We will now prove this theorem directly. Since it is an if and only if
statement, we must prove both implications.
First we show that if A is invertible, then |A| ≠ 0. We assume A−1 exists, so that
AA−1 = I. Then taking the determinant of both sides of this equation,
|AA−1 | = |I| = 1. Applying Theorem 8.4 to the product,
|AA−1 | = |A| |A−1 | = 1.
If the product of two real numbers is non-zero, then neither number can be zero, which
proves that |A| ≠ 0.
As a consequence of this argument we have the bonus result that
$$
|A^{-1}| = \frac{1}{|A|}.
$$
We now show the other implication, that if |A| ≠ 0 then A is invertible. To do this we
will construct A−1 , and to do this we need some definitions.
Definition 8.4 If A is an n × n matrix, the matrix of cofactors of A is the matrix
whose (i, j) entry is Cij , the (i, j) cofactor of A. The adjoint (referred to as the
adjugate in some textbooks) of the matrix A is the transpose of the matrix of cofactors.
That is, the adjoint of A, adj(A), is the matrix
$$
\operatorname{adj}(A) = \begin{pmatrix} C_{11} & C_{21} & \cdots & C_{n1} \\ C_{12} & C_{22} & \cdots & C_{n2} \\ \vdots & \vdots & & \vdots \\ C_{1n} & C_{2n} & \cdots & C_{nn} \end{pmatrix}.
$$
Notice that column 1 of this matrix consists of the cofactors of row 1 of A (and row 1
consists of the cofactors of column 1 of A), and similarly for each column and row.
8.3. Matrix inverse using cofactors
$$
A \operatorname{adj}(A) = \begin{pmatrix} |A| & 0 & \cdots & 0 \\ 0 & |A| & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & |A| \end{pmatrix} = |A|\, I.
$$
First calculate |A| to see if A is invertible. Using the cofactor expansion by row 1,
Change the minors into cofactors, by multiplying by −1 those minors with i + j equal
to an odd number. Finally transpose the result to form the adjoint matrix, so that
$$
\Rightarrow \quad A^{-1} = \frac{1}{|A|}\operatorname{adj}(A) = -\frac{1}{16}\begin{pmatrix} 1 & 1 & -4 \\ 5 & -11 & -4 \\ -9 & 7 & 4 \end{pmatrix}.
$$
As with all calculations, it is easy to make a mistake. Therefore, having found A−1 ,
the next thing you should do is check your result by showing that AA−1 = I,
$$
-\frac{1}{16}\begin{pmatrix} 1 & 2 & 3 \\ -1 & 2 & 1 \\ 4 & 1 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 1 & -4 \\ 5 & -11 & -4 \\ -9 & 7 & 4 \end{pmatrix}
= -\frac{1}{16}\begin{pmatrix} -16 & 0 & 0 \\ 0 & -16 & 0 \\ 0 & 0 & -16 \end{pmatrix} = I.
$$
Activity 8.6 Use this method to find the inverse of the matrix
$$
A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 4 & 0 \\ 5 & 6 & 7 \end{pmatrix}.
$$
Remember: the adjoint matrix only contains the cofactors of A; the (i, j) entry is the
cofactor Cji of A. The entries only multiply the cofactors when calculating the
determinant of A, |A|.
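The cofactor method reads off directly as code. Below is a minimal Python sketch (the function names are ours, and `det` repeats the row-1 cofactor expansion from earlier in this chapter):

```python
from fractions import Fraction

def det(A):
    # determinant by cofactor expansion along row 1
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([r[:j] + r[j + 1:] for r in A[1:]])
               for j in range(len(A)))

def adjoint(A):
    """adj(A): the transpose of the matrix of cofactors of A."""
    n = len(A)
    # cof[i][j] = C_ij = (-1)^(i+j) times the minor M_ij
    cof = [[(-1) ** (i + j)
            * det([r[:j] + r[j + 1:] for k, r in enumerate(A) if k != i])
            for j in range(n)] for i in range(n)]
    return [[cof[j][i] for j in range(n)] for i in range(n)]  # transpose

def inverse(A):
    """A^(-1) = (1/|A|) adj(A), or None when |A| = 0."""
    d = det(A)
    if d == 0:
        return None  # |A| = 0, so A is not invertible
    return [[Fraction(c, d) for c in row] for row in adjoint(A)]
```

Applied to the matrix of Example 8.5 above (with |A| = −16), this reproduces the inverse found by hand.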
Theorem (Cramer’s rule) If A is an n × n matrix with |A| ≠ 0, then the unique
solution of the system Ax = b is given by
$$
x_i = \frac{|A_i|}{|A|}, \qquad i = 1, \ldots, n,
$$
where Ai is the matrix obtained from A by replacing the ith column with the vector b.
Before you look at the proof of this theorem, let’s see how it works.
Example 8.6 Use Cramer’s rule to find the solution of the linear system
x + 2y + 3z = 7
−x + 2y + z = −3
4x + y + z = 5
We first check that |A| ≠ 0. This is the same matrix A whose inverse we found in
Example 8.5 on page 125; |A| = −16. Then applying Cramer’s rule, we
find x by evaluating the determinant of the matrix obtained from A by replacing
column 1 with b,
$$
x = \frac{\begin{vmatrix} 7 & 2 & 3 \\ -3 & 2 & 1 \\ 5 & 1 & 1 \end{vmatrix}}{|A|} = \frac{-16}{-16} = 1,
$$
and in the same way we obtain y and z:
$$
y = \frac{\begin{vmatrix} 1 & 7 & 3 \\ -1 & -3 & 1 \\ 4 & 5 & 1 \end{vmatrix}}{|A|} = \frac{48}{-16} = -3, \qquad
z = \frac{\begin{vmatrix} 1 & 2 & 7 \\ -1 & 2 & -3 \\ 4 & 1 & 5 \end{vmatrix}}{|A|} = \frac{-64}{-16} = 4,
$$
which can be easily checked by substitution into the original equations (or
multiplying Ax).
Read and work through the proof of this theorem in the text A-H, where it is
labelled Theorem 3.43. Be sure you understand how the proof works.
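Cramer’s rule is also straightforward to implement. A minimal Python sketch (function names are ours; `det` repeats the cofactor expansion from Section 8.1):

```python
from fractions import Fraction

def det(A):
    # determinant by cofactor expansion along row 1
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([r[:j] + r[j + 1:] for r in A[1:]])
               for j in range(len(A)))

def cramer(A, b):
    """Solve Ax = b by Cramer's rule; returns None when |A| = 0."""
    d = det(A)
    if d == 0:
        return None
    solution = []
    for i in range(len(A)):
        # A_i: replace column i of A by the vector b.
        Ai = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        solution.append(Fraction(det(Ai), d))
    return solution
```

Applied to the system of Example 8.6 it returns x = 1, y = −3, z = 4, as found above.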
Activity 8.7 Can you think of any other methods you can use to obtain the
solution to Example 8.6?
Overview
In this chapter we have shown how to obtain the determinant of a square matrix, the
real number intrinsically associated with the matrix. We looked at properties of the
determinant and how its value is affected by changing the matrix using row operations.
We then used this information to obtain the result that a square matrix A is invertible
if and only if |A| ≠ 0. This led in turn to the method of finding the inverse of A using
cofactors (the adjoint matrix) and to Cramer’s rule.
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
8.4. Comments on selected activities
|A| = −32 ≠ 0, and
$$
A^{-1} = \frac{1}{|A|}\operatorname{adj}(A) = -\frac{1}{32}\begin{pmatrix} 28 & 4 & -12 \\ 0 & -8 & 0 \\ -20 & 4 & 4 \end{pmatrix} = \frac{1}{8}\begin{pmatrix} -7 & -1 & 3 \\ 0 & 2 & 0 \\ 5 & -1 & -1 \end{pmatrix}.
$$
Chapter 9
Rank, range and linear systems
Introduction
In this short chapter we aim to extend and consolidate what we have learned so far
about systems of equations and matrices, and tie together many of the results of the
previous chapters. We will intersperse an overview of the previous chapters with two
new concepts, the rank of a matrix and the range of a matrix.
This chapter will serve as a synthesis of what we have learned so far in anticipation of a
return to these topics later in the guide.
Aims
The aims of this chapter are to:
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 4.
Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.
Synopsis
We start by introducing the rank of a matrix. We then show how the rank of a matrix is
connected with the set of solutions of linear systems corresponding to the matrix and
relate the number of free parameters in the general solution (when it exists) to the rank
of the corresponding matrix. Finally, we define the range of a matrix.
There are several ways of defining the rank of a matrix, and we shall meet some other
(more sophisticated) ways later. All are equivalent. We begin with the following
definition.
Definition 9.1 (Rank of a matrix) The rank, rank(A), of a matrix A is the number of
non-zero rows in a row echelon matrix obtained from A by elementary row operations.
Notice that the definition only requires that the matrix A be put into row echelon form,
because by then the number of non-zero rows is determined. By a non-zero row, we
simply mean one that contains entries other than 0. Since every non-zero row of a
matrix in row echelon form begins with a leading one, this is equivalent to the following
definition.
Definition 9.2 The rank, rank(A), of a matrix A is the number of leading ones in a
row echelon matrix obtained from A by elementary row operations.
Example 9.1 Let
$$
M = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 6 \end{pmatrix}.
$$
Reducing this to row echelon form using elementary row operations, we have:
$$
\begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 6 \end{pmatrix} \to
\begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & -1 & -2 & 3 \\ 0 & -1 & -2 & 3 \end{pmatrix} \to
\begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
$$
This last matrix is in row echelon form and has two non-zero rows (and two leading
ones), so the matrix M has rank 2.
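The definition computes directly: reduce to row echelon form and count non-zero rows. A minimal Python sketch (the function name `rank` is ours; `Fraction` keeps the arithmetic exact):

```python
from fractions import Fraction

def rank(A):
    """Number of non-zero rows in a row echelon form of A (Definition 9.1)."""
    M = [[Fraction(x) for x in row] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    r = 0  # rows fixed so far = leading entries found so far
    for c in range(n_cols):
        # look for a pivot at or below row r in column c
        piv = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if piv is None:
            continue  # no leading entry in this column
        M[r], M[piv] = M[piv], M[r]
        # clear the entries below the pivot (RO3)
        for i in range(r + 1, n_rows):
            k = M[i][c] / M[r][c]
            M[i] = [a - k * s for a, s in zip(M[i], M[r])]
        r += 1
    return r
```

Applied to the matrix M of Example 9.1 it returns 2, agreeing with the calculation above.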
9.2. Rank and systems of linear equations
Activity 9.1 Show that the matrix
$$
B = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 4 \end{pmatrix}
$$
has rank 3.
If a square matrix A of size n × n has rank n, then its reduced row echelon form has a
leading one in every row and (since the leading ones are in different columns) a leading
one in every column. Since every column with a leading one has zeros elsewhere, it
follows that the reduced echelon form of A must be I, the n × n identity matrix.
Conversely, if the reduced row echelon form of A is I, then by the definition of rank, A
has rank n. We therefore have one more equivalent statement to add to our theorem:
Theorem 9.2 If A is an n × n matrix, then the following statements are equivalent.
A−1 exists.
|A| ≠ 0.
The rank of A is n.
x1 + 2x2 + x3 = 1
2x1 + 3x2 = 5
3x1 + 5x2 + x3 = 4.
The augmented matrix is the matrix B in the previous activity. When you reduced
B to find the rank, after two steps you found,
$$
\begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 4 \end{pmatrix} \to
\begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & -1 & -2 & 3 \\ 0 & -1 & -2 & 1 \end{pmatrix} \to
\begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & -2 \end{pmatrix}.
$$
x1 + 2x2 + x3 = 1
x2 + 2x3 = −3
0x1 + 0x2 + 0x3 = −2.
But this system has no solutions, since there are no values of x1 , x2 , x3 that satisfy
the last equation. It reduces to the false statement ‘0 = −2’, whatever values we give
the unknowns. We deduce, therefore, that the original system has no solutions, and
we say that it is inconsistent. Notice that in this case there is no reason to reduce
the matrix further.
If, as in Example 9.2, the row echelon form of an augmented matrix has a row of the
kind (0 0 . . . 0 a), with a ≠ 0, then the original system is equivalent to one in which
there is an equation
0x1 + 0x2 + · · · + 0xn = a (a ≠ 0).
Clearly this equation cannot be satisfied by any values of the xi s, and the system is
inconsistent.
Continuing with our example: the rank of the coefficient matrix A is 2, but the rank of
the augmented matrix (A|b) is 3.
If a linear system is consistent then there can be no leading one in the last column of
the reduced augmented matrix, for that would mean there was a row of the form
(0 0 . . . 0 1). Thus, a system Ax = b is consistent if and only if the rank of the
augmented matrix is precisely the same as the rank of the matrix A.
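This consistency test is easy to automate. A minimal Python sketch (our own illustration; `rank` repeats the row-echelon count from the previous section):

```python
from fractions import Fraction

def rank(A):
    # number of non-zero rows in a row echelon form of A
    M = [[Fraction(x) for x in row] for row in A]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(r + 1, len(M)):
            k = M[i][c] / M[r][c]
            M[i] = [a - k * s for a, s in zip(M[i], M[r])]
        r += 1
    return r

def consistent(A, b):
    """Ax = b is consistent if and only if rank(A|b) == rank(A)."""
    augmented = [row + [entry] for row, entry in zip(A, b)]
    return rank(augmented) == rank(A)
```

For the coefficient matrix of Example 9.2, the right-hand side (1, 5, 4) gives an inconsistent system, while (1, 5, 6) gives a consistent one, matching the discussion above and Example 9.3 below.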
x1 + 2x2 + x3 = 1
2x1 + 3x2 = 5
3x1 + 5x2 + x3 = 6.
This system has the same coefficient matrix A as Example 9.2, and the rank of A is
2. The augmented matrix for the system is the matrix M in Example 9.1 on page
132, which also has rank 2, so this system is consistent. Since the rank is 2 and there
are 3 columns in A, there is a free variable and therefore infinitely many solutions.
Activity 9.2 Write down a general solution for this system to verify these remarks.
If an m × n matrix A has rank m, then there will be a leading one in every row of an
echelon form of A, and in this case a system of equations Ax = b will never be
inconsistent; it will be consistent for all b ∈ Rm . Why? There are two ways to see this.
In the first place, if there is a leading one in every row of A, the augmented matrix
(A|b) can never have a row of the form (0 0 . . . 0 1). Second, the augmented matrix
also has m rows, its size is m × (n + 1). So the rank of (A|b) can never be more than m.
Example 9.4 Consider again the matrix B from Activity 9.1 on page 133, which
we interpreted as the augmented matrix B = (A|b) in Example 9.2, and its row
echelon form:
$$
B = \begin{pmatrix} 1 & 2 & 1 & 1 \\ 2 & 3 & 0 & 5 \\ 3 & 5 & 1 & 4 \end{pmatrix} \to \cdots \to
\begin{pmatrix} 1 & 2 & 1 & 1 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & 1 \end{pmatrix}.
$$
for some constants pi , which could be zero. This system will have infinitely many
solutions for any d ∈ R3 , because the number of columns is greater than the rank of
B. There is one column without a leading one, so there is one non-leading variable.
Suppose we have a consistent system, and suppose that the rank r is strictly less than
n, the number of unknowns. Then, as we have just seen in Example 9.4, the system in
reduced row echelon form (and hence the original one) does not provide enough
information to specify the values of x1 , x2 , . . . , xn uniquely. Let’s consider this in more
detail.
Example 9.5 Suppose we are given a system for which the augmented matrix
reduces to the row echelon form
$$
\begin{pmatrix} 1 & 3 & -2 & 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 3 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
$$
Here the rank (number of non-zero rows) is r = 3 which is strictly less than the
number of unknowns, n = 6.
Continuing to reduced row echelon form, we obtain the matrix
$$
\begin{pmatrix} 1 & 3 & 0 & 4 & 2 & 0 & -28 \\ 0 & 0 & 1 & 2 & 0 & 0 & -14 \\ 0 & 0 & 0 & 0 & 0 & 1 & 5 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.
$$
Activity 9.4 Verify this. What are the additional two row operations which need to
be carried out?
The variables x1 , x3 and x6 correspond to the columns with the leading ones and are
the leading variables. The other variables are the non-leading variables.
The form of these equations tells us that we can assign any values to x2 , x4 and x5 ,
and then the leading variables will be determined. Explicitly, if we give x2 , x4 , x5 the
arbitrary values s, t, u, where s, t, u represent any real numbers, the solution is given
by
$$
x = \begin{pmatrix} -28 - 3s - 4t - 2u \\ s \\ -14 - 2t \\ t \\ u \\ 5 \end{pmatrix}.
$$
There are infinitely many solutions because the so-called ‘free variables’ x2 , x4 , x5
can take any values s, t, u ∈ R.
Generally, for a consistent system, we can describe what happens when the row echelon
form has r < n non-zero rows (0 0 . . . 0 1 ∗ ∗ . . . ∗). If the leading one is in the kth
column, it is the coefficient of the variable xk . So if the rank is r and the leading ones
occur in columns c1 , c2 , . . . , cr then the general solution to the system can be expressed
in a form where the unknowns xc1 , xc2 , . . . , xcr (the leading variables) are given in terms
of the other n − r unknowns (the non-leading variables), and those n − r unknowns are
free to take any values. In Example 9.5, we have n = 6 and r = 3, and the 3 variables
x1 , x3 , x6 can be expressed in terms of the 6 − 3 = 3 free variables x2 , x4 , x5 .
9.3. General solution of a linear system in vector notation
In the case r = n, where the number of leading ones r in the echelon form is equal to
the number of unknowns n, there is only one solution to the system — for there is a
leading one in every column since the leading ones move to the right as we go down the
rows. In this case there is a unique solution obtained from the reduced echelon form. In
fact, this can be thought of as a special case of the more general one discussed above:
since r = n there are n − r = 0 free variables, and the solution is therefore unique.
We can now summarise our conclusions thus far concerning a general linear system of m
equations in n variables, written as Ax = b, where the coefficient matrix A is an m × n
matrix of rank r.
If the echelon form of the augmented matrix has a row (0 0 . . . 0 a), with a ≠ 0,
the original system is inconsistent; it has no solutions. In this case
rank(A) = r < m and rank(A|b) = r + 1.
If the echelon form of the augmented matrix has no rows of the above type the
system is consistent, and the general solution involves n − r free variables, where r
is the rank of the coefficient matrix. When r < n there are infinitely many solutions,
but when r = n there are no free variables and so there is a unique solution.
A homogeneous system of m equations in n unknowns is always consistent, since the
zero vector x = 0 is always a solution. In this case the last statement still applies.
The general solution of a homogeneous system involves n − r free variables, where r
is the rank of the coefficient matrix. When r < n there are infinitely many
solutions, but when r = n there are no free variables and so there is a unique
solution, namely the trivial solution, x = 0.
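These conclusions are easy to check computationally. The sketch below is illustrative only (it is not part of the guide): it row-reduces a matrix with exact fraction arithmetic and uses the ranks of A and (A|b) to classify a system, here the coefficient matrix of the activity that appears below.

```python
from fractions import Fraction

def rref(M):
    """Return (reduced row echelon form, rank) using exact Fraction arithmetic."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0]) if M else 0
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                      # no leading one in this column
        M[r], M[pivot] = M[pivot], M[r]   # swap the pivot row up
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(rows):
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
    return M, r

A = [[1, -1, 1, 1, 2], [-1, 1, 0, 1, -1], [1, -1, 2, 3, 4]]
b = [4, -3, 7]
_, rank_A = rref(A)
_, rank_Ab = rref([row + [bi] for row, bi in zip(A, b)])
n = len(A[0])
if rank_A < rank_Ab:
    print("inconsistent: no solutions")
elif rank_A == n:
    print("unique solution")
else:
    print(f"infinitely many solutions with {n - rank_A} free variables")
```

Here the rank is 3 and n = 5, so the script reports two free variables, matching the case r < n above.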
9. Rank, range and linear systems
In Example 9.5, for instance, the general solution can be written in vector notation as
x = p + sv1 + tv2 + uv3 ,
where
p = (−28, 0, −14, 0, 0, 5)^T , v1 = (−3, 1, 0, 0, 0, 0)^T ,
v2 = (−4, 0, −2, 1, 0, 0)^T , v3 = (−2, 0, 0, 0, 1, 0)^T .
Applying the same method generally to a consistent system of rank r with n unknowns,
we can express the general solution of a consistent system Ax = b in the form
x = p + a1 v1 + a2 v2 + · · · + an−r vn−r .
Note that, if we put all the ai s equal to 0, we get a solution x = p, which means that
Ap = b, so p is a particular solution of the system. Putting a1 = 1 and the remaining
ai s equal to zero, we get a solution x = p + v1 , which means that A(p + v1 ) = b. Thus
b = A(p + v1 ) = Ap + Av1 = b + Av1 .
Comparing the first and last expressions, we see that Av1 = 0. Clearly, the same
equation holds for v2 , . . . , vn−r . So we have proved the following.
If A is an m × n matrix of rank r, the general solution of Ax = b is the sum of a
particular solution p and the general solution of the homogeneous system Ax = 0.

Activity 9.5 Find the general solution of the following system of equations, Ax = b.
x1 − x2 + x3 + x4 + 2x5 = 4
−x1 + x2 + x4 − x5 = −3
x1 − x2 + 2x3 + 3x4 + 4x5 = 7.
Show that your solution can be written in the form p + su1 + tu2 where Ap = b,
Au1 = 0 and Au2 = 0.
9.4 Range
The range of a matrix A is defined as follows.
Definition 9.3 (Range of a matrix) Suppose that A is an m × n matrix. Then the
range of A, denoted by R(A), is the subset
R(A) = {Ax | x ∈ Rn }
of Rm . That is, the range is the set of all vectors y ∈ Rm of the form y = Ax for some
x ∈ Rn .
What is the connection between the range of a matrix A and a system of linear
equations Ax = b? If A is m × n, then x ∈ Rn and b ∈ Rm . If the system Ax = b is
consistent, then this means that there is a vector x ∈ Rn such that Ax = b, so b is in
the range of A. Conversely, if b is in the range of A, then the system Ax = b must have
a solution. Therefore, we have shown that for an m × n matrix A:
The range of A, R(A), consists of all vectors b ∈ Rm for which the system of
equations Ax = b is consistent.
Let’s look at R(A) from a different point of view. Suppose that the columns of A are
c1 , c2 , . . . , cn . Then we may write A = (c1 c2 . . . cn ). If x = (α1 , α2 , . . . , αn )T ∈ Rn ,
then the product Ax is equal to
Ax = α1 c1 + α2 c2 + · · · + αn cn .
Activity 9.6 You proved this result earlier; it is Theorem 4.2 in the subject guide.
Prove it again now to make sure you understand how and why it works.
(This is where we start to make good use of this equality.)
The equality says that R(A), the set of all matrix products Ax, is also the set of all
linear combinations of the columns of A. For this reason R(A) is also called the
column space of A. (More on this in Chapter 13.)
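The identity Ax = α1 c1 + · · · + αn cn can be checked numerically. Here is a small illustrative sketch (the matrix is the one used in the surrounding examples), computing the product both row by row and as a combination of columns.

```python
def mat_vec(A, x):
    """Matrix-vector product computed row by row."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def column_combination(A, x):
    """The same product, as the combination alpha_1 c_1 + ... + alpha_n c_n."""
    cols = list(zip(*A))                  # the columns of A
    out = [0] * len(A)
    for alpha, col in zip(x, cols):
        out = [v + alpha * c for v, c in zip(out, col)]
    return out

A = [[1, 2], [-1, 3], [2, 1]]             # columns c1 = (1,-1,2)^T, c2 = (2,3,1)^T
x = [2, -1]
print(mat_vec(A, x))                      # [0, -5, 3]
print(column_combination(A, x))           # [0, -5, 3]
```

Both computations give the same vector, which is exactly why R(A) is the set of linear combinations of the columns.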
If A = (c1 c2 . . . cn ), where ci denotes column i of A, then we can write
139
9. Rank, range and linear systems
so
R(A) = { (α1 + 2α2 , −α1 + 3α2 , 2α1 + α2 )^T | α1 , α2 ∈ R } ;
or
R(A) = {α1 c1 + α2 c2 | α1 , α2 ∈ R} ,
where c1 = (1, −1, 2)^T and c2 = (2, 3, 1)^T are the columns of A.
Example 9.7 Consider the following systems of three equations in two unknowns.
x + 2y = 0          x + 2y = 1
−x + 3y = −5        −x + 3y = 5
2x + y = 3          2x + y = 2
Solving these by Gaussian elimination (or any other method) you will find that the
first system is consistent and the second system has no solution. The first system has
the unique solution (x, y)T = (2, −1)T .
The coefficient matrix of each of the systems is the same, and is equal to the matrix A
of the previous example. For the first system,

    ( 1  2 )
A = ( −1 3 ) ,    x = (x, y)^T ,    b = (0, −5, 3)^T .
    ( 2  1 )

Since the first system has the solution (x, y)^T = (2, −1)^T , the vector b is in R(A):
indeed, 2c1 − c2 = b.
On the other hand, it is not possible to express the vector (1, 5, 2)T as a linear
combination of the column vectors of A. Trying to do so would lead to precisely the
same set of inconsistent equations.
Notice, also, that the homogeneous system Ax = 0 has only the trivial solution, and
that the only way to express 0 as a linear combination of the columns of A is by
0c1 + 0c2 = 0.
Activity 9.9 Look at your solution to Activity 9.5 on page 138, and express the
vector b = (4, −3, 7)T as a linear combination of the columns of the coefficient matrix
1 −1 1 1 2
A = −1 1 0 1 −1 .
1 −1 2 3 4
Overview
This chapter has drawn together what we have already learned about linear systems,
and added important new ingredients: rank and range. We should now have a better
understanding of the nature of the solution set of a linear system.
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
Work Problems 4.3 and 4.5 in Chapter 4 of the text A-H. You will find the solutions on
the VLE.
Since Bp = d, multiplying out Bp you will find that d = (1, 4, 5)^T . (You can check all
this by row reducing (B|d).)
Feedback to activity 9.5
Put the augmented matrix into RREF:

(A|b) = ( 1  −1 1 1  2 |  4 )
        ( −1  1 0 1 −1 | −3 )
        ( 1  −1 2 3  4 |  7 )

R2 + R1 and R3 − R1 give
( 1 −1 1 1 2 | 4 )
( 0  0 1 2 1 | 1 )
( 0  0 1 2 2 | 3 )

then R3 − R2 gives
( 1 −1 1 1 2 | 4 )
( 0  0 1 2 1 | 1 )
( 0  0 0 0 1 | 2 )

then R2 − R3 and R1 − 2R3 give
( 1 −1 1 1 0 |  0 )
( 0  0 1 2 0 | −1 )
( 0  0 0 0 1 |  2 )

and finally R1 − R2 gives
( 1 −1 0 −1 0 |  1 )
( 0  0 1  2 0 | −1 )
( 0  0 0  0 1 |  2 ) .
Set the non-leading variables to arbitrary constants: x2 = s, x4 = t. Then solve for the
leading variables in terms of these parameters, starting with the bottom row.
For s, t ∈ R,
x5 = 2, x4 = t, x3 = −1 − 2t, x2 = s, x1 = 1 + s + t
x = (x1 , x2 , x3 , x4 , x5 )^T = (1 + s + t, s, −1 − 2t, t, 2)^T
  = (1, 0, −1, 0, 2)^T + s(1, 1, 0, 0, 0)^T + t(1, 0, −2, 1, 0)^T = p + su1 + tu2 .
Verify:
Ap = A(1, 0, −1, 0, 2)^T = (4, −3, 7)^T = b,
Au1 = A(1, 1, 0, 0, 0)^T = (0, 0, 0)^T ,
Au2 = A(1, 0, −2, 1, 0)^T = (0, 0, 0)^T .
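This verification can also be scripted. The sketch below is a sanity check only (not part of the guide): it multiplies A by p, u1 and u2 from the activity feedback, and then by an arbitrary combination p + su1 + tu2.

```python
def mat_vec(A, x):
    """Matrix-vector product computed row by row."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, -1, 1, 1, 2],
     [-1, 1, 0, 1, -1],
     [1, -1, 2, 3, 4]]
p  = [1, 0, -1, 0, 2]
u1 = [1, 1, 0, 0, 0]
u2 = [1, 0, -2, 1, 0]

print(mat_vec(A, p))                      # [4, -3, 7] = b
print(mat_vec(A, u1), mat_vec(A, u2))     # [0, 0, 0] [0, 0, 0]

# By linearity, every p + s*u1 + t*u2 is also a solution of Ax = b:
s, t = 5, -2
x = [pi + s * a + t * c for pi, a, c in zip(p, u1, u2)]
print(mat_vec(A, x))                      # [4, -3, 7]
```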
Ap = c1 − c3 + 2c5 = b.
You should write this out in detail and check that the sum of the vectors does add to
the vector b. Notice that this combination uses only the columns corresponding to the
leading variables:
(1, −1, 1)^T − (1, 0, 2)^T + 2(2, −1, 4)^T = (4, −3, 7)^T .
Similarly, since Au1 = 0 and Au2 = 0, any linear combination of these two vectors will
give a vector v = su1 + tu2 for which Av = 0, and you can rewrite Av as a linear
combination of the columns of A. For example, taking u1 ,
c1 + c2 = (1, −1, 1)^T + (−1, 1, −1)^T = (0, 0, 0)^T .
Chapter 10
Sequences and series
Introduction
In this chapter and the next, we make a slight detour into the topic of sequences, series
and difference equations (also known as recurrence equations). Many problems in
economics and finance involve sequences, particularly those involving quantities which
change with time, but not continuously (such as the balance of a deposit account where
interest is paid once a year, at the end of the year). This chapter and the next one are
independent of the other chapters so far, but the material is important in its own right
and, moreover, we will see later that matrices and linear algebra can be used to solve
systems of difference equations.
Aims
The aims of this chapter are to:
Reading
This chapter and the next concern a topic that, although algebraic, is not linear
algebra, and therefore is not discussed in the A-H text, or in linear algebra books
generally. No significant additional reading is required, but we recommend the following
supplementary reading (which will also be useful for the next chapter of the guide).
Anthony, M. and N. Biggs. Mathematics for Economics and Finance: Methods
and Modelling. Chapters 3 and 4.
Synopsis
We start by introducing sequences, and two very important types: arithmetic and
geometric. We explain how geometric sequences arise naturally when modelling
compound interest. We then look at series, which involves finding the sum of members
10.1 Sequences
For example, the sequence with general term yt = t^2 begins
y0 = 0, y1 = 1, y2 = 4, y3 = 9, y4 = 16, . . . ,
and a sequence can also be described by giving a formula for its general term, such as
yt = 2t + 3 (t ≥ 0).
10.1.2 Arithmetic progressions
The arithmetic progression with first term a and common difference d has its terms
given by the formula yt = a + dt. For example, the arithmetic progression with first
term 5 and common difference 3 is 5, 8, 11, 14, . . .. Note that yt is obtained from yt−1 by
adding the common difference d. In symbols, yt = yt−1 + d.
Generally, if the annual percentage rate of interest is R%, then the interest rate is
r = R/100 and in the course of one year, a balance of $P becomes $P + rP =
$(1 + r)P . One year after that, the balance in dollars becomes $(1 + r)((1 + r)P ), which
is $(1 + r)^2 P . Continuing in this way, we can see that if P dollars are deposited in an
account where interest is paid annually at rate r, and if no money is taken from or
added to the account, then after t years we have a balance of P (1 + r)^t dollars. This
process is known as compounding (or compound interest), because interest is paid on
interest previously added to the account.
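As a quick illustration (with made-up figures, not the guide's), the closed form P(1 + r)^t agrees with accumulating interest year by year:

```python
def balance(P, r, t):
    """Balance after t years of annual compounding at rate r."""
    return P * (1 + r) ** t

P, r = 500.0, 0.06        # hypothetical deposit and interest rate
value = P
for _ in range(10):
    value *= 1 + r        # one year's interest added to the balance
print(round(value, 2), round(balance(P, r, 10), 2))
```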
Activity 10.1 Suppose that $1, 000 is invested in an account that pays interest at a
fixed rate of 7%, paid annually. How much is there in the account after four years?
If instead interest is added twice yearly, at 4% each time (corresponding to an annual
rate of 8%), a balance of $100 grows in one year to 100(1.04)^2 = 108.16
dollars, which is slightly more than the $108 which results from the single annual
addition. If the interest is added quarterly (so that 2% is added four times a year), the
amount after one year will be
100(1.02)^4 ≈ 108.24
dollars (approximately). In general, when the year is divided into m equal periods, the
rate is r/m over each period, and the balance after one year is
P (1 + r/m)^m ,
where P is the initial deposit.
Taking m larger and larger — formally, letting m tend to infinity — we find ourselves in
the situation of continuous compounding. Now, it is a standard fact (that we won’t
verify here) that, as m gets larger and larger, tending to infinity,
(1 + r/m)^m
approaches e^r , where e is the base of the natural logarithm. (See the subject guide for
MT1174 Calculus.) Formally,
lim_{m→∞} (1 + r/m)^m = e^r .
So the balance after one year should be P e^r . If invested for a further year, we would
have P e^r e^r = P (e^r )^2 = P e^{2r} . After t years continuous compounding, the balance of the
account would be P e^{rt} .
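The convergence of (1 + r/m)^m towards e^r can be watched numerically; the figures below are illustrative only.

```python
import math

P, r = 100.0, 0.08        # hypothetical deposit and annual rate
for m in (1, 4, 12, 365, 10_000):
    print(m, round(P * (1 + r / m) ** m, 4))    # more frequent compounding
print("continuous limit:", round(P * math.exp(r), 4))
```

As m grows, the one-year balance creeps up towards the continuous-compounding value P e^r.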
10.2 Series
Let us continue with the story of our investor. It is natural to investigate how the
balance varies if the investor adds a certain amount to the account each year. Suppose
that they add $P to the account at the beginning of each year, so that at the beginning
of the first year the balance is $P . At the beginning of the second year the balance in
dollars will be $P (1 + r) + P ; this represents the money from the first year with interest
added, and the new, further, deposit of $P . Convince yourself that, continuing in this
way, the balance at the beginning of year t is, in dollars,
P + P (1 + r) + · · · + P (1 + r)^{t−2} + P (1 + r)^{t−1} .
How can we calculate this expression? Note that it is the sum of the first t terms (that
is, term 0 to term t − 1) of the geometric progression with first term P and common
ratio 1 + r. Before coming back to this, we shall discuss such things in a more general
setting.
Given a sequence y0 , y1 , y2 , y3 , . . ., a finite series is a sum of the form
y0 + y1 + · · · + yt−1 ,
the first t terms added together, for some number t. There are two important results
about series, concerning the cases where the corresponding sequence is an arithmetic
progression (in which case the series is called an arithmetic series) and where it is a
geometric progression (in which case the series is called a geometric series).
If St denotes the arithmetic series
St = a + (a + d) + (a + 2d) + · · · + (a + (t − 1)d),
then
St = t(2a + (t − 1)d)/2 .
There is a useful way of remembering this result. Notice that St may be rewritten as
St = t (a + (a + (t − 1)d))/2 = t (y0 + yt−1 )/2 ,
so that we have the following easily remembered result: an arithmetic series has value
equal to the number of terms, t, times the average of the first and last
terms, (y0 + yt−1 )/2. Equivalently, the average value St /t of the t terms is the average,
(y0 + yt−1 )/2, of the first and last terms.
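The formula can be confirmed against a direct summation; a minimal sketch using the AP with first term 5 and common difference 3 from earlier in the chapter:

```python
def arithmetic_series(a, d, t):
    """St = t(2a + (t-1)d)/2, the sum of the first t terms."""
    return t * (2 * a + (t - 1) * d) // 2   # the product is always even

a, d, t = 5, 3, 100                         # the AP 5, 8, 11, ... from Section 10.1.2
direct = sum(a + d * k for k in range(t))
print(direct, arithmetic_series(a, d, t))   # 15350 15350
```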
Activity 10.2 Find the sum of the first n terms of an arithmetic series whose first
term is 1 and whose common difference is 5.
The sum of the first t terms of the geometric progression with first term a and common
ratio x ≠ 1,
St = a + ax + ax^2 + · · · + ax^{t−1} ,
is given by
St = a(1 − x^t )/(1 − x) .
Example 10.1 In our earlier discussion on savings accounts, we came across the
expression
P + P (1 + r) + · · · + P (1 + r)^{t−2} + P (1 + r)^{t−1} .
We now see that this is a geometric series with t terms, first term P and common
ratio 1 + r. Therefore it equals
P (1 − (1 + r)^t ) / (1 − (1 + r)) = (P/r) ((1 + r)^t − 1) .
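With illustrative numbers (not the guide's), the yearly-deposit balance and the closed form of Example 10.1 agree:

```python
P, r, t = 200.0, 0.05, 15                 # hypothetical deposit, rate and years

# Sum the t deposits, each grown by the appropriate number of years' interest:
direct = sum(P * (1 + r) ** k for k in range(t))
closed = (P / r) * ((1 + r) ** t - 1)     # geometric-series closed form
print(round(direct, 2), round(closed, 2))
```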
Activity 10.3 Find an expression for
Consider, for example, the sequence given by y0 = 1 and yt = 2yt−1 + 1 for t ≥ 1.
Calculating the first few terms, and deliberately not simplifying the powers, we find
y1 = 2y0 + 1 = 2(1) + 1 = 2 + 1
y2 = 2y1 + 1 = 2(2 + 1) + 1 = 2^2 + 2 + 1
y3 = 2y2 + 1 = 2(2^2 + 2 + 1) + 1 = 2^3 + 2^2 + 2 + 1
y4 = 2y3 + 1 = 2(2^3 + 2^2 + 2 + 1) + 1 = 2^4 + 2^3 + 2^2 + 2 + 1.
This suggests that, in general,
yt = 2^t + 2^{t−1} + · · · + 2 + 1.
Written in ascending order of powers,
yt = 1 + 2 + 2^2 + · · · + 2^{t−1} + 2^t ,
from which it is clear that this is the sum of the first t + 1 terms of the geometric
progression with first term 1 and common ratio 2. By the formula for the sum of a
geometric series, we have
yt = (1 − 2^{t+1})/(1 − 2) = 2^{t+1} − 1.
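A short check (a sketch, not from the guide) that this closed form satisfies the original recursion yt = 2yt−1 + 1 with y0 = 1:

```python
y = 1                             # y0
for t in range(1, 21):
    y = 2 * y + 1                 # the recursion
    assert y == 2 ** (t + 1) - 1  # agrees with the closed form at every step
print(y)                          # y20 = 2**21 - 1
```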
x^t → 0 as t → ∞, or lim_{t→∞} x^t = 0.
We notice that, while x^t gets closer and closer to 0 for all values of x in the range
−1 < x < 1, its behaviour depends to some extent on whether x is positive or negative.
When x is negative, the terms are alternately positive and negative, and we say that the
approach to zero is oscillatory. For example, when x = −0.2, the sequence x^t is
1, −0.2, 0.04, −0.008, 0.0016, . . . .
When x is less than −1, the sequence is again oscillatory, but it does not approach any
limit, the terms being alternately large-positive and large-negative. In this case, we say
that x^t oscillates increasingly.
As an application of this, let us consider again the geometric series
St = a + ax + ax2 + · · · + axt−1 .
We have
St = a(1 − x^t )/(1 − x) .
If −1 < x < 1 then x^t → 0 as t → ∞. This means that St approaches the number
a(1 − 0)/(1 − x) = a/(1 − x) as t increases. In other words,
St → a/(1 − x) as t → ∞.
We call this limit the sum to infinity of the sequence given by yt = ax^t . Note that a
geometric sequence has a finite sum to infinity only if the common ratio is strictly
between −1 and 1.
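Partial sums illustrate the limit a/(1 − x), including the oscillatory approach when x is negative; the values below are illustrative only.

```python
def partial_sums(a, x, terms):
    """Running totals of a + ax + ax^2 + ..."""
    total, term, out = 0.0, a, []
    for _ in range(terms):
        total += term
        out.append(total)
        term *= x
    return out

for x in (0.5, -0.5):
    limit = 1.0 / (1 - x)                 # a/(1 - x) with a = 1
    print(x, round(partial_sums(1.0, x, 30)[-1], 6), round(limit, 6))
```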
See Anthony and Biggs, Section 3.3.
Example 10.3 Consider the sequence with yi = 1/2^i for i ≥ 0. The sum of the first
t terms of this sequence is
St = 1 + 1/2 + 1/2^2 + · · · + 1/2^{t−1} .
By the formula for the sum of a geometric series,
St = 2 (1 − (1/2)^t ) ,
so St → 2 as t → ∞.
Example 10.4 John has opened a savings account with a bank, and they pay a
fixed interest rate of 5% per annum, with the interest paid once a year, at the end of
the year. He opened the savings account with a payment of $100 on 1 January 2003,
and will be making deposits of $200 yearly, on the same date. What will his savings
be after he has made N of these additional deposits? (Your answer will be an
expression involving N .)
If yN is the required amount, then we have
y1 = (1.05)100 + 200,
y2 = (1.05)y1 + 200 = 100(1.05)^2 + 200(1.05) + 200,
and, continuing in this way,
yN = 100(1.05)^N + 200(1.05)^{N−1} + · · · + 200(1.05) + 200
   = 100(1.05)^N + 200 (1 − (1.05)^N )/(1 − (1.05))
   = 100(1.05)^N + 4000 ((1.05)^N − 1) dollars.
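Example 10.4's closed form can be checked by simulating the deposits directly; this sketch is a verification aid only, not part of the guide.

```python
def simulate(N):
    """Balance just after the N-th additional deposit of $200."""
    y = 100.0                      # the opening payment
    for _ in range(N):
        y = 1.05 * y + 200.0       # a year's interest, then a $200 deposit
    return y

def closed_form(N):
    return 100 * 1.05 ** N + 4000 * (1.05 ** N - 1)

for N in (1, 5, 20):
    assert abs(simulate(N) - closed_form(N)) < 1e-8
print(f"{closed_form(5):.2f}")     # balance after five additional deposits
```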
Overview
We have explored arithmetic and geometric sequences and series and shown how these
can be used to find general formulae for other types of sequence. We have also seen how
to apply this material to financial modelling. In the next chapter, we continue the study
of sequences, by looking at how to find formulae for sequences defined by a difference
equation (that is, recursively).
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
explain what is meant by arithmetic and geometric progressions, and calculate the
sum of finite arithmetic and geometric series
explain compound interest and calculate balances under compound interest
apply sequences and series in finance
analyse the long-term behaviour of series and sequences
Exercises
Exercise 10.1
A geometric progression has a sum to infinity of 3 and has second term y1 equal to 2/3.
Show that there are two possible values of the common ratio x and find the
corresponding values of the first term a.
Exercise 10.2
Suppose we have an initial amount, A0 , to invest and we add an additional investment
F at the end of each subsequent year. All investments earn an interest of i% per annum,
paid at the end of each year.
(a) Use the formula for the sum of a geometric series to derive a formula for the value of
the investment, An , after n years.
(b) An investor puts $10, 000 into an investment account that yields interest of 10% per
annum. The investor adds an additional $5, 000 at the end of each year. How much will
there be in the account at the end of five years? Show that if the investor has to wait N
years until the balance is at least $80, 000, then
N ≥ ln(13/6) / ln(1.1) .
Exercise 10.3
An amount of $1,000 is invested and attracts interest at a rate equivalent to 10% per
annum. Find expressions for the total after one year if the interest is compounded:
(a) annually
(b) quarterly
(c) monthly
Exercise 10.4
Suppose yi = 1/2^{2i} . Find the limit, as t → ∞, of
St = y0 + y1 + · · · + yt−1 .
2(1 − 3^{n+1})/(1 − 3) = 3^{n+1} − 1.
As t → ∞, (2/3)^t → 0 and so St → 2.
Comments on exercises
Solution to exercise 10.1
We know that the sum to infinity is given by the formula a/(1 − x) and that y1 = ax.
Therefore, the given information is
a/(1 − x) = 3,    ax = 2/3 .
From the first equation, a = 3(1 − x) and the second equation then gives
3(1 − x)x = 2/3, from which we obtain the quadratic equation 9x2 − 9x + 2 = 0. This
has the two solutions x = 2/3 and x = 1/3. The corresponding values of the first term a
(given by a = 3(1 − x)) are 1 and 2, respectively. So, as suggested by the question, there
are two geometric progressions that have the required sum to infinity and second term.
Solution to exercise 10.2
For (a), each year the balance is multiplied by 1 + i/100 and then F is added, so
A1 = A0 (1 + i/100) + F,
A2 = A0 (1 + i/100)^2 + F (1 + i/100) + F,
and
A3 = (1 + i/100) A2 + F
   = (1 + i/100) ( A0 (1 + i/100)^2 + F (1 + i/100) + F ) + F
   = A0 (1 + i/100)^3 + F (1 + i/100)^2 + F (1 + i/100) + F.
In general, if we continued, we could see that
An = A0 (1 + i/100)^n + F (1 + i/100)^{n−1} + F (1 + i/100)^{n−2} + · · · + F (1 + i/100) + F.
Now,
F (1 + i/100)^{n−1} + · · · + F (1 + i/100) + F = F + F (1 + i/100) + · · · + F (1 + i/100)^{n−1}
   = F (1 − (1 + i/100)^n ) / (1 − (1 + i/100))
   = (100F/i) ((1 + i/100)^n − 1) ,
where we have used the formula for the sum of a geometric progression. Therefore
An = A0 (1 + i/100)^n + (100F/i) ((1 + i/100)^n − 1) .
For (b), we use the formula just obtained, with A0 = 10000, i = 10, F = 5000 and
n = 5, and we see that
A5 = 10000 (1 + 10/100)^5 + (100(5000)/10) ((1 + 10/100)^5 − 1)
   = 10000 (1.1)^5 + 50000 ((1.1)^5 − 1)
   = 46630.60.
Now, for the balance to be at least $80,000 after N years, we need AN ≥ 80000, which
means
10000 (1.1)^N + 50000 ((1.1)^N − 1) ≥ 80000.
This is equivalent, after a little manipulation, to
60000 (1.1)^N ≥ 130000,
or (1.1)^N ≥ 13/6. To solve this, we can take logarithms and see that we need
N ln(1.1) ≥ ln(13/6),
so
N ≥ ln(13/6) / ln(1.1) ,
as required.
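A quick numerical confirmation of part (b) (a sketch, not part of the solution):

```python
import math

def A_n(A0, i, F, n):
    """Balance after n years at i% with an extra F at each year-end."""
    g = 1 + i / 100
    return A0 * g ** n + (100 * F / i) * (g ** n - 1)

print(f"{A_n(10000, 10, 5000, 5):.2f}")            # 46630.60, as in the solution
N = math.ceil(math.log(13 / 6) / math.log(1.1))    # smallest N with A_N >= 80000
print(N, f"{A_n(10000, 10, 5000, N):.2f}")
```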
Chapter 11
Difference equations
Introduction
We now turn our attention to sequences that are defined recursively or (equivalently) by
a difference equation. Difference equations occur naturally in mathematical modelling,
where we know how one member of a sequence is obtained from previous members and
what we would like to do is find a general formula for the sequence.
Aims
The aims of this chapter are to:
explain what is meant by a first-order difference equation and show how to solve
them
explain what is meant by a second-order difference equation and show how to solve
them
Reading
R
We recommend the following supplementary reading.
Anthony, M. and N. Biggs. Mathematics for Economics and Finance: Methods
and Modelling. Chapters 3, 4, 5 and 23. (Note that this text uses the phrase
‘recurrence equation’ rather than ‘difference equation’, but they mean the same
thing.)
Synopsis
We start by explaining what is meant by a first-order difference equation. We then
present a method for solving them, discussing the behaviour of the solution, and
The equation
yt = ayt−1 + b, t ≥ 1,
where a and b are numbers is called a first-order linear difference equation with
constant coefficients, and the value of y0 is known as an initial condition. It is said
to be first-order because the equation only involves the previous value of the sequence.
Once the value of y0 is known, all the numbers yt of the sequence are determined.
Important point: It should, of course, be understood that the difference equation
yt = ayt−1 + b (t ≥ 1)
is entirely equivalent to the difference equation
yt+1 = ayt + b (t ≥ 0).
They say precisely the same thing about the sequence.
By a solution of a first-order difference equation we mean an expression for yt as a
function of the positive integer t, depending on the initial condition y0 . The difference
equation yt = yt−1 + 3, with y0 = 5, has, as we have seen, the solution yt = 5 + 3t.
The question we want to consider here is how to determine such explicit solutions to
any linear first-order difference equation.
It is easy to see (and you should convince yourself of this) that if a = 1 then the
sequence of numbers yt given by the general first-order difference equation
yt = ayt−1 + b
is an arithmetic progression with common difference b and first term y0 . Therefore we
shall discuss the solution to such a general difference equation when a ≠ 1.
Activity 11.1 Write down the solution of the difference equation yt = yt−1 + b for a
fixed constant b and initial condition y0 .
For example, if yt = 2yt−1 + 1 and y0 = 1, then
y1 = 2y0 + 1 = 2(1) + 1 = 3
y2 = 2y1 + 1 = 2(3) + 1 = 7
y3 = 2y2 + 1 = 2(7) + 1 = 15,
and so on.
Now, suppose you wanted to know the value of y312 . Do you really want to have to
carry out 312 calculations of the type we have just seen? Certainly not! Which is
why we want a solution of the difference equation, a formula or expression for yt
involving only t and y0 (and not yt−1 ).
You might wonder where this formula comes from. We’ll briefly give an indication, but
first we state this as a theorem, just to make it clear how important it is.
Theorem 11.1 The general solution of the equation yt = ayt−1 + b (with a ≠ 1) is
yt = y∗ + (y0 − y∗ )a^t
where y∗ = b/(1 − a).
To see why this is true, first, note that the difference equation can be written as
yt − ayt−1 = b
and that, by the way in which y ∗ is defined, y ∗ − ay ∗ = b. This means that the constant
sequence yt = y ∗ is a particular solution of the difference equation. Now suppose we
considered the homogeneous difference equation yt − ayt−1 = 0. It’s easy to see that, for
any constant k, yt = ka^t satisfies this (and, indeed, is the general solution, meaning
that all solutions will look like this). So (just as for linear systems of equations Ax = b,
and the analogy is a real one, not just a coincidence), if we consider yt = y∗ + ka^t , the
sum of the particular solution and the general solution to the homogeneous equation,
we will have
yt − ayt−1 = (y∗ + ka^t ) − a(y∗ + ka^{t−1} ) = (y∗ − ay∗ ) + k(a^t − a^t ) = y∗ − ay∗ = b.
So yt = y∗ + ka^t is a solution, for any k. To get a solution that has the right value, y0 ,
when t = 0, we must have (on substituting t = 0) y0 = y∗ + ka^0 = y∗ + k, which is why
we take k = y0 − y∗ .
Read Section 3.2 of the Anthony and Biggs book for a slightly different (and
fuller) explanation of why this theorem is true.
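Theorem 11.1 can be sanity-checked by comparing the closed form with direct iteration; the coefficients below are arbitrary illustrations.

```python
def closed_form(a, b, y0, t):
    """yt = y* + (y0 - y*) a^t with y* = b/(1 - a), valid for a != 1."""
    y_star = b / (1 - a)
    return y_star + (y0 - y_star) * a ** t

a, b, y0 = 5.0, 6.0, 2.5      # illustrative choices (a != 1)
y = y0
for t in range(1, 9):
    y = a * y + b             # iterate the difference equation
    assert abs(y - closed_form(a, b, y0, t)) < 1e-6 * max(1.0, abs(y))
print(y)
```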
Consider, for example, the difference equation
yt = 5yt−1 + 6,
with initial condition y0 = 5/2.
Alternatively, you can use the fact that the general solution is the sum of a
particular solution (the constant solution) and the general solution of the
homogeneous equation. The constant solution y ∗ can be found by substituting this
constant into the equation,
y∗ = 5y∗ + 6 =⇒ y∗ = 6/(−4) = −3/2 .
The homogeneous equation is yt − 5yt−1 = 0 with general solution yt = k(5^t ).
Therefore the general solution of the equation is yt = −3/2 + k5^t . Substituting the
initial value y0 = 5/2 into this equation produces the solution:
y0 = 5/2 = −3/2 + k =⇒ k = 4 =⇒ yt = −3/2 + 4(5^t ).
Activity 11.3 Suppose that yt = (2/3)yt−1 + 5 and that y0 = 2. Find yt .
In the first of these cases (a > 1), whether yt → ∞ or yt → −∞ will, of course, depend
on the sign of y0 − y ∗ .
Activity 11.4 The case a = −1 is not covered in the table just given. How does the
solution yt behave in this case?
pt = 22 + (p0 − 22)(−0.5)^t .
Note that the time-independent solution is the equilibrium price p∗ = 22, and that in
this case (if p0 ≠ p∗ ) the sequence approaches p∗ in an oscillatory way. We say that
we have a stable cobweb. However, it is possible, for other supply and demand
curves, that the price oscillates around the equilibrium price with ever-increasing
magnitude. In such cases, the price does not approach p∗ and we say we have an
unstable or exploding cobweb.
Read Chapter 5 of the Anthony and Biggs book for a more extensive discussion
of the cobweb model, including an analysis of the general case and a
characterisation of when it is stable.
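For this particular cobweb, the stated solution pt = 22 + (p0 − 22)(−0.5)^t corresponds to the recurrence pt = −0.5pt−1 + 33, whose fixed point is p∗ = 22. Note the recurrence here is inferred from the solution, not quoted from the guide, and the initial price is hypothetical; the sketch shows the damped oscillation.

```python
p0 = 30.0                     # hypothetical initial price
prices, p = [p0], p0
for _ in range(8):
    p = -0.5 * p + 33         # recurrence consistent with p_t = 22 + (p0-22)(-0.5)^t
    prices.append(p)
print([round(q, 4) for q in prices])

# The closed form matches, oscillating ever closer to the equilibrium 22:
assert all(abs(q - (22 + (p0 - 22) * (-0.5) ** t)) < 1e-9
           for t, q in enumerate(prices))
```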
yt = (1 + r)yt−1 − I, where y0 = P.
This is another case of the first-order linear difference equation, in standard form with
a = 1 + r and b = −I. The time-independent solution is therefore y∗ = I/r. The general
solution is yt = y∗ + (y0 − y∗ )a^t , and since y0 = P we obtain
yt = I/r + (P − I/r)(1 + r)^t .
This formula enables us to answer a number of questions. First, we might want to know
how large the withdrawals I can be given an initial investment of P , if we want to be
able to withdraw I annually for N years. The condition that nothing is left after N
years is, yN = 0. This is
I/r + (P − I/r)(1 + r)^N = 0,
and rearranging, we get
(I/r) ((1 + r)^N − 1) = P (1 + r)^N ,
so that
I(P ) = ( r(1 + r)^N / ((1 + r)^N − 1) ) P .
An ‘inverse’ question is: what principal P is required to provide an annual income I for
the next N years? Rearranging the equation gives the result
P (I) = (I/r) ( 1 − 1/(1 + r)^N ) .
We turn now to homogeneous second-order difference equations, of the form
yt+2 + a1 yt+1 + a2 yt = 0, t ≥ 0.
We want to find a general solution of this equation. The general solution will need to
have two arbitrary constants so that a specific solution can be found once y0 and y1 are
given (just as the general solution of a first-order equation contains one arbitrary
constant).
As the equation is linear and homogeneous, two very useful properties apply (compare
this with homogeneous systems of linear equations, Ax = 0): if xt and zt are solutions,
then so is the sum xt + zt ; and if xt is a solution, then so is cxt for any constant c.
Activity 11.5 Show this. Given two sequences xt and zt which satisfy the
homogeneous difference equation yt+2 + a1 yt+1 + a2 yt = 0, show that xt + zt and cxt ,
where c is a constant, also satisfy this equation.
Therefore, it follows that if we know two solutions xt and zt of the difference equation,
then
yt = Axt + Bzt
is also a solution for any constants A, B ∈ R.
We next set about finding solutions. Knowing what we do about the general solution of
the homogeneous first-order equation (a geometric progression), let’s try a solution of
the form yt = m^t where m is some constant to be determined.
Substituting yt = m^t into the difference equation we obtain
yt+2 + a1 yt+1 + a2 yt = m^{t+2} + a1 m^{t+1} + a2 m^t = m^t (m^2 + a1 m + a2 ) = 0.
If m = 0, we get yt = 0, so we ignore this possibility. Then yt = m^t is a solution of the
difference equation if and only if m satisfies m^2 + a1 m + a2 = 0. This equation,
z^2 + a1 z + a2 = 0,
is known as the auxiliary equation.
Let’s look at an example.
Example 11.6 Consider the difference equation yt+2 − 5yt+1 + 6yt = 0. Its auxiliary
equation is z^2 − 5z + 6 = (z − 2)(z − 3) = 0, with solutions z = 2 and z = 3, so
yt = 2^t and yt = 3^t are both solutions, and hence so is
yt = A(2^t ) + B(3^t ), A, B ∈ R.
Now suppose we are given initial conditions y0 = 1 and y1 = 4. Then the difference
equation determines all remaining numbers in the sequence. We have
y2 = 5y1 − 6y0 = 5(4) − 6(1) = 14, y3 = 5y2 − 6y1 , and so on.
The specific solution of the difference equation with these initial conditions is found
by substituting t = 0 and t = 1 into the general solution. We have
y0 = A + B = 1 and y1 = 2A + 3B = 4.
Solving these equations for A and B, we find B = 2 and A = −1. Therefore the
solution of yt+2 − 5yt+1 + 6yt = 0 with initial conditions y0 = 1 and y1 = 4 is
yt = −(2^t ) + 2(3^t ).
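A quick check (sketch only) that this closed form reproduces the recursion yt+2 = 5yt+1 − 6yt with y0 = 1 and y1 = 4:

```python
def closed_form(t):
    return -(2 ** t) + 2 * (3 ** t)

y_prev, y = 1, 4                  # y0 and y1
assert closed_form(0) == 1 and closed_form(1) == 4
for t in range(2, 15):
    y_prev, y = y, 5 * y - 6 * y_prev   # the recurrence
    assert y == closed_form(t)          # exact integer agreement
print(y)
```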
When the auxiliary equation has two distinct solutions, α and β, the general
solution is
yt = Aα^t + Bβ^t (A, B constants).
In any specific case, A and B are determined by the initial values y0 and y1 , as in
Example 11.6.
When the auxiliary equation has just one solution, α, the general solution is
yt = (C + Dt)α^t (C, D constants).
As in the previous case, the values of the constants C and D can be determined by
using the initial values y0 and y1 .
The auxiliary equation has no real solutions when the quantity a1^2 − 4a2 is negative.
In that case, 4a2 − a1^2 is positive, and hence so is a2 . Thus there is a positive square
root r of a2 ; that is, we can define r = √a2 . In order to write down the general
solution in this case we define the angle θ by
cos θ = −a1 /(2r) = −a1 /(2√a2 ) .
Then the general solution in this case is
yt = r^t (E cos θt + F sin θt) (E, F constants).
yt − 6yt−1 + 5yt−2 = 0.
yt + 6yt−1 + 9yt−2 = 0.
yt − 2yt−1 + 4yt−2 = 0,
and y0 = 1, y1 = 1 − √3. Here, the auxiliary equation, z^2 − 2z + 4 = 0, has no real
solutions, so we are in the third case. In the notation used above, we have
r = √4 = 2 and cos θ = −(−2)/(2 · 2) = 1/2, so θ = π/3. It follows that
yt = 2^t (E cos(πt/3) + F sin(πt/3))
for some constants E and F . The initial conditions give E = 1 and F = −1, so
yt = 2^t (cos(πt/3) − sin(πt/3)).
yt + a1 yt−1 + a2 yt−2 = k
where k is a constant.
By analogy with the first-order case, we start by looking for a constant solution yt = y ∗
for all t. For this we require
y∗ + a1 y∗ + a2 y∗ = k, or y∗ = k/(1 + a1 + a2 ), provided 1 + a1 + a2 ≠ 0.
Activity 11.6 Suppose that yt − 5yt−1 − 14yt−2 = 18, and that y0 = −1, y1 = 8.
Find yt .
Example 11.10 We want to find the general solution of the difference equation
yt+2 − 6yt+1 + 5yt = 8.
The auxiliary equation is
z^2 − 6z + 5 = (z − 1)(z − 5) = 0,
with solutions z = 1 and z = 5, so the general solution of the corresponding
homogeneous equation is A(1^t ) + B(5^t ) = A + B(5^t ). Here 1 + a1 + a2 = 1 − 6 + 5 = 0,
so there is no constant particular solution; trying yt = ct instead gives
c(t + 2) − 6c(t + 1) + 5ct = −4c = 8, so c = −2. The general solution is therefore
yt = A + B(5^t ) − 2t, A, B ∈ R.
Investment, I
Income, Y
Consumption, C
Suppose we can measure each of the quantities in successive time periods of equal
length (for example, each year). Denote by It , Yt , Ct the values of the key quantities in
time-period t. Then we have a sequence of values I0 , I1 , I2 , . . ., and similarly for the other
quantities. We shall assume that the equilibrium condition Yt = Ct + It holds for each t.
In the multiplier-accelerator model, we assume that the following equations link the
key quantities:
Ct = (3/8) Yt−1 ,    It = 40 + (1/8)(Yt−1 − Yt−2 ),
and let’s assume the equilibrium condition Yt = Ct + It holds. Let’s suppose that
Y0 = 65 and Y1 = 64.5, and try to determine an expression for Yt .
Arguing as above, we have
Yt = Ct + It
   = (3/8) Yt−1 + 40 + (1/8)(Yt−1 − Yt−2 )
   = 40 + (1/2) Yt−1 − (1/8) Yt−2 ,
so
Yt − (1/2) Yt−1 + (1/8) Yt−2 = 40.
The auxiliary equation is
z^2 − (1/2)z + 1/8 = 0,
which has discriminant (1/2)^2 − 4(1/8) = −1/4. This is negative, so there are no
(real) solutions. We are therefore in the third case of a second-order difference
equation. To proceed, we use the method given above. We have
r = √(1/8) = 1/(2√2), and
cos θ = −(−1/2)/(2r) = (2√2)/4 = 1/√2 ,
so θ = π/4. Thus, the general solution to the homogeneous equation in this case is
(1/(2√2))^t (E cos(πt/4) + F sin(πt/4)) .
We need a particular solution of
Yt − (1/2) Yt−1 + (1/8) Yt−2 = 40.
Trying Yt = k, a constant, we see that k − (1/2)k + (1/8)k = 40, so k = 64. It follows
that for some constants E and F ,
Yt = 64 + (1/(2√2))^t (E cos(πt/4) + F sin(πt/4)) .
Since Y0 = 65,
Y0 = 64 + E cos(0) + F sin(0) = 64 + E,
so E = 1. Also,
Y1 = 64 + E (1/(2√2)) cos(π/4) + F (1/(2√2)) sin(π/4)
   = 64 + E (1/(2√2))(1/√2) + F (1/(2√2))(1/√2)
   = 64 + E/4 + F/4,
and since this is 64.5, we have E + F = 2 and hence F = 1. The final answer is
therefore
Yt = 64 + (1/(2√2))^t ( cos(πt/4) + sin(πt/4) ) .
Overview
In this chapter we have considerably expanded our understanding of sequences by
studying methods for determining sequences that are defined by difference equations.
We have seen, too, that these have important applications in economics and finance.
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
solve problems involving first-order difference equations
solve second-order difference equations
analyse the behaviour of solutions to difference equations
solve problems involving the application of difference equations.
Exercises
Exercise 11.1
Planners believe that, as a result of a recent government grant scheme, the number of
new high technology businesses starting up each year will be N . There are already 3,000
such businesses in existence in the country, but it is expected that each year 5% of all
those in existence at the beginning of the year will fail (shut down). Let yt denote the number of such businesses in existence at the beginning of year t, so that y0 = 3000. Then
yt = 0.95yt−1 + N.
Solve this difference equation for general N . Find a condition on N which will ensure
that the number of businesses will increase from year to year.
Exercise 11.2
The supply and demand functions for a good are
Find the equilibrium price. What is the inverse demand function pD (q)? Suppose that
the sequence of prices pt is determined by pt = pD (q S (pt−1 )) (as in the cobweb model).
Find an expression for pt .
Exercise 11.3
A market for a commodity is modelled by taking the demand and supply functions as
follows:
D(p) = 1 − p,
S(p) = p,
so that when the price p prevails the amount of commodity demanded by the market is
D(p) and the amount which producers will supply is S(p). Price adjusts over time t in
response to the excess of the demand over the supply according to the equation:
pt+1 − pt = a(D(pt ) − S(pt )),
where a is a positive constant. Initially the price p is p0 = 3/4. Solve this equation and
show that over time the price adjusts towards the clearing value (i.e. the price at which
supply and demand are equal) if and only if
0 < a < 1.
Under what circumstances does the price tend towards the equilibrium price in an
oscillatory fashion? What happens to the price if a = 1/2?
Exercise 11.4
Find the general solution of the difference equation
yt − yt−1 − 6yt−2 = 0.
Exercise 11.5
(a) Suppose that consumption this year is the average of this year’s income and last
year’s consumption; that is,
Ct = (1/2)(Yt + Ct−1 ).
Suppose also that the relationship between next year’s income and current investment is
Yt+1 = kIt , for some positive constant k. Show that, if the equilibrium condition
Yt = Ct + It holds, then
Yt − ((k + 1)/2)Yt−1 + (k/2)Yt−2 = 0.
(b) In the model set up in part (a), suppose that k = 3 and that the initial value Y0 is
positive. Show that Yt oscillates with increasing magnitude.
(c) Find the values of k for which the model set up in part (a) leads to an oscillating Yt ,
and determine whether or not the oscillations increase in magnitude. (Remember we are
given that k > 0.)
y∗ = b/(1 − a) = 5/(1 − (2/3)) = 15,
and the solution is
yt = y∗ + (y0 − y∗ )a^t = 15 + (2 − 15)(2/3)^t = 15 − 13(2/3)^t .
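A two-line numerical check (an aside, assuming, as the displayed solution indicates, a = 2/3, b = 5 and y0 = 2) confirms that this closed form reproduces the recurrence:

```python
# First-order equation y_t = (2/3) y_{t-1} + 5 with y_0 = 2
# (the values a = 2/3, b = 5, y_0 = 2 are read off from the solution above).
seq = [2.0]
for _ in range(10):
    seq.append((2 / 3) * seq[-1] + 5)

# Closed form: y_t = y* + (y_0 - y*) a^t = 15 - 13 (2/3)^t
closed = [15 - 13 * (2 / 3) ** t for t in range(11)]
agree = all(abs(a - b) < 1e-9 for a, b in zip(seq, closed))
```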
(xt+2 +zt+2 )+a1 (xt+1 +zt+1 )+a2 (xt +zt ) = (xt+2 +a1 xt+1 +a2 xt )+(zt+2 +a1 zt+1 +a2 zt ) = 0
since both of the expressions in brackets on the right-hand side of this equation are
equal to zero. Therefore, yt = xt + zt is also a solution.
z^2 − 5z − 14 = (z + 2)(z − 7) = 0,
yt − 5yt−1 − 14yt−2 = 0
yt = −1 + A(−2)^t + B(7^t ).
To find the values of A and B we use the given values of y0 and y1 . Since y0 = −1, we
must have −1 + A + B = −1 and since y1 = 8, −1 − 2A + 7B = 8. Solving these, we
obtain A = −1 and B = 1, and therefore
yt = −1 − (−2)^t + 7^t .
Comments on exercises
Solution to exercise 11.1
Since 5% of the yt−1 businesses in operation at the start of year t fail during that year,
it follows that 95% of these survive. Additionally, N new businesses are created, so
yt = 0.95yt−1 + N.
This is a first-order difference equation with, in the standard notation, a = 0.95 and
b = N . Also, from the given information, y0 = 3000. The time-independent solution is
y∗ = b/(1 − a) = N/(1 − 0.95) = N/0.05 = 20N
and the solution is
yt = 20N + (3000 − 20N )(0.95)^t .
There are several ways to solve the last part of the question. Perhaps the easiest way is to notice that since (0.95)^t decreases with t, yt will increase with t if and only if the number 3000 − 20N multiplying (0.95)^t is negative. So we need 3000 − 20N < 0, or N > 150.
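To see the threshold N = 150 in action, here is a small simulation (illustrative only; the sample values of N are arbitrary): the business count rises year on year exactly when N > 150.

```python
def business_path(N, years=30, y0=3000.0):
    """Iterate y_t = 0.95 y_{t-1} + N starting from y_0."""
    path = [y0]
    for _ in range(years):
        path.append(0.95 * path[-1] + N)
    return path

grow = business_path(200)    # N > 150: heads up towards y* = 20N = 4000
shrink = business_path(100)  # N < 150: heads down towards y* = 2000

rising = all(b > a for a, b in zip(grow, grow[1:]))
falling = all(b < a for a, b in zip(shrink, shrink[1:]))
```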
Solution to exercise 11.2
Now,
pt = pD (q S (pt−1 )) = pD (0.05pt−1 − 4) = 400/3 − (20/3)(0.05pt−1 − 4) = 160 − (1/3)pt−1 .
This has time-independent solution
p∗ = 160/(1 − (−1/3)) = 120,
Solution to exercise 11.3
Since D(pt ) − S(pt ) = 1 − 2pt , the adjustment equation gives pt+1 = pt + a(1 − 2pt ), that is,
pt = (1 − 2a)pt−1 + a.
The equilibrium price is given by 1 − p = p, and so is 1/2. From our expression for pt ,
we see that pt → 1/2 as t → ∞ if and only if (1 − 2a)^t → 0. For this to be true, we need
−1 < 1 − 2a < 1, which is equivalent to 0 < a < 1. The price will oscillate towards 1/2
when, additionally, 1 − 2a is negative. So this happens when 1/2 < a < 1. When
a = 1/2, 1 − 2a = 0 and the price pt equals 1/2 for all t.
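The three regimes (monotone convergence, oscillatory convergence, and the special case a = 1/2) are easy to see numerically; this sketch uses arbitrary sample values of a:

```python
def price_path(a, steps=60, p0=0.75):
    """Iterate p_t = (1 - 2a) p_{t-1} + a from p_0 = 3/4."""
    p = [p0]
    for _ in range(steps):
        p.append((1 - 2 * a) * p[-1] + a)
    return p

monotone = price_path(0.3)     # 0 < a < 1/2: approaches 1/2 from above
oscillating = price_path(0.9)  # 1/2 < a < 1: alternates around 1/2
instant = price_path(0.5)      # a = 1/2: equals 1/2 from t = 1 onwards
```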
Solution to exercise 11.4
The auxiliary equation is
z^2 − z − 6 = (z − 3)(z + 2) = 0,
so the general solution is
yt = A(3^t ) + B(−2)^t .
Yt = Aα^t + Bβ^t ,
and since α and β are positive, in this case there can be no oscillatory behaviour. The same holds true when (k + 1)^2 = 8k.
We have shown that oscillations occur when (k + 1)^2 < 8k, in other words when k lies strictly between the roots of the equation (k + 1)^2 = 8k. Rewriting this as the quadratic equation k^2 − 6k + 1 = 0, we find that the roots are
3 − 2√2 and 3 + 2√2.
So the model predicts that, when k is between these two numbers, the national income
Yt will oscillate. (In economics language, it will exhibit ‘business cycles’.)
Whether the oscillations increase or decrease in magnitude depends on k. Since the solution involves the factor (√(k/2))^t , the oscillations decrease if √(k/2) < 1 — that is, if k < 2 — and increase if k > 2.
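Simulating the homogeneous equation Yt = ((k + 1)/2)Yt−1 − (k/2)Yt−2 for a value of k on each side of 2 illustrates this (the starting values here are chosen arbitrarily for the sketch):

```python
def income_path(k, steps=40, Y0=1.0, Y1=1.0):
    """Iterate Y_t = ((k+1)/2) Y_{t-1} - (k/2) Y_{t-2}."""
    Y = [Y0, Y1]
    for t in range(2, steps):
        Y.append((k + 1) / 2 * Y[t - 1] - k / 2 * Y[t - 2])
    return Y

damped = income_path(1.0)   # 3 - 2*sqrt(2) < k < 2: oscillations die away
growing = income_path(3.0)  # 2 < k < 3 + 2*sqrt(2): oscillations blow up
```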
Chapter 12
Vector spaces and subspaces
Introduction
In this chapter we study the important theoretical concept of a vector space. This, and
the related concepts to be explored in the subsequent chapters, will help us understand
much more deeply and comprehensively what we’ve already learned about matrices and
linear equations. There is, necessarily, a bit of a step upwards in the level of
‘abstraction’, but it is worth the effort in order to help our fundamental understanding.
Aims
The aims of this chapter are to:
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Sections 5.1
and 5.2.
Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.
Synopsis
We introduce the general concept of a vector space. We describe what is meant by a
subspace of a vector space and show how to determine whether a subset of a vector
space is a subspace. We then look at two special subspaces associated with any matrix:
its range and its null space.
4. there is a single member 0 of V , called the zero vector, such that for all v ∈ V ,
v + 0 = v.
Other properties follow from those listed in the definition. For instance, we can see that
0x = 0 for all x, as follows:
0x = (0 + 0)x = 0x + 0x, and adding −(0x) to both sides gives 0 = 0x.
Activity 12.1 Prove that (−1)x = −x, the negative of the vector x, using a similar
argument with 0 = 1 + (−1).
(Note that this definition says nothing at all about ‘multiplying’ together two vectors:
the only operations with which the definition is concerned are addition and scalar
multiplication.)
A vector space as we have defined it is called a real vector space, to emphasise that
the ‘scalars’ α, β and so on are real numbers rather than (say) complex numbers. There
is a notion of complex vector space, where the scalars are complex numbers, which
we shall not cover. In this guide all scalars will be real numbers.
12.1.2 Examples
Example 12.1 The set Rn is a vector space with the usual way of adding and
scalar multiplying vectors.
Example 12.2 The set V = {0} consisting only of the zero vector is a vector
space, with addition defined by 0 + 0 = 0, and scalar multiplication defined by
α0 = 0 for all α ∈ R.
Example 12.3 The set V of functions from R to R with pointwise addition and
scalar multiplication (described earlier in this section) is a vector space. Note that
the zero vector in this space is the function that maps every real number to 0 —
that is, the identically-zero function.
Activity 12.2 Show that all 10 properties of a vector space are satisfied. In
particular, if the function f is a vector in this space, what is the vector −f ?
Example 12.4 The set of m × n matrices with real entries is a vector space, with
the usual addition and scalar multiplication of matrices. The ‘zero vector’ in this
vector space is the zero m × n matrix which has all entries equal to 0.
Example 12.5 Let V be the set of all vectors in R3 with third entry equal to 0,
that is,
V = {(x, y, 0)^T : x, y ∈ R}.
Then V is a vector space with the usual addition and scalar multiplication. To verify
this, we need only check that V is closed under addition and scalar multiplication.
The associative, commutative and distributive laws (properties 2, 3, 7, 8, 9, 10) will
hold for vectors in V because they hold for all vectors in R3 (and all linear
combinations of vectors in V are in V ). Furthermore, if we can show that V is
closed under scalar multiplication, then for any particular v ∈ V , 0v = 0 ∈ V and
(−1)v = −v ∈ V . So we simply need to check that V ≠ ∅ (V is non-empty), that if
u, v ∈ V then u + v ∈ V , and if α ∈ R and v ∈ V then αv ∈ V . Each of these is
easy to check.
12.2 Subspaces
The last example above is informative. Arguing as we did there, if V is a vector space
and W ⊆ V is non-empty and closed under scalar multiplication and addition, then W
too is a vector space (and we do not need to verify that all the other properties hold).
The formal definition of a subspace is as follows.
Definition 12.2 (Subspace) A subspace W of a vector space V is a non-empty
subset of V that is itself a vector space (under the same operations of addition and
scalar multiplication as V ).
Theorem 12.1 A non-empty subset W of a vector space V is a subspace if and only if:
for all u, v ∈ W , u + v ∈ W (W is closed under addition)
for all v ∈ W and α ∈ R, αv ∈ W (W is closed under scalar multiplication).
Read the Comment on Activity 5.17 in the A-H text for an explanation of why
this theorem is true.
Each vector in one of the sets is the position vector of a point on that line. We will
show that the set S is a subspace of R2 , and that the set U is not a subspace of R2 .
If v = (1, 2)^T and p = (0, 1)^T , these sets can equally well be expressed as
S = {αv : α ∈ R},    U = {p + αv : α ∈ R}.
Activity 12.4 Show that the two descriptions of S describe the same set of vectors.
since 4 ≠ 2(1) + 1.
3. U is not closed under scalar multiplication:
(0, 1)^T ∈ U and 2 ∈ R, but 2(0, 1)^T = (0, 2)^T ∉ U , since 2 ≠ 2(0) + 1.
Activity 12.6 Let v be any non-zero vector in a vector space V . Show that the set
S = {αv : α ∈ R}
is a subspace of V .
If V is a vector space, the sets V and {0} are subspaces of V . The set {0} is not empty,
it contains one vector, namely the zero vector. It is a subspace because 0 + 0 = 0 and
α0 = 0 for any α ∈ R.
Given any subset S of a vector space V , how do you decide if it is a subspace? First
check that 0 ∈ S. Then using some vectors in the subset, see if adding them and scalar
multiplying them will give you another vector in S. To prove that S is a subspace, you
will need to verify that it is closed under addition and closed under scalar multiplication
for any vectors in S, so you will need to use letters to represent general vectors, or
components of general vectors, in the set. That is, using letters show that the sum u + v
and the scalar product αu of vectors in S also satisfy the definition of a vector in S.
To prove a set S is not a subspace you only need to find one counterexample, one or two
particular vectors (use numbers) for which the sum or the scalar product does not
satisfy the definition of S. Note that if 0 is not in the set, it cannot be a subspace.
Activity 12.7 Write down a general vector (using letters) and a particular vector
(using numbers) for each of the following subsets. Show that one of the sets is a
subspace of R3 and the other is not.
S1 = {(x, x^2 , 0)^T : x ∈ R},    S2 = {(x, 2x, 0)^T : x ∈ R}.
We have seen that a subspace is a non-empty subset W of a vector space that is closed
under addition and scalar multiplication, meaning that if u, v ∈ W and α ∈ R, then
both u + v and αv are in W . Now, it is fairly easy to see that the following equivalent
property characterises when W will be a subspace:
Theorem 12.2 A non-empty subset W of a vector space is a subspace if and only if
for all u, v ∈ W and all α, β ∈ R, we have αu + βv ∈ W .
12.3 Subspaces connected with matrices
12.3.1 Null space
Suppose that A is an m × n matrix. Then the null space N (A), the set of solutions to
the homogeneous linear system Ax = 0, is a subspace of Rn .
Theorem 12.3 For any m × n matrix A, N (A) is a subspace of Rn .
R Read the proof of this theorem in the A-H text, where it is numbered as
Theorem 5.28. But before you do so, try to prove it yourself by showing that N (A)
is non-empty, closed under addition and closed under scalar multiplication.
The null space is the set of solutions to the homogeneous linear system. If we instead
consider the set of solutions S to a general system Ax = b, S is not a subspace of Rn if
b 6= 0 (that is, if the system is not homogeneous). This is because 0 does not belong to
S. However, as we saw in Chapter 5 (Theorem 6.2), there is a relationship between S
and N (A): if x0 is any solution of Ax = b then S = {x0 + z : z ∈ N (A)}, which we may
write as x0 + N (A). S is an affine set, a translation of the subspace N (A).
Generally, if W is a subspace of a vector space V and x ∈ V then the set x + W defined
by
x + W = {x + w : w ∈ W }
is an affine subset of V , a translation of the subspace W .
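The affine structure of the solution set can be illustrated with a tiny hand-rolled example (the system here is made up for the sketch): every translate of a particular solution by a null-space vector again solves Ax = b.

```python
# Solutions of Ax = b form x0 + N(A): a sketch with the (made-up) 1x2 system
# x + 2y = 4, whose null space is spanned by z = (-2, 1).

def apply(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1.0, 2.0]]
b = [4.0]
x0 = [0.0, 2.0]   # one particular solution of Ax = b
z = [-2.0, 1.0]   # spans N(A), since A z = 0

# every translate x0 + t z is again a solution of Ax = b
translates = [[x0[0] + t * z[0], x0[1] + t * z[1]] for t in (-1.0, 0.5, 3.0)]
all_solve = all(apply(A, x) == b for x in translates)
```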
12.3.2 Range
Recall that the range of an m × n matrix is
R(A) = {Ax : x ∈ Rn }.
Theorem 12.4 For any m × n matrix A, R(A) is a subspace of Rm .
Read the proof of this theorem in the A-H text, where it is numbered as
Theorem 5.30. But again before you do so, think about why it is true and see if you
can write out a proof.
Overview
We have defined what is meant by a vector space and a subspace, and have seen how to
determine whether a subset of a vector space is a subspace. Two particular subsets
associated with a matrix have been shown to be subspaces: the range and the null space.
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
Test your knowledge and understanding
Work Exercises 5.1, 5.2 and 5.6 in the text A-H. The solutions to all exercises in the
text can be found at the end of the textbook.
Work Problem 5.6 in the text A-H. You will find the solution on the VLE.
Comments on selected activities
Feedback to activity 12.1
Using 0x = 0 and 0 = 1 + (−1), we have
−x = −x + 0 = −x + 0x = −x + (1 + (−1))x = −x + x + (−1)x = 0 + (−1)x = (−1)x.
If u = (x, y, 0)^T and v = (x′ , y ′ , 0)^T are in V and α ∈ R, then
u + v = (x + x′ , y + y ′ , 0)^T ∈ V
and
αv = (αx, αy, 0)^T ∈ V .
If x = αv and y = α′ v are in S and β ∈ R, then
x + y = αv + α′ v = (α + α′ )v ∈ S and βx = β(αv) = (βα)v ∈ S.
A general vector in S2 is of the form (x, 2x, 0)^T , x ∈ R, and one particular vector, taking x = 1, is (1, 2, 0)^T . S2 is a subspace. We show it is closed under addition and scalar multiplication using general vectors: if u = (a, 2a, 0)^T and v = (b, 2b, 0)^T are in S2 with a, b ∈ R, then
u + v = (a + b, 2(a + b), 0)^T ∈ S2 , since a + b ∈ R,
and, if α ∈ R,
αu = (αa, 2(αa), 0)^T ∈ S2 , since αa ∈ R.
Chapter 13
Linear span and linear independence
Introduction
In this chapter, we continue our study of vector spaces by looking at what is meant by
the linear span of a set of vectors, and we meet the concept of linear independence of a
set of vectors. These ideas are central to a deeper understanding of linear algebra.
Aims
The aims of this chapter are to:
define what is meant by the linear span of a set of vectors in a vector space
define what it means to say that a set of vectors is linearly independent or linearly
dependent
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Sections 5.3
and 6.1.
Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.
Synopsis
We define what is meant by linear combinations and linear span of vectors in a vector
space. We look at the geometric interpretation of linear span in Rn , and we define the
row space and column space of a matrix. We then say what it means for a set of vectors
to be linearly independent or linearly dependent, and show how to test whether a set of
vectors is linearly independent.
If we add together two vectors of this form, we get another linear combination of the
vectors v1 , v2 , . . . , vk . The same is true of any scalar multiple of v.
The set of all linear combinations of a given set of vectors of a vector space V forms a
subspace, and we give it a special name.
Definition 13.1 (Linear span) Suppose that V is a vector space and that
v1 , v2 , . . . , vk ∈ V . The linear span of X = {v1 , . . . , vk } is the set of all linear
combinations of the vectors v1 , . . . , vk , denoted by Lin{v1 , v2 , . . . , vk } or Lin(X) .
That is,
Lin{v1 , v2 , . . . , vk } = {α1 v1 + · · · + αk vk : α1 , α2 , . . . , αk ∈ R}.
Theorem 13.1 If X = {v1 , . . . , vk } is a set of vectors of a vector space V , then Lin(X) is a subspace of V . It is the smallest subspace containing the vectors v1 , v2 , . . . , vk .
Read the proof of this theorem in the A-H textbook, where it is numbered
Theorem 5.33.
The subspace Lin(X) is also known as the subspace spanned by the set
X = {v1 , . . . , vk }, or, simply, as the span of {v1 , v2 , . . . , vk }.
Different texts may use different notations for the linear span of a set of vectors.
Notation is important, but it is nothing to get anxious about: just always make it clear
what you mean by your notation: use words as well as symbols!
Recall that a plane in R3 can be described as the set of points (x, y, z) satisfying a Cartesian equation
ax + by + cz = d, or as the set of all vectors x which satisfy a vector equation with two
parameters, x = p + sv + tw, s, t ∈ R, where v and w are non-parallel vectors and p is
the position vector of a point on the plane. These definitions are equivalent as it is
possible to go from one representation of a given plane to the other.
If d = 0, the plane contains the origin, so, taking p = 0, the plane is the set of vectors
{x : x = sv + tw, s, t ∈ R}.
Since this is the linear span, Lin{v, w}, of two vectors in R3 , a plane through the origin
is a subspace of R3 .
Let’s look at a specific example: the plane
S = {(x, y, z)^T : z = 2y − 3x}.
Then for x ∈ S,
x = (x, y, 2y − 3x)^T = (x, 0, −3x)^T + (0, y, 2y)^T = x(1, 0, −3)^T + y(0, 1, 2)^T .
That is, x = xv1 + yv2 where x, y can be any real numbers and v1 = (1, 0, −3)^T , v2 = (0, 1, 2)^T . Since S is the linear span of two vectors, it is a subspace of R3 .
Of course, you can show directly that S is a subspace by showing it is non-empty, and closed under addition and scalar multiplication.
This yields three equations in the two unknowns s and t. Eliminating s and t from these equations yields a single Cartesian equation in the variables x, y, z:
x = s, y = t, z = −3s + 2t   =⇒   z = −3x + 2y, or 3x − 2y + z = 0.
In the same way as for planes in R3 , any hyperplane in Rn which contains the origin is a
subspace of Rn . You can show this directly, exactly as in the activity above, or you can
show it is the linear span of n − 1 vectors in Rn .
that the inner product ⟨r, x⟩ = 0 for any r ∈ RS(A) and x ∈ N (A).
Read Section 5.3 of the A-H textbook for a more detailed explanation of what
we have seen so far about linear span.
13.2 Linear independence
Linear independence is a central idea in the theory of vector spaces. If {v1 , v2 , . . . , vk }
is a set of vectors in a vector space V , then the vector equation
α1 v1 + α2 v2 + · · · + αk vk = 0
always has the trivial solution, α1 = α2 = · · · = αk = 0.
We say that vectors x1 , x2 , . . . , xk in Rn are linearly dependent if there are numbers
α1 , α2 , . . . , αk , not all zero, such that
α1 x1 + α2 x2 + · · · + αk xk = 0.
In this case the left-hand side is termed a non-trivial linear combination. The
vectors are linearly independent if they are not linearly dependent; that is, if no
non-trivial linear combination of them is the zero vector or, equivalently, whenever
α1 x1 + α2 x2 + · · · + αk xk = 0,
α1 v1 + α2 v2 + · · · + αk vk = 0 =⇒ α1 = α2 = · · · = αk = 0 :
that is, if and only if no non-trivial linear combination of v1 , v2 , . . . , vk equals the zero
vector.
This is because
2v1 + v2 − v3 = 0.
Note that this can also be written as v3 = 2v1 + v2 .
This example illustrates the following general result. Try to prove it yourself before looking at the proof.
Theorem 13.2 The set {v1 , v2 , . . . , vk } ⊆ V is linearly dependent if and only if some vector vi is a linear combination of the other vectors.
Read the proof of this theorem in the A-H textbook, where it is numbered
Theorem 6.6.
It follows from this theorem that a set of two vectors is linearly dependent if and only if
one vector is a scalar multiple of the other.
in Example 13.8 are linearly independent, since one is not a scalar multiple of the
other.
Activity 13.3 Show that, for any vector v in a vector space V , the set of vectors
{v, 0} is linearly dependent.
α1 v1 + α2 v2 + · · · + αk vk = 0
can be written as Ax = 0, where A = (v1 v2 · · · vk ) and
x = (α1 , α2 , . . . , αk )^T .
Recall from Theorem 4.2 (page 53) that the matrix product Ax is exactly the linear combination α1 v1 + α2 v2 + · · · + αk vk .
Then the question of whether or not a set of vectors in Rn is linearly independent can
be answered by looking at the solutions of the homogeneous system Ax = 0.
Theorem 13.3 The vectors v1 , v2 , . . . , vk are linearly dependent if and only if the
linear system Ax = 0 has a solution other than x = 0, where A is the matrix
A = (v1 v2 · · · vk ). Equivalently, the vectors are linearly independent precisely when
the only solution to the system is x = 0.
If the vectors are linearly dependent, then any solution x ≠ 0 of the system Ax = 0 will directly give a non-trivial linear combination of the vectors that equals the zero vector.
Looking further, we know from our experience of solving linear systems with row
operations that the system Ax = 0 will have precisely the one solution x = 0 if and
only if we obtain from the n × k matrix A an echelon matrix in which there are k
leading ones. That is, if and only if rank(A) = k. (Think about this!) Thus, we have the
following result.
Theorem 13.4 Suppose that v1 , . . . , vk ∈ Rn . Then the set {v1 , . . . , vk } is linearly
independent if and only if the n × k matrix (v1 v2 · · · vk ) has rank k.
But the rank is always at most the number of rows, so we certainly need to have k ≤ n.
Also, there is a set of n linearly independent vectors in Rn . In fact, there are infinitely
many such sets, but an obvious one is
{e1 , e2 , . . . , en } ,
where ei is the vector with every entry equal to 0 except for the ith entry, which is 1.
Activity 13.5 Show that the set {e1 , e2 , . . . , en } in Rn is linearly independent.
So any set of more than n vectors in Rn is linearly dependent. On the other hand, it
should not be imagined that any set of n or fewer is linearly independent: that isn’t true.
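Theorem 13.4 turns the independence question into a rank computation, which we can sketch in plain Python with a small Gaussian-elimination rank function (an illustration, not a library routine; the test vectors are made up):

```python
def rank(M, tol=1e-12):
    """Rank of a list-of-rows matrix via Gaussian elimination."""
    M = [row[:] for row in M]  # work on a copy
    n_rows, n_cols = len(M), len(M[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        # partial pivoting: pick the largest entry in column c at or below row r
        pivot = max(range(r, n_rows), key=lambda i: abs(M[i][c]))
        if abs(M[pivot][c]) < tol:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, n_rows):
            f = M[i][c] / M[r][c]
            for j in range(c, n_cols):
                M[i][j] -= f * M[r][j]
        r += 1
    return r

# Columns are the vectors under test: by Theorem 13.4 they are
# linearly independent iff the rank equals the number of columns.
indep = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 1.0]]   # (1,0,1)^T and (0,1,1)^T in R^3: rank 2, independent
dep = [[1.0, 2.0],
       [2.0, 4.0],
       [3.0, 6.0]]     # second column is twice the first: rank 1, dependent
```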
Activity 13.6 For the set L3 above, find the solution of the corresponding
homogeneous system Ax = 0 where A is the matrix whose columns are the vectors
of L3 . Use the solution to write down a non-trivial linear combination of the vectors
that is equal to the zero vector. Express one of the vectors as a linear combination of
the other two.
There is an important property of linearly independent sets of vectors which holds for
any vector space V .
Theorem 13.6 If x1 , x2 , . . . , xm are linearly independent in V and
c1 x1 + c2 x2 + · · · + cm xm = c′1 x1 + c′2 x2 + · · · + c′m xm ,
then
c1 = c′1 , c2 = c′2 , . . . , cm = c′m .
Overview
We have seen what is meant by linear combinations and linear span of vectors in a
vector space. We looked at lines and planes in R3 as examples. We defined the row
space and column space of a matrix and looked at their relationships with the range
and null space. We have defined linear independence and dependence and shown how to
determine whether a given set of vectors is linearly independent. We have seen that a
linearly independent set of vectors in Rn contains at most n vectors, and that there is at
most one way to express a vector as a linear combination of linearly independent
vectors.
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
define column space, CS(A), and row space, RS(A) of a matrix A and explain why
CS(A) = R(A), where R(A) is the range of A
explain what is meant by the statement that the row space of a matrix A is
orthogonal to the null space of A and why this is true
define what is meant by linear independence and linear dependence of a set of vectors
determine whether a given set of vectors is linearly independent or linearly
dependent, and in the latter case find a non-trivial linear combination of the
vectors which equals the zero vector
v = α1 v1 + α2 v2 + · · · + αk vk
and
v′ = α′1 v1 + α′2 v2 + · · · + α′k vk
and we will have
(X, Y, Z)^T = (x + x′ , y + y ′ , z + z ′ )^T
and we want to show this belongs to S. Now, this is the case, because
Feedback to activity 13.5
If
a1 e1 + a2 e2 + · · · + an en = 0,
you can see that the positions of the ones and zeros in the vectors lead to the equations
a1 = 0 from the first component, a2 = 0 from the second component, and so on, so that
ai = 0 (1 ≤ i ≤ n) is the only possible solution. Alternatively, the matrix
A = (e1 , e2 , . . . , en ) is the n × n identity matrix, so the only solution to Az = 0 is the
trivial solution, proving that the vectors are linearly independent.
Feedback to activity 13.6
The general solution to the system is
x = (x, y, z)^T = t(−3/2, −1/2, 1)^T , t ∈ R.
Taking t = −1 and multiplying out the equation Ax = 0, we see that
(3/2)(1, 0, −1, 0)^T + (1/2)(1, 2, 9, 2)^T − (2, 1, 3, 1)^T = 0,
and hence
(2, 1, 3, 1)^T = (3/2)(1, 0, −1, 0)^T + (1/2)(1, 2, 9, 2)^T .
Chapter 14
Bases and dimension
Introduction
In this chapter we look more deeply into the structure of a vector space, developing the
concept of a basis, which will enable us to know precisely what we mean by the
dimension of a vector space. We also discuss the important rank-nullity theorem.
Aims
The aims of this chapter are to:
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods.
Sections 6.2–6.5.
Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.
Synopsis
We define what is meant by a finite basis of a vector space (or subspace) and how this
leads to the concept of dimension. Then we explain what is meant by coordinates with
respect to a basis. We also show how to find bases in some important circumstances. We
define the rank and nullity of a matrix and show how these are related.
14.1 Basis
The following result about Rn is very important in the theory of vector spaces. It says
that a linearly independent set of n vectors in Rn spans Rn .
Theorem 14.1 If v1 , v2 , . . . , vn are linearly independent vectors in Rn , then any x in Rn can be written as a unique linear combination of v1 , . . . , vn .
Read the proof of this theorem from the A-H text, where it is labelled
Theorem 6.22.
It follows from this theorem that if we have a set of n linearly independent vectors in
Rn , then the set of vectors also spans Rn , so any vector in Rn can be expressed in
exactly one way as a linear combination of the n vectors. We say that the n vectors form
a basis of Rn . The formal definition of a (finite) basis for a vector space is as follows.
Definition 14.1 ((Finite) Basis) Let V be a vector space. Then the subset
B = {v1 , v2 , . . . , vn } of V is said to be a basis for (or of) V if:
the set B is linearly independent, and
V = Lin(B).
Example 14.1 The vector space Rn has the basis {e1 , e2 , . . . , en } where ei is (as
earlier) the vector with every entry equal to 0 except for the ith entry, which is 1.
It’s clear that the vectors are linearly independent (as you showed in Activity 13.5
on page 195), and there are n of them, so we know straight away that they form a
basis. In fact, it’s easy to see that they span the whole of Rn , since for any
x = (x1 , x2 , . . . , xn )T ∈ Rn ,
x = x1 e 1 + x 2 e 2 + · · · + xn e n .
If x = (x, y, z)T is any vector in W , then its components must satisfy y = −x + 3z,
and we can express x as
x = (x, y, z)^T = (x, −x + 3z, z)^T = x(1, −1, 0)^T + z(0, 3, 1)^T = xv + zw,  x, z ∈ R.
This shows that the set {v, w} spans W . The set is linearly independent. Why?
Because of the positions of the zeros and ones, if αv + βw = 0 then necessarily
α = 0 and β = 0.
As in the above example, we can show that n vectors in Rn are a basis of Rn by writing
them as the columns of a matrix A and invoking Theorem 9.2. Turning this around, we
can see that if A = (v1 v2 . . . vn ) is an n × n matrix with rank(A) = n, then the
columns of A are a basis of Rn . Indeed, by Theorem 9.2, the system Az = x will have a
unique solution for any x ∈ Rn , so any vector x ∈ Rn can be written as a unique linear
combination of the column vectors. We therefore have two more equivalent statements
to add to the theorem.
Theorem 14.3 If A is an n × n matrix, then the following statements are equivalent.
A−1 exists.
The rank of A is n.
|A| ≠ 0.
The columns of A form a basis of Rn .
The rows of A form a basis of Rn .
The last statement can be seen from the facts that |AT | = |A|, and the rows of A are
the columns of AT . This theorem provides an easy way to determine if a set of n vectors
is a basis of Rn . We simply write the n vectors as the columns of a matrix, and evaluate
its determinant.
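For n = 3 this determinant test takes only a few lines of code; this sketch (the candidate vectors are made up) expands along the first row:

```python
def det3(A):
    """3x3 determinant by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Columns of each matrix are the candidate basis vectors of R^3.
candidate = [[1, 0, 1],
             [0, 1, 1],
             [0, 0, 1]]   # nonzero determinant: the columns form a basis
degenerate = [[1, 2, 3],
              [2, 4, 6],   # this row is twice the first, so det = 0
              [1, 1, 1]]   # the columns do not form a basis
```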
Show that one of these sets is a basis of R3 and that the other one spans a plane in
R3 . Find a basis for this plane. Then find a Cartesian equation for the plane.
14.1.1 Coordinates
[v]S = (α1 , α2 , . . . , αn )^T .
The standard basis B = {e1 , e2 } and the set S = {(1, 2)^T , (1, −1)^T } are each a basis of R2 . The coordinates of the vector v = (2, −5)^T in each basis are given by the coordinate vectors,
[v]B = (2, −5)^T and [v]S = (−1, 3)^T .
In the standard basis, the coordinates of v are precisely the components of the
vector v. In the basis S, the components of v arise from the observation that
v = −1(1, 2)^T + 3(1, −1)^T = (2, −5)^T .
Activity 14.2 For the example above, sketch the vector v on graph paper and show
it as the sum of the vectors given by each of the linear combinations: v = 2e1 − 5e2
and v = −1v1 + 3v2 .
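Finding [v]S amounts to solving a 2×2 linear system; for a basis of R2 this can be done with Cramer's rule, as this small sketch shows (the basis and vector are read off from the displayed linear combination in the example above):

```python
def coords_2d(b1, b2, v):
    """Coordinates (alpha, beta) of v in the basis {b1, b2} of R^2,
    i.e. the solution of alpha*b1 + beta*b2 = v, by Cramer's rule."""
    det = b1[0] * b2[1] - b2[0] * b1[1]
    alpha = (v[0] * b2[1] - b2[0] * v[1]) / det
    beta = (b1[0] * v[1] - v[0] * b1[1]) / det
    return alpha, beta

# The basis S = {(1,2)^T, (1,-1)^T} and v = (2,-5)^T from the example:
alpha, beta = coords_2d((1.0, 2.0), (1.0, -1.0), (2.0, -5.0))
# (alpha, beta) recovers the coordinate vector [v]_S = (-1, 3)^T
```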
14.1.2 Dimension
A fundamental result is that if a vector space V has a finite basis, then all bases of V
are of the same size.
Theorem 14.4 Suppose that the vector space V has a finite basis consisting of d
vectors. Then any basis of V consists of exactly d vectors.
This is not an easy result to prove.
Read Section 6.4.1, Theorems 6.36 and 6.37, of the A-H textbook to understand
why this theorem is true.
This enables us, finally, to define exactly what we mean by the dimension of a vector
space V .
Definition 14.3 (Dimension) The number d of vectors in a finite basis of a vector
space V is the dimension of V , and is denoted dim(V ). The vector space V = {0} is
defined to have dimension 0.
A vector space which has a finite basis is said to be finite-dimensional. Not all vector
spaces are finite-dimensional. (For example, the vector space of real functions with
pointwise addition and scalar multiplication has no finite basis. Such a vector space is
said to be infinite-dimensional.)
Example 14.5 We already know Rn has a basis of size n. (For example, the
standard basis consists of n vectors.) So Rn has dimension n (which is reassuring,
since it is often referred to as n-dimensional Euclidean space).
If we know the dimension of a vector space V , then we know how many vectors we need
for a basis. If we have the correct number of vectors for a basis and we know either that
the vectors span V , or that they are linearly independent, then we can conclude that
both must be true and they form a basis, as is shown in the following theorem. That is,
we do not need to show both.
Theorem 14.5 Let V be a finite-dimensional vector space of dimension d. Then:

d is the largest size of a linearly independent set of vectors in V . Furthermore, any set of d linearly independent vectors is necessarily a basis of V .

d is the smallest size of a spanning set of vectors for V . Furthermore, any finite set of d vectors that spans V is necessarily a basis.
Read the proof of Theorem 6.42 in the A-H textbook to understand why this is
true.
Thus, d = dim(V ) is the largest possible size of a linearly independent set of vectors in
V , and the smallest possible size of a spanning set of vectors (a set of vectors whose
linear span is V ).
Now, the dimension of W is the largest size of a linearly independent set of vectors in
W , so there is a set of dim(W ) linearly independent vectors in V . But then this means
that dim(W ) ≤ dim(V ), since the largest possible size of a linearly independent set in V
is dim(V ). There is another important relationship between bases of W and V : this is
that any basis of W can be extended to one of V . The following result states this
precisely.
Theorem 14.6 Suppose that V is a finite-dimensional vector space and that W is a
subspace of V . Then dim(W ) ≤ dim(V ). Furthermore, if {w1 , w2 , . . . , wr } is a basis of
W then there are s = dim(V ) − dim(W ) vectors v1 , v2 , . . . , vs ∈ V such that
{w1 , w2 , . . . , wr , v1 , v2 , . . . , vs } is a basis of V . (In the case W = V , the basis of W is
already a basis of V .) That is, we can obtain a basis of the whole space V by adding certain vectors of V to any basis of W .
Read the proof of this theorem in the text A-H where it is labelled Theorem 6.45.
For example, the plane W = {(x, y, z)T : x + y − 3z = 0}
has a basis consisting of the vectors v1 = (1, 2, 1)T and v2 = (3, 0, 1)T . If v3 is any
vector which is not in this plane, for example, v3 = (1, 0, 0)T , then the set
S = {v1 , v2 , v3 } is a basis of R3 .
14.2. Finding a basis for a linear span in Rn
A useful technique is to form a matrix with the transposed vectors xiT as rows, and to perform row
operations until the resulting matrix is in echelon form. Then a basis of the linear span
is given by the transposed non-zero rows of the echelon matrix (which, it should be
noted, will not generally be among the initial given vectors). The reason this works is
that: (i) row operations are such that at any stage in the resulting procedure, the row
space of the matrix is equal to the row space of the original matrix, which is precisely
the linear span of the original set of vectors, and (ii) the non-zero rows of an echelon
matrix are linearly independent (which is clear, since each has a one in a position where
the vectors below it all have zero).
Example 14.8 We find a basis for the subspace of R5 spanned by the vectors

x1 = (1, −1, 2, −1, −1)T , x2 = (2, 1, −2, −2, −2)T , x3 = (−1, 2, −4, 1, 1)T , x4 = (3, 0, 0, −3, −3)T .

Write the vectors as the columns of a matrix A and then take AT (effectively writing each column vector as a row):

AT = \begin{pmatrix} 1 & -1 & 2 & -1 & -1 \\ 2 & 1 & -2 & -2 & -2 \\ -1 & 2 & -4 & 1 & 1 \\ 3 & 0 & 0 & -3 & -3 \end{pmatrix} .

Reducing AT to echelon form leaves the non-zero rows (1, −1, 2, −1, −1) and (0, 1, −2, 0, 0) (the third and fourth rows reduce to rows of zeros), so a basis of the linear span is

{(1, −1, 2, −1, −1)T , (0, 1, −2, 0, 0)T }.
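The row-operations technique of this section can be sketched in code. The following Python row-reduces the four vectors of Example 14.8, written as rows, over exact rationals; the helper names are our own, and the routine actually produces the reduced echelon form, whose non-zero rows equally form a basis of the span.

```python
from fractions import Fraction

def echelon(rows):
    # Gaussian elimination to reduced echelon form over exact rationals
    M = [[Fraction(x) for x in r] for r in rows]
    m, n = len(M), len(M[0])
    r = 0
    for c in range(n):
        piv = next((i for i in range(r, m) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        M[r] = [x / M[r][c] for x in M[r]]          # make the leading entry 1
        for i in range(m):
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
    return M, r                                      # reduced matrix and its rank

vectors = [
    [1, -1, 2, -1, -1],   # x1
    [2, 1, -2, -2, -2],   # x2
    [-1, 2, -4, 1, 1],    # x3
    [3, 0, 0, -3, -3],    # x4
]
E, rank = echelon(vectors)
basis = E[:rank]          # the non-zero rows span Lin{x1, ..., x4}
print(rank)               # 2: the span is a 2-dimensional subspace of R^5
```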
If we want to find a basis that consists of a subset of the original vectors, then we need
to take those vectors that ‘correspond’ to the final non-zero rows in the echelon matrix.
By this, we mean the rows of the original matrix that have ended up as non-zero rows
in the echelon matrix. For instance, in Example 14.8, the first and second rows of the
original matrix correspond to the non-zero rows of the echelon matrix, so a basis of the
span is {x1 , x2 }. On the other hand, if we interchange rows, the correspondence won’t
be so obvious.
A better method to obtain such a basis is given in the next section, using the matrix A whose columns are the vectors x1 , x2 , . . . , xk . Then, as we have seen,
Lin{x1 , . . . , xk } = R(A). That is, Lin{x1 , . . . , xk } is the range or column space of the
matrix A.
We define the rank of A by

rank(A) = dim(R(A)).
We have, of course, already used the word ‘rank’, so it had better be the case that the
usage just given coincides with the earlier one. Fortunately it does. In fact, we have the
following connection.
Theorem 14.7 Suppose that A is an m × n matrix with columns c1 , c2 , . . . , cn , and
that an echelon form obtained from A has leading ones in columns i1 , i2 , . . . , ir . Then a
basis for R(A) is
B = {ci1 , ci2 , . . . , cir }.
Note that the basis is formed from columns of A, not columns of the echelon matrix: the basis consists of those columns of A corresponding to the leading ones in the echelon matrix.
We will outline a proof of this theorem, so you can see how it works. We have already
seen that a solution x = (α1 , α2 , . . . , αn ) of Ax = 0 gives a linear combination of the
columns of A which is equal to the zero vector,
0 = α1 c1 + α2 c2 + . . . + αn cn .
If E denotes the reduced echelon form of A, and if c′1 , c′2 , . . . , c′n denote the columns of E, then exactly the same relationship holds:
14.3. Basis and dimension of range and null space
For more detail, read the proof of Theorem 6.54 in the A-H textbook.
We have already seen that a matrix A and its reduced row echelon form have the same
row space, and that the non-zero rows form a basis of this row space. So the dimension
of the row space of A, RS(A), and the dimension of the column space of A,
CS(A) = R(A), are each equal to the number of leading ones in an echelon form of A; that is, both are equal to rank(A). We restate this important fact:

rank(A) = dim(RS(A)) = dim(CS(A)).
For the matrix A of Example 14.9,

A = \begin{pmatrix} 1 & 1 & 2 & 1 \\ 2 & 0 & 1 & 1 \\ 9 & -1 & 3 & 4 \end{pmatrix} ,

the reduced echelon form is

E = \begin{pmatrix} 1 & 0 & 1/2 & 1/2 \\ 0 & 1 & 3/2 & 1/2 \\ 0 & 0 & 0 & 0 \end{pmatrix} .
The leading ones in this echelon matrix are in the first and second columns, so a
basis for R(A) can be obtained by taking the first and second columns of A. (Note:
‘columns of A’, not of the echelon matrix!) Therefore a basis for R(A) is
{(1, 2, 9)T , (1, 0, −1)T }.
A basis of the row space of A consists of the two non-zero rows of the reduced
matrix, or the first two rows of the original matrix,
{(1, 0, 1/2, 1/2)T , (0, 1, 3/2, 1/2)T } or {(1, 1, 2, 1)T , (2, 0, 1, 1)T }.
Note that the column space is a two-dimensional subspace of R3 (a plane) and the
row space is a two-dimensional subspace of R4 . The columns of A and E satisfy the
same linear dependence relations, which can be easily read from the reduced echelon
form of the matrix,
c3 = (1/2)c1 + (3/2)c2 ,   c4 = (1/2)c1 + (1/2)c2 .
Activity 14.4 Check that the columns of A satisfy these same linear dependence
relations.
There is a very important relationship between the rank and nullity of a matrix. We
have already seen some indication of it in our considerations of linear systems. Recall
that if an m × n matrix A has rank r then the general solution to the (consistent)
system Ax = 0 involves n − r ‘free parameters’. Specifically (noting that 0 is a
particular solution, and using a characterisation obtained earlier in Chapter 9), the
general solution takes the form
x = s1 u1 + s2 u2 + · · · + sn−r un−r ,
where u1 , u2 , . . . , un−r are themselves solutions of the system Ax = 0. But the set of
solutions of Ax = 0 is precisely the null space N (A). Thus, the null space is spanned by
the n − r vectors u1 , . . . , un−r , and so its dimension is at most n − r. In fact, it turns
out that its dimension is precisely n − r. That is,
nullity(A) = n − rank(A).
To see this, we need to show that the vectors u1 , . . . , un−r are linearly independent.
Because of the way in which these vectors arise (look at Example 14.9), it will be the
case that for each of them, there is some position where that vector will have an entry
equal to 1 and the entry in that same position of all the other vectors will be 0. From
this we can see that no non-trivial linear combination of them can be the zero vector, so
they are linearly independent. We have therefore proved the following central result.
Theorem 14.8 (Rank-nullity theorem) For an m × n matrix A,
rank(A) + nullity(A) = n.
Activity 14.5 Find a basis of the null space of the matrix A from Example 14.9,
A = \begin{pmatrix} 1 & 1 & 2 & 1 \\ 2 & 0 & 1 & 1 \\ 9 & -1 & 3 & 4 \end{pmatrix} .
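For this matrix A the rank-nullity theorem can be checked numerically. In the sketch below the two null-space vectors come from setting each free variable of the reduced system to 1 in turn; they agree with the feedback to Activity 14.5 later in the chapter, and the helper name is our own.

```python
from fractions import Fraction as F

A = [[1, 1, 2, 1],
     [2, 0, 1, 1],
     [9, -1, 3, 4]]

def matvec(M, v):
    return [sum(F(a) * b for a, b in zip(row, v)) for row in M]

# From the reduced echelon form E, x3 and x4 are free; setting each free
# variable to 1 in turn gives one basis vector of the null space N(A).
u1 = [F(-1, 2), F(-3, 2), F(1), F(0)]
u2 = [F(-1, 2), F(-1, 2), F(0), F(1)]

assert matvec(A, u1) == [0, 0, 0]
assert matvec(A, u2) == [0, 0, 0]
rank, nullity = 2, 2        # rank from the two leading ones in E
print(rank + nullity)       # 4 = n, as the rank-nullity theorem requires
```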
Overview
We have seen what is meant by a basis of a vector space and the related concept of dimension, and we have seen how to determine coordinates with respect to a basis. We
have also defined the rank and nullity of a matrix and observed the connection between
these, through the rank-nullity theorem.
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
Equating components, you obtain three equations in the two unknowns s and t.
Eliminating s and t between the three equations you will obtain a single equation
relating x, y and z. Explicitly, we have
x = −s + t, y = 2t, z = s + 3t,
so
t = y/2 ,   s = t − x = y/2 − x

and

z = s + 3t = (y/2 − x) + 3(y/2) = 2y − x,
so we have x − 2y + z = 0. This is the Cartesian equation of the plane.
Note that a Cartesian equation could equally well have been obtained by writing the
two basis vectors and the vector x as the columns of a matrix M and using the fact
that |M | = 0 if and only if the columns of M are linearly dependent. That is,
\begin{vmatrix} -1 & 1 & x \\ 0 & 2 & y \\ 1 & 3 & z \end{vmatrix} = −2x + 4y − 2z = 0.
Since the vectors w1 , w2 , . . . , wr are linearly independent in W , the only solution of

a1 w1 + a2 w2 + · · · + ar wr = 0
is the trivial one, with all ai = 0. But all the vectors in W are also in V , and this
statement still holds true, so S is a linearly independent set of vectors in V .
x = s1 \begin{pmatrix} -1/2 \\ -3/2 \\ 1 \\ 0 \end{pmatrix} + s2 \begin{pmatrix} -1/2 \\ -1/2 \\ 0 \\ 1 \end{pmatrix} = s1 u1 + s2 u2 .
The set {u1 , u2 } is a basis of the null space of A, so dim(N (A)) = 2. From Example
14.9, rank(A) = 2. The matrix A has n = 4 columns.
rank(A) + nullity(A) = 2 + 2 = 4 = n.
Note that the basis vectors of the null space give precisely the same linear dependence
relations between the column vectors as those given in the example. Since Au1 = 0 and
Au2 = 0,

Au1 = −(1/2)c1 − (3/2)c2 + c3 = 0   and   Au2 = −(1/2)c1 − (1/2)c2 + c4 = 0.
Chapter 15
Linear transformations
Introduction
In the next few chapters, we turn our attention to special types of functions between
vector spaces known as linear transformations. In this chapter, we will discuss what is
meant by a linear transformation, and will look at the matrix representations of linear
transformations between Euclidean vector spaces. This material, together with that of
the next chapter, provides the fundamental theoretical underpinning for the technique
of diagonalisation, which has many applications, as we shall see later.
Aims
The aims of this chapter are to:
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Sections 7.1
and 7.2.
Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.
Synopsis
We define linear transformations and give examples. We then show an important
connection between linear transformations of Euclidean space and matrices. We discuss
1. T (u + v) = T (u) + T (v).
2. T (αu) = αT (u).
T is said to be a linear transformation (or linear mapping or linear function).
An equivalent formulation is the single condition that for all u, v ∈ V and α, β ∈ R,

T (αu + βv) = αT (u) + βT (v).

(This single condition implies the two in the definition, and is implied by them.)
Activity 15.1 Prove that this single condition is equivalent to the two of the
definition.
15.2 Examples
Next,
TA (αu) = A(αu) = αAu = αTA (u).
So the two ‘linearity’ conditions are satisfied. We call TA the linear transformation
corresponding to A.
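The two linearity conditions for TA are easy to check numerically. A quick Python sketch, with an illustrative matrix and vectors of our own choosing:

```python
# Numeric check that T_A(u) = Au satisfies the two linearity conditions.
A = [[1, 2],
     [3, 4]]  # illustrative 2x2 matrix, not from the guide

def T(u):
    # matrix-vector product: T_A(u) = Au
    return [sum(a * x for a, x in zip(row, u)) for row in A]

u, v, alpha = [1, -1], [2, 5], 3
add = [x + y for x, y in zip(u, v)]

assert T(add) == [x + y for x, y in zip(T(u), T(v))]           # T(u+v) = T(u) + T(v)
assert T([alpha * x for x in u]) == [alpha * x for x in T(u)]  # T(alpha u) = alpha T(u)
print("both linearity conditions hold")
```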
15.3. Linear transformations and matrices
T (u) = T ((u1 , u2 , . . . , un )T ) = pu1 ,u2 ,...,un = pu ,
The fact that for all x, pu+v (x) = (pu + pv )(x) means that the functions pu+v and
pu + pv are identical. The fact that T (αu) = αT (u) is similarly proved, and you
should try it!
If T : Rn → Rm is a linear transformation and A = AT = (T (e1 ) T (e2 ) · · · T (en )), then T = TA : that is, for every u ∈ Rn , T (u) = Au.
Read the proof of this theorem in the A-H text, where it is labelled Theorem 7.8.
Thus, to each matrix A there corresponds a linear transformation TA , and to each linear
transformation T there corresponds a matrix AT . Note that the matrix AT we found
was determined by using the standard basis in both vector spaces: later in this chapter
we will generalise this to use other bases.
Notice that the entries of the matrix AT are just the coefficients of x, y, z in the
definition of T .
15.3.1 Rotation in R2
We find the matrix A that represents the linear transformation T : R2 → R2 which is rotation anticlockwise by an angle θ about the origin. Let the images of the standard basis vectors e1 and e2 be the vectors

T (e1 ) = (a, c)T ,   T (e2 ) = (b, d)T ,

so that

AT = \begin{pmatrix} a & b \\ c & d \end{pmatrix} .
15.4. Linear transformations of any vector spaces
[Figure: the unit vectors e1 and e2 rotated anticlockwise through the angle θ, with images T (e1 ) = (a, c)T and T (e2 ) = (b, d)T .]
The vectors T (e1 ) = (a, c)T and T (e2 ) = (b, d)T are orthogonal and each has length one
since they are the rotated standard basis vectors. Drop a perpendicular from the point
(a, c) to the x-axis, forming a right triangle with angle θ at the origin. Since the
x-coordinate of the rotated vector is a and the y-coordinate is c, the side opposite the
angle θ has length c and the side adjacent to the angle θ has length a. The hypotenuse
of this triangle (which is the rotated unit vector e1 ) has length equal to one. We
therefore have a = cos θ and c = sin θ. Similarly, drop the perpendicular from the point
(b, d) to the x-axis and observe that the angle opposite the x-axis is equal to θ. Again,
basic trigonometry tells us that the x-coordinate is b = − sin θ (it has length sin θ and is
in the negative x-direction), and the height is d = cos θ. Therefore,
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix} .
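The rotation matrix is easy to experiment with in code. A small Python sketch (the function names are our own):

```python
import math

def rotation(theta):
    # matrix representing anticlockwise rotation by theta about the origin
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]

def apply(M, v):
    # multiply a 2x2 matrix by a vector in R^2
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

# rotating e1 anticlockwise by 90 degrees should give e2 (up to rounding)
x, y = apply(rotation(math.pi / 2), [1, 0])
print(round(x, 10), round(y, 10))  # 0.0 1.0
```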
Activity 15.3 Confirm this by sketching the vectors e1 and e2 and the image vectors

T (e1 ) = (1/√2, 1/√2)T and T (e2 ) = (−1/√2, 1/√2)T .
What is the matrix of the linear transformation which is a rotation anticlockwise by
π radians? What is the matrix of the linear transformation which is a reflection in
the y-axis? Think about what each of these two transformations does to the
standard basis vectors e1 and e2 .
15.5. Linear transformations determined by action on a basis
Activity 15.4 Check this by multiplying the matrices. (You should note that
sin2 θ + cos2 θ = 1: see the subject guide for MT1174 Calculus.)
We found

AT = \begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & 0 \\ 1 & 2 & -3 \end{pmatrix} .

Since |AT | = 9, the matrix is invertible, and T −1 is given by the matrix

AT−1 = \frac{1}{9} \begin{pmatrix} 3 & 5 & 1 \\ 3 & -4 & 1 \\ 3 & -1 & -2 \end{pmatrix} .

That is,

T −1 ((u, v, w)T ) = ( u/3 + 5v/9 + w/9 , u/3 − 4v/9 + w/9 , u/3 − v/9 − 2w/9 )T .
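The claimed inverse can be verified by exact matrix multiplication. A Python sketch (the helper names are our own):

```python
from fractions import Fraction as F

A = [[1, 1, 1],
     [1, -1, 0],
     [1, 2, -3]]
# the claimed inverse, with the common factor 1/9 applied entry-wise
A_inv = [[F(3, 9), F(5, 9), F(1, 9)],
         [F(3, 9), F(-4, 9), F(1, 9)],
         [F(3, 9), F(-1, 9), F(-2, 9)]]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul(A, A_inv) == I)  # True: this really is the inverse of A_T
```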
We have the following result, which shows that if we know what a linear transformation
does to the basis of a vector space, then we know what it does to any vector.
Theorem 15.2 Let V be a finite-dimensional vector space and let T be a linear transformation from V to a vector space W . Then T is completely determined by what it does to a basis of V .
Read the proof of this theorem in the A-H text, where it is labelled Theorem 7.20.
The null space is also called the kernel, and may be denoted ker(T ) in some texts.
Of course, for any matrix A, R(TA ) = R(A) and N (TA ) = N (A).
The range and null space of a linear transformation T : V → W are subspaces of W and
V , respectively.
Example 15.6 We find the null space and range of the linear transformation
S : R2 → R4 ,

S \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x + y \\ x \\ x - y \\ y \end{pmatrix} .
The matrix of the linear transformation is
AS = \begin{pmatrix} 1 & 1 \\ 1 & 0 \\ 1 & -1 \\ 0 & 1 \end{pmatrix} .
Observe that this matrix has rank 2 (by having two linearly independent columns, or
you could alternatively see this by putting it into row echelon form), so that
N (S) = {0}, the subspace of R2 consisting of only the zero vector. This can also be
seen directly from the fact that
(x + y, x, x − y, y)T = (0, 0, 0, 0)T ⇐⇒ x = 0, y = 0 ⇐⇒ (x, y)T = (0, 0)T .
The range, R(S) is the two-dimensional subspace of R4 with basis given by the
column vectors of AS .
15.7. Rank and nullity
If T : V → W is a linear transformation and V is finite-dimensional, then rank(T ) + nullity(T ) = dim(V ). (Note that this result holds even if W is not finite-dimensional.)
Read the proof of this theorem in the A-H text, where it is labelled Theorem
7.25. Note the differences (and similarities) between the statement and proof of this
theorem and the rank-nullity theorem for matrices.
For an m × n matrix A, if T = TA , then T is a linear transformation from V = Rn to
W = Rm , and rank(T ) = rank(A), nullity(T ) = nullity(A), so this theorem states the
earlier result that
rank(A) + nullity(A) = n.
AT = \begin{pmatrix} 1 & 0 & -1/3 \\ 0 & 1 & -2/3 \\ 0 & 0 & 0 \end{pmatrix} .
Overview
We have defined linear transformations and seen some examples. An important theme
has been the connection between linear transformations and matrices. This gives us a
very useful way to think about matrices: not simply as arrays of numbers on which one
can perform algebra, but as representations of linear transformations. That conception
of matrices will enable us to understand better the key topic of change of basis (in the
next chapter) and diagonalisation (following that).
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
Comments on selected activities
as required. On the other hand, suppose that for all u, v ∈ V and α, β ∈ R, we have
T (αu + βv) = αT (u) + βT (v). Then property 1 follows on taking α = β = 1, and property 2 follows on taking β = 0.
Feedback to activity 15.2
T (αu) = pαu and T (u) = pu , so we need to check that pαu = αpu . Now, for all x, pαu (x) = αpu (x), as required.
Feedback to activity 15.3
Rotation by π radians is given by matrix A, whereas reflection in the y-axis is given by
matrix B:
A = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} ,   B = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} .
so αv ∈ R(T ).
Now consider N (T ). It is non-empty because the fact that T (0) = 0 shows 0 ∈ N (T ).
Suppose u, v ∈ N (T ) and α ∈ R. Then to show u + v ∈ N (T ) and αu ∈ N (T ), we must
show that T (u + v) = 0 and T (αu) = 0. We have
T (u + v) = T (u) + T (v) = 0 + 0 = 0
and
T (αu) = α(T (u)) = α0 = 0,
so we have shown what we needed.
Chapter 16
Coordinates and change of basis
Introduction
This chapter looks at how vectors can be represented in terms of their coordinates with
respect to a basis, and explores how these representations change if the basis is changed.
Equally, linear transformations can be represented as matrices with respect to different
bases, and this chapter explains how these representations change if the bases are
changed. It is quite technical material, but vital for a proper understanding of the next
topic, diagonalisation.
Aims
The aims of this chapter are to:
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Sections 7.3
and 7.4.
Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.
Synopsis
We define the coordinates of a vector with respect to a given basis. Two different bases
will give different coordinates for a given vector and we explain how these two different
coordinates are connected, through a transition matrix. We have seen that linear
transformations can be represented by matrices, but we then look at how we may do
this with respect to any bases, not just the standard ones (which we will see is what we
have done up to now). The way in which these matrices change when we change the
bases in question brings us to the concept of similarity of two matrices.
If x = α1 v1 + α2 v2 + · · · + αn vn , then the vector

[x]B = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}_B

is called the coordinate vector of x with respect to the basis B = {v1 , v2 , . . . , vn }.
One very straightforward observation is that the coordinate vector of any x ∈ Rn with
respect to the standard basis is just x itself. This is because if x = (x1 , x2 , . . . , xn )T ,
x = x1 e1 + x2 e2 + · · · + xn en .
What is less immediately obvious is how to find the coordinates of a vector x with
respect to a basis other than the standard one.
If x is the vector (5, 7, −2)T , then the coordinate vector of x with respect to the basis B = {(1, 2, 3)T , (2, −1, 3)T , (3, 2, −1)T } is

[x]B = \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}_B ,

because

x = 1 \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + (−1) \begin{pmatrix} 2 \\ -1 \\ 3 \end{pmatrix} + 2 \begin{pmatrix} 3 \\ 2 \\ -1 \end{pmatrix} .
16.1. Coordinates and coordinate change
where the column vectors of the matrix are the new basis vectors, v1 , v2 , so the
matrix is also the transition matrix from B coordinates to standard coordinates;
that is, we have v = PB [v]B . Then the coordinates of a vector with respect to the
new basis are given by [v]B = PB−1 v. The inverse of rotation anticlockwise is rotation clockwise, so we have

PB−1 = \begin{pmatrix} \cos(-π/4) & -\sin(-π/4) \\ \sin(-π/4) & \cos(-π/4) \end{pmatrix} = \begin{pmatrix} \cos(π/4) & \sin(π/4) \\ -\sin(π/4) & \cos(π/4) \end{pmatrix} = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix} .
What are its coordinates in the new basis B? We can find these directly since we have x = √2 v1 + 0 v2 , and in B coordinates

[v1 ]B = \begin{pmatrix} 1 \\ 0 \end{pmatrix}_B and [v2 ]B = \begin{pmatrix} 0 \\ 1 \end{pmatrix}_B ,

so that

[x]B = √2 \begin{pmatrix} 1 \\ 0 \end{pmatrix}_B = \begin{pmatrix} \sqrt{2} \\ 0 \end{pmatrix}_B .

Note that

x = PB [x]B = \begin{pmatrix} 1/\sqrt{2} & -1/\sqrt{2} \\ 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix} \begin{pmatrix} \sqrt{2} \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} .
Given a basis B of Rn with transition matrix PB , and another basis B′ with transition matrix PB′ , how do we change from coordinates in the basis B to coordinates in the basis B′ ?
The answer is quite simple. First we change from B coordinates to standard coordinates using v = PB [v]B , and then change from standard coordinates to B′ coordinates using [v]B′ = PB′−1 v. That is,

[v]B′ = PB′−1 PB [v]B .

Writing M = PB′−1 PB = PB′−1 (v1 v2 . . . vn ), each column of the matrix M is obtained by multiplying the matrix PB′−1 by the corresponding column of PB . But PB′−1 vi is just the B′ coordinates of the vector vi , so the matrix M is given by

M = ([v1 ]B′ [v2 ]B′ · · · [vn ]B′ ).
Read Section 7.3 in the text A-H and work through the activity labelled
Activity 7.33 there.
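The coordinate-change rule [v]B′ = PB′−1 PB [v]B can be tried numerically. A Python sketch with two illustrative bases of R2 (the bases and helper names are our own, not taken from the guide):

```python
from fractions import Fraction as F

def inv2(M):
    # inverse of a 2x2 matrix via the adjugate / determinant formula
    (a, b), (c, d) = M
    det = F(a * d - b * c)
    return [[ d / det, -b / det],
            [-c / det,  a / det]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

P_B  = [[1, 1], [2, -1]]   # columns: basis B  = {(1, 2)^T, (1, -1)^T}
P_Bp = [[1, 1], [0, 1]]    # columns: basis B' = {(1, 0)^T, (1, 1)^T}

M = matmul(inv2(P_Bp), P_B)   # transition matrix from B to B' coordinates
v_B = [[-1], [3]]             # [v]_B, so v = -v1 + 3v2 = (2, -5)^T
v_Bp = matmul(M, v_B)
print(v_Bp)  # [v]_B' is (7, -5)^T: indeed 7*(1,0) + (-5)*(1,1) = (2, -5)
```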
16.2. Change of basis and similarity
[T (x)]B′ = M [x]B .

The matrix A[B,B′] is called the matrix representing T with respect to bases B and B′ . (This could also be denoted AT [B,B′] to show the correspondence with the linear transformation T .)
Read the proof of this theorem in the text A-H where it is labelled Theorem 7.36. The theorem states that A[B,B′] = ([T (v1 )]B′ [T (v2 )]B′ · · · [T (vn )]B′ ). According to the proof of this theorem, this matrix is obtainable as

A[B,B′] = PB′−1 AT PB .
16.2.2 Similarity
A particular case of Theorem 16.2 is so important it is worth stating separately. It
corresponds to the case in which m = n and B 0 = B.
Theorem 16.3 Suppose that T : Rn → Rn is a linear transformation and that
B = {x1 , x2 , . . . , xn } is some basis of Rn . Let
P = (x1 x2 . . . xn )
be the matrix whose columns are the vectors of B. Then for all x ∈ Rn ,
[T (x)]B = P −1 AP [x]B .

In other words,
A[B,B] = P −1 AP.
The relationship between the matrices A[B,B] and A is a central one in the theory of
linear algebra. The matrix A[B,B] performs the same linear transformation as the matrix
A, only A[B,B] describes it in terms of the basis B rather than in standard coordinates.
This likeness of effect inspires the following definition.
Definition 16.2 (Similarity) We say that two square matrices A and M are similar if
there is an invertible (non-singular) matrix P such that M = P −1 AP .
Note that ‘similar’ has a very precise meaning here: it doesn’t mean that the matrices
somehow ‘look like’ each other (as the usual use of the word similar would suggest), but
that they represent the same linear transformation in different bases.
As we shall see in the remaining chapters, this relationship can be used to great advantage if the new basis B is chosen carefully.
Read Section 7.4 in the text A-H (including Example 7.40 and Example 7.42).
Overview
We defined the coordinates of a vector with respect to a given basis and studied how
these change when the basis changes. We have also investigated the representation of
linear transformations by matrices, with respect to any bases on each of the two vector
spaces (the one it maps from and the one it maps to). The concept of similarity of
matrices then arose naturally, as two matrices are similar if they represent the same
linear transformation of a vector space to itself, but possibly with respect to a different
basis.
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
know what is meant by the coordinate vector of a vector with respect to a basis
and be able to determine this
find the matrix representation of a transformation with respect to two given bases
know how to change between different bases of a vector space
know what it means to say that two square matrices are similar.
16.2. Test your knowledge and understanding
Chapter 17
Diagonalisation
Introduction
One of the most useful techniques in applications of matrices and linear algebra is
diagonalisation. This relies on the topic of eigenvalues and eigenvectors, and is related
to change of basis. We will learn how to find eigenvalues and eigenvectors of an n × n
matrix, how to diagonalise a matrix when it is possible to do so, and also how to
recognise when it is not possible. We shall see in the next chapter how useful a
technique diagonalisation is.
All matrices in this chapter are square n × n matrices with real entries, so all vectors
will be in Rn for some n.
Aims
The aims of this chapter are to:
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Chapter 8,
except Section 8.3.4.
Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.
Synopsis
We define eigenvalues and eigenvectors of a square matrix and show how to find these,
through the characteristic polynomial of the matrix. We then say what it means to
show that a matrix is diagonalisable, and we relate this to the existence of a basis, the
members of which are eigenvectors of the matrix. We explain this by drawing on the
material of the previous two chapters, through our understanding that a matrix
represents a linear transformation. We explain that diagonalisation is not always
possible but, when it is, we show how to achieve it.
17.1.1 Definitions
Definition 17.1 Suppose that A is a square matrix. The number λ is said to be an
eigenvalue of A if for some non-zero vector x, Ax = λx. Any non-zero vector x for
which this equation holds is called an eigenvector for eigenvalue λ or an
eigenvector of A corresponding to eigenvalue λ.
17.1. Eigenvalues and eigenvectors
|A − λI| = λ2 − 3λ − 28 + 30 = λ2 − 3λ + 2.
So the eigenvalues are the solutions of λ2 − 3λ + 2 = 0. To solve this for λ, one could
use either the formula for the solutions to a quadratic equation, or simply observe
that the characteristic polynomial factorises. We have (λ − 1)(λ − 2) = 0 with
solutions λ = 1 and λ = 2. Hence the eigenvalues of A are 1 and 2, and these are the
only eigenvalues of A.
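For a 2 × 2 matrix, |A − λI| = λ2 − (trace)λ + (determinant), so the eigenvalues follow from the quadratic formula. A Python sketch; since the matrix of this example is not reproduced in full here, the matrix below is a hypothetical one of our own with the same characteristic polynomial λ2 − 3λ + 2:

```python
import math

# A hypothetical 2x2 matrix with trace 3 and determinant 2, so its
# characteristic polynomial is lambda^2 - 3*lambda + 2 (illustrative only).
A = [[3, -1],
     [2, 0]]

tr = A[0][0] + A[1][1]                        # trace = 3
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # determinant = 2
# solve lambda^2 - tr*lambda + det = 0 with the quadratic formula
disc = math.sqrt(tr * tr - 4 * det)
eigenvalues = sorted([(tr - disc) / 2, (tr + disc) / 2])
print(eigenvalues)  # [1.0, 2.0]
```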
So why do we prefer row operations? There are two reasons. The first reason is that the
system of equations may not be as simple as the one just given, particularly for an
n × n matrix where n > 2. The second reason is that putting the matrix A − λI into
echelon form provides a useful check on the eigenvalue. If |A − λI| = 0, the echelon form
of A − λI must have a row of zeros, and the system (A − λI)x = 0 will have a
non-trivial solution. If we have reduced the matrix (A − λ0 I) for some supposed
eigenvalue λ0 and do not obtain a zero row, we know immediately that there is an error,
either in the row reduction or in the choice of λ0 , and we can go back and correct it.
Examples in R3
Find the eigenvalues of A and find the corresponding eigenvectors for each
eigenvalue.
To find the eigenvalues we solve |A − λI| = 0. Now,

|A − λI| = \begin{vmatrix} 4-λ & 0 & 4 \\ 0 & 4-λ & 4 \\ 4 & 4 & 8-λ \end{vmatrix} = (4 − λ) \begin{vmatrix} 4-λ & 4 \\ 4 & 8-λ \end{vmatrix} + 4 \begin{vmatrix} 0 & 4-λ \\ 4 & 4 \end{vmatrix}

= (4 − λ)((4 − λ)(8 − λ) − 16) + 4(−4(4 − λ)) = (4 − λ)((4 − λ)(8 − λ) − 16) − 16(4 − λ).
We notice that each of the two terms in this expression has 4 − λ as a factor, so
instead of expanding everything, we take 4 − λ out as a common factor, obtaining

|A − λI| = (4 − λ)((4 − λ)(8 − λ) − 16 − 16) = (4 − λ)(λ2 − 12λ) = λ(4 − λ)(λ − 12).

It follows that the eigenvalues are 4, 0 and 12. (The characteristic polynomial will not
always factorise so easily. Here it was simple because of the common factor 4 − λ.
The next example is more difficult.)
To find an eigenvector for the eigenvalue 4, we have to solve the equation (A − 4I)x = 0 for x = (x1 , x2 , x3 )T . Using row operations, we have

A − 4I = \begin{pmatrix} 0 & 0 & 4 \\ 0 & 0 & 4 \\ 4 & 4 & 4 \end{pmatrix} −→ · · · −→ \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} .
Activity 17.1 Determine the eigenvectors for 0 and 12. Check your answers: verify
that Av = λv for each eigenvalue and one corresponding eigenvector.
Now, the fact that −1 is an eigenvalue means that −1 is a solution of the equation
|A − λI| = 0. (You should immediately check the characteristic polynomial we
obtained by verifying that λ = −1 does, indeed, satisfy |A − λI| = 0.) This means
that λ − (−1), that is, λ + 1, is a factor of the characteristic polynomial |A − λI|. So
this characteristic polynomial can be written in the form

−(λ + 1)(aλ2 + bλ + c).
Clearly we must have a = 1 and c = 2 to obtain the correct λ3 term and the correct
constant. Using this, and comparing the coefficients of either λ2 or λ with the cubic
polynomial, we find b = 3. In other words, the characteristic polynomial is
−(λ3 + 4λ2 + 5λ + 2) = −(λ + 1)(λ2 + 3λ + 2) = −(λ + 1)(λ + 2)(λ + 1).
Activity 17.2 Perform the calculations to check that b = 3 and that the
characteristic polynomial factorises as stated.
We have |A − λI| = −(λ + 1)2 (λ + 2). The eigenvalues are the solutions to |A − λI| = 0, so they are λ = −1 and λ = −2.
Note that in this case, there are only two distinct eigenvalues. We say that the eigenvalue −1 has occurred twice, or that λ = −1 is an eigenvalue of multiplicity 2. We will find the eigenvectors when we look at this example again in Section 17.3.
17.1.3 Eigenspaces
If A is an n × n matrix and λ is an eigenvalue of A, then the set of eigenvectors
corresponding to the eigenvalue λ together with the zero vector, 0, is a subspace of Rn .
Why?
We have already seen that the null space of any m × n matrix is a subspace of Rn . The
null space of the n × n matrix A − λI consists of all solutions to the matrix equation (A − λI)x = 0, which is precisely the set of all eigenvectors corresponding to λ together with the vector 0.
Definition 17.3 (Eigenspace) If A is an n × n matrix and λ is an eigenvalue of A,
then the eigenspace of the eigenvalue λ is the subspace N (A − λI) of Rn .
Eigenvectors corresponding to distinct eigenvalues are linearly independent; in particular, an n × n matrix with n distinct eigenvalues has n linearly independent eigenvectors.
Read a more detailed discussion of this theorem at the beginning of Section 8.1.4
of the text A-H (where the theorem is labelled Theorem 8.11). Section 8.1.4 also
discusses the trace of a matrix, which is entirely optional for this course.
17.2 Diagonalisation of a square matrix
Recall that square matrices A and M are similar if there is an invertible matrix P such
that P −1 AP = M . We met this idea earlier when we looked at how a matrix
238
17.2. Diagonalisation of a square matrix
representing a linear transformation changes when the basis is changed. We now begin
to explore why this is such an important and useful concept.
Definition 17.4 The matrix A is diagonalisable if it is similar to a diagonal matrix;
in other words, if there is a diagonal matrix D and an invertible matrix P such that
P −1 AP = D.
Suppose that A is diagonalisable, so that P⁻¹AP = D, where

\[
D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) =
\begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix}.
\]
(Note the useful notation for describing the diagonal matrix D.) Then we have
AP = P D. If the columns of P are the vectors v₁, v₂, …, vₙ, then

\[ AP = A(\mathbf{v}_1 \ \ldots \ \mathbf{v}_n) = (A\mathbf{v}_1 \ \ldots \ A\mathbf{v}_n) \]

and

\[
PD = (\mathbf{v}_1 \ \ldots \ \mathbf{v}_n)
\begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix}
= (\lambda_1\mathbf{v}_1 \ \ldots \ \lambda_n\mathbf{v}_n).
\]
So this means that Avᵢ = λᵢvᵢ for i = 1, 2, …, n.
The fact that P −1 exists means that none of the vectors vi is the zero vector. So this
means that (for i = 1, 2, . . . , n) λi is an eigenvalue of A and vi is a corresponding
eigenvector. Since P has an inverse, these eigenvectors are linearly independent.
Therefore, A has n linearly independent eigenvectors. Conversely, if A has n linearly
independent eigenvectors, then the matrix P whose columns are these eigenvectors will
be invertible, and we will have P −1 AP = D where D is a diagonal matrix with entries
equal to the eigenvalues of A. We have therefore established the following result.
Theorem 17.2 An n × n matrix A is diagonalisable if and only if it has n linearly
independent eigenvectors.
Since n linearly independent vectors in Rn form a basis of Rn , another way to state this
theorem is:
Theorem 17.3 An n × n matrix A is diagonalisable if and only if there is a basis of
Rn consisting of eigenvectors of A.
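Theorems 17.2 and 17.3 can be checked numerically. The following sketch (using numpy; it is not part of the guide, and the matrix is just an arbitrary example with three distinct eigenvalues) builds P from a set of eigenvectors and confirms that P⁻¹AP is diagonal:

```python
import numpy as np

# An arbitrary diagonalisable matrix (it has three distinct eigenvalues)
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# np.linalg.eig returns the eigenvalues and a matrix whose columns
# are corresponding eigenvectors
eigenvalues, P = np.linalg.eig(A)

# P is invertible iff the eigenvectors are linearly independent,
# and in that case P^{-1} A P = D = diag(eigenvalues)
D = np.linalg.inv(P) @ A @ P

# Off-diagonal entries of D should vanish (up to rounding error)
off_diagonal = D - np.diag(np.diag(D))
assert np.allclose(off_diagonal, 0)
assert np.allclose(np.diag(D), eigenvalues)
```

Note that the order of the eigenvalues along the diagonal of D matches the order in which the eigenvectors appear as columns of P, exactly as in the theory above.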
Suppose that this is the case, and let v1 , . . . , vn be n linearly independent eigenvectors,
where vi is an eigenvector for eigenvalue λi . Then the vectors form a basis of Rn , and
the matrix P = (v1 . . . vn ) is such that P −1 exists, and P −1 AP = D where
D = diag(λ1 , . . . , λn ).
Suppose A is diagonalisable, let B = {v₁, …, vₙ} be a basis of Rⁿ consisting of
eigenvectors of A, and let T = T_A be the linear transformation given by T(x) = Ax.
Recall that the matrix of T with respect to the basis B is A_[B,B] = P⁻¹A_T P, where

P = (v₁ … vₙ).

P is the matrix whose columns are the basis of eigenvectors of A, and A_T is the matrix
representing T in standard coordinates, which in this case is simply A itself, so that

P⁻¹AP = A_[B,B] = D.
In other words, the matrices A and D are similar. They represent the same linear
transformation, but A does so with respect to the standard basis and D represents T in
the basis of eigenvectors of A.
What does this tell us about the linear transformation T = TA ? If x ∈ Rn is any vector,
then its image under the linear transformation T is particularly easy to calculate in B
coordinates, where B is the basis of eigenvectors of A. That is, suppose the B
coordinates of x are [x]B = [b1 , b2 , . . . , bn ]B , then since [T (x)]B = A[B,B] [x]B = D[x]B , we
have
\[
[T(\mathbf{x})]_B =
\begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}_B
=
\begin{pmatrix} \lambda_1 b_1 \\ \lambda_2 b_2 \\ \vdots \\ \lambda_n b_n \end{pmatrix}_B.
\]
You simply multiply each coordinate by the corresponding eigenvalue.
Geometrically, we can describe the linear transformation T_A as a stretch in the direction
of each eigenvector vᵢ by a factor λᵢ (in the same direction if λᵢ > 0 and in the opposite
direction if λᵢ < 0). Indeed, this can be seen directly. Since Avᵢ = λᵢvᵢ, each vector on
the line tvᵢ, t ∈ R, is mapped to the scalar multiple λᵢtvᵢ by the linear transformation.
If λᵢ = 0, the line tvᵢ is mapped to 0.
Example 17.5 Consider the first 3 × 3 matrix for which we found the eigenvalues
and eigenvectors in Section 17.1.2 (page 236); we will diagonalise the matrix

\[ A = \begin{pmatrix} 4 & 0 & 4 \\ 0 & 4 & 4 \\ 4 & 4 & 8 \end{pmatrix}. \]

We have seen that it has three distinct eigenvalues 0, 4, 12. From the eigenvectors
we found, we take one eigenvector corresponding to each of the eigenvalues
λ₁ = 4, λ₂ = 0, λ₃ = 12, in that order:

\[
\mathbf{v}_1 = \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \quad
\mathbf{v}_2 = \begin{pmatrix} -1 \\ -1 \\ 1 \end{pmatrix}, \quad
\mathbf{v}_3 = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}.
\]
You can choose any order for listing the eigenvectors as the columns of the matrix
P , as long as you write the corresponding eigenvalues in the corresponding columns
of D, that is, as long as the column orders in P and D match. (If, for example, we
had chosen P̂ = (v₂ v₁ v₃), then D̂ = diag(0, 4, 12).)
As soon as you have written down the matrices P and D, you should check that
your eigenvectors are correct. That is, check that AP = P D (equivalently, that
Avᵢ = λᵢvᵢ for each column vᵢ of P).
Activity 17.3 Carry out this calculation to check that the eigenvectors are correct,
that is, check that the columns of P are eigenvectors of A corresponding to the
eigenvalues 4, 0, 12.
Then, according to the theory, if P has an inverse, that is, if the eigenvectors are
linearly independent, then P −1 AP = D = diag(4, 0, 12).
Activity 17.4 Check that P is invertible. Then find P −1 (the inverse may be
calculated using either elementary row operations or the cofactor method) and verify
that P −1 AP = D.
Note how important it is to have checked P first. Calculating the inverse of an incorrect
matrix P would have been a huge wasted effort.
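A mechanical check of this example is also easy. The following sketch (using numpy; not part of the guide) confirms the cheap condition AP = P D first, and then the full diagonalisation P⁻¹AP = diag(4, 0, 12):

```python
import numpy as np

A = np.array([[4.0, 0.0, 4.0],
              [0.0, 4.0, 4.0],
              [4.0, 4.0, 8.0]])

# Columns are the eigenvectors v1, v2, v3 for eigenvalues 4, 0, 12
P = np.array([[-1.0, -1.0, 1.0],
              [ 1.0, -1.0, 1.0],
              [ 0.0,  1.0, 2.0]])
D = np.diag([4.0, 0.0, 12.0])

# Cheap check first: AP = PD, i.e. A v_i = lambda_i v_i for each column
assert np.allclose(A @ P, P @ D)

# Only then is it worth inverting P: the full diagonalisation
assert np.allclose(np.linalg.inv(P) @ A @ P, D)
```

The order of the two checks mirrors the advice above: verify the eigenvectors before spending any effort on the inverse.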
Activity 17.5 Geometrically, how would you describe the linear transformation
TA (x) = Ax for this example?
17.3 When is diagonalisation possible?
The following result shows that if a matrix has n different eigenvalues then it is
diagonalisable, because the matrix will have n linearly independent eigenvectors.
Theorem 17.4 Eigenvectors corresponding to different eigenvalues are linearly
independent.

Read the proof of this theorem from the A-H text, where it is labelled
Theorem 8.33.
Using these results we have the important conclusion: if an n × n matrix A has n
different eigenvalues, then it has n linearly independent eigenvectors and is therefore
diagonalisable.
It is not, however, necessary for the eigenvalues to be distinct. What is needed for
diagonalisation is a set of n linearly independent eigenvectors, and this can happen even
when there is a ‘repeated’ eigenvalue (that is, when there are fewer than n different
eigenvalues). The following example illustrates this.
Consider the 3 × 3 matrix

\[ A = \begin{pmatrix} 3 & -1 & 1 \\ 0 & 2 & 0 \\ 1 & -1 & 3 \end{pmatrix}. \]

Expanding the determinant |A − λI| by the second row, the characteristic polynomial is

\[
|A - \lambda I| = (2-\lambda)\left((3-\lambda)^2 - 1\right)
= (2 - \lambda)(\lambda^2 - 6\lambda + 8)
= (2 - \lambda)(\lambda - 4)(\lambda - 2)
= -(\lambda - 2)^2(\lambda - 4).
\]

So λ = 2 is an eigenvalue of multiplicity 2 and λ = 4 is an eigenvalue of multiplicity 1.
To find the eigenvectors for λ = 2, we row reduce

\[ A - 2I = \begin{pmatrix} 1 & -1 & 1 \\ 0 & 0 & 0 \\ 1 & -1 & 1 \end{pmatrix}. \]
We see immediately that this matrix has rank 1, so its null space (the eigenspace for
λ = 2) will have dimension 2, and we can find a basis of this space consisting of two
linearly independent eigenvectors. Setting the non-leading variables equal to
arbitrary parameters s and t, we find that the solutions of (A − 2I)x = 0 are

\[
\mathbf{x} = s \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}
+ t \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}
= s\mathbf{v}_1 + t\mathbf{v}_2, \qquad s, t \in \mathbb{R}.
\]
Activity 17.6 How do you know that v1 and v2 are linearly independent?
Now, knowing that we will be able to diagonalise A, we find the eigenvector for λ = 4
by reducing (A − 4I).
\[
A - 4I = \begin{pmatrix} -1 & -1 & 1 \\ 0 & -2 & 0 \\ 1 & -1 & -1 \end{pmatrix}
\longrightarrow \cdots \longrightarrow
\begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\]
with solutions

\[ \mathbf{x} = t \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \qquad t \in \mathbb{R}. \]

Let

\[ \mathbf{v}_3 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}. \]
The eigenvectors corresponding to distinct eigenvalues are linearly independent, so the
vectors v₁, v₂, v₃ form a linearly independent set. Then we may take

\[
P = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}
\quad \text{and} \quad
P^{-1}AP = D = \begin{pmatrix} 4 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{pmatrix}.
\]
Activity 17.7 Check this! Check that AP = P D. Once you have checked that the
columns of P are the eigenvectors corresponding to the eigenvalues in the
corresponding columns of D, the theory will tell you that P −1 AP = D. Why?
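If you would like a mechanical confirmation of this repeated-eigenvalue example, the sketch below (using numpy; not part of the guide) recovers A by adding 4I back to the matrix A − 4I shown above, and then runs both checks:

```python
import numpy as np

# A - 4I was row reduced above; adding 4I back recovers A
A_minus_4I = np.array([[-1.0, -1.0,  1.0],
                       [ 0.0, -2.0,  0.0],
                       [ 1.0, -1.0, -1.0]])
A = A_minus_4I + 4.0 * np.eye(3)

# Columns: v3 (eigenvalue 4), then v1 and v2 (both eigenvalue 2)
P = np.array([[1.0, 1.0, -1.0],
              [0.0, 1.0,  0.0],
              [1.0, 0.0,  1.0]])
D = np.diag([4.0, 2.0, 2.0])

assert np.allclose(A @ P, P @ D)                 # eigenvector check: AP = PD
assert np.allclose(np.linalg.inv(P) @ A @ P, D)  # full diagonalisation
```

Once AP = P D is verified and P is known to be invertible, P⁻¹AP = D follows by multiplying both sides on the left by P⁻¹, which is exactly the answer to the "Why?" in the activity.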
Example 17.8 Consider again the last 3 × 3 example in section 17.1.2. We found
that the matrix

\[ A = \begin{pmatrix} -3 & -1 & -2 \\ 1 & -1 & 1 \\ 1 & 1 & 0 \end{pmatrix} \]
has an eigenvalue λ1 = −1 of multiplicity 2, and a second eigenvalue, λ2 = −2. We
can find one (linearly independent) eigenvector corresponding to λ2 = −2. In order
to diagonalise this matrix we need two linearly independent eigenvectors for λ = −1.
To see if this is possible, we row reduce the matrix (A + I):
\[
A + I = \begin{pmatrix} -2 & -1 & -2 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{pmatrix}
\longrightarrow \cdots \longrightarrow
\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]
This matrix has rank 2, and the null space (the eigenspace for λ = −1) therefore (by
the rank–nullity theorem) has dimension 1. We can only find one linearly
independent eigenvector for λ = −1. All solutions of (A + I)x = 0 are of the form

\[ \mathbf{x} = t \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \qquad t \in \mathbb{R}. \]

So A has at most two linearly independent eigenvectors (one for λ = −1 and one for
λ = −2), and therefore A is not diagonalisable.
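The dimension count in this example can be confirmed numerically. The sketch below (using numpy; not part of the guide) checks that A + I has rank 2, so its null space — the eigenspace for λ = −1 — is only one-dimensional:

```python
import numpy as np

A = np.array([[-3.0, -1.0, -2.0],
              [ 1.0, -1.0,  1.0],
              [ 1.0,  1.0,  0.0]])

# rank(A + I) = 2, so dim N(A + I) = 3 - 2 = 1: only one linearly
# independent eigenvector for the eigenvalue -1 of multiplicity 2
rank = np.linalg.matrix_rank(A + np.eye(3))
assert rank == 2

# The eigenvector found by hand: (A + I)x = 0 for x = (-1, 0, 1)^T
x = np.array([-1.0, 0.0, 1.0])
assert np.allclose((A + np.eye(3)) @ x, 0)
```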
There is another reason why a matrix A may not be diagonalisable over the real
numbers: its characteristic polynomial may have fewer than n real roots (counted with
multiplicity), so that A has too few real eigenvalues. For example, a rotation of R²
through a quarter turn has characteristic polynomial λ² + 1, which has no real roots at
all, so such a matrix has no real eigenvalues and no real eigenvectors.
Overview
We defined eigenvalues and eigenvectors of a square matrix and explained the method
of finding these. We then said what it means for a matrix to be diagonalisable and
showed how to diagonalise a matrix when it is possible to do so.
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
Similarly, you should find that for λ = 12 the eigenvectors are non-zero multiples of

\[ \mathbf{v}_3 = \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}. \]
Check that PP⁻¹ = I. You have calculated AP in the previous activity, so now just
multiply P⁻¹(AP) to obtain D.
Chapter 18
Applications of diagonalisation
Introduction
We will now look at some applications of diagonalisation. We apply diagonalisation to
find powers of diagonalisable matrices and solve systems of simultaneous linear
difference equations. We also look at the topic of Markov chains. You should try to
understand why the diagonalisation process makes the solution possible, by essentially
changing basis to one in which the problem is readily solvable, namely the basis of Rn
consisting of eigenvectors of the matrix.
Aims
The aims of this chapter are to:
Essential reading
Anthony, M. and M. Harvey. Linear Algebra: Concepts and Methods. Sections 9.1
and 9.2.
Further reading
If you would like to have another source containing the material covered in this chapter,
you can look up the key concepts and definitions (as listed in the synopsis which
follows) in any elementary linear algebra textbook, such as the two listed in Chapter 1,
using either the table of contents or the index.
Synopsis
We start by showing how to determine the general nth power of a matrix if that matrix
can be diagonalised. We then look at systems of linear difference equations. We discuss
two ways of solving these: using a change of variable, and using powers of matrices.
Underlying both these methods is the diagonalisation of the matrix that corresponds to
the coefficients of the system of equations. We then look at Markov chains, where we
use powers of matrices and look at some properties of Markov chains.
18.1 Powers of matrices

It is often useful, as we shall see in this chapter, to determine Aⁿ for a general integer n.
Diagonalisation helps here. If we can write P⁻¹AP = D, then A = PDP⁻¹ and so
\[
\begin{aligned}
A^n &= \underbrace{AA\cdots A}_{n \text{ times}} \\
&= \underbrace{(PDP^{-1})(PDP^{-1})\cdots(PDP^{-1})}_{n \text{ times}} \\
&= PD(P^{-1}P)D(P^{-1}P)\cdots D(P^{-1}P)DP^{-1} \\
&= PDIDI\cdots DIDP^{-1} \\
&= P\underbrace{DD\cdots D}_{n \text{ times}}P^{-1} \\
&= PD^nP^{-1}.
\end{aligned}
\]
The product PDⁿP⁻¹ is easy to compute, since Dⁿ is simply the diagonal matrix with
entries equal to the nth powers of those of D: if

\[
D = \begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_k
\end{pmatrix},
\quad \text{then} \quad
D^n = \begin{pmatrix}
\lambda_1^n & 0 & \cdots & 0 \\
0 & \lambda_2^n & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_k^n
\end{pmatrix}.
\]
We give an illustrative example using a 2 × 2 matrix A, but you should be able to carry
out the procedure for 3 × 3 matrices as well.
Example 18.1 Suppose that we want a matrix expression for the nth power of the
matrix

\[ A = \begin{pmatrix} 1 & 4 \\ \tfrac{1}{2} & 0 \end{pmatrix}. \]
The characteristic polynomial of A is λ² − λ − 2 = (λ + 1)(λ − 2), so the eigenvalues
are −1 and 2. For λ = −1, the eigenvectors satisfy x = −2y, so we may take (2, −1)ᵀ;
for λ = 2, they satisfy x = 4y, so we may take (4, 1)ᵀ. Let P be the matrix whose
columns are these eigenvectors. Then
\[ P = \begin{pmatrix} 2 & 4 \\ -1 & 1 \end{pmatrix}. \]

The inverse is

\[ P^{-1} = \frac{1}{6}\begin{pmatrix} 1 & -4 \\ 1 & 2 \end{pmatrix}, \]

and then Aⁿ = PDⁿP⁻¹, where D = diag(−1, 2) matches the column order of P.
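Putting the pieces of this example together numerically, the sketch below (using numpy; not part of the guide) forms Aⁿ = PDⁿP⁻¹ for a particular n and compares it against direct repeated multiplication:

```python
import numpy as np

A = np.array([[1.0, 4.0],
              [0.5, 0.0]])

# Eigenvectors (2, -1)^T for eigenvalue -1 and (4, 1)^T for eigenvalue 2
P = np.array([[ 2.0, 4.0],
              [-1.0, 1.0]])
P_inv = (1.0 / 6.0) * np.array([[1.0, -4.0],
                                [1.0,  2.0]])

n = 7
# A^n = P D^n P^{-1}, with D^n = diag((-1)^n, 2^n)
A_power = P @ np.diag([(-1.0) ** n, 2.0 ** n]) @ P_inv

# Compare with direct repeated multiplication of A
assert np.allclose(A_power, np.linalg.matrix_power(A, n))
assert np.allclose(P @ P_inv, np.eye(2))
```

The diagonalised form gives a closed formula valid for every n, whereas repeated multiplication must be redone for each new power.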
18.2 Systems of difference equations

Suppose that the sequences xₜ, yₜ, zₜ satisfy the coupled system of difference equations

xₜ₊₁ = 5xₜ + 4zₜ   (18.1)
yₜ₊₁ = 5yₜ + 4zₜ   (18.2)
zₜ₊₁ = 4xₜ + 4yₜ + 9zₜ.   (18.3)
We cannot directly solve equation (18.1) for xt since we would need to know zt . On the
other hand we can’t work out zt directly from equation (18.2) or equation (18.3)
because to do so we would need to know yt ! It seems impossible, perhaps, but there are
ways to proceed.
Note that this (coupled) system of difference equations can be expressed as

\[
\begin{pmatrix} x_{t+1} \\ y_{t+1} \\ z_{t+1} \end{pmatrix}
=
\begin{pmatrix} 5 & 0 & 4 \\ 0 & 5 & 4 \\ 4 & 4 & 9 \end{pmatrix}
\begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix}.
\]
That is,
xt+1 = Axt ,
where

\[
\mathbf{x}_t = \begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix}
\quad \text{and} \quad
A = \begin{pmatrix} 5 & 0 & 4 \\ 0 & 5 & 4 \\ 4 & 4 & 9 \end{pmatrix}.
\]
The general system we shall consider will take the form xt+1 = Axt where A is an n × n
square matrix. We shall concentrate on 3 × 3 and 2 × 2 systems, though the method is
applicable to larger values of n.
We shall describe two techniques: one involving a change of variable, and the other
powers of matrices.
The change of variable technique works as follows. If A can be diagonalised, say
P⁻¹AP = D with D diagonal, we make the change of variable zₜ = P⁻¹xₜ, that is,
xₜ = Pzₜ. The equation xₜ₊₁ = Axₜ then becomes Pzₜ₊₁ = APzₜ, so that
zₜ₊₁ = P⁻¹APzₜ = Dzₜ, and this new system is uncoupled because D is diagonal.
To use the technique, we need to diagonalise A. You should work through this
diagonalisation yourself. We’ll omit the workings here, but if
\[ P = \begin{pmatrix} -1 & -1 & 1 \\ -1 & 1 & 1 \\ 1 & 0 & 2 \end{pmatrix} \]
then
P −1 AP = D = diag(1, 5, 13).
Now let

\[ \mathbf{z}_t = \begin{pmatrix} u_t \\ v_t \\ w_t \end{pmatrix} \]

be given by xₜ = Pzₜ. Then the equation xₜ₊₁ = Axₜ gives rise (as explained above)
to zₜ₊₁ = Dzₜ. That is,

\[
\begin{pmatrix} u_{t+1} \\ v_{t+1} \\ w_{t+1} \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 13 \end{pmatrix}
\begin{pmatrix} u_t \\ v_t \\ w_t \end{pmatrix},
\]

which is

uₜ₊₁ = uₜ
vₜ₊₁ = 5vₜ
wₜ₊₁ = 13wₜ.
This is very easy to solve: each equation involves only one sequence, so we have
uncoupled the equations. We have, for all t,
uₜ = u₀, vₜ = 5ᵗv₀, wₜ = 13ᵗw₀.
We have not yet solved the original problem, however, since we need to find xt , yt , zt .
We have
\[
\mathbf{x}_t = \begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix}
= P\mathbf{z}_t
= \begin{pmatrix} -1 & -1 & 1 \\ -1 & 1 & 1 \\ 1 & 0 & 2 \end{pmatrix}
\begin{pmatrix} u_t \\ v_t \\ w_t \end{pmatrix}
= \begin{pmatrix} -1 & -1 & 1 \\ -1 & 1 & 1 \\ 1 & 0 & 2 \end{pmatrix}
\begin{pmatrix} u_0 \\ 5^t v_0 \\ 13^t w_0 \end{pmatrix}.
\]
But we have also to find out what u0 , v0 , w0 are. These are not given in the problem,
but x0 , y0 , z0 are, and we know that
\[
\begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix}
= P \begin{pmatrix} u_0 \\ v_0 \\ w_0 \end{pmatrix}
= \begin{pmatrix} -1 & -1 & 1 \\ -1 & 1 & 1 \\ 1 & 0 & 2 \end{pmatrix}
\begin{pmatrix} u_0 \\ v_0 \\ w_0 \end{pmatrix}.
\]
Suppose that the initial values are x₀ = 12, y₀ = 6, z₀ = 6. We can solve this system
for u₀, v₀, w₀ using row operations, or we can (though it involves more work) find out
what P⁻¹ is and use the fact that

\[
\begin{pmatrix} u_0 \\ v_0 \\ w_0 \end{pmatrix}
= P^{-1} \begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix}
= P^{-1} \begin{pmatrix} 12 \\ 6 \\ 6 \end{pmatrix}.
\]
Either way, we find u₀ = −4, v₀ = −3, w₀ = 5, and therefore, for all t,

xₜ = 4 + 3(5ᵗ) + 5(13ᵗ)
yₜ = 4 − 3(5ᵗ) + 5(13ᵗ)
zₜ = −4 + 10(13ᵗ).
Activity 18.3 Perform the omitted diagonalisation calculations required for the
example just given.
The second technique uses powers of matrices. It can involve rather more
work, but the good news is that it is just a matter of going through a definite (if
time-consuming) procedure. Since xₜ₊₁ = Axₜ for all t, we have x₁ = Ax₀,
x₂ = Ax₁ = A²x₀ and, in general,

xₜ = Aᵗx₀.

This solution can be determined explicitly if we can find the tth power Aᵗ of the matrix
A. As described in Section 18.1, this can be done using diagonalisation of A.
Example 18.3 We solve the system of the above example using matrix powers.
The system is xₜ₊₁ = Axₜ, where

\[ A = \begin{pmatrix} 5 & 0 & 4 \\ 0 & 5 & 4 \\ 4 & 4 & 9 \end{pmatrix}, \]

and, taking P as in the previous example, we have

P⁻¹AP = D = diag(1, 5, 13).
So A = PDP⁻¹ and Aᵗ = PDᵗP⁻¹. Now, as you can calculate (the details are
omitted here),

\[ P^{-1} = \begin{pmatrix} -1/3 & -1/3 & 1/3 \\ -1/2 & 1/2 & 0 \\ 1/6 & 1/6 & 1/3 \end{pmatrix}, \]
so

xₜ = Aᵗx₀ = PDᵗP⁻¹x₀.

Doing the multiplication (again, details omitted),

\[
\mathbf{x}_t
= P \begin{pmatrix} 1 & 0 & 0 \\ 0 & 5^t & 0 \\ 0 & 0 & 13^t \end{pmatrix} P^{-1}
\begin{pmatrix} 12 \\ 6 \\ 6 \end{pmatrix}
= \begin{pmatrix} 4 + 3(5^t) + 5(13^t) \\ 4 - 3(5^t) + 5(13^t) \\ -4 + 10(13^t) \end{pmatrix},
\]
which is, of course, precisely the same answer as we obtained using the previous
method.
Note that although this technique is presented as being different from the one using a
change of variable, they are essentially the same. Here, as before, the matrix P −1 x0
represents the coordinates of the vector x0 (the initial conditions) in the basis of
eigenvectors of A (the columns of P ). In both cases, diagonalisation enables us to solve
the system by a change of basis from the standard basis in Rn to a basis consisting of
eigenvectors of the matrix A.
Example 18.4 Suppose two supermarkets compete for customers in a region with
20,000 shoppers. Assume that no shopper goes to both supermarkets in any week,
and that the matrix A below gives the probabilities that a shopper will change from one
supermarket (or none) to another (or none) during the week.
For example, an interpretation of the second column is that during any given week
supermarket B will keep 80% of its customers while losing 15% to supermarket A
and 5% to no supermarket. Suppose that at the end of a certain week (call it week
zero) it is known that the total population of T = 20, 000 shoppers was distributed
as follows: 10,000 (0.5 T ) went to supermarket A; 8,000 (0.4 T ) went to
supermarket B; and 2,000 (0.1 T ) did not go to a supermarket.
Let xt denote the percentage of total shoppers going to supermarket A in week t, yt
the percentage going to supermarket B, and zt the percentage who do not go to any
supermarket. The number of shoppers in week t can be predicted by this model from
the numbers in the previous week, that is,
\[
\mathbf{x}_t = A\mathbf{x}_{t-1}
\quad \text{where} \quad
A = \begin{pmatrix} 0.70 & 0.15 & 0.30 \\ 0.20 & 0.80 & 0.20 \\ 0.10 & 0.05 & 0.50 \end{pmatrix},
\quad
\mathbf{x}_t = \begin{pmatrix} x_t \\ y_t \\ z_t \end{pmatrix},
\]
with x₀ = 0.5, y₀ = 0.4, z₀ = 0.1. The questions we wish to answer are: can we
predict from this information the number of shoppers at each supermarket in any
future week t, and can we predict a long-term distribution of shoppers?
This is an example of a Markov chain.
In a Markov chain, a population is distributed among n possible states, and the
probability that a member of the population will change from one state to
another, depending on the state it occupied at the previous observation, is known. The
system is then observed at a certain time, and the information is used to predict the
distribution of the system into its different states at a future time t.
The probabilities are listed in an n × n matrix A = (aij ) where the entry aij is the
probability that a member of the population will change from state j into state i. Such
a matrix, called a transition matrix, has the following two properties:

(1) The entries of A satisfy 0 ≤ aij ≤ 1, since they are probabilities.

(2) The sum of the entries in each column of A is equal to 1: a1j + a2j + · · · + anj = 1.

Property (2) follows from the assumption that all members of the population must be
in one of the n states at any given time.
The distribution vector (or state vector) for the time period t is the vector xt ,
whose ith entry is the percentage of the population in state i at time t. The entries of xt
sum to 1, for the reason just given, that all members of the population are in one of the
states at any time. Our first goal is to find the state vector for any t, and to do this we
need to solve the difference equation
xₜ = Axₜ₋₁.

Provided the transition matrix A can be diagonalised, with P⁻¹AP = D, the solution
is, as before,

xₜ = Aᵗx₀ = (PDᵗP⁻¹)x₀.

Writing P⁻¹x₀ = (b₁, b₂, …, bₙ)ᵀ, we have

\[
\mathbf{x}_t = PD^t(P^{-1}\mathbf{x}_0)
=
\begin{pmatrix}
| & | & & | \\
\mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \\
| & | & & |
\end{pmatrix}
\begin{pmatrix}
\lambda_1^t & 0 & \cdots & 0 \\
0 & \lambda_2^t & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n^t
\end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix},
\]

so that xₜ = b₁λ₁ᵗv₁ + b₂λ₂ᵗv₂ + ⋯ + bₙλₙᵗvₙ.
Example 18.5 We find the number of shoppers using each of the supermarkets at
the end of week t, and see if we can use this to predict the long-term distribution of
shoppers.
First diagonalise the matrix A. The characteristic equation of A is

\[
|A - \lambda I| =
\begin{vmatrix}
0.70-\lambda & 0.15 & 0.30 \\
0.20 & 0.80-\lambda & 0.20 \\
0.10 & 0.05 & 0.50-\lambda
\end{vmatrix}
= -\lambda^3 + 2\lambda^2 - 1.24\lambda + 0.24 = 0,
\]

which factorises as −(λ − 1)(λ − 0.6)(λ − 0.4) = 0, so the eigenvalues are λ = 1,
λ = 0.6 and λ = 0.4.
Activity 18.7 Carry out the omitted calculations for the diagonalisation above.
You will have noticed that an essential part of the solution of predicting a long-term
distribution for this example is the fact that the transition matrix A has an eigenvalue
λ = 1 (of multiplicity one), and that the other eigenvalues satisfy |λi | < 1. In this case,
as t increases, the distribution vector xt will approach the unique eigenvector q for
λ = 1 which is also a distribution vector, so that Aq = q. (The fact that the entries sum
to 1 makes q unique in this one-dimensional eigenspace.)
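This convergence can be observed numerically. The sketch below (using numpy; not part of the guide) iterates the supermarket chain for many weeks and checks that the distribution vectors settle down to a vector q with Aq = q, entries summing to 1:

```python
import numpy as np

A = np.array([[0.70, 0.15, 0.30],
              [0.20, 0.80, 0.20],
              [0.10, 0.05, 0.50]])
x = np.array([0.5, 0.4, 0.1])  # initial distribution, week zero

for _ in range(200):
    x = A @ x  # one week's transition

# The iterates converge to the unique distribution vector q with Aq = q;
# solving (A - I)q = 0 by hand gives q = (3/8, 1/2, 1/8)
q = x
assert np.allclose(A @ q, q)
assert np.isclose(q.sum(), 1.0)
assert np.allclose(q, [0.375, 0.5, 0.125])
```

The convergence is fast here because the other eigenvalues, 0.6 and 0.4, satisfy |λᵢ| < 1, so their contributions to xₜ decay geometrically.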
We would like to be able to know that this is the case for any Markov chain, but there
are some exceptions to this rule. A Markov chain is said to be regular if some integer
power of the transition matrix A has strictly positive entries, aij > 0 (so no zero
entries). In this case, there will be a long-term distribution as the following theorem
implies.
Theorem 18.1 If A is the transition matrix of a regular Markov chain, then λ = 1 is
an eigenvalue of multiplicity one and all other eigenvalues satisfy |λi | < 1.
A proof of this theorem can be found in texts on Markov chains and is beyond the
scope of this course. However, we can prove a similar, but less strong result, which
makes it clear that the only thing that can go wrong is for the eigenvalue λ = 1 to have
multiplicity greater than 1.
Theorem 18.2 If A is the transition matrix of a Markov chain, then λ = 1 is an
eigenvalue of A and all other eigenvalues satisfy |λᵢ| ≤ 1.

Read the discussion in Section 9.2.6 of the text A-H following Theorem 9.19
there (Theorem 18.1 above) for a proof of Theorem 18.2.
Theorem 18.2 tells us that λ = 1 is an eigenvalue, but it might have multiplicity greater
than one, in which case either there would be more than one (linearly independent)
eigenvector corresponding to λ = 1, or the matrix might not be diagonalisable.
In order to obtain a long-term distribution we need to know that there is only one
(linearly independent) eigenvector for the eigenvalue λ = 1. So if the eigenvalue λ = 1 of
a transition matrix A of a Markov chain does have multiplicity 1, then Theorem 18.2
implies all the others will have |λi | < 1. There will be one corresponding eigenvector
which is also a distribution vector and, provided A can be diagonalised, we will know
that there is a long-term distribution. This is all we will need in practice.
Overview
We’ve seen that if a matrix is diagonalisable, then its powers can easily be determined.
We then applied diagonalisation to the solution of systems of difference equations. We
described two methods, the first using a change of variable from the standard basis to
the basis of eigenvectors, and the second using powers of a matrix. Although the 18
description of the second method seemed different, we observed that it is essentially the
same as the first giving the same form of solution. We then looked at the special case of
systems of difference equations which are known as Markov chains and noted the special
attributes of such systems.
Learning outcomes
At the end of this chapter and the relevant reading you should be able to:
\[
D^t(P^{-1}\mathbf{x}_0) =
\begin{pmatrix}
\lambda_1^t & 0 & \cdots & 0 \\
0 & \lambda_2^t & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n^t
\end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}
=
\begin{pmatrix} b_1\lambda_1^t \\ b_2\lambda_2^t \\ \vdots \\ b_n\lambda_n^t \end{pmatrix}.
\]
Appendix A
Sample examination paper
Important note: This Sample examination paper reflects the intended examination
and assessment arrangements for this course in the academic year 2014/2015. The
intended format and structure of the examination may have changed since the
publication of this subject guide. You can find the most recent examination papers on
the VLE where all changes to the format of the examination are posted.
1(a) Consider the following system of equations, for some constants a and b,
x − y + 2z = 4
3x − y − z = 0
x + y + az = b
Use matrix methods to determine what values a and b must take if this system is
consistent and has infinitely many solutions.
What must the value of a not be if the system has precisely one solution?
(b) If a = 4 and b = 1, find the solution of the above system using any matrix method
(Gaussian elimination, inverse matrix, Cramer’s rule).
(c) What does it mean to say that a set {x1 , x2 , . . . , xk } of vectors in Rn is linearly
dependent? Show that the set {x1 , x2 , x3 , x4 } of vectors in R4 is linearly dependent,
where
\[
\mathbf{x}_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \\ 4 \end{pmatrix}, \quad
\mathbf{x}_2 = \begin{pmatrix} 2 \\ 0 \\ 3 \\ 5 \end{pmatrix}, \quad
\mathbf{x}_3 = \begin{pmatrix} 2 \\ 1 \\ 7 \\ 3 \end{pmatrix}, \quad
\mathbf{x}_4 = \begin{pmatrix} 2 \\ 5 \\ 6 \\ 6 \end{pmatrix}.
\]
Prove that the set H is closed under addition and scalar multiplication. Hence, or
otherwise, prove that it is a subspace of R3 .
Show that every vector w ∈ H is a unique linear combination of the vectors
\[
\mathbf{v}_1 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}
\quad \text{and} \quad
\mathbf{v}_2 = \begin{pmatrix} 0 \\ 1 \\ 5 \end{pmatrix}.
\]
(i) Suppose the vector x is such that the linear transformation T has
dimR(T ) = dimN (T ).
Write down a condition that the components of x must satisfy for this to happen.
Find a basis of R(T ) in this case.
(ii) Suppose the vector x is such that the linear transformation T has
dimN (T ) = 1.
Write down a condition that the components of x must satisfy for this to happen.
Find a basis of N (T ) in this case.
4 Suppose
\[ A = \begin{pmatrix} -1 & -2 & -1 \\ 4 & -4 & -8 \\ -13 & -2 & 11 \end{pmatrix}. \]
5(a) Let
\[
A = \begin{pmatrix} 1 & 4 & 5 & 3 & 2 \\ 0 & 2 & 4 & 2 & 2 \\ -1 & 1 & 5 & 0 & 1 \end{pmatrix},
\qquad
\mathbf{b} = \begin{pmatrix} 11 \\ 2 \\ 6 \end{pmatrix}.
\]
Appendix B
Commentary on the Sample
examination paper
General remarks
We start by emphasising that candidates should always include their working. This
means two things. First, you should not simply write down the answer in the
examination script, but should explain the method by which it is obtained. Second, you
should include rough working. The Examiners want you to get the right answers, of
course, but it is more important that you demonstrate that you know what you are
doing: that is what is really being examined.
We also stress that if a candidate has not completely solved a problem, they may still
be awarded marks for a partial, incomplete, or slightly wrong, solution; but, if they have
written down a wrong answer and nothing else, no marks can be awarded.
Solutions to questions
Question 1(a) Since you are asked to use matrix methods, begin by thinking of the
system of equations in matrix form, as Ax = b with
\[
A = \begin{pmatrix} 1 & -1 & 2 \\ 3 & -1 & -1 \\ 1 & 1 & a \end{pmatrix},
\quad
\mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix},
\quad
\mathbf{b} = \begin{pmatrix} 4 \\ 0 \\ b \end{pmatrix}.
\]
Read through the question to know all that is being asked. There are different
approaches you can take to start.
The most efficient method is to write down the augmented matrix and begin to row
reduce it,
\[
(A|\mathbf{b}) =
\left(\begin{array}{ccc|c} 1 & -1 & 2 & 4 \\ 3 & -1 & -1 & 0 \\ 1 & 1 & a & b \end{array}\right)
\xrightarrow[\;R_3 - R_1\;]{\;R_2 - 3R_1\;}
\left(\begin{array}{ccc|c} 1 & -1 & 2 & 4 \\ 0 & 2 & -7 & -12 \\ 0 & 2 & a-2 & b-4 \end{array}\right)
\xrightarrow{\;R_3 - R_2\;}
\left(\begin{array}{ccc|c} 1 & -1 & 2 & 4 \\ 0 & 2 & -7 & -12 \\ 0 & 0 & a+5 & b+8 \end{array}\right).
\]
You are now in a position to answer the questions asked in the order in which they were
asked.
The system will be consistent with infinitely many solutions if and only if the last row
of the row echelon form is a row of zeros, so a = −5 and b = −8. It will have a unique
solution if and only if a + 5 ≠ 0, so a ≠ −5. It will be inconsistent (no solution) if and
only if a + 5 = 0 and b + 8 ≠ 0, that is, a = −5 and b ≠ −8.
Alternatively, you can begin by evaluating the determinant of A, for example, using the
cofactor expansion by row 3,
\[
|A| = \begin{vmatrix} 1 & -1 & 2 \\ 3 & -1 & -1 \\ 1 & 1 & a \end{vmatrix}
= 1(1 + 2) - 1(-1 - 6) + a(-1 + 3) = 10 + 2a.
\]
The system will have a unique solution if and only if |A| ≠ 0, so a ≠ −5. If a = −5
there will either be infinitely many solutions or no solution, depending on the value of b.
To answer the remaining questions, you still need to row reduce the augmented matrix,
but this time you can do it with a = −5,
\[
\left(\begin{array}{ccc|c} 1 & -1 & 2 & 4 \\ 3 & -1 & -1 & 0 \\ 1 & 1 & -5 & b \end{array}\right)
\xrightarrow[\;R_3 - R_1\;]{\;R_2 - 3R_1\;}
\left(\begin{array}{ccc|c} 1 & -1 & 2 & 4 \\ 0 & 2 & -7 & -12 \\ 0 & 2 & -7 & b-4 \end{array}\right).
\]
Comparing the last two rows, you can see that the system will be inconsistent if
b − 4 ≠ −12, that is, if b ≠ −8 and a = −5, and that there will be infinitely many
solutions if b = −8 and a = −5.
(b) If you have successfully solved part (a) of this question, then the easiest way to
solve the system with a = 4 and b = 1 is to substitute these values into the row echelon
form of the augmented matrix and continue reducing:
\[
\left(\begin{array}{ccc|c} 1 & -1 & 2 & 4 \\ 0 & 2 & -7 & -12 \\ 0 & 0 & a+5 & b+8 \end{array}\right)
=
\left(\begin{array}{ccc|c} 1 & -1 & 2 & 4 \\ 0 & 2 & -7 & -12 \\ 0 & 0 & 9 & 9 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} 1 & -1 & 2 & 4 \\ 0 & 2 & -7 & -12 \\ 0 & 0 & 1 & 1 \end{array}\right)
\]
\[
\longrightarrow
\left(\begin{array}{ccc|c} 1 & -1 & 0 & 2 \\ 0 & 2 & 0 & -5 \\ 0 & 0 & 1 & 1 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} 1 & -1 & 0 & 2 \\ 0 & 1 & 0 & -\tfrac{5}{2} \\ 0 & 0 & 1 & 1 \end{array}\right)
\longrightarrow
\left(\begin{array}{ccc|c} 1 & 0 & 0 & -\tfrac{1}{2} \\ 0 & 1 & 0 & -\tfrac{5}{2} \\ 0 & 0 & 1 & 1 \end{array}\right).
\]
The unique solution is x = (x, y, z)ᵀ = (−1/2, −5/2, 1)ᵀ. (It is easy for you to check
that this is correct by substituting the values into the equations.)
You could also solve this system using the inverse matrix or Cramer’s rule. These are
covered in Chapter 8 of the subject guide. It is a good idea for you to practise these
methods by solving this system to obtain the same answer.
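Whichever method you practise, a quick numerical cross-check is available. The sketch below (using numpy; not part of the guide) solves the a = 4, b = 1 system directly and confirms the answer above:

```python
import numpy as np

# The system of Question 1 with a = 4 and b = 1
A = np.array([[1.0, -1.0,  2.0],
              [3.0, -1.0, -1.0],
              [1.0,  1.0,  4.0]])
b = np.array([4.0, 0.0, 1.0])

x = np.linalg.solve(A, b)
assert np.allclose(x, [-0.5, -2.5, 1.0])  # x = (-1/2, -5/2, 1)^T
```

In an examination you must of course show the row operations; a check like this is only for verifying your own practice work.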
The set {x₁, x₂, …, xₖ} of vectors is linearly dependent if there are real numbers
a₁, a₂, …, aₖ, not all zero, such that

a₁x₁ + a₂x₂ + · · · + aₖxₖ = 0.

Equivalently, the set {x₁, x₂, …, xₖ} of vectors is linearly dependent if one of the vectors
can be expressed as a linear combination of the others. (Either statement is acceptable.)
To show that the set {x1 , x2 , x3 , x4 } of vectors in R4 is linearly dependent, where
\[
\mathbf{x}_1 = \begin{pmatrix} 1 \\ 2 \\ 1 \\ 4 \end{pmatrix}, \quad
\mathbf{x}_2 = \begin{pmatrix} 2 \\ 0 \\ 3 \\ 5 \end{pmatrix}, \quad
\mathbf{x}_3 = \begin{pmatrix} 2 \\ 1 \\ 7 \\ 3 \end{pmatrix}, \quad
\mathbf{x}_4 = \begin{pmatrix} 2 \\ 5 \\ 6 \\ 6 \end{pmatrix},
\]
you can write the vectors as the columns of a matrix A and row reduce it, thereby
solving the system of equations
Ax = a1 x1 + a2 x2 + a3 x3 + a4 x4 = 0.
The steps are not shown here, but you should show all steps in the examination. The
reduced row echelon form of a matrix is unique, so you should find that
\[
A = \begin{pmatrix} 1 & 2 & 2 & 2 \\ 2 & 0 & 1 & 5 \\ 1 & 3 & 7 & 6 \\ 4 & 5 & 3 & 6 \end{pmatrix}
\longrightarrow \cdots \longrightarrow
\begin{pmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
\]
From the reduced row echelon form you can deduce that there are infinitely many
solutions, since there is one non-leading variable, and therefore the vectors are linearly
dependent.
To find the linear combination, you can spot the linear dependence relations between
the columns of the reduced row echelon form, and the columns of A will have the same
relationship, namely,
x4 = 2x1 − x2 + x3 .
Or you can find the solution v = (−2, 1, −1, 1)T of Ax = 0 and use it to write down the
relationship between the columns of A, since
Av = −2x1 + x2 − x3 + x4 = 0,
and then solve for x4 . Either way, it is easy to check (and you should do this) that your
answer is correct by using the vectors,
\[
\begin{pmatrix} 2 \\ 5 \\ 6 \\ 6 \end{pmatrix}
= 2\begin{pmatrix} 1 \\ 2 \\ 1 \\ 4 \end{pmatrix}
- \begin{pmatrix} 2 \\ 0 \\ 3 \\ 5 \end{pmatrix}
+ \begin{pmatrix} 2 \\ 1 \\ 7 \\ 3 \end{pmatrix}.
\]
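Both the dependence relation and the solution of Ax = 0 are easy to verify mechanically. A sketch (using numpy; not part of the guide):

```python
import numpy as np

x1 = np.array([1.0, 2.0, 1.0, 4.0])
x2 = np.array([2.0, 0.0, 3.0, 5.0])
x3 = np.array([2.0, 1.0, 7.0, 3.0])
x4 = np.array([2.0, 5.0, 6.0, 6.0])

# The dependence relation read off from the reduced row echelon form
assert np.allclose(2 * x1 - x2 + x3, x4)

# Equivalently, v = (-2, 1, -1, 1)^T solves Ax = 0, where the columns
# of A are x1, x2, x3, x4
A = np.column_stack([x1, x2, x3, x4])
v = np.array([-2.0, 1.0, -1.0, 1.0])
assert np.allclose(A @ v, 0)
```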
Question 2(a) This question is a good test of your understanding of the material in
Chapters 5, 6 and 9 of the subject guide. If A is an m × n matrix with columns
c1 , c2 , . . . , cn such that the system of linear equations Ax = d has solution:
\[
\mathbf{x} = \begin{pmatrix} 1 \\ 2 \\ 0 \\ -1 \\ 0 \end{pmatrix}
+ s \begin{pmatrix} 2 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}
+ t \begin{pmatrix} 1 \\ 1 \\ 0 \\ -1 \\ 1 \end{pmatrix}
= \mathbf{p} + s\mathbf{v}_1 + t\mathbf{v}_2,
\]
then you should be able to deduce certain properties of the matrix A just by looking at
the solution.
(1) The number of columns, n = 5. Why? Because the solutions, x, are 5 × 1 vectors,
and the multiplication Ax is only defined if A has the same number of columns as x has
rows.
(2) The number m cannot be determined. (But, m ≥ 3 from part (3).)
(3) The rank of A is 3. Essentially, this is deduced from the rank-nullity theorem
which says that rank(A)+nullity(A)=n, where n is the number of columns of A. So the
rank, r, is r = n − dim(N(A)). You have also seen that the general solution of Ax = d
is of the form

\[ \mathbf{x} = \mathbf{p} + a_1\mathbf{v}_1 + \cdots + a_{n-r}\mathbf{v}_{n-r}, \]

and the given solution is of the form x = p + sv₁ + tv₂, so dim(N(A)) = 2 and
r = 5 − 2 = 3.
(4) The two vectors v₁ and v₂ form a basis of the null space of A, N(A). So {v₁, v₂} is
a basis, where

v₁ = (2, 1, 1, 0, 0)ᵀ and v₂ = (1, 1, 0, −1, 1)ᵀ.
(5) To answer this you need a good understanding of how the general solution is
obtained using Gaussian elimination. By looking at the solution, you can tell the
positions of the leading variables and the non-leading variables in the reduced row
echelon form of A. The non-leading variables must be in the third and fifth column
because of the positions of 0 and 1 in the solution vectors, and the leading ones must be
in the first, second and fourth columns. So a basis of the range, R(A), is the set of
vectors {c1 , c2 , c4 }.
(6) From Ap = d, you can deduce that d = c1 + 2c2 − c4 . Any solution x, so any value
of s and t, will also give you a vector such that Ax = d, and so a different linear
combination, but p is the simplest one to use.
(7) In the same way, using Av1 = 0, or Av2 = 0, you obtain the linear combinations
2c1 + c2 + c3 = 0 or c1 + c2 − c4 + c5 = 0.
Again, any linear combination of v1 and v2 can be used.
(b) This is a second-order difference equation, as covered in Chapter 11 of the subject
guide. In standard form, we have xₜ₊₁ − √a xₜ + (a/4)xₜ₋₁ = 0, so the auxiliary equation
is z² − √a z + (a/4) = 0, which is (z − √a/2)² = 0, so there is just one solution, z = √a/2.
Therefore, for some constants A and B,

xₜ = (At + B)(√a/2)ᵗ.

The facts that x₀ = −1 and x₁ = √a show that A = 3 and B = −1, so

xₜ = (3t − 1)(√a/2)ᵗ.
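Since a is a parameter, you can sanity-check the formula for any particular positive value of a. The sketch below (plain Python; not part of the guide) takes a = 9 and verifies that xₜ = (3t − 1)(√a/2)ᵗ satisfies both the initial conditions and the recurrence:

```python
import math

a = 9.0  # any positive value of the parameter will do
r = math.sqrt(a) / 2.0

def x(t):
    # Closed form obtained above: x_t = (3t - 1)(sqrt(a)/2)^t
    return (3 * t - 1) * r ** t

# Initial conditions: x_0 = -1 and x_1 = sqrt(a)
assert math.isclose(x(0), -1.0)
assert math.isclose(x(1), math.sqrt(a))

# Recurrence in standard form: x_{t+1} = sqrt(a) x_t - (a/4) x_{t-1}
for t in range(1, 10):
    assert math.isclose(x(t + 1), math.sqrt(a) * x(t) - (a / 4.0) * x(t - 1))
```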
(c) Let yₙ be the amount of money after the nth withdrawal. Then:

y₁ = 20000(1.05) − 500,
y₂ = (1.05)y₁ − 500 = 20000(1.05)² − 500(1.05) − 500,
y₃ = (1.05)y₂ − 500 = 20000(1.05)³ − 500(1.05)² − 500(1.05) − 500.

Continuing in this way, after the Nth withdrawal the amount is

y_N = 20000(1.05)ᴺ − 500(1 + 1.05 + (1.05)² + · · · + (1.05)ᴺ⁻¹)
    = 20000(1.05)ᴺ − 500((1.05)ᴺ − 1)/0.05
    = 20000(1.05)ᴺ − 10000((1.05)ᴺ − 1)
    = 10000(1.05)ᴺ + 10000.
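The closed form is easy to confirm by iterating the recurrence yₙ = (1.05)yₙ₋₁ − 500 directly (plain Python; a sketch, not part of the guide):

```python
y = 20000.0  # the initial deposit
for n in range(1, 21):
    y = 1.05 * y - 500.0  # interest added, then 500 withdrawn
    # Closed form derived above: y_N = 10000(1.05)^N + 10000
    assert abs(y - (10000.0 * 1.05**n + 10000.0)) < 1e-6
```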
Question 3(a) This question is covered in Chapters 12, 13 and 14 of the subject guide.
It also relies on understanding lines and planes in R3 as covered in Chapter 4.
To show that

\[ H = \left\{ \begin{pmatrix} 2t \\ t \\ 3t \end{pmatrix} : t \in \mathbb{R} \right\} \]

is closed under addition, let u, v ∈ H. Then

\[
\mathbf{u} = \begin{pmatrix} 2t \\ t \\ 3t \end{pmatrix}, \quad
\mathbf{v} = \begin{pmatrix} 2s \\ s \\ 3s \end{pmatrix}
\quad \text{for some } s, t \in \mathbb{R}.
\]

Then

\[
\mathbf{u} + \mathbf{v}
= \begin{pmatrix} 2t + 2s \\ t + s \\ 3t + 3s \end{pmatrix}
= \begin{pmatrix} 2(t+s) \\ t+s \\ 3(t+s) \end{pmatrix} \in H
\]
To show that every w ∈ H is a unique linear combination of v₁ and v₂, write
w = (2s, s, 3s)ᵀ and solve w = av₁ + bv₂. The first two component equations give
a = 2s and b = s and, substituting these values into the third
equation, we find that these values also satisfy the third equation
(−a + 5b = −2s + 5s = 3s). Therefore the system
has the unique solution a = 2s, b = s, and w = (2s)v₁ + (s)v₂.
To answer the remaining questions, it helps for you to see what is going on.
(1) The set {v₁, v₂} is NOT a basis of the subspace H, since v₁ ∉ H (and also
v₂ ∉ H). A basis of H is {v}, where

\[ \mathbf{v} = \begin{pmatrix} 2 \\ 1 \\ 3 \end{pmatrix}, \]

and dim(H) = 1.
(2) A Cartesian equation for the subspace G = Lin{v1 , v2 } is given by
\[
\begin{vmatrix} 1 & 0 & x \\ 0 & 1 & y \\ -1 & 5 & z \end{vmatrix}
= x - 5y + z = 0.
\]
(This can be easily checked by substituting in the components of v1 and v2 , and you
should do this.) The set {v1 , v2 } is a basis of G. It spans as G is, by definition, the set
of all linear combinations of v1 and v2 . It is linearly independent as neither vector is a
scalar multiple of the other.
(Although this is not asked as part of the question, it should be clear to you by now
that, geometrically, H is a line through the origin, that is, H is the set of position
vectors whose endpoints determine a line in R3 .
The set G is a plane through the origin (meaning position vectors of points on the
plane). The line H is contained in the plane G, or algebraically, H is a subspace of G,
which is why every vector in H is a unique linear combination of the basis vectors of G.
Both H and G are subspaces of R3 .)
Then T is given by T(x) = Ax where A is a 3 × 4 matrix. The simplest way to answer
the questions is to construct this matrix, whose columns are the images of the
standard basis vectors, T(e_i):

          1   2  −1  x
    A =   0   3   5  y .
         −2  −1   7  z

(Here the fourth column is the unknown vector x = (x, y, z)^T.) In order to consider
the two possibilities in parts (i) and (ii), row reduce this matrix, beginning with
R3 + 2R1:

          1   2  −1  x            1   2  −1  x
    A →   0   3   5  y        →   0   3   5  y            .
          0   3   5  z + 2x       0   0   0  z + 2x − y
(i) If the row echelon form is to have only two leading ones (so that dim R(T) = 2),
then the components of x must satisfy

    2x − y + z = 0.
If the vector x satisfies this condition, then a basis of R(T ) is given by the columns of A
corresponding to the leading ones in the row echelon form, which will be the first two
columns. So a basis of R(T ) is {v1 , v2 }.
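The row reduction can be reproduced symbolically with sympy, keeping x, y, z as symbols. This is only a sketch of the hand calculation above, using the same two row operations:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

# The 3x4 matrix whose columns are T(e1), ..., T(e4);
# the last column is the unknown vector x = (x, y, z)^T
A = sp.Matrix([[1, 2, -1, x],
               [0, 3, 5, y],
               [-2, -1, 7, z]])

# Row reduce as in the text: R3 + 2R1, then R3 - R2
A2 = A.copy()
A2[2, :] = A2[2, :] + 2 * A2[0, :]
A2[2, :] = A2[2, :] - A2[1, :]
print(A2[2, :])   # last row is now (0, 0, 0, 2x - y + z)
```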
You could also approach this question by first deducing from the dimension theorem
that the dimR(T ) = 2 as above, so R(T ) is a plane in R3 . Therefore {v1 , v2 } is a basis,
since these two vectors are linearly independent (because they are not scalar multiples)
and they span a plane whose Cartesian equation is given by
1 2 x
0
3 y = 6x − 3y + 3z = 0.
−2 −1 z
or 2x − y + z = 0. The components of the vector v3 satisfy this equation, and this is the
condition that the components of x must satisfy.
(ii) If the linear transformation has dimN (T ) = 1, then by the dimension theorem, you
know that dimR(T ) = 3 (therefore R(T ) = R3 ), so the echelon form of the matrix A
needs to have 3 leading ones. Therefore the condition that the components of x must
satisfy is
    2x − y + z ≠ 0.

Now continue with row reducing the matrix A to obtain a basis for N(T). The row
echelon form of A will have a leading one in the last column (first multiply the
last row by 1/(2x − y + z) to get this leading one, then continue to reduced echelon
form):

                1  2  −1  0         1  2   −1  0          1  0  −13/3  0
    A → ... →   0  3   5  0    →    0  1  5/3  0     →    0  1   5/3   0 ,
                0  0   0  1         0  0    0  1          0  0    0    1

so a basis of N(T) is given by the vector w = (13/3, −5/3, 1, 0)^T, or any non-zero
scalar multiple of this, such as (13, −5, 3, 0)^T.
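You can check this null-space vector directly. A sketch with numpy, choosing one concrete x with 2x − y + z ≠ 0 (the choice does not matter, since the last component of w is 0):

```python
import numpy as np

v1 = np.array([1, 0, -2])
v2 = np.array([2, 3, -1])
v3 = np.array([-1, 5, 7])

# Any concrete x with 2x - y + z != 0 will do; take x = (1, 0, 0)^T
x_col = np.array([1, 0, 0])
A = np.column_stack([v1, v2, v3, x_col])

# The claimed null-space vector has last component 0, so
# A @ w = 13*v1 - 5*v2 + 3*v3, independent of the choice of x
w = np.array([13, -5, 3, 0])
print(A @ w)   # [0 0 0]
```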
Question 4 The null space of a matrix is introduced in Chapter 6 of the subject guide.
Diagonalisation and its application to systems of difference equations are found in
Chapters 17 and 18, respectively.
To find a basis of the null space of the matrix A, put it into reduced row echelon form
using the algorithm. The steps are not shown, but you should be able to carry them out
efficiently and accurately, and you should show all the steps in the examination.
          −1  −2  −1                  1  0  −1
    A =    4  −4  −8    → ... →       0  1   1  .
         −13  −2  11                  0  0   0

You can read the solution of the homogeneous system Ax = 0 from the reduced echelon
form of the matrix, setting z = t, t ∈ R, to obtain the general solution

    x = t(1, −1, 1)^T = tv1,   t ∈ R.
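Outside the examination, sympy will produce the reduced row echelon form and a null-space basis in one step, which is a useful way to check your hand calculation:

```python
import sympy as sp

A = sp.Matrix([[-1, -2, -1],
               [4, -4, -8],
               [-13, -2, 11]])

# Reduced row echelon form and the pivot columns
R, pivots = A.rref()
print(R)   # Matrix([[1, 0, -1], [0, 1, 1], [0, 0, 0]])

# Basis of the null space: a single vector, a multiple of (1, -1, 1)^T
ns = A.nullspace()
print(len(ns))   # 1
```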
Again, the steps are not shown here, but you should show them all in an examination.
You need to expand the determinant slowly and carefully to avoid errors. The
eigenvalues are λ1 = 0, λ2 = −6, λ3 = 12.
Next solve (A − λI)v = 0 for each of the other two eigenvalues. In each case the
reduced echelon form of the matrix, (A − λI) should contain a row of zeros, so that
there is a non-trivial solution giving the corresponding eigenvector. This checks that the
eigenvalues are correct. If the reduced echelon form of (A − λI) does not contain a row
of zeros, then you need to find your error. This may be in the row reduction, or it may
be in your characteristic equation or factorising. One quick way to check whether your
eigenvalue is correct is to substitute it into |A − λI| and see if you do get zero when you
evaluate the determinant.
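The quick check described above, substituting each claimed eigenvalue into |A − λI|, is easy to mirror in sympy; this sketch also lets sympy compute the eigenvalues independently:

```python
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[-1, -2, -1],
               [4, -4, -8],
               [-13, -2, 11]])

# Characteristic polynomial |A - lambda*I|
p = (A - lam * sp.eye(3)).det()

# Each claimed eigenvalue must make the determinant vanish
for ev in (0, -6, 12):
    print(ev, p.subs(lam, ev))   # all zero

# sympy's own computation agrees: eigenvalues 0, -6, 12, each of multiplicity 1
print(A.eigenvals())
```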
Having solved (A − λI)v = 0 for each of λ2 and λ3 , you should have that the
corresponding eigenvectors are multiples of:
    v2 = (1, 2, 1)^T   and   v3 = (0, −1, 2)^T,
respectively. Again, all work should be shown.
At this stage, you should check that the eigenvectors are correct. Form a matrix P
whose columns are the eigenvectors and the diagonal matrix D with the corresponding
eigenvalues on the diagonal,
          1  1   0              0   0   0
    P =  −1  2  −1    and  D =  0  −6   0 .
          1  1   2              0   0  12
Now check that AP = P D by multiplying out the matrices AP and P D.
You know that P is invertible since eigenvectors corresponding to distinct eigenvalues
are linearly independent. Therefore, you can conclude that P −1 AP = D. Having
checked the eigenvalues and eigenvectors, you do not need to compute P −1 AP explicitly
to determine D; you can simply state the result because of the underlying theory.
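The check AP = PD (and the conclusion P⁻¹AP = D) can be carried out numerically in a few lines; a sketch with numpy:

```python
import numpy as np

A = np.array([[-1, -2, -1],
              [4, -4, -8],
              [-13, -2, 11]])
P = np.array([[1, 1, 0],
              [-1, 2, -1],
              [1, 1, 2]])
D = np.diag([0, -6, 12])

# Check AP = PD: this verifies all three eigenpairs at once
print(np.array_equal(A @ P, P @ D))   # True

# Since the eigenvalues are distinct, P is invertible, and P^{-1} A P = D
print(np.allclose(np.linalg.inv(P) @ A @ P, D))   # True
```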
Use this diagonalisation to determine the sequences (xn ), (yn ), (zn ) which have the
following properties:
xn+1 = −xn − 2yn − zn
yn+1 = 4xn − 4yn − 8zn
zn+1 = −13xn − 2yn + 11zn
and which satisfy the initial conditions x0 = y0 = 1 and z0 = 0.
Denoting by xn the vector (xn , yn , zn )T , this can be expressed as xn+1 = Axn , for which
the solution is given by
xn = An x0 = P Dn P −1 x0 .
Using the adjoint method (cofactors), or any other method, find P −1 , and immediately
check that the inverse is correct by showing P P −1 = I.
Then, using the initial conditions,

                      5  −2  −1     1          1/2
    P⁻¹x_0 = (1/6)    1   2   1     1     =    1/2  ,
                     −3   0   3     0         −1/2

so that

     x_n         1  1   0      0      0      0        1/2
     y_n    =   −1  2  −1      0   (−6)^n    0        1/2   .
     z_n         1  1   2      0      0   (12)^n     −1/2

The solution (for n ≥ 1) is

    x_n = (1/2)(−6)^n,
    y_n = (−6)^n + (1/2)(12)^n,
    z_n = (1/2)(−6)^n − (12)^n.
(The answer can be checked by finding x1 both from the original equations and from
the solution. If you have time, you might want to do this.)
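This check, comparing the iterated recurrence with the closed form, takes only a few lines of Python:

```python
import numpy as np

A = np.array([[-1, -2, -1],
              [4, -4, -8],
              [-13, -2, 11]])
x0 = np.array([1.0, 1.0, 0.0])

# Closed form derived above, valid for n >= 1
def closed_form(n):
    return np.array([0.5 * (-6.0) ** n,
                     (-6.0) ** n + 0.5 * 12.0 ** n,
                     0.5 * (-6.0) ** n - 12.0 ** n])

# Iterate x_{n+1} = A x_n and compare with the closed form
x = x0
for n in range(1, 6):
    x = A @ x
    assert np.allclose(x, closed_form(n))
print(closed_form(1))   # [ -3.   0. -15.]
```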
Question 5(a) Solving a linear system of equations by putting the augmented matrix
into reduced echelon form is an application of the basic material in Chapter 5 of the
subject guide. The vector form is specifically emphasised in Chapter 9.
You should begin this question as instructed, by writing down the augmented matrix
and putting it into reduced row echelon form. Do this carefully to avoid errors,
               1  4  5  3  2  11           1  4   5  3  2  11
    (A|b) =    0  2  4  2  2   2     →     0  1   2  1  1   1     →
              −1  1  5  0  1   6           0  5  10  3  3  17

               1  4  5  3  2  11           1  4   5  3  2  11
               0  1  2  1  1   1     →     0  1   2  1  1   1     →
               0  0  0 −2 −2  12           0  0   0  1  1  −6

               1  4  5  0 −1  29           1  0  −3  0 −1   1
               0  1  2  0  0   7     →     0  1   2  0  0   7  .
               0  0  0  1  1  −6           0  0   0  1  1  −6
Check that you do have the reduced row echelon form; find the columns with leading
ones and make sure they have zeros elsewhere (above and below). As the question
specifically asks you to put the matrix into reduced row echelon form, if you stop at row
echelon form and use back substitution, you will not earn full marks, and you are also
less likely to obtain the correct answer.
You can ‘read’ the solution from the reduced echelon form. Assign parameters, say s
and t, to the non-leading variables x3 and x5 , and write down the other variables in
terms of these using the equations deduced from the matrix. The general solution is
    x = (x_1, x_2, x_3, x_4, x_5)^T = (1 + 3s + t, 7 − 2s, s, −6 − t, t)^T

      = (1, 7, 0, −6, 0)^T + s(3, −2, 1, 0, 0)^T + t(1, 0, 0, −1, 1)^T,

that is, x = p + sv1 + tv2, for s, t ∈ R.
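The whole reduction, and the vector form of the solution, can be checked with sympy. A sketch (the names p, v1, v2 are the vectors just found):

```python
import sympy as sp

Ab = sp.Matrix([[1, 4, 5, 3, 2, 11],
                [0, 2, 4, 2, 2, 2],
                [-1, 1, 5, 0, 1, 6]])

# Reduced row echelon form of the augmented matrix, and the pivot columns
R, pivots = Ab.rref()
print(R)        # rows (1,0,-3,0,-1,1), (0,1,2,0,0,7), (0,0,0,1,1,-6)
print(pivots)   # (0, 1, 3)

# Check the particular solution and the two direction vectors
A, b = Ab[:, 0:5], Ab[:, 5]
p = sp.Matrix([1, 7, 0, -6, 0])
v1 = sp.Matrix([3, -2, 1, 0, 0])
v2 = sp.Matrix([1, 0, 0, -1, 1])
print(A * p == b, A * v1 == sp.zeros(3, 1), A * v2 == sp.zeros(3, 1))
```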
To answer the questions concerning the column vectors, you need to understand the
material in Chapters 9, 13 and 14 of the subject guide.
The columns of the reduced row echelon form of a matrix satisfy the same dependency
relations as the columns of the matrix. From the reduced row echelon form of A, you
can see that
c3 = −3c1 + 2c2 .
Indeed, this also follows from Av1 = 0, and you can, and should, check that it is correct:
    (5, 4, 5)^T = −3(1, 0, −1)^T + 2(4, 2, 1)^T.
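The same check takes one line with numpy, which is a convenient way to confirm a dependency relation you have read off a reduced row echelon form:

```python
import numpy as np

c1 = np.array([1, 0, -1])
c2 = np.array([4, 2, 1])
c3 = np.array([5, 4, 5])

# Check the dependency relation c3 = -3 c1 + 2 c2
print(np.array_equal(c3, -3 * c1 + 2 * c2))   # True
```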
To answer the next part, it is enough to say that in the reduced echelon form of A, the
columns with the leading ones correspond to the vectors c1 , c2 and c4 . Therefore these
vectors are linearly independent. (The reduced row echelon form of a matrix C
consisting of these three column vectors would have a leading one in every column, so
Cx = 0 has only the trivial solution.)
To conclude that B is a basis of R3 , you can state that B is a set of three linearly
independent vectors in a three-dimensional vector space, R3 , therefore B is a basis of
R³. It is not sufficient merely to say that the vectors are linearly independent and
span; you would need to give a reason why they span R³ (for example, by stating that
there is a leading one in every row, so Ax = b has a solution for all b ∈ R³).
From the solution, Ap = b, you have b = c1 + 7c2 − 6c4 . You should recognise that this
expresses b as a linear combination of the basis vectors, and the coefficients are the
coordinates of b in this basis, B. That is,
    [b]_B = (1, 7, −6)^T.
(b) This part of the question continues with the material on the basis of a vector space
contained in Chapter 14. The material on changing basis is in Chapter 16.
To show that S = {c1 , c3 , c4 } is also a basis of R3 , you can calculate the determinant of
the matrix with these vectors as columns.
    | 1  5  3 |
    | 0  4  2 |  =  1(−10) − 1(10 − 12)  =  −8 ≠ 0.
    |−1  5  0 |
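A quick numerical check of this determinant with numpy:

```python
import numpy as np

c1 = np.array([1, 0, -1])
c3 = np.array([5, 4, 5])
c4 = np.array([3, 2, 0])

# Determinant of the matrix with columns c1, c3, c4
Q = np.column_stack([c1, c3, c4])
print(round(np.linalg.det(Q)))   # -8, non-zero, so S is a basis of R^3
```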
Since the determinant is non-zero, the vectors are linearly independent and S is a
basis of R³. Let M = (c1 c2 c4) and Q = (c1 c3 c4) be the matrices whose columns are
the vectors of the bases B and S respectively. Then M is the transition matrix from
B coordinates to standard coordinates and Q is the transition matrix from S
coordinates to standard coordinates, so Q⁻¹M is the transition matrix from B
coordinates to S coordinates. The easiest way to find Q⁻¹ is using the cofactor
method. Then

                    −10   15  −2      1  4  3        1  3/2  0
    Q⁻¹M = −(1/8)    −2    3  −2      0  2  2    =   0  1/2  0   = P.
                      4  −10   4     −1  1  0        0   0   1
You can find the S coordinates of b using this matrix and [b]_B from part (a),

             1  3/2  0      1          23/2
    [b]_S =  0  1/2  0      7      =    7/2  ,
             0   0   1     −6  B       −6   S
which you can easily check. Or, you can find the S coordinates directly from the basis S
by solving b = ac1 + bc3 + cc4 for a, b, c using Gaussian elimination or by using the
inverse matrix, Q−1 , which you found above.
You can also do this using the results of part (a). You know that b = 1c1 + 7c2 − 6c4
and c3 = −3c1 + 2c2 . If you solve the latter equation for c2 and substitute into the
equation for b, you will obtain the vector b as a linear combination of c1 , c3 , c4 , and
hence the coordinates of b in this basis.
This idea can be used to gain a better understanding of the matrix P. Notice the
simple form of the transition matrix P from B coordinates to S coordinates. If you
have a vector expressed as a linear combination of the basis vectors of B and as a
linear combination of the basis vectors of S, then the coefficients of the first and
last vectors will be the same in either basis, since the first and last basis
vectors are the same. Only the middle vector is different. Therefore, P will be of
the form

         1  a  0
    P =  0  b  0 ,
         0  c  1

where the middle column gives the S coordinates of c2. Since
c2 = (3/2)c1 + (1/2)c3 + 0c4 (which follows from c3 = −3c1 + 2c2),

         1  3/2  0
    P =  0  1/2  0 .
         0   0   1
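All of part (b) can be verified numerically; a sketch with numpy, using the column vectors of A from part (a):

```python
import numpy as np

c1 = np.array([1, 0, -1])
c2 = np.array([4, 2, 1])
c3 = np.array([5, 4, 5])
c4 = np.array([3, 2, 0])

M = np.column_stack([c1, c2, c4])   # basis B
Q = np.column_stack([c1, c3, c4])   # basis S

# Transition matrix from B coordinates to S coordinates
P = np.linalg.inv(Q) @ M
print(P)              # [[1, 1.5, 0], [0, 0.5, 0], [0, 0, 1]] (up to rounding)

# S coordinates of b, from [b]_B = (1, 7, -6)
b_B = np.array([1, 7, -6])
b_S = P @ b_B
print(b_S)            # [11.5, 3.5, -6]

# Reconstruct b from its S coordinates: this should give b = (11, 2, 6)^T
print(c1 * b_S[0] + c3 * b_S[1] + c4 * b_S[2])
```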
Comment form
We welcome any comments you may have on the materials which are sent to you as part of your
study pack. Such feedback from students helps us in our effort to improve the materials produced
for the International Programmes.
If you have any comments about this guide, either general or specific (including corrections,
non-availability of Essential readings, etc.), please take the time to complete and return this form.
Name
Address
Email
Student number
For which qualification are you studying?
Comments
Date: