ELLIS HORWOOD SERIES IN
MATHEMATICS AND ITS APPLICATIONS
Series Editor: Professor G. M. BELL, Chelsea College, University of London

The works in this series will survey recent research, and introduce new areas and up-to-date mathematical methods. Undergraduate texts on established topics will stimulate student interest by including present-day applications, and the series can also include selected volumes of lecture notes on important topics which need quick and early publication. In all three ways it is hoped to render a valuable service to those who learn, teach, develop and use mathematics.

MATHEMATICAL THEORY OF WAVE MOTION
G. R. BALDOCK and T. BRIDGEMAN, University of Liverpool.
MATHEMATICAL MODELS IN SOCIAL MANAGEMENT AND LIFE SCIENCES
D. N. BURGHES and A. D. WOOD, Cranfield Institute of Technology.
MODERN INTRODUCTION TO CLASSICAL MECHANICS AND CONTROL
D. N. BURGHES, Cranfield Institute of Technology and A. DOWNS, Sheffield University.
CONTROL AND OPTIMAL CONTROL
D. N. BURGHES, Cranfield Institute of Technology and A. GRAHAM, The Open University, Milton Keynes.
TEXTBOOK OF DYNAMICS
F. CHORLTON, University of Aston, Birmingham.
VECTOR AND TENSOR METHODS
F. CHORLTON, University of Aston, Birmingham.
TECHNIQUES IN OPERATIONAL RESEARCH
VOLUME 1: QUEUEING SYSTEMS
VOLUME 2: MODELS, SEARCH, RANDOMIZATION
B. CONOLLY, Chelsea College, University of London.
MATHEMATICS FOR THE BIOSCIENCES
G. EASON, C. W. COLES, G. GETTINBY, University of Strathclyde.
HANDBOOK OF HYPERGEOMETRIC INTEGRALS: Theory, Applications, Tables, Computer Programs
H. EXTON, The Polytechnic, Preston.
MULTIPLE HYPERGEOMETRIC FUNCTIONS
H. EXTON, The Polytechnic, Preston.
COMPUTATIONAL GEOMETRY FOR DESIGN AND MANUFACTURE
I. D. FAUX and M. J. PRATT, Cranfield Institute of Technology.
APPLIED LINEAR ALGEBRA
R. J. GOULT, Cranfield Institute of Technology.
MATRIX THEORY AND APPLICATIONS FOR ENGINEERS AND MATHEMATICIANS
A. GRAHAM, The Open University, Milton Keynes.
APPLIED FUNCTIONAL ANALYSIS
D. H. GRIFFEL, University of Bristol.
GENERALISED FUNCTIONS: Theory, Applications
R. F. HOSKINS, Cranfield Institute of Technology.
MECHANICS OF CONTINUOUS MEDIA
S. C. HUNTER, University of Sheffield.
GAME THEORY: Mathematical Models of Conflict
A. J. JONES, Royal Holloway College, University of London.
USING COMPUTERS
B. L. MEEK and S. FAIRTHORNE, Queen Elizabeth College, University of London.
SPECTRAL THEORY OF ORDINARY DIFFERENTIAL OPERATORS
E. MULLER-PFEIFFER, Technical High School, Erfurt.
SIMULATION CONCEPTS IN MATHEMATICAL MODELLING
F. OLIVEIRA-PINTO, Chelsea College, University of London.
ENVIRONMENTAL AERODYNAMICS
R. S. SCORER, Imperial College of Science and Technology, University of London.
APPLIED STATISTICAL TECHNIQUES
K. D. C. STOODLEY, T. LEWIS and C. L. S. STAINTON, University of Bradford.
LIQUIDS AND THEIR PROPERTIES: A Molecular and Macroscopic Treatise with Applications
H. N. V. TEMPERLEY, University College of Swansea, University of Wales and D. H. TREVENA, University of Wales, Aberystwyth.
GRAPH THEORY AND APPLICATIONS
H. N. V. TEMPERLEY, University College of Swansea.
Kronecker Products and
Matrix Calculus:
with Applications

ALEXANDER GRAHAM, M.A., M.Sc., Ph.D., C.Eng., M.I.E.E.
Senior Lecturer in Mathematics,
The Open University,
Milton Keynes

ELLIS HORWOOD LIMITED
Publishers · Chichester

Halsted Press: a division of
JOHN WILEY & SONS
New York · Brisbane · Chichester · Toronto

First published in 1981 by
ELLIS HORWOOD LIMITED
Market Cross House, Cooper Street, Chichester, West Sussex, PO19 1EB, England

The publisher's colophon is reproduced from James Gillison's drawing of the ancient Market Cross, Chichester.

Distributors:
Australia, New Zealand, South-east Asia:
Jacaranda-Wiley Ltd., Jacaranda Press,
JOHN WILEY & SONS INC.,
G.P.O. Box 859, Brisbane, Queensland 4001, Australia
Canada:
JOHN WILEY & SONS CANADA LIMITED
22 Worcester Road, Rexdale, Ontario, Canada.
Europe, Africa:
JOHN WILEY & SONS LIMITED
Baffins Lane, Chichester, West Sussex, England.
North and South America and the rest of the world:
Halsted Press: a division of
JOHN WILEY & SONS
605 Third Avenue, New York, N.Y. 10016, U.S.A.

© 1981 A. Graham/Ellis Horwood Ltd.


British Library Cataloguing in Publication Data
Graham, Alexander
Kronecker products and matrix calculus.
(Ellis Horwood series in mathematics and its applications)
1. Matrices
I. Title
512.9'43 QA188
Library of Congress Card No. 81-7132 AACR2
ISBN 0-85312-391-8 (Ellis Horwood Limited, Library Edition)
ISBN 0-85312-427-2 (Ellis Horwood Limited, Student Edition)
ISBN 0-470-27300-3 (Halsted Press)
Typeset in Press Roman by Ellis Horwood Ltd.
Printed in Great Britain by R. J. Acford, Chichester

COPYRIGHT NOTICE
All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the permission of Ellis Horwood Limited, Market Cross House, Cooper Street, Chichester, West Sussex, England.
Table of Contents

Author's Preface . . . 7

Symbols and Notation Used . . . 9

Chapter 1 - Preliminaries
1.1 Introduction . . . 11
1.2 Unit Vectors and Elementary Matrices . . . 11
1.3 Decompositions of a Matrix . . . 13
1.4 The Trace Function . . . 16
1.5 The Vec Operator . . . 18
Problems for Chapter 1 . . . 20

Chapter 2 - The Kronecker Product
2.1 Introduction . . . 21
2.2 Definition of the Kronecker Product . . . 21
2.3 Some Properties and Rules for Kronecker Products . . . 23
2.4 Definition of the Kronecker Sum . . . 30
2.5 The Permutation Matrix associating vec X and vec X' . . . 32
Problems for Chapter 2 . . . 35

Chapter 3 - Some Applications of the Kronecker Product
3.1 Introduction . . . 37
3.2 The Derivative of a Matrix . . . 37
3.3 Problem 1: Solution of AX + XB = C . . . 38
3.4 Problem 2: Solution of AX - XA = μX . . . 40
3.5 Problem 3: Solution of Ẋ = AX + XB . . . 41
3.6 Problem 4: To find the Transition Matrix associated with the Equation Ẋ = AX + XB . . . 42
3.7 Problem 5: Solution of AXB = C . . . 44
3.8 Problem 6: Pole Assignment for a Multivariable System . . . 45

Chapter 4 - Introduction to Matrix Calculus
4.1 Introduction . . . 51
4.2 The Derivatives of Vectors . . . 52
4.3 The Chain Rule for Vectors . . . 54
4.4 The Derivative of Scalar Functions of a Matrix with respect to a Matrix . . . 56
4.5 The Derivative of a Matrix with respect to one of its Elements and Conversely . . . 60
4.6 The Derivatives of the Powers of a Matrix . . . 67
Problems for Chapter 4 . . . 68

Chapter 5 - Further Development of Matrix Calculus including an Application of Kronecker Products
5.1 Introduction . . . 70
5.2 Derivatives of Matrices and Kronecker Products . . . 70
5.3 The Determination of (∂vec X)/(∂vec Y) for more complicated Equations . . . 72
5.4 More on Derivatives of Scalar Functions with respect to a Matrix . . . 75
5.5 The Matrix Differential . . . 78
Problems for Chapter 5 . . . 80

Chapter 6 - The Derivative of a Matrix with respect to a Matrix
6.1 Introduction . . . 81
6.2 The Definition and some Results . . . 81
6.3 Product Rules for Matrices . . . 84
6.4 The Chain Rule for the Derivative of a Matrix with respect to a Matrix . . . 88
Problems for Chapter 6 . . . 92

Chapter 7 - Some Applications of Matrix Calculus
7.1 Introduction . . . 94
7.2 The Problems of Least Squares and Constrained Optimization in Scalar Variables . . . 94
7.3 Problem 1: Matrix Calculus Approach to the Problems of Least Squares and Constrained Optimization . . . 96
7.4 Problem 2: The General Least Squares Problem . . . 100
7.5 Problem 3: Maximum Likelihood Estimate of the Multivariate Normal . . . 102
7.6 Problem 4: Evaluation of the Jacobians of some Transformations . . . 104
7.7 Problem 5: To Find the Derivative of an Exponential Matrix with respect to a Matrix . . . 108

Solutions to Problems . . . 111
Tables of Formulae and Derivatives . . . 121
Bibliography . . . 126
Index . . . 129
Author's Preface

My purpose in writing this book is to bring to the attention of the reader some recent developments in the field of Matrix Calculus. Although some concepts, such as Kronecker matrix products, the vector derivative etc., are mentioned in a few specialised books, no book, to my knowledge, is totally devoted to this subject. The interested researcher must consult numerous published papers to appreciate the scope of the concepts involved.
Matrix calculus applicable to square matrices was developed by Turnbull [29, 30] as far back as 1927. The theory presented in this book is based on the works of Dwyer and McPhail [15] published in 1948 and others mentioned in the Bibliography. It is more general than Turnbull's development and is applicable to non-square matrices. But even this more general theory has grave limitations; in particular it requires that in general the matrix elements are non-constant and independent. A symmetric matrix, for example, is treated as a special case. Methods of overcoming some of these limitations have been suggested, but I am not aware of any published theory which is both quite general and simple enough to be useful.
The book is organised in the following way:
Chapter 1 concentrates on the preliminaries of matrix theory and notation which is found useful throughout the book. In particular, the simple and useful elementary matrix is defined. The vec operator is defined and many useful relations are developed. Chapter 2 introduces and establishes various important properties of the matrix Kronecker product.
Several applications of the Kronecker product are considered in Chapter 3. Chapter 4 introduces Matrix Calculus. Various derivatives of vectors are defined and the chain rule for vector differentiation is established. Rules for obtaining the derivative of a matrix with respect to one of its elements and conversely are discussed. Further developments in Matrix Calculus, including derivatives of scalar functions of a matrix with respect to the matrix and matrix differentials, are found in Chapter 5.
Chapter 6 deals with the derivative of a matrix with respect to a matrix.
This includes the derivation of expressions for the derivatives of both the matrix product and the Kronecker product of matrices with respect to a matrix. There is also the derivation of a chain rule of matrix differentiation. Various applications of at least some of the matrix calculus are discussed in Chapter 7.
By making use, whenever possible, of simple notation, including many worked examples to illustrate most of the important results and other examples at the end of each Chapter (except for Chapters 3 and 7) with solutions at the end of the book, I have attempted to bring a topic studied mainly at postgraduate and research level to an undergraduate level.
Symbols and Notation Used

A, B, C, ...   matrices
A'             the transpose of A
a_ij           the (i,j)th element of the matrix A
[a_ij]         the matrix A having a_ij as its (i,j)th element
I_m            the unit matrix of order m X m
e_i            the unit vector
e              the one vector (having all elements equal to one)
E_ij           the elementary matrix
0_m            the zero matrix of order m X m
δ_ij           the Kronecker delta
A.i            the ith column of the matrix A
A_i.           the ith row of A as a column vector
A_i.'          the transpose of A_i. (a row vector)
(A').i         the ith column of the matrix A'
(A').i'        the transpose of the ith column of A' (that is, a row vector)
tr A           the trace of A
vec A          an ordered stack of the columns of A
A ® B          the Kronecker product of A and B
iff            if and only if
diag{A}        the square matrix having elements a11, a22, ... along its diagonal and zeros elsewhere
...            a matrix of the same order as Y
...            a matrix of the same order as X
...            an elementary matrix of the same order as X
...            an elementary matrix of the same order as Y
CHAPTER 1

Preliminaries

1.1 INTRODUCTION

In this chapter we introduce some notation and discuss some results which will be found very useful for the development of the theory of both Kronecker products and matrix differentiation. Our aim will be to make the notation as simple as possible although inevitably it will be complicated. Some simplification may be obtained at the expense of generality. For example, we may show that a result holds for a square matrix of order n X n and state that it holds in the more general case when A is of order m X n. We will leave it to the interested reader to modify the proof for the more general case.


Further, we will often write

Σ_i Σ_j a_ij   or just   ΣΣ a_ij   instead of   Σ_{i=1}^{m} Σ_{j=1}^{n} a_ij

when the summation limits are obvious from the context.


Many other simplifications will be used as the opportunities arise. Unless of particular importance, we shall not state the order of the matrices considered. It will be assumed that, for example, when taking the product AB or ABC the matrices are conformable.

1.2 UNIT VECTORS AND ELEMENTARY MATRICES

The unit vectors of order n are defined as

e1 = [1 0 ... 0]' , e2 = [0 1 ... 0]' , ... , en = [0 0 ... 1]' .   (1.1)

The one vector of order n is defined as

e = [1 1 ... 1]' .   (1.2)

From (1.1) and (1.2) we obtain the relation

e = Σ_i e_i .   (1.3)

The elementary matrix E_ij is defined as the matrix (of order m X n) which has a unity in the (i,j)th position and all other elements zero.
For example,

E23 = [0 0 0 ... 0
       0 0 1 ... 0
       0 0 0 ... 0
       . . .
       0 0 0 ... 0] .   (1.4)

The relation between e_i, e_j and E_ij is as follows:

E_ij = e_i e_j'   (1.5)

where e_j' denotes the transposed vector (that is, the row vector) of e_j.
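As a brief aside (not part of the original text), (1.5) says that an elementary matrix is just an outer product of unit vectors; a minimal NumPy sketch, where the helper names unit_vector and elementary_matrix are illustrative inventions and the text's 1-based indices are mapped to Python's 0-based ones:

```python
import numpy as np

def unit_vector(n, i):
    """Unit vector e_i of order n (i is 1-based, as in the text)."""
    e = np.zeros((n, 1))
    e[i - 1, 0] = 1.0
    return e

def elementary_matrix(m, n, i, j):
    """Elementary matrix E_ij of order m x n, built as the outer product e_i e_j'."""
    return unit_vector(m, i) @ unit_vector(n, j).T

E23 = elementary_matrix(3, 4, 2, 3)   # unity in the (2,3) position, zeros elsewhere
print(E23)
```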
Example 1.1
Using the unit vectors of order 3,
(i) form E11, E12 and E13;
(ii) write the unit matrix of order 3 X 3 as a sum of the elementary matrices.

Solution
(i)

E11 = [1 0 0     E12 = [0 1 0     E13 = [0 0 1
       0 0 0            0 0 0            0 0 0
       0 0 0] ,         0 0 0] ,         0 0 0] .

(ii)

I = E11 + E22 + E33 = Σ_{i=1}^{3} e_i e_i' .
The Kronecker delta δ_ij is defined as

δ_ij = 1 if i = j ,  δ_ij = 0 if i ≠ j ;

it can be expressed as

δ_ij = e_i' e_j .   (1.6)
We can now determine some relations between unit vectors and elementary matrices.

E_ij e_r = e_i e_j' e_r   (by 1.5)
         = δ_jr e_i   (1.7)

and

e_r' E_ij = e_r' e_i e_j'
          = δ_ri e_j' .   (1.8)

Also

E_ij E_rs = e_i e_j' e_r e_s' = δ_jr e_i e_s' = δ_jr E_is .   (1.9)

In particular if r = j, we have

E_ij E_js = δ_jj E_is = E_is

and more generally

E_ij E_js E_sm = E_is E_sm = E_im .   (1.10)

Notice from (1.9) that

E_ij E_rs = 0 if j ≠ r .
1.3 DECOMPOSITIONS OF A MATRIX

We consider a matrix A of order m X n having the following form

A = [a11 a12 ... a1n
     a21 a22 ... a2n
     . . .
     am1 am2 ... amn] .   (1.11)

We denote the n columns of A by A.1, A.2, ..., A.n, so that

A.j = [a1j
       a2j
       . . .
       amj]   (j = 1, 2, ..., n)   (1.12)

and the m rows of A by A_1., A_2., ..., A_m., so that

A_i. = [ai1
        ai2
        . . .
        ain]   (i = 1, 2, ..., m) .   (1.13)

Both the A.j and the A_i. are column vectors. In this notation we can write A as the (partitioned) matrix

A = [A.1  A.2  ...  A.n]   (1.14)

or as

A = [A_1.'
     A_2.'
     . . .
     A_m.']   (1.15)

(where the prime means 'the transpose of').
The elements, the columns and the rows of A can be expressed in terms of the unit vectors as follows:

The jth column   A.j = A e_j .   (1.16)

The ith row      A_i.' = e_i' A ,   (1.17)

so that

A_i. = (e_i' A)' = A' e_i .   (1.18)

The (i,j)th element of A can now be written as

a_ij = e_i' A e_j = e_j' A' e_i .   (1.19)

We can express A as the sum

A = ΣΣ a_ij E_ij   (1.20)

(where the E_ij are of course of the same order as A) so that

A = Σ_i Σ_j a_ij e_i e_j' .   (1.21)
From (1.16) and (1.21)

A.j = A e_j = (Σ_i Σ_k a_ik e_i e_k') e_j
    = Σ_i Σ_k a_ik e_i (e_k' e_j)
    = Σ_i a_ij e_i .   (1.22)

Similarly

A_i.' = Σ_j a_ij e_j' ,   (1.23)

so that

A_i. = Σ_j a_ij e_j .   (1.24)

It follows from (1.21), (1.22) and (1.24) that

A = Σ_j A.j e_j'   (1.25)

and

A = Σ_i e_i A_i.' .   (1.26)

Example 1.2
Write the matrix

A = [a11 a12
     a21 a22]

as a sum of: (i) column vectors of A; (ii) row vectors of A.

Solutions
(i) Using (1.25)

A = A.1 e1' + A.2 e2'

  = [a11  [1 0] + [a12  [0 1] .
     a21]          a22]

(ii) Using (1.26)

A = e1 A_1.' + e2 A_2.'

  = [1  [a11 a12] + [0  [a21 a22] .
     0]              1]
There exist interesting relations involving the elementary matrices operating on the matrix A.
For example

E_ij A = e_i e_j' A   (by 1.5)
       = e_i A_j.' .   (by 1.17)   (1.27)

Similarly

A E_ij = A e_i e_j' = A.i e_j'   (by 1.16)   (1.28)

so that

A E_ij' = A.j e_i' .   (1.29)

Also

A E_ij B = A e_i e_j' B = A.i B_j.'   (by 1.28 and 1.27)   (1.30)

and

E_ij A E_rs = e_i e_j' A e_r e_s'   (by 1.5)
            = e_i a_jr e_s'   (by 1.19)
            = a_jr e_i e_s' = a_jr E_is .   (1.31)

In particular

E_ii A E_rr = a_ir E_ir .   (1.32)

Example 1.3
Use elementary matrices and/or unit vectors to find an expression for
(i) the product AB of the matrices A = [a_ij] and B = [b_ij];
(ii) the kth column of the product AB;
(iii) the kth column of the product XYZ of the matrices X = [x_ij], Y = [y_ij] and Z = [z_ij].
Solutions
(i) By (1.25) and (1.29)

A = Σ_j A.j e_j' = Σ_j A E_jj ,

hence

AB = Σ_j (A E_jj) B = Σ_j (A e_j)(e_j' B)
   = Σ_j A.j B_j.'   by (1.16) and (1.17).

(ii) (a)

(AB).k = (AB) e_k = A (B e_k) = A B.k   by (1.16).

(b) From (i) above we can write

(AB).k = Σ_j (A e_j e_j' B) e_k
       = Σ_j (A e_j)(e_j' B e_k)
       = Σ_j A.j b_jk   by (1.16) and (1.19).

(iii)

(XYZ).k = Σ_j z_jk (XY).j   by (ii)(b) above
        = Σ_j z_jk (X Y.j)   by (ii)(a) above.

1.4 THE TRACE FUNCTION

The trace (or the spur) of a square matrix A of order (n X n) is the sum of the diagonal terms. We write

tr A = Σ_i a_ii .   (1.33)

From (1.19) we have a_ii = e_i' A e_i, so that

tr A = Σ_i e_i' A e_i .   (1.34)

From (1.16) and (1.34) we find

tr A = Σ_i e_i' A.i   (1.35)

and from (1.17) and (1.34)

tr A = Σ_i A_i.' e_i .   (1.36)

We can obtain similar expressions for the trace of a product AB of matrices.
For example

tr AB = Σ_i e_i' A B e_i   (1.37)
      = Σ_i Σ_j (e_i' A e_j)(e_j' B e_i)   (see Ex. 1.3)
      = Σ_i Σ_j a_ij b_ji .   (1.38)

Similarly

tr BA = Σ_j e_j' B A e_j
      = Σ_j Σ_i (e_j' B e_i)(e_i' A e_j)
      = Σ_j Σ_i b_ji a_ij .   (1.39)

From (1.38) and (1.39) we find that

tr AB = tr BA .   (1.40)

From (1.16), (1.17) and (1.37) we have

tr AB = Σ_i A_i.' B.i .   (1.41)

Also from (1.40) and (1.41)

tr AB = Σ_j B_j.' A.j .   (1.42)

Similarly

tr AB' = Σ_i A_i.' B_i.   (1.43)

and since tr AB' = tr A'B,

tr AB' = Σ_i A.i' B.i .   (1.44)



Two important properties of the trace are

tr (A + B) = tr A + tr B   (1.45)

and

tr (αA) = α tr A   (1.46)

where α is a scalar.
These properties show that trace is a linear function.
For real matrices A and B the various properties of tr (AB') indicated above show that it is an inner product; it is sometimes written as

tr (AB') = (A, B) .

1.5 THE VEC OPERATOR

We shall make use of a vector valued function, denoted by vec A, of a matrix A, defined by Neudecker [22].
If A is of order m X n,

vec A = [A.1
         A.2
         . . .
         A.n] .   (1.47)

From the definition it is clear that vec A is a vector of order mn.
For example, if

A = [a11 a12
     a21 a22] ,

then

vec A = [a11
         a21
         a12
         a22] .
Example 1.4
Show that we can write tr AB as (vec A')' vec B.

Solution
By (1.37)

tr AB = Σ_i e_i' A B e_i
      = Σ_i A_i.' B.i   by (1.16) and (1.17)
      = Σ_i (A').i' B.i

(since the ith row of A is the ith column of A').
Hence (assuming A and B of order n X n)

tr AB = [(A').1'  (A').2'  ...  (A').n'] [B.1
                                          B.2
                                          . . .
                                          B.n]
      = (vec A')' vec B .
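A short numerical check of this identity (an illustrative aside, not from the original text), reusing the column-stacking vec convention sketched above:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

vecAT = A.T.reshape(-1, 1, order='F')   # vec A'
vecB = B.reshape(-1, 1, order='F')      # vec B

lhs = np.trace(A @ B)
rhs = float(vecAT.T @ vecB)             # (vec A')' vec B
print(np.isclose(lhs, rhs))             # True
```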
Before discussing a useful application of the above we must first agree on notation for the transpose of an elementary matrix; we do this with the aid of an example.
Let

X = [x11 x12 x13
     x21 x22 x23] ,

then an elementary matrix associated with X will also be of order (2 X 3). For example, one such matrix is

E12 = [0 1 0
       0 0 0] .

The transpose of E12 is the matrix

E12' = [0 0
        1 0
        0 0] .

Although at first sight this notation for the transpose is sensible and is used frequently in this book, there are associated snags. The difficulty arises when the suffix notation is not only indicative of the matrix involved but also determines specific elements, as in equations (1.31) and (1.32). On such occasions it will be necessary to use a more accurate notation indicating the matrix order and the element involved. Then instead of E12 we will write E12 (2 X 3), and instead of E12' we will write E21 (3 X 2).
More generally, if X is a matrix of order (m X n) then the transpose of

E_rs (m X n)

will be written as E_rs', unless an accurate description is necessary, in which case the transpose will be written as

E_sr (n X m) .

Now for the application of the result of Example 1.4, which will be used later on in the book.

From the above

tr E_rs' A = (vec E_rs)' (vec A) = a_rs

where a_rs is the (r,s)th element of the matrix A.
We can of course prove this important result by a more direct method:

tr E_rs' A = tr (e_s e_r' A) = e_r' A e_s = a_rs .

Problems for Chapter 1

(1) The matrix A is of order (4 X n) and the matrix B is of order (n X 3). Write the product AB in terms of the rows of A, that is A_1., A_2., ..., and the columns of B, that is B.1, B.2, ...

(2) Describe in words the matrices
(a) A E_lk and (b) E_lk A .
Write these matrices in terms of an appropriate product of a row or a column of A and a unit vector.

(3) Show that
(a) tr ABC = Σ_i A_i.' (BC).i
(b) tr ABC = tr BCA = tr CAB .

(4) Show that tr A E_ij = a_ji .

(5) B = [b_ij] is a matrix of order (n X n) and

diag{B} = diag{b11, b22, ..., bnn} = Σ_t b_tt E_tt .

Show that if

a_ij = tr (B E_ij) δ_ij

then A = [a_ij] = diag{B} .
CHAPTER 2

The Kronecker Product

2.1 INTRODUCTION

The Kronecker product, also known as a direct product or a tensor product, is a concept having its origin in group theory and has important applications in particle physics. But the technique has been successfully applied in various fields of matrix theory, for example in the solution of matrix equations which arise when using Lyapunov's approach to stability theory. The development of the technique in this chapter will be as a topic within the scope of matrix algebra.

2.2 DEFINITION OF THE KRONECKER PRODUCT

Consider a matrix A = [a_ij] of order (m X n) and a matrix B = [b_ij] of order (r X s). The Kronecker product of the two matrices, denoted by A ® B, is defined as the partitioned matrix

A ® B = [a11 B  a12 B  ...  a1n B
         a21 B  a22 B  ...  a2n B
         . . .
         am1 B  am2 B  ...  amn B] .   (2.1)

A ® B is seen to be a matrix of order (mr X ns). It has mn blocks; the (i,j)th block is the matrix a_ij B of order (r X s).
For example, let

A = [a11 a12     and   B = [b11 b12
     a21 a22]               b21 b22] ,

then

A ® B = [a11 B  a12 B   = [a11 b11  a11 b12  a12 b11  a12 b12
         a21 B  a22 B]     a11 b21  a11 b22  a12 b21  a12 b22
                           a21 b11  a21 b12  a22 b11  a22 b12
                           a21 b21  a21 b22  a22 b21  a22 b22] .
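For readers who want to experiment, NumPy's np.kron implements exactly this block definition; a minimal sketch (an illustrative aside, not part of the original text):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

K = np.kron(A, B)      # Kronecker product A (x) B
print(K.shape)         # (4, 4): order (mr x ns)
print(K[0:2, 0:2])     # the (1,1) block equals a11 * B
```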




Notice that the Kronecker product is defined irrespective of the orders of the matrices involved. From this point of view it is a more general concept than matrix multiplication. As we develop the theory we will note other results which are more general than the corresponding ones for matrix multiplication.
The Kronecker product arises naturally in the following way. Consider two linear transformations

x = Az and y = Bw

which, in the simplest case, take the form

[x1  = [a11 a12 [z1     and   [y1  = [b11 b12 [w1
 x2]    a21 a22] z2]           y2]    b21 b22] w2] .   (2.2)

We can consider the two transformations simultaneously by defining the following:

μ = x ® y and v = z ® w .   (2.3)

To find the transformation between μ and v, we determine the relations between the components of the two vectors.
For example,

x1 y1 = (a11 z1 + a12 z2)(b11 w1 + b12 w2) .

Similar expressions for the other components lead to the transformation

μ = [a11 b11  a11 b12  a12 b11  a12 b12
     a11 b21  a11 b22  a12 b21  a12 b22
     a21 b11  a21 b12  a22 b11  a22 b12
     a21 b21  a21 b22  a22 b21  a22 b22] v

or

μ = (A ® B) v ,

that is

Az ® Bw = (A ® B)(z ® w) .   (2.4)

Example 2.1
Let E_ij be an elementary matrix of order (2 X 2) as defined in section 1.2 (see 1.4). Find the matrix

U = Σ_{i=1}^{2} Σ_{j=1}^{2} E_ij ® E_ij' .

Solution

U = E11 ® E11' + E12 ® E12' + E21 ® E21' + E22 ® E22'

so that

U = [1 0 0 0
     0 0 1 0
     0 1 0 0
     0 0 0 1] .

Note. U is seen to be a square matrix having columns which are unit vectors e_i (i = 1, 2, ...). It can be obtained from a unit matrix by a permutation of rows or columns. It is known as a permutation matrix (see also section 2.5).

2.3 SOME PROPERTIES AND RULES FOR KRONECKER PRODUCTS

We expect the Kronecker product to have the usual properties of a product.

I  If α is a scalar, then

A ® (αB) = α(A ® B) .   (2.5)

Proof
The (i,j)th block of A ® (αB) is

[a_ij (αB)] = α[a_ij B] = α[(i,j)th block of A ® B] .

The result follows.

II  The product is distributive with respect to addition, that is

(a) (A + B) ® C = A ® C + B ® C   (2.6)
(b) A ® (B + C) = A ® B + A ® C .   (2.7)

Proof
We will only consider (a). The (i,j)th block of (A + B) ® C is

(a_ij + b_ij) C .

The (i,j)th block of A ® C + B ® C is

a_ij C + b_ij C = (a_ij + b_ij) C .


Since the two blocks are equal for every (i,j), the result follows.

III  The product is associative:

A ® (B ® C) = (A ® B) ® C .   (2.8)

IV  There exists a zero element 0_mn = 0_m ® 0_n and a unit element I_mn = I_m ® I_n .   (2.9)

The unit matrices are all square; for example I_m is the unit matrix of order (m X m).
Other important properties of the Kronecker product follow.

V  (A ® B)' = A' ® B' .   (2.10)

Proof
The (i,j)th block of (A ® B)' is a_ji B', which is also the (i,j)th block of A' ® B'. The result follows.
VI  (The 'Mixed Product Rule')

(A ® B)(C ® D) = AC ® BD   (2.11)

provided the dimensions of the matrices are such that the various expressions exist.

Proof
The (i,j)th block of the left hand side is obtained by taking the product of the ith row block of (A ® B) and the jth column block of (C ® D); this is of the following form:

[a_i1 B  a_i2 B  ...  a_in B] [c_1j D
                               c_2j D
                               . . .
                               c_nj D]  = (Σ_r a_ir c_rj) BD .

The (i,j)th block of the right hand side is (by definition of the Kronecker product)

g_ij BD

where g_ij is the (i,j)th element of the matrix AC. But by the rule of matrix multiplication

g_ij = Σ_r a_ir c_rj .
Since the (i,j)th blocks are equal, the result follows.

VII  Given A (m X m) and B (n X n) and subject to the existence of the various inverses,

(A ® B)^-1 = A^-1 ® B^-1 .   (2.12)

Proof
Use (2.11):

(A ® B)(A^-1 ® B^-1) = AA^-1 ® BB^-1 = I_m ® I_n = I_mn .

The result follows.

VIII  (See (1.47))

vec (AYB) = (B' ® A) vec Y .   (2.13)

Proof
We prove (2.13) for A, Y and B each of order n X n. (The result is true for A (m X n), Y (n X r), B (r X s).) We use the solution to Example 1.3 (iii): the kth column of AYB is

(AYB).k = Σ_j b_jk A Y.j
        = [b_1k A  b_2k A  ...  b_nk A] [Y.1
                                         Y.2
                                         . . .
                                         Y.n]
        = [B.k' ® A] vec Y
        = [(B')_k.' ® A] vec Y ;

since the transpose of the kth column of B is the kth row of B', the result follows.
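A short numerical sanity check of (2.13) (an illustrative aside, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))
Y = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

vec = lambda M: M.reshape(-1, 1, order='F')   # stack the columns

lhs = vec(A @ Y @ B)
rhs = np.kron(B.T, A) @ vec(Y)                # (B' (x) A) vec Y
print(np.allclose(lhs, rhs))                  # True
```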

Example 2.2
Write the equation

[a11 a12 [x11 x12   = [c11 c12
 a21 a22] x21 x22]     c21 c22]

in matrix-vector form.

Solution
The equation can be written as AXI = C. Use (2.13) to find

vec (AXI) = (I ® A) vec X = vec C ,
so that

[a11 a12  0   0    [x11     [c11
 a21 a22  0   0     x21   =  c21
  0   0  a11 a12    x12      c12
  0   0  a21 a22]   x22]     c22] .

Example 2.3
A and B are both of order (n X n); show that
(i) vec AB = (I ® A) vec B ;
(ii) vec AB = (B' ® A) vec I ;
(iii) vec AB = Σ_k (B').k ® A.k .

Solution
(i) (As in Example 2.2.) In (2.13) let Y = B and B = I.
(ii) In (2.13) let Y = I.
(iii) In vec AB = (B' ® A) vec I substitute (1.25), to obtain

vec AB = [Σ_i (B').i e_i' ® Σ_j A.j e_j'] vec I

       = [Σ_i Σ_j ((B').i ® A.j)(e_i' ® e_j')] vec I   (by 2.11).

The product e_i' ® e_j' is a one row matrix having a unit element in the [(i-1)n + j]th column and zeros elsewhere. Hence the product

[(B').i ® A.j][e_i' ® e_j']

is a matrix having

(B').i ® A.j

as its [(i-1)n + j]th column and zeros elsewhere. Since vec I is a one column matrix having a unity in the 1st, (n+2)nd, (2n+3)rd, ..., n²th positions and zeros elsewhere, the product of

[(B').i ® A.j][e_i' ® e_j'] and vec I

is a one column matrix whose elements are all zeros unless i and j satisfy

(i-1)n + j = 1, or n+2, or 2n+3, ..., or n² ,
that is

i = j = 1 or i = j = 2 or i = j = 3 or ..., i = j = n ,

in which case the one column matrix is

(B').i ® A.i   (i = 1, 2, ..., n) .

The result now follows.

IX  If {λ_i} and {x_i} are the eigenvalues and the corresponding eigenvectors of A, and {μ_j} and {y_j} are the eigenvalues and the corresponding eigenvectors of B, then

A ® B

has eigenvalues {λ_i μ_j} with corresponding eigenvectors {x_i ® y_j}.

Proof
By (2.11)

(A ® B)(x_i ® y_j) = (A x_i) ® (B y_j)
                   = (λ_i x_i) ® (μ_j y_j)
                   = λ_i μ_j (x_i ® y_j)   (by 2.5).

The result follows.
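Property IX is easy to confirm numerically; a minimal sketch (an illustrative aside, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))

eigA = np.linalg.eigvals(A)
eigB = np.linalg.eigvals(B)

# The eigenvalues of A (x) B should be all products lambda_i * mu_j
expected = np.sort_complex(np.kron(eigA, eigB))
actual = np.sort_complex(np.linalg.eigvals(np.kron(A, B)))
print(np.allclose(expected, actual))   # True
```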

X  Given the two matrices A and B of order n X n and m X m respectively,

|A ® B| = |A|^m |B|^n

where |A| means the determinant of A.

Proof
Assume that λ_1, λ_2, ..., λ_n and μ_1, μ_2, ..., μ_m are the eigenvalues of A and B respectively. The proof relies on the fact (see [18] p. 145) that the determinant of a matrix is equal to the product of its eigenvalues.
Hence (from Property IX above)

|A ® B| = Π_{i,j} λ_i μ_j

        = (λ_1^m Π_j μ_j)(λ_2^m Π_j μ_j) ... (λ_n^m Π_j μ_j)

        = (λ_1 λ_2 ... λ_n)^m (μ_1 μ_2 ... μ_m)^n

        = |A|^m |B|^n .

Another important property of Kronecker products follows.

XI  A ® B = U1 (B ® A) U2   (2.14)

where U1 and U2 are permutation matrices (see Example 2.1).

Proof
Let AYB' = X; then by (2.13)

(B ® A) vec Y = vec X .   (1)

On taking transposes, we obtain

B Y' A' = X' ,

so that by (2.13)

(A ® B) vec Y' = vec X' .   (2)

From Example 1.5, we know that there exist permutation matrices U1 and U2 such that

vec X' = U1 vec X and vec Y = U2 vec Y' .

Substituting for vec Y in (1) and multiplying both sides by U1, we obtain

U1 (B ® A) U2 vec Y' = U1 vec X .   (3)

Substituting for vec X' in (2), we obtain

(A ® B) vec Y' = U1 vec X .   (4)

The result follows from (3) and (4).

We will obtain an explicit formula for the permutation matrix U in section 2.5. Notice that U1 and U2 are independent of A and B except for the orders of the matrices.

XII  If f is an analytic function, A is a matrix of order (n X n) and f(A) exists, then

f(I_m ® A) = I_m ® f(A)   (2.15)

and

f(A ® I_m) = f(A) ® I_m .   (2.16)

Proof
Since f is an analytic function it can be expressed as a power series

f(z) = a_0 + a_1 z + a_2 z² + ...

so that

f(A) = a_0 I_n + a_1 A + a_2 A² + ... = Σ_{k=0} a_k A^k ,

where A^0 = I_n.
By the Cayley-Hamilton theorem (see [18]) the right hand side of the equation for f(A) is the sum of at most (n + 1) matrices.
We now have

f(I_m ® A) = Σ_{k=0} a_k (I_m ® A)^k

           = Σ_{k=0} a_k (I_m ® A^k)   by (2.11)

           = Σ_{k=0} (I_m ® a_k A^k)   by (2.5)

           = I_m ® Σ_{k=0} a_k A^k   by (2.7)

           = I_m ® f(A) .

This proves (2.15); (2.16) is proved similarly. We can write

f(A ® I_m) = Σ_{k=0} a_k (A ® I_m)^k

           = Σ_{k=0} a_k (A^k ® I_m)   by (2.11)

           = Σ_{k=0} (a_k A^k ® I_m)   by (2.5)

           = (Σ_{k=0} a_k A^k) ® I_m   by (2.6)

           = f(A) ® I_m .

This proves (2.16).


An important application of the above property is for
fez) = eZ •

(2.l5) leads to the result

(2.17)
and (2.16) leads to
eA®Im = eA ®Im (2.18)

Example 2.4
Use a direct method to verify (2.17) and (2.18).

Solution

e^(I_m ® A) = (I_m ® I_n) + (I_m ® A) + (1/2!)(I_m ® A)² + ...
            = (I_m ® I_n) + (I_m ® A) + (1/2!)(I_m ® A²) + ...

The right hand side is a block diagonal matrix; each of the m blocks is the sum

I_n + A + A²/2! + ... = e^A .

The result (2.17) follows.

e^(A ® I_m) = (I_n ® I_m) + (A ® I_m) + (1/2!)(A ® I_m)² + ...

            = (I_n ® I_m) + (A ® I_m) + (1/2!)(A² ® I_m) + ...

            = (I_n + A + (1/2!)A² + ...) ® I_m

            = e^A ® I_m .
XIII  tr (A ® B) = tr A tr B .

Proof
Assume that A is of order (n X n). Then

tr (A ® B) = tr (a11 B) + tr (a22 B) + ... + tr (ann B)
           = a11 tr B + a22 tr B + ... + ann tr B
           = (a11 + a22 + ... + ann) tr B
           = tr A tr B .

2.4 DEFINITION OF THE KRONECKER SUM

Given a matrix A (n X n) and a matrix B (m X m), their Kronecker sum, denoted by A ⊕ B, is defined as the expression

A ⊕ B = A ® I_m + I_n ® B .   (2.19)

We have seen (Property IX) that if {λ_i} and {μ_j} are the eigenvalues of A and B respectively, then {λ_i μ_j} are the eigenvalues of the product A ® B. We now show the equivalent and fundamental property for A ⊕ B.

XIV  If {λ_i} and {μ_j} are the eigenvalues of A and B respectively, then {λ_i + μ_j} are the eigenvalues of A ⊕ B.

Proof
Let x and y be the eigenvectors corresponding to the eigenvalues λ and μ of A and B respectively; then

(A ⊕ B)(x ® y) = (A ® I)(x ® y) + (I ® B)(x ® y)   by (2.19)
               = (Ax ® y) + (x ® By)   by (2.11)
               = λ(x ® y) + μ(x ® y)
               = (λ + μ)(x ® y) .

The result follows.
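Again this is easy to check numerically; a minimal sketch (an illustrative aside, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((2, 2))

n, m = A.shape[0], B.shape[0]
ksum = np.kron(A, np.eye(m)) + np.kron(np.eye(n), B)   # A (+) B, as in (2.19)

expected = np.sort_complex((np.linalg.eigvals(A)[:, None]
                            + np.linalg.eigvals(B)[None, :]).ravel())
actual = np.sort_complex(np.linalg.eigvals(ksum))
print(np.allclose(expected, actual))   # True: eigenvalues are lambda_i + mu_j
```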

Example 2.5
Verify Property XIV for

A = [1 -1     and   B = [1  0
     0  2]               2 -1] .

Solution
For the matrix A:

λ1 = 1 with x1 = [1 0]'   and   λ2 = 2 with x2 = [-1 1]' .

For the matrix B:

μ1 = 1 with y1 = [1 1]'   and   μ2 = -1 with y2 = [0 1]' .

We find

C = A ⊕ B = A ® I_2 + I_2 ® B = [2  0 -1  0
                                 2  0  0 -1
                                 0  0  3  0
                                 0  0  2  1]

and |ρI - C| = ρ(ρ-1)(ρ-2)(ρ-3), so that the eigenvalues of A ⊕ B are

ρ = 0 = λ1 + μ2 with x1 ® y2 = [0 1 0 0]'
ρ = 1 = λ2 + μ2 with x2 ® y2 = [0 -1 0 1]'

and

ρ = 2 = λ1 + μ1 with x1 ® y1 = [1 1 0 0]'
ρ = 3 = λ2 + μ1 with x2 ® y1 = [-1 -1 1 1]' .

The Kronecker sum frequently turns up when we are considering equations of the form

AX + XB = C   (2.20)

where A (n X n), B (m X m) and X (n X m).
Use (2.13) and the solution to Example 2.3 to write the above in the form

(I_m ® A + B' ® I_n) vec X = vec C   (2.21)

or

(B' ⊕ A) vec X = vec C .

It is interesting to note the generality of the Kronecker sum. For example,

exp (A + B) = exp A exp B

if and only if A and B commute (see [18] p. 227), whereas

exp (A ⊕ B) = exp (A ® I) exp (I ® B)

even if A and B do not commute!

Example 2.6
Show that

exp (A ⊕ B) = exp A ® exp B

where A (n X n), B (m X m).

Solution
By (2.11)

(A ® I_m)(I_n ® B) = A ® B

and

(I_n ® B)(A ® I_m) = A ® B ,

hence (A ® I_m) and (I_n ® B) commute, so that

exp (A ⊕ B) = exp (A ® I_m + I_n ® B)
            = exp (A ® I_m) exp (I_n ® B)
            = (exp A ® I_m)(I_n ® exp B)   (by 2.15 and 2.16)
            = exp A ® exp B   (by 2.11).

2.5 THE PERMUTATION MATRIX ASSOCIATING vec X AND vec X'

If X = [x_ij] is a matrix of order (m X n) we can write (see (1.20))

X = Σ_i Σ_j x_ij E_ij

where E_ij is an elementary matrix of order (m X n). It follows that

X' = Σ_i Σ_j x_ij E_ij' ,

so that

vec X' = Σ_i Σ_j x_ij vec E_ij' .   (2.22)

We can write (2.22) in the form of a matrix multiplication as

vec X' = [vec E11' : vec E21' : ... : vec Em1' : vec E12' : ... : vec Emn'] vec X .

So the permutation matrix associating vec X and vec X' is

U = [vec E11' : vec E21' : ... : vec Em1' : vec E12' : ... : vec Emn'] .   (2.23)
Example 2.7
Given

X = [x11 x12 x13
     x21 x22 x23] ,

determine the matrix U such that

vec X' = U vec X .

Solution

U = [1 0 0 0 0 0
     0 0 1 0 0 0
     0 0 0 0 1 0
     0 1 0 0 0 0
     0 0 0 1 0 0
     0 0 0 0 0 1] .

We now obtain the permutation matrix U in a useful form as a Kronecker product of elementary matrices.
As it is necessary to be precise about the suffixes of the elementary matrices, we will use the notation explained at the end of Chapter 1.
As above, we write

X' = Σ_{r=1}^{m} Σ_{s=1}^{n} x_rs E_sr (n X m) .

By (1.31) we can write

X' = Σ_{r,s} E_sr (n X m) X E_sr (n X m) ,

so that

vec X' = vec Σ_{r,s} E_sr (n X m) X E_sr (n X m)

       = Σ_{r,s} [E_rs (m X n) ® E_sr (n X m)] vec X   by (2.13).

It follows that

U = Σ_{r,s} E_rs (m X n) ® E_sr (n X m)   (2.24)

or, in our less rigorous notation,

U = Σ_{r,s} E_rs ® E_rs' .   (2.25)

Notice that U is a matrix of order (mn X mn).
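A small NumPy sketch of (2.24)-(2.25), building U from elementary matrices and checking vec X' = U vec X (an illustrative aside, not from the original text; the helper names E and permutation_U are inventions):

```python
import numpy as np

def E(m, n, r, s):
    """Elementary matrix E_rs of order m x n (1-based indices, as in the text)."""
    M = np.zeros((m, n))
    M[r - 1, s - 1] = 1.0
    return M

def permutation_U(m, n):
    """U = sum over r,s of E_rs(m x n) (x) E_sr(n x m), as in (2.24)."""
    return sum(np.kron(E(m, n, r, s), E(n, m, s, r))
               for r in range(1, m + 1) for s in range(1, n + 1))

vec = lambda M: M.reshape(-1, 1, order='F')

X = np.arange(6).reshape(2, 3) + 1.0
U = permutation_U(2, 3)
print(np.allclose(vec(X.T), U @ vec(X)))   # True
```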
At first sight it may appear that the evaluation of the permutation matrices U1 and U2 in (2.14) using (2.24) is a major task. In fact this is one of the examples where the practice is much easier than the theory.
We can readily determine the form of a permutation matrix, as in Example 2.7. So the only real problem is to determine the orders of the two matrices.
Since the matrices forming the product (2.14) must be conformable, the orders of the matrices U1 and U2 are determined respectively by the number of rows and the number of columns of (A ® B).

Example 2.8
Let A = [a_ij] be a matrix of order (2 X 3), and B = [b_ij] be a matrix of order (2 X 2).
Determine the permutation matrices U1 and U2 such that

A ® B = U1 (B ® A) U2 .

Solution
(A ® B) is of order (4 X 6).
From the above discussion we conclude that U1 is of order (4 X 4) and U2 is of order (6 X 6):

U1 = [1 0 0 0        U2 = [1 0 0 0 0 0
      0 0 1 0              0 0 1 0 0 0
      0 1 0 0              0 0 0 0 1 0
      0 0 0 1]             0 1 0 0 0 0
                           0 0 0 1 0 0
                           0 0 0 0 0 1] .

Another related matrix which will be used (in Chapter 6) is

D = Σ_{r,s} E_rs ® E_rs .   (2.26)

When the matrix X is of order (m X n), D is of order (m² X n²).

Problems for Chapter 2

(1) Given

U = Σ_{r,s} E_rs (m X n) ® E_sr (n X m) ,

show that

U^-1 = U' = Σ_{r,s} E_sr (n X m) ® E_rs (m X n) .

(2) A = [a_ij], B = [b_ij] and Y = [y_ij] are matrices all of order (2 X 2); use a direct method to evaluate
(a) (i) AYB
    (ii) B' ® A .
(b) Verify (2.13), that is

vec AYB = (B' ® A) vec Y .

(3) Given matrices A and B of order (2 X 2),
(a) calculate A ® B and B ® A;
(b) find matrices U1 and U2 such that

A ® B = U1 (B ® A) U2 .

(4) Given a matrix A of order (2 X 2), calculate
(a) exp (A)
(b) exp (A ® I) .
Verify (2.16), that is

exp (A) ® I = exp (A ® I) .

(5) Given

A = [ 2  1
     -1 -1]

and a matrix B of order (2 X 2), calculate
(a) A^-1 ® B^-1 and
(b) (A ® B)^-1 .
Hence verify (2.12), that is

(A ® B)^-1 = A^-1 ® B^-1 .

(6) Given matrices A and B of order (2 X 2), find
(a) The eigenvalues and eigenvectors of A and B.
(b) The eigenvalues and eigenvectors of A ® B.
(c) Verify Property IX of Kronecker products.

(7) A, B, C and D are matrices such that


A is similar to C, and
B is similar to D.
Show that A ®B is similar to C®D.
CHAPTER 3

Some Applications of the Kronecker Product

3.1 INTRODUCTION

There are numerous applications of the Kronecker product in various fields including statistics, economics, optimisation and control. It is not our intention to discuss applications in all these fields, just a selected number to give an idea of the problems tackled in some of the literature mentioned in the Bibliography. There is no doubt that the interested reader will find there various other applications, hopefully in his own field of interest.
A number of the applications involve the derivative of a matrix - it is a well known concept (for example see [18] p. 229) which we now briefly review.
3.2 THE DERIVATIVE OF A MATRIX

Given the matrix

A(t) = [a_ij(t)] ,

the derivative of the matrix with respect to a scalar variable t, denoted by (d/dt)A(t) or just dA/dt or Ȧ(t), is defined as the matrix

(d/dt) A(t) = [(d/dt) a_ij(t)] .   (3.1)

Similarly, the integral of the matrix is defined as

∫ A(t) dt = [∫ a_ij(t) dt] .   (3.2)

For example, given

A = [2t²    4
     sin t  2 + t²] ,

dA/dt = [4t     0      and   ∫ A dt = [2t³/3   4t
         cos t  2t]                    -cos t  2t + t³/3] + C ,

where C is a constant matrix.
One important property follows immediately. Given conformable matrices A(t) and B(t), then

(d/dt)[AB] = (dA/dt) B + A (dB/dt) .   (3.3)

Example 3.1
Given

C = A ® B

(each matrix is assumed to be a function of t), show that

dC/dt = (dA/dt) ® B + A ® (dB/dt) .   (3.4)

Solution
On differentiating the (i,j)th block of A ® B, we obtain

(d/dt)(a_ij B) = (da_ij/dt) B + a_ij (dB/dt) ,

which is the (i,j)th partition of

(dA/dt) ® B + A ® (dB/dt) ;

the result follows.

3.3 PROBLEM 1
Determine the condition for the equation

AX + XB = C

to have a unique solution.

Solution
We have already considered this equation and wrote it (2.21) as

(B' ⊕ A) vec X = vec C

or

Gx = c   (3.5)

where G = B' ⊕ A and c = vec C.

Equation (3.5) has a unique solution iff G is nonsingular, that is iff the eigenvalues of G are all nonzero. By Property XIV (see section 2.4), the eigenvalues of G are {λ_i + μ_j}, where {λ_i} and {μ_j} are the eigenvalues of A and B respectively (note that the eigenvalues of the matrix B' are the same as the eigenvalues of B). Equation (3.5) therefore has a unique solution iff

λ_i + μ_j ≠ 0   (all i and j) .

We have thus proved that AX + XB = C has a unique solution iff A and (-B) have no eigenvalue in common.
If, on the other hand, A and (-B) have common eigenvalues then the existence of solutions depends on the rank of the augmented matrix

[G : c] .

If the rank of [G : c] is equal to the rank of G, then solutions do exist; otherwise the set of equations

AX + XB = C

is not consistent.
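For experimentation, this Kronecker formulation translates directly into code; a minimal sketch (an illustrative aside, not from the original text; in practice scipy.linalg.solve_sylvester handles AX + XB = C more efficiently, and the helper name solve_sylvester_kron is an invention):

```python
import numpy as np

def solve_sylvester_kron(A, B, C):
    """Solve AX + XB = C via (I (x) A + B' (x) I) vec X = vec C."""
    n, m = A.shape[0], B.shape[0]
    G = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    x = np.linalg.solve(G, C.reshape(-1, 1, order='F'))   # unique iff G nonsingular
    return x.reshape(n, m, order='F')

A = np.array([[1.0, 2.0], [0.0, 3.0]])
B = np.array([[4.0, 0.0], [1.0, 5.0]])   # A and -B share no eigenvalue here
C = np.array([[1.0, 0.0], [2.0, 1.0]])
X = solve_sylvester_kron(A, B, C)
print(np.allclose(A @ X + X @ B, C))     # True
```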

Example 3.2
Obtain the solution to

AX + XB = C

for given (2 X 2) matrices A, B and C, (i) in a case where A and (-B) have no eigenvalue in common, and (ii) in a case where A and (-B) have one eigenvalue (λ = 1) in common.

Solution
Writing the equation in the form of (3.5) we obtain Gx = c, where for convenience we have denoted G = B' ⊕ A and c = vec C.

(i) In this case G is nonsingular, and solving we obtain the unique solution X.

(ii) In case (ii) A and (-B) have one eigenvalue (λ = 1) in common, so G is seen to be singular, but

rank G = rank [G : c] = 3 ,

hence at least one solution exists. In fact two linearly independent solutions X1 and X2 exist, and any other solution is a linear combination of X1 and X2.

3.4 PROBLEM 2
Determine the condition for the equation

AX - XA = μX   (3.6)

to have a nontrivial solution.

Solution
We can write (3.6) as

Hx = μx   (3.7)

where H = I ® A - A' ® I and

x = vec X .

(3.7) has a nontrivial solution for x iff

|μI - H| = 0 ,

that is iff μ is an eigenvalue of H. But by a simple generalisation of Property XIV, section 2.4, the eigenvalues of H are {λ_i - λ_j}, where {λ_i} are the eigenvalues of A. Hence (3.6) has a nontrivial solution iff

μ = λ_i - λ_j .


Example 3.3
Determine the solutions to (3.6) for a given (2 X 2) matrix A when μ = -2.

Solution
μ = -2 is an eigenvalue of H, hence we expect a nontrivial solution. Equation (3.7) becomes

(H + 2I) x = 0 ,

and on solving we obtain the nontrivial solutions x = vec X.
3.5 PROBLEM 3
Use the fact (see [18] p. 230) that the solution to

ẋ = Ax , x(0) = c   (3.8)

is

x = exp (At) c   (3.9)

to solve the equation

Ẋ = AX + XB , X(0) = C   (3.10)

where A (n X n), B (m X m) and X (n X m).

Solution
Using the vec operator on (3.10) we obtain

ẋ = Gx , x(0) = c   (3.11)

where

x = vec X , c = vec C

and

G = I_m ® A + B' ® I_n .
By (3.9) the solution to (3.11) is

vec X = exp {(I_m ® A)t + (B' ® I_n)t} vec C

      = [exp (I_m ® A)t][exp (B' ® I_n)t] vec C   (see Example 2.6)

      = [I_m ® exp (At)][exp (B't) ® I_n] vec C   by (2.17) and (2.18).

We now make use of the result

vec AB = (B' ® I) vec A

(in (2.13) put A = I and Y = A) in conjunction with the fact that

exp (B't) = [exp (Bt)]' ,

to obtain

[exp (B't) ® I_n] vec C = vec [C exp (Bt)] .

Using the result of Example 2.3(i), we finally obtain

vec X = vec [exp (At) C exp (Bt)]   (3.12)

so that X = exp (At) C exp (Bt).
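This closed form is easy to check numerically; a minimal sketch using SciPy's matrix exponential against direct integration of the matrix differential equation (an illustrative aside, not from the original text):

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[-1.0, 0.0], [1.0, -2.0]])
C = np.eye(2)
t = 0.7

# Closed-form solution X(t) = exp(At) C exp(Bt) of X' = AX + XB, X(0) = C
X_closed = expm(A * t) @ C @ expm(B * t)

# Compare against numerical integration of the matrix ODE
rhs = lambda _, x: (A @ x.reshape(2, 2) + x.reshape(2, 2) @ B).ravel()
X_num = solve_ivp(rhs, (0.0, t), C.ravel(), rtol=1e-10).y[:, -1].reshape(2, 2)
print(np.allclose(X_closed, X_num, atol=1e-6))   # True
```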

Example 3.4
Obtain the solution to (3.10) for given (2 X 2) matrices A, B and C.

Solution
(See [18] p. 227.) We first compute the matrix exponentials

exp (At) and exp (Bt) ,

and then, by (3.12),

X = exp (At) C exp (Bt) .

3.6 PROBLEM 4
We consider a problem similar to the previous one but in a different context.
An important concept in Control Theory is the transition matrix.
Very briefly, associated with the equations

Ẋ = A(t)X or ẋ = A(t)x

is the transition matrix Φ1(t, τ) having the following two properties:

Φ̇1(t, τ) = A(t)Φ1(t, τ)   (3.13)

and

Φ1(τ, τ) = I .

[For simplicity of notation we shall write Φ for Φ(t, τ).] If A is a constant matrix, it is easily shown that

Φ = exp (At) .

Similarly, with the equation

Ẋ = XB , so that Ẋ' = B'X' ,

we associate the transition matrix Φ2 such that

Φ̇2 = B'Φ2 .   (3.14)

The problem is to find the transition matrix associated with the equation

Ẋ = AX + XB   (3.15)

given the transition matrices Φ1 and Φ2 defined above.

Solution
We can write (3.15) as

ẋ = Gx

where x and G were defined in the previous problem.
We define a matrix ψ as

ψ(t, τ) = Φ2(t, τ) ® Φ1(t, τ) .   (3.16)

We obtain by (3.4)

ψ̇ = Φ̇2 ® Φ1 + Φ2 ® Φ̇1
   = (B'Φ2) ® (IΦ1) + (IΦ2) ® (AΦ1)   by (3.13) and (3.14)
   = [B' ® I + I ® A][Φ2 ® Φ1]   by (2.11).

Hence

ψ̇ = Gψ .   (3.17)

Also

ψ(τ, τ) = Φ2(τ, τ) ® Φ1(τ, τ)
        = I ® I
        = I .   (3.18)

The two equations (3.17) and (3.18) prove that ψ is the transition matrix for (3.15).

Example 3.5
Find the transition matrix for the equation

Ẋ = [2 0  X + X [1  0
     0 1]        0 -1] .

Solution
In this case both A and B are constant matrices. As in Example 3.4,

Φ1 = exp (At) = [e^2t  0      and   Φ2 = exp (Bt) = [e^t  0
                 0     e^t]                          0    e^-t] ,

so that

ψ = Φ2 ® Φ1 = [e^3t  0     0    0
               0     e^2t  0    0
               0     0     e^t  0
               0     0     0    1] .

For this equation

G = [3 0 0 0
     0 2 0 0
     0 0 1 0
     0 0 0 0]

and it is easily verified that

ψ̇ = Gψ and ψ(0) = I .

3.7 PROBLEM 5
Solve the equation

AXB = C   (3.19)

where all matrices are of order n X n.

Solution
Using (2.13) we can write (3.19) in the form

Hx = c   (3.20)

where H = B' ® A, x = vec X and c = vec C.
The criteria for the existence and the uniqueness of a solution to (3.20) are well known (see for example [18]).
The above method of solving the problem is easily generalised to the linear equation of the form

A1 X B1 + A2 X B2 + ... + Ar X Br = C .   (3.21)
Equation (3.21) can be written, as for example (3.20), where this time

H = B1' ® A1 + B2' ® A2 + ... + Br' ® Ar .
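A small NumPy sketch of this generalised equation (an illustrative aside, not from the original text; the helper name solve_linear_matrix_eq is an invention, and the example matrices are chosen so that H is nonsingular):

```python
import numpy as np

def solve_linear_matrix_eq(pairs, C):
    """Solve sum_i A_i X B_i = C via H vec X = vec C, H = sum_i B_i' (x) A_i."""
    n, m = C.shape
    H = sum(np.kron(B.T, A) for A, B in pairs)
    x = np.linalg.solve(H, C.reshape(-1, 1, order='F'))
    return x.reshape(n, m, order='F')

A1 = np.array([[1.0, 2.0], [0.0, 1.0]]); B1 = np.array([[1.0, 0.0], [1.0, 1.0]])
A2 = np.array([[0.0, 1.0], [1.0, 0.0]]); B2 = np.array([[2.0, 0.0], [0.0, 1.0]])
C = np.array([[1.0, 2.0], [3.0, 4.0]])

X = solve_linear_matrix_eq([(A1, B1), (A2, B2)], C)
print(np.allclose(A1 @ X @ B1 + A2 @ X @ B2, C))   # True
```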
Example 3.6
Find the matrix X, given

A1 X B1 + A2 X B2 = C

where A1, B1, A2 and B2 are given (2 X 2) matrices and

C = [4 -6
     0  8] .

Solution
For this example it is found that H = B1' ® A1 + B2' ® A2 is nonsingular, and

c' = [4 0 -6 8] .

It follows that

x = H^-1 c ,

from which X is recovered by unstacking (x = vec X).

3.8 PROBLEM 6
This problem is to determine a constant output feedback matrix K so that the closed loop matrix of a system has preassigned eigenvalues.
A multivariable system is defined by the equations

ẋ = Ax + Bu
y = Cx   (3.22)

where A (n X n), B (n X m) and C (r X n) are constant matrices; u, x and y are column vectors of order m, n and r respectively.
We are concerned with a system having an output feedback law of the form

u = Ky   (3.23)

where K (m X r) is the constant control matrix to be determined.
On substituting (3.23) into (3.22), we obtain the equations of the closed loop system

ẋ = (A + BKC) x
y = Cx .   (3.24)

The problem can now be restated as follows:
Given the matrices A, B and C, determine a matrix K such that

|λI - (A + BKC)| = 0   (3.25)

for preassigned values λ = λ1, λ2, ..., λn .

Solution
Various solutions exist to this problem. We are interested in the application of the Kronecker product and will follow a method suggested in [24].
We consider a matrix H (n X n) whose eigenvalues are the desired values λ1, λ2, ..., λn, that is

|λI - H| = 0 for λ = λ1, λ2, ..., λn   (3.26)

and

|λI - H| = a0 + a1 λ + ... + a_{n-1} λ^{n-1} + λ^n .   (3.27)

Let

A + BKC = H

so that

BKC = H - A = Q (say) .   (3.28)

Using (2.13) we can write (3.28) as

(C' ® B) vec K = vec Q   (3.29)

or more simply as

Pk = q   (3.30)

where P = C' ® B, k = vec K and q = vec Q.
Notice that P is of order (n² X mr) and k and q are column vectors of order mr and n² respectively.
The system of equations (3.30) is overdetermined unless of course m = n = r, in which case it can be solved in the usual manner - assuming a solution does exist!
In general, to solve the system for k we must consider the subsystem of linearly independent equations, the remaining equations being linearly dependent on this subsystem. In other words we determine a nonsingular matrix T (n² X n²) such that

TP = [P1
      P2]   (3.31)

where P1 is the matrix of the coefficients of the linearly independent equations of the system (3.30) and P2 is a null matrix.
Premultiplying both sides of (3.30) by T and making use of (3.31), we obtain

TPk = Tq

or

[P1  k = [u
 P2]      v] .   (3.32)

If the rank of P is mr, then P1 is of order (mr X mr), P2 is of order ((n² - mr) X mr), and u and v are of order mr and (n² - mr) respectively.
A sufficient condition for the existence of a solution to (3.32), or equivalently to (3.30), is that

v = 0   (3.33)

in (3.32).
If the condition (3.33) holds and rank P1 = mr, then

k = P1^-1 u .   (3.34)

The condition (3.33) depends on an appropriate choice of H. The underlying assumption being made is that a matrix H satisfying this condition does exist. This in turn depends on the system under consideration, for example whether it is controllable.

Some obvious choices for the form of matrix H are: (a) diagonal, (b) upper or lower triangular, (c) companion form or (d) certain combinations of the above forms.
Although forms (a) and (b) are well known, the companion form is less well documented.
Very briefly, the matrix

H = [  0    1    0   ...    0
       0    0    1   ...    0
      . . .
     -a0  -a1  -a2   ...  -a_{n-1}]

is said to be in 'companion' form; it has the associated characteristic equation

a0 + a1 λ + a2 λ² + ... + a_{n-1} λ^{n-1} + λ^n = 0 .   (3.35)
Example 3.7
Determine the feedback matrix K so that the two input - two output system

ẋ = Ax + Bu , y = Cx

(with A of order (3 X 3), the first row of B zero, and B and C as given) has closed loop eigenvalues (-1, -2, -3).

Solution
We must first decide on the form of the matrix H.
Since (see (3.28))

H - A = BKC

and the first row of B is zero, it follows that the first row of

H - A

must be zero. We must therefore choose H in the companion form.
Since the characteristic equation of H is

(λ+1)(λ+2)(λ+3) = λ³ + 6λ² + 11λ + 6 = 0 ,

H = [ 0   1   0
      0   0   1
     -6 -11  -6]   (see (3.35))

and hence (see (3.28)) Q = H - A.
Here P = C' ® B is of order (9 X 4). An appropriate matrix T (9 X 9) is then constructed so that (see (3.31))

TP = [P1
      P2]

with P1 of order (4 X 4) and P2 null, and

Tq = [u
      v] with v = 0 ,

so a solution exists (see (3.33)). Since rank P1 = 4 = mr, we obtain (see (3.34))

k = P1^-1 u = [-3
                0
                0
               -5] ,

and hence

K = [-3  0
      0 -5] .
CHAPTER 4

Introduction to Matrix Calculus

4.1 INTRODUCTION
It is becoming ever increasingly clear that there is a real need for matrix calculus in fields such as multivariate analysis. There is a strong analogy here with matrix algebra which is such a powerful and elegant tool in the study of linear systems and elsewhere.
Expressions in multivariate analysis can be written in terms of scalar calculus, but the compactness of the equivalent relations in terms of matrices not only leads to a better understanding of the problems involved, but also encourages the consideration of problems which may be too complex to tackle by scalar calculus.
We have already defined the derivative of a matrix with respect to a scalar (see (3.1)); we now generalise this concept. The process is frequently referred to as formal or symbolic matrix differentiation. The basic definitions involve the partial differentiation of scalar matrix functions with respect to all the elements of a matrix. These derivatives are the elements of a matrix, of the same order as the original matrix, which is defined as the derived matrix. The words 'formal' and 'symbolic' refer to the fact that the matrix derivatives are defined without the rigorous mathematical justification which we expect for the corresponding scalar derivatives. This is not to say that such justification cannot be made; rather the fact is that this topic is still in its infancy and the appropriate mathematical basis is being laid as the subject develops. With this in mind we make the following observations about the notation used. In general the elements of the matrices A, B, C, ... will be constant scalars. On the other hand the elements of the matrices X, Y, Z, ... are scalar variables and we exclude the possibility that any element can be a constant or zero. In general we will also demand that these elements are independent. When this is not the case, for example when the matrix X is symmetric, it is considered as a special case. The reader will appreciate the necessity for these restrictions when he considers the partial derivative of (say) a matrix X with respect to one of its elements x_rs. Obviously the derivative is undefined if x_rs is a constant. The derivative is E_rs if x_rs is independent of all the other elements of X, but is E_rs + E_sr if X is symmetric.
There have been attempts to define the derivative when x_rs is a constant (or zero) but, as far as this author knows, no rigorous mathematical theory for the general case has been proposed and successfully applied.

4.2 THE DERIVATIVES OF VECTORS

Let x and y be vectors of orders n and m respectively. We can define various derivatives in the following way [15]:
(1) The derivative of the vector y with respect to the vector x is the matrix

∂y/∂x = [∂y1/∂x1  ∂y2/∂x1  ...  ∂ym/∂x1
         ∂y1/∂x2  ∂y2/∂x2  ...  ∂ym/∂x2
         . . .
         ∂y1/∂xn  ∂y2/∂xn  ...  ∂ym/∂xn]   (4.1)

of order (n X m), where y1, y2, ..., ym and x1, x2, ..., xn are the components of y and x respectively.

(2) The derivative of a scalar y with respect to a vector x:

∂y/∂x = [∂y/∂x1
         ∂y/∂x2
         . . .
         ∂y/∂xn] .   (4.2)

(3) The derivative of a vector y with respect to a scalar x:

∂y/∂x = [∂y1/∂x  ∂y2/∂x  ...  ∂ym/∂x] .   (4.3)
Example 4.1
Given

y = [y1
     y2]

where

y1 = x1² - x2 ,
y2 = x3² + 3x2 ,

obtain ∂y/∂x.

Solution

∂y/∂x = [∂y1/∂x1  ∂y2/∂x1     [2x1  0
         ∂y1/∂x2  ∂y2/∂x2   =  -1   3
         ∂y1/∂x3  ∂y2/∂x3]      0   2x3] .

In multivariate analysis, if x and y are of the same order, the absolute value of the determinant of ∂x/∂y, that is of

|∂x/∂y| ,

is called the Jacobian of the transformation determined by

y = y(x) .

Example 4.2
The transformation from spherical to cartesian co-ordinates is defined by

x = r sin θ cos ψ , y = r sin θ sin ψ and z = r cos θ

where r > 0, 0 < θ < π and 0 ≤ ψ < 2π.
Obtain the Jacobian of the transformation.

Solution
Let

x = [x y z]' and r = y1, θ = y2, ψ = y3 ;

then

J = |∂x/∂y| = | sin y2 cos y3       sin y2 sin y3      cos y2
                y1 cos y2 cos y3    y1 cos y2 sin y3   -y1 sin y2
               -y1 sin y2 sin y3    y1 sin y2 cos y3    0        |

  = y1² sin y2 .

Definitions (4.1), (4.2) and (4.3) can be used to obtain the derivatives of many frequently used expressions, including quadratic and bilinear forms.
For example, consider

y = x'Ax .

Using (4.2) it is not difficult to show that

∂y/∂x = Ax + A'x
      = 2Ax if A is symmetric.

We can of course differentiate the vector 2Ax with respect to x, by definition (4.1):

(∂/∂x)(∂y/∂x) = (∂/∂x)(2Ax) = 2A' = 2A (if A is symmetric).

The following table summarises a number of vector derivative formulae.

y (a scalar or a vector)      ∂y/∂x
Ax                            A'
x'A                           A
x'x                           2x
x'Ax                          Ax + A'x   (4.4)
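These formulae are easy to check symbolically; a small SymPy sketch (an illustrative aside, not from the original text), using the convention of (4.1)-(4.2) that the derivative is built from the partials with respect to each x_i:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
x = sp.Matrix([x1, x2])
A = sp.Matrix([[1, 2],
               [3, 4]])

y = (x.T * A * x)[0, 0]                           # scalar y = x'Ax
dy_dx = sp.Matrix([sp.diff(y, xi) for xi in x])   # (4.2): column of partials
print(sp.simplify(dy_dx - (A * x + A.T * x)))     # zero vector: dy/dx = Ax + A'x
```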

4.3 THE CHAIN RULE FOR VECTORS

Let

x = [x1 x2 ... xn]' , y = [y1 y2 ... yl]' and z = [z1 z2 ... zm]' .

Using the definition (4.1), we can write

(∂z/∂x)' = [∂z1/∂x1  ∂z1/∂x2  ...  ∂z1/∂xn
            ∂z2/∂x1  ∂z2/∂x2  ...  ∂z2/∂xn
            . . .
            ∂zm/∂x1  ∂zm/∂x2  ...  ∂zm/∂xn] .   (4.5)

Assume that z = z(y) and y = y(x), so that

∂z_t/∂x_r = Σ_{q=1}^{l} (∂z_t/∂y_q)(∂y_q/∂x_r)   (t = 1, 2, ..., m ; r = 1, 2, ..., n) .

Then (4.5) becomes

(∂z/∂x)' = [Σ_q (∂z_t/∂y_q)(∂y_q/∂x_r)]

         = [∂z_t/∂y_q][∂y_q/∂x_r]

         = (∂z/∂y)' (∂y/∂x)'   (by (4.1))

         = ((∂y/∂x)(∂z/∂y))' .

On transposing both sides, we finally obtain

∂z/∂x = (∂y/∂x)(∂z/∂y) .   (4.6)
4.4 THE DERIVATIVE OF SCALAR FUNCTIONS OF A MATRIX WITH RESPECT TO THE MATRIX

Let X = [x_ij] be a matrix of order (m X n) and let

y = f(X)

be a scalar function of X.
The derivative of y with respect to X, denoted by

∂y/∂X ,

is defined as the following matrix of order (m X n):

∂y/∂X = [∂y/∂x11  ∂y/∂x12  ...  ∂y/∂x1n
         ∂y/∂x21  ∂y/∂x22  ...  ∂y/∂x2n
         . . .
         ∂y/∂xm1  ∂y/∂xm2  ...  ∂y/∂xmn]  = [∂y/∂x_ij] = Σ_{i,j} E_ij ∂y/∂x_ij   (4.7)

where E_ij is an elementary matrix of order (m X n).

Definition
When X = [x_ij] is a matrix of order (m X n) and y = f(X) is a scalar function of X, then ∂f(X)/∂X is known as a gradient matrix.

Example 4.3
Given the matrix X = [x_ij] of order (n X n), obtain ∂y/∂X when y = tr X.

Solution
y = tr X = x11 + x22 + ... + xnn = tr X' (see 1.33), hence by (4.7)

∂y/∂X = I_n .

An important family of derivatives with respect to a matrix involves functions of the determinant of a matrix, for example

y = |X| or y = |AX| .

We will consider a general case; say we have a matrix Y = [y_ij] whose components are functions of a matrix X = [x_ij], that is

y_ij = f_ij(x)

where x = [x11 x12 ... xmn]'.
Sec.4.4J The Derivative of Scalar Functions of a Matrix 57

We will determine

which will allow us to build up the matrix


a/YI
ax
Using the chain rule we can write
olYI
- - ==

where Yi/15 the cofactor of the elementYl/ln IY!. Since the cofactors Yilt Yi., ...
are Independent of the element YII' we have
olYI
- - ==
aYi!
It f ol1O\'/s that

alYI
(4.8)

Although we have achieved our objective in determining the above formula, ,i


it can be written in an alternate and useful form. ,"
With
and bil
oxr.\'
we can write (4.8) as

L "2.I ali b
I
lj

= LLal/ejbl/el
I I
"f,A1.'Bi . (by (1.23) and (1.24»
I

tr (AB') == tr (B'A) (by (1.43»

where A == [ali) and B == [bl;l.


123
380A

58 Introduction to Matrix Calculus [Ch.4 .

Assuming that Y is of order (k X k) let


Yll Yll Ylk
(4.9)

and since

we can write

oIYi = tr(OY'z) . (4.10)


ax,.. Xn

We use (4.10) to evaluate olYl/a1:l1. alYl/axl2, ... , a IYI/oxmll and then


use (4.7) to construct
olYl
ax
EXample 4.4
Given the matrix X = [xli) of order (2 X 2) evaluate aIXI/aX.
([) when all the components xII of X are independent
(ii) when X is a symmetric matrix.

Solution
(i) In the notation of (4.1 0), we have

so that ay/axn = En (for notation see (1.4».


As
Z = rXll Xl~.
LX21 X2~
we use the result of Example (1.4) to write (4.10) as

a/YI ,
- = (vee E,..) vee Z
ax/:\,
I
• J
to
y
i
Scc.4.4J The Derivative of Scalar Functions of a Matrix S9

So that, for example

ant!

= XI2 and so on.

Helice

alYI = alXI = fXlI .(\\~


ax ax LX21 X2J
IXI(X- I )' (Sec [18] p. 124).
(ii) This time

Y= ~II
x
XI~
I2 X 2 2
hence
alYI alYI
= Ell , - = EI2 +E 21 and so on.
Xll Xu

(See the introduction to Chapter 4 fur explanantion of the notation.)

'.
, I

It fOllO:;::'~ ~
JX12
alYl
JX21
[0 I I 01 ;,:J =
Xu
X +X
21 I2
= 2X
I2

X22 (Since X12 = X21 )


hence

The above results can be generalised to a matrix X of order en x IJ).


We obtain, in the symmetric matrix case

alXI
-ax = 2[X·j-diag{X·}.
I} /I
60 Introduction to Matrix Calculus [Ch.4

We defer the discussion of differentiating other scalar matrix functions to


Chapter 5.

4.5 THE DERNATNE OF A MATRIX WITH RESPECT TO ONE OF ITS


ELEMENTS AND CONVERSELY
In this section we will generalise the concepts discussed in the previous section.
We again consider a matrix
X ::: [Xi,] or order (m X /I) .
The derivative of the matrix X relative to one of its elements Xrs (say), is
obviously (see (3.1))
ax
- = Ers (4.11)
axrs
where Ers is the elementary matrix of ord~r (m X n) (the order of X) defined in
section 1. 2.
It follows Immediately that
ax'
- =E~. (4.12)
3xr.r

A more complicated situation arises when we consider a product of the form


Y = AXB (4.13)
where
x = [xii] is of order (m X n)
A ::: [aii] is or order (l X m)
B = (h i/ ] is of order (n X q)
and
Y = (Yij] is of ordeF (I X q) .

A and B are assumed independent of X.


Our aim is to find the rule for obtaining the derivatives
ay
and

where X rs Is a typical elellll:nt of X and YI/Is a typicul element of Y.


We will first obtain the (I,})th element Ylj In (4.13) as a function of the
elements of X.
We can achieve this objective in a number of different ways. For example,
we can use (2.13) to write
vecY = (B'<8lA)vecX.
Sec. 4.5] The Derivative of a Matrix 61

From this expression we see that Yij is the (scalar) product of the ith row of
[bljA: b2j A: ... : bTl; AJ and vec X.
so that

Yij = Ii
p=1 1=1
ail bp;Xlp (4.14)

From (4.14) we immediately obtain


OYi;
- = airbsj ( 4.15)
axrs
We can now write the expression for OYi// aX •

°Yi/
aXil aXJ2
°Yil °Yii
aXln i
aJ"'j
-=
ax
?!!L
aX'll
oYI/ 'OYII
(4.16) I
I
aX22 ilX 2n

°Yii aYii OYii

Using (4.15), we obtain


ilXml ilXm2 aXmn
I
°Yii
-=
ax
ail b Ij
ai2 b lj
ail b2j
a/2 b 21
ail bnj
ai2 bnj (4.17) I
1

We note that the matrix on the right hand side of (4.17) can be expressed
as (for notation see (1. 5)(1.13) (1.16) and (1.17))

= Aj.n./
= A'e/ei B'.
123
38'0:"

62 Introduction to Matrix Calculus [Ch.4


So that

OYi/ = A'E n'


oX II (4.18)

where Ell is an elementary matrix of order (l X q) tlte order of the matrix Y.


We also use (4.14) to obtain an expression for aYjox,s.

oY ~YI/J (r, s fixed, i, / variable 1 .;; i '" I, 1 .;; /..; q)


ax,s = U3xrs
that is
aYIl oYI2 aYlq
ox's ax,s ax,s
ay OY21 aY22
--:;: aYlq ,LE11-
aYil
ax,s ax,s = (4.19)
ax,s ax,s i.1 ax,s

aYII 0Y12 OYlq


ax,s ox's axrs

where E/j is an elementary matrix of order (l X q).


We again use (4.15) to write

aJ,bsl al,bSl al,bsq

Utax,s
a2,bsl

am,bsl
..
a2,bs2

am,bSl
a2,bsq

amrbsq

ai'
:;: al, [bsl bs2 ... bsq ]

a,n,

:::: A·rBs.' :::: Ae,e~B .


So that
a(AXB)
:::: AErsB (4.20)
ax,s

where Ers is an elementary matrix of order (m X n), the order of the matrix X
Sec.4.5J The Derivative of a Matrix 63

Example 4.5
Find the derivative aY/ox rs , given
Y == AX'B
where the order of the matrices A, X and B is such thallhc product on the right
hand side is defined.

Solution
l3y the method used above to obtain the derivative il/ilxr$ (AXB), we find
a
- - (AX'B) == AE~sB .
axrs '.

Before continuing with further examples we need a rule for determining the
derivative of a product of matrices.
Consider
Y == UV (4.21)

where U == [uij] is of order (m X Ii) and V == [ttl] is of order (/1 X l) and both
U and Vare [unctions of a matrix X.
We wish to determine

The (i,j)th element of (4.21) is


n
YI/ == L
p~1
u/ p !7>j (4.22)

hence

(4.23)

For fixed rand s, (4.23) is the (i,j)th element of the matrix ay/axrs of
order (m X /) the same as the order of the matrix Y.
On comparing both the terms on the right hand side of (4.23) with (4.22),
we can write

o(UV) au av
-v+u- (4.24)
aXrs axrs
as one would expect.
64 bltroduction to Matrix Calculus [Ch.4
of the
011 the other hand, when fixing (i,j), (4.23) is the (r,3)th element
matrix aYij/ax, which is of the same order as the matrix X, that is

aY/1 =
ax
p=1
*au/pv + ~ u aVpl
L.., ax pi L.., /p ax
p=1
(4.25)

s.
We will make use of the result (4.24) in some of the subsequ ent example

Example 4.6
Let X = lX,sJ be a non-singular matrix. Find the derivative ay/ax,s, given
(i) y = AX-IB, and
(ii) y= X)tX

Solution
(i) Using (4.24) to differentiate
yy-\ = I,
we obtain
ay ay-I
-y-I+ y _ = 0,
ax,s ax,s
hence
ay ay-I
-y-y -.
ax,s ax,s
But by (4.20)
ay-l

so that
ay

Oi) Using (4.24), we obtain

ay ax' , a(AX)
--r- = -AX+X_
ax,s axrs ax,s
(by (4.12) and (4.20» .

for all i, j
Both (4.18) and (4.20) were derived from (4.15) which is valid
r, $, defined by the orders of the matrices involved .
Sec.4.5J The Derivative of a Matrix 65

TIle First Transformatioll Principle


It follows tilat (4.18) is a transformation of (4.20) and conversely. To obtain
(4.18) from (4.20) we replace A by A', B by B' and Ers by Eii (careful,E,s and
Ell may be of different orders).
The interesting point is that although (4.18) and (4.20) were derived for
constant matrices A and B, the above transformation is indepcndcnt of the
status of the matrices and Is valid even when A and n arc functions of X.

Example 4. 7
Find the derivative of oYii/ax, given
(i) Y=AX'n,
(ij) Y=AX-1n, and
(iii) Y = x'Ax
where X = [xli] is a nonsingular matrix.

Sollllion
(i) Let W = X', tlien .'
ay
Y = AWn so that by (4.20) - =AE,sB
awrs
hence
.,
aYil = Alg.B ' .
aw IJ

But ;

aYij = aYlj = (ayi/)'


ax aw' aw
hence r~ .;

ay··
-.2!. = BE'A
II
l
ax IJ
j.
~,
~.
(ii) From Example 4.6(i) ~ .
ay
F'.
;
I,.
t.
- = -AX-1t."',sx-1n.
axrs

letA! =AX- 1 a11dB l =X-1B, then


ilY

so that
66 Introduction to Matrix Calculus (eh.4

(iii) From Example 4.6(ii)


ay / / ,
- =: E,sAX + X AE,s
ax,s
letA, == I,B, ::: AX,A 2 = x'A and B2 = I, then
ax
- - =: A1ErsBI
/ + A 2 Ers ih .
ilxrs
'fhe seal,)l\u tcrlll on the right haml side is in st~lH.lard form. The first term is in
the form of the solution to Example 4.5 for which the derivative ilYj)ilX was
found in (i) above, hence

= AXEij + A'XEjj •
It is interesting to compare this last result with the example in section 4.2
when we considered the scalar Y == x/Ax.
In tlus special case when the matrix X has only one column, the elementary
matrix which is of the same order as Y, becomes
Ejj ::: E:j = I .
Hence
aYij ily /
- == - == Ax +Ax
ax ax
which is the result obtained in section 4.2 (see (4.4».
Conversely using the above techniques we can also obtain the derivatives of
the matrix equivalents of the other equations in the table (4.4).

Example 4.8
Find

when
(i) Y == AX, and
(li) Y == X/X.

Solution
(i) With B == I, apply (4.20)
aY
- = AE'3'
ax,s
Sec,4.6J The Derivatives of the Powers of a MatrLx 67

The transformation principle results in


iJYlj ,
ax = A Eij'

Oi) This is a speciul case of Example 4.6 (ii) in which A ::: I,


We IlUve found the solution

-(lxilY = E,.X
,
+ XE" rs
rs
and (Solution to Example 4.7 (iii))

aYi,
ax = XE/
J
+ x£..
1/'

4.6 THE DERIVATIVES OF THE POWERS OF A MATRIX "

,
:~
Our aim ill this section is to obtain the rules for determining
ay
and
aXrs ax
when
n
Y = X ,

Using (4.24) when U = V = X so that


Y = Xl
we immediately obtain
ar \
- ::: ErsX + Xn~s ,:
axrs I I
fL

t·;1,
and, applying the first transformation principle, ~;

ay·· Ii
I

-.-!!. = E .. X' + X'E ... ~


ax 1/ 1/ ~l
\.,
It is instructive to repea t this exercise wi th ~ ..
~.
U = X2 and V = X
so that

We obtain

and
123
3S'OA

(Ch.4
68 Introduction to. Matrix Calculus

Marc generally, it can be proved by induClioil, that [or

Y == X"
ay "-I (4.26)
_ == ""
L..., XkE,s X"-k-I
ax,s k=O

O
where by defmition X == I, and

aYi/' .
- == 2:
>1-1
(X')"Eij(X,)"-k-1
(4.27)
ax k=1

Example 4.9
Using the result (4.26), obtain ay/ax,s when
y "" X-n

Solution
Using (4.24) on both sides of
X-fiX" =I
we find
acx-")
___ XII aeX")
+ X- II _ - = 0
ax,s ax,s
so that
acx-")
_
a(X")
== -X-" --X-".
ax,s ax,s
Now making use of (4.26), we conclude that

acX-")
_ _ ==
ax,s
~I-I
-X-II" XkE,sX"-k-1 X-II.
L...,
k=O
J
Problems for Chapter 4

J
~
(1) Given
X = IXl! Xl1 Xl~, y == 1
e2X
X-I

[.:11 X11 X1J 2Xl sin x

ay ay
and
ax ax
Sec.4.6 J The Derivatives of the Power of a Matrix 69

(2) Given·

x GillX xl and
Lcosx ej
evaluate
alXI
ax
by
(a) a direct method
(b) use of a derivative formula.

(3) Given

X = XII XI2 XI~ and y x'X,


[ XZI X22 X2J

use a direct me thad to evaluate


IlY and (b) oy 13
(a)-
ilx.! ax
(4) Obtain expressi ons for
ay aYij
and
axrs ax
when
(a) Y = XAX and (b) Y = X~X'.

(5) Obtain an expressi on for a\AXEI/ax rs . It is assumed AXE is non-sing


ular.

(6) Evaluate ily/axrs when

(a) Y = X(X')2 and (b) Y = (X')2X.


CHAPTERS

Fllrther Development of Matrix


Calculus including an Application
of Kronecker Products

5.1 INTRODUCTION
In Chapter 4 we discussed rules for determining the derivatives of a vector and
then the derivatives of a matrix.
But it will be remembered that when Y is a matrix, then vec Y is a vector.
This fact, together with the closely related Kronecker product techniques
discussed in Chapter 2 will now be exploited to derive some interesting results.
Also we explore further the derivatives of some scalar functions with respect
to a matrL'< first considered in the previous chapter.

5.2 DERIVATIVES OF MATRICES AND KRONECKER PRODUCTS


In the previous chapter we have found aYijlaX when

Y == AXB (5.1 )
where Y == [Ylil. A == [alj], X == [XII) and B == [bill.
We now obtain (a vec y)/(a vec X) for (5.1). We can write (5.1) as

y == Px (5.2)
where y == vec Y, x == vec X and P == B' ® A .
By (4.1), (4.4) and (2.10)

-ay == p' == ('rO.)'


B ~A == B ®A , . (5.3)
ax
TIle corresponding result for the equation

Y == AXE (S.4)
is not so simple.
[Sec. 5.2] Derivatives of Matrices and Kronecker Products
71
The problell1 is that when we write (5.4) in the form of(S.2),wehave this
time
y >:: pz
(5.5)
where z =: vee X'
We can lind (see (2.25» a permutation matrix U such that
vecX' = UvecX (5.6)
in which case (5.5) becomes

y == PVx
so that
~ =: (PU)' U'{E ®A'). (5.7)
ax
It is convenient to write

V'(E ® A') = (B ® A')(n) (5.8)

v' is seen to premultiply the matrix (ll ® A'). Its effect is therefore to rearrange
the rows of (ll ®A').
In fact the first and every subsequent 11th row of (B ® A') Corm the first
consecutive m rows of (ll ® A')(n)' The second and every subsequent Ilth row
form the next m consecutive rows of (B ® A')(Il) and so on.
A special case of this notation is for II=: J, then

(B ® A ')(1) =: B ® A' . (5.9)

Now, returning to (5.5), we obtain, by comparison with (5.3)

ay
- == (5.10)
ax
Example 5.1
Obtain (iJ vec Y)/(a vee X), given X == [xij] of order (m X n), when

(i) Y == AX, (ii) Y::= XA, (iii) Y = AX' and (iv) Y = XA .

Solution
Let y == vec Y and x = vec X.
(i) Use (5.3)withB==I
ily
= I®A'.
ax
3S'OA

Further Development of Matrix Calculus [eh.5

(ii) Use (5.3)


ay
- = A ®J .
ax
(iii) Use (5.10)
ay I
ax = (I ® A )(n)
(iv) Use (5.10)
ay
- = (A ®/)(n) .
ax
5.3 THE DETERMINATION OF (a vee X)/(a vee Y) FOR MORE
COMPLICATED EQUATIONS
In this section we wish to determine the derivative (il vee Y)/(il vee X) when, for
example,
Y = x'Ax (5.11)
when) X is of order (m X n).
Since Y is a matrix of order (/I X n), it follows that vec Y and vec X are
vectors of order nn and nm respectively.
With the usual notation
Y = [Yij] , X = [Xij]
we have, by definition (4.1),

ayu aY21 aYnn


axu aXil aXil
a vee Y aY11 aY21 aYnn
--= (5.12)
ovecX OXIJ aX21 aX21

aY11 aY21 aYnll


OXmn OXmn OXmn

Bu t by definition (4.19),
ay),
tile first rOW oflhc matrix (5.12) is vec- ,
( aXil

ilY)' ,etc.
the second row of the matrix (5.12) is vee -
( aX 21
Sec. 5.3] The Determination of (0 vee x)/(o vee Y) 73
We can therefor!: wrlte (5.12) us

oveeY [:
- - ::: vee -
oy.: vee -oy.: ... :. vee -ay]'
- (5.13)
oveeX aXil QX21 aXmn

We now use the solution to Example (4.6) where we had established that

oY
when Y = x'Ax, then = E:sAX + X'AErs . (5.14)
axrs
It follows that
oY " ,
vee _.- = vee ErsAX + vee X'AEts
ax rs
= (x'A' ® I) vee E:s + (/ ® x'A) vee Ers (5.15)

(using (2.13» .
Substituting (5.15) into (5.13) we obtain

avee Y
il veeX
+ [(J®X'A)[vecE l1 : veeEzI : ••. : vecEmnJ]'
[vec E{l: vec £;1: ... :vec E~n]' (AX ® I)
+ [vec Ell: vec E21 : vee Emn]' (1 ® A'X) (5.16)
(by (2.1 D».
The matrix

is the unit matrix 1 of order (mn X mil).


Using (2.23) we can write (5.16) as

ovee Y
U'(AX ®I) + (1 ®A'X).
aveeX
111 a t is
dRCY ,
-- := (AX ® 1)(11) + (I ® A X) . (5.17)
ilvccX

In thc above calculations we have u$cd the derivative af/ax rs to ob tain ca vee f)1
(0 vee X).
123

74 Further Development of Matrix Calculus [Ch.5

The Second Trans/onnation Principle-


Only slight modifications are needed to generalise the above calculations and
show that whenever

where A, B, C and D may be functions of X, then

avec Y I I
- - ;:: B ® A + (D ® C )(11) (S.18)
ovecX
We will refer to the above result as the second transformation principle.

Example 5.2
Find
avec Y
ovecX
when
(i) Y = X'X (ii) Y = AX -I B .

Solution
Let y = vec Y and x = vec X .
(i) From Example 4.8
OY I I
- - = ErsX + XErs
ax rs
Now use the second transformation principle, to obtain

ay
ax = I ® X + (X ® /)(n)
(li) From Example 4.6
ay
- = -AX-I ErsX- 1B
axrs
hence

Hopefully, using the above results for matrices, we should be able to rediscover
results for the derivatives of vectors considered in Chapter 4.
Sec. SA] More on Derivatives of Scalar Functions 7S

For example iet X be a column vector x then


Y = XiX becomes y == XiX (y is a scalar).
The above result for ay lax becomes
av
..:.... == (I®x)+(x®J)(l)'
ax
But the unit vectors involved are of order (n X 1) which, fur the one column
vector X is (I X I). Hence
ay
l®,x+x®l (use (5.9»)
ax
x + x == 2x
which is the result found in (4.4).

5.4 MORE ON DERIVATIVES OF SCALAR FUNCTIONS WITH


RESPECT TO A MATRIX

In section 4.4 we derived a formula, (4.10), which is useful when evaluating


aIY I/ax for a large class of scalar matrix functions defined by Y.

Example 5.3
Evaluate the derivatives
alog IXI illXl r
(i) ax and (ii) - -
ax
Solution
0) We have
() I al)(1
- (log 1)(1) == - - -
axrs 1)( I axrs
From Example 4.4,

alXI = IXI(X-I), (non-symmetric case).


ax
Hence
a log IXI (X- t )' .
ax
(ii) alXI' = rlXl r- 1 alXI
axrs axrs
123
3S'OA

Further Development of Matrix Calculus [Ch.5


76

Hence
illXl'
--=
ax
function s
Traces of matrices form an importa nt class of scalar matrix
of applicat ions, particula rly in statistics in the formu-
covering a wide range
.
lation of least squares and various optimisa tion problem s.
product s
Having discussed the evaluation of the derivative ay/ax,s for various
apply these results to the evaluati on of the derivativ e
of matrices , we can now

a(tr Y)
ax
We first note that

(5.19)

a matrix
where the bracket on the right hand side of (5.19) denotes , (as usual)
of the same order as X, defined by its (r,s)th element .
definitio n
As a consequ ence of (5 .19) or perhaps more clearly seen from the
(4.7), we note that on transpos ing X, we have

a(tr Y) _ (a-
--- - (tr -
Y))' (5.20)
ax' ax
considering
Another , and possibly an obvious property of a trace is found when
the definitio n of aY/ ax,s
(see (4.19)).
Assuming that Y = [Yij] is of order (n X n)

a y _ aYII aY22 aYnn


tr- = -+-+ ... + -
ax,s ax,.s ax,s ax,s
a
- (Yll + Y22 + ... + Ynn) .
ax,s
Hence,
ay a(tr Y) (5.21)
tr- = ---
ax,s ax,s

Example 5.4
Evaluate
a (r(AX)
ax
Sec. 5.4] More 011 Derivatives of Scalar Functions 77

Solutioll
u tr (AX) u(AX)
:: tf by (5.21)
ox's ax,s
tr (AEr.) by Example (4.8)
tr(E:3 A') since tr Y = tr Y'
(vee Er.)' (vec A ') by Example (1.4).
Hence,
a tr (AX) = A'.
ax
trace of
As we found in the previous chapter we can use the derivative of the
the derivative of the trace of a differen t product .
one product to obtain

Example 5.5
Evaluate
a tf (AX')
ax
Solution
From the previous rcsul t
a t1' (BX) _ utr (X'B') -B
_ '
.
----
ax ax

Let A' = B in the above equation, it follows that


atr(X'A ) atr(A'X )
---= =A.
ax ax
be found
The derivatives of traces of more complicated matrix products can
similarly.

Example 5.6
Evaluate
a(tr Y)
ay
when
(i) Y=XA X
(li) Y=XA XB

SO/utIUI/
It is obvious that (i) follows from Oi) when B = I.
78 Further Development of Matrix. Calculus [Ch.5

(ii) Y = XIB where XI = X~x.


aY ax!
-- B
ax,s ax,s
= E;sAXB +X~E,.rB (by Example 4.6)
Hence,
Oy), ,
(OX,S = tr (E,sAXB) + tr (X 'AE,sD)
tr -

= tr (E;sAXB) + tr (E;sA'XB')
= (vec Era)' vec (AXB) + (vee E,a)' vee (A'XB') .
It follows that
a(tr Y)
- - == AXB+A'xn'.
ax
(i) Let B == I in the above equation, we obtain
3(tr Y)
- - = AX+A'X = (A+A')X.
ax

5.5 THE MATRIX DIFFERENTIAL


For a scalar functionf(x) where x = [XI X2 '" x n ]', the differential dfis defined
as
n of
df = L-dXI. (5.23)
1=1 aXI

Corresponding to this definition we define the matrix differential dX for the


matrix X = [xii] of order (m X n) to be

(5.24)

The following two results follow immediately:


d(aX) = a(dX) (where a is a scalar) (5.25)
d(X+Y) = dX+dY. (5.26)
Consider now X = [Xij] of order (m X /1) and Y = [Ylj] of order (n X p).

XY = [~XijYjk] ,
J
ff{1

Sec. 5.5] The Matrix Differential 79


hence

It follows that
d(XY) = (dX)Y + X(dY) . (5.27)
Example 5.7
Given X = [xlf] a nonsingular matrix, evaluate
(0 dlXI, (il) d(X'-I) -;-

Solution
(i) By (5.23)

dlXI =
"" -alXI (dx,,)
L....
/,/ aXij

= Li,j Xjj(dXij)
since (aIXI)/caXij) = Xii' the cofactor OfXij in IXI.
By an argument similar to the one used in section 4.4, we can write
dlXI = tr {Z'(dX)} (compare with (4.10)) '.
where Z = [X,/).
Since Z I = IX IX -I , we can write
dlXI = IXI tr {X-I (dX)} .

(ii) Since
X-IX = I
we use (5.27) to write
d(X-I)X + X-I(dX) = O.
Hence
d(X-I) = -X-I(dX)X- 1
(compare with Example 4.6).
Notice that if X is a symmetric matrix, then
X = X'
and
(dX)' = dX (5.28)
80 Further Development of Matrix Calculus [Ch. 5]

Problems for Chapter 5


(J) Consider

A = [all a12] , x = [XU X12] and Y == AX'.


a21 ~2 Xli Xi2

Use a direct method to evaluate

avec Y
a vee X
and verify (5.1 0).

(2) Obtain
avec Y
i'lvecX '
when
(i) Y = AX'B and (ii) Y = llMi Xl .

(3) Find expressions for


3 tr Y
3X
when
(a) Y = AXB, (b) Y = X 2 and (c) Y == XX'.

(4) Evaluate
a tr Y
ax
when
(a) Y == X-I, (b) Y = AX-1B, (c) Y == Xn and (d) Y == eX.

(5) (a) Use the direct method to obtain expressions for the matrix differential
dY when

(i) Y == AX, (ii) Y == X'X and (iii) Y == X2 .

(b) Find dY when


Y = AXBX.
ClIAPTER 6

The Derivative of a Matrix with


respect to a Matrix

6.1 INTRODUCTION
In the previous two chapters we have defined the derivative of a matrix with
respect to a scalar and the aerivative of a scalar with respect to a matrix. We will
now generalise the dcfinitiftlls to include the derivative of a matrix with respect
to a matrix. The author 1.1u:Cadoptcd tile definition suggested by Vetter [31J,
although other definitions also'give rise to some useful results,

6.2 THE DEFINITIONS AND SOME RESULTS


Let Y == [Ylj] be a matrix of order (p X q). We have defined (see (4.19» the
derivative of Y with respect to a scalar Xm it is the matfix [aYlllaxrs ] of order
(p X q).
Let X == [xrs] be a matrix of order (m X 11) we generalise (4.19) and defme
the derivative of Y with respect to X, denoted by
ay
ax
as the partitioned matrix whose (r,s)th partition is
ay
aXrs
in other words
ay ay ay
aXil aXl2 aXln
ilY
-=
ax
ilY ()Y ()Y
L ilY
Ers®- (6.1)
aX21 aX22 aX2n r, S
axrs

ay ay ay
aXII1 1 aXm 2 aXmn
123

82 The Derivative of a Matrix with Respect to a Matrix [Ch.6

TIle right hand side of (6.1) following from the definitions (1.4) and (2.1) where
Ers is of order (m X n), the order of the matrix X.
l! is seen that aY/ax is a matrix of order (ml' X IIq).

Example 6.1
Consider
y= [;u X1:2 X22
sin (Xli + X12)
XIi
e x"
log (XII
J
:t- X21)
and

X= [Xli XI2J
X21 X22 I

Evaluate
oY
ax
Solution

~ x,LJ ~"x" ~II e


ay aY XIIX

']
aX21 aX22

Xllx
XI2 X2 2 X 22 e " XI 1 X22 0
ay
-= cos (XII + X12) cos (XI1 + XI2) 0
ax Xu + X21
0 0 Xu XI2 XI1 eXIl Xu
I
0 0 0
XII + X21
Example 6.2
Given the matrix X = [Xjj] of order (m X n), evaluate aX(dX when
(i) All elements of X are independent
(li) X is a symmetric matrix (of course in this case m = n).
Sec. 6.2} The Definitions and Some Results 83

Solution
(i) By (6.1)
ax
-=
ax 2: Ers ® Ers
r, s
= D (see (2.26))

(ii) ax
- = 1.:...... + l;'sr for r,-ps
oXrs
ax
-=1:-"',.,. for r=s
axrs
We cun write the above as;

Hence,
ax
-=
oX 2: Ers ® b....s + L EN ® b'sr -
'IS ,,s
Ors L
,,s
Esr ® Err

D+ U-"EErr®Err (see (2.24) and (2.26))

Example 6.3
Evaluate and write out in full ax' lax given

Solution
By (6.1) we have
ax'
- = Ers ® E;s
ax
u.
Hence
0 0 Q 0 0
0 0 1 0 0 0
ax' 0 0 0 0 1 0
==
ax 0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
3801.

84 The Derivative of a Matrix wilh Re~\lect to 11 Matrix [Ch.6

From the definition (6.1) we obtain

by (2.10)

from (4.19).

It follows that
3Y'
(:;)' = (6.2)
ax'

'6.3 PRODUCT RULES FOR MATRICES


We shall first obtain a rule for the derivative of a product of matrices with
respect to a matrix, that is to find an expression for

il(XY)
az
where the order of the matrices arc as indicated

X(m X n), Yen X v), Z(p X q) .


By (4.24) we write
a(xY) ax ay
-y+x-
ilz,s az,s
where Z = [z,sJ.
If E,s is an elementary matrix of order (p X q), we make use of (6.1) to
write

a(xY) ax
- - = 2E,s® - Y + XilY]
[ ilz,s
-
ilZ ilz,s
" S
Sec.6.3J Product Rules for Matrices !IS

(where 1q ano lp arc unit matrices of order (q X q) and (p X p) respectively)

= 2: (t~s ® az,.
~s
ax )(Jq ® y) + 2: (Ip ® (£r$ ® a~y)
~s
X)
a~r:
(by 2.11)

finally, by (6.1)

a(XY) = ax (Iq ® Y) + (lp ® X) ay (6.3)


az az az
Example 6.4
Find an expression for

ax- I
ax
Solution
Using (6.3) on
XX- I =1,
we obtain

I a(xx- 1) ax ax- 1
_ _-c. = -(1®x- I )+ (J®X) - = 0
.j
hence
ax ax ax
ax-I ax
- = -(1®xt l -(1 (8) X-I)
ax ax
= -(1(8) X-I) 0(1 (8) X-I)

(by Example 6.2 and (2.12».

Next we determine a rule for the derivative of a Kronecker product of


matrices with respect to a matrix, that is an expression for

o(X® Y)
az
TIle order of the matrix Y is not now restricted, we will consider that it is
(u X v). On representing X ® Y by it (i,j)th partition [xijYj (i = 1, 2, ... , m,
k = 1,2, .. " n), we can write
380A

86 The Derivative of a Matrix with Respect to a Matrix [ell. 6

where (r, s) are fixed


= [_axllyl+ r _i.lYJ
az,s J t: X1J
az,:
ax aY
= -®y +x® -
az,: azrs
Hence by (6.1)
a(x® ax ay
az
y)
= 2: E rs ® -
az rs
® Y+ LE rs ® X ® -
r, S T,S az TS

where Ers is of order (p X q)

= -ax ® y +
az L. E rs ® (X ® -ay) •
azr :
To·

The summat ion on the right hand side is not X ® ay/az as may
appear at first
sight. nevertheless it can be put into a more conveni ent form, as
a product of
matrices. To achieve this aim we make repeated use of(2.8) and (2.ll)

by (2.11)

Hence
(6.4)

where U and U. are permuta tion matrices of orders (mu X mu)


1 and (nv X nv)
repectively.
We illustrate the usc of equation (6.4) with a simple example.

Example 6.5
A = [aii] and X = [Xij] are matrices, each of order (2 X 2). Use
(i) Equatio n (6.4), and
(ii) a direct method to evaluate
a(A ® X)
ax
Sec.6.3J Product Rules for Matrices 87

Solution
(i) In this example (6.4) becomes

il(A eEl X) rax ,l


--ax- = [/ eEl Ud Lax eEl A [/ eEl U.J
J
where / is the unit matrix of order (2 X 2) and

~ ~
0 0
0 1 ',.
U1 U2 '2:.Ers eEl h-';s .. ~

1 0
0 0
Since

l~ ~
0 0
ax 0 0
0=
ax 0 0
0 0

only a simple calculation is necessary to obtain the result. It is found that

all 0 al2 0 0 all 0 al2


0 0 0 0 0 0 0 0
a21 0 all 0 0 a21 0 all
aeA eEl X) 0 0 0 0 0 0 0 0
ax 0 0 0 0 0 0 0 0
all 0 an 0 0 all 0 all
0 0 0 0 0 0 0 0
a21 0 al2 0 0 a21 0 a22

(il) We evaluate

~BXB
all Xl2 al2 x U

Y = A @X =
all X21

a2l xli
all X22

allx I 2
al2 X21

a22 XII
'''X'j
al2 X22

an XI2

a21 x 21 a21 x n a2l x 21 a22 x 22


,;('.
and then make use of (6.1) to obtain the above result.
L

i
I
! .,.

t I,.

~. ·
38'0"

88 The Derivative of a Matrix with Respect to a Matrix [Ch.6

6.4 THE CHAIN RULE FOR THE DERIVATIVE OF A MATRIX WITH


RESPECT TO A MATRIX
We wish to obtain an expression for
az
ax
where the matrix Z is a matrix function of a matrix X, that Is
z = y(x)
where
X == [xli] is of order (m X n)
Y == [Yij] is of order (u X v)
Z == [zi/l is of order (p X q)
By definition in (6.1)
az az
-
ax
= 2: E
',s
rs ® -
rs ax
r = 1,2, ... ,m
s = 1,2, ... , n

where Ers is an elementary matrix of order (m X n),

i = 1,1, ... , u
j = 1,2, ... , q

where Ell is of order (p X q)


As in section 4,3, we use the chain rule to write

aZ11
-=
L aZ11 . aYail 0: = 1,2, ... , u
ax,s a, il aY<>jl ax,s {1 = 1, 2, ... , v
Hence
az
-=
ax 2:" s E,s ® [2: Eii 2 -aZil- . ax,s
..
-
il aY"'il
y
a ail ]
til a,

22 E,s- ®
aY"'il 2E aZij
lj - - (by 2.5)
ax,s I,J' aY"'il
',S "',il

aY"'il az
=
2
"',il
-0-
ax aY"il
(by (4.7) and (4.19» .
Sec. 6.4] The Chain Rule for the Derivative of a Mattix 89

If fro and fp are unit matrices of orders (n X n) and (p X p) respectively. we can


write the above as
()Z
-=
ax
Hence, by (2.11)
az
(6.5)
ax
Equation (6.5) can be written in a more convenient form, avoiding the summation,
if we define an appropriate notation, a generalisation of the previous one.
Since

[~~: ~~:
'" YIV

Y = ..• Y2"

Yul Yu2 YuvJ


than (vee Y)' = [YlI Y21 '" Yuv]'
We will write the partitioned matrix

[
aYlI ® 1 :- aYll 01 : ... ayuv ®
ax P - ax p. ax
jlpJ
as
a[YlI hi'" Yuvl
--'--'-'--a-x----.:...;..c. ® Ip
or as
a(vee Y)'
®Ip
ax
Similarly. we write the partitioned matrix

j 0-
az
n aYII

j
n
0-
aY21
az
as
~
1 0az
--
n avec Y
J
90 The Derivative of a Matrix with Respect to a Matrix [Ch.6

We can write the sum (6.5) in the following order

az = [aYII CiS> Ipl [In 0 az ]+ [aY21 0 I) az J+ ...


rIn 0 Yll
ax ax j. aYll ax j L
y
+ [a 0 ~] .
r1nLoy""
u
" 0 JpJ
ax
We can write this as a (partitioned) matrix product
az
10-
n aYll

ax
y
az = [a_ll®I
-. oy ay
ax P :• ... •:~®1
ax p.:--2.!®! ax P
J ! ®-
n .
az
aY21

Finally, using the notations defined above, we have

az = [a [vec Y)' 01J 1 0 ~J


ax az P Ln avec Y
r (6.6)

We consider a simple example to illustrate the application of the above formula.


The example can also be solved by evaluating the matrix Z in terms of the com-
ponents of the matrix X and then applying the definition in (6.1).

Example 6.6
Given the matrix A = [aij] and X = [xii] both of order (2 X 2), evaluate
aZjaX
where Z = Y'Y and Y:= AX.
(i) Using (6.6)
(ii) Using a direct method.
Solution
(I) For convenience write (6.6) as
az
- = QR
ax
where
Q = [a(vec Y]' J and R= [In®~J.
ax 01p ' avec Y
Sec. 6.4] The Chain Rule for the Derivative of a Matrix 91

From Example 4.8 we know that

3Yli = A'E.
ax 1/

so that Q can now be easily evaluated,

["o 0 00i '" 0 0


all 0 0 1 0
01 0 0 all 0
1
a21 0 01 0 0 0 all
0 0
0 0 0
a'll

0]
a:ll
Q= 1 1

~21
a1'2 0 0 0 1 a22 0 0 01 0 0 all 0 0 0 all
1 1
o all 0 0 1 0
1
a22 0 01 0 0 0
1
a l2 0 0 0 .

Also in Example 4.8 we have found

az = ErsY+ Y E
-- I I
rs
aYrs
we can now evaluate R

2Yl1 Y12 0 0
Y12 0 o o
o o 2Yll Y12
o o Y12 0
2121 121 0 0
Y22 0 0 0
o o
o o
R
o Yl1 0 0
Yll 2Y12 0 0
o o o Yl1
o o Yll 2Y12
-0- - --Y;; --0- - - -0--
121 2Y22 0 0
o 0 0 121
o o
3
3S'OA

92 The Derivative of a Matrix with Respect to a Matrix [Ch.

The product of Q and R is the derivative we have been asked to evaluate

allYn
o
+ allYll
aUYlI + a2lY2l
2allYI2+ 2a 21Y2l
1
o allYn + a22Y:U

a12Yl! + a22YlI 2a 12Y12 + 2anY22

(ii) 13y a simple .extension of the result of Example 4. 6(b) we find that when
Z = X/A/AX
az
ax rs
= E:sA'AX + X'A'AErs

where Y=AX.
By (6.1) and (2.11)

az
ax == 2: (Ers ® E:s) (/ ® A'y) + 2:
',3 r,s
(I ® Y'Z) (Ers ® Ey,) .

Since the matrices involved are all of order (2 X 2)

[ ~l
0 0

' 0 0
LErs®Ers = ~ 1 0
0 0
and

[~ ~l
0 0
0 0
LErs®Ers =
0 0
0 0

On substitution and multiplying out in the above expression for az/ax, we obtain
the same matrix as in (i).

Problems for Chapter 6


(1) Evaluate ay/ax given
COS (X12 -.t- X22) X ll X 2 1]
Y= [
and
e XII XI) Xl2 X 22
6] Problems 93

(2)
Xli X21]
The elements of the matrix X = XI2 X22
[
Xu X23
are all independent. Use a direct method to evaluate ax/ax.

(3) Given a non-singular matrix X = [XII X12]


X:21 ."(2;2

use a direct method to obtain

ax-I
ax
and verify the solution to Example 6.4.

(4) The matrices A = [alf] and X = [xi;1 arc both of order (2 X 2), X is non·
singular. Use a direct method to evaluate
123

CHAPTER 7

Some Applications of Matrix


Calculus

7.1 INTRODUCTION
As in Chapter 3, where a number of applications of the Kronecker product were
considered, in this chapter a number of applications of matrix calculus are
discussed. The applications have been selected from a number considered in the
published literature, as indicated in the Bibliography at the end of this book.
These problems were originally intended for the expert, but by expansion
and simplification it is hoped that they will now be appreciated by the general
reader.

7.2 THE PROBLEMS OF LEAST SQUARES AND CONSTRAINED


OPTIMISATION IN SCALAR VARIABLES
In this section we consider, very briefly, the Method of Least Squares to obtain
a curve or a line of 'best fit', and the Method of Lagrange Multipliers to obtain
an extremum of a function subject to constraints.
For the least squares method we consider a set of data
(Xi,Yi) i = 1,2, ... , n (7.1)
and a relationship, usually a polynomial function
Y = [(x). (7.2)
For each Xi, we evaluate [(Xt) and the residual or the deviation
et = Yt - [(Xt) . (7.3)
The method depends on choosing the unknown parameters, the polynomial
coefficients when [(x) is a polynomial, so that the sum of the squares of the
residuals is a minimum, that is

S = *
Lei2
1=1
= L~ (YI -
1=1
[(XI)) 2 (7.4)

is a minimum.
[Sec. 7.2] The Problems of Least Square and Constrained Optimisation 9S

In particular. when/ex) Is a linear function


y = ao + alx
S(ao.al) is a minimum when
as as
- = 0 =- (7.5)
aao aal
These two equations. known as normal equations. determine the two unknown
parameters ao and al which specify the line of 'best fit' according to the principle
of least squares.
For the second method we wish',to determine the extremum of a continuously
differentiable function
[(XI, Xl. "', xn ) (7,6)
whose Ii variables are contraincd by 1/1 equations of the form
g,(Xi> Xl, .... x n ) = O. 1= 1.2 .... ,TIl (7.7)
The method of Lagrange Multipliers depends on defining an augmented function
ttl

f'" = / + 2:
1;1
/Jigi (7.8)

where the III are known as Lagrange multipliers.


The extreme of lex) is determined by solving the system of the (m + 11)
equations
af*
= 0 r = 1, 2, ... , 11
ax, (7.9)
gj = 0 i = 1,2, '" ,m

for the m parameters Ill, 1l2, ... , Ilm and the n variables x determining the
extremum.

Example 7.1
Given a matrix A = [ali] of order (2 X 2) determine a symmetric matrix
X = [xli] which is a best approximation to A by the criterion of least squares.

Solution
Corresponding to (7.3) we have

E = A-X
where E = [eli] and elj = ali - Xii'
3S'Oi\

96 Some Applications of Matrix Calculus [Ch.7

The criterion of least squares for this example is to minimise

S = Le~ = L(ali -XI/)2


1,/

which is the equivalent of (7.6) above.


The constraint equation is
xll -Xli = 0
and the augmented function Is
[of'. = r.(al/ -x,,)2 + P(Xl'l -X21) =: 0

-
ar = -2(J u - Xli) = 0
aXil

ar
- = -2 (a22 -X22) = 0
aX22

This system of 5 equations (including the constraint) leads to the solution

Hence

x= [a.:: a" a" : a"J


2 a22

i(A + A')

7.3 PROBLEM 1- MATRIX CALCULUS APPROACH TO THE PROBLEMS


OF LEAST SQUARES AND CONSTRAINED OPTIMISATION
If we can express the residuals in the form of a matrix E, as in Example 7.1, then
the sum of the residuals squared is
S=trE'E. (7.10)
Sec. 7.3] Problem 1 97

The criterion of the least squares method is to minimise (7.1 0) with respect to
the parameters involved.
The constrained optimisation problem then takes the form of fmding the
matrix X such that the scalar matrix function
s = I(X)
is minimised subject to conlraints on X in the form of
. G(X) = 0 (7.11)
where G = [elf] is a matrix of order (s X t) where sand t are dependent on the
number of constraints gil inv~lvcd.
As for the scalar case, we usc Lagrange multipliers to forlll an augmented
matrix function ["(X).
Each constraint gil is associated with a parameter (Lagrange multiplier)

Since
kJ..lijgij = tr U'G
where
U = [J..II/]
we can write the augmented scalar matrix function as
f*(X) = tr E'E + tr U'G (7.12)
which is the equivalent to (7.8). To find the optimal X, we must solve the
system of equations

a/*
- = o. (7.13)
ax
Problem
Given a non-singular matrix A = [all] of order (n X n) determine a matrix
X = [xII] which is a least squares approximation to A
(i) when X is a symmetric matrix
(ii) when X is an orthogonal matrix.

Solution
(i) The problem was solved in Example 7.1 when A and X are of order (2 X 2).
With the terminology defined above, we write
E = A-X
G(X) = X-X' = 0
so that G and hence U are both of order (n X n).
123

98 Some Applications of Matrix Calculus [Ch.7


Equation (7.12) becomes
l'" = tr [A'-· X'] (A -Xl + tf V'(X -X']
= tr A'A - tr A'X - tr X'A + tr X'X + tr V'X - tr V'X' .

We now make use of the results, in modified form if necessary, of Examples 5.4
and 5.5, we obtain
or
_._. = -2A + 2X + V - v'
ax u-u'
= 0 for X = A +--
2
Then
V'-u
X' = A ' + - -
2
and since X "" X', we finally obtain
X=HA+A').

(ii) This time


G(X) = X'X-I = O.
Hence
t'" = tr[A'- X'J[A - X] + tr V'[XX'- I]
so that
ar =
- -2A +2X+X[V+ V']
ax
v +' V ']
= 0 for X = A-X r[-2-
Pre multiplying by X' and using the condition
X'X = I
we obtain
V+V'
X'A = 1+--
2
and on transposing
V+V'
A'X = 1+--
2
Hence
A'X = X'A. (7.14)

If a solution to (7.14) exists, there are various ways of solving this matrix
equation.
Sec.7.3J Problem 1 99

For example with the help of (2.13) lind Example (2.7) we can write it as
l(/ ® A') .- (II' ® I)UJ x =0 (7.15)
where U is a permutation matrix (see (2.24» and
x == vecX.
We have now reduced the matrix equation into a system of homogeneous
equations wh.ich cart be solved by a standard method.
If a non-trivial solution to (7.15) does exist, it is not unique. We must scale
it appropriately for X to be orthogonal.
There may, of course, be more than one linearly indepentlent solution to
(7.15). We must choose the solution corresponding to X being an orthogonal
matrix.

Example 7.2
Given

find the othogonal matrix X which is the least squares best approximation to A.

Solution

[I0A'] =

r 1 2
o
o
l
1
0
0
0
0

2
o
-1
1
,
and [A 0I] U = [,-I
0
2
0
0
I
0
0

0
2 l
Equation (7.15) can now be written as

[-~
0 0
-1
x = 0 .
-1 1
0 0 -IJ
There are 3 non-trivial (linearly independent) solutions, (see [18] p. 131). They
are
x = [1 -2 1 1]', x = [1 1 2 -1]' and x = [2 -3 3 2]' .

Only the last solu lion leads to an orthogonal matrix X, it is

X = yTI
I [2-3 2J31 .
380A
100 Some Applications of Matrix Calculus [Ch.7

7.4 PROBLEM 2 - THE GENERAL LEAST SQUARES PROBLEM


The linear regression problem presents itself in the following form:
N samples from a population are considered. The ith sample consists of an
observation from a variable Y and observations from variables Xl> X 2 , ••• , X"
(say).
We assume a linear relationship between the variables. If the variables are
measured from zero, the relationship is of the form
(7.16)
If the observations are measured from their means over the N samples, then
(7.17)
bOt b l , b2 • ••• , bn are estimated parameters and et Is the corresponding residual.
In matrix notation we Can write the above equations as
y = Xb+ e (7.18)
where

and
y =
[~:J ' [:] [i]
b = , e =

x12 Xln Xu X12 Xl n

X= X22 X2I1 or X= X21 x22 x2n

XN2··· xNn XNI xN2 '" xNn

As already indicated, the 'goodness of fit' criterion is the minimisation with


respect to the parameters b of the sum of the squares of the residuals, which in
this case is
S = e/e = (y/-b'X')(y-Xb).
Making use of the results in table (4.4), we obtain
a(e/e)
- - = -(y/X)' - X/y + (X'Xb + X'Xb)
ab
= -2X'y + 2X'Xb
= 0 for X'Xb = X'y . (7.19)

where "b is the least squares estimate of b.


If (X/X) is non-Singular, we obtain from (7.l9)
b= (X/xt 1 X/y . (7.20)
Sec. 7.4] Problem 2 101

We can write (7.19) as


X'(y -Xii) 0
or
X'e = 0 (7.21)

which is the matrix. form of the normal equations defiend in section 7.2.

Example 7.3
Obtain the normal equations for a least squares approximation when each sample
consists of one observation from Yand one observation from

(i) a random variable X


(ii) two random variables X and Z.

Solution
(i)
b

hence

X'[y-Xh]

So that the normal equations are

LY, = biN + h1, LX,


and
LX/Y, =: hi LX, + b1, LX? .

(ii) In this case

~:J
XI ZI YI

X= X2 Z2 Y Y1, b =

XN ZN YN
The normal equations are

LY, = biN + b1, LX; + b3 LZ,


- - 1,-
LX,Yj =: b l LX, + b1, LX; + b3 LX/Zj

and • - - 1,
LXjZ/ =: b l LZ; + b1, LXjZ; + b3 LZ/ .
123

102 Some Applications of Matrix Calculus [eh.7

7.S PROBLEM 3 - MAXIMUM LIKELIHOOD ESTIMATE OF THE


MULTIVARIATE NORMAL
=
Let X/(i 1,2, .•. , n) be /I random variables each having a normal distribution
with mean IJ./ and standard deviation 0/, that Is
XI = n (1J.1o all . (7.22)
The joint probability density function (p.d.f.) of the n random variables is

!(xl>X2," .,xn) =
1
(27T)n/21VII/2 exp
(-!ex - p.)' V-I(X
2
- p.))
(7.23)

where
-oo<X/<oo (i=1,2, ... ,II)
and

is the covariance matrix.


I ,,' = [IJ.IoI-ll,,,.,Pn], x' = [XI,X2,''''Xn ]
.'I and
01/ = Plj a/ OJ (I '1= j)
all = a?

are the covariances of the random variables.


Plj is the correlation coefficient between XI and Xj. The covariance matrix V
is symmetric and positive defmite.
Equation (7.23) is called a multivariate normal p.d.f. Maximum likelihood
estimates have certain properties (for example, they are asymptotically efficient)
which makes them very useful in estimation and hypothesis testing problems.
For a sample of N observations from the multivariate normal distribution
(7.23) the likelihood function is

L -
_ 1
nN/2
( I..!!::
IN/'l exp - - L... (XI-
I
p.) V
-I
(XI - P.
)}

(27T) IV I . 2 /=1
so that
N I ~
10gL = C--IogiVl-- L... (x,-,,), V- 1 (x,-,,) (7.24)
2 2 /=1

where C is a constant.
Sec. 7.5] Problem 3 103

(u) The muximum likelihood estimate of Il


On expanding the last term of (7. 24), we obtain
1~
--2 2: {x/ V-I xI - Il' V-I
1=1
XI - x/ V-I Il + Il' V-I Il}.
With the help of table (4.4) and using the result
(x/ V-I)' = V-I XI (since Vis symmetric)
we differentiate with respect to Il, to obtain

};XI
= 0 when jJ. = x.
N
Hence the maximum likelihood estimate of" Is " = X, the sample mean.

(b) The maximum likelihood estimate of V·


We note the following results:
(1) N
L y/V-IYI = tr(Y'V-Iy)
1=1

where Y = [YI Y2'" YNl


and YI = XI-I! (i=1,2, ... ,N.
V-I is a symmetric matrix.

(2) By Example 5.3, but taking account of the symmetry of V-I (see Example
4.4)
a log IV-II
- - - 1 - = 2V-diag {V}.
av-
(3) If X is a symmetric matrix

a tr(AX) I
---'----' = A +A - diag {A} .
ax
Let A = yy' and X = V-I, then
atr(yy'V- I ) I
== 2Yy' - diag {yy } .
aV-I
23
30'01.

104 Some Applications of Matrix Calculus [Ch.7


We now write (7.24) as
N 1
10gL = C+ -log V- I - - tr(YY'V- I ).
2 2 '

Differentiating log L with respect to V-I, u~ng the esHmate p = i1., and the
results (2) and (3) above, we obtain

a 10gL N i l ,
-- I = - [2 V - diag {V}] - YY + - diag {YY } .
av- 2 2
Let Q = NV - YY', then
alog L 1
aV-I "" Q - 2" diag {Q}
=0 when 2Q = dla!! {Q} .
Since Q is symmetric, the only solution to the above equation is
Q = O.
It follows that the maximum likelihood estimate of V is

7.6 PROBLEM 4 - EVALUATION OF THE JACOBlANS OF SOME


TRANSFORMATIONS
The interest in Jacobians arises from their importance particularly with reference
to a change of variables in multiple integration.
In terms of scalars, the problem presents itself in the following way.
We consider a mUltiple integral of a subset R of an n-dimensional space

Lf(XhXl,'" ,xn ) d'1:1 dx l ... , dxn . (7.25)

where f is a ph:ccwise continuous function in R.


We consider a one to one transformation which maps R onto a subset T

(7.26)

and the inverse transformation

XI = WI(Y), Xl = wz(y), ... , Xn = Wn(y) (7.27)


where
X' = [XIoXZ, ... ,x n] and y' = [YhYz,·.· ,Yn]
Sec. 7.6] Problem 4 105

Assuming the first partial derivations of the inverse transformation (7.27) to be


continuous. (7.25) can be expressed as

J![W\(Y). WI (y) •...• wlI(y)]IJI dYl dY2'" dYII (7.28)

where IJ I can be wrl tten as

aXI ax. aXil


aYI aYI aYl
aXI ax. ax"
III
ay' aYl aY2
I:.; ,. (7.29)

aXI ax. aXn


aYII aYn aYII

subject to IJI not vanishing identically in T.


Example 7.4

Let I = 2 L exp {-2xI + 3X2} dXI dx.

o<XI < 00. 0 <X2 < 00.


Consider the transformation
YI = 2xI -X2
Y2 = X2 .
Write down the integral corre~pondillg to (7.28).

Solution
We are given
R ;::: {(XI, X2); 0 <XI< "", 0 <X2 < co} .
The above transformation (corresponding to (7.26)) results in the following
inverse transforma tion (7.27)

XI = 1(YI +h)
X2 = Y2
which defines
123

106 Some Applications of Matrix Calculus [Ch.7

and by (7:29)
! 0
IJI = = ! .
!
Hence
/= J:[! VI +Y2),12] dYI d12

= {ex p (-YI + 2Yi)dYi dYl'

Our main interest in this section is to evaluate Jacobians when the transfor-
mation corresponding to (7.26) is expressed in matrix form, for example as
Y = AXB (7.30)
where A, X and B are all assumed to be of order (n X n). .
As in section 5.2 (see (5.1) and (5.2» we can write (7.6) as
y = Px (7.31)
where y = vee Y, x = vee X and P = 8' ® A .
In this case

ay = B®A'
ax
and
ax
= [B®A'tl = B-1®(A'r l by(2.12).
ay
It follows that

IJI avec YI- 1


181-n IAI- n (by Property X, p. 27) (7.32)
Ia vecX
Example 7.5
Consider the transformation
Y = AXB

rL-l2-4]
where
A = and B = [ 21 11]'
3
Find the Jacobian of this transformation
(i) By a direct method
(ii) Using (7.32).
Sec. 7.6 J Problem 4 107

Solution
(i) We have

X = A-IYB- I = 1~ [3Y I + 4)/2 -3Y3 - 4Y4 -3YI - 4Y2 + 6YJ + 8Y4J


YI + 2Y2 -Y3- 2Y4 -YI - 2yl + 2Y3 + 4)'4
so that
3 -3 -1
ax 4 2 4 -2
ay = (if 0 0 3
==
4
-4 -2 8 4
(ii) IAI = 2, IB\ = I hence \JI= i.

Similarly, we can use the theory developed in this book to evaluate the
lacobians of many other transformations.

Example 7. 6
Evaluate the Jacobian associated with the following transformation
(i) Y=X- 1
tii) Y = X2.

Solution
(i) From Example 5.2
ay
= -X-I ® (X-I)'
ax
so that
ax
= -X®X'.
ay
Hence
J = mod I : : 1 = \X ® x' \ = \X I-n IX I-n
(ii) From section 4.6

so that by the 2nd transformation principle (see section 5.3)

ay
- = X®I+I®X'
ax
and
J = IX®I+I®X'I-I
123

108 Some Applications of Matrix Calculus [Ch. 7

7.7 PROBLEM 5 - TO FIND THE DERIVATIVE OF AN EXPONENTIAL


MATRIX WITH RESPECT TO A MATRIX
Since We make use of the spectral decomposition of an exponential matrix, we
now discuss this technique briefly.
Assume that the matrix Q = [qjj] of order (n X n) has eigenvalues
AI, Al,',', A"
(n6t necessarily distinct) and corresponding linearly independent eigenvectors

The eigenvectors of Q' are

Y\, Yi,"" Yn •
ThClse two sets of eigenvectors have the property
xj Y/ = a or (equivalen tly) Y/ xI =0 (i "*' j) (7.33)
and can be normalised so that
xI Yi = 1 or Y; Xi = 1 (i = 1, 2, ... ,n) . (7.34)
Sets of eigenvectors {Xi} and {Yj} having the properties (7.33) and (7.34) are said
to be properly nomlalised.
It is well known (see [18] p. 227) that
exp (Qt) = P diag {eAlt, e A•t, ... , e A" t} p- I

where P is the modal matrix of Q, that is the matrix

P = [XI:X2:.":X,,]. (7.35)
It follows from (7.33), (7.34) and (7.35) that

p-l , (7.36)
Y2

,
Yn
Hence
,
o Yl
exp (Qt) = [XI X2 ... x,,] 0 o y~

,
o o Yn
Sec. 7.7] Problem 5 109

that is

exp (Qt) = i/=1


XjY! exp (Ajt) (7.37)

The right hand side of (7.37) is kIlOwn as the spectral representation (or spectral
decomposition) of the exponential matrix exp (Qt).
We consider a very siniple Illustrative example.

Exumple 7. 7
Find the spectral representation of the matrix exp (Qt), where

Solution
I, A2 = .-1; x; [2 -1], x; = [1 -1]
I
YI [1 1], y~ = [-1 -2] .
By (7.37)

exp(Qt) = [_~][1 1] exp(t)+[_~][-l -2] exp(-t)

=
[ 2 2]
-1 -1
exp (t) + l-1 -2J
1 3
exp (-t) .

Although we have considered matrices having real eigenvalues, and eigen-


vectors having real elements, the spectral decomposition (7.37) is also valid for
complex elements as can be shown by a slight modification of the above exposition.
By the use of (2.17), that is of the result
exp (1 ® Q) = 1 ® exp (Q)
we generalise the result (7.37) to

exp (1 ® Q)t = 'L(I ® x/y/) exp (A/t) . (7.38)

We now consider the main problem, to obtain an expression for


a(ll/aZ
where
cl>(t) = exp(Qt), (7.39)
so that
<1>(0) = 1, (7.40)

d
- (1) = Q<I>, (7,41 )
dt
and
Z = [zij] is a matrix of order (r X s).
123

110 Some Applications of Matrix Calculus [Ch.7]

The matrix Q is assumed to be a function of Z, that Is Q(Z). Making use of the


result (6.5), we can write

d a~
dt az = az = aQaz (/®~)+(/®Q) aclaz>
a(Q~)
(7.42)

and from (7.40)


a<l>
az (0) = O. (7.43)

We next make use of a generalisation of a well known result (see [19] p.68);
Given
d
-x = RX+BU
dt
'and
X(o) = 0,
then
X = itexp {R(t -r)1BU(r)dr. (7.44)

For

x ;: :; 04> R;:::; / ® Q B = aQ and U = / ® (ll


az' I az
the solution to (7.42) subject to (7.43) becomes

a4>
- =
It exp {I ® Q(t -r)} -
aQ
[/ ® <fl(t)]dr
az 0 az

;:::;
t ""
~ (/ ® XiYi) exp (Ai(t -r»
I a
az [/Q ,
® XjY, 1exp O,;r) dr
Jo 1,/
(by 7.37 and 7.38)

= .2 (/ ® XIY;) aQaz-;:- (/ ® Xjyj) exp (Alt)


It exp «AI - A/)r) dr.
I,i 0
Hence,
a<fl ""
= L, I ® XIY;) -
aQ (i ® x/yj) exp (Alt) flj (t) (7.45)
az 1,1 az
where
fu(t) = t if AI = Ai
and
ii/t);:::; (1/(AI-A/))[exp(A/-A/)t)-I] if A/*A/'
Solution to Problems

CHAPTER 1

(1)

(2) (a) The kth column of AElk is the ith column of At all other columns are
zero.
(b) The ith row of EikA is the kth row of A, all other rows are zero.

AEik = Aeie~ = A'ie~


EikA = eie~A = ejA'k

(3) tf ABC = L e;ABCei = L(eiA)B(Cei)

= L A /. BC.,
(4) tf AE/j = L e~ Eljek = L e~a,sErsEljek
k k",S

= 2: a,se~ e,e~eie;ek
kiT,S

== 2:
Ic, r,s
==
a,sflkrOslOjk a/I'
123
380A

112 Solution to Problems

(5) A = .L
1.1
2:
ail Ail =
/.1
tr(BEI/liil)Eil

= 2: ekBEilel<~1 L = ekBelfill<Ei/
l.i,I< I,i.k

= .L
/
eIBe/EIi = 2: I
bllEIi = diag {B},

CHAPTER 2

( 1) Since U is an orthogonal ma trix. the resul t follows.


More formally,

L
r.s
[Ers(m X n) ® Esr(n X m)] [Esr(n X m) ® Erim X n)]

=L
r,s
[Erim X n)Esr(n X m)] ® [Esr(n X m)Ers(m X n)]

L
r,s
[Oss Err Cm X m)] ® [orrEssCn X n)]

= L Err(m
r
X m)] ® [.2 Ess(n
s
X n)]

= 1m ® In ::;= Imn. the result follows.

(3) (a)

r ~ r
2 -1

A®B = i 0 2
~
-1
I
2
0

J
o -1 B®A =
2 0
0 2
2 0
(b)
1 0 0 0
0 0 0
UI = U2 =
0 0 0
0 0 0

(4)
See [18] p. 228 for methods of calculating matrix exponentials.
(a)
[2e -e- I 2(e -e- I )]
exp(A) = -I
e -e 2e- 1 - e
Chapter 2 113

(b) [2e -e-1 , 0


. 0 2e -e- I
exp(A ®I) =
-(e-e - I )
o
2(e-e- l )

{'~'-' 2e
0
-e- I 0
exp(A) ®I
c- I -e 0 2c- 1 -c
0 e-I-e 0
Hence exp (A) ® I = exp (A ® I) .
(5) (a)
A _, [I IJ _, '[-4 -~J '
=
-1. -2 '
B =-
2 3
so that

['
2 -4
-1
~~ ~
3
A-I ®B- I

-3
-2
-6
3
-J
-1
2

~~ ~-2 I ]
(b) As 4 1
8 3
A®B = , it follows that
-2 -1
-3 -4 -3

['
1
3/2 -1/2 3/2 -1/2
(A ®B)-I =
-1 4 -2
~3/2 1/2 -3 -4
This verifies (2.12)
(6) (a) ForA; Al = -1, 1..2 = 2, x; == [14] and x; == [11].
For B; III = 1, 112 == 4, y; = [I -1] and y~ == [1 2] .

(b) A ®B = ~6 ~ =~ =~J
8 4 -4 -2
== E (say).

8 12-4 --{)
23
380A
114 Solution to Problems

C(A) = /A/-E/ = A4-SAJ- 30A2 +40A-30


= (A + I)(A + 4)(A - 2)(A - 8) .
Hence the eigenvalues of E are

{-I, -4, 2, 8} = {AIJ.lb A,II.l2, A,2J.lI, A2J.l2}'


The corrcsponuing eigcllv(!ctors of H arC:

[~ .[il tJ
(c) This verifies Property IX
'nd [iJ
(7) For some non-singular P and Q
A = p-Icp and B = Q-IDQ.
Hence
A ®B = p-1CP®Q-1DQ
l
= (r ® Q-I)(CP®DQ) by (2.11)
l
= (P ® Qr (C ® D)(P ® Q) by (2.12) and (2.11)
=R-1(C®D)R
where
R = P®Q.

The result follows.

CHAPTER 4

~
(1) :; ~ [~: o ay 2e2X]
o -x-2
ax
4x cos x
(2)(a) IXI = x sin x -exp (2,)

~= [ex -cos xl
ax -x sinxJ.

IXI = exp (x) sin x -x cos x,


alxl = Ix -2e 2x l
ax L-2e 2x
sinx J.
Chapter 4 115

a~:1 = [;~: ;~:J [~X ~::SXJ.


(b)
=
alxl
ax = 2[X/j]-diag {Xii}

= [~~ex ~~i::J -[~ ~nxJ


(3)
xli + X~I XII XI2 + x21 X22
Y = Xl2Xli +X22X21 X!2 + X12
[XI3 X II +x2J X 2i XI3 X l2 +X23 X 22

hence

~2~
(a) 2X21 x22
ay
-- =
aX21 "
.

From Example 4.8


rX22

X2J 0
0
oJ.

~~ ~J ~"
o 0 21
XI2

x22 X2)
XI~ + [" X'] ~ ~J
x12 X22

XI3 x23
0
1 0

(b) Since Y13 = xU x 13 + X21x23

l" "J
aYI3 0
ax X23 0 X21 =

which is the result in Example 4.8.


123

116 Solution to Problems

(4)(a)

aYi, = E-jX'A' + A'X'E...


ax' 'I

(b)

°Yil = AX'Eii + E1,X'A' .


ax
(5) By (4.10)
alYI
= tr {IYI(y-I)'B'E;sA'}
= WI tr {A'(y-I)'B'E;s}
= IY I (vee Ers)' vee [A'(y-I)'B']
IAXBlzrs
where
[zrs] = Z = A'[(AXBtl]'B' .

(6) (a) Since


3 (XX') , ,
- - = ErsX +XErs ,
ax,s

(b)

CHAPTER 5

(l) Since
Yll allx ll + al2 x l2
Y2l a21 x 11 + 012 X I2

Y12 a11 x 2l + al2 x 22


Y22 a2l x 2l + a22 x 22
Chapter 5 117

o
all
o
0]
a21

0
all a22 •

(2) (a)
oveeY _ (B®A')(n) by(s.i8)
a vee X
(b)
o vee Y = X ®I +I ® X'
a veeX
(3)(a) a tr Y (vee E,.)' (vee A'il') ,
ax,s
hence
a tr Y
= A'lJ'.
ax
(b) a tr Y
2 tr E;sX' ,
ax,s
hence
aIr Y
2X' .
ax
(c) a tr Y
21rE;sX,
ax"
hence
a tr Y
- - - 2X.
ax
(4)(a) a tr Y -Ir X-I /.:.~. X-I -Ir E;s (X- 2 )' ,
axrs
hence
a tr Y
(x-2)' .
ax
(b) a tf Y
= -tf AX-IErS x-In ,
ax,s
hence
a tf Y -(X-lBAx- 1), •
ax
123
380A

118 Solution to Problems

(c) atrY

ax"
hence
a tr Y
- - = n(X II
-
1
)'
ax"
(d) I I
exp (X) = I +X +- Xl +- Xl + ...
21 31
hence by the result (c) above
a tr Y
ax- ,
= exp(X).
(5) (a) (i)

X l1 dx l2 +Xl2 dx ll +~21dx22 +X22dX21]


2X12dxI2 + 2X22dxn

= [:::~ :::~] [:~: :~:] + [::~ ::~] [::~: ::~:]


(dX)'X + X'(dX) .

(iii)

2X l1 dX l1 +X12dx21 +X21dx12
dY =
[
xlldx21 +X21 dx ll +X22 dx 21 +X21 dx22

Xli dx 12 + X 12 dx II + X 12 dx 22 + X 22 dx 12 ]
X21 Ux 12 +X12 Ux 21 + 2X'22 Ux 22

Xll dXll + X12 dX 21 Xll dX12 + X12dX22]


[ X21 dX
ll + xn dX21 X21 dX12 + x22 dX22

XlldXll +X2Idx12 x 12 dx ll +X22dX12]


[ Xll dX21 + x21 dX21 X12 dX21 + X22 dX22
= X(dX) + (dX)X .
Chapter 6 119

(b) Write Y== UVwhere U=AXand V=BX I

then dY== U(dV) + (dU)V


== AXB(dX) + A (dX)BX .

CHAPTER 6

~-'In(x" + x,,)
(I) 0
ay
-
ax
==
XI2exllx,.
xli
0 Xu eXllx
..

+ X 22)
0]
X22

~12
0 Xli -sin (x 12

0 0 0

(2)
and so on I

hence by (6.1)

1 0 0 I
0 0 0 0
0 0 0 0
0 0 0 0
ax
- 1 0 0 I O.
ax
0 0 0 0
0 0 0 0
0 0 0 0
0 0

(3) Since
X-I
/::,.
L"-X21
-x"J
XII

where /::,. == XIIX22 - Xl2X2l> we can calculate aX-ljaxrs , for example


123
380A

120 Solution to Problems

Hence

~
~2
ilX- 1 ~ -x21 X 22
-x 12X22
X I2X21
-X.2 X 21

X~I
XlI
X

-XlIX.I
22 J
oX A' -x12xn xI. X 12 X21 -XlI X 1.

XU X 22 ~XI2XU -XU X .1 xii


o
=
o

= -(I®X- I ) O(I®X- 1).


(4)
A ®X- 1 == -1 [a\lX22
-au Xu a12 x ,2
-a
l2 Xlj
A -aUx,l allxll -allx:!l a12x ll

a21 x 22 -a2l x l2 a22 x 22 -a21 x 12

-a2l x 21 a21 x ll -a2Z x "1l a22x ll

where A = XllX22 - Xl2x,1 •


We can now calculate
a(A ® X-l)/axrs
axrs
and form
o o 0 -all 0
o o 0 o
o o o o o
o a21 o a22 0 o o o
ax =A o o
0 o 0 all o
-all 0 -a12 0 0 o o o
o 0 o 0 all o o
o o o
I
·1

'.i
'.j
,I
I
i

.1
I
'j
Tables of Formulae and ':1

Derivatives "I
- i
· I
·.
;
i
j

Table 1
Notation used: A = [ailLE = [bill
,
Eii = e/ei
Bij :: e/ ci = e/ el

E;jer = Bjre;
E;jErs = BjrEis
EjjEjsEsm = E,m
EjjErs = 0 if j =l=r l
.1
A = l::Z;aijE;j
I J
A. j = Aej
,
;
· \
Aj • = A'el
E/jAErs = ajrEij
trAE = 'I.'£a/lbl/
/ I
tr AS' = tr A'B.
trAB = (vecA')'vecB.
123
3S'OA

122 Tables of Formulae and Derivatives

Table 2

A ®B = laljBj
A ® (aB) = a(A @ B)
(A + B) ® C = A ® C + il ® C
A ® (B + C) = A ® B +A ® C
A ® (B ® C) = (A ® B) ® C
(A ®il)' = A' ®B'
(A ®B)(C®D) = AC®BC
(A ®Br l = A-I ®B- I
vec(AYB) = (B'®A)vec Y
IA ® ill = IAlm IBln when A and B are of ordt!r
(/1 X n) and (m Xm) respectively
A ®B == UI(B ®A)U21 UI and U2 are permutation
matrices
tr (A ® B) = tr A tr B
A C±> B = A QlI 1m + In ® B
U = 'L'LErs®E~s
r s

Table 3

a(Ax) :=: A'


ax
a(x'A)
:=: A
ax
a(x'x)
--:=: 2x
ax
a(x'Ax) ,
---:=: Ax+Ax
ax
a7. ay az
ax ax ay
Tables of Fonnulae and Derivatives 123

Table 4

alex)
ax
alxl
IXI(X- 1), when elements of X are
ax independent
2 [X,/l - uiag {XII}, when X is sYlllmetric.
ax = Ers
axr .!'
ax'
- = E;s
oXrs

a(AXB)
- - - = AErsB
axrs
il(AX'B) ,
= AErsB
oXrs

a(X'A'AX)
E,.'A'AX + X'A'AErs
axrs

3(X'AX)
ax,s
a(x n )
ax,s
=
Ic=O
I XlcErsXn-k-l
1; 123

124 Tables of Formulae and Derivatives

Table S

a vee (AXB) ,
----=---'- = B ® A
a veeX
a vee (X'A,X) = U'(AX ® I) + (/ ® A 'X)
a vecX
J
a vee (AX- B) = _(X-I B)<® (x-I)'A'
I, a veeX
I'
II
ii Table 6
I

I I
a log IXI
ax
= (X-I)'

aIXlr
- - = rIXlr(X-I)'
ax
a tr (AX) ,
=A
ax
a tr (A'X)
-....:----=- = A
ax
a tr (X'AXB) , ,
-~-~ = AXB+AXB
ax
a tr (XX')
= 2X
ax
a tr (X") = nX"-1
ax
a tr(e x ) x
=e
ax
a tr (AX-I B) = -(X ~I BArl)'
Tables of Fonnula e Ilnd Derivatives 125

Table 7

ax (X symmet ric)
0+ U - L,Err ® Err
ax
ax
- = (] (clemen ts of X indepen dent)
ax
ax'
-=U
ax
a(xy) ax ay
-- = - (I ® Y) + (I ® X) -
az az az
ax- 1
- = -(1 ® X-I)O( / ® X-l)
ax
a ex ® Y)
az =
ax
az ®Y
raY ~
+ [/ ® Uil Laz ® X [1 ® U2 ]
J
123
38'01.

Bibliography

[1] Anderson, T. W., (1958), An Introduction to Multivariate Statistical Analysis, John Wiley.
[2] Athans, M., (1968), The Matrix Minimum Principle, Information and Control, 11, 592-606.
[3] Athans, M., and Tse, E., (1967), A Direct Derivation of the Optimal Linear Filter Using the Maximum Principle, IEEE Trans. Auto. Control, AC-12, No. 6, 690-698.
[4] Athans, M., and Schweppe, F. C., (1965), Gradient Matrices and Matrix Calculations, MIT Lincoln Lab. Tech. Note 1965-53, Lexington, Mass.
[5] Barnett, S., (1973), Matrix Differential Equations and Kronecker Products, SIAM J. Appl. Math., 24, No. 1.
[6] Bellman, R., (1960), Introduction to Matrix Analysis, McGraw-Hill.
[7] Bodewig, E., (1959), Matrix Calculus, Amsterdam: North Holland Publishing Co.
[8] Brewer, J. W., (1978), Kronecker Products and Matrix Calculus in System Theory, IEEE Trans. on Circuits and Systems, 25, No. 9, 772-781.
[9] Brewer, J. W., (1977), The Derivatives of the Exponential Matrix with respect to a Matrix, IEEE Trans. Auto. Control, 22, 656-657.
[10] Brewer, J. W., (1979), Derivatives of the Characteristic Polynomial, Trace and Determinant with respect to a Matrix, IEEE Trans. Auto. Control, 24, 787-790.
[11] Brewer, J. W., (1977), The Gradient with respect to a Symmetric Matrix, IEEE Trans. Auto. Control, 22, 265-267.
[12] Brewer, J. W., (1977), The Derivative of the Riccati Matrix with respect to a Matrix, IEEE Trans. Auto. Control, 22, No. 6, 980-983.
[13] Conlisk, J., (1969), The Equilibrium Covariance Matrix of Dynamic Econometric Models, American Stat. Ass. Journal, No. 64, 277-279.
[14] Deemer, W. L. and Olkin, I., (1951), The Jacobians of certain Matrix Transformations, Biometrika, 38, 345-367.
[15] Dwyer, P. S. and Macphail, M. S., (1948), Symbolic Matrix Derivatives, Ann. Math. Statist., 19, 517-537.
[16] Dwyer, P. S., (1967), Some Applications of Matrix Derivatives in Multivariate Analysis, American Statistical Ass. Journal, June, 62, 607-625.
[17] Geering, H. P., (1976), On Calculating Gradient Matrices, IEEE Trans. Auto. Control, August, 615-616.
[18] Graham, A., (1979), Matrix Theory and Applications for Engineers and Mathematicians, Ellis Horwood.
[19] Graham, A., and Burghes, D., (1980), Introduction to Control Theory Including Optimal Control, Ellis Horwood.
[20] Lancaster, P., (1970), Explicit Solutions of Linear Matrix Equations, SIAM Rev., 12, No. 4, 544-566.
[21] MacDuffee, C. C., (1956), The Theory of Matrices, Chelsea, New York.
[22] Neudecker, H., (1969), Some Theorems on Matrix Differentiation with special reference to Kronecker Matrix Products, J. Amer. Statist. Assoc., 64, 953-963.
[23] Neudecker, H., A Note on Kronecker Matrix Products and Matrix Equation Systems.
[24] Paraskevopoulos, P. N. and King, R. E., (1976), A Kronecker Product approach to Pole assignment by output feedback, Int. J. Contr., 24, No. 3, 325-334.
[25] Roth, W. E., (1944), On Direct Product Matrices, Bull. Amer. Math. Soc., No. 40, 461-468.
[26] Schonemann, P. H., (1965), On the Formal Differentiation of Traces and Determinants, Research Memorandum No. 27, University of North Carolina.
[27] Schweppe, F. C., (1973), Uncertain Dynamic Systems, Englewood Cliffs, Prentice-Hall.
[28] Tracy, D. S. and Dwyer, P. S., (1969), Multivariate Maxima and Minima with Matrix Derivatives, J. Amer. Statist. Assoc., 64, 1576-1594.
[29] Turnbull, H. W., (1927), On Differentiating a Matrix, Proc. Edinburgh Math. Soc., 1, ser. 2, 111-128.
[30] Turnbull, H. W., (1930/31), A Matrix Form of Taylor's Theorem, Proc. Edinburgh Math. Soc., Ser. 2, 33-54.
[31] Vetter, W. J., (1970), Derivative Operations on Matrices, IEEE Trans. Auto. Control, AC-15, 241-244.
[32] Vetter, W. J., (1971), Correction to 'Derivative Operations on Matrices', IEEE Trans. Auto. Control, AC-16, 113.
[33] Vetter, W. J., (1971), An Extension to Gradient Matrices, IEEE Trans. Syst. Man. Cybernetics, SMC-1, 184-186.
[34] Vetter, W. J., (1973), Matrix Calculus Operations and Taylor Expansions, SIAM Rev., 15, No. 2, 352-369.
[35] Vetter, W. J., (1975), Vector Structures and Solutions of Linear Matrix Equations, Linear Algebra and its Applications, 10, 181-188.
[36] Vetter, W. J., (1971), On Linear Estimates, Minimum Variance and Least-Squares Weighting Matrices, IEEE Trans. Auto. Control, AC-16, 265-.
[37] ——, R. I. and Mulholland, R. J., (1980), Kronecker Product Representation for the Solution of the General Linear Matrix Equation, IEEE Trans. Auto. Control, AC-25, No. 3, 563-564.
Index

C
chain rule
  for a matrix, 88
  for a vector, 54
characteristic equation, 47
cofactor, 57
column vector, 14
companion form, 47
constrained optimisation, 94, 96

D
decomposition of a matrix, 13
derivative of
  a Kronecker product, 70
  a matrix, 60, 62, 64, 67, 70, 75, 81
  a scalar function, 56, 75
  a vector, 52
determinant, 27, 56
direct product, 21

E
eigenvalues, 27, 30
eigenvectors, 27, 30
elementary matrix, 12, 19
  transpose of, 19
exponential matrix, 29, 31, 42, 108

G
gradient matrix, 56

J
Jacobian, 53, 109

K
Kronecker delta, 13
Kronecker product, 21, 23, 33, 70, 85
Kronecker sum, 30

L
Lagrange multipliers, 95
least squares, 94, 96, 100

M
matrix
  calculus, 51, 94
  companion, 47
  derivative, 37, 60, 62, 67, 70, 75, 81, 84, 88
  differential, 78
  elementary, 12, 19
  exponential, 29, 31, 42, 108
  gradient, 56
  integral, 37
  orthogonal, 97
  permutation, 23, 28, 32
  product rule, 84
  symmetric, 58, 95, 97
  transition, 42
maximum likelihood, 102
mixed product rule, 24
multivariable system, 45
multivariate normal, 102

N
normal equations, 95, 101
