Intro 1.0
Work with competence and integrity: In the end, we seek evidence-based
decision making - not decision-based evidence making.
My Summer Vacation
Current Trends
In addition to back-office workers, machines are replacing a lot of highly paid people, too.
Average compensation for staff in sales, trading, and research at the 12 largest global investment banks, of which Goldman is one, is $500,000 in salary and bonus.
For the highly paid who remain, the pay of the average managing director at Goldman will probably get even bigger, as there are fewer lower-level people to share the profits with, he says.
His expertise makes him suited to the task of CFO, a role more typically held by accountants: "Everything we do is underpinned by math and a lot of software."
Goldman's new consumer lending platform is entirely run by software, with no human intervention.
Current Trends
EY
What's the likelihood that the machines will replace accounting and audit work?
Professional Competency (EY)
CPA skills are not enough anymore, and the workforce is becoming more and more competitive.
Professional Opportunity
The job market is changing rapidly: most of the lower-level admin jobs are disappearing while, at the same time, analytics jobs are exploding.
McKinsey
Analytics job market today
[Chart: analytics job market today. Callout on the "Business Translator / Analytics Lead" role: this is our target, and this position will increase faster than the others.]
Scope and Direction of Analytics
Major disruptions: 2000 - Internet; 2010 - Mobility; 2020 - Artificial Intelligence.
Analytics tooling over the same period: IDEA, ACL; Data Warehouses (SSAS); Visualization (Tableau, Power BI).
Revenue: change of volume or price? Product mix? Product technology and function? Markets? Customers? New products, deprecated products? Distribution channels? Capacity, competitors, perceived customer value, delivery costs.
Costs: Change in capacity, transportation costs, inventory turn, manufacturing strategy / logistics,
materials market, procurement strategy, suppliers, product design and BOM
If you're in charge of the audit, do you think you need to understand this? What about managers in the company?
Can the machines analyze this? If machines can identify hidden drivers, do you think the client would consider that valuable?
Learning Systems
Two Broad Applications: Deep Learning and Process Automation
Data: What data needs to be considered? Do you need the entire Order-to-Cash process? Contracts (T&Cs), Sales Order transactions, Shipping transactions, Customs transactions, Payments.
How extensive and complex is that data? Where is it? How do you match, and what about partial matching? Will you need to read descriptions and documents? Make judgments?
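As a small illustration of partial matching (the strings and threshold below are hypothetical, not from any engagement), base R's adist() and agrepl() can score approximate matches between, say, order and shipment descriptions:

# Hypothetical order and shipment descriptions for fuzzy matching
orders <- c("Widget A, blue, 100ct", "Gizmo B, red, 50ct")
shipments <- c("WIDGET A BLUE 100 CT", "Gizmo B (red) 50ct")
adist(tolower(orders), tolower(shipments)) # edit-distance matrix: low values suggest a match
agrepl("widget a", tolower(shipments), max.distance = 0.2) # TRUE where an approximate match exists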
Attitude
It's a competitive world. It takes hundreds of things done right to get a
promotion, and one thing done wrong to get fired. Take your job seriously
(and this class is your job right now):
Come to class prepared (there will be pop quizzes)
Apply what you learn: ask yourself "what if" and test your knowledge
Take responsibility: this is graduate school
Professor
Ellen Terry
http://econolytics.org
ewterry@bauer.uh.edu
MH 360K
713-743-4820
Background:
[Diagram: Model and Data]
Ellen's Opinion: It's most important that you have applied the existing body of knowledge in a competent manner. If the model is the best you have (not so obvious), then go with it (just be transparent and always keep project sponsors in the loop). Telling managers and clients that the business model is invalid because of some theoretical issue doesn't inspire them to write checks and continue your employment.
Work with competence and integrity: In the end, we seek evidence-based decision making - not decision-based evidence making.
Model Flexibility vs. Interpretability
[Figure: interpretability vs. flexibility. Models range from restrictive to flexible: Linear Regression, Local Models (Splines and LOESS), Support Vector Machines.]
In general, more restrictive models will be better for inference (understanding relationships between predictors
and response variables) and ensembling (consolidating algorithms within larger models, or piping data
between models).
More flexible models are usually more complex and involve more parameters, leading to issues with overfitting and requiring more training data and longer processing times.
Central Concept: Bias vs Variance
Err(x) = (E[f̂(x)] - f(x))² + E[(f̂(x) - E[f̂(x)])²] + σ²ₑ

The left-hand side, Err(x), is the expected test MSE. The first term is the squared bias (driven by the model and its parameters), the second term is the variance (driven by the random training samples), and σ²ₑ is the irreducible error.
General rule: as you move to more complex models, bias decreases and variance increases
Bias vs Variance
[Figure: Test MSE and Training MSE, comparing different f̂ models to the true f on non-linear data (composite for all models).]
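The rule can be seen in a small simulation (the data-generating function, sample sizes, and degree grid below are illustrative assumptions): as polynomial degree (flexibility) grows, training MSE keeps falling while test MSE eventually turns back up.

# Simulate a non-linear truth, then compare training vs. test MSE by flexibility
set.seed(1)
x <- runif(200, 0, 10); y <- sin(x) + rnorm(200, sd = 0.3)
train <- sample(200, 100)
for (d in c(1, 3, 5, 10, 15)) {
  fit <- lm(y ~ poly(x, d), data = data.frame(x, y), subset = train)
  pred <- predict(fit, newdata = data.frame(x = x))
  cat(sprintf("degree %2d: train MSE %.3f, test MSE %.3f\n", d,
      mean((y[train] - pred[train])^2), mean((y[-train] - pred[-train])^2)))
}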
# OK, let's create a couple of vectors and move on with matrix operations
A <- matrix(c(2, 2, 3, 1), nrow = 2) # assumed 2x2 matrix; values chosen to match the det(A) example below
B <- c(3, 2) # vector 1
C <- c(0, 1) # vector 2
D <- A # easy to duplicate data structures
D <- A[,1] # or parts - notice that D becomes a vector
D <- cbind(D, A[,2]) # and back again
D <- t(D) # transpose a matrix
A matrix transposition flips the matrix over its main diagonal (starting from position 1,1).
DA1 Review: Basic Matrix Operations

Operator or Function            Description
A+B, A-B                        Addition / subtraction; must be the same structure; always an element-wise operation
t(A)                            Transpose
A*B                             Element multiplication (the product of vectors or matrices, e.g., product * price)
A %*% B                         Matrix multiplication (important)
A %o% B                         Outer product, AB'
crossprod(A,B), crossprod(A)    A'B and A'A, respectively
DA1 Review: Matrix Addition and Multiplication
G <- A %*% B # dot (matrix) product
H <- crossprod(A, B) # equivalent to t(A) %*% B
DA1 Review: Diagonals and Determinants
diag(x) creates a diagonal matrix with the elements of x on the principal diagonal:
E <- diag(A) # feed it a matrix and it gives you back a vector of the diagonal
I <- diag(2) # feed it a number n and it creates the n x n identity matrix
J <- det(A) # for our 2x2 A: ad - bc = (2*1) - (3*2) = -4
Determinants are used to solve linear systems of equations.
K <- solve(A)
To get the inverse A⁻¹ of a 2x2 matrix: swap the positions of a and d, put negatives in front of b and c, and divide everything by the determinant (that's what solve() does for you). A matrix times its inverse equals the identity (the definition of the inverse): A⁻¹A = I.
Check what K*A gives you (element-wise multiplication is not matrix multiplication, and the result is not an identity matrix):
L <- K*A
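A quick check in R (with the A and K above): %*% recovers the identity, while element-wise * does not.

K %*% A # matrix product: the 2 x 2 identity (up to floating-point rounding)
K * A   # element-wise product: not the identity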
DA1 Review: Eigenvector Transformation
Eigenvectors & Eigenvalues
Originally utilized to study principal axes of the rotational motion of rigid bodies, eigenvalues and eigenvectors have a wide range of applications, for example in stability analysis, vibration analysis, atomic orbitals, facial recognition, and matrix diagonalization. In essence, an eigenvector v of a linear transformation T is a non-zero vector that, when T is applied to it, does not change direction. Applying T to the eigenvector only scales the eigenvector by the scalar value λ, called an eigenvalue. This condition can be written as the equation:

Av = λv, or (A - λI)v = 0

which has a non-zero solution v only if det(A - λI) = 0.

λ is a scalar eigenvalue associated with an eigenvector v that can be used for transformation of a matrix.
Eigenvalues and eigenvectors:
Eigenvectors are non-zero by definition (an eigenvalue itself can be zero)
Defined for n x n (square) matrices only (where the matrix is diagonalizable)
From a geometrical perspective, applying A does not change the direction of an eigenvector (next slide)
There always exists at least one eigenvalue / eigenvector pair
When eigenvectors are applied to a linear transformation, the matrix just gets scaled, and the transformation still tells us what we need to know about the original matrix
Z <- A %*% B
When dot products are applied to a linear transformation, the transformation still tells us what we need to know about the original matrix.
Note that the eigenvectors and the dot product vector scale the original vector, but the direction doesn't change (i.e., it gives us a mechanism to transform data without degrading its underlying structure).
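A short check of Av = λv in R, using the A defined earlier (eigen() returns the eigenvalues and eigenvectors):

e <- eigen(A)        # eigen decomposition of A
v <- e$vectors[, 1]  # first eigenvector
lambda <- e$values[1] # matching eigenvalue
A %*% v              # same result...
lambda * v           # ...as scaling v by lambda: the direction is unchanged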
DA1 Review: More Definitions
Vector Norm (sometimes called the magnitude)
||x||₂ = √(x₁² + x₂² + ... + xₙ²); if x = c(1,2,3), then ||x||₂ = √14
Note: the norm of a matrix is often written ||x||
The L2 norm (or Euclidean norm) is the most common (there are different ways to calculate a norm, and they get different answers). More later on this.

Aᵀ: get clear on transpose (a few examples).
Recall earlier:
L <- K %*% A # this time the result is the 2 x 2 identity matrix
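A one-line check of the norm example:

sqrt(sum(c(1, 2, 3)^2)) # 3.741657 = sqrt(14)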
DA1 Review: Matrix Decomposition
x² + 4x + 3 = (x + 3)(x + 1)
You can factor matrices too, which turns out to be very useful.

A linear system and its solution:
0 = 8β₁ + 20β₂ - 56
0 = 20β₁ + 60β₂ - 154
solve(X, B) # returns β₁ = 3.5, β₂ = 1.4

β̂ = (XᵀX)⁻¹(XᵀY)

# note we can also use SVD for dimension reduction (like PCA)
# it's also used in advanced numerical solutions (won't be doing that here)
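The same system in R (coefficients read off the two equations above; note this reuses the names X and B from earlier):

X <- matrix(c(8, 20, 20, 60), nrow = 2) # coefficient matrix from the equations
B <- c(56, 154)                         # right-hand sides
solve(X, B)                             # 3.5 1.4, matching the solution shown
# the normal equations beta-hat = (X'X)^-1 (X'Y) translate to solve(crossprod(X), crossprod(X, Y))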
Fun with Vectors
# exercise for fun: get the vector magnitude (Euclidean norm)
library(ggplot2) # arrow() and unit() are re-exported from grid

norm_vec <- function(x) sqrt(sum(x^2))
V1 <- data.frame(x = c(0, 3), y = c(0, 4))
mV1 <- norm_vec(V1) # 5: the first point is the origin, so this is the length of (3,4)
p <- ggplot(V1, aes(x = x, y = y)) + geom_point(color = "black")
p <- p + geom_segment(aes(x = V1[1,1], y = V1[1,2], xend = V1[2,1], yend = V1[2,2]), arrow = arrow(length = unit(0.5, "cm")))
p <- p + xlim(0, 20) + ylim(0, 20)
p

# create and draw the eigenvector (transpose first)
tV1 <- t(V1)
eV1 <- cbind(eigen(tV1)$values, eigen(tV1)$vectors)
ev2 <- eigen(V1)$values

# calculate the direction vector and show that the norm = 1
cosX <- V1[2,1] / mV1
cosY <- V1[2,2] / mV1
V4 <- data.frame(X = c(0, cosX), Y = c(0, cosY))
p <- p + geom_segment(aes(x = V4[1,1], y = V4[1,2], xend = V4[2,1], yend = V4[2,2]), col = "red", linetype = "dashed", arrow = arrow(length = unit(0.5, "cm")))
p
~300 B.C.: Euclid (Alexandria, Ptolemaic Egypt)
~1800: Gauss extended geometry beyond Euclid (non-Euclidean geometry)
~1900: Hilbert (kernel functions); Einstein
Many of the underlying principles that form the basis of statistical learning theory are still based on Euclid's axioms; we just extend them into infinite dimensions using the work of Gauss and Hilbert (and many, many others).