20131212 - Sydney - Garvan Institute - Human Genetics and Big Data

Human Genetics & Big Data
Human Genetics & Ethics
Today we talk about
technology and methodology

Me, Us
• Allen Day, Principal Data Scientist, MapR
Human Genetics PhD, UCLA School of Medicine
6 years Hadoop, 10 years R (Genetics/Biostatistics)

• MapR
Distributes open source components for Hadoop
Adds major technology for performance, HA, industry-standard APIs

• See Also
– @allenday @mapR
– http://slideshare.net/allenday
– “allenday” most places (twitter, github, maprtech.com, etc.)

What Does Machine Learning Look
Like?

What Does Machine Learning Look
Like Under the Covers?
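\[
\begin{bmatrix} A_1 & A_2 \end{bmatrix}^T \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T \\ A_2^T \end{bmatrix} \begin{bmatrix} A_1 & A_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\]

\[
\begin{bmatrix} r_1 \\ r_2 \end{bmatrix}
= \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \\ A_2^T A_1 & A_2^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]

\[
r_1 = \begin{bmatrix} A_1^T A_1 & A_1^T A_2 \end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \end{bmatrix}
\]

O(κ k d + k^3 d) = O(k^2 d log n + k^3 d) for small k, high quality
O(κ d log k) or O(d log κ log k) for larger k, looser quality

Here’s how to keep it simple yet powerful…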
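
As a rough illustration of the block structure in the equations on this slide, the sketch below checks that computing r1 from the two blocks, A1^T A1 h1 + A1^T A2 h2, matches the first rows of the full product. The values of A1, A2, h1, and h2 are made-up toy data (3 users, 2 items of one kind, 1 item of another kind), not data from the talk.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]

def matvec(m, v):
    """Multiply a matrix by a vector."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# Hypothetical toy data: rows are users, columns are items.
A1 = [[1, 0], [1, 1], [0, 1]]   # behavior of kind 1 (2 items)
A2 = [[1], [0], [1]]            # behavior of kind 2 (1 item)
h1 = [1, 0]                     # one user's history over kind-1 items
h2 = [1]                        # the same user's history over kind-2 items

# Block form: r1 = A1^T A1 h1 + A1^T A2 h2
A1T = transpose(A1)
r1_blocks = [
    a + b
    for a, b in zip(matvec(matmul(A1T, A1), h1), matvec(matmul(A1T, A2), h2))
]

# Full form: r = [A1 A2]^T [A1 A2] h, with h = [h1; h2]
A = [row1 + row2 for row1, row2 in zip(A1, A2)]
r_full = matvec(matmul(transpose(A), A), h1 + h2)

print(r1_blocks, r_full)
```

The first two entries of r_full are r1; the last entry is r2.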

Behavior of a
crowd helps us
understand what
individuals will do

HOW RECOMMENDATIONS WORK

Recommendations
Alice

Charles

Alice got an apple and a
puppy

Charles got a bicycle

Recommendations
Alice

Bob

Charles

Alice got an apple and a
puppy

Bob got an apple

Charles got a bicycle

Recommendations
Alice

Bob

Charles

?

What else would Bob like?

Recommendations
Alice

Bob

Charles

A puppy, of course!

Recommendations
Alice
What if everybody gets a
pony?
Bob

Charles

?

Now what does Bob want?

Log Files
Alice
Charles
Charles
Alice

Alice
Bob
Bob

Log Files
u1  t1
u2  t2
u2  t3
u1  t4
u1  t3
u3  t3
u3  t1

Log Files and Dimensions
u1  t1
u2  t2
u2  t3
u1  t4
u1  t3
u3  t3
u3  t1

Things
t1
t2
t3
t4

Users
u1 Alice
u2 Charles
u3 Bob
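
A minimal sketch of how the log entries and the two dimensions on this slide could be turned into a user-by-thing history matrix. The log pairs and the user names are taken from the slides; the variable names and the dictionary layout are illustrative choices only.

```python
# (user id, thing id) pairs, in the order they appear in the log
log = [
    ("u1", "t1"), ("u2", "t2"), ("u2", "t3"), ("u1", "t4"),
    ("u1", "t3"), ("u3", "t3"), ("u3", "t1"),
]

# The two dimensions
users = {"u1": "Alice", "u2": "Charles", "u3": "Bob"}
things = ["t1", "t2", "t3", "t4"]

# One row per user, one column per thing; 1 means "this user has this thing"
history = {name: [0] * len(things) for name in users.values()}
for user_id, thing_id in log:
    history[users[user_id]][things.index(thing_id)] = 1

for name in sorted(history):
    print(name, history[name])
```

Each row is a binary indicator vector, which is all the later co-occurrence steps need.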

History Matrix

          t1   t2   t3   t4
Alice     ✔    -    ✔    ✔
Bob       ✔    -    ✔    -
Charles   -    ✔    ✔    -

Co-occurrence Matrix

      t1   t2   t3   t4
t1    -    -    2    1
t2    -    -    1    -
t3    2    1    -    1
t4    1    -    1    -
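
The co-occurrence counts can be sketched as follows, assuming the history rows implied by the log-file slides (Alice: t1, t3, t4; Bob: t1, t3; Charles: t2, t3). This is the off-diagonal part of A^T A for a binary history matrix A; leaving the diagonal at zero is a choice made here so that the result has the same two 2s and six 1s as the slide.

```python
things = ["t1", "t2", "t3", "t4"]

# History matrix rows (users) by columns (things), derived from the log slides
history = [
    [1, 0, 1, 1],  # Alice
    [1, 0, 1, 0],  # Bob
    [0, 1, 1, 0],  # Charles
]

n = len(things)
cooc = [[0] * n for _ in range(n)]
for row in history:
    for i in range(n):
        for j in range(n):
            # Count each user who has both thing i and thing j (i != j)
            if i != j and row[i] and row[j]:
                cooc[i][j] += 1

for name, counts in zip(things, cooc):
    print(name, counts)
```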

Indicator Matrix

✔
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)

Problems with Raw Co-occurrence
• Very popular items co-occur with everything
– Welcome document
– Elevator music
– Everybody wants a pony

• That isn’t interesting
– We want anomalous co-occurrence

Recommendation Basics
• Co-occurrence

          t3    not t3
t1        2     1
not t1    1     1

Co-occurrence Matrix
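[Figure: 2×2 co-occurrence table with pictured items as row and column labels ("item" / "not item"); only the counts 1 and 1 survive in the extracted text]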

Spot the Anomaly
          A        not A
B         13       1000
not B     1000     100,000
score: 0.90

          A        not A
B         1        0
not B     0        10,000
score: 4.52

          A        not A
B         1        0
not B     0        2
score: 1.95

          A        not A
B         10       0
not B     0        100,000
score: 14.3

• The square root of the LLR (log-likelihood ratio) score is roughly like a number of standard deviations
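
A hedged sketch of the log-likelihood ratio test for a 2×2 table, written in the entropy form commonly used for this statistic. The function names are illustrative. Taking the square root of the LLR for the four "Spot the Anomaly" tables gives values close to the scores shown on that slide, which suggests the slide reports root-LLR values.

```python
from math import log, sqrt

def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * log(x)

def entropy(*counts):
    """Unnormalized entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: A and B together, k12: B without A,
    k21: A without B,      k22: neither A nor B.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding
    return max(0.0, 2.0 * (row + col - mat))

# The four tables from the "Spot the Anomaly" slide
tables = [
    (13, 1000, 1000, 100000),
    (1, 0, 0, 10000),
    (1, 0, 0, 2),
    (10, 0, 0, 100000),
]
root_scores = [sqrt(llr(*t)) for t in tables]
print([round(s, 2) for s in root_scores])
```

The table with the largest raw co-occurrence count (13) gets the lowest score, because A and B are both common there; the table with 10 co-occurrences and no other appearances of A or B scores highest.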

Genes => Traits => Behaviors => Fitness

Typical Dimensions
in Genetics/Medicine
• Genotype
• Gene Expression
• Samples
• Phenotypes

Incidence/Co-occurrence
• Genotype * Phenotype
• Genotype * Genotype (sample similarity)
• Sample * Sample (gene expression similarity)
– Known genes => Sample annotation
– Expression Level * Expression Level (sample similarity)
– Known samples => Gene annotation

• Gene expression * Phenotype
– Etiological subtypes & re-diagnosis

• Phenotype * Phenotype
– (expression distance OR genotype distance) Etiological reclassification

DTRA102-007 – Forensic DNA
Analysis Kit for Genetic Intelligence
• Sex
• Blood type
• Ancestry
• Hair morphology
• Dimples
• Freckles
• Shoe size
• Flat-footedness
• Vision correction
• Ear lobe attachment
• Ear lobe crease
• 5th digit clinodactyly
• Eye color, hair color, skin color
• Height, handedness
• Etc.

https://sbirsource.com/grantiq#/topics/85383

Genotype and Phenotypes & GWAS
DTRA102-007: chr7 Earlobe Morphology

SNPs and SNPs
HapMap: Genotype call / spatial ordering

This is the essence of the HapMap Project

Samples and Samples
Label sex based on expression
[Figure: scatter plot of samples, XIST log10(RMA) against RPS4Y1 log(RMA); axis ticks on both axes run from 1.5 to 3.5]

Celsius: a community resource for Affymetrix microarray data.
http://www.ncbi.nlm.nih.gov/pubmed/17570842

Expression and Expression (10K+ samples)
Gene Annotation (co-expression)
[Figure: gene-by-gene co-expression heatmap. Row and column labels (some gene symbols appear more than once): SLC28A3, HSPC159, BDKRB1, HAS2, XYLT1, RNF24, SOD2, RELB, RLF, NUPL1, EIF2C2, FOSL1, RELA, ETNK1, MMP12, AKR1C1, TNMD, CYTL1, SOX5, MIA, CHST3, PDLIM4, PDPN, FZD10, WISP1, C1QTNF3, THBS3, COL10A1, COL11A1, EPYC, MATN3, MAST4, NGF, EDIL3, ITGA10, HAPLN1, MATN4, ACAN, LECT1, MATN1, COL9A1, COL11A2, CSPG4, MMP13, NOS2A, LIF, MMP3, BMP2, BMP6]

Disease gene characterization through large-scale co-expression analysis.

Co-expression (10K samples) and Linkage
Gene Annotation / Set Completion
[Figure: the co-expression heatmap from the preceding slide (same gene labels), combined with linkage information: co-expression + linkage => gene annotation / set completion]

Disease gene characterization through large-scale co-expression analysis.

Typical Dimensions
• Genotype
• Gene Expression
• Samples
• Phenotypes (traits/behavior)

Typical Dimensions
in Behavioral Data
• Genotype
• Gene Expression
• Samples (Individuals)
• Phenotype
– Traits
– Behaviors

Traits and Behaviors
Content Topic Modeling / UX Personalization

Behaviors and Outcomes
Economic Fitness (Korn/Ferry)

=>
Allen

Korn/Ferry ProSpective
http://linkedin.kornferry.com

Behavior of a
crowd helps us
understand what
individuals will do

HOW CROSS-RECOMMENDATIONS
WORK

Example Multi-modal Inputs
• Overlap in restaurant visits is useful
• Big spender cues
• Cuisine as an indicator
• Review text as an indicator

Too Limited
• People do more than one kind of thing
• Different kinds of behaviors give different quality,
quantity and kind of information
• We don’t have to do co-occurrence
• We can do cross-occurrence
• Result is cross-recommendation

For example
• Users enter queries (A)
– (actor = user, item=query)

• Users view videos (B)
– (actor = user, item=video)

• A^T A gives query recommendation
– “did you mean to ask for”

• B^T B gives video recommendation
– “you might like these videos”

The punch-line
• B^T A recommends videos in response to a
query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
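
A small sketch of the cross-occurrence idea, assuming binary matrices A (users × queries) and B (users × videos). The query names, video names, and all matrix entries below are hypothetical toy values, and the ranking uses raw cross-occurrence counts rather than an anomaly score such as LLR.

```python
# Hypothetical labels for the two item dimensions
queries = ["q1", "q2"]
videos = ["v1", "v2", "v3"]

# Rows are the same three users in both matrices
A = [[1, 0], [1, 1], [0, 1]]           # user entered query
B = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]  # user viewed video

# cross = B^T A: cross[v][q] counts users who both entered query q
# and viewed video v
cross = [
    [sum(B[u][v] * A[u][q] for u in range(len(A))) for q in range(len(queries))]
    for v in range(len(videos))
]

def recommend_videos(query):
    """Rank videos by how often their viewers also entered the given query."""
    q = queries.index(query)
    scored = [(cross[v][q], videos[v]) for v in range(len(videos)) if cross[v][q] > 0]
    return [name for score, name in sorted(scored, key=lambda s: -s[0])]

print(cross)
print(recommend_videos("q1"))
```

Nothing here inspects video content or metadata; the mapping from query to video comes only from users who did both things.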

Real-life example
• Query: “Paco de Lucia”
• Conventional meta-data search results:
– “hombres del paco” times 400
– not much else

• Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff

Hypothetical Example
• Want a navigational ontology?
• Just put labels on a web page with traffic
– This gives A = users x label clicks

• Remember viewing history
– This gives B = users x items

• Cross recommend
– B^T A = label to item mapping

• After several users click, results are whatever
users think they should be

Previous Click Histories

[Figure: click-history grid of user1 to user5 over items 1 to 8]

Detect similar content: 2 & 8

[Figure: click-history grid of user1 to user5 over items 1 to 8]

Call to Action – Request Clicks

[Figure: click-history grid of user1 to user5 over items 1 to 8, beside a "Show me more:" prompt offering the labels sports, comedy, and technology; the page is marked “Under Construction”]

Guess Labels:
4=sports ; 2 & 8=comedy
[Figure: click-history grid of user1 to user5 over items 1 to 8, with the "Show me more:" labels matched to items: sports with item 4, comedy with items 2 & 8, technology still "Under construction"]

Extrapolate

[Figure: a new user (userX) clicks "Show me more: comedy"; item numbers shown in the figure: 1, 3, 2, 8, 4]
Matrices A (U*Q) and B (U*V)

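[Figure: matrix A with dimensions Users and Query Terms, and matrix B with dimensions Users and Clicked Videos; annotated "Query Term = Clicked Term"]

Join on dimension U…
[Figure: A and B joined on the shared Users dimension]

Relate Q to V
[Figure: resulting matrix relating Query Terms to Clicked Videos]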

Medicine
Forensics

Job Performance


Psychometrics
Movie Preferences

(Traits/Behaviors) and Outcomes
Reproductive Fitness (eHarmony)
eHarmony @ Hadoop World: Data Science of Love
http://eharmony.com

= 185cm
Allen

Medicine
Forensics

Job Performance


Psychometrics
Movie Preferences

Fitness
Reproductive Outcomes
