
A convex program for inferring a formula from GCMS

steventeddy
July 8, 2023

1 Notation

Table 1: Symbols
Term          Description
[n]           shorthand for {1, 2, ..., n}
$\Delta^n$    probability simplex over n dimensions, $\Delta^n = \{p \in \mathbb{R}^n : 0 \le p_i \le 1,\ i \in [n],\ \sum_{i=1}^{n} p_i = 1\}$
$\preceq$     elementwise inequality
$1_n$         ones vector of length n
$0_n$         zeros vector of length n
$I_n$         identity matrix of dimension n

Table 2: Variables
Term   Description
n      # of aromachemicals present in GCMS
m      # of naturals
y      formula from GCMS, a vector in $\Delta^n$
x      our formula, with naturals, a vector in $\Delta^{n+m}$
A      a column stochastic matrix of size n × (n + m) representing our materials, each column $a_i \in \Delta^n$

2 Original Problem
Assumptions:
1. If the GCMS of a perfume leaves out some material, then that material is also left out of the GCMS of
every natural.
2. We know the exact composition of each natural used in the perfume.
Goal:
1. Approximate the aromachemicals in the GCMS, i.e. the vector y by some convex combination x of the
materials.
The result of the GCMS, $y \in \Delta^n$, is a combination of aromachemicals and (possibly) naturals. Denote
the combination by $x \in \Delta^{n+m}$. Write

$$x^\top = \begin{pmatrix} x_{ac}^\top & x_{nat}^\top \end{pmatrix},$$

where xac represents the amount of direct adds, and xnat is the indirect additions via naturals.
Fix the order of aromachemicals in the GCMS and say linalool is the first aromachemical. This means
that y1 is the amount of linalool in the perfume. Thus, each aromachemical is represented as a canonical
basis vector, ei ∈ [0, 1]n where ei is 1 in the ith position and 0 elsewhere. The contribution of aromachemicals
is In xac . I.e. one could get the GCMS result by directly adding every aromachemical.
Let B ∈ [0, 1]n×m represent the column stochastic matrix of naturals. Let the columns of B be written
as
B = (b1 , b2 , . . . , bm ).
B being column stochastic means $b_i \in \Delta^n$ for all i ∈ [m]. Say the first natural is lavender 40/42, and that
linalool and linalyl acetate are the first two aromachemicals, in that order. Then the first two entries of the column $b_1$ are

$$(b_1)_1 = 0.4 \quad \text{and} \quad (b_1)_2 = 0.42,$$

recording the amounts of linalool and linalyl acetate. $B x_{nat}$ records the contributions to the n aromachemicals
from the naturals, i.e. take the column-combination view of matrix-vector multiplication.
If we concatenate the matrices horizontally, we have A ∈ [0, 1]n×(n+m) column stochastic and defined as

$$A = \begin{pmatrix} I_n & B \end{pmatrix}.$$
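As a small illustration of this block structure, the sketch below builds A for a made-up case with n = 3 aromachemicals and m = 1 natural, then forms the GCMS vector as $I_n x_{ac} + B x_{nat}$. The third aromachemical, the 0.18 remainder in the lavender column, and the amounts in x are assumptions chosen so that every column sums to 1. Plain Python lists are used only to keep the example free of dependencies.

```python
# Hypothetical toy data: n = 3 aromachemicals (linalool, linalyl acetate,
# and an assumed third constituent), m = 1 natural (lavender 40/42).
n, m = 3, 1
B = [[0.40],   # linalool in lavender
     [0.42],   # linalyl acetate in lavender
     [0.18]]   # assumed remainder so the column sums to 1

# A = [I_n  B]: the identity block followed by the naturals block.
A = [[1.0 if i == j else 0.0 for j in range(n)] + B[i] for i in range(n)]

def column_sums(M):
    """Sum each column of a matrix stored as a list of rows."""
    return [sum(row[j] for row in M) for j in range(len(M[0]))]

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

# x = (x_ac, x_nat): direct adds followed by the amount of lavender (assumed values).
x_ac, x_nat = [0.10, 0.00, 0.40], [0.50]
x = x_ac + x_nat

y = matvec(A, x)  # equals I_n x_ac + B x_nat

print(column_sums(A))  # every column of A should sum to 1
print(y, sum(y))       # y should again lie in the simplex
```

Because A is column stochastic and x lies in the simplex, the resulting y sums to 1 as well, which is what lets it be compared directly with a normalized GCMS report.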

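The least squares problem, constrained to the simplex, is

$$
\begin{aligned}
\min_{x \in \mathbb{R}^{n+m}} \quad & \|Ax - y\|_2^2 \\
\text{s.t.} \quad & x \in \Delta^{n+m}
\end{aligned}
$$

or equivalently

$$
\begin{aligned}
\min_{x \in \mathbb{R}^{n+m}} \quad & \|I_n x_{ac} + B x_{nat} - y\|_2^2 \\
\text{s.t.} \quad & 0_n \preceq x_{ac} \preceq 1_n \\
& 0_m \preceq x_{nat} \preceq 1_m \\
& x_{ac}^\top 1_n + x_{nat}^\top 1_m = 1
\end{aligned}
$$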
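One way to get a feel for this problem is to solve a tiny instance numerically. The sketch below uses projected gradient descent with a sort-based Euclidean projection onto the simplex. That solver is a choice made for this illustration, not something the note prescribes, and any QP solver would do. The toy matrix, the target y, the step size 1/L with L = 2‖A‖_F², and the iteration count are all assumptions.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    running_sum, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        running_sum += u_k
        candidate = (running_sum - 1.0) / k
        if u_k - candidate > 0:      # the largest k passing this test fixes the shift
            theta = candidate
    return [max(v_j - theta, 0.0) for v_j in v]

def objective(A, x, y):
    """Squared residual ||A x - y||_2^2 for A stored as a list of rows."""
    return sum((sum(a * x_j for a, x_j in zip(row, x)) - y_i) ** 2
               for row, y_i in zip(A, y))

def solve_simplex_least_squares(A, y, iterations=5000):
    """Minimize ||A x - y||_2^2 over the simplex by projected gradient descent."""
    n_rows, n_cols = len(A), len(A[0])
    # Step size 1/L, where L = 2 * ||A||_F^2 upper-bounds the gradient's Lipschitz constant.
    L = 2.0 * sum(a * a for row in A for a in row)
    x = [1.0 / n_cols] * n_cols      # start at the centre of the simplex
    for _ in range(iterations):
        residual = [sum(a * x_j for a, x_j in zip(row, x)) - y_i
                    for row, y_i in zip(A, y)]
        grad = [2.0 * sum(A[i][j] * residual[i] for i in range(n_rows))
                for j in range(n_cols)]
        x = project_to_simplex([x_j - g / L for x_j, g in zip(x, grad)])
    return x

# Hypothetical instance: A = [I_3  b_lavender], with an assumed 0.18 remainder.
A = [[1.0, 0.0, 0.0, 0.40],
     [0.0, 1.0, 0.0, 0.42],
     [0.0, 0.0, 1.0, 0.18]]
y = [0.30, 0.21, 0.49]               # assumed GCMS proportions

x_hat = solve_simplex_least_squares(A, y)
print(x_hat, objective(A, x_hat, y))
```

Because A has more columns than rows, several x can reproduce y equally well (for instance, x_nat = 0 with x_ac = y always fits exactly), so the sketch returns one minimizer rather than a unique formula.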

3 Letting the Compositions of Naturals be Unknown


We want to drop assumption 2 from above. Rather than knowing the exact composition of natural i (say
lavender EO) as $b_i \in \Delta^n$, say we know upper and lower bounds for each of its constituents. Say

$$b_i^{\ell} \preceq b_i \preceq b_i^{u}.$$

Of course, the proportions of materials in a natural have to sum to 1, so the possible lavenders given the
bounds are in the set

$$\{ b \in \Delta^n : b_i^{\ell} \preceq b \preceq b_i^{u} \}.$$
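A membership test for this set is easy to write down. The sketch below checks whether a candidate composition sums to 1 and respects elementwise bounds. The function name `in_bounded_simplex`, the tolerance, and the bound values are hypothetical and only illustrate the definition.

```python
def in_bounded_simplex(b, lower, upper, tol=1e-9):
    """True if b is nonnegative, sums to 1, and lower <= b <= upper elementwise."""
    sums_to_one = abs(sum(b) - 1.0) <= tol
    nonnegative = all(b_k >= -tol for b_k in b)
    within_bounds = all(lo - tol <= b_k <= hi + tol
                        for b_k, lo, hi in zip(b, lower, upper))
    return sums_to_one and nonnegative and within_bounds

# Hypothetical bounds for a lavender oil over three constituents.
lower = [0.30, 0.35, 0.10]
upper = [0.45, 0.50, 0.30]

inside = in_bounded_simplex([0.40, 0.42, 0.18], lower, upper)
outside = in_bounded_simplex([0.50, 0.40, 0.10], lower, upper)  # linalool above its upper bound
print(inside, outside)
```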
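One approach is to formulate the new problem where we try to infer the composition of the naturals as

$$
\begin{aligned}
\min_{\substack{x \in \mathbb{R}^{n+m} \\ B \in \mathbb{R}^{n \times m}}} \quad & \|x_{ac} + B x_{nat} - y\|_2^2 \\
\text{s.t.} \quad & b_i^{\ell} \preceq b_i \preceq b_i^{u}, \quad \forall i \in [m] \\
& 0_n \preceq b_i \preceq 1_n, \quad \forall i \in [m] \\
& b_i^\top 1_n = 1, \quad \forall i \in [m] \\
& 0_n \preceq x_{ac} \preceq 1_n \\
& 0_m \preceq x_{nat} \preceq 1_m \\
& x_{ac}^\top 1_n + x_{nat}^\top 1_m = 1
\end{aligned}
$$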

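As written, the product $B x_{nat}$ is bilinear in the unknowns B and $x_{nat}$, so this problem is not jointly convex.
Suppose instead that we write the B from above (which can vary) as $B^{ref} + P$, where P is a matrix of
perturbations and is the same size as $B^{ref}$. $B^{ref}$ can be the compositions of naturals you own. Of course,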

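we require that $B^{ref} + P$ is column stochastic. Letting $p_i$ be the ith column of P, writing $x_i$ for the ith entry
of $x_{nat}$, and putting arrows on vectors, the term in the objective is

$$B x_{nat} = (B^{ref} + P)\, x_{nat} = \sum_{i=1}^{m} \vec{b}_i^{ref} x_i + \sum_{i=1}^{m} \vec{p}_i x_i$$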

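We want to write the perturbation term (the second sum) without reference to x. We will substitute $\vec{\nu}_i$ for
$\vec{p}_i x_i$ and show how to change the constraints.
Writing the constraints for column i, substituting $\vec{b}_i = \vec{b}_i^{ref} + \vec{p}_i$,

$$
\begin{aligned}
\vec{b}_i^{\ell} \preceq \vec{b}_i \preceq \vec{b}_i^{u} \quad &\Rightarrow \quad \vec{b}_i^{\ell} \preceq \vec{b}_i^{ref} + \vec{p}_i \preceq \vec{b}_i^{u} \\
0_n \preceq \vec{b}_i \preceq 1_n \quad &\Rightarrow \quad 0_n \preceq \vec{b}_i^{ref} + \vec{p}_i \preceq 1_n \\
\vec{b}_i^\top 1_n = 1 \quad &\Rightarrow \quad (\vec{b}_i^{ref} + \vec{p}_i)^\top 1_n = 1
\end{aligned}
$$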

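Putting in $\vec{\nu}_i / x_i = \vec{p}_i$ (for $x_i > 0$),

$$
\begin{aligned}
\vec{b}_i^{\ell} \preceq \vec{b}_i \preceq \vec{b}_i^{u} \quad &\Rightarrow \quad \vec{b}_i^{\ell} - \vec{b}_i^{ref} \preceq \frac{\vec{\nu}_i}{x_i} \preceq \vec{b}_i^{u} - \vec{b}_i^{ref} \quad \Rightarrow \quad x_i (\vec{b}_i^{\ell} - \vec{b}_i^{ref}) \preceq \vec{\nu}_i \preceq x_i (\vec{b}_i^{u} - \vec{b}_i^{ref}) \\
0_n \preceq \vec{b}_i \preceq 1_n \quad &\Rightarrow \quad -\vec{b}_i^{ref} \preceq \frac{\vec{\nu}_i}{x_i} \preceq 1_n - \vec{b}_i^{ref} \quad \Rightarrow \quad -x_i \vec{b}_i^{ref} \preceq \vec{\nu}_i \preceq x_i (1_n - \vec{b}_i^{ref}) \\
\vec{b}_i^\top 1_n = 1 \quad &\Rightarrow \quad \left( \vec{b}_i^{ref} + \frac{\vec{\nu}_i}{x_i} \right)^{\!\top} 1_n = 1 \quad \Rightarrow \quad \vec{\nu}_i^\top 1_n = 0
\end{aligned}
$$

where multiplying through by $x_i$ keeps the direction of the inequalities because $x_i \ge 0$, and the last
statement is due to $\vec{b}_i^{ref} \in \Delta^n$ by assumption, so that $(\vec{b}_i^{ref})^\top 1_n = 1$.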
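The substitution can be sanity-checked numerically for a single natural. In the sketch below, the reference column, the bounds, the "true" column b, and the amount x_i are all made-up values. The helper `nu_constraints_hold` is a hypothetical name that simply evaluates the three transformed constraints.

```python
def nu_constraints_hold(nu, x_i, b_ref, lower, upper, tol=1e-12):
    """Check the three constraints on nu_i obtained after substituting nu_i = x_i * p_i."""
    bounds_ok = all(x_i * (lo - r) - tol <= v <= x_i * (hi - r) + tol
                    for v, r, lo, hi in zip(nu, b_ref, lower, upper))
    box_ok = all(-x_i * r - tol <= v <= x_i * (1.0 - r) + tol
                 for v, r in zip(nu, b_ref))
    sum_ok = abs(sum(nu)) <= tol
    return bounds_ok and box_ok and sum_ok

# Hypothetical data for one natural over three constituents.
b_ref = [0.40, 0.42, 0.18]   # reference composition (sums to 1)
lower = [0.30, 0.35, 0.10]   # assumed lower bounds
upper = [0.45, 0.50, 0.30]   # assumed upper bounds
b     = [0.36, 0.44, 0.20]   # assumed actual composition, within the bounds
x_i   = 0.5                  # assumed amount of this natural in the formula

p  = [b_k - r_k for b_k, r_k in zip(b, b_ref)]   # perturbation column p_i
nu = [x_i * p_k for p_k in p]                    # substituted variable nu_i

feasible = nu_constraints_hold(nu, x_i, b_ref, lower, upper)
b_back = [r_k + v / x_i for r_k, v in zip(b_ref, nu)]   # undo the substitution
print(feasible, b_back)
```

A column that violates its bounds, for example one with 0.50 linalool, fails the same check, which is consistent with the implications running in both directions when x_i > 0.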


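An equivalent formulation is

$$
\begin{aligned}
\min_{\substack{x \in \mathbb{R}^{n+m} \\ \vec{\nu}_i,\ i \in [m]}} \quad & \left\| x_{ac} + B^{ref} x_{nat} + \sum_{i=1}^{m} \vec{\nu}_i - y \right\|_2^2 \\
\text{s.t.} \quad & x_i (\vec{b}_i^{\ell} - \vec{b}_i^{ref}) \preceq \vec{\nu}_i \preceq x_i (\vec{b}_i^{u} - \vec{b}_i^{ref}), \quad \forall i \in [m] \\
& -x_i \vec{b}_i^{ref} \preceq \vec{\nu}_i \preceq x_i (1_n - \vec{b}_i^{ref}), \quad \forall i \in [m] \\
& \vec{\nu}_i^\top 1_n = 0, \quad \forall i \in [m] \\
& 0_n \preceq x_{ac} \preceq 1_n \\
& 0_m \preceq x_{nat} \preceq 1_m \\
& x_{ac}^\top 1_n + x_{nat}^\top 1_m = 1
\end{aligned}
$$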

The program has a convex quadratic objective and constraints that are linear in x and the $\vec{\nu}_i$, so it is a convex quadratic program.
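Once such a program has been solved, the inferred compositions of the naturals can be read back from the solution by undoing the substitution. The sketch below does this for a made-up solution with n = 3 and m = 2, and checks that the reformulated fit agrees with the original bilinear fit. All numbers and the helper name `recover_column` are assumptions. When x_i = 0 the constraints force ν_i = 0, so the column is not identified and the reference column is kept.

```python
# Hypothetical solution of the reformulated program: n = 3, m = 2.
B_ref_cols = [[0.40, 0.42, 0.18],   # reference lavender (third entry assumed)
              [0.10, 0.20, 0.70]]   # an assumed second natural
x_ac = [0.10, 0.00, 0.20]
x_nat = [0.50, 0.20]
nus = [[-0.02, 0.01, 0.01],         # nu_1, sums to 0
       [0.004, -0.004, 0.0]]        # nu_2, sums to 0

def recover_column(b_ref, nu, x_i):
    """b_i = b_ref_i + nu_i / x_i; if x_i = 0 the column is not identified, so keep b_ref."""
    if x_i <= 0.0:
        return list(b_ref)
    return [r + v / x_i for r, v in zip(b_ref, nu)]

B_cols = [recover_column(b, nu, x_i) for b, nu, x_i in zip(B_ref_cols, nus, x_nat)]

n = len(x_ac)
# Fitted GCMS under the reformulation: x_ac + B_ref x_nat + sum_i nu_i.
fit_nu = [x_ac[k]
          + sum(x_i * b[k] for b, x_i in zip(B_ref_cols, x_nat))
          + sum(nu[k] for nu in nus)
          for k in range(n)]
# Fitted GCMS under the original bilinear form: x_ac + B x_nat.
fit_B = [x_ac[k] + sum(x_i * b[k] for b, x_i in zip(B_cols, x_nat)) for k in range(n)]

gap = max(abs(a - b) for a, b in zip(fit_nu, fit_B))
print(B_cols)
print(gap)  # expected to be at rounding-error level
```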
