Machine learning and content management

bit.ly/ml-palic

About me

Valery "valthebald" Lourie

Drupal Developer, FFW Agency

Agenda

What we will discuss

  • What is machine learning
  • Supervised learning
  • Linear regression
  • Online learning
  • References

Agenda

What we will not discuss

  • Use of commercial APIs
  • Auto tagging
  • Neural networks
  • Unsupervised learning
  • When robots will take over the world
"Field of study that gives computers the ability to learn without being explicitly programmed."
Arthur Samuel

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E."
Tom Mitchell
Umami

Supervised learning

We know "correct" values

  • Regression
  • Classification

Unsupervised learning

Unlabeled data

  • Clustering
  • Dimensionality reduction
  • Anomaly detection

Learning flow

  • Initialize hypothesis function parameters
  • Learn using training set
  • Validate using validation set

E: experience (our experimental data)

Features

  • Position in the list
  • Visitor demographics (country/language)
  • Time (weekday/hour)
  • Previous history of actions
  • Current page data (taxonomy)
  • Update date of the content

Labels

  • Time on page
  • CTR
  • Probability of purchase
  • Average purchase amount

Linear regression

Experimental data

CTR dependent on position
Taken from https://www.wordstream.com/adwords-click-through-rate

Linear regression

Hypothesis

\[ h = \mathbf{θ}_0 + \mathbf{θ}_1 \times X \]
CTR dependent on position:
  • Light green: \( \mathbf{θ}_0 = 0;\ \mathbf{θ}_1 = 0.1 \)
  • Light blue: \( \mathbf{θ}_0 = -0.3;\ \mathbf{θ}_1 = 0.1 \)
  • Dark blue: \( \mathbf{θ}_0 = 9;\ \mathbf{θ}_1 = -0.05 \)

Linear regression

Hypothesis

\[ h = \mathbf{θ}_0 + \mathbf{θ}_1 \times X\]

Cost function

\[ J(θ) = \frac1{2m} \sum_{i=1}^m \left( h(X^{(i)}) - y^{(i)} \right)^2 \]
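The cost function above can be sketched in a few lines of NumPy (an illustration; the toy data and function name are my own, not from the talk):

```python
import numpy as np

def cost(theta0, theta1, X, y):
    """Mean squared error cost J(θ) for a one-feature hypothesis h = θ0 + θ1·X."""
    h = theta0 + theta1 * X        # hypothesis for every training example at once
    m = len(y)
    return np.sum((h - y) ** 2) / (2 * m)

# Toy data: y = 2 + 3x exactly, so the right parameters give zero cost.
X = np.array([1.0, 2.0, 3.0])
y = 2 + 3 * X

perfect = cost(2.0, 3.0, X, y)   # → 0.0 (hypothesis matches every point)
naive = cost(0.0, 0.0, X, y)     # cost of the all-zero hypothesis
```

With real (noisy) data the cost never reaches zero; gradient descent only drives it toward its minimum.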

Gradient descent


Gradient descent

By Indeed123 - commons, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=5508870

Gradient descent

Formal definition

\[ \mathbf{θ}_j = \mathbf{θ}_j - α \frac{\partial}{\partial\mathbf{θ}_j} J(θ) \]
Repeat until convergence

Gradient descent with one feature

Derivatives

\[ \frac{\partial}{\partial\mathbf{θ}_0} J(θ) = \frac1m \sum_{i=1}^m \left( h(X^{(i)}) - y^{(i)} \right) \]
\[ \frac{\partial}{\partial\mathbf{θ}_1} J(θ) = \frac1m \sum_{i=1}^m \left( h(X^{(i)}) - y^{(i)} \right) \times X^{(i)} \]
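Putting the two derivatives into the update rule gives a complete one-feature gradient descent loop. A minimal NumPy sketch (learning rate and toy data are my own choices):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for h = θ0 + θ1·X, using the two derivatives above."""
    theta0, theta1 = 0.0, 0.0
    m = len(y)
    for _ in range(iters):
        h = theta0 + theta1 * X
        grad0 = np.sum(h - y) / m        # ∂J/∂θ0
        grad1 = np.sum((h - y) * X) / m  # ∂J/∂θ1
        theta0 -= alpha * grad0          # simultaneous update of both parameters
        theta1 -= alpha * grad1
    return theta0, theta1

# Toy data: CTR falling with position, y = 0.5 - 0.1·x
X = np.array([1.0, 2.0, 3.0, 4.0])
y = 0.5 - 0.1 * X
t0, t1 = gradient_descent(X, y)   # converges near θ0 = 0.5, θ1 = -0.1
```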

Linear regression

Experimental data

CTR dependent on position
You can't bring the cost to 0!

More features

\[h = \mathbf{θ}_0 + \mathbf{θ}_1 \times {X}_1 \]
\[h = \mathbf{θ}_0 + \mathbf{θ}_1 \times {X}_1 + \mathbf{θ}_2 \times {X}_2\]
\[h = \mathbf{θ}_0 + \mathbf{θ}_1 \times {X}_1 + ... + \mathbf{θ}_N \times {X}_N\]
Let's agree that \[ {X}_0 = 1 \]
\[h = \mathbf{θ}_0 \times {X}_0 + \mathbf{θ}_1 \times {X}_1 + \mathbf{θ}_2 \times {X}_2 + ... + \mathbf{θ}_N \times {X}_N\]
\[h = \mathbf{θ}^T \mathbf{X}\]

Gradient descent with multiple features

\[ \frac{\partial}{\partial\mathbf{θ}_0} J(θ) = \frac1m \sum_{i=1}^m \left( h(X^{(i)}) - y^{(i)} \right) \]
\[ \frac{\partial}{\partial\mathbf{θ}_1} J(θ) = \frac1m \sum_{i=1}^m \left( h(X^{(i)}) - y^{(i)} \right) \times \mathbf{X}_1^{(i)}\]
\[ \frac{\partial}{\partial\mathbf{θ}_2} J(θ) = \frac1m \sum_{i=1}^m \left( h(X^{(i)}) - y^{(i)} \right) \times \mathbf{X}_2^{(i)}\]
...
\[ \frac{\partial}{\partial\mathbf{θ}_N} J(θ) = \frac1m \sum_{i=1}^m \left( h(X^{(i)}) - y^{(i)} \right) \times \mathbf{X}_N^{(i)}\]
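With the X₀ = 1 convention, all N+1 partial derivatives collapse into one matrix expression, which is how this is written in practice. A vectorized NumPy sketch (data and hyperparameters are my own):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=3000):
    """Vectorized batch gradient descent; X already contains the bias column X0 = 1."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        h = X @ theta                          # h = θᵀX for every example at once
        theta -= alpha * (X.T @ (h - y)) / m   # all N+1 partial derivatives in one step
    return theta

# Two features plus the bias column of ones; y generated from known parameters.
X = np.column_stack([np.ones(4),
                     np.array([1.0, 2.0, 3.0, 4.0]),
                     np.array([1.0, 0.0, 1.0, 0.0])])
true_theta = np.array([1.0, 2.0, -0.5])
y = X @ true_theta
theta = gradient_descent(X, y)   # recovers ≈ [1.0, 2.0, -0.5]
```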

Fewer features!

Validating the model

  • Training set: 60%
  • Cross validation set: 20%
  • Test set: 20%
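The 60/20/20 split above can be done with a single shuffled permutation of indices (a sketch; the helper name and seed are my own):

```python
import numpy as np

def split_dataset(X, y, seed=0):
    """Shuffle, then split 60/20/20 into training, cross-validation, and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))          # shuffle so the split is unbiased
    a, b = int(0.6 * len(y)), int(0.8 * len(y))
    train, val, test = idx[:a], idx[a:b], idx[b:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])

X = np.arange(20.0).reshape(10, 2)
y = np.arange(10.0)
(Xtr, ytr), (Xval, yval), (Xte, yte) = split_dataset(X, y)  # sizes 6 / 2 / 2
```

Train on the first set, tune hyperparameters (like α) on the validation set, and report final performance only on the untouched test set.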

Working with large data sets

Stochastic gradient descent

Experiment
\[ θ = θ - α \times \frac1m \sum_{i=1}^m \left( h(X^{(i)}) - y^{(i)} \right) \times X^{(i)} \]

Working with large data sets

Stochastic gradient descent

Experiment
  • Shuffle training set
  • Perform gradient descent for single example
  • Choose a small learning rate α!
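The three steps above can be sketched as one epoch of stochastic gradient descent: instead of summing over all m examples, update θ from each example in turn (a NumPy illustration; the demo data is my own):

```python
import numpy as np

def sgd_epoch(theta, X, y, alpha=0.01, seed=None):
    """One pass of stochastic gradient descent: shuffle, then update per example."""
    rng = np.random.default_rng(seed)
    for i in rng.permutation(len(y)):               # shuffle the training set
        h = X[i] @ theta
        theta = theta - alpha * (h - y[i]) * X[i]   # single-example gradient step
    return theta

# Toy data with a bias column; y = 0.5 - 0.1·x
X = np.column_stack([np.ones(4), np.array([1.0, 2.0, 3.0, 4.0])])
y = X @ np.array([0.5, -0.1])
theta = np.zeros(2)
for _ in range(500):
    theta = sgd_epoch(theta, X, y, alpha=0.05)      # converges near [0.5, -0.1]
```

Each epoch touches every example once, but never needs the whole data set in one gradient computation, which is why the method scales to large data sets.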

Online learning

  • Process training examples as they go
  • Adjust parameters θ after every example
  • Discard each example immediately after use
  • Bonus: adjusts to changing user preferences
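An online learner is essentially stochastic gradient descent applied to a stream: one update per incoming example, nothing stored. A minimal sketch (the class name and simulated stream are my own):

```python
import numpy as np

class OnlineLinearModel:
    """Online learner: update θ from each example as it arrives, then discard it."""

    def __init__(self, n_features, alpha=0.01):
        self.theta = np.zeros(n_features)
        self.alpha = alpha

    def predict(self, x):
        return x @ self.theta

    def learn(self, x, y):
        """Single stochastic-gradient update; the example is not stored."""
        self.theta -= self.alpha * (self.predict(x) - y) * x

# Simulated stream of (features, label) pairs drawn from a fixed relationship.
rng = np.random.default_rng(0)
model = OnlineLinearModel(2, alpha=0.05)
true_theta = np.array([0.5, -0.1])
for _ in range(5000):
    x = np.array([1.0, rng.uniform(0.0, 4.0)])   # bias term plus one feature
    model.learn(x, x @ true_theta)               # θ drifts toward [0.5, -0.1]
```

Because θ keeps moving with the stream, the model automatically tracks shifts in visitor behavior, the "bonus" from the slide.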

Decision making

  • Visitor demographics (country/language)
  • Time (weekday/hour)
  • Previous history of actions
  • Current page data (taxonomy)
  • Update date of the content

Q & A