
# Exercise: perceptron

## logical AND

• Observations = 00:0 01:0 10:0 11:1
• Initial weights = 0

• How many epochs are needed for convergence?
• What are the weights of the trained model?
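A minimal numpy sketch of this exercise, assuming the classic perceptron rule with an explicit bias input, learning rate 1, and prediction 1 when w·x > 0 (these conventions are assumptions; they change the exact epoch count):

```python
import numpy as np

# Perceptron with a bias input, zero-initialized weights, learning rate 1.
# Prediction convention (assumed here): output 1 if w . x > 0, else 0.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])  # bias, x1, x2
t = np.array([0, 0, 0, 1])  # logical AND

w = np.zeros(3)
for epoch in range(1, 100):
    errors = 0
    for xi, ti in zip(X, t):
        yi = 1 if w @ xi > 0 else 0
        if yi != ti:
            w += (ti - yi) * xi   # perceptron update
            errors += 1
    if errors == 0:               # a full epoch with no mistakes = converged
        break

print("epochs until convergence:", epoch)
print("weights [bias, w1, w2]:", w)
```

Running the loop by hand with the same conventions is the intended exercise; the printed weights define a separating line for AND.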

## logical XOR

• Observations = 00:0 01:1 10:1 11:0
• Initial weights = 0

• What happens?
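XOR is not linearly separable, so the same perceptron never reaches an error-free epoch. A sketch that makes this visible (same assumed conventions as for AND):

```python
import numpy as np

# Same perceptron as for AND, but on XOR: the data is not linearly
# separable, so the per-epoch error count never drops to zero.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])  # bias, x1, x2
t = np.array([0, 1, 1, 0])  # logical XOR

w = np.zeros(3)
errors_per_epoch = []
for epoch in range(100):
    errors = 0
    for xi, ti in zip(X, t):
        yi = 1 if w @ xi > 0 else 0
        if yi != ti:
            w += (ti - yi) * xi
            errors += 1
    errors_per_epoch.append(errors)

print(errors_per_epoch[-5:])  # the weights keep cycling, errors stay > 0
```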

## unnormalized AND

• Observations = 00:0 02:0 20:0 22:1
• Initial weights = 0

• How many epochs are needed for convergence?
• What are the weights of the trained model?

# Exercise: feature definition

## SPAM or not ?

• We want to detect whether an email is spam. We have a corpus of 50,000 annotated examples.
• What is the main drawback of each of the following features?
• Number of times a given character occurs (normalized)
• Number of times each possible word 5-gram occurs
• What would be good features?
• Which evaluation procedure would you use?
• Which evaluation metric would you use?

# Exercise: linear models

## Installation

• You need python, tar, and gunzip installed!

## Nancy rain data

• Rain in mm from September 2016 to August 2017
• data: http://deeploria.gforge.inria.fr/cours/data.rain
# date pressure temperature rain_1h rain_2h rain_6h rain_12h rain_24h
20160901000000 102240 289.850000 0.000000 0.000000 0.000000 0.000000 0.000000
20160901030000 102300 287.450000 0.000000 0.000000 0.000000 0.000000 0.000000
20160901060000 102360 287.950000 0.000000 0.000000 0.000000 0.000000 0.000000
20160901090000 102360 296.650000 0.000000 0.000000 0.000000 0.000000 0.000000


## Linear regression

• rain(24h) = f(date)

## Linear regression

• rain(24h) = f(pressure)

## Nancy rain data

import numpy as np

# read the Nancy rain file (data.rain, from the URL above);
# skip the header line and lines containing 'mq' (missing values)
lines = open('data.rain').readlines()
nl = 2864
x = np.zeros((nl, 2))
y = np.zeros((nl,))
co = 0
for l in lines:
    if not l.startswith('#') and 'mq' not in l:
        s = l.split()
        x[co, 0] = float(s[1])   # pressure
        x[co, 1] = float(s[2])   # temperature
        y[co] = float(s[-1])     # rain
        co += 1

## Linear regression

• Compute in python/numpy the optimal linear regression rain = f(temperature, pressure) with MSE:
• use numpy.linalg.inv()
• use numpy.dot()
• use numpy.transpose()
• Compute the squared error of the prediction on the train set
• Compare with a random linear regression model
• Which input is the most relevant? Temperature or pressure?
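A sketch of the closed-form solution via the normal equations, using exactly the three numpy functions named above. Synthetic data stands in for data.rain here so the snippet is self-contained:

```python
import numpy as np

# Normal equations: w = (X^T X)^{-1} X^T y, the MSE-optimal linear model.
# Synthetic regressors stand in for pressure/temperature in this sketch.
rng = np.random.default_rng(0)
n = 200
X = np.hstack([np.ones((n, 1)),           # bias column
               rng.normal(size=(n, 2))])  # toy "pressure", "temperature"
true_w = np.array([0.5, 2.0, -1.0])
y = X.dot(true_w) + 0.1 * rng.normal(size=n)

Xt = np.transpose(X)
w = np.dot(np.linalg.inv(np.dot(Xt, X)), np.dot(Xt, y))

mse = np.mean((X.dot(w) - y) ** 2)            # squared error on the train set
w_rand = rng.normal(size=3)                   # random linear model baseline
mse_rand = np.mean((X.dot(w_rand) - y) ** 2)
print(mse, mse_rand)
```

On the real data, comparing the magnitude of the fitted weights (after normalizing the inputs) is one way to judge which input matters more.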

## Linear classification

• Transform the data to predict whether it's going to rain (>2 mm) at $$t+1$$:
• Compute the optimal linear classifier with the square loss
• Compute the classification error on the train set
• Implement cross-validation to compute the classification error on unseen data
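One way to sketch this: least-squares regression onto 0/1 labels with a 0.5 threshold (the square-loss classifier), plus k-fold cross-validation in plain numpy. Synthetic labels stand in for the rain>2mm targets:

```python
import numpy as np

# Square-loss linear classifier: regress onto 0/1 labels, threshold at 0.5.
# 5-fold cross-validation estimates accuracy on unseen data.
rng = np.random.default_rng(0)
n = 300
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 2))])
y = (X[:, 1] + 0.5 * X[:, 2] + 0.3 * rng.normal(size=n) > 0).astype(float)

def fit(X, y):
    return np.linalg.inv(X.T @ X) @ X.T @ y   # square-loss optimum

def accuracy(w, X, y):
    return np.mean((X @ w > 0.5) == (y == 1))

k = 5
idx = rng.permutation(n)
folds = np.array_split(idx, k)
accs = []
for i in range(k):
    test = folds[i]
    train = np.concatenate([folds[j] for j in range(k) if j != i])
    w = fit(X[train], y[train])
    accs.append(accuracy(w, X[test], y[test]))
print("cross-validated accuracy:", np.mean(accs))
```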

## Perceptron

Same classification task as before, on the rain data:

• Training accuracy of the perceptron?
• Cross-validation accuracy?
• What does the training curve look like?

## GMM

On the Nancy rain data:

• Train a 3-class GMM for P(rain, pressure)
• Can you interpret each class?
• Predict whether it's going to rain by computing the posterior P(rain|pressure)
• Compute the accuracy
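A minimal EM loop for a diagonal-covariance GMM in pure numpy (a sketch: three synthetic 2-D clusters stand in for the (rain, pressure) pairs, and the means are initialized on one point from each block, a shortcut for the sketch):

```python
import numpy as np

# EM for a K-component diagonal-covariance Gaussian mixture.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal([0, 0], 0.3, size=(100, 2)),
                  rng.normal([3, 0], 0.3, size=(100, 2)),
                  rng.normal([0, 3], 0.3, size=(100, 2))])

K, (n, d) = 3, data.shape
pi = np.full(K, 1 / K)                 # mixture weights
mu = data[[0, 100, 200]].copy()        # init: one point per block (shortcut)
var = np.ones((K, d))                  # diagonal variances

def log_gauss(x, mu, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var, axis=-1)

for _ in range(50):
    # E-step: responsibilities r[i,k] = P(class k | x_i)
    logp = np.stack([np.log(pi[k]) + log_gauss(data, mu[k], var[k])
                     for k in range(K)], axis=1)
    logp -= logp.max(axis=1, keepdims=True)   # numerical stability
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, variances
    nk = r.sum(axis=0)
    pi = nk / n
    mu = (r.T @ data) / nk[:, None]
    var = np.stack([(r[:, k, None] * (data - mu[k]) ** 2).sum(0) / nk[k]
                    for k in range(K)]) + 1e-6

print(np.round(mu, 1))  # recovered cluster centers
```

The responsibilities `r` are exactly the class posteriors needed for the prediction step of the exercise.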

# Exercise: Bayesian networks

## Exercise 2 (lab)

• test = the last 10k lines
• dev = the 10k lines just before the test set
• Model: predict 7-class punctuation (see TD2)

## Exercise 2 (lab)

• Implement this model in python/numpy for $$n=2$$
• Evaluate with F1 on dev and test
• Idem for $$n=5$$
• Evaluate with F1 on dev and test
• Evaluate with F1 and ROC curves

## Bayesian inference

• Dirichlet: $\pi_{\alpha}(\theta) = \frac{\Gamma\left(\sum_{j=1}^K \alpha_j\right)}{\prod_{j=1}^K \Gamma(\alpha_j)}\prod_{j=1}^K \theta_j^{\alpha_j-1}$
• where $$K$$ is the dimension of the Multinomial, and $\Gamma(1)=1$, $\Gamma(x+1)=x\Gamma(x)$

## Bayesian inference

• We consider the unigram case
• We assume $$\theta \sim \text{Dirichlet}(\alpha)$$ and $$X_t|\theta \sim \text{Multinomial}(\theta)$$
• What is the distribution $$p(\theta|X)$$? (derive it)
• What is the probability of a word that has not been seen at training time?
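A numeric sketch of the conjugacy result the derivation leads to: with a Dirichlet($\alpha$) prior and Multinomial counts $c$, the posterior is Dirichlet($\alpha + c$), and the posterior-predictive probability of word $j$ is $(\alpha_j + c_j)/(\sum_k \alpha_k + N)$, which is nonzero even for unseen words:

```python
import numpy as np

# Dirichlet-Multinomial conjugacy: posterior = Dirichlet(alpha + counts);
# predictive distribution = posterior mean.
alpha = np.ones(4)               # symmetric prior over a 4-word vocabulary
counts = np.array([5, 3, 2, 0])  # word 3 was never seen in training
N = counts.sum()

post = alpha + counts            # posterior Dirichlet parameters
pred = post / post.sum()         # predictive probabilities

print(pred)
print("P(unseen word) =", pred[3])  # alpha_3 / (sum(alpha) + N) = 1/14
```

This is the Bayesian answer to the unseen-word question: the prior pseudo-counts `alpha` act as smoothing.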

# Exercise: CRF

## Implement CRF in pytorch

https://towardsdatascience.com/conditional-random-field-tutorial-in-pytorch-ca0d04499463

# Exercise: backpropagation

• Draw a circuit graph of the function $f(x,y)=\frac{x^4-y+1}{x^4+x^2+2}$
• Compute the forward pass and the gradients for input $$(x,y)=(-1,2)$$
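The hand computation can be checked with autograd. At $(-1, 2)$: numerator $u = x^4-y+1 = 0$, denominator $v = x^4+x^2+2 = 4$, so $f = 0$; the quotient rule gives $\partial f/\partial x = (4x^3 v - u(4x^3+2x))/v^2 = -1$ and $\partial f/\partial y = -1/v = -0.25$:

```python
import torch

# Forward pass and gradients of f(x, y) = (x^4 - y + 1) / (x^4 + x^2 + 2)
# at (x, y) = (-1, 2), verified with autograd.
x = torch.tensor(-1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)

u = x**4 - y + 1        # numerator:   1 - 2 + 1 = 0
v = x**4 + x**2 + 2     # denominator: 1 + 1 + 2 = 4
f = u / v               # f = 0
f.backward()

print(f.item())         # 0.0
print(x.grad.item())    # -1.0
print(y.grad.item())    # -0.25
```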

# Exercise: pytorch

• implement a function in pytorch that computes $f(x)=2x^2$
• get the gradient with autograd for $$x=1$$ and $$x=2$$
• in numpy, define a function “genData(vecsize, nsamps)” that
• generates 70 vectors of dim=100 uniformly sampled in $$[0.3;0.5]$$
• idem, 30 vectors within $$[0.6;0.7]$$
• label the first set with class 0 and the second with class 1, then concatenate them
• in pytorch, define a generic MLP class with 1 hidden layer:
class Net(nn.Module):
    def __init__(self, nins, nout):
• train a MLP on the previous toy data; shuffle the data before every epoch
• plot the training accuracy at every epoch; try different hyper-parameters and check their impact.
• In pytorch, define and train on the same data another model, an auto-encoder: it is a 1 hidden-layer MLP that reproduces its inputs. The hidden layer compresses the input.
• Use it to compress the original 100-dim vectors into 10-dim vectors, and train the previous MLP classifier on this compressed dataset.
• Show that you can compress down to 1-dim vectors and still reach 100% accuracy, though with more epochs
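A sketch of the `genData(vecsize, nsamps)` function described above, keeping the 70/30 class split from the exercise statement (the fixed seed is an assumption for reproducibility):

```python
import numpy as np

# genData: 70% of the samples are class-0 vectors uniform in [0.3, 0.5],
# 30% are class-1 vectors uniform in [0.6, 0.7], concatenated with labels.
def genData(vecsize, nsamps):
    rng = np.random.default_rng(0)
    n0 = int(0.7 * nsamps)
    n1 = nsamps - n0
    x0 = rng.uniform(0.3, 0.5, size=(n0, vecsize))
    x1 = rng.uniform(0.6, 0.7, size=(n1, vecsize))
    x = np.vstack([x0, x1]).astype(np.float32)   # float32 for pytorch later
    y = np.concatenate([np.zeros(n0), np.ones(n1)]).astype(np.int64)
    return x, y

x, y = genData(100, 100)
print(x.shape, y.shape)  # (100, 100) (100,)
```

Remember to shuffle x and y jointly (e.g. with one permutation of the indices) before each training epoch, as the exercise asks.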

# Exercise: RNN

## RNN in pytorch

• Fill in the XXXXXXXXXXXXXXXXXXXXXXXXXXXX

class RNNCell(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNNCell, self).__init__()

        self.hidden_size = hidden_size

        self.i2h = nn.Linear(XXXXXXXXXXXXXXXXXXXXXXXXXXXX)
        self.i2o = nn.Linear(XXXXXXXXXXXXXXXXXXXXXXXXXXXX)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)

## RNN in pytorch: exercise

• See how to use and train this cell in this tutorial:
• http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html

## RNN real case

• Write a pytorch program to predict the next character in a text
• Use the predefined pytorch class RNN()
• Use embeddings to represent each character
• Use your program's own source code as the training data
• Plot the accuracy per epoch
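A skeleton for the model part: `nn.Embedding` → `nn.RNN` → `nn.Linear` producing logits over the next character. A short sample string stands in for the program's own source here, and the layer sizes are arbitrary choices for the sketch:

```python
import torch
import torch.nn as nn

# Next-character model: embed each character id, run the predefined nn.RNN,
# project the hidden states to logits over the character vocabulary.
class CharRNN(nn.Module):
    def __init__(self, nchars, emb_dim=16, hidden_dim=64):
        super().__init__()
        self.emb = nn.Embedding(nchars, emb_dim)
        self.rnn = nn.RNN(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, nchars)

    def forward(self, x, h=None):
        o, h = self.rnn(self.emb(x), h)
        return self.out(o), h   # logits per position, new hidden state

text = "import torch\nprint('hello')\n"   # stand-in for the program's source
vocab = sorted(set(text))
stoi = {c: i for i, c in enumerate(vocab)}
ids = torch.tensor([[stoi[c] for c in text]])   # shape (1, seq_len)

model = CharRNN(len(vocab))
logits, h = model(ids[:, :-1])   # inputs: all characters but the last
print(logits.shape)              # (1, seq_len - 1, len(vocab))
```

Training then pairs `ids[:, :-1]` with targets `ids[:, 1:]` under a cross-entropy loss over the logits.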

## Truncated BPTT

• Try to understand the example at:
• https://github.com/pytorch/examples/tree/master/word_language_model
• It illustrates a variant of BPTT that detaches hidden states after a fixed-length history

# Exercise: Attention

## Attention

• Implement a python class for attention:
class Attention(nn.Module):
    def __init__(self, nvecs): ...
    def forward(self, x): ...
• inputs = a batch of vector sequences (batch_dim, seq_dim, vec_dim)
• outputs = a batch of vectors
• it creates a context vector whose parameters are trainable
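One common way to realize this interface (a sketch, not the only possible design): a learned query vector scores each position by dot product, the scores are softmaxed over the sequence dimension, and the output is the weighted sum of the input vectors:

```python
import torch
import torch.nn as nn

# Attention with a trainable context/query vector: scores = x . query,
# weights = softmax over the sequence, output = weighted sum of inputs.
class Attention(nn.Module):
    def __init__(self, nvecs):
        super().__init__()
        self.query = nn.Parameter(torch.randn(nvecs))  # learned context vector

    def forward(self, x):
        # x: (batch_dim, seq_dim, vec_dim)
        scores = x @ self.query                  # (batch, seq)
        weights = torch.softmax(scores, dim=1)   # attention over the sequence
        return (weights.unsqueeze(-1) * x).sum(dim=1)  # (batch, vec_dim)

att = Attention(8)
out = att(torch.randn(4, 5, 8))
print(out.shape)  # torch.Size([4, 8])
```

Because `query` is an `nn.Parameter`, gradients flow into it through the softmax, so the context vector is trained along with the rest of the network.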

# Exercise: Seq2Seq

• Excellent tutorial that puts together attention and seq2seq. It also demonstrates important tricks such as detaching hidden states in RNNs:
• https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

# Exercise: Transformers

• A very good tutorial that explains the transformer model step by step:
• http://nlp.seas.harvard.edu/2018/04/03/attention.html
• Finish implementing the transformer