
Exercise: perceptron

logical AND

  • Observations = 00:0 01:0 10:0 11:1
  • Initial weights = 0

  • How many epochs are needed for convergence?
  • What are the weights of the trained model?

logical XOR

  • Observations = 00:0 01:1 10:1 11:0
  • Initial weights = 0

  • What happens?

unnormalized AND

  • Observations = 00:0 02:0 20:0 22:1
  • Initial weights = 0

  • How many epochs are needed for convergence?
  • What are the weights of the trained model? (You can check your hand computations with the sketch below.)
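
A minimal sketch of the perceptron training loop to check your hand computations (the bias input fixed to 1, the learning rate of 1 and the step activation are assumptions; the course convention may differ):

import numpy as np

def train_perceptron(X, Y, max_epochs=20):
    """Perceptron with weights initialised to 0; returns the weights and the epoch of convergence."""
    X = np.hstack([X, np.ones((len(X), 1))])       # append a bias input fixed to 1 (assumption)
    w = np.zeros(X.shape[1])                       # initial weights = 0, as in the exercise
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for xi, yi in zip(X, Y):
            pred = 1 if np.dot(xi, w) > 0 else 0   # step activation (assumption)
            w += (yi - pred) * xi                  # learning rate 1 (assumption)
            errors += int(pred != yi)
        if errors == 0:                            # a full epoch without mistakes: converged
            return w, epoch
    return w, None                                 # no convergence (e.g. XOR)

# logical AND observations 00:0 01:0 10:0 11:1
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(train_perceptron(X_and, np.array([0, 0, 0, 1])))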

Exercise: feature definition

SPAM or not ?

  • We want to detect whether an email is spam. We have a corpus of 50,000 annotated examples.
  • What is the main drawback of each of the following features?
    • Number of times a given character occurs (normalized)
    • Number of times each possible word 5-gram occurs
  • What would be good features?
  • Which evaluation procedure would you use?
  • Which evaluation metric would you use?

Exercise: linear models

Installation

  • You need python, tar and gunzip installed!

Nancy rain data

  • rain in mm from September 2016 to August 2017
  • data: http://deeploria.gforge.inria.fr/cours/data.rain
# date pressure temperature rain_1h rain_2h rain_6h rain_12h rain_24h
20160901000000 102240 289.850000 0.000000 0.000000 0.000000 0.000000 0.000000
20160901030000 102300 287.450000 0.000000 0.000000 0.000000 0.000000 0.000000
20160901060000 102360 287.950000 0.000000 0.000000 0.000000 0.000000 0.000000
20160901090000 102360 296.650000 0.000000 0.000000 0.000000 0.000000 0.000000

Linear regression

  • rain(24h) = f(date)

Linear regression

  • rain(24h) = f(pressure)

Nancy rain data

import numpy as np

fich = "data.rain"               # path to the downloaded Nancy rain data
with open(fich, "r") as f:
    lines = f.readlines()

nl = 2864                        # number of usable observations
x = np.zeros((nl, 2))
y = np.zeros((nl,))
co = 0
for l in lines:
    if l.startswith('#') or 'mq' in l:   # skip the header line and lines with missing values ('mq')
        continue
    s = l.split()
    x[co, 0] = float(s[1])       # pressure
    x[co, 1] = float(s[2])       # temperature
    y[co] = float(s[-1])         # rain over the last 24 hours
    co += 1

Linear regression

  • Compute in python/numpy the optimal linear regression rain = f(temperature, pressure) with MSE (a sketch using the normal equations follows this list):
    • use numpy.linalg.inv()
    • use numpy.dot()
    • use numpy.transpose()
  • Compute the squared error of the predictions on the train set
    • Compare with a random linear regression model
  • Which input is the most relevant? Temperature or pressure?
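
A minimal sketch of the closed-form solution via the normal equations, reusing x and y from the loading snippet above (the added bias column and the random baseline are my own choices):

import numpy as np

# Closed-form least-squares solution: w = (X^T X)^{-1} X^T y
X = np.hstack([x, np.ones((len(x), 1))])        # add a bias column (assumption)
Xt = np.transpose(X)
w = np.dot(np.dot(np.linalg.inv(np.dot(Xt, X)), Xt), y)

pred = np.dot(X, w)
print("weights:", w, "train MSE:", np.mean((pred - y) ** 2))

# Baseline: a random linear model of the same shape
w_rand = np.random.randn(X.shape[1])
print("random model MSE:", np.mean((np.dot(X, w_rand) - y) ** 2))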

Linear classification

  • Transform data to predict whether it’s going to rain (>2mm) at \(t+1\):
    • Compute the optimal linear classifier with square loss
    • Compute the classification error on the train set
    • Implement cross-validation to compute the classification error on unseen data (a sketch of a simple k-fold split follows this list)
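
A minimal sketch of the label construction and a simple k-fold cross-validation loop, reusing x and y from the loading snippet; the 0/1 encoding, the 0.5 decision threshold and the number of folds are my own choices:

import numpy as np

# Binary labels: does it rain more than 2 mm at t+1?  (labels shifted by one time step)
labels = (y[1:] > 2.0).astype(float)
feats = np.hstack([x[:-1], np.ones((len(x) - 1, 1))])   # features at time t, plus a bias column

def classif_error(Xtr, ytr, Xte, yte):
    """Least-squares classifier: regress the 0/1 labels, then threshold predictions at 0.5."""
    w = np.dot(np.dot(np.linalg.inv(np.dot(np.transpose(Xtr), Xtr)), np.transpose(Xtr)), ytr)
    return np.mean((np.dot(Xte, w) > 0.5).astype(float) != yte)

k = 5                                                    # number of folds (assumption)
folds = np.array_split(np.arange(len(labels)), k)
errs = []
for i in range(k):
    test_idx = folds[i]
    train_idx = np.hstack([folds[j] for j in range(k) if j != i])
    errs.append(classif_error(feats[train_idx], labels[train_idx],
                              feats[test_idx], labels[test_idx]))
print("cross-validated classification error:", np.mean(errs))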

Perceptron

Same classification as before on the rain data

  • Training accuracy of the perceptron?
  • Cross-validation accuracy?
  • What does the training curve look like?

GMM

On the Nancy rain data:

  • Train a GMM with 3 components for P(rain, pressure)
  • Can you interpret each component?
  • Predict whether it's going to rain by computing the posterior P(rain|pressure)
    • Compute the accuracy
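
A minimal sketch using scikit-learn's GaussianMixture to fit the joint model and inspect the components (the course may expect a hand-written EM instead; the rain/pressure columns follow the loading snippet above):

import numpy as np
from sklearn.mixture import GaussianMixture

data = np.column_stack([y, x[:, 0]])          # (rain_24h, pressure) pairs from the loading snippet
gmm = GaussianMixture(n_components=3, covariance_type='full', random_state=0)
gmm.fit(data)

# Interpret each component from its weight and its mean (rain, pressure)
for k, (mean, weight) in enumerate(zip(gmm.means_, gmm.weights_)):
    print("component %d: weight=%.2f, mean rain=%.2f mm, mean pressure=%.0f" % (k, weight, mean[0], mean[1]))

# Component posteriors for each observation; deriving P(rain|pressure) itself requires
# conditioning each Gaussian on the pressure value, which is the point of the exercise.
post = gmm.predict_proba(data)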

Exercise: Bayesian networks

Exercise 2 (TP)

  • Download UTF8 Data: http://deeploria.gforge.inria.fr/cours/frtweets.txt
    • test = last 10k lines
    • dev = last 10k lines before test
  • Download voc: http://deeploria.gforge.inria.fr/cours/voc.txt
  • Model: predict 7-class punctuation (see TD2)

Exercise 2 (TP)

  • Implement this model in python/numpy for \(n=2\)
    • Evaluate with F1 on dev and test
  • Same for \(n=5\)
    • Evaluate with F1 on dev and test
  • Evaluate with F1 and ROC curves

Bayesian inference

  • Dirichlet: \[\pi_{\alpha}(\theta) = \frac{\Gamma\left(\sum_{j=1}^K \alpha_j\right)}{\prod_{j=1}^K \Gamma(\alpha_j)}\prod_{j=1}^K \theta_j^{\alpha_j-1}\]
  • with \(K\) the dimension of the Multinomial, and \[\Gamma(1)=1\] \[\Gamma(x+1)=x\Gamma(x)\]

Bayesian inference

  • We consider the unigram case
  • We assume \(\theta \sim \text{Dirichlet}(\alpha)\) and \(X_t|\theta \sim \text{Multinomial}(\theta)\)
  • What is the distribution of \(p(\theta|X)\)? (derive it)
  • What is the probability of a word that has not been seen at training time?
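
For reference, a sketch of the standard conjugacy argument (the count notation \(n_j\) for the number of occurrences of word \(j\) in \(X\) is my own): \[p(\theta|X) \propto p(X|\theta)\,\pi_{\alpha}(\theta) \propto \prod_{j=1}^K \theta_j^{n_j}\prod_{j=1}^K \theta_j^{\alpha_j-1} = \prod_{j=1}^K \theta_j^{\alpha_j+n_j-1}\] so \(\theta|X \sim \text{Dirichlet}(\alpha+n)\). The predictive probability of word \(j\) is then \(\frac{\alpha_j+n_j}{\sum_k(\alpha_k+n_k)}\), which stays strictly positive for words never seen at training time.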

Exercise: CRF

Implement CRF in pytorch

https://towardsdatascience.com/conditional-random-field-tutorial-in-pytorch-ca0d04499463

Exercise: backpropagation

  • Draw a circuit graph of the function \[f(x,y)=\frac{x^4-y+1}{x^4+x^2+2}\]
  • Compute the forward pass and the gradients for the input \((x,y)=(-1,2)\)
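
A small sketch to check the hand computation with pytorch autograd (it only verifies the numbers, it does not replace drawing the circuit):

import torch

x = torch.tensor(-1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
f = (x**4 - y + 1) / (x**4 + x**2 + 2)
f.backward()
print(f.item(), x.grad.item(), y.grad.item())   # forward value and the two gradients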

Exercise: pytorch

  • Use google search or pytorch docs to help you do the following:
  • implement a function in pytorch that computes \[f(x)=2x^2\]
  • get the gradient with autograd for \(x=1\) and \(x=2\)
  • in numpy, define a function “genData(vecsize, nsamps)” that
    • generates 70 vectors of dim=100 uniformly sampled in \([0.3;0.5]\)
    • likewise, 30 vectors sampled in \([0.6;0.7]\)
    • label the former with class 0, the latter with class 1, and concatenate them
  • in pytorch, define a generic MLP class with 1 hidden layer:
class Net(nn.Module):
    def __init__(self,nins,nout):
  • train an MLP on the previous toy data; shuffle the data before every epoch (a minimal training sketch follows this list)
  • plot the training accuracy at every epoch; try different hyper-parameters and check their impact.
  • In pytorch, define and train another model on the same data: an auto-encoder, i.e. a 1-hidden-layer MLP that reproduces its inputs. The hidden layer compresses the input.
  • Use it to compress the original 100-dim vectors into 10-dim vectors, and train the previous MLP classifier on this compressed dataset.
  • Show that you can compress down to 1-dim vectors and still reach 100% accuracy, though it takes more epochs
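
A minimal sketch of the toy data generator and the 1-hidden-layer MLP with a basic training loop (the hidden size, optimizer, learning rate and batch size are my own choices; the auto-encoder part is left out):

import numpy as np
import torch
import torch.nn as nn

def genData(vecsize=100, nsamps=100):
    """70% of nsamps vectors in [0.3,0.5] (class 0), 30% in [0.6,0.7] (class 1); defaults give the 70/30 split of the exercise."""
    n0 = int(0.7 * nsamps)
    a = np.random.uniform(0.3, 0.5, size=(n0, vecsize))
    b = np.random.uniform(0.6, 0.7, size=(nsamps - n0, vecsize))
    X = np.vstack([a, b]).astype(np.float32)
    Y = np.concatenate([np.zeros(n0), np.ones(nsamps - n0)]).astype(np.int64)
    return torch.from_numpy(X), torch.from_numpy(Y)

class Net(nn.Module):
    def __init__(self, nins, nout):
        super(Net, self).__init__()
        self.hidden = nn.Linear(nins, 20)       # hidden layer size: arbitrary choice
        self.out = nn.Linear(20, nout)

    def forward(self, x):
        return self.out(torch.relu(self.hidden(x)))

X, Y = genData()
model = Net(100, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(50):
    perm = torch.randperm(len(X))               # shuffle the data before every epoch
    correct = 0
    for i in range(0, len(X), 10):              # mini-batches of 10 (arbitrary)
        idx = perm[i:i + 10]
        logits = model(X[idx])
        loss = loss_fn(logits, Y[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()
        correct += (logits.argmax(dim=1) == Y[idx]).sum().item()
    print(epoch, correct / len(X))              # training accuracy per epoch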

Exercise: RNN

RNN in pytorch

  • Fill in the XXXXXXXXXXXXXXXXXXXXXXXXXXXX
import torch
import torch.nn as nn
from torch.autograd import Variable

class RNNCell(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNNCell, self).__init__()

        self.hidden_size = hidden_size

        self.i2h = nn.Linear(XXXXXXXXXXXXXXXXXXXXXXXXXXXX)
        self.i2o = nn.Linear(XXXXXXXXXXXXXXXXXXXXXXXXXXXX)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return Variable(torch.zeros(1, self.hidden_size))

RNN in pytorch: exercise

  • See how to use and train this cell in this tutorial:
  • http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html

RNN real case

  • Write a pytorch program to predict the next character in a text (a minimal sketch follows this list):
    • Use the predefined pytorch class nn.RNN()
    • Use an Embedding layer to represent each character
    • Use your program's own source code as the data source
    • Plot the accuracy per epoch
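
A minimal sketch of the data preparation and model for next-character prediction (the vocabulary handling, hidden size, sequence length and training details are my own choices):

import torch
import torch.nn as nn

# Use this very file as the training text, as the exercise suggests (works when run as a script)
text = open(__file__).read()
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

class CharRNN(nn.Module):
    def __init__(self, nchars, embdim=16, hidden=64):
        super(CharRNN, self).__init__()
        self.emb = nn.Embedding(nchars, embdim)     # one embedding per character
        self.rnn = nn.RNN(embdim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, nchars)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))                # h: (batch, seq, hidden)
        return self.out(h)                          # logits for the next character

model = CharRNN(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

seqlen = 50
for epoch in range(10):
    correct, total = 0, 0
    for i in range(0, len(data) - seqlen - 1, seqlen):
        x = data[i:i + seqlen].unsqueeze(0)          # input characters
        y = data[i + 1:i + seqlen + 1].unsqueeze(0)  # targets = next characters
        logits = model(x)
        loss = loss_fn(logits.view(-1, len(chars)), y.view(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        correct += (logits.argmax(-1) == y).sum().item()
        total += y.numel()
    print("epoch", epoch, "accuracy", correct / total)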

Truncated BPTT

  • Try to understand the example at:
  • https://github.com/pytorch/examples/tree/master/word_language_model
    • It illustrates a variant of BPTT that detaches hidden states after a fixed-length history

Exercise: attention

Attention

  • Implement a python class for attention:
class Attention(nn.Module):
    def __init__(self, nvecs): ...
    def forward(self, x): ...
  • inputs = a batch of vector sequences of shape (batch_dim, seq_dim, vec_dim)
  • outputs = a batch of vectors (one context vector per sequence)
  • the module computes a context vector and its parameters must be trainable (a sketch follows this list)
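
A minimal sketch of such a module, assuming nvecs is the dimension of the input vectors and using a single learned query to score each time step (other attention formulations are equally valid):

import torch
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, nvecs):
        super(Attention, self).__init__()
        # one learned query of dimension nvecs, used to score every time step (assumption)
        self.query = nn.Parameter(torch.randn(nvecs))

    def forward(self, x):
        # x: (batch_dim, seq_dim, vec_dim)
        scores = torch.matmul(x, self.query)               # (batch_dim, seq_dim)
        weights = torch.softmax(scores, dim=1)              # attention weights over the sequence
        context = (weights.unsqueeze(-1) * x).sum(dim=1)    # (batch_dim, vec_dim)
        return context

# usage: a batch of 4 sequences of 7 vectors of dim 10
att = Attention(10)
out = att(torch.randn(4, 7, 10))
print(out.shape)   # torch.Size([4, 10])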

Exercise: Seq2Seq

  • An excellent tutorial that puts together attention and seq2seq. It also demonstrates important tricks such as detaching hidden states in RNNs:
  • https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

Exercise: Transformers

  • A very good tutorial that explains the transformer model step by step:
  • http://nlp.seas.harvard.edu/2018/04/03/attention.html
  • Finish implementing the transformer
    • read tutorials, e.g. https://towardsdatascience.com/how-to-code-the-transformer-in-pytorch-24db27c8f9ec
  • Download the LANL corpus auth.txt.gz here: https://csr.lanl.gov/data/cyber1/
    • objective: analyze log files to detect hackers/attacks
    • you can run quick tests by generating a local log file: dmesg > mylog.txt
  • Generate the next line of the log file from the current line with a transformer network, using character-level input