- Observations = 00:0 01:0 10:0 11:1
Initial weights = 0

- How many epochs are needed for convergence?
What are the weights of the trained model?

- Observations = 00:0 01:1 10:1 11:0
Initial weights = 0

What happens?

- Observations = 00:0 02:0 20:0 22:1
Initial weights = 0

- How many epochs are needed for convergence?
What are the weights of the trained model?

- We want to detect whether an email is a spam. We have a corpus of 50,000 annotated examples.
- What is the main drawback of each of the following features?
- Number of times a given character occurs (normalized)
- Number of times each possible word 5-gram occurs

- What would be good features?
- Which evaluation procedure would you use?
- Which evaluation metric would you use?

- You need python, tar, and gunzip installed!

- Rain in mm from September 2016 to August 2017
- data: http://deeploria.gforge.inria.fr/cours/data.rain

```
# date pressure temperature rain_1h rain_2h rain_6h rain_12h rain_24h
20160901000000 102240 289.850000 0.000000 0.000000 0.000000 0.000000 0.000000
20160901030000 102300 287.450000 0.000000 0.000000 0.000000 0.000000 0.000000
20160901060000 102360 287.950000 0.000000 0.000000 0.000000 0.000000 0.000000
20160901090000 102360 296.650000 0.000000 0.000000 0.000000 0.000000 0.000000
```

- rain(24h) = f(date)

- rain(24h) = f(pressure)

```
import numpy as np

fich = "data.rain"  # path to the downloaded file
with open(fich, "r") as f:
    lines = f.readlines()

nl = 2864  # number of usable lines in the file
x = np.zeros((nl, 2))
y = np.zeros((nl,))
co = 0
for l in lines:
    if l.startswith('#') or 'mq' in l:  # skip header and missing values
        continue
    s = l.split()
    x[co, 0] = float(s[1])   # pressure
    x[co, 1] = float(s[2])   # temperature
    y[co] = float(s[-1])     # rain over the last 24h
    co += 1
```

- Compute in python/numpy the optimal linear regression rain = f(temperature, pressure) with MSE:
- use numpy.linalg.inv()
- use numpy.dot()
- use numpy.transpose()

- Compute the squared error of the prediction on the train set
- Compare it with a random linear regression model
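The closed-form solution can be sketched as follows with the three suggested numpy functions; the data here is a synthetic stand-in (replace `x` and `y` with the arrays loaded from `data.rain`):

```python
import numpy as np

rng = np.random.RandomState(0)
# synthetic stand-ins for the (pressure, temperature) columns and rain target
x = rng.rand(200, 2)
w_true = np.array([2.0, -1.0])
y = np.dot(x, w_true) + 0.01 * rng.randn(200)

# add a bias column, then apply the normal equations w = (X^T X)^{-1} X^T y
X = np.hstack([x, np.ones((200, 1))])
Xt = np.transpose(X)
w = np.dot(np.dot(np.linalg.inv(np.dot(Xt, X)), Xt), y)

pred = np.dot(X, w)
mse = np.mean((pred - y) ** 2)

# baseline: a random linear model should do much worse
w_rand = rng.randn(3)
mse_rand = np.mean((np.dot(X, w_rand) - y) ** 2)
```

On real data, comparing `mse` against `mse_rand` shows how much the fitted weights actually help.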

- Which input is the most relevant? Temperature or pressure?

- Transform data to predict whether it’s going to rain (>2mm) at \(t+1\):
- Compute the optimal linear classifier with square loss
- Compute the classification error on the train set
- Implement cross-validation to compute the classification error on unseen data
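A minimal k-fold cross-validation sketch for the linear classifier trained with square loss; the data is a synthetic stand-in to keep it self-contained (plug in the real transformed features and labels):

```python
import numpy as np

rng = np.random.RandomState(0)
# synthetic stand-in data: replace X, y with the transformed rain features
X = np.hstack([rng.randn(100, 2), np.ones((100, 1))])  # two features + bias
y = (X[:, 0] > 0).astype(float)                        # binary labels {0, 1}

def fit_square_loss(X, y):
    # optimal linear model under square loss (normal equations)
    Xt = np.transpose(X)
    return np.dot(np.dot(np.linalg.inv(np.dot(Xt, X)), Xt), y)

k = 5
idx = rng.permutation(len(y))
folds = np.array_split(idx, k)
errs = []
for i in range(k):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    w = fit_square_loss(X[train_idx], y[train_idx])
    pred = (np.dot(X[test_idx], w) > 0.5).astype(float)  # threshold at 0.5
    errs.append(np.mean(pred != y[test_idx]))
cv_error = np.mean(errs)
```

Averaging the per-fold errors gives an estimate of the classification error on unseen data.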

Same classification task as before, on the rain data

- Training accuracy of the perceptron?
- X-val accuracy?
- What does the training curve look like?

On the Nancy rain data:

- Train a 3-class GMM for P(rain, pressure)
- Can you interpret each class?
- Predict whether it’s going to rain by computing the posterior P(rain|pressure)
- Compute the accuracy
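The EM training loop for a diagonal-covariance GMM can be sketched like this on synthetic 2-D points standing in for the (rain, pressure) pairs; the E-step responsibilities are exactly the posteriors you need for the prediction step:

```python
import numpy as np

rng = np.random.RandomState(0)
# three well-separated synthetic clusters standing in for the real data
data = np.vstack([rng.randn(100, 2) + c for c in ([0, 0], [5, 0], [0, 5])])

K, n, d = 3, data.shape[0], data.shape[1]
pi = np.ones(K) / K                         # mixture weights
mu = data[rng.choice(n, K, replace=False)]  # init means from random points
var = np.ones((K, d))                       # diagonal variances

def log_gauss(x, m, v):
    # log density of a diagonal Gaussian, summed over dimensions
    return -0.5 * np.sum(np.log(2 * np.pi * v) + (x - m) ** 2 / v, axis=-1)

for it in range(50):
    # E step: responsibilities r[i, k] = P(class k | point i)
    logp = np.stack([np.log(pi[k]) + log_gauss(data, mu[k], var[k])
                     for k in range(K)], axis=1)
    logp -= logp.max(axis=1, keepdims=True)
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)
    # M step: re-estimate weights, means and variances
    Nk = r.sum(axis=0)
    pi = Nk / n
    mu = np.dot(r.T, data) / Nk[:, None]
    var = np.stack([np.dot(r[:, k], (data - mu[k]) ** 2) / Nk[k]
                    for k in range(K)]) + 1e-6
```

On the rain data, conditioning the fitted joint on the pressure coordinate gives the posterior over classes, from which rain/no-rain can be predicted.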

- Download UTF8 Data: http://deeploria.gforge.inria.fr/cours/frtweets.txt
- test = last 10k lines
- dev = last 10k lines before test

- Download voc: http://deeploria.gforge.inria.fr/cours/voc.txt
- Model: predict 7-class punctuation (see TD2)

- Implement this model in python/numpy for \(n=2\)
- Evaluate with F1 on dev and test

- Idem for \(n=5\)
- Evaluate with F1 on dev and test

- Evaluate with F1 and ROC curves

- Dirichlet: \[\pi_{\alpha}(\theta) = \frac{\Gamma\left(\sum_{j=1}^K \alpha_j\right)}{\prod_{j=1}^K \Gamma(\alpha_j)}\prod_{j=1}^K \theta_j^{\alpha_j-1}\]
- with \(K=\) dim of Multinomial and \[\Gamma(1)=1\] \[\Gamma(x+1)=x\Gamma(x)\]

- We consider the unigram case
- We assume \(\theta \sim \text{Dirichlet}(\alpha)\) and \(X_t|\theta \sim \text{Multinomial}(\theta)\)
- What is the distribution of \(p(\theta|X)\)? (derive it)
- What is the probability of a word that has not been seen at training time?
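As a sanity check for the derivation: conjugacy means that multiplying the multinomial likelihood by the Dirichlet prior simply adds the observed counts \(n_j\) to the exponents,

\[p(\theta|X) \propto p(X|\theta)\,\pi_{\alpha}(\theta) \propto \prod_{j=1}^K \theta_j^{n_j} \prod_{j=1}^K \theta_j^{\alpha_j-1} = \prod_{j=1}^K \theta_j^{\alpha_j+n_j-1}\]

so \(\theta|X \sim \text{Dirichlet}(\alpha_1+n_1,\dots,\alpha_K+n_K)\), where \(n_j\) is the count of word \(j\) in \(X\); an unseen word (\(n_j=0\)) keeps a nonzero pseudo-count \(\alpha_j\).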

https://towardsdatascience.com/conditional-random-field-tutorial-in-pytorch-ca0d04499463

- Draw a circuit graph of the function \[f(x,y)=\frac{x^4-y+1}{x^4+x^2+2}\]
- compute the forward pass and gradients for input \((x,y)=(-1,2)\)
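A small numerical checker (central finite differences) can be used to verify the forward value and the gradients you compute by hand on the circuit:

```python
# the function from the exercise
def f(x, y):
    return (x ** 4 - y + 1) / (x ** 4 + x ** 2 + 2)

def num_grad(g, x, y, eps=1e-6):
    # central finite differences for df/dx and df/dy
    dfdx = (g(x + eps, y) - g(x - eps, y)) / (2 * eps)
    dfdy = (g(x, y + eps) - g(x, y - eps)) / (2 * eps)
    return dfdx, dfdy

val = f(-1.0, 2.0)
gx, gy = num_grad(f, -1.0, 2.0)
```

If your hand-computed backward pass disagrees with `num_grad` by more than ~1e-4, recheck the circuit.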

- Use Google search or the pytorch docs to help you do the following:
- implement a function in pytorch that computes \[f(x)=2x^2\]
- get the gradient with autograd for \(x=1\) and \(x=2\)
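One possible solution sketch: define \(f\), mark the input as requiring gradients, and call `backward()`:

```python
import torch

def f(x):
    return 2 * x ** 2          # f(x) = 2x^2, so f'(x) = 4x

for v in (1.0, 2.0):
    x = torch.tensor(v, requires_grad=True)
    y = f(x)
    y.backward()               # fills x.grad with df/dx
    print(v, x.grad.item())    # gradient is 4*v
```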

- in numpy, define a function “genData(vecsize, nsamps)” that
- generates 70 vectors of dim=100 uniformly sampled in \([0.3;0.5]\)
- idem, 30 vectors within \([0.6;0.7]\)
- labels the first set with class 0 and the second with class 1, then concatenates them
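One possible `genData` following the spec above (the 70/30 split is taken as a fixed ratio of `nsamps`):

```python
import numpy as np

def genData(vecsize=100, nsamps=100):
    # 70% of samples in [0.3, 0.5] (class 0), 30% in [0.6, 0.7] (class 1)
    n0 = int(0.7 * nsamps)
    n1 = nsamps - n0
    x0 = np.random.uniform(0.3, 0.5, (n0, vecsize))
    x1 = np.random.uniform(0.6, 0.7, (n1, vecsize))
    x = np.concatenate([x0, x1], axis=0)
    y = np.concatenate([np.zeros(n0), np.ones(n1)])
    return x, y

x, y = genData(100, 100)
```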

- in pytorch, define a generic MLP class with 1 hidden layer:

```
class Net(nn.Module):
    def __init__(self, nins, nout):
```

- train a MLP on the previous toy data; shuffle the data before every epoch
- plot the training accuracy at every epoch; try different hyper-parameters and check their impact.
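One possible completion of the `Net` skeleton plus a training loop on the toy data (the hidden size, optimizer and epoch count are free hyper-parameter choices, not part of the spec):

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
np.random.seed(0)

# toy data from the previous exercise: 70 class-0 and 30 class-1 vectors
x0 = np.random.uniform(0.3, 0.5, (70, 100))
x1 = np.random.uniform(0.6, 0.7, (30, 100))
X = torch.tensor(np.vstack([x0, x1]), dtype=torch.float32)
Y = torch.tensor([0] * 70 + [1] * 30)

class Net(nn.Module):
    def __init__(self, nins, nout, nhid=32):  # nhid is a free hyper-parameter
        super(Net, self).__init__()
        self.hidden = nn.Linear(nins, nhid)
        self.out = nn.Linear(nhid, nout)
    def forward(self, x):
        return self.out(torch.relu(self.hidden(x)))

net = Net(100, 2)
opt = torch.optim.Adam(net.parameters(), lr=0.01)
lossf = nn.CrossEntropyLoss()
accs = []
for epoch in range(50):
    perm = torch.randperm(len(Y))  # shuffle the data before every epoch
    X, Y = X[perm], Y[perm]
    opt.zero_grad()
    out = net(X)
    loss = lossf(out, Y)
    loss.backward()
    opt.step()
    accs.append((out.argmax(1) == Y).float().mean().item())
```

`accs` can then be plotted per epoch with matplotlib to see the impact of the hyper-parameters.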

- In pytorch, define and train on the same data another model, an auto-encoder: it is a 1 hidden-layer MLP that reproduces its inputs. The hidden layer compresses the input.
- Use it to compress the original 100-dim vectors into 10-dim vectors, and train the previous MLP classifier on this compressed dataset.
- Show that you can compress down to 1-dim vector and still get 100% accuracy, but with more epochs
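A hedged auto-encoder sketch on the same toy data: a 100 → 10 → 100 MLP trained with MSE to reproduce its input; the 10-dim hidden activations are the compressed vectors (for the 1-dim experiment, change the code size to 1 and train longer):

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
np.random.seed(0)
x0 = np.random.uniform(0.3, 0.5, (70, 100))
x1 = np.random.uniform(0.6, 0.7, (30, 100))
X = torch.tensor(np.vstack([x0, x1]), dtype=torch.float32)

class AutoEncoder(nn.Module):
    def __init__(self, nins, ncode):
        super(AutoEncoder, self).__init__()
        self.enc = nn.Linear(nins, ncode)   # compressing hidden layer
        self.dec = nn.Linear(ncode, nins)   # reconstruction layer
    def forward(self, x):
        return self.dec(torch.relu(self.enc(x)))

ae = AutoEncoder(100, 10)
opt = torch.optim.Adam(ae.parameters(), lr=0.01)
for epoch in range(200):
    opt.zero_grad()
    loss = nn.functional.mse_loss(ae(X), X)  # reproduce the input
    loss.backward()
    opt.step()

codes = torch.relu(ae.enc(X)).detach()  # the 10-dim compressed vectors
```

`codes` can then be fed to the previous MLP classifier in place of the original 100-dim vectors.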

- Fill in the XXXXXXXXXXXXXXXXXXXXXXXXXXXX

```
class RNNCell(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(RNNCell, self).__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(XXXXXXXXXXXXXXXXXXXXXXXXXXXX)
        self.i2o = nn.Linear(XXXXXXXXXXXXXXXXXXXXXXXXXXXX)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.i2o(combined)
        output = self.softmax(output)
        return output, hidden

    def initHidden(self):
        return torch.zeros(1, self.hidden_size)
```

- See how to use and train this cell in this tutorial:
- http://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html

- Write a pytorch program to predict the next character in a text
- Use the predefined pytorch class RNN()
- Use Embeddings to represent one character
- Load as data source the very own source code of your program
- plot accuracy per epoch
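A compact sketch of the program with `nn.RNN` and an `nn.Embedding`; here a short literal string stands in for the program's own source code, and the architecture sizes are free choices:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
text = "import torch\n" * 50  # stand-in for the program's own source code
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

class CharRNN(nn.Module):
    def __init__(self, nchars, demb=16, dhid=64):
        super(CharRNN, self).__init__()
        self.emb = nn.Embedding(nchars, demb)  # one embedding per character
        self.rnn = nn.RNN(demb, dhid, batch_first=True)
        self.out = nn.Linear(dhid, nchars)
    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)

net = CharRNN(len(chars))
opt = torch.optim.Adam(net.parameters(), lr=0.01)
xs = data[:-1].unsqueeze(0)  # inputs: every char except the last
ys = data[1:].unsqueeze(0)   # targets: the next char at each position
for epoch in range(100):
    opt.zero_grad()
    logits = net(xs)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, len(chars)), ys.reshape(-1))
    loss.backward()
    opt.step()
acc = (net(xs).argmax(-1) == ys).float().mean().item()
```

Collecting `acc` at every epoch gives the curve to plot; replacing `text` with `open(__file__).read()` loads the program's own source as asked.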

- Try to understand the example at:
- https://github.com/pytorch/examples/tree/master/word_language_model
- It illustrates a variant of BPTT that detaches hidden states after fixed-length history

- implement a python class for attention:

```
class Attention(nn.Module):
    def __init__(self, nvecs): ...
    def forward(self, x): ...
```

- inputs = batch of vector sequences (batch_dim, seq_dim, vec_dim)
- outputs = batch of vectors
- it computes a context vector and must be trainable
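One possible completion of the skeleton, using a learned scalar score per vector, a softmax over the sequence dimension, and a weighted sum as the context vector:

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, nvecs):
        super(Attention, self).__init__()
        self.score = nn.Linear(nvecs, 1)  # one scalar score per vector
    def forward(self, x):
        # x: (batch_dim, seq_dim, vec_dim)
        w = torch.softmax(self.score(x), dim=1)  # weights over the sequence
        return (w * x).sum(dim=1)                # context: (batch, vec_dim)

att = Attention(8)
ctx = att(torch.randn(4, 5, 8))  # batch of 4 sequences of 5 8-dim vectors
```

Because the scores come from an `nn.Linear`, the context vector is differentiable and the attention weights are trained with the rest of the model.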

- Excellent tutorial that puts together attention and seq2seq. It also demonstrates important tricks such as detaching hidden states in RNNs:
- https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

- Very good tutorial that explains step by step the transformers model:
- http://nlp.seas.harvard.edu/2018/04/03/attention.html

- Finish implementing the transformer
- read tutorials, e.g. https://towardsdatascience.com/how-to-code-the-transformer-in-pytorch-24db27c8f9ec

- Download the LANL corpus auth.txt.gz here: https://csr.lanl.gov/data/cyber1/
- objective: analyze log files to detect hackers/attacks
- you can make fast tests by generating local log files: dmesg > mylog.txt

- generate the next line in the log file from the current line with a transformer network, using character-level input