# Custom Architecture

In this tutorial, we will see how to combine several kinds of inputs in a single neural network.

In [14]:
from collections import defaultdict
import pandas as pd
import numpy as np
import nltk
from nltk.tokenize import word_tokenize
from torch.utils.data import DataLoader, random_split
from torch import optim, nn
from torch.autograd import Variable
from pytoune.framework import Model, ModelCheckpoint, Callback, CSVLogger, EarlyStopping, ReduceLROnPlateau
from pytoune.framework.metrics import acc
import torch

# nltk.download('punkt')
torch.manual_seed(42)
np.random.seed(42)

In [15]:
cuda_device = 0
device = torch.device("cuda:%d" % cuda_device if torch.cuda.is_available() else "cpu")
batch_size = 32
learning_rate = 0.01
n_epoch = 10

In [16]:
data = pd.read_csv('./winemag-data-130k-v2.csv')
data.head(5)

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


As the following cell shows, the data contains several duplicates.

In [17]:
data[data.duplicated('description',keep=False)].sort_values('description').head(5)

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
67614,67614,US,"100% Malbec, it's redolent with dark plums, wi...",,87,20.0,Washington,Rattlesnake Hills,Columbia Valley,Sean P. Sullivan,@wawinereport,Roza Ridge 2010 Malbec (Rattlesnake Hills),Malbec,Roza Ridge
46540,46540,US,"100% Malbec, it's redolent with dark plums, wi...",,87,20.0,Washington,Rattlesnake Hills,Columbia Valley,Sean P. Sullivan,@wawinereport,Roza Ridge 2010 Malbec (Rattlesnake Hills),Malbec,Roza Ridge
119702,119702,US,"100% Sangiovese, this pale pink wine has notes...",Meadow,88,18.0,Washington,Columbia Valley (WA),Columbia Valley,Sean P. Sullivan,@wawinereport,Ross Andrew 2013 Meadow Rosé (Columbia Valley ...,Rosé,Ross Andrew
72181,72181,US,"100% Sangiovese, this pale pink wine has notes...",Meadow,88,18.0,Washington,Columbia Valley (WA),Columbia Valley,Sean P. Sullivan,@wawinereport,Ross Andrew 2013 Meadow Rosé (Columbia Valley ...,Rosé,Ross Andrew
73731,73731,France,"87-89 Barrel sample. A pleasurable, perfumed w...",Barrel sample,88,,Bordeaux,Saint-Julien,,Roger Voss,@vossroger,Château Lalande-Borie 2008 Barrel sample (Sai...,Bordeaux-style Red Blend,Château Lalande-Borie


To train our network, we drop duplicated descriptions and keep only the rows that have a value in the columns we need (price, country, points and taster name).

Alternatively, we could handle the missing values with the techniques covered previously.

In [18]:
data = data.drop_duplicates('description')
data = data[pd.notnull(data.price)]
data = data[pd.notna(data.country)]
data = data[pd.notna(data.points)]
data = data[pd.notna(data.taster_name)]
data.shape

(88244, 14)
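
The same filtering can be written more compactly with `DataFrame.dropna(subset=...)`. The sketch below runs on a made-up four-row frame (the values are illustrative and do not come from the wine dataset):

```python
import pandas as pd

# Toy frame: one duplicated description, one missing price, one missing taster.
toy = pd.DataFrame({
    "description": ["a", "a", "b", "c"],
    "price": [10.0, 10.0, None, 20.0],
    "country": ["US", "US", "France", "Italy"],
    "points": [87, 87, 90, 88],
    "taster_name": ["X", "X", "Y", None],
})

required = ["price", "country", "points", "taster_name"]
cleaned = toy.drop_duplicates("description").dropna(subset=required)
print(cleaned.shape)  # (1, 5)
```

Only the first row survives: its duplicate is dropped, and the two other rows each miss a required value.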

In [19]:
data.values

array([[1, 'Portugal',
        "This is ripe and fruity, a wine that is smooth while still structured. Firm tannins are filled out with juicy red berry fruits and freshened with acidity. It's  already drinkable, although it will certainly be better from 2016.",
        ..., 'Quinta dos Avidagos 2011 Avidagos Red (Douro)',
        'Portuguese Red', 'Quinta dos Avidagos'],
       [2, 'US',
        'Tart and snappy, the flavors of lime flesh and rind dominate. Some green pineapple pokes through, with crisp acidity underscoring the flavors. The wine was all stainless-steel fermented.',
        ..., 'Rainstorm 2013 Pinot Gris (Willamette Valley)',
        'Pinot Gris', 'Rainstorm'],
       [3, 'US',
        'Pineapple rind, lemon pith and orange blossom start off the aromas. The palate is a bit more opulent, with notes of honey-drizzled guava and mango giving way to a slightly astringent, semidry finish.',
        ...,
        'St. Julian 2013 Reserve Late Harvest Riesling (Lake Michigan Sh

In [20]:
num_in_train = round(0.8*len(data))
num_in_valid = round(0.1*len(data))
num_in_test = len(data) - (num_in_train + num_in_valid)
train, valid, test  = random_split(data.values, [num_in_train, num_in_valid, num_in_test])

In [21]:
len(train), len(valid), len(test)

(70595, 8824, 8825)
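
The test set takes whatever remains after rounding the other two sizes, so the three parts always add up to the full dataset. A minimal sketch of that arithmetic, where `split_sizes` is a hypothetical helper that is not part of the notebook:

```python
def split_sizes(n, train_frac=0.8, valid_frac=0.1):
    """Return (train, valid, test) sizes that sum exactly to n."""
    n_train = round(train_frac * n)
    n_valid = round(valid_frac * n)
    # The test set absorbs the rounding remainder.
    n_test = n - (n_train + n_valid)
    return n_train, n_valid, n_test

sizes = split_sizes(88244)
print(sizes)  # (70595, 8824, 8825)
```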

First, we select the country, the description and the number of points to try to classify the taster.

Using the other columns to refine the model is left as an exercise for the reader.

In [22]:
def filter_dataset(data):
    f = list()
    for example in data:
        # 1: Country
        # 2: Description
        # 4: Points
        # 9: Taster
        e = ((example[1], [w.lower() for w in word_tokenize(example[2])], example[4]/100), example[9])
        f.append(e)
    return f
                 
train_formatted = filter_dataset(train)
valid_formatted = filter_dataset(valid)
test_formatted = filter_dataset(test)

In [23]:
train_formatted[:5]

[(('Chile',
   ['starts',
    'out',
    'a',
    'little',
    'sharp',
    'and',
    'heavy',
    'on',
    'the',
    'olive',
    'aromas',
    ',',
    'but',
    'it',
    'brings',
    'enough',
    'freshness',
    'to',
    'override',
    'any',
    'roasted',
    ',',
    'herbal',
    'qualities',
    'that',
    'come',
    'with',
    'the',
    'variety',
    '.',
    'the',
    'palate',
    'is',
    'big',
    'and',
    'chunky',
    ',',
    'with',
    'sweet',
    ',',
    'jammy',
    'black',
    'fruit',
    'flavors',
    '.',
    'long',
    ',',
    'chocolaty',
    'and',
    'slightly',
    'herbal',
    'on',
    'the',
    'finish',
    '.'],
   0.87),
  'Michael Schachner'),
 (('France',
   ['on',
    'the',
    'rich',
    'side',
    ',',
    'this',
    'smooth',
    'wine',
    'discloses',
    'flavors',
    'of',
    'caramel',
    'and',
    'toasted',
    'almonds',
    'as',
    'well',
    'as',
    'fruit',
    '.',
    'it',
    'leaves',
 

As seen in the previous tutorial, we need to build a vocabulary for every non-numerical feature.

In [24]:
description_vocab = set()
country_vocab = set()
taster_vocab = set()

for e in train_formatted:
    (country, description, points), taster = e
    country_vocab.add(country)
    for word in description:
        description_vocab.add(word)
        
# We make sure to catch all tasters
for e in (train_formatted + valid_formatted + test_formatted):
    (country, description, points), taster = e
    taster_vocab.add(taster)

word_to_idx = {
    '<PAD>': 0,
    '<UNK>': 1,
}

for word in sorted(description_vocab):
    word_to_idx[word] = len(word_to_idx)


country_to_idx = {country: i for i, country in enumerate(sorted(country_vocab))}
country_to_idx['<UNK>'] = len(country_to_idx)

taster_to_idx = {taster: i for i, taster in enumerate(sorted(taster_vocab))}

In [25]:
len(word_to_idx)

34879

In [26]:
country_to_idx

{'Argentina': 0,
 'Armenia': 1,
 'Australia': 2,
 'Austria': 3,
 'Bosnia and Herzegovina': 4,
 'Brazil': 5,
 'Bulgaria': 6,
 'Canada': 7,
 'Chile': 8,
 'China': 9,
 'Croatia': 10,
 'Cyprus': 11,
 'Czech Republic': 12,
 'England': 13,
 'France': 14,
 'Georgia': 15,
 'Germany': 16,
 'Greece': 17,
 'Hungary': 18,
 'India': 19,
 'Israel': 20,
 'Italy': 21,
 'Lebanon': 22,
 'Luxembourg': 23,
 'Macedonia': 24,
 'Mexico': 25,
 'Moldova': 26,
 'Morocco': 27,
 'New Zealand': 28,
 'Peru': 29,
 'Portugal': 30,
 'Romania': 31,
 'Serbia': 32,
 'Slovakia': 33,
 'Slovenia': 34,
 'South Africa': 35,
 'Spain': 36,
 'Switzerland': 37,
 'Turkey': 38,
 'US': 39,
 'Ukraine': 40,
 'Uruguay': 41,
 '<UNK>': 42}

In [27]:
taster_to_idx

{'Alexander Peartree': 0,
 'Anna Lee C. Iijima': 1,
 'Anne Krebiehl\xa0MW': 2,
 'Carrie Dykes': 3,
 'Christina Pickard': 4,
 'Fiona Adams': 5,
 'Jeff Jenssen': 6,
 'Jim Gordon': 7,
 'Joe Czerwinski': 8,
 'Kerin O’Keefe': 9,
 'Lauren Buzzeo': 10,
 'Matt Kettmann': 11,
 'Michael Schachner': 12,
 'Mike DeSimone': 13,
 'Paul Gregutt': 14,
 'Roger Voss': 15,
 'Sean P. Sullivan': 16,
 'Susan Kostrzewa': 17,
 'Virginie Boone': 18}

This vectorizer converts every non-numerical feature into numerical indices.

In [28]:
class Vectorizer:
    def __init__(self, word_to_idx, country_to_idx, taster_to_idx):
        self.word_to_idx = word_to_idx
        self.country_to_idx = country_to_idx
        self.taster_to_idx = taster_to_idx
        

    def vectorize_sequence(self, sequence, idx, remove_if_unk=False):
        if '<UNK>' in idx:
            unknown_index = idx['<UNK>']
            chars = [idx.get(tok, unknown_index) for tok in sequence]
            if remove_if_unk:
                return [w for w in chars if w != unknown_index]
            else:
                return chars

        else:
            return [idx[tok] for tok in sequence]

    def __call__(self, example):
        (country, description, points), taster = example
        vectorized_description = self.vectorize_sequence(description, self.word_to_idx)
        
        unknown_country = self.country_to_idx['<UNK>']
        vectorized_country = self.country_to_idx.get(country, unknown_country)
        
        vectorized_taster = self.taster_to_idx[taster]
        return (
            (vectorized_country, vectorized_description, points),
            vectorized_taster,
        )

vectorizer = Vectorizer(word_to_idx, country_to_idx, taster_to_idx)
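
To see how the `<UNK>` fallback behaves, here is a small standalone sketch with a made-up vocabulary (`toy_word_to_idx`, `to_indices` and the tokens are illustrative only). It mirrors the `idx.get(tok, unknown_index)` lookup used in `vectorize_sequence`:

```python
toy_word_to_idx = {"<PAD>": 0, "<UNK>": 1, "fruity": 2, "wine": 3}

def to_indices(tokens, vocab):
    # Any token missing from the vocabulary maps to the <UNK> index.
    unk = vocab["<UNK>"]
    return [vocab.get(tok, unk) for tok in tokens]

known = to_indices(["fruity", "wine"], toy_word_to_idx)
with_unknown = to_indices(["fruity", "boysenberry"], toy_word_to_idx)
print(known)         # [2, 3]
print(with_unknown)  # [2, 1]
```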

In [29]:
train_data = [vectorizer(example) for example in train_formatted]
valid_data = [vectorizer(example) for example in valid_formatted]
test_data = [vectorizer(example) for example in test_formatted]

In [30]:
train_data[0]

((8,
  [29056,
   21967,
   1308,
   18052,
   27464,
   2279,
   14947,
   21678,
   31004,
   21640,
   2716,
   47,
   5379,
   16304,
   5024,
   10933,
   12934,
   31371,
   22165,
   2430,
   25992,
   47,
   15064,
   24606,
   30998,
   7611,
   34304,
   31004,
   32874,
   69,
   31004,
   22341,
   16276,
   3964,
   2279,
   6846,
   47,
   34304,
   30165,
   47,
   16400,
   4084,
   13026,
   12254,
   69,
   18160,
   47,
   6785,
   2279,
   28080,
   15064,
   21678,
   31004,
   12043,
   69],
  0.87),
 12)

The concept of padding is extremely important. It lets us stack sequences of different lengths into a single tensor, so that a whole minibatch can be processed at once (for example on the GPU).

We therefore pad every sequence to the length of the longest one in the minibatch to build the batch matrix.

In [31]:
import torch

def pad_sequences(vectorized_seqs, seq_lengths):
    seq_tensor = torch.zeros((len(vectorized_seqs), seq_lengths.max())).long()
    for idx, (seq, seqlen) in enumerate(zip(vectorized_seqs, seq_lengths)):
        seq_tensor[idx, :seqlen] = torch.LongTensor(seq[:seqlen])
    return seq_tensor

def collate_examples(samples):
    features, tasters = list(zip(*samples))
    countries, descriptions, points = list(zip(*features))
    descriptions_lengths = torch.LongTensor([len(s) for s in descriptions])
    padded_descriptions = pad_sequences(descriptions, descriptions_lengths)
    countries = torch.LongTensor(countries)
    points = torch.FloatTensor(points)
    tasters = torch.LongTensor(tasters)
    return (countries, padded_descriptions, points), tasters
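
PyTorch also ships `torch.nn.utils.rnn.pad_sequence`, which produces the same kind of zero-padded matrix as `pad_sequences` above. A minimal sketch on three made-up index sequences:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three toy sequences of word indices with different lengths.
seqs = [torch.tensor([5, 6, 7]), torch.tensor([8]), torch.tensor([9, 10])]
lengths = torch.tensor([len(s) for s in seqs])

# batch_first=True gives a (batch_size, max_length) matrix padded with 0.
padded = pad_sequence(seqs, batch_first=True, padding_value=0)
print(padded)
# tensor([[ 5,  6,  7],
#         [ 8,  0,  0],
#         [ 9, 10,  0]])
```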

In [32]:
from torch.utils.data import DataLoader, Dataset

batch_size = 64

train_loader = DataLoader(
    train_data,
    batch_size=batch_size,
    collate_fn=collate_examples,
    shuffle=True
)

valid_loader = DataLoader(
    valid_data,
    batch_size=batch_size,
    collate_fn=collate_examples,
    shuffle=False
)

test_loader = DataLoader(
    test_data,
    batch_size=batch_size,
    collate_fn=collate_examples,
    shuffle=False
)
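
Because padding happens inside `collate_fn`, each batch is only padded to the length of its own longest sequence. The sketch below illustrates this with a made-up dataset and a simplified, hypothetical `toy_collate` function:

```python
import torch
from torch.utils.data import DataLoader

# Toy dataset of (sequence, label) pairs with varying sequence lengths.
toy_data = [([1, 2, 3], 0), ([4], 1), ([5, 6], 0), ([7, 8, 9, 10], 1)]

def toy_collate(samples):
    seqs, labels = zip(*samples)
    max_len = max(len(s) for s in seqs)
    batch = torch.zeros(len(seqs), max_len, dtype=torch.long)
    for i, s in enumerate(seqs):
        batch[i, :len(s)] = torch.tensor(s)
    return batch, torch.tensor(labels)

loader = DataLoader(toy_data, batch_size=2, collate_fn=toy_collate, shuffle=False)
shapes = [tuple(x.shape) for x, _ in loader]
print(shapes)  # [(2, 3), (2, 4)]
```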

In [33]:
b = next(iter(train_loader))
b

((tensor([39, 14, 39, 21, 39, 39, 35, 16, 30, 21, 39, 30, 39, 21,  0, 14, 21, 39,
           8, 35, 35,  2, 17, 39, 14, 39, 21, 39,  8, 39, 21, 16,  8, 39, 21, 39,
          14, 39, 39, 39, 21, 39, 36, 14, 14, 39, 14, 14, 39, 39, 21, 39, 21, 39,
          39, 39, 39, 14, 14,  0,  8,  8, 14, 21]),
  tensor([[31095, 16276, 31004,  ...,     0,     0,     0],
          [ 4655, 21081,  2279,  ...,     0,     0,     0],
          [34304, 31004,  1580,  ...,     0,     0,     0],
          ...,
          [11590, 20781, 21678,  ...,     0,     0,     0],
          [31004,  9502, 11641,  ...,     0,     0,     0],
          [ 2716, 21508, 25918,  ...,     0,     0,     0]]),
  tensor([0.8900, 0.8500, 0.8400, 0.8900, 0.8900, 0.8600, 0.9000, 0.9400, 0.8700,
          0.9300, 0.8600, 0.8500, 0.9100, 0.9000, 0.8400, 0.8700, 0.8800, 0.8900,
          0.8600, 0.8600, 0.8900, 0.9200, 0.8500, 0.8600, 0.9400, 0.8900, 0.8500,
          0.9000, 0.8600, 0.8300, 0.8800, 0.8900, 0.8300, 0.8700, 0.8400, 0.850

In [34]:
from torch import nn
from torch.nn import functional as F
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence, pad_sequence

class TasterClassifier(nn.Module):
    def __init__(self, word_to_idx, country_to_idx, word_embedding_size, word_hidden_layer_size,
                country_embedding_size, points_hidden_size, hidden_size, num_tasters):
        super(TasterClassifier, self).__init__()
        
        self.word_embeddings = nn.Embedding(len(word_to_idx), word_embedding_size)
        self.word_rnn = nn.LSTM(word_embedding_size, word_hidden_layer_size)
        
        self.country_embeddings = nn.Embedding(len(country_to_idx), country_embedding_size)
        
        self.points_fully_connected = nn.Linear(1, points_hidden_size)
        
        self.fully_connected = nn.Linear(word_hidden_layer_size + country_embedding_size + points_hidden_size, hidden_size)
        
        self.last_fully_connected = nn.Linear(hidden_size, num_tasters)
        
        self.loss_function = nn.CrossEntropyLoss()
        self.metrics = ['acc']

    def forward(self, examples):
        
        countries, descriptions, points = examples
        
        # Description handling here
        seq_lengths, perm_idx = (descriptions > 0).sum(dim=1).sort(0, descending=True)
        _, rev_perm_idx = perm_idx.sort(0)
        
        # (batch_size, max_length)
        sorted_descriptions = descriptions[perm_idx]
        
        # (batch_size, max_length, embedding_size)
        embeds = self.word_embeddings(sorted_descriptions)
        packed_descriptions = pack_padded_sequence(embeds, seq_lengths, batch_first=True)
        
        # (1, batch_size, word_hidden_layer_size)
        _, (h_n, _) = self.word_rnn(packed_descriptions)
        h_n = h_n.squeeze(0)
        descriptions_rep = F.relu(h_n[rev_perm_idx])
        
        # Country handling here
        # (batch_size, countries_embeddings)
        countries_embeddings = self.country_embeddings(countries)
        
        # Points handling here
        # (batch_size, points_hidden_size)
        points_rep = F.relu(self.points_fully_connected(points.view(-1, 1)))
        
        # (batch_size, hidden_layer_size + countries_embeddings + points_hidden_size)
        combined_representation = torch.cat([descriptions_rep, countries_embeddings, points_rep], dim=1)
        
        # (batch_size, hidden_size)
        combined_representation = F.relu(self.fully_connected(combined_representation))
        
        # (batch_size, num_tasters)
        out = self.last_fully_connected(combined_representation)
        
        return out.squeeze(1)
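
In PyTorch 1.1 and later, the manual sort and unsort around `pack_padded_sequence` can be avoided by passing `enforce_sorted=False`, which lets PyTorch handle the reordering internally. A minimal sketch with made-up sizes (the tensors and dimensions below are illustrative, not the tutorial's):

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)

# Toy padded batch (0 is the padding index); lengths are deliberately unsorted.
descriptions = torch.tensor([[4, 5, 0, 0],
                             [6, 7, 8, 9],
                             [3, 0, 0, 0]])
seq_lengths = (descriptions > 0).sum(dim=1)  # tensor([2, 4, 1])

embeddings = nn.Embedding(10, 8, padding_idx=0)
rnn = nn.LSTM(8, 5)

# Lengths must live on the CPU; enforce_sorted=False removes the need to sort.
packed = pack_padded_sequence(embeddings(descriptions), seq_lengths.cpu(),
                              batch_first=True, enforce_sorted=False)
_, (h_n, _) = rnn(packed)

# (batch_size, hidden_size), already in the original batch order.
descriptions_rep = h_n.squeeze(0)
print(descriptions_rep.shape)  # torch.Size([3, 5])
```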

In [35]:
loaders = [train_loader, valid_loader, test_loader]

In [36]:
def train(name, pytorch_module):
    optimizer = optim.Adam(pytorch_module.parameters(), lr=learning_rate)
    
    # Pytoune Model
    model = Model(pytorch_module, optimizer, pytorch_module.loss_function, metrics=pytorch_module.metrics)

    # Keep the model on the CPU; call model.to(device) instead to train on the GPU
    model.to('cpu')

    # Train
    model.fit_generator(train_loader, valid_loader, epochs=n_epoch)
    
    return model

In [None]:
net = TasterClassifier(
    word_to_idx=word_to_idx,
    country_to_idx=country_to_idx,
    word_embedding_size=50,
    word_hidden_layer_size=20,
    country_embedding_size=20,
    points_hidden_size=20,
    hidden_size=50,
    num_tasters=len(taster_to_idx)
)
model = train('taster_classifier', net)

In [None]:
model.evaluate_generator(test_loader)

In [52]:
class TasterClassifierAttn(nn.Module):
    def __init__(self, word_to_idx, country_to_idx, word_embedding_size, common_hidden_size, num_tasters):
        super(TasterClassifierAttn, self).__init__()
        
        self.word_embeddings = nn.Embedding(len(word_to_idx), word_embedding_size)
        self.word_rnn = nn.LSTM(word_embedding_size, common_hidden_size)
        
        self.country_embeddings = nn.Embedding(len(country_to_idx), common_hidden_size)
        
        self.points_fully_connected = nn.Linear(1, common_hidden_size)
        
        self.attn_fully_connected = nn.Linear(common_hidden_size*3, 3)
        
        self.fully_connected = nn.Linear(common_hidden_size, common_hidden_size)
        
        self.last_fully_connected = nn.Linear(common_hidden_size, num_tasters)
        
        self.metrics = [self.acc]
        
        self.attention = False
        
    def loss_function(self, output, y):
        y_pred, attn = output
        return F.cross_entropy(y_pred, y)
    
    def acc(self, output, y):
        y_pred, attn = output
        return acc(y_pred, y)

    def forward(self, examples):
        
        countries, descriptions, points = examples
        
        # Description handling here
        seq_lengths, perm_idx = (descriptions > 0).sum(dim=1).sort(0, descending=True)
        _, rev_perm_idx = perm_idx.sort(0)
        
        # (batch_size, max_length)
        sorted_descriptions = descriptions[perm_idx]
        
        # (batch_size, max_length, embedding_size)
        embeds = self.word_embeddings(sorted_descriptions)
        packed_descriptions = pack_padded_sequence(embeds, seq_lengths, batch_first=True)
        
        # (1, batch_size, word_hidden_layer_size)
        _, (h_n, _) = self.word_rnn(packed_descriptions)
        h_n = h_n.squeeze(0)
        descriptions_rep = F.relu(h_n[rev_perm_idx])
        
        # Country handling here
        # (batch_size, countries_embeddings)
        countries_embeddings = self.country_embeddings(countries)
        
        # Points handling here
        # (batch_size, points_hidden_size)
        points_rep = F.relu(self.points_fully_connected(points.view(-1, 1)))
        
        # (batch_size, 3 * common_hidden_size)
        combined_representation = torch.cat([descriptions_rep, countries_embeddings, points_rep], dim=1)
        
        attn_logits = self.attn_fully_connected(combined_representation)
        attn_pond = F.softmax(attn_logits, dim=1)
        if self.attention:
            attended_input = attn_pond[:, 0].view(-1, 1) * descriptions_rep + attn_pond[:, 1].view(-1, 1) * countries_embeddings + attn_pond[:, 2].view(-1, 1) * points_rep
        else:
            attended_input = descriptions_rep + countries_embeddings + points_rep 
        
        # (batch_size, hidden_size)
        combined_representation = F.relu(self.fully_connected(attended_input))
        
        # (batch_size, num_tasters)
        out = self.last_fully_connected(combined_representation)
        
        return out.squeeze(1), attn_pond

    
class AttnActivation(Callback):
    def __init__(self, epoch_start=0):
        super().__init__()
        self.epoch_start = epoch_start
        
    def on_epoch_begin(self, epoch, logs):
        if self.epoch_start == epoch:
            print("Activating attention")
            self.model.model.attention = True


class GradientLogging(Callback):
    def __init__(self):
        super().__init__()
        self.gradient_logs = defaultdict(list)
        
    def on_backward_end(self, batch):
        # Store each gradient norm as a plain Python float so that np.mean can average them
        self.gradient_logs['word_embeddings'].append(self.model.model.word_embeddings.weight.grad.norm().item())
        self.gradient_logs['country_embeddings'].append(self.model.model.country_embeddings.weight.grad.norm().item())
        self.gradient_logs['points'].append(self.model.model.points_fully_connected.weight.grad.norm().item())
        
    def on_epoch_end(self, epoch, logs):
        print("Word Embeddings grad norm: {}".format(np.mean(self.gradient_logs['word_embeddings'])))
        print("Country Embeddings grad norm: {}".format(np.mean(self.gradient_logs['country_embeddings'])))
        print("Points grad norm: {}".format(np.mean(self.gradient_logs['points'])))
        self.gradient_logs['word_embeddings'] = list()
        self.gradient_logs['country_embeddings'] = list()
        self.gradient_logs['points'] = list()

            
    
def train_attn(name, pytorch_module, attn_activation=1):
    optimizer = optim.Adam(pytorch_module.parameters(), lr=learning_rate)
    
    callbacks = [GradientLogging(), AttnActivation(attn_activation)]
    
    # Pytoune Model
    model = Model(pytorch_module, optimizer, pytorch_module.loss_function, metrics=pytorch_module.metrics)

    # Send model on GPU
    # model.to(device)

    # Train
    model.fit_generator(train_loader, valid_loader, epochs=n_epoch, callbacks=callbacks)
    
    return model
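
The attention step in `TasterClassifierAttn` boils down to a softmax over three logits per example, followed by a weighted sum of the three representations. A standalone sketch with random tensors (all sizes and values below are made up for illustration):

```python
import torch
from torch.nn import functional as F

torch.manual_seed(0)
batch_size, hidden = 4, 6
descriptions_rep = torch.randn(batch_size, hidden)
countries_rep = torch.randn(batch_size, hidden)
points_rep = torch.randn(batch_size, hidden)

attn_logits = torch.randn(batch_size, 3)  # one logit per input source
attn = F.softmax(attn_logits, dim=1)      # each row sums to 1

# Stack to (batch_size, 3, hidden) and weight each source by its attention score.
stacked = torch.stack([descriptions_rep, countries_rep, points_rep], dim=1)
attended = (attn.unsqueeze(2) * stacked).sum(dim=1)  # (batch_size, hidden)

print(attended.shape)  # torch.Size([4, 6])
```

Because the three representations share the same size, the weighted sum keeps the `(batch_size, hidden)` shape, which is why the attention model uses a single `common_hidden_size`.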

In [53]:
net = TasterClassifierAttn(
    word_to_idx=word_to_idx,
    country_to_idx=country_to_idx,
    word_embedding_size=50,
    common_hidden_size=20,
    num_tasters=len(taster_to_idx)
)
n_epoch = 5
model = train_attn('taster_classifier', net)

Epoch 1/5Activating attention
Epoch 1/5 63.58s Step 1104/1104: loss: 0.946356, acc: 64.333168, val_loss: 0.904871, val_acc: 65.525839
Word Embeddings grad norm: 3.422738518565893e-05
Country Embeddings grad norm: 0.043636590242385864
Points grad norm: 0.0001556285424157977
Epoch 2/5 64.71s Step 1104/1104: loss: 0.907408, acc: 65.395566, val_loss: 0.899876, val_acc: 65.525839
Word Embeddings grad norm: 0.0014634561957791448
Country Embeddings grad norm: 0.037704743444919586
Points grad norm: 0.0004377620934974402
Epoch 3/5 63.48s Step 1104/1104: loss: 0.878339, acc: 65.843190, val_loss: 0.856257, val_acc: 66.829102
Word Embeddings grad norm: 0.022100228816270828
Country Embeddings grad norm: 0.046045031398534775
Points grad norm: 7.195908983703703e-05
Epoch 4/5 64.04s Step 1104/1104: loss: 0.841254, acc: 67.027410, val_loss: 0.834168, val_acc: 67.826383
Word Embeddings grad norm: 0.03915749490261078
Country Embeddings grad norm: 0.0646233782172203
Points grad norm: 0.0
Epoch 5/5 63.40s 

In [50]:
net = TasterClassifierAttn(
    word_to_idx=word_to_idx,
    country_to_idx=country_to_idx,
    word_embedding_size=50,
    common_hidden_size=20,
    num_tasters=len(taster_to_idx)
)
n_epoch = 5
model = train_attn('taster_classifier', net, attn_activation=2)

Epoch 1/5 63.53s Step 1104/1104: loss: 0.702741, acc: 75.511014, val_loss: 0.479272, val_acc: 85.086129
Word Embeddings grad norm: 0.08628664165735245
Country Embeddings grad norm: 0.12846454977989197
Points grad norm: 0.07020499557256699
Epoch 2/5Activating attention
Epoch 2/5 65.13s Step 1104/1104: loss: 0.340948, acc: 89.813726, val_loss: 0.300813, val_acc: 91.035811
Word Embeddings grad norm: 0.13509760797023773
Country Embeddings grad norm: 0.07974996417760849
Points grad norm: 0.01175676565617323
Epoch 3/5 64.27s Step 1104/1104: loss: 0.235133, acc: 93.013670, val_loss: 0.264922, val_acc: 92.327743
Word Embeddings grad norm: 0.14123034477233887
Country Embeddings grad norm: 0.07340138405561447
Points grad norm: 0.008250435814261436
Epoch 4/5 64.26s Step 1104/1104: loss: 0.199422, acc: 94.105815, val_loss: 0.237391, val_acc: 92.962375
Word Embeddings grad norm: 0.1365615427494049
Country Embeddings grad norm: 0.0728415995836258
Points grad norm: 0.005137351341545582
Epoch 5/5 64.6
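
The gradient norms printed in these logs are the L2 norms of each layer's `weight.grad` after the backward pass, averaged over the epoch. A minimal sketch on a made-up linear layer showing how one such value is obtained:

```python
import torch
from torch import nn

torch.manual_seed(0)
layer = nn.Linear(1, 4)             # same shape idea as the points layer, toy size
x = torch.tensor([[0.87], [0.90]])  # two made-up normalized scores

loss = layer(x).sum()
loss.backward()

# Each of the 4 weights receives gradient 0.87 + 0.90 = 1.77.
grad_norm = layer.weight.grad.norm().item()  # plain Python float
print(grad_norm)
```

A norm that shrinks toward zero, as for the points layer in the first run, suggests that the corresponding input has almost stopped contributing to the updates.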

In [54]:
test_loader = DataLoader(
    test_data,
    batch_size=1,
    collate_fn=collate_examples,
    shuffle=False
)

idx_to_country = {v: k for k, v in country_to_idx.items()}
idx_to_word = {v: k for k, v in word_to_idx.items()}
idx_to_taster = {v: k for k, v in taster_to_idx.items()}

for x, y in test_loader:
    country, description, points = x
    pred, attn = model.predict_on_batch(x)
    print("Taster (true) (pred): {} {}".format(idx_to_taster[int(y[0])], idx_to_taster[np.argmax(pred[0])]))
    print("({}) Country: {}".format(float(attn[0][1]), idx_to_country[int(country[0])]))
    print("({}) Points: {}".format(float(attn[0][2]), points[0]))
    print("({}) Description: {}".format(float(attn[0][0]), " ".join([idx_to_word[int(w)] for w in description[0]])))
    import pdb; pdb.set_trace()  # pause after each example: 'c' continues, 'q' quits

Taster (true) (pred): Lauren Buzzeo Lauren Buzzeo
(0.022559083998203278) Country: France
(0.1913205236196518) Points: 0.8999999761581421
(0.7861202955245972) Description: this is vibrant and expressive , with upfront purple-flower fragrances of violet and iris mixed with fruity tones of boysenberry and raspberry sauce . the <UNK> texture offers medium tannins , ample acidity and skin-driven fruit flavors . roasted coffee bean and sweet smoke accents mark the lingering finish .
> <ipython-input-54-20e3eba99ac0>(12)<module>()
-> for x, y in test_loader:
(Pdb) c
Taster (true) (pred): Sean P. Sullivan Paul Gregutt
(0.4225955009460449) Country: US
(0.5669879913330078) Points: 0.8600000143051147
(0.010416511446237564) Description: the aromas of char , sawdust and vanilla seem to clash with the notes of herbs , leather and cedar . the fruit flavors are lighter in style with the oak ( 50 % new french along with <UNK> american ) taking over the show .
> <ipython-input-54-20e3eba99ac0>(12)<modul

BdbQuit: 

We could use pre-trained word representations in our own model.

Good word embeddings include those of [fasttext](https://fasttext.cc/), [GloVe](https://nlp.stanford.edu/projects/glove/) and [ELMo](https://allennlp.org/elmo).

Here is one way to load word vectors trained with fasttext and use them to initialize your own Embedding layer in PyTorch.

In [None]:
from gensim.models import KeyedVectors

vec_model_path = './vectors.vec'
vec_model = KeyedVectors.load_word2vec_format(vec_model_path)

In [None]:
from torch import nn

class MyEmbeddings(nn.Embedding):
    def __init__(self, word_to_idx, embedding_dim):
        super(MyEmbeddings, self).__init__(len(word_to_idx), embedding_dim, padding_idx=0)
        self.embedding_dim = embedding_dim
        self.vocab_size = len(word_to_idx)
        self.word_to_idx = word_to_idx

    def set_item_embedding(self, idx, embedding):
        self.weight.data[idx] = torch.FloatTensor(embedding)

    def load_words_embeddings(self, vec_model):
        # Note: gensim >= 4.0 renamed `index2word` to `index_to_key`
        for word in vec_model.index2word:
            if word in self.word_to_idx:
                idx = self.word_to_idx[word]
                embedding = vec_model[word]
                self.set_item_embedding(idx, embedding)
                
embeddings_layer = MyEmbeddings(word_to_idx, vec_model.vector_size)
embeddings_layer.load_words_embeddings(vec_model)
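
PyTorch also provides `nn.Embedding.from_pretrained`, which builds the layer directly from a weight matrix. A minimal sketch, assuming a made-up four-word vocabulary and random vectors standing in for real pre-trained ones:

```python
import torch
from torch import nn

toy_word_to_idx = {"<PAD>": 0, "<UNK>": 1, "wine": 2, "fruit": 3}
embedding_dim = 5

torch.manual_seed(0)
# Random stand-in for a (vocab_size, embedding_dim) matrix of pre-trained vectors.
weights = torch.randn(len(toy_word_to_idx), embedding_dim)
weights[0] = 0.0  # keep the padding vector at zero

# freeze=False lets the vectors be fine-tuned during training.
layer = nn.Embedding.from_pretrained(weights, freeze=False, padding_idx=0)

out = layer(torch.tensor([[2, 3, 0]]))
print(out.shape)  # torch.Size([1, 3, 5])
```

Words missing from the pre-trained vectors would still need an initialization of their own, which is what the `MyEmbeddings` class above handles by leaving their default random weights in place.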