Report of empirical evaluation, observations, and investigations on LSTM-based language modeling

Model | Val. PPL | Test PPL | Last Epoch | Last LR | Seq_len | Tokenizer |
---|---|---|---|---|---|---|
HLSTM | 1.826 | 1.779 | 64 | 7.81E-03 | 25 | moses |
HLSTM | 1.952 | 1.907 | 143 | 2.98E-08 | 25 | Split |
HLSTM | 2.38 | 2.296 | 155 | 3.81E-06 | 35 | moses |
HLSTM | 2.847 | 2.782 | 113 | 4.88E-04 | 35 | Basic_English |
HLSTM | 21.98 | 20.39 | 139 | 2.98E-08 | 50 | moses |
HLSTM | 47.19 | 45.49 | 152 | 2.38E-07 | 50 | Basic_English |
HLSTM | 51.75 | 49.58 | 151 | 2.33E-10 | 50 | Split |
HLSTM | 68.75 | 63.34 | 122 | 3.81E-06 | 70 | moses |

Model | Val. PPL | Test PPL | Last Epoch | Last LR | Seq_len | Tokenizer |
---|---|---|---|---|---|---|
H2HLSTM | 3.069 | 2.987 | 145 | 4.66E-10 | 25 | Split |
H2HLSTM | 4.085 | 3.855 | 117 | 1.91E-06 | 35 | moses |
H2HLSTM-NTASGD | 32.32 | 31.01 | 108 | 9.53E-07 | 50 | Basic_English |
H2HLSTM | 34.8 | 33.59 | 97 | 6.10E-05 | 50 | Basic_English |
H2HLSTM | 38.15 | 35.18 | 88 | 2.98E-08 | 50 | moses |
H2HLSTM | 67.48 | 64.29 | 99 | 2.29E-05 | 50 | Split |

Model | Val. PPL | Test PPL | Last Epoch | Last LR | Seq_len | Tokenizer |
---|---|---|---|---|---|---|
HLSTM | 1.885 | 1.942 | 12 (24h) | 0.5 | 25 | Split |

Model | Val. PPL | Test PPL | Last Epoch | Last LR | Seq_len | Tokenizer |
---|---|---|---|---|---|---|
HLSTM | 1.964 | 1.782 | 149 | 2.91E-11 | 25 | Split |
HLSTM | 2.651 | 2.651 | 2.355 | 1.49E-08 | 21 | Basic_English |
H2HLSTM | 3.94 | 3.44 | 123 | 1.19E-07 | 21 | moses |
HLSTM | 87.17 | 79.64 | 78 | 3.81E-06 | 50 | Basic_English |
LSTM | 87.75 | 74.87 | 66 | 1.90E-06 | 21 | moses |
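
The Val. PPL and Test PPL columns report perplexity. As a reading aid, here is a minimal sketch of how perplexity is commonly derived from per-token losses: the exponential of the mean negative log-likelihood. It assumes natural-log losses, and it is not confirmed to be the exact computation behind the numbers above. The function name `perplexity` and the sample losses are hypothetical.

```python
import math


def perplexity(token_nlls):
    """Return exp(mean NLL) for a list of per-token negative log-likelihoods.

    Assumes the losses use the natural logarithm (as cross-entropy losses
    in most deep learning libraries do by default).
    """
    if not token_nlls:
        raise ValueError("need at least one token loss")
    mean_nll = sum(token_nlls) / len(token_nlls)
    return math.exp(mean_nll)


# Hypothetical per-token losses, not taken from any run in the tables.
example_nlls = [0.5, 0.7, 0.6]
print(round(perplexity(example_nlls), 3))
```

Under this definition, a perplexity of 1.0 corresponds to a mean loss of zero, so lower values in the tables indicate a better fit on the evaluated split.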