GPT-2
What is all the hype about?
- A large language model trained on a large, diverse dataset (WebText) performs well across many domains and tasks without task-specific supervision
- Motivates multi-task learning and argues that narrow, single-task training produces brittle systems that generalize poorly
- Proposes a general method of transfer: tasks are specified directly in natural language and performed zero-shot, with no parameter or architecture modification
- A LM with sufficient capacity will begin to learn to infer and perform the tasks demonstrated in natural language sequences in order to better predict them, regardless of their method of procurement
- Language modelling on such data therefore amounts to unsupervised multi-task learning (prompting sketch after the questions below).
- GPT-2 still underfits WebText even at its largest capacity of 1.5B parameters.
- Zero-shot, GPT-2 achieves state-of-the-art results on 7 out of 8 language modelling datasets
- Performs reasonably well zero-shot on a range of downstream tasks (reading comprehension, summarization, translation, question answering), with results ranging from around baseline to state-of-the-art
- Questions
    - Perplexity? (exponentiated average negative log-likelihood per token; sketch below)
    - Byte Pair Encoding? (GPT-2 uses a byte-level BPE vocabulary; sketch below)
    - Transformer model? (GPT-2 is a decoder-only Transformer; attention sketch below)
    - Bloom filters? (used in the paper to estimate train/test 8-gram overlap; sketch below)
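
To illustrate how a task is specified purely in text, here is a minimal sketch of the paper's summarization setup (append "TL;DR:" to the article and sample with top-k = 2). It assumes the Hugging Face transformers port of GPT-2; the library, the "gpt2" checkpoint name, and the generate() arguments are that library's API, not part of the original paper.

```python
# Zero-shot task specification: the "task" is just more text in the prompt.
# Assumes the Hugging Face `transformers` and PyTorch packages are installed.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

article = "A long news article would go here ..."
prompt = article + "\nTL;DR:"          # the paper's textual hint for summarization

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    inputs.input_ids,
    max_new_tokens=60,
    do_sample=True,
    top_k=2,                           # top-k = 2 sampling, as in the paper
    pad_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens (the "summary").
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```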
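
Perplexity, the metric behind most of the language-modelling results above, is the exponentiated average negative log-likelihood per token (enwik8 and text8 are instead reported in bits per byte/character). A toy sketch:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(average negative log-likelihood per token).
    `token_log_probs` are the natural-log probabilities the model assigned
    to each observed token; lower perplexity means better prediction."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# A model that puts probability 0.25 on every observed token has perplexity 4:
# it is as "uncertain" as a uniform 4-way choice at each step.
print(perplexity([math.log(0.25)] * 10))  # ≈ 4.0
```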
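
Byte Pair Encoding builds a subword vocabulary by repeatedly merging the most frequent adjacent pair of symbols; GPT-2 runs it over raw bytes, so any string can be tokenized without unknown symbols. A character-level toy sketch (not the authors' byte-level implementation):

```python
from collections import Counter

def most_frequent_pair(sequences):
    """Count adjacent symbol pairs across all sequences, return the most common."""
    pairs = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0] if pairs else None

def merge_pair(seq, pair):
    """Replace every occurrence of `pair` in `seq` with a single merged symbol."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged

# Start from single symbols, then repeatedly merge the most frequent adjacent pair.
corpus = [list("low"), list("lower"), list("lowest"), list("low")]
for _ in range(2):
    pair = most_frequent_pair(corpus)
    corpus = [merge_pair(seq, pair) for seq in corpus]
print(corpus)  # [['low'], ['low', 'e', 'r'], ['low', 'e', 's', 't'], ['low']]
```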
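
GPT-2 is a decoder-only Transformer (essentially a scaled-up GPT with layer norm moved to the input of each sub-block); its central operation is causally masked scaled dot-product self-attention. A single-head NumPy sketch that leaves out the multi-head split, projections, residual connections, layer norm, and MLPs:

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a causal mask."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # (T, d) queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (T, T) pairwise similarities
    mask = np.triu(np.ones_like(scores), 1)          # 1s above the diagonal = future tokens
    scores = np.where(mask == 1, -1e9, scores)       # a token may not attend to the future
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax over past positions
    return weights @ v                               # (T, d) context vectors

rng = np.random.default_rng(0)
T, d = 5, 8                                          # 5 tokens, 8-dimensional embeddings
x = rng.normal(size=(T, d))
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
print(causal_self_attention(x, w_q, w_k, w_v).shape)  # (5, 8)
```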
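
Bloom filters show up in the paper's memorization analysis: 8-grams of the WebText training set are inserted into Bloom filters so that overlap with test sets can be estimated cheaply (false positives are possible, false negatives are not). A from-scratch sketch, not the authors' implementation:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: a bit array plus several hash functions."""
    def __init__(self, num_bits=1 << 20, num_hashes=5):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive several bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))

def ngrams(tokens, n=8):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Index training-set 8-grams, then check which test 8-grams collide with them.
train = "the quick brown fox jumps over the lazy dog near the river bank".split()
test = "a quick brown fox jumps over the lazy dog today it seems".split()
bf = BloomFilter()
for gram in ngrams(train):
    bf.add(gram)
overlap = [gram for gram in ngrams(test) if gram in bf]
print(len(overlap), "of", len(ngrams(test)), "test 8-grams also appear in the training data")
```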