# Generative pre-trained transformers

This course was clear but quite short. It spent some time on n-gram models too, which was a useful way to build up to transformers, but meant that there was less time to cover transformers in detail. Overall, I felt [Generative AI with large language models](/technical-courses/generative-ai-with-large-language-models.md) was a better introduction to the topic, going into much more detail despite still being somewhat concise.

I only made Anki cards at the time rather than separate notes, so the points below only cover the topics that stood out to me rather than everything in the course.

* Causal language models predict the next word, while masked language models predict a word that has been masked (hidden), which could be anywhere in the sentence, allowing the model to use both left and right context
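* One option in language modelling is to use the Markov assumption (the next word depends only on the previous $$n-1$$ words), resulting in n-gram models
* n-gram models are particularly vulnerable to data sparsity: many valid n-grams never appear in the training data, so they are assigned zero probability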
* Intrinsic evaluation: How well does the model predict hold-out data?
  * Often measured using perplexity: $$PP(W) = \sqrt[N]{\frac{1}{P(w_1, w_2, \ldots, w_N)}}$$
* Extrinsic evaluation: How well does the model do on specific downstream tasks?
* Search approaches
  * Greedy search
    * Always pick the highest probability token
    * May not result in the highest probability sequence
    * Can result in bland/uninteresting text
  * Beam search
    * Maintain the $$k$$ highest probability partial sequences at each step ($$k$$, the beam width, is usually set somewhere between 2 and 20)
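To make the causal/masked distinction in the first bullet concrete, here is a minimal Python sketch of how training examples might be built from one tokenised sentence. The function names and the `[MASK]` token string are illustrative assumptions, and the sketch masks a single token per sequence for simplicity; real masked language models typically mask several tokens (BERT masks around 15%).

```python
import random

def causal_lm_examples(tokens):
    """Causal LM: each prefix (left context only) predicts the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_lm_example(tokens, rng, mask_token="[MASK]"):
    """Masked LM (hypothetical helper): hide one randomly chosen token.

    The model sees the context on both sides of the hidden position.
    """
    i = rng.randrange(len(tokens))
    masked = tokens[:i] + [mask_token] + tokens[i + 1:]
    return masked, i, tokens[i]

tokens = "the cat sat on the mat".split()
for context, target in causal_lm_examples(tokens):
    print(context, "->", target)  # first line: ['the'] -> cat

masked, position, target = masked_lm_example(tokens, random.Random(0))
print(masked, "->", target)  # one position replaced by [MASK]
```

In the causal case the model never sees words to the right of the target, whereas in the masked case everything except the hidden token is visible.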
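The Markov assumption, the sparsity problem and perplexity from the list above can all be seen in a toy bigram ($$n = 2$$) model. This is a minimal sketch, assuming a hypothetical three-sentence corpus, whitespace tokenisation, `<s>`/`</s>` boundary markers and unsmoothed maximum-likelihood estimates; $$N$$ is taken to be the number of predicted tokens, including `</s>`.

```python
import math
from collections import Counter

# Hypothetical toy training corpus; <s> and </s> mark sentence boundaries.
TRAIN = [
    "<s> the cat sat </s>",
    "<s> the dog sat </s>",
    "<s> the cat ran </s>",
]

def train_bigram(sentences):
    """Count bigrams and their left-hand contexts for a maximum-likelihood bigram model."""
    bigrams, contexts = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts

def bigram_prob(bigrams, contexts, prev, word):
    """Markov assumption: P(word | whole history) is approximated by P(word | prev)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

def perplexity(bigrams, contexts, sentence):
    """PP(W) = P(w_1, ..., w_N) ** (-1/N), computed in log space for numerical stability."""
    tokens = sentence.split()
    log_prob, n = 0.0, 0
    for prev, word in zip(tokens, tokens[1:]):
        p = bigram_prob(bigrams, contexts, prev, word)
        if p == 0.0:
            return math.inf  # an unseen bigram makes the whole sequence "impossible"
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)

bigrams, contexts = train_bigram(TRAIN)
print(perplexity(bigrams, contexts, "<s> the cat sat </s>"))  # 3 ** 0.25, about 1.316
print(perplexity(bigrams, contexts, "<s> the dog ran </s>"))  # inf
```

The second sentence is perfectly plausible, but because the bigram "dog ran" never occurs in training, the unsmoothed model assigns it zero probability and infinite perplexity. Smoothing techniques such as add-one (Laplace) smoothing exist to address exactly this.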
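The difference between the greedy and beam search bullets above can be shown on a hypothetical toy model whose next-token distributions are hard-coded in a dictionary (a real model would compute them with a neural network). This sketch runs for a fixed number of steps and ignores end-of-sequence handling and length normalisation, which practical implementations usually add.

```python
import math

# Hypothetical toy "language model": prefix (tuple of tokens) -> next-token distribution.
TOY_MODEL = {
    (): {"A": 0.6, "B": 0.4},
    ("A",): {"x": 0.55, "y": 0.45},
    ("B",): {"x": 0.9, "y": 0.1},
}

def next_token_probs(prefix):
    return TOY_MODEL[tuple(prefix)]

def greedy_search(steps):
    """Always append the single most probable next token."""
    seq, log_p = [], 0.0
    for _ in range(steps):
        probs = next_token_probs(seq)
        token = max(probs, key=probs.get)
        seq.append(token)
        log_p += math.log(probs[token])
    return seq, math.exp(log_p)

def beam_search(steps, k):
    """Keep the k most probable partial sequences at every step."""
    beams = [([], 0.0)]  # (sequence, log probability)
    for _ in range(steps):
        candidates = []
        for seq, log_p in beams:
            for token, p in next_token_probs(seq).items():
                candidates.append((seq + [token], log_p + math.log(p)))
        # Prune to the k highest-scoring extensions.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    best_seq, best_log_p = beams[0]
    return best_seq, math.exp(best_log_p)

print(greedy_search(2))      # ['A', 'x'], probability about 0.33
print(beam_search(2, k=2))   # ['B', 'x'], probability about 0.36
```

Greedy search commits to A because it is the most probable first token and ends with probability 0.33, while a beam of width 2 keeps B alive and finds B x with probability 0.36. With $$k = 1$$, beam search reduces to greedy search.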

It then went on to talk about the structure of transformers and how they are trained. There was discussion about things like backpropagation, activation functions, padding, subword tokens, new word tokens, etc., but I haven't included notes on these here, as resources like [Understanding deep learning](/technical-books/understanding-deep-learning.md) go into much more detail.