Generative pre-trained transformers

https://www.coursera.org/learn/chatgpt

This course was clear but quite short. It spent some time on n-gram models too, which was a useful way to build up to transformers, but it meant there was less time to cover transformers in detail. Overall, I felt Generative AI with large language models was a better introduction to the topic, going into much more detail while still being fairly concise.

I only made Anki cards at the time rather than separate notes, so the ones below only cover the topics that stood out to me rather than everything in the course.

  • Causal language models predict the next word, while masked language models predict a word that has been masked (hidden), which could be anywhere in the sentence, allowing the model to use both left and right context (see the first sketch after this list)

  • One option in language modelling is to use the Markov assumption, resulting in n-gram models

  • n-gram models are particularly vulnerable to data sparsity issues (illustrated in the bigram sketch after this list)

  • Intrinsic evaluation: How well does the model predict hold-out data?

    • Often measured using perplexity: $PP(W) = \sqrt[N]{\frac{1}{P(w_1, w_2, \dots, w_N)}}$ (computed in the perplexity sketch after this list)

  • Extrinsic evaluation: How well does the model do on specific downstream tasks?

  • Search approaches

    • Greedy search

      • Always pick the highest probability token

      • May not result in the highest probability sequence

      • Can result in bland/uninteresting text

    • Beam search

      • Maintain the $k$ highest-probability sequences ($k$ usually set somewhere between 2 and 20); see the search sketch after this list
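
To make the causal/masked distinction above concrete, here is a minimal sketch using the Hugging Face transformers library (my own example, not from the course); bert-base-uncased and gpt2 are simply common representatives of each model family.

```python
from transformers import pipeline

# Masked language model: fills in a hidden token using context on both sides.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("The cat sat on the [MASK]."))

# Causal language model: predicts the next tokens from left context only.
generator = pipeline("text-generation", model="gpt2")
print(generator("The cat sat on the", max_new_tokens=5))
```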
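
The bigram model below is a small sketch of the Markov assumption and the data sparsity problem; the corpus is invented for illustration.

```python
from collections import Counter, defaultdict

# Toy corpus; a real n-gram model would be estimated from far more text.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Markov assumption (bigram case): P(w_i | w_1..w_{i-1}) ~ P(w_i | w_{i-1}),
# estimated as count(w_{i-1}, w_i) / count(w_{i-1}).
bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def bigram_prob(prev, word):
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return counts[word] / total if total else 0.0

print(bigram_prob("the", "cat"))   # seen bigram gets a sensible probability
print(bigram_prob("the", "bird"))  # unseen bigram gets probability 0: data sparsity
```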
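
Perplexity is usually computed in log space; the sketch below uses made-up token probabilities to show the equivalent calculation.

```python
import math

# Made-up conditional probabilities a model might assign to each token
# of a held-out sentence.
token_probs = [0.2, 0.1, 0.4, 0.25]

# PP(W) = (1 / P(w_1, ..., w_N)) ** (1 / N)
#       = exp(-(1 / N) * sum(log P(w_i | history)))
N = len(token_probs)
avg_log_prob = sum(math.log(p) for p in token_probs) / N
perplexity = math.exp(-avg_log_prob)
print(perplexity)  # lower perplexity means the model is less surprised by the data
```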
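
Finally, a toy sketch of beam search over a hypothetical next-token table (standing in for a real model's conditional probabilities); setting $k = 1$ recovers greedy search and shows how it can miss the highest-probability sequence.

```python
import math

# Hypothetical next-token distributions keyed by the previous token,
# standing in for a real language model's conditional probabilities.
next_probs = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.4, "dog": 0.4, "<e>": 0.2},
    "a":   {"cat": 0.9, "<e>": 0.1},
    "cat": {"<e>": 1.0},
    "dog": {"<e>": 1.0},
    "<e>": {},
}

def beam_search(k=2, steps=3):
    beams = [(0.0, ["<s>"])]  # each beam entry is (log probability, sequence)
    for _ in range(steps):
        candidates = [
            (log_p + math.log(p), seq + [token])
            for log_p, seq in beams
            for token, p in next_probs[seq[-1]].items()
        ]
        if not candidates:
            break
        beams = sorted(candidates, reverse=True)[:k]  # keep the k best sequences
    return beams

print(beam_search(k=2))  # finds "<s> a cat <e>" with probability 0.36
print(beam_search(k=1))  # greedy search settles for "<s> the dog <e>" with probability 0.24
```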

The course then went on to cover the structure of transformers and how they are trained. There was discussion of things like backpropagation, activation functions, padding, subword tokens, and new word tokens, but I haven't included notes here as books like Understanding deep learning go into much more detail.
