Generative AI in action
Part 1: Foundations of generative AI
This was all pretty introductory. A lot of it would be useful to someone new to the topic, but I don't feel like I personally learned anything.
Part 2: Advanced techniques and applications
Chapter 6: Guide to prompt engineering
Self-consistency sampling improves the reliability of language model outputs by generating multiple independent responses and analyzing their consistency.
Indirect prompt injection embeds commands in seemingly normal content, for example by having the model read files containing new instructions.
Chapter 7: Retrieval-augmented generation
Data grounding involves connecting the model to external sources like databases, APIs, or knowledge graphs to improve accuracy and traceability.
Maximum Inner Product Search (MIPS) finds the vectors in a dataset that have the highest dot product with a query vector.
Marginalization involves considering multiple possible retrieved documents/passages when generating a response, rather than relying on just the top result, by aggregating or "marginalizing" over different pieces of retrieved evidence.
Sparse retrievers use traditional keyword-based methods like TF-IDF or BM25 that rely on exact word matches, while dense retrievers look at vector similarity in an embedding space.
Dense Passage Retrieval (DPR) is a neural information retrieval system that uses two BERT-based encoders to map both queries and passages into dense vector representations.
A vector index is a specialized data structure that organizes high-dimensional vectors for efficient similarity search using techniques like approximate nearest neighbour algorithms (e.g., HNSW, IVF).
Similarity measures:
Cosine similarity is ideal for text and document comparisons where vector orientation matters more than magnitude. It is commonly used in NLP tasks.
Euclidean (L2) distance measures the actual geometric distance between points in space, making it suitable for spatial and clustering applications.
The dot product is computationally efficient for high-dimensional sparse vectors and useful in recommendation systems where vector magnitudes are meaningful.
Hamming distance measures the number of positions at which two sequences differ, making it suitable for comparing binary strings or for error detection in data transmission.
Manhattan (L1) distance calculates the sum of absolute differences between coordinates, making it useful for grid-based pathfinding and when individual dimension differences need to be emphasized.
Chunking breaks documents into smaller segments for retrieval systems. Approaches should consider factors like semantic coherence, size consistency, and overlap. Common ones include fixed-length splits, sentence/paragraph/section boundaries, sliding windows, and semantic chunking based on NLP.
Adaptive chunking dynamically adjusts segment sizes based on content characteristics, semantic boundaries, and context requirements rather than using fixed-length splits.
A dynamic retrieval window adjusts the amount of retrieved context based on the complexity of the query.
Fallback strategies should be considered for cases where none of the retrieved chunks are relevant.
Chapter 8: Chatting with your data
This chapter was based around an example of applying RAG. There wasn't much in the way of new concepts.
Chapter 9: Tailoring models with model adaptation and fine-tuning
Part 3: Deployment and ethical considerations
This section is a mixture of big-picture stuff and lists of libraries, metrics, etc. that you might want to use. While a lot of it is valuable, there's nothing in particular that it seems to make sense to summarize here. I'll just check the book itself when applicable (or do some searching online for the areas like the list of libraries that will be hopelessly out of date within a few months).
Last updated