# Generative AI in action

## Part 1: Foundations of generative AI

This was all pretty introductory. A lot of it would be useful to someone new to the topic, but I don't feel like I personally learned anything.

## Part 2: Advanced techniques and applications

### Chapter 6: Guide to prompt engineering

**Self-consistency sampling** improves the reliability of language model outputs by generating multiple independent responses (at non-zero temperature) and selecting the answer that the responses most often agree on.
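The voting step can be sketched in a few lines. This is my own illustration, not from the book: `sample_fn` stands in for a stochastic LLM call, and the stub below simulates a model that usually (but not always) gives the right answer.

```python
from collections import Counter

def self_consistent_answer(sample_fn, n_samples: int = 10) -> str:
    """Draw several independent answers and keep the majority vote."""
    answers = [sample_fn() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Stub for a temperature > 0 model call: answers "42" most of the time.
_stub_answers = iter(["42", "42", "41", "42", "42"])
print(self_consistent_answer(lambda: next(_stub_answers), n_samples=5))  # prints 42
```

In the full technique the model produces chain-of-thought reasoning and only the final answer is extracted and voted on; the voting logic is the same.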

**Indirect** prompt injection embeds commands in seemingly normal content, for example by having the model read files containing new instructions.

### Chapter 7: Retrieval-augmented generation

Data **grounding** involves connecting the model to external sources like databases, APIs, or knowledge graphs to improve accuracy and traceability.

**Maximum Inner Product Search (MIPS)** finds the vectors in a dataset that have the highest dot product with a query vector.
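A brute-force version makes the definition concrete (real systems use approximate indexes for scale, but the objective is the same). This sketch is my own, not from the book:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mips(query, vectors, k=2):
    """Return indices of the k vectors with the highest inner product with query."""
    scores = [(dot(query, v), i) for i, v in enumerate(vectors)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

data = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(mips([1.0, 1.0], data, k=2))  # prints [2, 1]
```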

**Marginalization** involves considering multiple possible retrieved documents/passages when generating a response, rather than relying on just the top result, by aggregating or "marginalizing" over different pieces of retrieved evidence.

**Sparse** retrievers use traditional keyword-based methods like TF-IDF or BM25 that rely on exact word matches, while **dense** retrievers look at vector similarity in an embedding space.

**Dense Passage Retrieval (DPR)** is a neural information retrieval system that uses two BERT-based encoders to map both queries and passages into dense vector representations.

A vector **index** is a specialized data structure that organizes high-dimensional vectors for efficient similarity search using techniques like approximate nearest neighbour algorithms (e.g., HNSW, IVF).

Similarity measures:

* Cosine similarity is ideal for text and document comparisons where vector orientation matters more than magnitude. It is commonly used in NLP tasks.
* Euclidean (L2) distance measures the actual geometric distance between points in space, making it suitable for spatial and clustering applications.
* The dot product is computationally efficient for high-dimensional sparse vectors and useful in recommendation systems where vector magnitudes are meaningful.
* Hamming distance measures the number of positions at which two sequences differ, making it suitable for comparing binary strings or for error detection in data transmission.
* Manhattan (L1) distance calculates the sum of absolute differences between coordinates, making it useful for grid-based pathfinding and when individual dimension differences need to be emphasized.
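All five measures are one-liners, which makes the differences easy to see side by side. A minimal sketch (my own, not from the book):

```python
import math

def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Direction only: magnitude is normalized away.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot_product(a, b) / (norm_a * norm_b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def hamming_distance(a, b):
    # Positions where the two sequences differ.
    return sum(x != y for x, y in zip(a, b))

a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(round(cosine_similarity(a, b), 3))   # 1.0 — same direction, different magnitude
print(hamming_distance("10110", "10011"))  # 2
```

Note how `b` is just `a` doubled: cosine similarity is 1.0 while Euclidean and dot-product scores differ, which is exactly the orientation-vs-magnitude distinction above.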

**Chunking** breaks documents into smaller segments for retrieval systems. Approaches should consider factors like semantic coherence, size consistency, and overlap. Common ones include fixed-length splits, sentence/paragraph/section boundaries, sliding windows, and semantic chunking based on NLP.
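The simplest of these, a fixed-length sliding window with overlap, can be sketched as follows (my own illustration; word-based splitting is a stand-in for the token-based splitting a real pipeline would use):

```python
def chunk_words(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into fixed-length word chunks, with overlap between neighbours
    so that content near a boundary appears in both adjacent chunks."""
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

print(chunk_words("one two three four five six seven", chunk_size=4, overlap=2))
# prints ['one two three four', 'three four five six', 'five six seven']
```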

**Adaptive** chunking dynamically adjusts segment sizes based on content characteristics, semantic boundaries, and context requirements rather than using fixed-length splits.

A **dynamic retrieval window** adjusts the amount of retrieved context based on the complexity of the query.

Fallback strategies should be considered for cases where none of the retrieved chunks are relevant.

### Chapter 8: Chatting with your data

This chapter was based around an example of applying RAG. There wasn't much in the way of new concepts.

### Chapter 9: Tailoring models with model adaptation and fine-tuning

Plenty of good stuff and sensible advice in this chapter, but nothing that I've not seen before in places such as [Generative AI with large language models](/technical-courses/generative-ai-with-large-language-models.md), [Machine learning in production](/technical-courses/machine-learning-in-production.md), and [DeepLearning.AI short courses](/technical-courses/deeplearning.ai-short-courses.md#improving-the-accuracy-of-llm-applications).

## Part 3: Deployment and ethical considerations

This section is a mixture of big-picture stuff and lists of libraries, metrics, etc. that you might want to use. While a lot of it is valuable, nothing in particular seems worth summarizing here. I'll just check the book itself when applicable (or search online for areas like the library lists, which will be hopelessly out of date within a few months).

