Advanced SQL Server masterclass for data analytics

https://www.udemy.com/course/advanced-sql-server-masterclass-for-data-analysis/

I thought this course gave a good overview of topics that aren't typically covered in a basic SQL course. There were also plenty of practical examples and exercises.

Some of the topics covered:

Window functions
Correlated subqueries
EXISTS and NOT EXISTS
PIVOT
Generating data series with Recursive CTEs
Using CTEs and temporary tables to break processes into manageable steps
Defining and manipulating tables with DDL and DML
Lookup tables
SQL optimization techniques
Procedural programming techniques like variables and IF statements
User-defined functions
Stored procedures
Dynamic SQL

Notes

Window functions perform calculations across a set of rows that are related to the current row. They operate on a window of data, which is a set of rows defined by a partition and an ordering. This allows for things like running totals, rankings, and moving averages without the need for self-joins or subqueries.

Scalar subqueries return a single value, while table-valued subqueries return multiple rows and columns. (You can also have column subqueries and row subqueries.)

Correlated subqueries depend on the outer query for their values. They are executed once for each row processed by the outer query.

Common table expressions (CTEs) are named temporary result sets. They can improve the readability and maintainability of complex queries by breaking them down into simpler parts. Recursive CTEs are useful for generating sequences.

You can also store intermediate steps in temporary tables. These allow you to persist the data for the duration of the session or to apply an index. They also help with debugging as you can check the values directly, but they can't be used recursively. You would use temporary tables over CTEs for large datasets or in cases where the same intermediate result is used multiple times/across multiple queries.

An index is a data structure that allows the database engine to locate rows more easily. While indexes can significantly speed up SELECT queries, they can slow down INSERT, UPDATE, and DELETE operations.

A clustered index determines the order in which the records are physically stored. This makes them useful for , but means that each table can only have one clustered index. A non-clustered index is stored separately with pointers to the data. This means that you can have multiple such indices and that they have less of an impact on INSERT, UPDATE, and DELETE operations, though they require additional storage.

Hash joins use a hash table to find matching rows, while merge joins sort both tables and iterate through them simultaneously. This means that merge joins are more memory efficient, but they may be slower if the tables aren't already sorted.

Concurrency control is important to ensure data integrity and consistency. Some potential issues:

Lost update: One transaction's changes are overwritten by another.
Dirty read: Reading uncommitted data from another transaction.
Non-repeatable read: Getting different results from repeated reads within a transaction.
Phantom read: New rows appear in a repeated read.

Two-phase locking ensures that all locks are acquired before any are released, which guarantees serializability but can lead to deadlocks. Optimistic concurrency control assumes conflicts are rare and checks for conflicts before committing. If a conflict is detected, the transaction is rolled back. Multiversion concurrency control keeps multiple copies of the data.

Views are virtual tables based on the result set of an SQL statement. They encapsulate queries and provide a layer of abstraction over the underlying table structure. They can simplify data access, enforce security by restricting access to certain columns or rows, and provide a consistent interface to the data even if the underlying table structure changes. Materialized (or indexed) views store the results, which can be useful from a performance perspective. Updatable views allow INSERT, UPDATE, and DELETE statements.

An execution plan is a step-by-step breakdown of how the database engine will execute a given query. It shows the operations performed, their order, and the estimated cost of each step, which is useful for query optimization.

Variables can be used to store intermediate results, control program flow, or parameterize queries. They are particularly useful in stored procedures and functions for creating more dynamic and flexible code.

User-defined functions (UDFs) are custom functions that encapsulate commonly used logic or calculations. They promote code reuse, improve maintainability, and can simplify complex queries, but can sometimes impact query performance.

Stored procedures are precompiled collections of SQL statements with optional control-flow statements.

Functions are more about "getting something", while stored procedures are more about "doing something".

Dynamic SQL statements are constructed at runtime, often by concatenating strings. This allows for more flexible queries that can adapt based on user input or changing conditions, but can make queries harder to debug and maintain. It is important to parameterize them or apply input validation in order to prevent SQL injection attacks. Dynamic SQL is often used in scenarios where the exact structure of a query isn't known until runtime.

Additional comments from Claude

I asked Claude 3.5 Sonnet if I'd missed anything or got anything wrong. Its suggestions are below. I don't recall covering indexes being mentioned in the course, but it did discuss the other points.

Subqueries: You could add that scalar subqueries can be used in SELECT, WHERE, and HAVING clauses, while table-valued subqueries are often used in FROM clauses or with IN operators.
Indexes: It might be worth mentioning that covering indexes (which include all columns needed for a query) can be particularly beneficial for performance.
Concurrency Control: You could add that these issues are often managed through different isolation levels in database transactions.
Views: It's worth noting that not all views are updatable; it depends on the complexity of the view definition.
Variables, UDFs, and Stored Procedures: It's worth noting that the exact implementation and capabilities can vary between different database systems.

Last updated 10 months ago