Learn Databases: Lecture 50 – Advanced Analytics with Window Functions. This lecture introduces Window Functions.
We’ve learned about aggregate functions like `COUNT()`, `SUM()`, and `AVG()`, which summarize data for an entire group. However, these functions collapse the rows into a single summary row. What if you want to perform a calculation across a set of related rows, but still return the individual rows?
This is where Window Functions come in. They perform calculations across a set of table rows that are somehow related to the current row. Unlike aggregate functions, window functions do not collapse the rows into a single output row; they return a value for each row.
The `OVER()` Clause
The key to a window function is the `OVER()` clause. This clause defines the “window” or set of rows on which the function operates. It can include:
- `PARTITION BY`: Divides the rows into groups (like `GROUP BY`, but doesn’t collapse rows).
- `ORDER BY`: Orders the rows within each partition.
Example: Ranking Students
Let’s use our `Enrollments` table. Imagine we want to rank students within each course based on their grade. We’ll need to convert grades to numbers for this.
SELECT
s.StudentName,
c.CourseName,
e.Grade,
CASE e.Grade
WHEN 'A' THEN 4
WHEN 'B' THEN 3
WHEN 'C' THEN 2
WHEN 'D' THEN 1
WHEN 'F' THEN 0
ELSE NULL
END AS GradeValue,
RANK() OVER (PARTITION BY c.CourseName ORDER BY GradeValue DESC) AS CourseRank
FROM Enrollments AS e
INNER JOIN Students AS s ON e.Student_ID = s.StudentID
INNER JOIN Courses AS c ON e.Course_ID = c.CourseID
WHERE e.Grade IS NOT NULL;
Let’s break down `RANK() OVER (PARTITION BY c.CourseName ORDER BY GradeValue DESC)`:
- `RANK()`: This is the window function. It assigns a rank to each row within its partition.
- `OVER()`: Defines the window.
- `PARTITION BY c.CourseName`: Tells the function to treat each `CourseName` as a separate group. The ranking will restart for each new course.
- `ORDER BY GradeValue DESC`: Tells the function to rank students within each course based on their `GradeValue` in descending order (highest grade gets rank 1).
Other Common Window Functions
- `ROW_NUMBER()`: Assigns a unique sequential integer to each row within its partition.
- `DENSE_RANK()`: Similar to `RANK()`, but assigns consecutive ranks without gaps.
- `NTILE(N)`: Divides the rows in a partition into N groups and assigns a group number.
- `LAG(column, offset, default)`: Accesses data from a previous row in the same result set.
- `LEAD(column, offset, default)`: Accesses data from a subsequent row in the same result set.
- `AVG() OVER (…)`, `SUM() OVER (…)`, etc.: You can use aggregate functions as window functions to get a running average, sum, etc.
Why are these useful?
Window functions are incredibly powerful for:
- Ranking: Top N lists, leaderboards.
- Running Totals/Averages: Calculating cumulative sums or moving averages.
- Comparisons: Comparing a row’s value to the previous or next row’s value.
- Advanced Analytics: Many complex business intelligence queries rely on window functions.
In the Next Lecture…
We will continue our final project by exploring more complex queries and scenarios within the Student Enrollment System, solidifying our understanding of joins, subqueries, and window functions.
Lecture 49: Organizing Complex Queries with CTEsv
