After learning about different joins, it’s now time to dive into self-joins! At first glance, self-joins might sound complicated, but they’re not. Think of it as a table joining itself, like a table of Amazon transactions comparing itself for insights. It’s simpler than it seems! 😉
In this tutorial, we’ll explore self-joins using the Goodreads dataset to create personalized book recommendations.
Self-Joins Example: Personalizing Book Suggestions
Imagine you’re part of the Goodreads team designing a recommendation system. Your goal is to suggest books based on a user’s preferences. For example, if a user loves romance, you’ll suggest more romantic books.
Finding Similar Books Within the Same Genre
Here’s a query that generates meaningful recommendations:
SELECT
b1.genre,
b1.book_title AS current_book,
b2.book_title AS suggested_book
FROM goodreads AS b1
INNER JOIN goodreads AS b2
ON b1.genre = b2.genre
WHERE b1.book_id != b2.book_id
ORDER BY b1.book_title;
Breakdown:
Self-Join on Genre: The
ON
clause matches rows with the same genre.Exclude Identical Books: The
WHERE
clause ensures no book is matched with itself.Output: A list of genres, current books, and suggested books.
Example Output:
current_genre | current_book | suggested_book |
Non-Fiction | Ace the Data Science Interview | Data Engineering Cookbook |
Non-Fiction | Ace the Data Science Interview | Building Data-Intensive Apps with Flask and SQLAlchemy |
Non-Fiction | Ace the Data Science Interview | Blink: The Power of Thinking Without Thinking |
Non-Fiction | Ace the Data Science Interview | The Power of Habit |
Non-Fiction | Ace the Data Science Interview | Storytelling with Data: A Data Visualization Guide |
Cool, right? 🤓 You’re crafting personalized recommendations for book enthusiasts!
Expanding the Suggestions: A Second Book
What if you wanted to recommend not one but two additional books? Here’s how you can level up your query:
Taking Book Recommendations to the Next Level
SELECT
b1.genre,
b1.book_title AS current_book,
b2.book_title AS suggested_book_1,
b3.book_title AS suggested_book_2
FROM goodreads AS b1
INNER JOIN goodreads AS b2
ON b1.genre = b2.genre
INNER JOIN goodreads AS b3
ON b1.genre = b3.genre
WHERE b1.book_id != b2.book_id
AND b1.book_id != b3.book_id
AND b2.book_id != b3.book_id
ORDER BY b1.book_title
LIMIT 50;
Breakdown:
Additional Self-Joins: The query includes two more self-joins to fetch a second suggestion.
Avoid Repeated Suggestions: Conditions ensure that all three books in the recommendation set are distinct.
Performance Tip: The
LIMIT
clause prevents performance issues with large datasets.
Example Output:
current_genre | current_book | suggested_book_1 | suggested_book_2 |
Non-Fiction | Ace the Data Science Interview | Python for Data Analysis | Designing Data-Intensive Applications |
Non-Fiction | Ace the Data Science Interview | Python for Data Analysis | Data Engineering with Python |
Non-Fiction | Ace the Data Science Interview | Python for Data Analysis | Fundamentals of Data Engineering |
Non-Fiction | Ace the Data Science Interview | Python for Data Analysis | Educated: A Memoir |
Non-Fiction | Ace the Data Science Interview | Python for Data Analysis | Data Science for Business |
And there you have it! Self-joins in action to make recommendations more insightful.
Self-Join SQL Interview Question
Well-Paid Employees
Here’s a classic SQL interview question: Identify employees who earn more than their direct managers.
Problem Setup:
You’re given a table employee
:
employee_id | name | salary | department_id | manager_id |
1 | Emma Thompson | 3800 | 1 | 6 |
2 | Daniel Rodriguez | 2230 | 1 | 7 |
3 | Olivia Smith | 7000 | 1 | 8 |
4 | Noah Johnson | 6800 | 2 | 9 |
5 | Sophia Martinez | 1750 | 1 | 11 |
6 | Liam Brown | 13000 | 3 | NULL |
7 | Ava Garcia | 12500 | 3 | NULL |
8 | William Davis | 6800 | 2 | NULL |
Query:
SELECT
e1.employee_id,
e1.name AS employee_name
FROM employee AS e1
INNER JOIN employee AS e2
ON e1.manager_id = e2.employee_id
WHERE e1.salary > e2.salary;
Breakdown:
Self-Join on Manager ID: Matches employees with their managers.
Condition: Filters rows where the employee’s salary is greater than their manager’s salary.
Example Output:
employee_id | employee_name |
3 | Olivia Smith |
Olivia Smith earns $7,000, which is more than her manager, William Davis, who earns $6,800.
Summary
Self-Joins allow a table to join itself, often to find relationships within the same dataset.
They’re useful for tasks like recommendations and hierarchical data analysis.
Practice with queries like book suggestions or comparing employee salaries to master self-joins confidently.