SQL Self-Joins with Examples

After learning about different joins, it’s now time to dive into self-joins! At first glance, self-joins might sound complicated, but they’re not. Think of it as a table joining itself, like a table of Amazon transactions comparing itself for insights. It’s simpler than it seems! 😉

In this tutorial, we’ll explore self-joins using the Goodreads dataset to create personalized book recommendations.


Self-Joins Example: Personalizing Book Suggestions

Imagine you’re part of the Goodreads team designing a recommendation system. Your goal is to suggest books based on a user’s preferences. For example, if a user loves romance, you’ll suggest more romantic books.

Finding Similar Books Within the Same Genre

Here’s a query that generates meaningful recommendations:

SELECT
  b1.genre,
  b1.book_title AS current_book,
  b2.book_title AS suggested_book
FROM goodreads AS b1
INNER JOIN goodreads AS b2
  ON b1.genre = b2.genre
WHERE b1.book_id != b2.book_id
ORDER BY b1.book_title;

Breakdown:

  • Self-Join on Genre: The ON clause matches rows with the same genre.

  • Exclude Identical Books: The WHERE clause ensures no book is matched with itself.

  • Output: A list of genres, current books, and suggested books.

Example Output:

current_genrecurrent_booksuggested_book
Non-FictionAce the Data Science InterviewData Engineering Cookbook
Non-FictionAce the Data Science InterviewBuilding Data-Intensive Apps with Flask and SQLAlchemy
Non-FictionAce the Data Science InterviewBlink: The Power of Thinking Without Thinking
Non-FictionAce the Data Science InterviewThe Power of Habit
Non-FictionAce the Data Science InterviewStorytelling with Data: A Data Visualization Guide

Cool, right? 🤓 You’re crafting personalized recommendations for book enthusiasts!


Expanding the Suggestions: A Second Book

What if you wanted to recommend not one but two additional books? Here’s how you can level up your query:

Taking Book Recommendations to the Next Level

SELECT
  b1.genre,
  b1.book_title AS current_book,
  b2.book_title AS suggested_book_1,
  b3.book_title AS suggested_book_2
FROM goodreads AS b1
INNER JOIN goodreads AS b2
  ON b1.genre = b2.genre
INNER JOIN goodreads AS b3
  ON b1.genre = b3.genre
WHERE b1.book_id != b2.book_id
  AND b1.book_id != b3.book_id
  AND b2.book_id != b3.book_id
ORDER BY b1.book_title
LIMIT 50;

Breakdown:

  • Additional Self-Joins: The query includes two more self-joins to fetch a second suggestion.

  • Avoid Repeated Suggestions: Conditions ensure that all three books in the recommendation set are distinct.

  • Performance Tip: The LIMIT clause prevents performance issues with large datasets.

Example Output:

current_genrecurrent_booksuggested_book_1suggested_book_2
Non-FictionAce the Data Science InterviewPython for Data AnalysisDesigning Data-Intensive Applications
Non-FictionAce the Data Science InterviewPython for Data AnalysisData Engineering with Python
Non-FictionAce the Data Science InterviewPython for Data AnalysisFundamentals of Data Engineering
Non-FictionAce the Data Science InterviewPython for Data AnalysisEducated: A Memoir
Non-FictionAce the Data Science InterviewPython for Data AnalysisData Science for Business

And there you have it! Self-joins in action to make recommendations more insightful.


Self-Join SQL Interview Question

Well-Paid Employees

Here’s a classic SQL interview question: Identify employees who earn more than their direct managers.

Problem Setup:

You’re given a table employee:

employee_idnamesalarydepartment_idmanager_id
1Emma Thompson380016
2Daniel Rodriguez223017
3Olivia Smith700018
4Noah Johnson680029
5Sophia Martinez1750111
6Liam Brown130003NULL
7Ava Garcia125003NULL
8William Davis68002NULL

Query:

SELECT
  e1.employee_id,
  e1.name AS employee_name
FROM employee AS e1
INNER JOIN employee AS e2
  ON e1.manager_id = e2.employee_id
WHERE e1.salary > e2.salary;

Breakdown:

  • Self-Join on Manager ID: Matches employees with their managers.

  • Condition: Filters rows where the employee’s salary is greater than their manager’s salary.

Example Output:

employee_idemployee_name
3Olivia Smith

Olivia Smith earns $7,000, which is more than her manager, William Davis, who earns $6,800.


Summary

  • Self-Joins allow a table to join itself, often to find relationships within the same dataset.

  • They’re useful for tasks like recommendations and hierarchical data analysis.

  • Practice with queries like book suggestions or comparing employee salaries to master self-joins confidently.