Click here to Skip to main content
15,881,938 members
Articles / Database Development

Joins vs Subqueries SQL Puzzle

Rate me:
Please Sign up or sign in to vote.
4.00/5 (1 vote)
7 Mar 2022CPOL5 min read 4.2K   6   4
Puzzle to teach joins vs subqueries in SQL
Solving puzzles is a great way to learn SQL. In this puzzle, we’re going to learn about joins vs subqueries.

In many queries, you can substitute joins and subqueries. Yet, since each has its strengths, it isn’t wise to do so. Once you have worked through this puzzle, you see some of the benefits and disadvantages of joins vs subqueries.

Solving puzzles is a great way to learn SQL. Nothing beats practicing what you’ve learned. Once you have figured out the puzzle, post your answer in the comments so we all can learn from one another.

SQL Puzzle Question – Joins vs Subqueries

The system admin would like a report of active users. Can you write a report that returns the name, phone number and most recent date for any user that has logged in the 30 days since March14th, 2017?

(You can tell a user has logged in if the action field in UserHistory is set to “Logged On”).

User

  • userID
  • name
  • phoneNumber

UserHistory

  • userID
  • actionDate
  • action

Every time a user logs in, a new row is inserted into the UserHistory table with userID, current date and action (where action = “Logged On”).

Question #1 – Write a SQL query without using Subqueries.

Question #2 – Write the SQL using subqueries.

As you answer the questions, think about the differences of joins vs subqueries. Which technique is better suited to solve the problem? Can you see advantages to either case?

To get you started, download this script which contains the table definitions and some test data.

Answer to Question #1 – INNER JOIN

Like most queries I write, I took this one step by step. Let me show you what I mean by walking you through my thought process.

Regardless of how this query turned out, I knew I would have to figure out how to get the logins 30 days since 2017-03-14. To do this, I wrote a simple query that used the DATEDIFF function to calculate the number of days from the actionDate to the 14th. If the days were less than or equal to 30, I knew the entry was within the window.

SQL
SELECT *,
       DATEDIFF(DAY, actionDate, '2017-03-14')
FROM   @UserHistory
WHERE  action = 'Logged On'
       AND DATEDIFF(DAY, actionDate, '2017-03-14') <= 30;

DATEDIFF Coding Hack

You’ll see that I included the result of the DATEDIFF in query result. I did this because I always get the parameter mixed up! By displaying the result, I could verify that I had the formula correct; therefore, being used in my WHERE clause as expected.

Here is what I saw:

DATEDIFF result

Now that I knew I could get actionDate within the window, I proceeded to join the UserHistory with User so I could include the PhoneNumber in my results. I joined the tables by userID.

SQL
SELECT U.Name,
       U.PhoneNumber,
       UH.actionDate AS RecentLogonDate
FROM   @User AS U
       INNER JOIN @UserHistory AS UH
       ON U.userID = UH.userID
WHERE  action = 'Logged On'
       AND DATEDIFF(DAY, actionDate, '2017-03-14') <= 30;

Expected Duplicates

Given there are several UserHistory entries per user, I would expect to see duplicates. I’ve highlighted some of those below:

Duplicate Entries Due to Join

The last task which remains is to only show the most recent login date for each user. For Bob, this would be 2017-03-02.

Notice, that for each user, the most recent date is the maxim date for that user. This is a big hint, as it points us towards using aggregate functions such as MAX.

Joins versus Subqueries - GROUBYBY hint

So, at this point, we can use GROUP BY to calculate the maximum date for each user.

SQL
SELECT   U.Name,
         U.PhoneNumber,
         MAX(UH.actionDate) AS RecentLogonDate
FROM     @User AS U
         INNER JOIN @UserHistory AS UH
         ON U.userID = UH.userID
WHERE    action = 'Logged On'
         AND DATEDIFF(DAY, actionDate, '2017-03-14') <= 30
GROUP BY U.Name, U.phoneNumber;

Whose result is:

 Final INNER JOIN Result

Key Takeaways

There are a couple of takeaways from this:

  • First, here is no shame is doing a query to get intermediate results. When using formulas, such as DATEDIFF, make sure you’re getting the calculation you expect.
  • Second, it can take more than one technique to solve a problem. In our case, we first join the items, and once we have the result, use GROUP BY to get the most recent date.
  • Last, keep it simple! At first, I thought I would have to do another join so I could compare dates, but I realized that was going to get complicated. Also, I figured I could partition the data and use a window function, but that seemed like overkill. Once I realized I simply had to calculate the MAX, I realized GROUP BY would work.

I’m sure there are other ways to solve this puzzle. So, what answer did you get? Please share in the comments.

Answer to Question #2 – Subquery

Let’s look at the second piece of the puzzle when it comes to joins vs subqueries; the subquery.

This query is elegant in that IN operator naturally remove duplicates, but gets ugly as another sub query is needed to pull in the most recent date.

What you’ll see is that subqueries are really great when you need to return rows from one table based on the existence of one or more conditions in another, but not so swift at combining data from one table with another.

To show you what I mean, check out this query which shows the user and phone number for an actionDate within the windows:

SQL
SELECT U.Name,
       U.PhoneNumber
FROM   @User AS U
WHERE  U.UserID IN (SELECT UserID
                    FROM   @UserHistory
                    WHERE  action = 'Logged On'
                    AND DATEDIFF(DAY, actionDate, '2017-03-14') <= 30);

Natural Uniqueness via Subquery

In this case, we naturally get a unique list of name and phone numbers as we are listing users.

The uniqueness is guaranteed by the fact the table naturally contains unique value (yes, I know I didn’t define a Primary Key on the table, but hopefully you see it could be one).

The subquery in the WHERE clause compiles a list of userIDs whose actionDate falls within the 30-day window.

Joins versus Subqueries - Subquery but no ActionDate

Of course, our query is only partially written, as it doesn’t display the latest actionDate.

If I could dream a bit (the dreamy bits are in bold), I would love to be able to write a query like:

SQL
SELECT U.Name,
       U.PhoneNumber,
       (SELECT Max(UH.actionDate)
        FROM   @UserHistory AS UH
        WHERE  action = 'Logged On'
        AND DATEDIFF(DAY, actionDate, '2017-03-14') <= 30
AND    U.userID = UH.userID) AS RecentLogonDate
FROM   @User AS U
WHERE  DATEDIFF(DAY, RecentLogonDate, '2017-03-14') <= 30
       AND RecentLogonDate IS NOT NULL

But this isn’t a properly written query, and SQL gets mad!

Msg 207, Level 16, State 1, Line 62
Invalid column name 'RecentLogonDate'.
Msg 207, Level 16, State 1, Line 63
Invalid column name 'RecentLogonDate'.

Forced To Use Two Subqueries!

What makes our subquery ugly is being forced to use two subqueries to retrieve the most recent actionDate. To do this, you must essentially repeat the query, but now as a correlated sub query. It just a match of the User table UserID to that in UserHistory.

SQL
SELECT U.Name,
       U.PhoneNumber,
       (SELECT Max(UH.actionDate)
        FROM   @UserHistory AS UH
        WHERE  action = 'Logged On'
               AND DATEDIFF(DAY, actionDate, '2017-03-14') <= 30
               AND U.userID = UH.userID) AS RecentLogonDate
FROM   @User AS U
WHERE  U.UserID IN (SELECT UserID
                    FROM   @UserHistory
                    WHERE  action = 'Logged On'
                    AND DATEDIFF(DAY, actionDate, '2017-03-14') <= 30);

Conclusion

Now that you’ve seen the differences of joins vs subqueries, I’m hoping you have a better understanding on when to use one versus the other. To summarize, joins excel at combining data from two tables, subqueries are best when testing for the existence of a value from one table found in another.

What are your experiences with joins versus subqueries? Which do you prefer to use when? Please share in the comments!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Easy Computer Academy, LLC
United States United States
Hello my name is Kris. I’m here because I am passionate about helping non-techie people to overcome their fear of learning SQL.

I know what it is like to not know where to start or whether the time spent learning is worth the effort. That is why I am here to help you to:
- Get started in an easy to follow step-by-step manner.
- Use your time wisely so you focus on what is important to learn to get the most value from your time.
- Answer your questions. Really! Just post a comment and I’ll respond. I’m here to help.

It wasn’t long ago that I was helping a colleague with some reporting. She didn’t know where to start and soon got overwhelmed and lost as she didn’t know SQL.

I felt really bad, as she was under pressure to get some summary information to her boss, the built-in reports were falling short, and to make them better would require her to know SQL. At that time that seemed impossible! It in dawned on me, it doesn’t have to be that way.

Then I discovered a way for anyone with the desire to easily learn SQL. I worked with my co-worker, started to teach her what I learned and soon she was able to write reports and answer her boss’ questions without getting stressed or ploughing hours into manipulating data in Excel.

It hasn’t always been easy. Sometimes the information seems abstract or too conceptual. In this case I’ve found out that a visual explanation is best. I really like to use diagrams or videos to explain hard-to-grasp ideas.

Having video, pictures, and text really help to reinforce the point and enable learning.

And now I want to help you get the same results.

The first step is simple, click here http://www.essentialsql.com/get-started-with-sql-server/

Comments and Discussions

 
PraiseMessage Closed Pin
6-Apr-22 3:57
Member 126108696-Apr-22 3:57 
QuestionSolution Option for join Pin
Mark Runyan11-Mar-22 7:02
Mark Runyan11-Mar-22 7:02 
QuestionSimplified Sub Query, Date Compare Pin
David On Life9-Mar-22 10:03
David On Life9-Mar-22 10:03 
SuggestionCombination is key Pin
Minifuss9-Mar-22 1:52
Minifuss9-Mar-22 1:52 
SuggestionCte Pin
djhislop1498-Mar-22 3:45
djhislop1498-Mar-22 3:45 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.