main banner

Development

How to improve SQL performance - Antipatterns

Ever come close to writing a perfectly looking and valid SQL just to see it take hours to run? SQL optimization might be the thing you need! For business and customers, time is limited and precious and as Software Developers, QA Engineers, DB Developers, etc we should strive to improve the efficiency of our queries to save their, and our time.

To improve the performance of your SQL query, you will need first to consider that the query goes through several steps at execution and is parsed into a “parse tree”. Then, the query is analyzed to validate its syntactical and semantic requirements. The parser creates an internal representation of the input query. This output is then passed on to the rewrite engine.

 

After these steps, the SQL optimizer tries to find the optimal execution plan, which defines the algorithm used for each operation and how the execution of the operations is orchestrated. The optimizer enumerates all possible execution plans, determines the quality or cost of each plan, takes information about the current database state, and then chooses the best one as the final execution plan. Because query optimizers can be imperfect, database users and administrators sometimes need to manually examine and tune the plans produced by the optimizer to get better performance.

 

The number of disk I/Os that are required to evaluate the plan, the plan’s CPU cost and the overall response time that can be observed by the database client, and the total execution time are essential factors to determine the quality or cost of each execution plan.

 

Next, the chosen execution plan is run, evaluated by the system’s execution engine, and the results of your query are returned.

 

The main “pain” points where issues might be present for optimization are:

  • The WHERE clause;

  • Any INNER JOIN or LEFT JOIN keywords; And,

  • The HAVING clause;

In this post, I will be focusing on the anti-patterns to try to avoid and the aspects to consider for improving the execution time of your queries.

WHERE Clause

Context is important. Whether you need to rewrite your query or not and how you would do it depends on the amount of data, the database, and the number of times you need to execute the query, among other things.

1. Only SELECT the columns you need, LIKE responsibly, and look out for DISTINCT results.

SELECT Statement

Always remove unnecessary columns from your SELECT statement so you make a habit to pull up data that serves your query goal.

 

If you have correlated subqueries that use EXISTS, try to use a constant in the SELECT statement of the subquery instead of selecting the value of an actual column. 

 

For example:

 

 SELECT security_id, name                                   

 FROM security

 WHERE EXISTS                                             

        (SELECT '1'                                              

        FROM rating                                             

        WHERE rating.security_id = security.security_id); 

Tip: You can always consider getting rid of correlated subqueries by rewriting them with an INNER JOIN:

 SELECT security_id, name                                   
 
FROM security                                             
 
INNER JOIN rating ON rating.security_alias = security.security_alias; 

 

DISTINCT Clause

 

The SELECT DISTINCT statement is used to return only different values for the record columns included in your query. You should try to avoid the use of DISTINCT if possible since its inclusion increases the execution time. 

 

LIKE Operator

 

When you use the LIKE operator in a query, the index isn’t used if the pattern starts with % or _. It will prevent the database from using an index - if it is available. 

 

If not used correctly, it might cause too many records to be retrieved that don’t necessarily satisfy your query goal. Your knowledge of the data stored in the database can help you formulate a pattern that will filter correctly through all the data to find only the rows that matter for your query.

 

2. Limit Your Results

 

TOP, LIMIT, And ROWNUM Clauses

 

You can add the LIMIT or TOP clauses to your queries to set a maximum number of rows for the result set. Here are some examples:

 

SQL Server

  SELECT TOP 3 * 

  FROM security;

 

MySQL

  SELECT * 

  FROM security

  LIMIT 3;

 

Oracle

  SELECT * 

  FROM security

  ORDER BY security_id

  FETCH FIRST 3 ROWS ONLY;

 

It is also possible to specify the PERCENT instead of a specific number if you change the first line of the query by SELECT TOP 50 PERCENT *.

 

SQL Server

  SELECT TOP 50 PERCENT * 

  FROM security;

 

MySQL

  SELECT * 

  FROM security

  WHERE RAND() < 0.5

 

Oracle

  SELECT * 

  FROM security

  ORDER BY security_id

  FETCH FIRST 50 PERCENT ROWS ONLY;

 

Additionally, you can also add the ROWNUM clause, which is equivalent to using LIMIT in your query:

 

  SELECT *

  FROM rating

  WHERE alpha = ‘A’ AND ROWNUM <= 3;

 

Data Type Conversions

 

Whenever possible, use the smallest data types possible for higher efficiency. However, consider that whenever you add data type conversion to your query, you increase the execution time.

 

An alternative is to avoid data type conversion as much as possible. Although it is not always possible to remove or omit the data type conversion from your queries, you should aim to be careful in including them and that when you do, you test the effect of the addition before you run the query.

 

3. Keep it simple

 

Queries can become complex quickly. Be careful and consider the next tips to keep them simple.

 

OR Operator

 

When you use the OR operator in your query, likely, you’re not using an index.

 

Remember that an index is a data structure that improves the speed of the data retrieval in your database table, but it comes at a cost: there will be additional writes and additional storage space is needed to maintain the index data structure. Indexes are used to quickly locate or lookup data without having to search every row in a database every time the database table is accessed. Indexes can be created by using one or more columns in a database table.

 

If you don’t make use of the indexes that the database includes, your query will inevitably take longer to run. That’s why it’s best to look for alternatives to using the OR operator in your query.

 

Consider the following query:

 

SELECT security_id, name

FROM security

WHERE security_id = 123456

OR security_id = 678910

OR security_id = 345678;

 

You can replace the operator with one of the following approaches:

 

  • A condition with IN operator; or


SELECT security_id, name

FROM security

WHERE security_id IN (123456, 678910, 345678);

 

  • Two SELECT statements with a UNION.

 

SELECT security_id, name

FROM security

WHERE security_id = 123456;

UNION

SELECT security_id, name

FROM security

WHERE security_id = 678910

UNION

SELECT security_id, name

FROM security

WHERE security_id = 345678;

 

Tip: Be careful not to unnecessarily use the UNION operation because you go through the same table multiple times. Also, you have to realize that when you use a UNION in your query, the execution time will increase. Alternatives to the UNION operation are: reformulating the query in such a way that all conditions are placed in one SELECT instruction or using an OUTER JOIN instead of UNION.

 

NOT Operator

 

When your query contains the NOT operator, likely, the index is not used, just like with the OR operator. This will inevitably slow down your query. If you don’t know what is meant here, consider the following query:

 

SELECT security_id, name

FROM security

WHERE NOT (creation_year > 2020);

 

This query will run slower than you would maybe expect, mainly because it’s formulated a lot more complex than it could be: in cases like this one, it’s best to look for an alternative. Consider replacing NOT by comparison operators, such as >, <> or!>; The example above might indeed be rewritten and become something like this:

 

SELECT security_id, name

FROM security

WHERE creation_year <= 2020;

 

AND Operator

 

The AND operator is another one that doesn’t make use of the index, and that can slow your query down if used in an overly complex and inefficient way, like in the example below:

 

SELECT security_id, name

FROM security

WHERE creation_year >= 2011 AND creation_year <= 2020;

 

It’s better to rewrite this query and use BETWEEN operator:

 

SELECT security_id, name

FROM security

WHERE creation_year BETWEEN 2011 AND 2020;

 

ANY and ALL Operators

 

Similar to the previously discussed OR, AND, and NOT operators, indexes configured in the database will not be used when using the ANY and ALL operators. Alternatives that will come in handy here are aggregation functions like MIN or MAX.

 

Tip: in cases where you make use of the proposed alternatives, you should be aware of the fact that all aggregation functions like SUM, AVG, MIN, MAX over many rows can result in a long-running query. In such cases, you can try to either minimize the number of rows to handle or pre-calculate these values. You see once again that it’s important to be aware of your environment, your query goal, … when you make decisions on which query to use!

 

Isolate Columns in Conditions

 

Also in cases where a column is used in a calculation or a scalar function, the index isn’t used. A possible solution would be to simply isolate the specific column so that it is no longer a part of the calculation or the function. Consider the following example:

 

SELECT security_id, name

FROM security

WHERE creation_date + 10 = 2020;

 

Instead, try reconsidering the calculation and rewriting the query to something like this:

 

SELECT security_id, name

FROM security

WHERE creation_date = 2010;

 

4. No Brute Force

 

This last tip means that you shouldn’t try to restrict the query too much because it can affect its performance. This is especially true for joins and the HAVING clause.

 

JOIN

 

The Order Of Tables

 

When you join two tables, it can be important to consider the order of the tables in your join. If you notice that one table is considerably larger than the other one, you might want to rewrite your query so that the biggest table is placed last in the join.

 

Redundant Conditions on Joins

 

When you add too many conditions to your joins, you force SQL to choose a particular path. It may be the case that this path isn’t the most efficient one.

 

HAVING Clause

 

HAVING is typically used with the GROUP BY clause to restrict the groups of returned rows to only those that meet certain conditions. However, if you use this clause in your query, the index is not used, which can result in longer execution times.

 

If you’re looking for an alternative, consider using the WHERE clause. Consider the following queries:

  SELECT type, COUNT(*)

  FROM security

  WHERE type IN ('FI', 'EQ')

  GROUP BY type

  ORDER BY type

 

  SELECT type, COUNT(*)

  FROM security

  GROUP BY type 

  HAVING state IN ('FI', 'EQ')

  ORDER BY type 

 

The first query uses the WHERE clause to restrict the number of rows that need to be summed, whereas the second query sums up all the rows in the table and then uses HAVING to throw away the sums it calculated. In these types of cases, the alternative with the WHERE clause is the better one, as you don’t waste any resources.

 

You will see that this is not about limiting the result set, but instead about limiting the intermediate number of records within a query.

 

Note that the difference between these two clauses lies in the fact that the WHERE clause introduces a condition on individual rows, while the HAVING clause introduces a condition on aggregations or results of a selection where a single result, such as MIN, MAX, SUM,… has been produced from multiple rows.

 

Evaluating the quality or cost of the execution plan, writing and rewriting of queries is not an easy job when you take into account that they need to be as efficient as possible. Avoiding anti-patterns and considering alternatives will also be a part of the responsibility when you write queries that you want to run on databases in a professional environment. Hopefully, this post will help beginners to improve their SQL skills. If you are interested to learn more about SQL and query optimization, I would recommend taking the EDX course: Querying with Transact-SQL, this will give you more insight into SQL and why it is important to provide value to users. 

 

Sources:

1 https://courses.edx.org/courses/course-v1:Microsoft+DAT201x+2015_T4/courseware/ae9138f9a2d54963a0d1e99d4cf1c433/d11902daaf0c42319e5e00da854839e3/

2 https://www.quest.com/community/blogs/b/database-management/posts/sql-query-optimization-best-practices-for-improved-performance

3 https://www.sqlshack.com/query-optimization-techniques-in-sql-server-the-basics/

Jorge C.

Tech and gaming enthusiast. He loves coffee, action/old/mecha movies, and anime.

Articles