We had a report of crashes occurring for certain users when accessing a system. From the stack data in the production logs, a timeout was occurring when running a specific stored procedure. This procedure was written around 5 years ago and is in use in many customer databases without issue. Why would the same SQL suddenly start timing out in one particular database?

The stored procedure in question is called for users with certain permissions to highlight outstanding units of work that their access level permits them to do, and is a fairly popular (and useful) feature of the software.

After obtaining session information from the crash logs, it was time to run the procedure on a copy of the live database with session details. The procedure only reads information, but doing this on a copy helps ensure no ... accidents occur.

sql
EXEC [Data].[GetX] @strSiteId = 'XXX', @strUserGroupId = 'XXX', @strUserName = 'XXX'

And it took... 27 seconds to return 13 rows. Not good, not good at all.

An example of a warning and explanation in a query plan
An example of a warning and explanation in a query plan

Viewing the query plan showed something interesting though - one of the nodes was flagged with a warning symbol, and when the mouse was hovered over it it stated

Type conversion in expression (CONVERT_IMPLICIT(nvarchar(50),[Pn].[SiteId],0)) may affect "CardinalityEstimate" in query plan choice

Time to check the procedure's SQL as there shouldn't actually be any conversions being done, let alone implicit ones.

I can't publish the full SQL in this blog, so I've chopped out all the table names and field names and used dummy aliases. The important bits for the purposes of this post are present though, although I apologize that it's less than readable now.

sql
CREATE PROCEDURE [Data].[GetX]
  @strSiteId nvarchar (50)
, @strUserGroupId varchar (20)
, @strUserName nvarchar (50)
AS
BEGIN

  SELECT [Al1].[X]
       , [Al1].[X]
       , [Al1].[X]
       , [Al1].[X]
    INTO [#Access]
    FROM [X].[X] [Al1]
   WHERE [Al1].[X] = @strUserName
     AND [Al1].[X] = @strUserGroupId
     AND [Al1].[X] = 1
     AND [Al1].[X] = 1

  SELECT DISTINCT [Pn].[Id] [X]
             FROM [Data].[X] [Pn]
       INNER JOIN [Data].[X] [Al2]
               ON [Al2].[X]      = [Pn].[Id]
              AND [Al2].[X]      = 0
       INNER JOIN [Data].[X] [Al3]
               ON [Al3].[X]      = [Al2].[Id]
              AND [Al3].[X]      = 0
       INNER JOIN [Data].[X] [Al4]
               ON [Al4].[X]      = [Al3].[Id]
              AND [Al4].[X]      = 0
       INNER JOIN [Data].[X] [Al5]
               ON [Al5].[X]     = [Al4].[Id]
              AND [Al5].[X]     = 0
              AND [Al5].[X]     = 1
              AND [Al5].[X]     = 0
       INNER JOIN [#Access]
               ON [#Access].[X] = [Al5].[X]
              AND [#Access].[X] = [Al2].[X]
              AND [#Access].[X] = [Al3].[X]
              AND [#Access].[X] = [Al4].[X]
            WHERE EXISTS (
                           SELECT [X]
                             FROM [X].[X] [Al6]
                            WHERE [Al5].[X]   = [Al6].[X]
                              AND [Al5].[X]   = [Al6].[X]
                              AND [Al6].[X]   = 1
                         )
              AND [Pn].[SiteId] = @strSiteId;
  
  DROP TABLE [#Access]

END;

The SQL is fairly straight forward - we join a bunch of different data tables together based on permissions, data status and where the [SiteId] column matches the lookup value, return return a unique list of core identifiers. With the exception of [SiteId] all those joins on [Id] columns are integers.

Yes, [SiteId] is the primary key in a table. Yes, I know it isn't a good idea using string keys. It was a design decision made over 8 years ago and I'm sure at some point these anomalies will be changed. But it's a side issue to what this post is about.

As the warning from the query plan is quite explicit about the column it's complaining about, it is now time to check the definition of the table containing the [SiteId] column. Again, I'm not at liberty to include anything other than the barest information to show the problem.

sql
CREATE TABLE [X].[X]
(
  [SiteId] varchar(50) NOT NULL CONSTRAINT [PK_X] PRIMARY KEY
  ...
);
GO

Can you see the problem? The table defines [SiteId] as varchar(50) - that is, up to 50 ASCII characters. The stored procedure on the other hand defines the @strSiteId parameter (that is used as a WHERE clause for [SiteId]) as nvarchar(50), i.e. up to 50 Unicode characters. And there we go, implicit conversion from Unicode to ASCII that for some (still unknown at this stage) reason destroyed the performance of this particular database.

After changing the stored procedure (remember I'm on a copy of the production database!) to remove that innocuous looking n, I reran the procedure which completed instantly. And the warning has disappeared from the plan.

A plan for the same procedure after deleting a single character
A plan for the same procedure after deleting a single character

The error probably originally occurred as a simple oversight - almost all character fields in the database are nvarchar's. Those that are varchar are ones that control definition data that cannot be entered, changed or often even viewed by end users. Anything that the end user can input is always nvarchar due to the global nature of the software in question.

Luckily, it's a simple fix, although potentially easy to miss, especially as you might immediately assume the SQL itself is to blame and try to optimize that.

The take away from this story is simple - ensure that the data types for variables you use in SQL match the data types of the fields to avoid implicit conversions that can cause some very unexpected and unwelcome performance issues - even years after you originally wrote the code.

Update History

  • 2016-05-06 - First published
  • 2020-11-21 - Updated formatting

Like what you're reading? Perhaps you like to buy us a coffee?

Donate via Buy Me a Coffee

Donate via PayPal


Comments