Searching for a procedure or diagnosis code in any position
All the data in this article is fake data generated artificially.
Searching for episodes coded with particular diagnosis or procedure codes (ICD or OPCS codes) is a bread-and-butter task for NHS information analysts. When the requestor is only interested in the primary diagnosis, or dominant procedure, the query is very straightforward: you can get everything you need from your episodes table.
However, the situation becomes a little more complex when you need to find episodes where the code could be in any position.
In this case, it helps to have a table which contains one row per procedure code, that looks something like this:
SELECT TOP 100
EpisodeID,
ProcedureNumber,
ProcedureCode
FROM [procedure]
ORDER BY EpisodeID, ProcedureNumber
Each EpisodeID is appears in multiple rows - one for each procedure coded as happening during the episode.
If you join this table on to your episodes table, and use the WHERE
clause, you might be OK in simple cases, but you risk duplication if the same procedure is coded more than once in the same episode. It’s also not easy to see how to extend the query to search for episodes with a particular combination of codes.
The EXISTS operator
One alternative that I like is the EXISTS
operator together with a correlated sub-query. An EXISTS
condition is true if the sub-query that follows it returns some rows. In our case, we’re going to check the procedures table with a sub-query to see if it contains the procedure code we’re interested in for each row of the episode table.
If we’re searching for an episode with a particular procedure code in any position, you can write something like this:
SELECT ipe.EpisodeID
,ipe.StartDate
,ipe.EndDate
FROM [episode] ipe
WHERE EXISTS (SELECT 1
FROM [procedure] ipproc
WHERE ipproc.EpisodeID = ipe.EpisodeID
AND ipproc.ProcedureCode = 'L291')
One advantage of this approach is that you can be completely confident every row of your output represents a single episode. Another is that it is straightforward to amend the subquery or add extra subqueries to search for a combination of diagnosis and procedure codes.
For example, to search for episodes with the procedure code L291 and the diagnosis code I501, you could write:
SELECT ipe.EpisodeID
,ipe.StartDate
,ipe.EndDate
FROM [episode] ipe
WHERE EXISTS (SELECT 1
FROM [procedure] ipproc
WHERE ipproc.EpisodeID = ipe.EpisodeID
AND ipproc.ProcedureCode = 'L291')
AND
EXISTS (SELECT 1
FROM [diagnosis] ipdiag
WHERE ipdiag.EpisodeID = ipe.EpisodeID
AND ipdiag.DiagnosisCode = 'I501')
This also gives a rather neat way to search for two consecutive procedure codes. This is a rare ask, but it does come up occasionally in reference to approach codes.
SELECT ipe.EpisodeID
,ipe.StartDate
,ipe.EndDate
FROM [episode] ipe
WHERE EXISTS (SELECT 1
FROM [procedure] ipproc
LEFT JOIN [procedure] ipproc2
ON ipproc.EpisodeID = ipproc2.EpisodeID
AND ipproc.ProcedureNumber + 1 =
ipproc2.ProcedureNumber
WHERE ipproc.EpisodeID = ipe.EpisodeID
AND ipproc.ProcedureCode = 'L291'
AND ipproc2.ProcedureCode = 'Y081'
)
Using a CTE
Another alternative is to prepare a CTE with the EpisodeIDs you need. This can also be a helpful approach if you want to find multiple cohorts of patients at the same time, such as different arms of a national clinical audit. For example, you could write:
WITH cohort AS (
SELECT DISTINCT EpisodeID
,'Cohort 1' AS 'CohortName'
FROM [procedure] ipproc
WHERE ipproc.ProcedureCode = 'L291'
UNION ALL
SELECT DISTINCT EpisodeID
,'Cohort 2' AS 'CohortName'
FROM [procedure] ipproc
WHERE ipproc.ProcedureCode = 'L292'
)
SELECT ipe.EpisodeID
,cohort.CohortName
,ipe.StartDate
,ipe.EndDate
FROM cohort
LEFT JOIN episode ipe
ON cohort.EpisodeID = ipe.EpisodeID
If you don’t want to duplicate episodes which belong to more than one cohort in the output, you could use a series of CTEs, de-duplicate with a window function, or pivot the table on the cohort name column.
What not to do
It’s not uncommon for NHS data warehouses to contain a “wide” version of the episodes table, with a separate column for each diagnosis code. One good reason to have a table like this is that it is closer to the input format required by the Local Payment Grouper. It looks something like this:
SELECT TOP 100
ipewide.EpisodeID
,ipewide.StartDate
,ipewide.EndDate
,[1], [2], [3], [4], [5]
,[6], [7], [8], [9], [10]
,[11], [12], [13], [14], [15]
,[16], [17], [18], [19], [20]
FROM episodewide ipewide
Don’t be tempted to try and use this table to search for a procedure in any position. I’ve come across two anti-patterns that can arise using this kind of table.
WHERE
clause including every column
/* I don't recommend this. */
SELECT ipewide.EpisodeID
FROM episodewide ipewide
WHERE ipewide.[1] = 'L291'
OR ipewide.[2] = 'L291'
OR ipewide.[3] = 'L291'
OR ipewide.[4] = 'L291'
-- ... etc etc
OR ipewide.[20] = 'L291'
The query is very long and verbose, even in simple cases. That makes it very awkward to read later. It’s also irritating to make edits to the search criteria, and it’s easy to introduce mistakes. It is not worth it.
Concatenate all columns and use LIKE
/* I don't recommend this. */
SELECT ipe.EpisodeID
FROM episodewide ipewide
WHERE CONCAT([1], [2], [3], [4], [5],
[6], [7], [8], [9], [10],
[11], [12], [13], [14], [15],
[16], [17], [18], [19], [20])
LIKE '%L291%'
Again this is a rather verbose query, and it will be rather slow.
Final thoughts
- Some people prefer using the
ANY
keyword instead ofEXISTS
- in many cases it’s straightforward to rewrite theANY
version asEXISTS
and vice versa. - Clinical coders are knowledgeable, hard-working and often under-appreciated. If a coder helps you out with an ad-hoc and you get some good feedback - think about sharing it with them!
- Searching all the procedures and diagnoses isn’t always the right thing to do. If you only want spells where the procedure in question was the most signficant thing to happen to the patient, it might be better just to search the dominant procedure of the spell.