Show all diagnoses or procedures in a single row
All the data in this article is fake data generated artificially.
It’s sometimes helpful to produce a listing of activity along with the full list of diagnoses or procedures associated with that activity. For inpatients, that usually means starting with an episode table, containing one row per inpatient episode, and a diagnosis or procedure table, containing one row per coded diagnosis or procedure. For each episode, there can be many diagnoses or procedures. The question is, what’s the best way to squash down the diagnoses or procedures into a single row?
Here are a couple of ways to go about this.
STRING_AGG
Since SQL Server 2017, Microsoft have provided a function that gives us what we need with no fuss.
SELECT TOP 100
ipe.EpisodeID
,STRING_AGG(ipdiag.DiagnosisCode, '; ')
WITHIN GROUP (ORDER BY ipdiag.DiagnosisNumber)
AS 'Diagnoses'
FROM [episode] ipe
LEFT JOIN [diagnosis] ipdiag
ON ipdiag.EpisodeID = ipe.EpisodeID
GROUP BY ipe.EpisodeID
EpisodeID | Diagnoses |
---|---|
1 | X225; T844; M4948; T869; W365; R457; K116 |
2 | S322; V250 |
3 | S2741; F317; C845; M6084; M9431; W142; W697; S3691; O842; M9995; M8890 |
4 | M8906; X213; M2419; I7010; M7268; M9432 |
5 | M2333; A491; M894; X204; M8935; Z889; W224; S758; A052; M660 |
6 | S364; W202; W012; W105; M1388; S650; A185; M5387; V909; D748; M844; M779; P962; B250; R098; R634; O200; Y821; Q355 |
7 | T222; X613; M1910; L918; M493; L669 |
8 | M023; M8642; M2587; K388; C090; M8387; V155; P711; M913; O894; T451; M7183 |
9 | C799; L282; P960; B838; D000; M8101; T799; M7696; M8325; M1023 |
10 | X899; M8930; I378; Y228; M0138; M810; M9433; C20X; M1229; T959 |
It’s straightforward to finesse this:
- to add code descriptions, left join to the lookup table
- to put each code on a separate line within an Excel cell, replace
'; '
with'; ' + CHAR(10)
.
STUFF FOR XML PATH
If the compatibility level of your database is too low to take advantage of STRING_AGG
, there is a hallowed trick to fall back on. I’m not sure who first came up with this, but I think I first saw it on Stack Overflow.
SELECT TOP 100
ipe.EpisodeID
,STUFF(
(SELECT '; ' + ipproc.ProcedureCode
FROM [procedure] ipproc
WHERE ipproc.EpisodeID = ipe.EpisodeID FOR XML PATH ('')), 1, 1, ''
) AS 'Procedures'
FROM episode ipe
GROUP BY ipe.EpisodeID
ORDER BY ipe.EpisodeID
EpisodeID | Procedures |
---|---|
1 | T691; F039; Z333; A572; H572; G798 |
2 | Z775; A621; K613; T898; S229; F558; Y848; W562; Z702; V465; K692; Y119 |
3 | H336; J098; A313; W253; M132; M158; J639; W881; J662; W324; W541; O381; A222; M668 |
4 | K528; K328; S039; G299; V462 |
5 | D069; K595; G352 |
6 | F095; Y892; Z291; Z982; S551; W742 |
7 | L744; L496; X224; W593; Z999; Z438; M669; R098; Y501; W509; G438; D031; D078 |
8 | Y418; A482; Y371; L715; X405; E438; V561; K072; O142; S129; O101; A041; C866; X304; D222; G801; X656; Y028 |
9 | F521; U011; M208; C491; S577; G013; W974 |
10 | C143; M295; O368; B372; X326; O408; S328; W041; K568; J348; Z098; E469; T968 |
If you want to understand exactly how this works, check out this excellent Stack Overflow answer. If I end up using a pattern like this in a query that I know others will have to use or maintain, I tend to leave a comment indicating what’s going on.
A less palatable option
I have seen attempts to achieve this effect using the following pattern:
/* I don't recommend this. */
SELECT TOP 100
ipe.EpisodeID
,CONCAT(ipdiag1.DiagnosisCode + '; ',
ipdiag2.DiagnosisCode + '; ',
-- etc etc
ipdiag19.DiagnosisCode + '; ',
ipdiag20.DiagnosisCode)
AS 'Diagnoses'
FROM episode ipe
LEFT JOIN diagnosis ipdiag1
ON ipdiag1.EpisodeID = ipe.EpisodeID
AND ipdiag1.DiagnosisNumber = 1
LEFT JOIN diagnosis ipdiag2
ON ipdiag2.EpisodeID = ipe.EpisodeID
AND ipdiag2.DiagnosisNumber = 2
LEFT JOIN diagnosis ipdiag3
ON ipdiag3.EpisodeID = ipe.EpisodeID
AND ipdiag3.DiagnosisNumber = 3
LEFT JOIN diagnosis ipdiag4
ON ipdiag4.EpisodeID = ipe.EpisodeID
AND ipdiag4.DiagnosisNumber = 4
-- etc, etc
LEFT JOIN diagnosis ipdiag19
ON ipdiag19.EpisodeID = ipe.EpisodeID
AND ipdiag19.DiagnosisNumber = 19
LEFT JOIN diagnosis ipdiag20
ON ipdiag20.EpisodeID = ipe.EpisodeID
AND ipdiag20.DiagnosisNumber = 20
This approach leaves you with a very verbose query - it’s pretty hard to debug a query when you can’t see the wood for the trees. It also forces you to decide the maximum number of diagnoses you want to include for each episode.