Show all diagnoses or procedures in a single row

All the data in this article is fake data generated artificially.

It’s sometimes helpful to produce a listing of activity along with the full list of diagnoses or procedures associated with that activity. For inpatients, that usually means starting with an episode table, containing one row per inpatient episode, and a diagnosis or procedure table, containing one row per coded diagnosis or procedure. For each episode, there can be many diagnoses or procedures. The question is, what’s the best way to squash down the diagnoses or procedures into a single row?

Here are a couple of ways to go about this.

STRING_AGG

Since SQL Server 2017, Microsoft have provided a function that gives us what we need with no fuss.


SELECT TOP 100 
        ipe.EpisodeID
        ,STRING_AGG(ipdiag.DiagnosisCode, '; ')
        WITHIN GROUP (ORDER BY ipdiag.DiagnosisNumber)
          AS 'Diagnoses'

FROM [episode] ipe
LEFT JOIN [diagnosis] ipdiag
ON ipdiag.EpisodeID = ipe.EpisodeID

GROUP BY ipe.EpisodeID
Table 1: Displaying records 1 - 10
EpisodeID Diagnoses
1 X225; T844; M4948; T869; W365; R457; K116
2 S322; V250
3 S2741; F317; C845; M6084; M9431; W142; W697; S3691; O842; M9995; M8890
4 M8906; X213; M2419; I7010; M7268; M9432
5 M2333; A491; M894; X204; M8935; Z889; W224; S758; A052; M660
6 S364; W202; W012; W105; M1388; S650; A185; M5387; V909; D748; M844; M779; P962; B250; R098; R634; O200; Y821; Q355
7 T222; X613; M1910; L918; M493; L669
8 M023; M8642; M2587; K388; C090; M8387; V155; P711; M913; O894; T451; M7183
9 C799; L282; P960; B838; D000; M8101; T799; M7696; M8325; M1023
10 X899; M8930; I378; Y228; M0138; M810; M9433; C20X; M1229; T959

It’s straightforward to finesse this:

  • to add code descriptions, left join to the lookup table
  • to put each code on a separate line within an Excel cell, replace '; ' with '; ' + CHAR(10).

STUFF FOR XML PATH

If the compatibility level of your database is too low to take advantage of STRING_AGG, there is a hallowed trick to fall back on. I’m not sure who first came up with this, but I think I first saw it on Stack Overflow.

SELECT TOP 100 
    ipe.EpisodeID 
    ,STUFF(
                 (SELECT '; ' + ipproc.ProcedureCode 
                 FROM [procedure] ipproc 
                 WHERE ipproc.EpisodeID = ipe.EpisodeID FOR XML PATH ('')), 1, 1, ''
           ) AS 'Procedures'

FROM episode ipe

GROUP BY ipe.EpisodeID

ORDER BY ipe.EpisodeID
Table 2: Displaying records 1 - 10
EpisodeID Procedures
1 T691; F039; Z333; A572; H572; G798
2 Z775; A621; K613; T898; S229; F558; Y848; W562; Z702; V465; K692; Y119
3 H336; J098; A313; W253; M132; M158; J639; W881; J662; W324; W541; O381; A222; M668
4 K528; K328; S039; G299; V462
5 D069; K595; G352
6 F095; Y892; Z291; Z982; S551; W742
7 L744; L496; X224; W593; Z999; Z438; M669; R098; Y501; W509; G438; D031; D078
8 Y418; A482; Y371; L715; X405; E438; V561; K072; O142; S129; O101; A041; C866; X304; D222; G801; X656; Y028
9 F521; U011; M208; C491; S577; G013; W974
10 C143; M295; O368; B372; X326; O408; S328; W041; K568; J348; Z098; E469; T968

If you want to understand exactly how this works, check out this excellent Stack Overflow answer. If I end up using a pattern like this in a query that I know others will have to use or maintain, I tend to leave a comment indicating what’s going on.

A less palatable option

I have seen attempts to achieve this effect using the following pattern:

/* I don't recommend this. */

SELECT TOP 100 
    ipe.EpisodeID
    ,CONCAT(ipdiag1.DiagnosisCode + '; ',
           ipdiag2.DiagnosisCode + '; ',
           -- etc etc
           ipdiag19.DiagnosisCode + '; ',
           ipdiag20.DiagnosisCode)
           AS 'Diagnoses'
    
FROM episode ipe
LEFT JOIN diagnosis ipdiag1
  ON ipdiag1.EpisodeID = ipe.EpisodeID
  AND ipdiag1.DiagnosisNumber = 1
LEFT JOIN diagnosis ipdiag2
  ON ipdiag2.EpisodeID = ipe.EpisodeID
  AND ipdiag2.DiagnosisNumber = 2
LEFT JOIN diagnosis ipdiag3
  ON ipdiag3.EpisodeID = ipe.EpisodeID
  AND ipdiag3.DiagnosisNumber = 3
LEFT JOIN diagnosis ipdiag4
  ON ipdiag4.EpisodeID = ipe.EpisodeID
  AND ipdiag4.DiagnosisNumber = 4
  
-- etc, etc

LEFT JOIN diagnosis ipdiag19
  ON ipdiag19.EpisodeID = ipe.EpisodeID
  AND ipdiag19.DiagnosisNumber = 19
LEFT JOIN diagnosis ipdiag20
  ON ipdiag20.EpisodeID = ipe.EpisodeID
  AND ipdiag20.DiagnosisNumber = 20

This approach leaves you with a very verbose query - it’s pretty hard to debug a query when you can’t see the wood for the trees. It also forces you to decide the maximum number of diagnoses you want to include for each episode.