How to take the pain out of summarising activity by date

All data in this article is fake data, generated artificially.

I work with time series most weeks. Modern electronic patient records (EPRs) record precise timestamps for many actions. But most often, I need to aggregate the figures in some way. There are rather a lot of different possible aggregations:

  • By calendar or financial year;
  • By calendar or financial month;
  • By week beginning or week ending (by the way - which day of the week is the start of the week?);
  • By day of the week;
  • By whether the day is a working day or a weekend or bank holiday;
  • etc etc.

So the question is, how do we convert the timestamp from the EPR into a field containing the week or months that the timestamp fell into?

Many of the StackOverflow solutions to this kind of problem involve writing pretty baroque SQL combining a variety of date functions (Example 1, Example 2, Example 3).

I tend to hesitate about using these kinds of snippets. I have to think hard to convince myself the code will work for any given date. Including obscure snippets creates an extra hurdle for the next analyst who has to read the code. The functions could be pretty slow, especially if you’re dealing with large tables like outpatients. And what’s the guarantee that a colleague would pick Monday as the start of the week rather than Sunday?

So instead, I tend to use a calendar table (also sometimes called a date dimension table). This is a table which contains one row for each calendar date. It might look something like this:

SELECT TOP 100 *
        
FROM lkp_calendar
Table 1: Displaying records 1 - 10
Date Financial.Year Month Week.Beginning Week.Ending Day.of.Week Working.Day
2018-01-01 2017/18 2018-01-01 01/01/2018 07/01/2018 Mon 0
2018-01-02 2017/18 2018-01-01 01/01/2018 07/01/2018 Tue 1
2018-01-03 2017/18 2018-01-01 01/01/2018 07/01/2018 Wed 1
2018-01-04 2017/18 2018-01-01 01/01/2018 07/01/2018 Thu 1
2018-01-05 2017/18 2018-01-01 01/01/2018 07/01/2018 Fri 1
2018-01-06 2017/18 2018-01-01 01/01/2018 07/01/2018 Sat 0
2018-01-07 2017/18 2018-01-01 01/01/2018 07/01/2018 Sun 0
2018-01-08 2017/18 2018-01-01 08/01/2018 14/01/2018 Mon 1
2018-01-09 2017/18 2018-01-01 08/01/2018 14/01/2018 Tue 1
2018-01-10 2017/18 2018-01-01 08/01/2018 14/01/2018 Wed 1

In real life, you tend to find a bunch more fields - but I’ve limited it to a few examples.

Summarising figures by date then becomes as easy as a LEFT JOIN:

SELECT dy.[Financial.Year]
        ,dy.[Month]
        ,COUNT(*) AS 'Finished Consultant Episodes'
        
FROM [episode] ipe
LEFT JOIN lkp_calendar dy
  ON ipe.[EndDate] = dy.[Date]

GROUP BY dy.[Financial.Year]
        ,dy.[Month]

ORDER BY dy.[Financial.Year]
        ,dy.[Month]
Table 2: Displaying records 1 - 10
Financial.Year Month Finished Consultant Episodes
2019/20 2019-01-09 286
2019/20 2019-01-10 2077
2019/20 2019-01-11 1993
2019/20 2019-01-12 2153
2019/20 2020-01-01 2075
2019/20 2020-01-02 1978
2019/20 2020-01-03 2045
2020/21 2020-01-04 2055
2020/21 2020-01-05 2146
2020/21 2020-01-06 2094

It’s also very easy to take a look at activity that takes place at the weekend or bank holidays (without having to remember when the latter are):

SELECT dy.[Financial.Year]
        ,dy.[Month]
        ,COUNT(*) AS 'Finished Consultant Episodes'
        
FROM [episode] ipe
LEFT JOIN lkp_calendar dy
  ON ipe.[EndDate] = dy.[Date]

WHERE dy.[Working.Day] = 0

GROUP BY dy.[Financial.Year]
        ,dy.[Month]

ORDER BY dy.[Financial.Year]
        ,dy.[Month]
Table 3: Displaying records 1 - 10
Financial.Year Month Finished Consultant Episodes
2019/20 2019-01-09 103
2019/20 2019-01-10 533
2019/20 2019-01-11 578
2019/20 2019-01-12 759
2019/20 2020-01-01 633
2019/20 2020-01-02 625
2019/20 2020-01-03 560
2020/21 2020-01-04 691
2020/21 2020-01-05 872
2020/21 2020-01-06 572

As an aside, it is worth being careful about the “index date” in the join to the calendar table. In the above query, that’s ipe.[EndDate], so my summary figures represent episodes which ended within each time period. This is the normal way to count spells and episodes, but generally speaking it’s worth making this choice consciously as it can make a noticeable difference to the figures.

A calendar table is also a neat way to keep control of special dates that the department needs to include in business logic. For example, adding a field to a calendar table is a good way to keep track of audit days when operating theatres are closed, or one-off events such as the junior doctors’ strikes back in 2016.

If you’re setting up a calendar table for the first time, there are lots of articles out there about good ways to get going. One consideration though is to make sure you have fields like [Month] and [Week Beginning] in datetime format, not just as a formatted string. Excel is much better at sorting dates if they are formatted as date, and if you need a particular formatting applied, you can always stick an Excel custom format on to finish the job.