Transpose Rows To Columns
One very common problem when making reports from a RDBMS, is that you need to show, data that’s stored in rows, in columns. For example, from data like this:
1 CREATE TABLE #data_a (
2 cat varchar(20) NOT NULL,
3 mon date NOT NULL,
4 val int NOT NULL
5 );
6 INSERT INTO #data_a
7 VALUES
8 ('Foo', '2011-02-01', '752'),
9 ('Qux', '2011-03-01', '700'),
10 ('Qux', '2011-03-01', '936'),
11 ('Bar', '2011-02-01', '899'),
12 ('Foo', '2011-01-01', '482'),
13 ('Qux', '2011-01-01', '349'),
14 ('Foo', '2011-03-01', '67'),
15 ('Bar', '2011-02-01', '999'),
16 ('Bar', '2011-01-01', '183');
You would need to show, for example, the first three months, of the current year, in a table with a column for each month. A straightforward way to achieve this would be:
1 SELECT
2 d.cat,
3 SUM(CASE WHEN MONTH(d.mon) = 1 THEN d.val ELSE 0 END) AS january_sum,
4 SUM(CASE WHEN MONTH(d.mon) = 2 THEN d.val ELSE 0 END) AS february_sum,
5 SUM(CASE WHEN MONTH(d.mon) = 3 THEN d.val ELSE 0 END) AS march_sum
6 FROM #data_a AS d
7 WHERE
8 YEAR(d.mon) = YEAR(CURRENT_TIMESTAMP) AND
9 MONTH(d.mon) IN (1, 2, 3)
10 GROUP BY d.cat
11 ORDER BY d.cat;
Works fine, easy to use and understand. A bit more advanced solution would be with PIVOT():
1 SELECT
2 p.cat,
3 COALESCE(p.January, 0) AS january_sum,
4 COALESCE(p.February, 0) AS february_sum,
5 COALESCE(p.March, 0) AS march_sum
6 FROM (
7 SELECT
8 d.cat,
9 DATENAME(MONTH, d.mon) AS col,
10 d.val
11 FROM #data_a AS d
12 WHERE
13 YEAR(d.mon) = YEAR(CURRENT_TIMESTAMP) AND
14 MONTH(d.mon) IN (1, 2, 3)
15 ) AS d
16 PIVOT(
17 SUM(d.val)
18 FOR d.col IN (
19 January,
20 February,
21 March
22 )
23 ) AS p
24 ORDER BY p.cat;
But I don’t feel like there’s a good reason to use it, no benefit and more awkward syntax. I’ll try to give a short explanation on what goes into PIVOT(). SUM(d.val) says to aggregate d.val by summing it, FOR d.col means that you want to move values from d.col to new columns, IN (January, February, March) determines which values get moved to new columns (and the name of those columns). Another thing to note, p.cat in SELECT, not d.cat.
OK, how about a bit different situation:
1 CREATE TABLE #data_b (
2 cat varchar(20) NOT NULL,
3 mon date NOT NULL,
4 val char(5) NOT NULL,
5 UNIQUE (cat, mon)
6 );
7 INSERT INTO #data_b
8 VALUES
9 ('Foo', '2011-01-01', 'CDHDE'),
10 ('Qux', '2011-01-01', 'DTDLI'),
11 ('Qux', '2011-03-01', 'PIQXG'),
12 ('Bar', '2011-02-01', 'TVDDI'),
13 ('Foo', '2011-02-01', 'VJKRC'),
14 ('Bar', '2011-03-01', 'XWMAG'),
15 ('Qux', '2011-02-01', 'GCIPP');
As we can’t aggregate those values, we end up with something like this:
1 SELECT
2 d.cat,
3 j.val AS january_val,
4 f.val AS february_val,
5 m.val AS march_val
6 FROM (
7 SELECT DISTINCT d.cat
8 FROM #data_b AS d
9 ) AS d
10 LEFT JOIN #data_b AS j ON
11 j.cat = d.cat AND
12 YEAR(j.mon) = YEAR(CURRENT_TIMESTAMP) AND
13 MONTH(j.mon) = 1
14 LEFT JOIN #data_b AS f ON
15 f.cat = d.cat AND
16 YEAR(f.mon) = YEAR(CURRENT_TIMESTAMP) AND
17 MONTH(f.mon) = 2
18 LEFT JOIN #data_b AS m ON
19 m.cat = d.cat AND
20 YEAR(m.mon) = YEAR(CURRENT_TIMESTAMP) AND
21 MONTH(m.mon) = 3
22 ORDER BY d.cat;
Not pretty, but works fine and is very simple.
Now to what I actually wanted to write down. While these three solutions work perfectly for some cases, they aren’t useful for others. What if you didn’t know the columns beforehand? Say, maybe you need to have a column for every day in a month? In one month you would have 31 column, in other 28 columns. Also, you have to have relative column names. In previous examples, you can’t name a column like “2011-01-01”, because that would work only for this year. That’s not a big problem with dates, but what if you’re moving user entered text into columns? It wouldn’t be possible to make sense of the result. So, here’s a decent solution for all that:
1 DECLARE @s varchar(MAX);
2 DECLARE @j varchar(MAX);
3
4 SELECT
5 @s = COALESCE(@s + ', ', '') + '[' + t.col + '].val AS [' + t.col + ']',
6 @j = COALESCE(@j + ' ', '') + 'LEFT JOIN #data_b AS [' + t.col + '] ON [' + t.col + '].cat = d.cat AND [' + t.col + '].mon = ''' + CAST(t.mon AS varchar) + ''''
7 FROM (
8 SELECT DISTINCT
9 d.mon,
10 LOWER(DATENAME(MONTH, d.mon)) + '_val' AS col
11 FROM #data_b AS d
12 WHERE
13 YEAR(d.mon) = YEAR(CURRENT_TIMESTAMP) AND
14 MONTH(d.mon) IN (1, 2, 3)
15 ) AS t
16 ORDER BY t.mon;
17
18 EXEC('
19 SELECT
20 d.cat,
21 ' + @s + '
22 FROM (
23 SELECT DISTINCT d.cat
24 FROM #data_b AS d
25 ) AS d
26 ' + @j + '
27 ORDER BY d.cat;
28 ');
I wouldn’t go as far as calling a dynamic SQL solution elegant, but I do think it’s simple and easy to fallow. Basically, you just dynamically create a query similar to the third solution. The cute trick here is how COALESCE() and SELECT @foo = works, it concatenates each row into one string. No need to use ugly WHILE loops, that I see everywhere. That’s it. BTW, as EXEC() executes in it’s own context, to get data out of it, you may want to use a global temporary table.
