layer:11

an addiction to or an obsession with acquiring, manipulating, and sharing information

Transpose Rows To Columns

One very common problem when making reports from a RDBMS, is that you need to show, data that’s stored in rows, in columns. For example, from data like this:

 1 CREATE TABLE #data_a (
 2   cat varchar(20) NOT NULL,
 3   mon date NOT NULL,
 4   val int NOT NULL
 5 );
 6 INSERT INTO #data_a
 7 VALUES
 8   ('Foo', '2011-02-01', '752'),
 9   ('Qux', '2011-03-01', '700'),
10   ('Qux', '2011-03-01', '936'),
11   ('Bar', '2011-02-01', '899'),
12   ('Foo', '2011-01-01', '482'),
13   ('Qux', '2011-01-01', '349'),
14   ('Foo', '2011-03-01', '67'),
15   ('Bar', '2011-02-01', '999'),
16   ('Bar', '2011-01-01', '183');

You would need to show, for example, the first three months, of the current year, in a table with a column for each month. A straightforward way to achieve this would be:

 1 SELECT
 2   d.cat,
 3   SUM(CASE WHEN MONTH(d.mon) = 1 THEN d.val ELSE 0 END) AS january_sum,
 4   SUM(CASE WHEN MONTH(d.mon) = 2 THEN d.val ELSE 0 END) AS february_sum,
 5   SUM(CASE WHEN MONTH(d.mon) = 3 THEN d.val ELSE 0 END) AS march_sum
 6 FROM #data_a AS d
 7 WHERE
 8   YEAR(d.mon) = YEAR(CURRENT_TIMESTAMP) AND
 9   MONTH(d.mon) IN (1, 2, 3)
10 GROUP BY d.cat
11 ORDER BY d.cat;

Works fine, easy to use and understand. A bit more advanced solution would be with PIVOT():

 1 SELECT
 2   p.cat,
 3   COALESCE(p.January, 0) AS january_sum,
 4   COALESCE(p.February, 0) AS february_sum,
 5   COALESCE(p.March, 0) AS march_sum
 6 FROM (
 7   SELECT
 8     d.cat,
 9     DATENAME(MONTH, d.mon) AS col,
10     d.val
11   FROM #data_a AS d
12   WHERE
13     YEAR(d.mon) = YEAR(CURRENT_TIMESTAMP) AND
14     MONTH(d.mon) IN (1, 2, 3)
15 ) AS d
16 PIVOT(
17   SUM(d.val)
18   FOR d.col IN (
19     January,
20     February,
21     March
22   )
23 ) AS p
24 ORDER BY p.cat;

But I don’t feel like there’s a good reason to use it, no benefit and more awkward syntax. I’ll try to give a short explanation on what goes into PIVOT(). SUM(d.val) says to aggregate d.val by summing it, FOR d.col means that you want to move values from d.col to new columns, IN (January, February, March) determines which values get moved to new columns (and the name of those columns). Another thing to note, p.cat in SELECT, not d.cat.

OK, how about a bit different situation:

 1 CREATE TABLE #data_b (
 2   cat varchar(20) NOT NULL,
 3   mon date NOT NULL,
 4   val char(5) NOT NULL,
 5   UNIQUE (cat, mon)
 6 );
 7 INSERT INTO #data_b
 8 VALUES
 9   ('Foo', '2011-01-01', 'CDHDE'),
10   ('Qux', '2011-01-01', 'DTDLI'),
11   ('Qux', '2011-03-01', 'PIQXG'),
12   ('Bar', '2011-02-01', 'TVDDI'),
13   ('Foo', '2011-02-01', 'VJKRC'),
14   ('Bar', '2011-03-01', 'XWMAG'),
15   ('Qux', '2011-02-01', 'GCIPP');

As we can’t aggregate those values, we end up with something like this:

 1 SELECT
 2   d.cat,
 3   j.val AS january_val,
 4   f.val AS february_val,
 5   m.val AS march_val
 6 FROM (
 7   SELECT DISTINCT d.cat
 8   FROM #data_b AS d
 9 ) AS d
10 LEFT JOIN #data_b AS j ON
11   j.cat = d.cat AND
12   YEAR(j.mon) = YEAR(CURRENT_TIMESTAMP) AND
13   MONTH(j.mon) = 1
14 LEFT JOIN #data_b AS f ON
15   f.cat = d.cat AND
16   YEAR(f.mon) = YEAR(CURRENT_TIMESTAMP) AND
17   MONTH(f.mon) = 2
18 LEFT JOIN #data_b AS m ON
19   m.cat = d.cat AND
20   YEAR(m.mon) = YEAR(CURRENT_TIMESTAMP) AND
21   MONTH(m.mon) = 3
22 ORDER BY d.cat;

Not pretty, but works fine and is very simple.

Now to what I actually wanted to write down. While these three solutions work perfectly for some cases, they aren’t useful for others. What if you didn’t know the columns beforehand? Say, maybe you need to have a column for every day in a month? In one month you would have 31 column, in other 28 columns. Also, you have to have relative column names. In previous examples, you can’t name a column like “2011-01-01”, because that would work only for this year. That’s not a big problem with dates, but what if you’re moving user entered text into columns? It wouldn’t be possible to make sense of the result. So, here’s a decent solution for all that:

 1 DECLARE @s varchar(MAX);
 2 DECLARE @j varchar(MAX);
 3 
 4 SELECT
 5   @s = COALESCE(@s + ', ', '') + '[' + t.col + '].val AS [' + t.col + ']',
 6   @j = COALESCE(@j + ' ', '') + 'LEFT JOIN #data_b AS [' + t.col + '] ON [' + t.col + '].cat = d.cat AND [' + t.col + '].mon = ''' + CAST(t.mon AS varchar) + ''''
 7 FROM (
 8   SELECT DISTINCT
 9     d.mon,
10     LOWER(DATENAME(MONTH, d.mon)) + '_val' AS col
11     FROM #data_b AS d
12     WHERE
13       YEAR(d.mon) = YEAR(CURRENT_TIMESTAMP) AND
14       MONTH(d.mon) IN (1, 2, 3)
15 ) AS t
16 ORDER BY t.mon;
17 
18 EXEC('
19   SELECT
20     d.cat,
21     ' + @s + '
22   FROM (
23     SELECT DISTINCT d.cat
24     FROM #data_b AS d
25   ) AS d
26   ' + @j + '
27   ORDER BY d.cat;
28 ');

I wouldn’t go as far as calling a dynamic SQL solution elegant, but I do think it’s simple and easy to fallow. Basically, you just dynamically create a query similar to the third solution. The cute trick here is how COALESCE() and SELECT @foo = works, it concatenates each row into one string. No need to use ugly WHILE loops, that I see everywhere. That’s it. BTW, as EXEC() executes in it’s own context, to get data out of it, you may want to use a global temporary table.

2011-09-07 21:52

SQLSQL Server

sp_lock Replacement

Here’s what I’ve been using instead of sp_lock:

 1 SELECT
 2   l.request_session_id AS sid,
 3   r.status,
 4   d.name AS [database],
 5   o.name AS object,
 6   l.request_mode,
 7   l.request_status,
 8   r.blocking_session_id AS blocking_sid,
 9   r.wait_type,
10   r.wait_time,
11   r.total_elapsed_time,
12   r.percent_complete,
13   p.cpu,
14   p.physical_io,
15   p.memusage,
16   p.loginame,
17   p.hostname,
18   p.program_name,
19   t.text
20 FROM sys.dm_tran_locks AS l
21 INNER JOIN sys.databases AS d ON d.database_id = l.resource_database_id
22 INNER JOIN sys.objects AS o ON o.object_id = l.resource_associated_entity_id
23 LEFT JOIN sys.dm_exec_requests AS r ON r.session_id = l.request_session_id
24 LEFT JOIN sys.sysprocesses AS p ON p.spid = l.request_session_id
25 CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
26 WHERE
27   d.database_id = DB_ID() AND
28   l.resource_type = 'OBJECT' AND
29   r.session_id != @@SPID
30 ORDER BY
31   d.name,
32   l.request_session_id,
33   o.name

2010-01-17 04:21

SQLSQL Server

String Aggregation

Say you have the following data and you want to concatenate all queries for each year.

 1 CREATE TABLE years (year smallint);
 2 INSERT INTO years VALUES
 3   (2008),
 4   (2007),
 5   (2009);
 6 
 7 CREATE TABLE top_queries (year smallint, place smallint, query varchar(50));
 8 INSERT INTO top_queries VALUES
 9   (2009,  4, 'twitter'),
10   (2008,  2, 'beijing 2008'),
11   (2008,  9, 'euro 2008'),
12   (2007,  8, 'second life'),
13   (2008,  8, 'wer kennt wen'),
14   (2009,  6, 'new moon'),
15   (2007,  3, 'facebook'),
16   (2007,  4, 'dailymotion'),
17   (2009,  8, 'windows 7'),
18   (2007,  7, 'ebuddy'),
19   (2009,  9, 'dantri.com.vn'),
20   (2009, 10, 'torpedo gratis'),
21   (2009,  3, 'tuenti'),
22   (2008,  1, 'sarah palin'),
23   (2008,  6, 'obama'),
24   (2009,  1, 'michael jackson'),
25   (2007,  1, 'iphone'),
26   (2009,  5, 'sanalika'),
27   (2007, 10, 'club penguin'),
28   (2007,  9, 'hi5'),
29   (2009,  2, 'facebook'),
30   (2008,  4, 'tuenti'),
31   (2007,  6, 'youtube'),
32   (2008,  7, 'nasza klasa'),
33   (2008,  3, 'facebook login'),
34   (2009,  7, 'lady gaga'),
35   (2007,  5, 'webkinz'),
36   (2007,  2, 'badoo'),
37   (2008,  5, 'heath ledger'),
38   (2008, 10, 'jonas brothers');

For SQL Server, a recursive CTE is good option:

 1 WITH
 2   q AS (
 3     SELECT
 4       ROW_NUMBER() OVER(PARTITION BY year ORDER BY place) AS nr,
 5       year,
 6       query
 7     FROM top_queries
 8   ),
 9   t AS (
10     SELECT
11       year,
12       1 AS query_nr,
13       CONVERT(varchar(8000), '') AS queries
14     FROM years
15     UNION ALL
16     SELECT
17       t.year,
18       t.query_nr + 1,
19       CONVERT(varchar(8000),
20         t.queries +
21         CASE WHEN LEN(t.queries) > 0 THEN ', ' ELSE '' END +
22         q.query
23       )
24     FROM t
25     INNER JOIN q ON
26       q.year = t.year AND
27       q.nr = t.query_nr
28   )
29 SELECT
30   year,
31   queries
32 FROM t
33 WHERE
34   query_nr IN (
35     SELECT MAX(nr) + 1 FROM q GROUP BY year
36   )
37 ORDER BY year;

Or maybe, although I never understood how exactly this works, FOR XML clause:

 1 SELECT
 2   year,
 3   LEFT(queries, LEN(queries) - 2) AS queries
 4 FROM (
 5   SELECT
 6     year,
 7     (
 8       SELECT q.query + ', ' AS [text()]
 9       FROM top_queries AS q
10       WHERE q.year = y.year
11       ORDER BY q.place
12       FOR XML PATH('')
13     ) AS queries
14   FROM years AS y
15   GROUP BY year
16 ) AS t
17 ORDER BY year;

In PostgreSQL, pretty much the same CTE query would work, but you’re better off with array_agg function:

1 SELECT
2   y.year,
3   array_to_string(array_agg(query), ', ') AS queries
4 FROM (
5   SELECT * FROM top_queries ORDER BY place DESC
6 ) AS q
7 INNER JOIN years AS y ON y.year = q.year
8 GROUP BY y.year
9 ORDER BY y.year;

Generating a Range of Values

For example, lets generate a set of months, from 2010-01-01 to 2011-01-01.

In SQL Server:

1 WITH t AS (
2   SELECT CONVERT(date, '2010-01-01') AS d
3   UNION ALL
4   SELECT DATEADD(MONTH, 1, d) FROM t WHERE d < '2010-12-01'
5 )
6 SELECT * FROM t ORDER BY d;

Same in PostgreSQL:

1 WITH RECURSIVE t AS (
2   SELECT '2010-01-01'::timestamp AS d
3   UNION ALL
4   SELECT d + '1 months' FROM t WHERE d < '2010-12-01'
5 )
6 SELECT * FROM t ORDER BY d

In PostgreSQL, you can also use generate_series() function, but it’s useful only in these basic cases:

1 SELECT * FROM generate_series('2010-01-01'::timestamp, '2010-12-01', '1 months');
| Archive | RSS | E-mail | Twitter | Alvis Mikovs