Thom's memory leak: 2010

Tuesday 16 November 2010

count_if (expression)

Someone just posted a problem on a PostgreSQL mailing list. They had several boolean columns, but they wanted to get how many values were true for each column, but in 1 query. There are several ways to do this, but none of which are particularly elegant.

Say we have the following table definition:

CREATE TABLE subscribers
(
    subscriber text PRIMARY KEY,
    news boolean DEFAULT false,
    sport boolean DEFAULT false,
    promotions boolean DEFAULT false,
    horrible_spam boolean DEFAULT false
);

INSERT INTO subscribers (subscriber, news, sport, promotions, horrible_spam)
VALUES
('person_a', true, false, false, false),
('person_b', true, true, true, false),
('person_c', true, false, true, true),
('person_d', false, false, false, false),
('person_e', true, true, false, false),
('person_f', true, false, true, true),
('person_g', true, false, false, false);

Now we want to get a count of how many of each subscription category has someone subscribing to it. We can use a case statement for each column:

SELECT sum(CASE news WHEN true THEN 1 END) AS "news count",
       sum(CASE sport WHEN true THEN 1 END) as "sport count",
       sum(CASE promotions WHEN true THEN 1 END) as "promotions count",
       sum(CASE horrible_spam WHEN true THEN 1 END) as "horrible_spam count"
FROM subscribers;

Which produces:

 news count | sport count | promotions count | horrible_spam count 
------------+-------------+------------------+---------------------
          6 |           2 |                3 |                   2
(1 row)

This looks almost unreadable and is very cumbersome. There's another way however, although it's no better. We use 4 subqueries:

SELECT (SELECT count(news) FROM subscribers WHERE news = true) AS "news count",
       (SELECT count(sport) FROM subscribers WHERE sport = true) AS "sport count",
       (SELECT count(promotions) FROM subscribers WHERE promotions = true) AS "promotions count",
       (SELECT count(horrible_spam) FROM subscribers WHERE horrible_spam = true) AS "horrible_spam count";

 news count | sport count | promotions count | horrible_spam count 
------------+-------------+------------------+---------------------
          6 |           2 |                3 |                   2
(1 row)

Arghhh.. my eyes! Okay, we can try casting booleans to integers:

SELECT count(news::int) AS "news count",
       count(sport::int) AS "sport count",
       count(promotions::int) AS "promotions count",
       count(horrible_spam::int) AS "horrible_spam count"
FROM subscribers;

 news count | sport count | promotions count | horrible_spam count 
------------+-------------+------------------+---------------------
          7 |           7 |                7 |                   7
(1 row)

Hmmm.. obviously not.

Surely there's a better way? Well no, there isn't... unless we create an aggregate function which will count based on an expression.

CREATE OR REPLACE FUNCTION countif_add(current_count int, expression bool)
RETURNS int AS
$BODY$
  SELECT CASE expression
    WHEN true THEN
      current_count + 1
    ELSE
      current_count
  END;
$BODY$
LANGUAGE SQL IMMUTABLE;

CREATE AGGREGATE count_if (boolean)
(
  sfunc = countif_add,
  stype = int,
  initcond = 0
);

Now we can use a far tidier query.

SELECT count_if(news) AS "news count",
       count_if(sport) AS "sport count",
       count_if(promotions) AS "promotions count",
       count_if(horrible_spam) AS "horrible_spam count"
FROM subscribers;

 news count | sport count | promotions count | horrible_spam count 
------------+-------------+------------------+---------------------
          6 |           2 |                3 |                   2
(1 row)

But this function lets us do more than count true values in boolean columns. It's evaluating expressions. So you can have queries like:

SELECT count_if(my_column > 7) AS "things bigger than 7"
       count_if(this_column = that_column) AS "stuff that matches"
FROM my_table;

Thursday 7 October 2010

PGDay.EU 2010 Call for Papers ending soon!

Dave Page posted this on the European PostgreSQL mailing list, but thought I'd also put this out there...

This year's European PostgreSQL Conference is due to be held on the 6 -
8th December 2010 in Stuttgart Germany (http://2010.pgday.eu/).
Following on from last years extremely successful event in Paris, we
hope to make the show bigger and better than before, with:

* A 4 star hotel venue (with spa facilities in the complex)

* A special guest keynote speaker (to be announced Real Soon Now :-) )

* Multiple talk tracks in English and German

* Talks for PostgreSQL hackers and users, as well as decision makers

* A day of training sessions

Of course, to make that happen we need talk proposals from our
community members and users. We're looking for presentations aimed at
developers, users and decision makers on topics such as:

- Developing applications for PostgreSQL
- Administering large scale PostgreSQL installations
- Case studies of PostgreSQL deployments
- PostgreSQL tools and utilities
- PostgreSQL hacking
- Community & user groups
- Tuning the server
- Migrating from other systems
- Scaling/replication
- Benchmarking & hardware
- PostgreSQL related products
- Deployment of PostgreSQL from a business perspective
- Applications and FOSS projects that use PostgreSQL

So if you can make it to Germany in December, please take a look at
the call for papers page on the website
(http://2010.pgday.eu/callforpapers), and send in your proposal before
the 11th October!

Please forward this message to any colleagues or friends that you
believe may be interested in presenting at the conference.

Wednesday 22 September 2010

PostgreSQL docs reloaded

If you regularly reference the online PostgreSQL docs, you may have noticed that they have just had a face-lift in time for the big 9.0 release. If not, take a look. This went through quite a few revisions with input from many folk in the Postgres community. As with all things, you won't please everyone, and this is no less true in the case of this redesign. But look at this as a pilot change, where we've come up with a first version with possible "bug-fixes" to follow. I've also provided similar changes for the French version, although there's only been feedback from one person for that. Regardless, they appear to have gone with it anyway.

I'm no designer, and neither are the majority of the people who contributed their recommendations, but the original cause of me wanting to make some changes is so that the examples which appear throughout the docs would be distinct from the main body text. Previously they appeared to all merge into one which made it difficult to pick out the various elements on the page. This involved simple indentation, but I thought that while I was there, I'd get rid of the horrible default borders on the tables, and then show warnings in a red box, and notes in beige... etc.. etc.

So now I'm soliciting feedback. I firmly believe that PostgreSQL seriously has some of the best documentation out there; clear, comprehensive and organised. So design changes mustn't be distracting, should aid readability and help communicate. Is it confusing? Does it make you want to cry?.. with joy.. with pain?

Sunday 5 September 2010

Why I refuse to use transporter technology

As you are aware, transporter technology has allowed us to deliver objects across great distances almost instantaneously. Not only does this mean we don't have to wait for items we've ordered to take a long journey across land and sea, but we also saves us a huge amount of money in postage costs.

However, while people have started to use this to travel across the globe, I can't bring myself to do it. The reason is simple. I will be killed off forever. Here's my rationale...

When the Transmat sequence begins, it starts by converting your matter to energy and during this process, every molecule and electro-chemical reaction in your body is recorded with the exact position, electrical charge, temperature... everything copied down into the network's distributed memory. This data is then sent to the rematerialisation hub nearest to your destination. The hub then generates the required energy to reproduce you, and then proceeds to convert your energy back into matter. Now in theory, this data could be duplicated and 2 copies of you could be generated. And here's where the disturbing truth comes in to play: you're no longer yourself.

Imagine you hacked a Transmat program to run through its routine, but instead of taking your matter and energy, it just copies it into memory. It then contacts a rematerialisation hub, say the one nearest you, to generate another you from that data. That "you" that has arrived will be completely the same as you, and he'll think "I'm a copy?". And you'll look at him knowing that's not you as you're looking from your original self. So that person *isn't* you. But in the normal Transmat process, you would have been converted to energy and this other "you" will go about convinced he's still the original, just arriving in a new location.

So you can see, transporter technology kills people and then creates people, but does it in a way to give the illusion of "transportation". It should really be called "destroying and cloning". I'll stick with the subterranean highways for now, at least until this promising portal technology becomes mainstream.

Friday 6 August 2010

GMail + Lab Features = perfect PostgreSQL Mailing list manager

I've been following many PostgreSQL mailing lists for some time now, and because so many emails arrive in my inbox every hour, they need organising. Until recently I've used a filter on my email to label everything to and from *@postgresql.org with the label "PostgreSQL", to mark it as read and to archive it (i.e. skip the inbox). That way I could just click on the PostgreSQL label and view all the emails. But I do subscribe to a LOT of the groups, and they're all bundled into the same label.

So comes Gmail Lab features to the rescue, namely "Nested Labels" and "Hide Labels from Subjects". This does require a fair amount of initial setup, but only needs to be done once. Basically Nested Labels allows for a tree-like structure of labels defined by /parent/child/grandchild/ etc. For example, I've set up a new filter for all emails to pgsql-general to be labelled PostgreSQL/General which puts it in the General label beneath the PostgreSQL label. (see the image to the right) In my case, I've also given them different shades depending on how hardcore the category is, with Novice as light blue, General as blue and Hackers as dark blue, but that's just purely cosmetic.

If you're at all familiar with GMail labels, you'll know that they appear next to the subject line of messages too, but with this new setup, it can become problematic. Not only am I still continuing to apply the original PostgreSQL label to all of them, but I'm applying at least 1 new one and potentially several others if the message was also sent to other groups. This means it becomes difficult to read the subject lines. And this is where "Hide Labels from Subjects" comes in, which prevents tags from taking up subject space by removing them from it completely.

I should also mention that there's another GMail Lab Feature I've enabled for using these mailing lists too: "Default 'Reply to all'" as whenever you're replying to someone on a mailing list, one should always copy in the entire list too, otherwise only the sender sees your message.

And not strictly related to mailing lists, but any email, another Lab Feature I find worryingly useful is "Undo Send". This, by default, gives you 10 seconds to change your mind about sending an email. It provides an "Undo" link upon sending a message. If you click it within 10 seconds of sending, it goes back into edit mode and doesn't send it. However, if you navigate elsewhere in your inbox during this time, you'll lose that opportunity. I didn't think I'd use it that much to start off with, but I'm surprised at how often I've looked at my email after sending and realised there was either a mistake or something I missed out.