Packing Them In

Up to this point the grid visualisations have taken a very simple approach to space: dividing it up equally among the data points, and then using hue and brightness to show attributes such as shelf metres and items. This has the advantage of simplicity, but it has a major disadvantage too: it's attempting to represent size (shelf metres or number of items) using other means. Why not just use size for size? Read on for the blow-by-blow account, or skip straight to the end result: the latest interactive sketch.

Before Christmas I had a first stab at this problem. The approach was basic, as usual. Maintaining the chronological ordering of the series, I drew each series as a square with area proportional to number of items. The packing procedure was simply: starting where the previous series is, step through the grid until we find a big enough space to draw the current series. The result looked like this:

After weeks of regular grids, this was a sight to see. The distribution of the sizes of series (overall and through time) is instantly apparent. This ultra-simple packing method is far from perfect, though, as you can see from all the black gaps. Because it tiles one series at a time, in strict sequence, and only searches forwards through the grid, gaps appear whenever a large square comes up: the search scrolls along to find a big enough free space, and the smaller spaces it passes over are never filled.
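
For the curious, here is a minimal Python sketch of that forward-only procedure. It is not the code behind the images: the grid of cells, the row-major scan and the names `side_length`, `fits` and `pack_forward` are all assumptions made for illustration.

```python
import math

def side_length(value, cells_per_unit=1.0):
    """Side of a square whose area is proportional to value (at least 1 cell)."""
    return max(1, round(math.sqrt(value * cells_per_unit)))

def fits(grid, row, col, side):
    """True if a side x side square at (row, col) is inside the grid and empty."""
    if row + side > len(grid) or col + side > len(grid[0]):
        return False
    return all(grid[r][c] is None
               for r in range(row, row + side)
               for c in range(col, col + side))

def pack_forward(sides, width, height):
    """Place squares in the given order, only ever searching forwards."""
    grid = [[None] * width for _ in range(height)]
    positions = []
    cursor = 0  # row-major index where the previous square was placed
    for label, side in enumerate(sides):
        for pos in range(cursor, width * height):
            row, col = divmod(pos, width)
            if fits(grid, row, col, side):
                for r in range(row, row + side):
                    for c in range(col, col + side):
                        grid[r][c] = label
                positions.append((row, col))
                cursor = pos  # the next search starts here, never earlier
                break
        else:
            raise ValueError("grid too small for square %d" % label)
    return grid, positions
```

Calling `pack_forward([2, 3, 1], width=4, height=6)` shows the problem in miniature: the 3x3 square cannot fit beside the first 2x2 square, so it lands two rows down, and the four cells to the right of the first square are never revisited.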

The main restriction here is the chronological ordering of the series. I need to maintain that ordering, but at the same time I need to be able to pack the squares more efficiently, which means changing the order. Luckily there's a loophole: as the first histogram showed, many series share the same start date. So we can change the sequence of those same-year series, without disrupting the overall order. We can pack them starting with the biggest squares and pack in the smaller ones around them. The latest sketches use this method, which can be described in pseudocode:

  1. Make a list of series with a given start year
  2. Working from biggest to smallest, pack each series into the grid, from a given start point: restart the search from the start point each time.
  3. Keep track of the latest point in the grid that this group occupies. For the following year, start from this point.
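
The three steps above can be sketched in Python roughly as follows. Again this is an illustration under assumptions rather than the sketch's actual code: series are hypothetical `(name, start_year, side)` tuples, the grid is scanned in row-major order, and "the latest point" is taken to mean the furthest position at which any square in the year's group was placed.

```python
from itertools import groupby

def fits(grid, row, col, side):
    """True if a side x side square at (row, col) is inside the grid and empty."""
    if row + side > len(grid) or col + side > len(grid[0]):
        return False
    return all(grid[r][c] is None
               for r in range(row, row + side)
               for c in range(col, col + side))

def first_fit(grid, start, side):
    """First row-major position at or after start where the square fits."""
    width = len(grid[0])
    for pos in range(start, width * len(grid)):
        row, col = divmod(pos, width)
        if fits(grid, row, col, side):
            return pos
    raise ValueError("grid too small")

def pack_by_year(series, width, height):
    """series: iterable of (name, start_year, side) tuples (assumed shape)."""
    grid = [[None] * width for _ in range(height)]
    placed = {}
    start = 0
    ordered = sorted(series, key=lambda s: s[1])
    # Step 1: one group of series per start year, in chronological order.
    for year, group in groupby(ordered, key=lambda s: s[1]):
        latest = start
        # Step 2: biggest first, restarting the search from `start` each time.
        for name, _, side in sorted(group, key=lambda s: s[2], reverse=True):
            pos = first_fit(grid, start, side)
            row, col = divmod(pos, width)
            for r in range(row, row + side):
                for c in range(col, col + side):
                    grid[r][c] = name
            placed[name] = (row, col)
            latest = max(latest, pos)
        # Step 3: the following year starts from the latest point used.
        start = latest
    return grid, placed
```

Because every search within a year restarts from the same point, small squares get a chance to fill the spaces left beside the big ones, while no series can ever be placed before the start point of its own year.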

This improves the packing dramatically:

In this image, square area is mapped to shelf metres; as in the earlier sketch, hue is derived from the series prefix (roughly A = red, Z = blue). One artefact is apparent here: the lines of squares graded by size, which occur when nothing gets in the way of the packing process. As a byproduct of this, the biggest squares in those sequences often mark the start of a new year in the grid.
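
As a rough illustration of the prefix-to-hue idea, here is a small Python sketch. The details are assumptions: it uses only the first letter of the prefix, spreads A to Z linearly over 0 to 240 degrees of hue (red to blue), and the function names are made up for this example.

```python
import colorsys

def prefix_hue(series_number):
    """Hue in degrees: 0 (red) for A, up to 240 (blue) for Z. Assumed mapping."""
    letters = [ch for ch in series_number.upper() if "A" <= ch <= "Z"]
    if not letters:
        return 0.0  # no letter in the prefix: fall back to red (an assumption)
    index = ord(letters[0]) - ord("A")  # 0 for A, 25 for Z
    return index / 25 * 240.0

def prefix_rgb(series_number):
    """Fully saturated RGB colour (components in 0..1) for the prefix hue."""
    return colorsys.hsv_to_rgb(prefix_hue(series_number) / 360.0, 1.0, 1.0)
```

So `prefix_hue("A1")` gives 0.0 and `prefix_hue("Z99")` gives 240.0, with series such as B2455 falling close to the red end.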

The latest sketches integrate both shelf metres and described items, and finally add interaction to this visualisation. To combine metres and items the squares are drawn as above, with area proportional to shelf metres; then overlaid with a second grey square, whose size is inversely proportional to the number of items in the series. The result is that series with many items are full of colour, and series with few items have large "hollows" and narrow coloured borders.
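
One way to compute the two squares, as a hedged Python sketch: the outer side comes from shelf metres, and the grey inner square shrinks as the number of items per shelf metre rises. The linear mapping, the `area_per_metre` scale and the `full_density` threshold (the items-per-metre value at which the hollow disappears) are all assumptions, not values taken from the actual sketch.

```python
import math

def hollow_square(shelf_metres, items, area_per_metre=1.0, full_density=100.0):
    """Return (outer_side, inner_side) for one series.

    outer_side: side of the coloured square, area proportional to shelf metres.
    inner_side: side of the grey overlay; 0 means the square is solid colour.
    """
    outer = math.sqrt(shelf_metres * area_per_metre)
    density = items / shelf_metres if shelf_metres > 0 else 0.0
    fill = min(1.0, density / full_density)  # 0 = all hollow, 1 = no hollow
    inner = outer * (1.0 - fill)
    return outer, inner
```

With these assumed defaults, a series of 100 shelf metres and 5000 items gets a 10-unit square with a 5-unit grey hollow, while the same series with 10000 items or more is drawn solid.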

Again, there are relations between series here that are instantly apparent. It's easy to see those series that have lots of shelf metres but relatively few items, as well as even medium-sized series with many items. I couldn't find A1 in the earlier grids (though Tim Sherratt from the Archives could); it is much more prominent here. Tim also pointed out that B2455, one of the big series of WWI service records, didn't jump out of the grids: it's very prominent here. As well, that cluster of post-War migration series spotted in the items grid reappears here. Promising signs for the usefulness of this visualisation.

All this is best demonstrated in the interactive version, which like the previous grids adds a caption overlay and some year labels on the vertical axis. Browse around and see what you can find - feedback very welcome.

3 comments:

These are looking great. As you say, the expected landmarks are there, but also lots that's unexpected and intriguing to encourage exploration.

One thing I'm not clear on is the relationship between the inner and outer squares. When I see that A1 has no inside square, I would assume that that means it is fully described. But I know it can't be saying that as there is no way of knowing if a series is fully described. So what is it saying? Is it just that for a series of its shelf dimensions A1 has a high rate of items described?

Similarly with B2455, I saw the hole in the middle and immediately wondered how there could be a hole when, as far as I knew the series was fully described. But again, it can't really be a hole as there is no 'total' number of items to compare to the numbers described.

I like being able to see the shelf metres and the number of items together, but at first glance the 'holes' seem rather misleading.

19 January 2009 at 2:21 pm  

Thanks Tim. You're right, the problem with this method is that a solid square suggests completion, when the data can't tell us that. And yes, it comes down to a ratio between described items and shelf metres - for some (high) value of that ratio, there will be no hole. Initially I am just aiming to show the range of values of that ratio. Maybe there is another way to visualise this without suggesting it's an absolute quantity? I'll think about it... An easy solution would be to just alter the ratio so that all the squares have a hole (i.e. make wall thickness map directly to number of described items). This doesn't address the underlying metaphor though...

19 January 2009 at 4:08 pm  

