In the previous post I introduced our Discover the Queenslander project for the SLQ, and mentioned that we used the AngularJS web framework. That process has got me thinking about some of the technical challenges in creating rich collection interfaces, and the different approaches in play, and I'll report on these in the next two posts. In this one I'll focus on AngularJS, and in the next, some broader questions on working with collection data on the client side.

AngularJS is a JavaScript-based framework that focuses on extending HTML to deal with dynamic content. Angular "binds" data to HTML elements; so change the data, and the HTML updates. Even better, the bindings are two-way: interacting with an HTML element can also change its bound data. Angular implements an MVC (Model View Controller) architecture, where the data structure is the Model, the HTML document is the View, and a JavaScript Controller links the two together.

Our previous web-based collections projects (TroveMosaic, Manly Images, Prints and Printmaking) were built in plain JS and jQuery. The general approach is pretty straightforward: load and manipulate some collection metadata (either from an API or a static JSON file), then build the HTML dynamically (adding and styling elements according to the data). jQuery makes handling interactions with the HTML easy. It also (in my experience) makes for a verbose mess. Because all the HTML is built dynamically there's a lot of code devoted to creating elements, setting attributes, then stuffing them into the DOM. Code that loads and munges data gets tangled with code that builds the document and code handling interactions. Some elements get styled with static CSS, others are styled with hard-coded attributes. It all works fine - jQuery is very robust - but under the surface, it's bad code.

AngularJS tidies this process up quite a bit. Here's a quick example showing how straightforward it is to bind some collection data to some HTML. Say we have a JSON array items where each item looks something like:
{ "id":"702692-19340823-s002b",
 "title":"Illustrated advertisement from The Queenslander, 23 August, 1934",
 "description":["Caption: Practical garments","An Advertisement for women's clothing sewing patterns acquired through mail order from The Queenslander Pattern Service."],
 "subjects":["women's clothing & accessories","advertisements"],
 "thumbURL":"702692-19340823-s002.jpg",
 "year":"1934"
}
To create an HTML list where each item appears as a list element:
<ul>
 <li ng-repeat="i in items"> 
  <h1>{{i.title}}</h1>
  <img ng-src="{{i.thumbURL}}"/>
 </li> 
</ul>

Angular lets us iterate over a list of elements with the ng-repeat directive; it will simply generate a <li> for each element in the items array. Attributes of each item i are easily bound to the HTML using the {{moustache}} notation - so the item title will appear inside the h1. Apart from the compact, HTML-based rendering syntax, the killer feature here is that the HTML stays bound to the data: in order to change the display, we simply change the contents of items. No jQuery-style DOM manipulation; the data drives the document.


So rendering items in a list is trivially easy; but what about more complex displays? It's a matter of creating the data structures you need, then binding them to HTML in the same way. The Queenslander grid interface (above) includes a histogram showing items per year. In HTML this is simply another list, where each column is a list element. To create the data structure we sort the items into a new array where each element contains both the year and a count of its items. Then, as in the example, we run through the array with Angular building an element (this time a column) for each year. Angular's ng-style directive lets us create a custom height for each element, based on the number of items in the year list. With an array yearTable, where each year y has a totalCount:
<ul>
     <li ng-repeat="y in yearTable">
           <div ng-style="{height: y.totalCount+'px'}"
           ng-click="setYearFilter(y.year);" ></div>
     </li>
</ul>
Here Angular is doing some rudimentary data vis, linking variables in the data to the dimensions of each HTML element. Note also that each column element has an ng-click directive, calling a function that filters the items displayed. The term clouds for subjects and creators work the same way.
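As a rough illustration, the yearTable structure could be built along these lines. This is a minimal sketch rather than the project's actual code: the function name buildYearTable is hypothetical, and it assumes each item carries a year string as in the JSON example earlier.

```javascript
// Hypothetical helper: count items per year and return an array,
// sorted by year, that ng-repeat can iterate over.
// Assumes each item has a "year" string, as in the example record.
function buildYearTable(items) {
  const counts = new Map();
  for (const item of items) {
    counts.set(item.year, (counts.get(item.year) || 0) + 1);
  }
  return Array.from(counts, ([year, totalCount]) => ({ year, totalCount }))
    .sort((a, b) => Number(a.year) - Number(b.year));
}
```

A controller would then only need to assign the result to its scope (something like $scope.yearTable = buildYearTable(items)) for the template to pick it up.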

Hopefully this gives a hint of how AngularJS can be applied to cultural collection interfaces. From a developer's perspective, there are a number of big advantages. Compared to our previous jQuery process, Angular simplifies the page-building process immensely; the templating approach encourages a separation of concerns and more organised, maintainable code. Angular's data-centric binding also provides some big wins. Data structures (models) become more important; Angular requires that you get your data organised before binding it to the DOM. Coming from the free-wheeling procedural world of jQuery, this data-centric approach was the biggest conceptual challenge. The bottom line is: manipulate the data, not the HTML. The payoff is that the work of keeping the HTML and the data coordinated just disappears. Angular's modular architecture and active developer community also bring benefits: in the Queenslander project for example we used ngStorage, a module that made the favourites incredibly easy to build.

Compared to standard web interfaces, the big difference here is that all the collection data (in this case some 1000 items' worth) is in the browser, on the client side. No server calls, pulling down a few items at a time - instead we load the whole set up front, and build the interface dynamically based on that data. The biggest payoff for this approach is responsiveness - filtering and exploration are lightning fast - but there are problems too: search engines can't index this dynamic content, and it requires modern browsers with fast JS engines. Some would argue that this approach is just plain wrong, abusing the client/server architecture of the web. I'm more of a pragmatist, but there are certainly some technical issues to consider, and in the next post I'll go a bit deeper into this notion of client-side data for digital cultural collections.

Discover the Queenslander

Discover the Queenslander is our latest generous interface project, commissioned by the State Library of Queensland to showcase their collection of covers and pages from The Queenslander newspaper. Published 1866-1939, The Queenslander was the illustrated weekend supplement to the Brisbane Courier Mail. This collection includes around 1000 covers, advertisements and illustrations - a beautiful slice of Australian pre-WWII visual culture. Geoff Hinchcliffe and I developed a web-based interface that builds on our previous approaches - rich overview, browsing and visual exploration - and adds some new techniques. Here I'll provide a quick outline; in the following post I will focus on the web framework we used - AngularJS - which I think has some interesting applications for digital collections.


The Mosaic view provides a chronological overview of the collection - each tile represents items from a single year. Like the Manly Images mosaic, the tiles gradually reveal their contents - in this case they are also directly navigable. The Grid view is a more general-purpose explorer for browsing subjects, creators and years as well as colours. Both Grid and Mosaic interfaces link to a detailed item view. There's nothing radically new here - though there are a few new elements that extend on our generous interfaces repertoire.

Inspired by the qualities of the collection images and the related work happening at Cooper Hewitt, Geoff and I were keen to experiment with using colour to explore the collection. The process was (surprise!) more complex than we expected, but ultimately rewarding. Using some palette extraction code that Geoff developed, we first pre-built a palette for each item. These colours are stored in the collection metadata, and act much like any other metadata field. The interface then dynamically builds an "overview" palette revealing the colours in the current set of items, and both the item palette and the overview palette act in turn as filters; rinse and repeat for open-ended colour-browsing. Note also how the filters and facets in the grid view interact; selecting a colour will also reveal corresponding dates, creators and subjects (and vice versa).
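To make the palette-as-filter idea concrete, here is a minimal sketch. It is not the project's code: the field name palette (an array of hex colour strings per item) and both function names are assumptions for illustration.

```javascript
// Assumption: each item has a "palette" array of hex colour strings,
// pre-built and stored alongside its other metadata.

// Build an overview palette: every colour in the current set of items,
// with the number of items that contain it, most common first.
function overviewPalette(items) {
  const counts = new Map();
  for (const item of items) {
    // A Set stops a colour repeated within one item being counted twice.
    for (const colour of new Set(item.palette || [])) {
      counts.set(colour, (counts.get(colour) || 0) + 1);
    }
  }
  return Array.from(counts, ([colour, count]) => ({ colour, count }))
    .sort((a, b) => b.count - a.count);
}

// Use a colour as a filter: keep only items whose palette includes it.
function filterByColour(items, colour) {
  return items.filter(item => (item.palette || []).includes(colour));
}
```

Filtering and then rebuilding the overview from the filtered set gives the "rinse and repeat" loop. Exact string matching only works if palettes are snapped to a shared set of colours beforehand, which this sketch simply assumes.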


This project also introduces some simple personalisation, with the ability to curate and share a collection of favourite items. We opted for a lightweight, no-login approach using HTML5 web storage (essentially fancy cookies) to simply track item IDs. Sharing a collection is as simple as sharing a URL with a list of IDs baked in; and because collections operate within the standard grid view they get filters and facets too.
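The "IDs baked into a URL" idea can be sketched as follows. The parameter name favs, the comma separator and the function names are all hypothetical; the real site may encode its URLs quite differently.

```javascript
// Hypothetical encoding: item IDs joined with commas in a "favs"
// query parameter. URLSearchParams handles the escaping.
function favouritesToUrl(baseUrl, ids) {
  const params = new URLSearchParams({ favs: ids.join(",") });
  return baseUrl + "?" + params.toString();
}

// Recover the list of IDs from a shared URL; returns [] if none are present.
function favouritesFromUrl(url) {
  const query = url.includes("?") ? url.slice(url.indexOf("?") + 1) : "";
  const raw = new URLSearchParams(query).get("favs");
  return raw ? raw.split(",") : [];
}
```

In the browser, the same array of IDs could be kept between visits with localStorage.setItem and JSON.stringify, which is roughly the job a web storage module does for you.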

Finally a little feature that I am particularly fond of is the Trove link on the item page; a simple demonstration of how we might start to link up collections across institutional boundaries. In this case, the State Library of Queensland has high-res images of covers and illustrations, while the NLA's Trove publishes the full contents of The Queenslander (albeit with low-res scans). Using the Trove API we simply harvested the full list of issue dates and corresponding Trove IDs, then matched them against the SLQ items. So each Queenslander item also provides a link to its source issue, providing additional context as well as opening onto further exploration.
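The matching step might look something like the sketch below. It rests on an assumption that is not confirmed here: that an item id such as "702692-19340823-s002b" carries its issue date as YYYYMMDD in the second segment. The troveIssues lookup (issue date to Trove ID) and both function names are also hypothetical.

```javascript
// Assumption: the second hyphen-separated part of an item id is the
// issue date as YYYYMMDD, e.g. "702692-19340823-s002b" -> "19340823".
function issueDateFromId(id) {
  const match = /^\d+-(\d{8})-/.exec(id);
  return match ? match[1] : null;
}

// troveIssues: hypothetical plain object mapping issue date -> Trove
// issue ID, harvested ahead of time. Items with no match get null.
function attachTroveIds(items, troveIssues) {
  return items.map(item => {
    const date = issueDateFromId(item.id);
    const found = date !== null &&
      Object.prototype.hasOwnProperty.call(troveIssues, date);
    return Object.assign({}, item, { troveId: found ? troveIssues[date] : null });
  });
}
```

Doing this once, offline, means the Trove ID becomes just another metadata field, and the item page only has to build a link from it.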



Over the past twelve months we have been developing some new approaches to the challenge of providing rich, revealing interfaces to cultural collections. The key idea here is the notion of generous interfaces - an argument that we can (and should) show more of these collections than the search box normally allows; and that there's a zone between conventional web design and interactive data visualisation, where generous interfaces might happen. There's more on this concept in my NDF 2011 presentation, or (in a more formal mode) in the paper I presented at the recent ICA conference.

Here I want to introduce an experimental "generous interface" prototype. Manly Images is an explorer for the Manly Local Studies Image Library - a collection hosted by the Manly Library. This is a collection of around 7000 images, documenting the history of the Manly region from the 1800s to the 1990s. The aim here was to develop a "generous," exploratory, non-search interface to the collection, delivered in HTML.


The original intention here was simply to adapt our CommonsExplorer work into HTML - CommonsExplorer uses a linked combination of thumbnails and title words to provide a dense overview of an image collection. But to "show everything" would mean 7000 elements, a stretch even for modern browsers; and I wanted to experiment with some new approaches to overview, which remains the key problem here - a really juicy one. Given 7000 images with titles and little else, how can we provide a compact but revealing representation of the whole collection?

Here, the strategy was to break the collection into smaller segments based on either terms in the title, or date, and to draw each segment as a simple HTML div, where the size of the box reflects the number of items in that segment.  These segments also act as navigational elements, opening a "slider" type display for browsing through specific records, and finally a lightbox for larger images, with links to canonical URLs on both Trove and the Manly site.
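A minimal sketch of that segmenting and sizing step follows. The function names are hypothetical, and the sizing rule is an assumption: it makes a box's area proportional to its item count, which may not be how the prototype actually scales its divs.

```javascript
// Group items into segments by title term. An item lands in a segment
// if its title contains the term as a whole word (case-insensitive).
function segmentByTerms(items, terms) {
  return terms.map(term => {
    const members = items.filter(item =>
      item.title.toLowerCase().split(/\W+/).includes(term));
    return { term, count: members.length, members };
  });
}

// Assumed sizing rule: box AREA is proportional to item count, so the
// side length scales with the square root of the count.
function boxSide(count, maxCount, maxSide) {
  return Math.round(maxSide * Math.sqrt(count / maxCount));
}
```

Scaling by area rather than side length keeps large segments from visually swamping small ones, which matters when counts vary by orders of magnitude.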

As a visualisation, it's a bit like a treemap (without the hierarchy), or a reconfigured histogram. But a collection like this is more than a list of quantities; the texture and character of the images is crucial. So as well as showing quantity, the segments become windows revealing (fragments of) the images inside them in a rolling slideshow. We get a visual core-sample of each segment, revealing the character of that group; and across the collection as a whole, a shifting mosaic that reveals diversity (and consistency), and invites further exploration. An interesting side effect is that it becomes possible to surf through the whole collection without doing a thing; it will (eventually) just roll past. This might not be realistic in a traditional browser context, but that traditional, "sit-forward" user model is not what it used to be - as Marian Dörk argues, the leisurely drift of the information flaneur might be more apt.


So, a rich exploratory interface to 7000 images, without search, and delivered entirely in HTML; we have shown that it's possible, but is it any good? I'll write up my own evaluation with some technical documentation shortly; meantime, feedback on the prototype is very welcome - and if you are interested in building on it, or adapting it for other collections, the source is up on GitHub.

Finally some acknowledgements: this project was funded by the State Library of New South Wales and supported by Cameron Morley and Ellen Forsyth; thanks to John Taggart of Manly Library for permission to use the image collection. The collection data is harvested from the excellent Trove API, developed by the National Library of Australia.

I recently gave this presentation at the National Digital Forum 2011 in Wellington. It proposes a way to think about collection interfaces through the concept of generosity - "sharing abundantly". The presentation argues that collection interfaces dominated by search are stingy, or ungenerous: they don't provide adequate context, and they demand the user make the first move. By contrast, there seems to be a move towards more open, exploratory and generous ways of presenting collections, building on familiar web conventions and extending them. This presentation features "generous interfaces" by developers including Icelab, Tim Sherratt and Paul Hagon, and it includes a preview of some work I am currently doing with the National Gallery of Australia's Prints and Printmaking collection, in collaboration with Ben Ennis Butler.

commonsExplorer

Although the Visible Archive project wound up months ago, its visualisation techniques live on. In particular I've been developing and adapting the title-word-frequency interface of the A1 Explorer, and trying it out on a range of different datasets. One of these spinoff projects - the commonsExplorer - has finally launched. Here, some documentation, reflection and rationale.

commonsExplorer 1.0
My colleague Sam Hinton and I began work on this as a project for MashupAustralia late last year. Our initial focus was the Flickr set of the State Library of NSW, and our aim was a rich, dynamic, "show everything" interface, building on the A1 Explorer work, but with image-based content. Some months later, having totally missed our original deadline, the scope had broadened out to the whole (amazing) Flickr Commons.

The explorer consists of a three-pane interface. The term cloud shows the 150 most frequently occurring words in the titles (not tags) of the current set of images. This will look familiar to anyone who's played with the A1 Explorer. It uses the same co-occurrence visualisation, and the same blocking / focusing navigation, with a few UI refinements. After some strong user feedback, I added a "back" button to step the navigation back one state. It also uses left and right-clicks, rather than modifier keys, to block or focus words. Applying this title-word approach to different sets has shown up its strengths, and a few weaknesses.


Its strengths are that titles and co-occurrence are a reliably rich cue for content, and that for most collections, thanks to the wonder of Zipf's law, the top-level cloud of 150 words will "cover" (refer to) more than 75% of the images in the set - even in a collection numbering in the thousands. Often, in smaller collections, the coverage is more than 95%. One question I haven't answered yet is how to communicate this idea of coverage to the user, and how to make those images not in the top-level cloud more immediately discoverable. Because after all, sometimes it's the outliers or exceptions in a collection that we are interested in.
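Coverage in this sense is easy to measure. The sketch below is illustrative only: the function names are hypothetical, it counts each word once per title, and it does no stop-word filtering, which a real term cloud would probably want.

```javascript
// Split a title into lowercase words (no stop-word filtering here).
function titleWords(title) {
  return title.toLowerCase().split(/[^a-z0-9]+/).filter(w => w.length > 0);
}

// The n words that appear in the most titles.
function topWords(titles, n) {
  const counts = new Map();
  for (const title of titles) {
    for (const word of new Set(titleWords(title))) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, n)
    .map(([word]) => word);
}

// Fraction of titles containing at least one of the given words.
function coverage(titles, words) {
  if (titles.length === 0) return 0;
  const wordSet = new Set(words);
  const covered = titles.filter(t => titleWords(t).some(w => wordSet.has(w)));
  return covered.length / titles.length;
}
```

Calling coverage(titles, topWords(titles, 150)) gives the figure discussed above, and the titles that fail the test are exactly the outliers that the top-level cloud hides.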

The bottom pane is the thumbnail grid, which is where most of the new stuff is. The grid is an attempt at a "show everything" image visualisation that can scale from tens to thousands of elements. As the number of elements grows, the grid size decreases to fit in the available space. Rather than scale images down, we simply crop the thumbnails - the intention isn't to represent the whole image but to provide some rich but unstructured visual clues: a sort of visual core sample through the whole set. The results show how this can help reveal structure within the collection. Different photographic processes are instantly apparent - monochrome, sepia, cyanotype, stereoscopic, Kodachrome. Other similarities also pop out, even in small tiles - landscapes vs portraits, for example.
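The "grid size decreases to fit" behaviour can be sketched as a small calculation. This is one simple way to do it, not necessarily the explorer's own method (which is written in Processing); the function name is hypothetical.

```javascript
// Find the largest whole-pixel square tile size such that n tiles fit
// in a width x height area. No tile can be larger than
// sqrt(area / n), so start there and shrink until the grid holds n.
function tileSize(n, width, height) {
  if (n <= 0) return 0;
  let size = Math.max(1, Math.floor(Math.sqrt((width * height) / n)));
  while (size > 1 &&
         Math.floor(width / size) * Math.floor(height / size) < n) {
    size -= 1;
  }
  return size;
}
```

Each thumbnail would then be cropped, not scaled, to a size-by-size square. Note that the sketch bottoms out at one pixel, so a very large set in a small pane may still not fit.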


This "clue" approach actually sums up our visualisation approach nicely. The Explorer presents us with a rich mass of partial information - or rather data: linked fragments of titles, and of images. Moments of discovery come when we see those fragments unified in a source image: the fragments are contextualised and become more meaningful. This contextual information then propagates back to the fragmentary display - when it works best there is a feedback loop from discovery to context and back to discovery. I've argued for a distinction between data and information, which is relevant here: these fragments are data points, abstracted and decontextualised. Information occurs only when we link and interpret those fragments - and it happens strictly on the human side of the screen.

Another feature of the grid that isn't immediately obvious is chronological sorting. Many collections, including the SLNSW set we started with, include dates in image titles. We look for those dates and sort dated images first in the grid. This approach is simple, and prone to the occasional false positive, but it degrades gracefully, and adds a usable layer of structure to the grid layout. Why not use Flickr's "date taken" field instead? Most Commons collections don't set it, so instead it gives the date uploaded. For the same reason we decided not to use tags, or attempt to scrape data from descriptions: these fields are inconsistent across the Commons - some images have no tags, others have dozens. Title and thumbnail seem to be the richest data that is always available.
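A minimal sketch of this kind of title-based date sorting is shown below. The function names and the accepted range of years (1800 to 1999) are assumptions for illustration; the explorer's own matching rules may differ.

```javascript
// Look for a plausible four-digit year in a title. The 1800-1999 range
// is an assumption; a stray four-digit number in that range will be
// treated as a date (the occasional false positive).
function yearFromTitle(title) {
  const match = /\b(1[89]\d{2})\b/.exec(title);
  return match ? Number(match[1]) : null;
}

// Dated images first, in chronological order; undated images follow.
function sortDatedFirst(images) {
  return images
    .map(image => ({ image, year: yearFromTitle(image.title) }))
    .sort((a, b) => {
      if (a.year === null && b.year === null) return 0;
      if (a.year === null) return 1;
      if (b.year === null) return -1;
      return a.year - b.year;
    })
    .map(entry => entry.image);
}
```

Because undated titles simply fall to the end, a collection with no dates at all comes through unchanged in any engine with a stable sort, which is the graceful degradation described above.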


Sam Hinton did the heavy programming work that makes the grid go. The main technical challenge we faced was memory usage: loading 700 tiny images just eats memory in Processing / Java. Sam devised a system for stashing the square thumbnails locally, optimising memory and acting as a cache to speed up loading. Drawing thousands of little images to the screen also raised performance issues - we draw to a single offscreen PGraphics context, then draw that to the screen.

In the end I think we've done what we set out to do - make a rich experience that encourages an understanding of context, and enables discovery in large collections. We've also shown that this approach is broadly applicable - if you've got a large image collection where you think it might apply, let us know. Most importantly though, try it out and let us know what you think.

Download commonsExplorer for Mac | Windows | Linux (1Mb)
