Tuesday, December 09, 2008

Table lens view of data matrix

Among the many weaknesses of my challenge demo is the way it simply dumps out a list of sequences (see comments on the demo). I decided to take a look at table lens after reading BiblioViz: a system for visualizing bibliography information. See also Rao and Card's 1994 paper (doi:10.1145/191666.191776; there is a free PDF on Ramana Rao's web site) and DateLens (another product of the University of Maryland's Human-Computer Interaction Lab, who also gave us treemaps). I've hacked together some crude JavaScript and CSS, taking some suggestions on Stack Overflow as a starting point (it seems to work in Safari and Firefox, but not in IE6).

The idea is to display a table in a fixed space. As you mouse over a cell, the contents of that cell and the relevant row and column labels become visible. This enables you to get an overview of the full table, but still see individual items.
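To make the fixed-space idea concrete, here is a minimal sketch of one way the layout arithmetic could work. It is not the code behind the demo: the function name `lensHeights` and its parameters are made up for illustration, and it assumes the hovered row is given a fixed larger height while the remaining rows share what is left. The same calculation could be applied to column widths.

```javascript
// Hypothetical helper (not from the demo): divide a fixed height among rows,
// giving the hovered ("focus") row extra room and squeezing the others.
function lensHeights(rowCount, totalHeight, focusIndex, focusHeight) {
  if (rowCount <= 0) return [];

  // No row hovered (or only one row): every row gets an equal share.
  if (focusIndex < 0 || focusIndex >= rowCount || rowCount === 1) {
    return new Array(rowCount).fill(totalHeight / rowCount);
  }

  // The focus row never takes more than the space available.
  const focus = Math.min(focusHeight, totalHeight);

  // The remaining rows split whatever is left over equally.
  const rest = (totalHeight - focus) / (rowCount - 1);

  return Array.from({ length: rowCount }, (_, i) =>
    i === focusIndex ? focus : rest
  );
}
```

For example, `lensHeights(5, 100, 2, 40)` gives `[15, 15, 40, 15, 15]`: the table still occupies 100 pixels, but the third row is tall enough to read. In a browser these values would be applied to each row's CSS height from a mouseover handler.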


It's easier to show than explain. For example, take a look at The amphibian tree of life, or watch this short screencast:





There are some things to fix. Firstly, I group all sequences by NCBI taxon and gene "features". If there's more than one sequence for the same gene and taxon, I just show one of them (an obvious solution is to add a popup menu if there's more than one sequence). Secondly, the gene "names" are extracted from GenBank feature tables, and will include synonyms and duplicates (for example, a sequence may have a gene feature "RAG-1" and a CDS feature "recombination activating protein 1"). I've stored all of these because not every sequence is consistently labelled, so excluding one class of feature may lose all labels from a sequence. At some point it would be useful to cluster gene names (a task for another day).
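The grouping step, and the popup-menu fix, could be sketched roughly as follows. This is an illustration rather than the demo's actual code: the record shape (`taxon`, `gene`, `accession`), the function names `buildMatrix` and `cellLabel`, and the placeholder accessions are all assumptions. The point is simply that keeping every sequence per taxon/gene cell, rather than only one, leaves enough information to build a menu later.

```javascript
// Hypothetical sketch: group sequence records into a taxon-by-gene matrix,
// keeping every accession for a cell instead of just one.
function buildMatrix(sequences) {
  const matrix = new Map(); // taxon -> Map(gene -> [accessions])
  for (const { taxon, gene, accession } of sequences) {
    if (!matrix.has(taxon)) matrix.set(taxon, new Map());
    const row = matrix.get(taxon);
    if (!row.has(gene)) row.set(gene, []);
    row.get(gene).push(accession);
  }
  return matrix;
}

// Text to show in a cell: the first accession, plus a hint when more exist
// (the hint is where a popup menu could be attached).
function cellLabel(matrix, taxon, gene) {
  const row = matrix.get(taxon);
  const hits = row ? row.get(gene) : undefined;
  if (!hits || hits.length === 0) return ""; // empty cell: no sequence
  return hits.length === 1
    ? hits[0]
    : `${hits[0]} (+${hits.length - 1} more)`;
}

// Placeholder data, not real GenBank records.
const demo = buildMatrix([
  { taxon: "Taxon A", gene: "RAG-1", accession: "ACC001" },
  { taxon: "Taxon A", gene: "RAG-1", accession: "ACC002" },
  { taxon: "Taxon B", gene: "12S", accession: "ACC003" },
]);
```

With this data, `cellLabel(demo, "Taxon A", "RAG-1")` returns `"ACC001 (+1 more)"`, while the empty cell for `"Taxon B"` and `"RAG-1"` returns `""`. Synonymous gene names would still land in separate columns, so clustering the names remains a separate job.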

1 comment:

blOg said...

I like the new gene matrix feature much better than the list. You can see at a glance the level of matrix completion.
Whilst browsing, I came across a funny little typo. In a study of "Evolutionary history of Lake Tanganyika's scale-eating cichlid fishes." http://iphylo.org/~rpage/challenge/www/uri/3de601628f7a05eeafd47b8adc06de63 there is a sequence of an uncultured bacterium from the feces of an elderly human (AY920092). This is most likely a typo somewhere in the article where they meant AY930092 which is a sequence from a cichlid (Cheilochromis euchilus).