649week09 Readings in InfoVis Evaluation

InfoVis has been around long enough for the community’s attention to fasten on evaluation as a topic in its own right, rather than simply subscribing to the evaluation methods inherited from constituent disciplines. The pair of papers by Plaisant (2004) and Shneiderman (2006) illustrates this questioning and the synthesis that arose from it: MILC, a technique for multi-dimensional in-depth long-term case studies. The 2004 paper describes several existing techniques with different strengths and weaknesses, and promotes field studies. The 2006 paper provides some detail for integrating various techniques into a new, but potentially much more expensive, technique, MILC. What do you think of the predicted trajectory: modest MILCs followed by more ambitious ones? Is it reasonable to assume that popularity of the technique will lead to developments that reduce its cost? What are the ideal conditions for MILC to succeed?

vast-eval

Plaisant (2008), in what may be the most useful paper you will see in this class, describes an InfoVis contest and, in Section 4.6.1, describes evaluation challenges. This section includes many provocative statements worthy of discussion. For example, Plaisant admits that it’s hard to keep track of different, mostly visual, artifacts when judging. How would you address this problem? What do you think of Plaisant’s proposed solution (a shared environment)? Another issue, pervading not only this section but the entire contest, has to do with the magnitude of what is being evaluated: the effects being studied may be overwhelmed by other environmental features. Plaisant, by the way, refers to the VAST reading we discussed previously, whose evaluation model is illustrated above.

Another approach to evaluation is described by Tory (2005). Heuristic evaluation should be painfully familiar to most of you, and this should be an interesting opportunity to see a different community adopting it. Given your experience, do you see anything missing from this paper? (Hint: How does Nielsen justify the particular set of heuristics he describes?) On a related note, how might you answer (or integrate) the criticisms in Thimbleby (2007)? This paper, like Tory’s, may suffer a little from a sketchy understanding of user-centered design and from beliefs that are enhanced by a lack of frequent contact with it. For example, what is your view of the iterative design cycle shown in Figure 9?

Buring (2006) shows how you could evaluate an interaction design for a small device without the small device. It is for you to consider whether the simulation described, using a device for which we have a good proxy in DL1, overcomes the problem of not actually using the device. This paper provides a good introduction to methodology (but see Amar (2005) for an example of how to push beyond the technique). It shows a very common set of priorities in looking at task completion time and preference.

analytic-gaps

How can we enrich evaluations? Kobsa (2001) evaluated three commercial InfoVis systems using an experiment and a method that’s a good model for understanding how InfoVis features lead to outcomes. It’s worthwhile to look at this study to see how you can overcome some of the methodological problems your own intuition may suggest. Nevertheless, there are limitations to evaluating InfoVis artifacts in this way. Amar and Stasko (2005) provide some insight into these limitations. They criticize InfoVis evaluation in general as focused on representational primacy: how well do you get the information via the information representation? They introduce two kinds of gaps left unaddressed by evaluations respecting representational primacy: a worldview gap (what is the right data? what is the right presentation design?) and a rationale gap (how strong are the relationships shown? how confident are we in the usefulness of the relationships shown?). They show how the Kobsa evaluation could benefit from considering these two gaps.
