Archive for March, 2009

649week09 Divided Discussion on Evaluation

Since we divided the InfoVis class into two discussion groups on evaluation, I wanted to share a little bit about my group’s discussion and invite others to comment on what they took away.

One thing that stood out for me in evaluation was that not every situation calls for InfoVis. There was a prominent story in the news this week about the effectiveness of Jon Stewart’s presentation on CNBC’s financial predictions, linked below. The stories I saw emphasized that Stewart used no access to CEOs or experts, just widely available video clips and briefly stated, widely reported statements, shown simply in white type on a black background. This was certainly a case where information overload challenged a careful study of quantitative information, but certainly not a case where visualization of results was needed. I can easily imagine creating a visualization from the given evidence, but I can’t imagine how a visualization could be more effective than the simple presentation linked below.

(picture links to video on comedy central)

One student brought up visual cues and the challenge of describing or evaluating them. The same student contrasted two views of InfoVis: something to interact with casually, one time, say in an online news source that provides a surprise, versus a long-term system that needs to support continual surprise over years.

Domain knowledge is a big issue, and it looks bigger when making InfoVis for analysts whose domain knowledge is elaborate and acquired over years of developing expertise. Herbert Simon claimed that anyone could be a world-class expert in any subject given 15 years of devotion, by the way.

When should we not use InfoVis? Could machine learning be more appropriate sometimes? (It was this issue, raised by a student, that made me think about the Daily Show incident mentioned above.)

We discussed the issues of choosing alternative techniques and supporting exploration and the subject of surprisingness.

An acute observation by one student was that social computing technologies allow us to find out about what people are thinking without having to ask them.

649week10 Interaction and Tacoma Crime Data

Consider Yi’s list with respect to a particular project, the Tacoma Crime Data project.

It’s typical to view crimes and density of crimes. What about victims and their traffic patterns? Where do they go and how long do they stay there? I’ve lived in several cities where the crime downtown at night is high, but nonexistent during the workday. Two completely different populations seem to inhabit the same space. I’ve also lived in one city where a main artery to downtown was a crime-ridden street with cheap motels leading from a freeway exit. As a result, many people drove through a high-crime area in commuting, but never stopped. Can you follow a group of victims for 24 hours before they became victims? Can you follow them and follow others? (select)

Are crimes related? A recent Economist article about legalized prostitution in two countries contrasted the results, saying that one country still experienced high levels of other crime associated with prostitution while the other country did not. One key difference was that the method of legalization in one case required the prostitutes to affiliate with an institution while the other did not. The country with the institutional requirement found that organized crime fulfilled it. This suggests that a visualization of types of crime may be more informative if it is coupled with a visualization of types of criminals. (explore)

Do you have any way of sorting on different variables? Does the value of sorting suggest a non-cartographic representation? (reconfigure)

How many values does each variable have? Do some variables map better to color encoding and others map better to shape? Should you use multiple modalities to encode the same information? For example, should severity be shown by a combination of color and size? (encode)

I use Google maps at different levels of detail in planning a single trip. Suppose I were to do that with crime data overlaid. How would that change when I zoom in and out? (abstract / elaborate)

We’ve discussed hierarchies and whether some are somehow more natural or more privileged than other categories. Suppose you can let the user filter the data to be visualized. Can you establish some kind of ranking of the most important or most likely criteria for filtering? Would it change your design? Suppose, for instance, you could filter on attributes of the victim or of the criminal. Which would be the default? (filter)

Crime data seems so complicated that it cries out for techniques like brushing, where you select (brush) an item in one view to highlight it in another view. For example, time and location are most easily shown in different representations, but I may want to know when crimes occur in a given location or where crimes occur at a certain time. (connect)
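To make the brushing idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the records, the area names, and the function names are invented, and a real project would draw the two views rather than return counts. The point is only that a brush in one view yields a set of incident ids, and every other view reports how many of its items fall in that set.

```python
from collections import Counter

# Hypothetical records, invented for illustration: (incident id, area, hour of day).
CRIMES = [
    (1, "Area A", 23), (2, "Area A", 2), (3, "Area A", 22),
    (4, "Area B", 14), (5, "Area B", 23), (6, "Area C", 9),
]


def brush_by_place(records, place):
    """Brush in the map view: return the ids of incidents in one area."""
    return set(rid for rid, where, _ in records if where == place)


def brush_by_hours(records, start, end):
    """Brush in the time view: ids of incidents with start <= hour < end.

    When start > end the range wraps past midnight (for example 22 to 3).
    """
    def inside(hour):
        if start <= end:
            return start <= hour < end
        return hour >= start or hour < end
    return set(rid for rid, _, hour in records if inside(hour))


def time_view(records, highlighted):
    """Linked time view: hour -> (all incidents, highlighted incidents)."""
    total = Counter(hour for _, _, hour in records)
    lit = Counter(hour for rid, _, hour in records if rid in highlighted)
    return dict((h, (total[h], lit[h])) for h in sorted(total))


def map_view(records, highlighted):
    """Linked map view: area -> (all incidents, highlighted incidents)."""
    total = Counter(where for _, where, _ in records)
    lit = Counter(where for rid, where, _ in records if rid in highlighted)
    return dict((w, (total[w], lit[w])) for w in sorted(total))


# When do crimes occur in a given location?
print(time_view(CRIMES, brush_by_place(CRIMES, "Area A")))
# Where do crimes occur at a certain time?
print(map_view(CRIMES, brush_by_hours(CRIMES, 22, 3)))
```

Keeping the selection as a plain set of ids is one design choice among many; it lets any number of views share the same brush without knowing about each other.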

649week10 Interaction in Information Visualization

Interaction in Information Visualization

Yi et al. (2007) begin by dividing InfoVis into two components, representation and interaction. Do you believe that distinction? Why do you suppose I assert four components: information and users in addition to Yi’s components? Where does context fit into either of these models? Note that there is a community of researchers on the theme of User Modeling and that they often try to personalize applications by understanding user context. That community includes some of the scholars such as Alfred Kobsa whose papers you have read in this course. You may want to consider how congruent their main objective is with the notion of using InfoVis to reduce information overload.

We’ve previously discussed varying definitions of InfoVis and the tendency of some authors to relax the interaction requirement to define static images as InfoVis. What do you think of Yi’s description of interactions with static images, including rotating, looking closer / further, and annotating? What about the idea that a passive interaction changes or enhances the mental model of a data set? For example, I showed you a static image of i-Schools generated by a multidimensional scaling of terms in course descriptions, reproduced below.

zoom-to-lis-4

I told you that we can see two communities and a crescent of schools whose identities are more disparate. You can perhaps imagine that a particular viewer’s mental model of this data set will change based on level of familiarity with one or the other community, or with the issues facing the disparate schools.

How do you define interaction? There is probably a typo in Yi et al. where they mention “direction manipulation” and presumably mean direct manipulation. You all have experience with direct manipulation. Do you restrict your frame of reference to this interaction style? What other interaction styles have you learned and possibly used?

Yi et al. draw on prior work to suggest three ways to evaluate models: their ability to describe a relevant set of alternatives (needs generality), their ability to inform evaluations of alternatives (needs concreteness), and their ability to help create new designs (needs openness). It might be useful to look at the following model with these criteria in mind. Why do you think they claim that existing models don’t satisfy these three criteria simultaneously? What role do you think affinity diagramming might have played in the leap from (A) listing techniques to (B) settling on user intent as a grouping mechanism?

Yi’s main contribution is a detailed discussion of the following list of interaction categories.

  • Select: mark something as interesting
  • Explore: show me something else
  • Reconfigure: show me a different arrangement
  • Encode: show me a different representation
  • Abstract/Elaborate: show me more or less detail
  • Filter: show me something conditionally
  • Connect: show me related items

    How do these categories relate to your projects? Can you use these categories to examine your existing design choices or to help you make new ones?

    649week09 Farnsworth Treemap Re-revisited

    As a followup exercise, I’d like to revisit the Farnsworth treemap you designed last time. This time I would like you to work on the computer, using an application of your choice, and design a color scheme for your treemap. Please produce a one page summary of your activity in pdf format and put it in the shared folder with your names on it. This page should have four or five images on it, as follows: there should be two or three images of the treemap in different states, there should be a legend, showing what the colors mean, and there should be an image of a color scheme you copy and paste from an external source, to be explained next.

    colorbrewer

    Begin by looking at ColorBrewer, illustrated above. This tool helps you choose related colors, offering five-color palettes showing colors related as sequential, diverging, or qualitative. Pick one or more and copy it into the document you will share with us. If you don’t like any of these, visit kuler.adobe.com and pick a palette from there. Put that into the pdf you will share with us.

    bathymetric-legend

    Next, extend the palette. Add at least two colors to it. Assume that you need at least seven colors and you’ve been given five colors to work with. You may notice a problem, depending on the kind of scale you’ve chosen and the relationships in the data you intend to explore. Consider the legend above, reprinted from the given source by Tufte. You probably perceive a linear relationship between the colors in this legend. You may be surprised if you apply Digital Color Meter or Art Director’s Toolkit to analyze the color components. What you will see is that, to achieve the perception of linearity, non-linear amounts of components have been added (or subtracted) at each level.
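If you would rather start the extension numerically than by eye, here is a minimal sketch in Python. The five hex values are placeholders standing in for whatever sequential palette you copied, and the function names are my own. It resamples the palette to seven colors by straight-line interpolation in RGB, which is an assumption of convenience: as the legend discussion above suggests, equal numeric steps will not necessarily look like equal perceptual steps, so treat the result as a starting point to adjust.

```python
# Placeholder five-color sequential palette; substitute the one you copied.
FIVE = ["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"]


def hex_to_rgb(h):
    """Convert '#rrggbb' to a tuple of three integers in 0..255."""
    h = h.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(rgb):
    """Convert three numbers in 0..255 to '#rrggbb', rounding each one."""
    return "#%02x%02x%02x" % tuple(int(round(c)) for c in rgb)


def resample(palette, n):
    """Return n colors spaced evenly along the palette (n >= 2).

    Colors between two original entries are blended linearly in RGB.
    The first and last colors of the palette are preserved.
    """
    rgbs = [hex_to_rgb(h) for h in palette]
    out = []
    for k in range(n):
        pos = k * (len(rgbs) - 1) / float(n - 1)   # position along the palette
        lo = int(pos)
        hi = min(lo + 1, len(rgbs) - 1)
        t = pos - lo                               # blend weight between lo and hi
        blended = tuple(a + (b - a) * t for a, b in zip(rgbs[lo], rgbs[hi]))
        out.append(rgb_to_hex(blended))
    return out


def steps(palette):
    """Per-channel (R, G, B) differences between neighboring colors."""
    rgbs = [hex_to_rgb(h) for h in palette]
    return [tuple(b - a for a, b in zip(p, q)) for p, q in zip(rgbs, rgbs[1:])]


print(resample(FIVE, 7))
print(steps(FIVE))   # uneven steps: the components do not change by a constant amount
```

Printing the steps for a palette you like does in code what Digital Color Meter does by hand: it shows how far from constant the component changes are between levels.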

    Finally, create two or three images of the treemap in different states to illustrate how the color scheme you’ve chosen supports the goals you’ve chosen. Put these on the same display as the legend and the original palette and save as pdf and put into our shared folder. Feel free to modify aspects of this exercise as long as you keep to the spirit of (1) making choices about color and (2) extending widely available canned color choices.

    649week09 Readings in InfoVis Evaluation

    InfoVis has been around long enough for the community’s attention to fasten on evaluation as a topic, rather than to subscribe to the evaluation methods inherited from constituent disciplines. The pair of papers by Plaisant (2004) and Shneiderman (2006) illustrates this questioning and the synthesis of a technique, MILC (multi-dimensional in-depth long-term case studies), arising from it. The 2004 paper describes several existing techniques with different strengths and weaknesses and promotes field studies. The 2006 paper provides some detail for integrating various techniques into a new, but potentially much more expensive, technique, MILC. What do you think of the predicted trajectory: modest MILCs followed by more ambitious ones? Is it reasonable to assume that popularity of the technique will lead to developments that reduce its cost? What are the ideal conditions for MILC to succeed?

    vast-eval

    Plaisant (2008), in what may be the most useful paper you will see in this class, describes an InfoVis contest and, in Section 4.6.1, describes evaluation challenges. This section includes many provocative statements worthy of discussion. For example, Plaisant admits that it’s hard to keep track of different, mostly visual, artifacts when judging. How would you address this problem? What do you think of Plaisant’s proposed solution (shared environment)? Another issue pervading not only this section but the entire contest has to do with the magnitude of what is being evaluated. The effects being studied may be overwhelmed by other environmental features. Plaisant, by the way, refers to the VAST reading we discussed previously and whose evaluation model is illustrated above.

    Another approach to evaluation is described by Tory (2005). Heuristic evaluation should be painfully familiar to most of you, and this should be an interesting opportunity to see a different community adopting it. Given your experience, do you see anything missing from this paper? (Hint: How does Nielsen justify the particular set of heuristics he describes?) On a related note, how might you answer (or integrate) the criticisms in Thimbleby (2007)? This paper, like Tory’s, may suffer a little from a sketchy understanding of user-centered design and beliefs that are enhanced by a lack of frequent contact with it. For example, what is your view of the iterative design cycle shown in Figure 9?

    Buring (2006) shows how you could evaluate an interaction design on a small device without the small device. It is for you to consider whether the simulation described, using a device for which we have a good proxy in DL1, overcomes the problem of not actually using the device. This paper provides a good introduction to methodology (but see Amar (2005) for an example of how to push beyond the technique). The paper reflects a very common set of priorities in looking at task completion time and preference.

    analytic-gaps

    How can we enrich evaluations? Kobsa (2001) evaluated three commercial InfoVis systems using an experiment and a method that’s a good model for understanding how InfoVis features lead to outcomes. It’s worthwhile to look at this study to see how you can overcome some of the methodological problems your own intuition may suggest. Nevertheless, there are limitations to evaluating InfoVis artifacts in this way. Amar and Stasko (2005) provide some insight into these limitations. They criticize InfoVis evaluation in general as focused on representational primacy: how well do you get the information via the information representation? They introduce two kinds of gaps left unaddressed by evaluations respecting representational primacy, a worldview gap (what is the right data? what is the right presentation design?) and a rationale gap (how strong are the relationships shown? how confident are we in the usefulness of relationships shown?). They show how the Kobsa evaluation could benefit from considering these two gaps.

    649week09 Evaluating Information Visualization

    How can we evaluate a given InfoVis artifact? How do we know what a given InfoVis is good for, if anything? Can we use intuition? Without any study at all, we might just start thinking about what can be measured. We can ask someone to look at an InfoVis and observe them. We can ask them afterward if they liked it. We can ask them questions that we think the InfoVis might help answer and see if they can answer the questions better than if they did not use any InfoVis. We can look at the advertising for a given InfoVis, then look around and see if we see something that advertises the same thing, but is not InfoVis. Then we could compare them, using the above list.

    If we try to do any of the things listed above, we will soon find many pitfalls plaguing every one. You can probably envision better and worse ways to do every single thing listed above. If we discuss them, you will soon find that there are pitfalls you didn’t think of, but that your classmates brought up. You might think gathering a group of people would improve the ideas about how to evaluate InfoVis. What if all the people you gathered were cognitive psychologists and software engineers? Do you think they would systematically catch some problems and overlook others? Do you think they would be better equipped to handle some problems than others?

    top-papers

    InfoVis people are, by training, a much more diverse group today than they were ten years ago. Consider the above picture, from the InfoVis 2004 Contest to visualize the history of InfoVis (Fekete, J.-D., Grinstein, G., Plaisant, C., IEEE InfoVis 2004 Contest, the history of InfoVis, www.cs.umd.edu/hcil/iv04contest (2004).). There were several winning entries, including the one from which this picture and the following one are drawn. The picture above answers the question of who wrote the most frequently cited papers. The subsequent one combines several features to arrive at some idea of influence. Although we see that George Furnas and George Robertson wrote the two most cited papers in the picture above, we see from the subsequent picture that Shneiderman has written the most papers and has the most coauthors. Independently, we may find that he has graduated far and away the most Ph.D. students, increasing both his number of papers and number of coauthors. We can also see the strong tie between Card, Mackinlay, and Robertson in the subsequent picture. And so on. Pictures like this strive to show a community. You could probably imagine taking this another step and checking on the field in which the central players obtained their training. That might give us a very good clue as to which evaluation tools are valued in the community.

    coauthor-history
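The counts behind a picture like this are simple to compute once you have a list of papers and their authors. Here is a minimal sketch in Python; the paper records and author names are placeholders invented for illustration, not the contest data, and the function name is my own.

```python
from collections import defaultdict

# Hypothetical paper records: (paper id, list of authors). Names are placeholders.
PAPERS = [
    ("P1", ["Author A", "Author B"]),
    ("P2", ["Author A", "Author C", "Author D"]),
    ("P3", ["Author B"]),
    ("P4", ["Author A", "Author B"]),
]


def author_stats(papers):
    """Return author -> (number of papers, number of distinct coauthors)."""
    n_papers = defaultdict(int)
    coauthors = defaultdict(set)
    for _, authors in papers:
        for a in authors:
            n_papers[a] += 1
            # Everyone else on the same paper counts as a coauthor.
            coauthors[a].update(b for b in authors if b != a)
    return dict((a, (n_papers[a], len(coauthors[a]))) for a in n_papers)


for author, (papers, partners) in sorted(author_stats(PAPERS).items()):
    print("%s: %d papers, %d coauthors" % (author, papers, partners))
```

Citation counts or the field of each author's training could be added as further columns in the same way, if you can find the data.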

    In my next entry, I’ll discuss some of the readings. None of them explicitly addresses this issue, but it is something you may want to keep in mind as you look at examples of evaluation and commentary on evaluation itself.

    Uncategorized Obtaining Last.fm data

    An enterprising student decided to use Processing to visualize her Last.fm listening history. To make that happen, she can use the Last.fm API. One way to do so is to use the Python binding for this API, developed by Amr Hassan. I obtained it by saying

    svn checkout http://pylast.googlecode.com/svn/

    at a terminal prompt. Then I said

    cd svn/trunk
    python setup.py install

    and put the following into a file called recent01

    # recent01: get a Last.fm session key (Python 2 syntax)
    import pylast
    # Placeholders: substitute your own API key and API secret
    API_KEY = '12345'
    print API_KEY
    API_SECRET = '54321'
    print API_SECRET
    # Quick and dirty command history, saved between interactive sessions
    import atexit
    import os
    import readline
    import rlcompleter
    historyPath = os.path.expanduser("~/.pyhistory")
    def save_history(historyPath=historyPath):
        import readline
        readline.write_history_file(historyPath)
    if os.path.exists(historyPath):
        readline.read_history_file(historyPath)
    atexit.register(save_history)
    del os, atexit, readline, rlcompleter, save_history, historyPath
    # Ask for credentials and exchange them for a session key
    username = raw_input("Please enter your username: ")
    md5_password = pylast.md5(raw_input("Please enter your password: "))
    session_key = pylast.SessionKeyGenerator(API_KEY, API_SECRET).get_session_key(username, md5_password)
    print session_key
    

    Bear in mind that I’ve previously joined Last.fm so I have a username. I registered at Last.fm to try the API and, in the course of doing so, I received an API key and an API secret. I copied these locally and pasted them into the above script in place of the numbers 12345 and 54321.

    One annoyance was that I did not know much about Python, such as how to save and reload commands given interactively. The above lines from atexit to del os … serve to save history and let me copy it to another file between Python sessions. There are undoubtedly better ways. The lines after that serve to get the Session key, the third item we’ll need to get our recently played tracks. I ran this script by saying

    execfile('recent01')

    at a Python prompt. I then cut and pasted the Session key into the following script.

    # Fetch recently played tracks (Python 2 syntax)
    import pylast
    # Placeholders: substitute your own API key, API secret, and session key
    API_KEY = '12345'
    print API_KEY
    API_SECRET = '54321'
    print API_SECRET
    session_key = 'abcde'
    print session_key
    # Quick and dirty command history, saved between interactive sessions
    import atexit
    import os
    import readline
    import rlcompleter
    historyPath = os.path.expanduser("~/.pyhistory")
    def save_history(historyPath=historyPath):
        import readline
        readline.write_history_file(historyPath)
    if os.path.exists(historyPath):
        readline.read_history_file(historyPath)
    atexit.register(save_history)
    del os, atexit, readline, rlcompleter, save_history, historyPath
    # Substitute your own user name for 'mickmcq'
    curruser = pylast.User('mickmcq', API_KEY, API_SECRET, session_key)
    # Same call as pylast.User.get_recent_tracks(curruser), in method form
    rtracks = curruser.get_recent_tracks()
    print rtracks
    

    As before, the part from import atexit to the line beginning del os, is just a quick and dirty way to get a command history. The payload of this script is rtracks, a listing of recently played tracks, along with the time they were played. Obviously, you need to substitute your user name in the line beginning curruser. The contents of rtracks look as follows when you print them. You can format the listing, but this is what I got by default:

    [Cyanotic - Transhuman played at 8 Mar 2009, 00:18, My
    Life with the Thrill Kill Kult - Hottest Party in Town
    played at 8 Mar 2009, 00:16, Hanzel und Gretyl - Fukken
    Uber Death Party played at 8 Mar 2009, 00:11, Flesh
    Field - This Broken Dream played at 8 Mar 2009,
    00:04]
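If the goal is to hand this listing to Processing, one option is to turn it into rows of artist, title, and time first. What follows is only a sketch that parses the default printed form shown above, pasted in as a string; the function names are my own and are not part of pylast. It assumes the artist ends at the first " - " and that every entry ends with a date in the form shown, so an artist name containing " - " would break it. Reading the fields directly from the track objects would be sturdier, but I have not explored how.

```python
import re
from datetime import datetime

# The printed listing from above, joined onto one line.
SAMPLE = ("[Cyanotic - Transhuman played at 8 Mar 2009, 00:18, "
          "My Life with the Thrill Kill Kult - Hottest Party in Town "
          "played at 8 Mar 2009, 00:16, "
          "Hanzel und Gretyl - Fukken Uber Death Party played at 8 Mar 2009, 00:11, "
          "Flesh Field - This Broken Dream played at 8 Mar 2009, 00:04]")

# One entry: artist, " - ", title, " played at ", date, then ", " or end of text.
ENTRY = re.compile(r"(?P<artist>.+?) - (?P<title>.+?) played at "
                   r"(?P<when>\d{1,2} \w{3} \d{4}, \d{2}:\d{2})(?:, |$)")


def parse_recent_tracks(text):
    """Return a list of (artist, title, datetime) tuples from the printed form."""
    body = text.strip().lstrip("[").rstrip("]")
    rows = []
    for m in ENTRY.finditer(body):
        # Month abbreviations are parsed with %b, which assumes an English locale.
        played = datetime.strptime(m.group("when"), "%d %b %Y, %H:%M")
        rows.append((m.group("artist"), m.group("title"), played))
    return rows


def to_tsv(rows):
    """Format rows as tab-separated text with a header line."""
    lines = ["artist\ttitle\tplayed"]
    for artist, title, played in rows:
        lines.append("\t".join([artist, title, played.strftime("%Y-%m-%d %H:%M")]))
    return "\n".join(lines)


print(to_tsv(parse_recent_tracks(SAMPLE)))
```

Tab-separated text is a convenient hand-off because a Processing sketch can read it line by line and split each line on tabs.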

    649week08 Visualizing Filesystems

    I’d like to continue our exploration of TreeMaps this week with a followon exercise. While I was planning this, Jakob Hilden contributed an interesting alternative filesystem visualization I’d like to share. The following picture is an annotated screenshot of Liquifile.

    (picture links to Liquifile website)

    Jakob also points out a Quicktime video you can see to understand better what’s going on here. Jakob adds that he especially likes the horizontal time dimension, which he thinks can be useful to find files from a certain timeframe.

    This raises an important issue for our treemap exercise. How do people use the treemaps you designed last time? What kinds of questions can people answer? I’d like to explore this by asking you to redesign your treemaps to accommodate a fictional user, Professor Farnsworth. Farnsworth is constantly running out of space on his laptop and tries to organize his files in a way that allows him to keep track of things kept locally and remotely.

    Farnsworth has a number of kinds of files and has tried to create a hierarchy allowing him to find a given file without resorting to googling. In particular, he’d like to find files related to research topics, research projects, classes he’s teaching, skills he’s trying to acquire, and administrative work. Instead of searching for related keywords, he’d like to be able to browse for things created at the same time or while working on the same topic, project, class, or skill. Many of these turn out to overlap, so he keeps changing the hierarchy of folders, sometimes using the four headings above, but also using headings indicating the source of files, whether created by Farnsworth or others. Another heading has to do with file types. Farnsworth takes a lot of photos and creates a lot of videos. These files threaten limited laptop space more drastically than text files. A further complication is that Farnsworth takes a lot of baby photos and baby videos that are not part of his work but wind up on his laptop. They’re large and numerous and need to be organized largely by time so that he does not bore people by showing them the same images over and over.

    Finally, Farnsworth needs to share some files and keep others private. He’d like to be able to easily put sections of the filesystem on a shared server, so that the appropriate groups of faculty, students, and staff can see only what they’re supposed to see. Examples of things that must be kept private include budgets for projects, grades for classes, evaluations of staff members, evaluations of prospective students and faculty, and plans with personnel implications. In all of these cases, it’s difficult and complicated (technically not philosophically) to know exactly who can and can’t see each file, but it is possible. The current form of organization is problematic in that files for different audiences reside in the same folders.

    So the exercise is to create a TreeMap visualization of Farnsworth’s laptop-based filesystem that does more to solve these problems than a basic TreeMap application does.
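As a baseline to design against, here is a minimal sketch in Python of the classic slice-and-dice treemap layout, in which each folder's rectangle is split among its children in proportion to their sizes, alternating the split direction at each level. The folder names and sizes are invented for illustration, and the function names are my own. This is the "basic" behavior, not a solution to Farnsworth's problems: it knows nothing about time, file type, or who may see what.

```python
# Hypothetical folder tree: nested dicts are folders, numbers are sizes in MB.
TREE = {
    "research": {"papers": 120, "data": 300},
    "classes": {"649": 80},
    "photos": {"baby": 900, "work": 100},
}


def size(node):
    """Total size of a file (a number) or a folder (a dict of children)."""
    if isinstance(node, dict):
        return sum(size(child) for child in node.values())
    return node


def slice_and_dice(node, x, y, w, h, depth=0, path=""):
    """Return (path, x, y, width, height) rectangles for every file.

    Children share the parent's rectangle in proportion to their sizes.
    Even depths split left to right; odd depths split top to bottom.
    """
    if not isinstance(node, dict):
        return [(path, x, y, w, h)]
    rects = []
    total = float(size(node))
    offset = 0.0                      # fraction of the parent already used
    for name in sorted(node):
        child = node[name]
        frac = size(child) / total if total else 0.0
        child_path = path + "/" + name
        if depth % 2 == 0:
            rects += slice_and_dice(child, x + offset * w, y, w * frac, h,
                                    depth + 1, child_path)
        else:
            rects += slice_and_dice(child, x, y + offset * h, w, h * frac,
                                    depth + 1, child_path)
        offset += frac
    return rects


for rect in slice_and_dice(TREE, 0, 0, 100, 100):
    print(rect)
```

Each file's area comes out proportional to its size, which is what makes the baby photos dominate the picture. Your redesign has to decide what else, besides size, deserves space or color.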

    649week08 Visualization beyond the Desktop

    We have a guest this week, Erik Hofer, whose various infovis and scivis projects can be seen on the third floor at SI North. Erik gave an initial title of Beyond the Desktop, but that may have changed slightly. In any event, the emphasis is the same. We’ve talked about users, and Erik’s talk will place users in environments outside the office.

    Consider Shoemaker (2001). This paper is primarily concerned with users who work together and have different information needs. I have personally been in group situations where a single display showed a log of individual activities. In my case, representatives of multiple groups were working together, and glasses like those described in this paper would have appealed to individuals: members of one group could shut out the logs of the other groups, while members assigned to bridging across groups could still take a superficial look at everything. This scenario differs a little from Shoemaker’s private vs. public setup. Can you think of other situations where it might be advantageous to have a single display, but have parts hidden from some users? One issue to consider is motivation. In the example given above, the users would have considered information hiding a benefit. In the paper, privacy preferences were considered an issue.

    What about the private / public distinction? Research by Mark Ackerman showed that people altered their work habits (in a good way) when they knew some limited information about what their collaborators were doing. If you agree, can you think of ways to extend this public / private display to stimulate this alteration?

    Baudisch (2002) demonstrates the value of a focus plus context display that doesn’t require you to switch views during the course of work. One limitation of this study is that it employs very expensive displays that you could not afford to give to every individual user. The authors give a few examples of scenarios for use, including web designers, graphic designers, GIS specialists, chip designers, drivers, and gamers. It might be worthwhile to think about scenarios in which you might use these displays. Suppose you had one in a conference room, dining room, classroom, lobby, or other location where many people congregate. Can you give an example of how a focus plus context display could help in one of these environments?

    North Campus has numerous tiled displays and opportunities for tiled displays in public spaces. North Quad, SI’s future home, may have tiled displays in public spaces. Ball (2007) explores such displays with respect to physical navigation. Although they use a somewhat elaborate setup, the past year has seen at least one very inexpensive package for gesture interaction with a display. It’s realistic to imagine interactive building directories, for instance, appearing on public displays. One interesting issue has to do with how multiple people approach public displays at the same time. Do they take turns? Do they respond to each other based on “eavesdropping”? Is there a way to represent situational information on a large tiled display that adds more than a linear combination of individual interactions?