Edward Tufte: Presenting Data and Information

I had a chance to attend an Edward Tufte class this past week and it truly was a pleasure. He has published a number of beautiful books on presentation and the visualization of data. So, it was quite a treat to sit in on a presentation by someone who teaches about giving presentations for a living. The class was engaging, full of content, and certainly left me with a sense of excitement.

the class

One big takeaway for me was that clutter and the sense of being overwhelmed by data are not attributes of too much information, but rather consequences of poor design. How many times have you looked at an information dashboard or a chart in a meeting only to get a headache from trying to grasp what was being communicated? And yet we are capable of navigating and internalizing large amounts of information if it is properly displayed and explained; those are the truly elegant presentational designs.

The class covers the basic principles you would want to follow to present your data in such a way as to tell a story — a persuasive one. Things like how we can lay out and present data to facilitate the basic intellectual process one goes through when considering and weighing a proposal or story. I went into the class thinking I would learn some better ways to visualize and display complex datasets. I think I have some better ideas in this area, but only as a result of the bigger insight I walked away with on how to make a better presentation.

The other majorly cool bonus was being less than a foot away from a first-edition Galileo printing. This, along with an early printing of Euclid’s Elements, helped demonstrate the power of “breaking out of flatspace” by bringing something physical into a meeting.

bringing it home

So after all this excitement I went home and looked for ways to integrate Tufte’s design principles into my own presentations and reports. Edward Tufte’s website has a rich forum called Ask E.T., which contains information about presentation in a number of areas, including project management. One Ask E.T. thread led me to a project on Google Code that contains a Tufte-inspired LaTeX layout.
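For a taste of what that layout gives you, here is a minimal sketch using the project’s tufte-handout document class; the class name and the sidenote/margin-figure commands come from the project, while the title and content are placeholders of mine.

    % A minimal sketch using the tufte-latex project's tufte-handout class;
    % the title and text are placeholders of mine.
    \documentclass{tufte-handout}

    \title{Weekly Status Report}
    \author{A. Developer}

    \begin{document}
    \maketitle

    \section{Progress}
    The build pipeline is green this week.\sidenote{Sidenotes replace
    footnotes, keeping commentary beside the prose it annotates.}

    \begin{marginfigure}
      % an \includegraphics{...} chart would normally go here
      \caption{Margin figures keep small supporting charts next to the text.}
    \end{marginfigure}

    \end{document}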

I expect to play around with some of these designs and see how I might better polish my own reports.

Another Amazon EC2 Beowulf Cluster Joins The Grid

Have you ever been working with a dataset, started crunching some numbers, and said to yourself, “damn, I should distribute this across the cluster,” only to realize that your cluster is already saturated with your last job and will be for the next day or two? If you answered yes, then we probably share the same data-craving/slicing/mining sickness. Well, the above scenario happens to me often enough that I have posed the question to others. I could simply invest in a larger cluster — an expensive investment, especially since the scenario often requires only bursts of compute time. That would make an on-demand cluster a perfect solution.

On-Demand Beowulf

I had heard some chatter about Peter Skomoroch’s ElasticWulf and found myself walking through his series on creating an on-demand Beowulf cluster using Amazon’s EC2. You can find his very helpful posts here and here (with another on the way). ElasticWulf is a package of Python tools and machine images that let you create and manage a Beowulf cluster on Amazon’s EC2 service. Peter has done the heavy lifting for you: the machine images come loaded with the essential computational Python packages, like SciPy, as well as cluster middleware, so you can get up and running with minimal configuration.
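To give a flavor of the kind of job you would push up to such a cluster, here is a minimal sketch of an MPI-style computation in Python. It assumes the MPI middleware on the images is wired up and that mpi4py is available on the nodes (my assumption; check what is actually installed). The problem is the classic toy of estimating pi by numerical integration.

    # A toy MPI job in Python: estimate pi by midpoint integration of
    # 4/(1+x^2) over [0,1], splitting the intervals across ranks.
    # Assumes mpi4py is installed on the nodes; run with, e.g.,
    #   mpiexec -n 8 python pi_mpi.py
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()          # this process's id
    size = comm.Get_size()          # total number of processes

    n = 10000000                    # number of intervals
    h = 1.0 / n

    # each rank sums every size-th interval, starting at its own rank
    local = sum(4.0 / (1.0 + ((i + 0.5) * h) ** 2)
                for i in range(rank, n, size)) * h

    pi = comm.reduce(local, op=MPI.SUM, root=0)
    if rank == 0:
        print("pi ~= %.10f" % pi)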

The Results

After running through Amazon’s EC2 Getting Started Guide and Peter’s posts, I was up and running with a new Beowulf cluster in well under an hour. I pushed up and distributed some tests, and everything seems to work. Now, it’s not fast compared to even a low-end contemporary HPC system, but it is cheap and can scale up to 20 nodes with only a few simple calls. That’s nothing to sneeze at, and I don’t have to convince the wife or the office to allocate more space to house 20 nodes.

I don’t currently have any hard numbers to back up my ephemeral cluster’s performance, but it is something I am curious about. How much can these virtualized Opteron 250s dish out? It looks like Peter’s third installment will address benchmarking performance, which I am looking forward to. In the meantime I might just push up High Performance Linpack (HPL) and see how it stacks up (in the abstract) against my existing clusters.
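Until then, a crude single-node stand-in is easy enough to improvise: time a dense solve in NumPy and back out a rough FLOP rate. This is my own back-of-the-envelope sketch, not HPL, and the problem size is arbitrary.

    # A rough, single-node stand-in for a Linpack-style number: time
    # NumPy's dense solver and estimate the flop count as (2/3) * n^3,
    # the leading term of LU factorization. Not a substitute for HPL.
    import time
    import numpy as np

    n = 2000                        # arbitrary problem size
    a = np.random.rand(n, n)
    b = np.random.rand(n)

    start = time.time()
    x = np.linalg.solve(a, b)       # LU factorize and solve
    elapsed = time.time() - start

    flops = (2.0 / 3.0) * n ** 3    # leading-order flop count
    print("solved n=%d in %.3fs (~%.2f GFLOP/s)"
          % (n, elapsed, flops / elapsed / 1e9))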

Now that I’m finally up and running a cluster on EC2, I plan on immersing myself in more data. It will also be a nice place to experiment with other cluster technologies I have been meaning to investigate, like Hadoop; in fact, there are already public Amazon Machine Images for Hadoop nodes.

Exciting stuff…

FogBugz: One Hot Ride

So, a few months ago I came across a video by Fog Creek demonstrating the latest version of their project management software, FogBugz. Wow. It looked great and appeared to have many of the features I had been searching for across the project management software terrain.

It was not until a couple of weeks ago that I actually set up a trial account. I signed up and began configuring FogBugz to manage a small project that would likely run its full course over three to four two-week iterations. Let me start by saying that FogBugz has been a pleasure to work with.

Simple and Intuitive

The folks over at Fog Creek have done a good job minimizing the number of clicks required to accomplish basic tasks like creating new cases/features/bugs. Lists of bugs and features can be viewed by various characteristics through filters or by easily accessible reports. It might seem like these features would be difficult to get excited about in a world filled with things like Django and Rails, but FogBugz gets the details right, making the process “flow” really well.

Oh, The Integration

Wiki, bug tracker, project management/prioritization view, customer email (support), and source control all talking to one another… Again, while other applications might offer these same tools and integrate some subset of them, FogBugz goes a step further. The integration is tight, and using these utilities feels like you are just dealing with another dimension of the same data, as opposed to using two independent applications that are similarly branded and share some ability to mark up references.

That being said, I have not yet figured out how to integrate source control. You can download a plugin for Visual Studio if that’s your tool of choice, but my experiments have been taking place in a Linux environment. There are a pair of shell scripts available for download that should solve my problem, but it was not immediately clear to me how to use them. That’s my next mission.
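For what it’s worth, my understanding is that the integration keys off a “BugzID: <case number>” tag in the commit message, which a post-commit hook reports back to FogBugz. The sketch below is purely hypothetical — my guess at the shape of such a hook for Subversion, with a made-up FogBugz URL and parameters; the real scripts may work quite differently.

    # Hypothetical sketch of a Subversion post-commit hook reporting a
    # commit to FogBugz. The URL and its parameters below are my guesses,
    # not FogBugz's documented interface; consult the official scripts.
    import re
    import subprocess
    import sys
    import urllib.parse
    import urllib.request

    repo, rev = sys.argv[1], sys.argv[2]    # passed in by svn post-commit

    # pull this revision's log message
    log = subprocess.check_output(
        ["svnlook", "log", "-r", rev, repo]).decode()

    match = re.search(r"BugzID:\s*(\d+)", log)
    if match:
        params = urllib.parse.urlencode({
            "ixBug": match.group(1),        # hypothetical parameter
            "sRevision": rev,               # hypothetical parameter
        })
        urllib.request.urlopen(
            "https://example.fogbugz.com/submit?" + params)  # placeholder URL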

Evidence-based Planning

I am a huge proponent of evidence-based planning, and it looks like they’ve done a good job bringing this capability to those who might not be familiar with the process. The gist of it is that by capturing both project estimates (at the feature level) and the actual time taken, you can build a model of your estimate accuracy and development velocity, and in turn generate release forecasts based on historical data. So with a click of a button (and disciplined engineers who enter their estimates and elapsed times) you can have nice reports and charts showing when you are most likely to deliver your releases, and with what features — all based on historical data. It is a powerful concept and well executed here.
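The mechanism is easy to sketch: given a history of (estimate, actual) pairs, you can resample the historical ratios to turn the estimates for remaining work into a distribution of completion times. Below is my own toy rendition of the idea, not FogBugz’s implementation, with invented numbers.

    # A toy Monte Carlo rendition of evidence-based scheduling: resample
    # historical actual/estimate ratios to turn remaining estimates into
    # a distribution of total hours. Numbers invented for illustration.
    import random

    history = [(4, 6), (8, 7), (2, 5), (16, 20), (3, 3)]  # (estimate, actual) hours
    ratios = [actual / estimate for estimate, actual in history]

    remaining = [5, 8, 13, 2]       # estimates for unshipped features, hours

    def simulate():
        # scale each remaining estimate by a randomly drawn historical ratio
        return sum(est * random.choice(ratios) for est in remaining)

    totals = sorted(simulate() for _ in range(10000))
    for pct in (50, 80, 95):
        print("%d%% chance of finishing within %.1f hours"
              % (pct, totals[int(len(totals) * pct / 100) - 1]))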

FogBugz goes as far as to model the estimation error of individual developers. This sounds great and makes for a good story, but I wonder how useful it actually is in practice, where developers learn from past experience and self-correct; without some Bayesian filtering, I could see these values becoming uninformative fairly quickly.

I am also still playing around with the best way of capturing my existing process, where we use team-based story point estimation. It seems like it will be straightforward, but I am waiting for a few gotchas lurking in the darkness.

Things That Make Me Go “Hmmm”

I have struggled a bit to fit my Scrum-based backlog management mindset into FogBugz. The bug tracker allows features and bugs to be prioritized, and those priority levels are completely configurable, so they can be defined at whatever level of detail you need. One thing I am missing is the ability to deal with micro, sprint-level priorities (i.e., what order should this set of features be implemented in to minimize overall project risk?). I realize I could define as many priority levels as I need, but defining that many sub-levels would get tedious. Fortunately, the project I am currently using FogBugz for is small enough that I can manage without that functionality. It does, however, become important when you increase the team size and deal with projects where you really do have a rank ordering of feature priorities. It could be that ScrumWorks by Danube Technologies has just spoiled me with their Scrum-based backlog system.

Another thing that has detracted from my experience is response latency, though let me say that it is not bad. I think the application’s simplicity and overall experience let me forget that it is a remotely hosted application. I wonder how responsive the server-licensed version is, where you host the code yourself and the server works only with your projects, perhaps on-site. Maybe I will find out one day.

Conclusion

My preliminary run through FogBugz has been overwhelmingly positive. It is a powerfully usable application that offers tight integration among the core tools used not only in the development but also in the support of software applications. Check out the official highlighted feature list.

If you are looking around for some project management software, then definitely check FogBugz out and see how it fits with your current process. I am still waiting to see how FogBugz will fit into our overall process once it has had a chance to run the full cycle — so far, it is looking good.