Hello World Update

It’s been a while since my last real update, and a lot’s been going on. First, the good people of Popular Mechanics reached out to me earlier this year about the squirrel/sentry gun project and did a little write-up about it in their September 2012 issue. Having grown up on Popular Mechanics, I found it very cool and flattering. (I’m still tempted to send away for plans to build a flying ship out of ordinary household vacuum cleaners.) I think my favorite part was getting a cartoon rendering of myself. And to all the people who write to me about their ongoing wildlife battles: stay strong!


Since my last real update, I also left my amazing team to start a new adventure. I am now the Director of Technology at Pruvop. As a digital products laboratory, we work on a wide range of projects, from building functional prototypes for early-stage startups to helping larger organizations streamline their internal processes by integrating intelligent automation software. I’m back in the mud again, designing and building all sorts of cool projects in Downtown Durham. It has been great being surrounded by a cross-functional team (marketing, business, developers) who all appreciate the strengths of agile and lean methodologies.

I hope to be able to share some new developments in the coming months!

GHAPACK: A Library for the Generalized Hebbian Algorithm

I recently joined a new open source project called GHAPACK, which provides an implementation of the Generalized Hebbian Algorithm (GHA) along with tools for working with it. I came across the project after banging my head against some of the practical limitations of Singular Value Decomposition (SVD). GHA is a Hebbian, neural network-like algorithm that approximates SVD’s ability to perform eigendecomposition. Its added bonus is that it allows for incremental training, so you can refine your model with new data without having to recompute over the entire dataset.
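To give a feel for how GHA works, here is a minimal sketch of Sanger’s update rule in Python (this is just an illustration of the algorithm, not GHAPACK’s actual C implementation): each incoming sample nudges the weight rows toward the leading eigenvectors of the data’s covariance, one sample at a time.

```python
import numpy as np

def gha_update(W, x, lr=0.01):
    """One step of Sanger's Generalized Hebbian Algorithm.

    W : (k, n) current estimates of the top-k eigenvectors (as rows).
    x : (n,) a single (roughly zero-mean) data sample.
    """
    y = W @ x  # project the sample onto the current components
    # Hebbian term minus a lower-triangular decorrelation term,
    # which forces each row toward a *different* eigenvector
    W += lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)
    return W

# Toy run: recover the dominant principal direction of stretched 2-D data
rng = np.random.default_rng(0)
data = rng.normal(size=(5000, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
W = rng.normal(size=(1, 2)) * 0.1
for x in data:
    W = gha_update(W, x, lr=0.005)
print(W / np.linalg.norm(W))  # ≈ [[±1, 0]], the dominant eigenvector
```

Because each update only touches one sample, you can keep feeding new data into the same `W` later, which is exactly the incremental-training property described above.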

Your Trusty SVD Tool

SVD is one of those tools that every machine learning practitioner and computational geek will pull out at one time or another. It’s a powerful matrix factorization technique that gets you a matrix’s singular values and vectors (and, through them, the eigendecomposition of its covariance). One of the reasons it tends to be used so often is that it works on those pesky M x N matrices, which we data junkies tend to generate.

For most small problems I can just use scipy and numpy’s svd and never give it a second thought. LAPACK’s suite of SVD routines powers the svd functions of scipy, numpy, and MATLAB, among others. LAPACK is designed for dense matrices and processes them in their entirety. What happens when you start dealing with problems in high-dimensional space? Those dense representations and full processing are expensive. So, when your problem space is better suited to sparse matrices, you tend to run into not enough memory, non-convergence…no SVD.
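For the small, dense case the workflow really is that simple; a quick example with numpy:

```python
import numpy as np

# A small dense M x N matrix: full SVD is cheap at this scale
A = np.random.default_rng(1).normal(size=(6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(s)                                    # singular values, descending
print(np.allclose(A, U @ np.diag(s) @ Vt))  # True: exact reconstruction
```

It is when A stops fitting comfortably in memory that this one-liner falls apart.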

At the time I was considering a problem well-suited to incremental training: I did not want to rerun the entire dataset through SVD every time a small batch of new data arrived. GHA avoids that sort of inconvenience and, as far as my problem was concerned, approximates the same outcome.

GHAPACK

GHAPACK is written by Genevieve Gorrell and based on her work using GHAs to perform Latent Semantic Analysis (LSA).

“Offline” Calculations

My first order of business upon joining the project was to get the offline training working, and it now does. This lets you compute a pseudo-SVD of a massive matrix without having to load the whole thing into memory. No more out-of-memory segfaults; now you’re just limited by the resiliency of your hardware.
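The idea behind offline training is simple to sketch, even though GHAPACK itself is C and uses its own file format. Assuming the matrix lives on disk as a `.npy` file (a hypothetical setup for illustration), memory-mapping lets you stream it through an incremental trainer one chunk at a time:

```python
import numpy as np

def stream_rows(path, chunk=256):
    """Yield chunks of rows from a large on-disk .npy matrix.

    Memory use stays at one chunk regardless of matrix size, because
    np.load with mmap_mode only reads the slices we actually touch.
    """
    A = np.load(path, mmap_mode="r")  # memory-mapped: nothing read yet
    for start in range(0, A.shape[0], chunk):
        yield np.asarray(A[start:start + chunk])  # materialize one chunk

# Hypothetical usage with some incremental model:
# for block in stream_rows("big_matrix.npy"):
#     for x in block:
#         model.update(x)
```

Since GHA only ever needs the current sample, this streaming pattern is all the “offline” mode fundamentally requires.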

Memory Management

I addressed a few memory leaks, but will likely do some restructuring to optimize memory management.

Resource Library

I would like the core of the GHA magic to be extracted into a library that others could embed in their own projects. So, I intend to move the core functionality into a library and restructure the existing apps into command-line tools that use it.

Performance

GHA, off the bat, is not known for its speed compared to some other eigendecomposition approaches. Beyond that, there is room for some major performance gains. Let’s see what we can squeeze out of GHAPACK, perhaps by leaning on things like BLAS.

Testing Framework

A testing framework that objectively tracks performance gains, while ensuring computational integrity through unit tests, makes refactoring work that much less stressful.

Lots of work to do and a hearty thanks to Professor Gorrell for letting me join her efforts.

Other SVD Resources

There are other SVD libraries out there that will carry you farther if SVD itself is what you really want, rather than just a means to an end.

ScaLAPACK has parallel SVD code, which creams LAPACK’s performance when you have access to multiple cores and/or MPI. ARPACK and SVDPACK both offer Lanczos-based SVD solutions for sparse matrices with ARPACK being well-suited for parallel processing.
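If you are already in Python, scipy exposes ARPACK’s Lanczos-based solver directly as scipy.sparse.linalg.svds, which computes just the top-k singular triplets of a sparse matrix without ever densifying it:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds  # ARPACK-backed truncated SVD

# A sparse 1000 x 500 matrix at 1% density: a dense SVD would waste
# memory on all the zeros, while svds only touches the nonzeros
A = sparse_random(1000, 500, density=0.01, format="csr", random_state=2)
U, s, Vt = svds(A, k=5)   # only the 5 largest singular triplets

print(np.sort(s)[::-1])   # sort descending; ordering isn't guaranteed
```

For problems where a truncated SVD is all you need, this is often the path of least resistance.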

Agile Scapegoating and People over Process

Last week James Shore posted an article, The Decline and Fall of Agile, that generated quite a bit of discussion. He points out that many of the atrocities and failures in software development and software project management done under the guise of “Agile” or “Scrum” are often not true implementations. You have these teams who say they are doing Scrum, but the only things that actually get adopted are sprints and scrums. So many of these failed groups ignore the important and difficult aspects of Scrum like self-organization, shippable product goals, and self-reflection and improvement. As Shore says, they are “having their dessert every night and skipping their vegetables.”

Ken Schwaber replied to Shore’s article:

When Scrum is mis-used, the results are iterative, incremental death marches. This is simply waterfall done more often without any of the risk mitigation techniques or people and engineering practices needed to make Scrum work.

The article also sparked Bob Martin to write an essay, “Dirty Rotten ScrumDrels,” in response to some of the Scrum scapegoating that has been going on recently. Check out the comments for some dialogue between Uncle Bob and Shore.

Things like self-reflection and reviews of process and work are practices that people naturally adopt to improve themselves; I think successful developers tend to do this anyway in order to stay ahead of the curve. But when a mediocre team flailing about for a silver bullet turns to something like Scrum without already doing this sort of thing, it is easy for them to brush those practices off as trivialities; if they haven’t already seen the value, it just gets lost in the noise.

When you’ve seen the real thing work over and over, I share Mr. Shore’s frustration at watching Agile and Scrum take the blame for the shortcomings of teams and improper implementations.