Thursday, August 7, 2008

Object Lesson in Single Points of Failure

So, I bought my wife a new iPhone. Before we attached it to her computer, I decided to update all the software, and (of course) the machine choked and died. I tried everything I could think of, but I couldn't get it back up and running. We took it to the Genius Bar, and the genius in question tried everything he could think of. No luck. According to our tests, the hard drive was fine, but Leopard would not install properly.

I decided to re-partition and reformat the hard drive. But, before doing that, I wanted to make sure the user drive was backed up. I took the 500 G disk we were using for Time Machine backups and attached it directly to my wife's machine, and copied over her user directory. Then I partitioned, reformatted the disk and reloaded Leopard. Everything went well.

Until...

My lovely wife walked by, accidentally snagged her hand on the USB cable for the hard drive, and knocked it to the floor. Now it only fell a few feet onto soft carpet, but when I plugged it back in, it refused to mount and only made a strange clicking sound.

All our backups were on that drive. The time machine backups. The new backups I'd just made. Everything. All of it gone. Poof.

Now, I have secondary backups for my most important files. But, alas, my wife did not.

It's easy to feel over-confident when you have backups. But backups can fail. Whenever you only have a single copy of your data, even if it's only for a split second, you're at risk.

Lesson learned. It's not a new lesson, but one I seem to need to be reminded of from time to time. Single points of failure are bad.

-Rich-

Monday, August 4, 2008

Thoughts on Matlab

I've been using Matlab a lot at work lately.

Now, by all rights, this should be a language that I love. It's a dynamic, highly expressive language.

However, something about it just sets my teeth on edge.

To be fair, I think I can divide my complaints into two groups: issues that are my fault and issues that are the languages fault.

Matlab treats everything as matrices. OK, that's a bit of an exaggeration, but not much of one. More to the point, Matlab works best when you are performing operations on an entire vector or matrix at once--rather than iterating over the data and performing the operation on each element individually.

A lot of the built in functions and operations are designed to work across entire matrices. In fact, these functions often operate equally well on scalar values or matrices. For example, X < Y could be two scalar values (in which case, it will return 0 for false or 1 for true), or it could be two, equal-sized matrices (in which case, it will return a matrix of 0s and 1s).

Personally, I think this obfuscates the code. It's often difficult to tell wether we're looking at scalar, vector or matrix operations. Still, I'm willing to accept this as my own personal issue. Indeed, it is part of a larger weakness on my part. Basically, I have trouble decomposing problems into matrix operations.

OK, it's easy to do on simple cases. But, lets say I'm building a neural network. I'm storing my weights in matrices. Now, I want to minimize the amount of iterating that I'm doing--but I often have trouble seeing the opportunities to parallelize my operations. It can be done. I've built my neural network code, and I've spent a considerable amount of time replacing iterations with matrix operations. But, I don't find it a natural-feeling way to code.

One of my co-workers has commented that I write Matlab code like I'm writing Java. I think that statement shows, not only my lack of Matlab skills, but her weaknesses in Java. I can tell you without hesitation, my Matlab code is nothing like my Java code. But, the underlying criticism still stands. I am often fighting against the language, not working with it.

I see a lot of people doing this with languages I love, and I get incredibly frustrated when they then unfairly criticize those languages. So, I'll accept the blame here, and try to do better in the future.

I do think there are some real issues, however. First off, the language is often inconsistent. For example, in X < Y, the X and Y could be either scalar or matrix values. However, X && Y must be scalars. If you want to do logical operations on matrices, you must use and(X, Y). To me, this makes no sense. Why should logical operations be different than comparisons?

Also, the environment seems a bit buggy. For example, with a single-processor machine, it is incredibly easy to put your code into an infinite loop that locks up your computer. On a dual-core, this is less of a problem, since I can ctrl-c my way to freedom, but on a single core, Matlab grabs control and won't let go.

Also, the IDE doesn't have many of the features we've come to expect from a modern development environment. There's a taste of debugging and profiling, but they are not as useful as other environments. The IDE lacks any real refactoring tools, and I haven't found any tools for running unit tests.

Finally, I don't like the way it organizes the code. Basically, each function must be in its own file. Yes, you can include multiple private, helper functions within a file, but they cannot be called from the outside. Also, I often want to test my helper functions, so I need to place them in their own file anyway, at least during development.

This really limits my ability to keep my code base organized. In other languages, I can have files of related functions, and folders of related files. Matlab removes one entire dimension. Yes, I can still group similar functions into a hierarchy of folders--and put those folders in other folders, and so on and so forth. Then I need to remember to add the entire tree to my path. It just feels really clunky to me.

So, the take-home message is this, Matlab is a great language for doing mathematical exploration of ideas and building quick prototypes, but it lacks the software engineering tools needed to build robust, large-scale projects.

-Rich-