Today's columnist is Christopher Sean Morrison from BRL-CAD. He writes:
"Who we are in the present includes who we were in the past."
For a variety of reasons, the start of a new year is a perfect time for "digital cleaning." I always spend a little time making sure everything in my home directory is properly filed away, categories all make sense, duplicates are weeded out, and everything is fully backed up. The process is similar to the Getting Things Done task management approach, with a notable exception. Since it's all digital, the storage costs are so close to zero that you really don't ever have to purge! You can be a guilt-free digital pack rat as long as it all gets properly organized into your archive.
I have a copy of practically every piece of electronic data I've ever worked on. Everything. Included in my massive data archive is everything from the obvious "keepers" such as source code repositories, family pictures, and OSBR articles, to the exceedingly mundane and trivial: emails, old university assignments, notes on my favorite scotch whiskies, log files, and more. If you've ever sent me an email, chatted with me over IRC, or observed my habit of taking pictures of fire hydrants while traveling, you can rest assured that particular piece of data is filed away in the archive.
Why? I love information. The obsessive data historian in me is fond of the information I've created, written, and modified over the decades, as that work and experience brought me to where I am today. It's a treasure trove with gems I can reflect upon, (re)learn from, and build on. The obsessive digital pack rat in me defends the archive as a vault of information worth preserving because it so often turns out to be useful later, in unexpected ways. Even with well over a million files, I can quickly find what I'm looking for when I need it. Neither my obsessive data historian nor my pack rat mentality, however, would be effective if the archive were a burden to maintain or if the data weren't organized.
How does one keep everything they've ever done on a computer organized? Well, that's like asking how one eats an elephant. The smug answer is "one bite at a time." Pun intended. Human-computer interface expert Jef Raskin, of Macintosh fame and author of The Humane Interface, said, "An interface is humane if it is responsive to human needs and considerate of human frailties." Start small and build up infrastructure only as you need it. Don't start with a complex web-based content management system tied to an SQL database; adopt a simple organization scheme that fits your needs. That brings me to my ABCs of data archive management: Attics, Basements, and Cupboards. As your data-organization demands grow, progress from the attic stage to the basement stage, and then to the cupboard stage.
The Attic: The first simple step you can take toward organizing your personal or business data is to create an attic. Software developers who have used the CVS version control system know the concept well: the 'Attic' was the graveyard where files went when they were deleted. An attic is a place you don't go to very often. It's not pretty, rarely smells good, and will likely have "unwanted visitors" that will turn your data into useless shredded bedding if left unattended for very long. But it's a start.
A data attic can be as simple as a directory on your filesystem where you toss files as a backup every now and then. For a business, it might be a shared directory on a network file system. The key organizational trait you've introduced is that there is one, and only one, place for everything. The maintenance overhead is exceptionally low because you're not spending time organizing your data, but if you have to find something, you at least know where to look. The downside, of course, is that you're not really organized yet, and finding anything inside the attic can be practically impossible. If you're one of those people who has hundreds of desktop icons, you know what I'm talking about.
The Basement: As the data in your attic grows, so should your organizational strategy. For those who live in parts of the world where basements are uncommon, a basement is generally a cold, simple space, much larger than an attic, often used for storage but usually haphazardly organized.
A basement is where files get grouped together. Now that your files are no longer "in plain sight," you need to organize them. It doesn't have to be pretty or efficient, but it should roughly sort common types of information into broad categories, each holding at least a tenth or so of your data. How you categorize will depend heavily on the data, but start vague and refine as needed: Documents, Pictures, Projects, and so on.
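If your attic is a single directory, a first pass at the basement can be scripted. The following Python sketch assumes a hypothetical suffix-to-category table (CATEGORIES) and hypothetical helper names (categorize, plan_moves). It only prints where each file would go; the actual move is left commented out so you can review the plan before anything changes.

```python
import shutil
import sys
from pathlib import Path

# Hypothetical suffix-to-category table; adjust it to fit your own data.
CATEGORIES = {
    "Documents": {".txt", ".md", ".pdf", ".odt", ".doc", ".docx"},
    "Pictures": {".jpg", ".jpeg", ".png", ".gif", ".tif"},
    "Projects": {".c", ".cpp", ".h", ".py", ".sh"},
}


def categorize(path: Path) -> str:
    """Return the broad category for a file, or "Misc" if no suffix matches."""
    suffix = path.suffix.lower()
    for category, suffixes in CATEGORIES.items():
        if suffix in suffixes:
            return category
    return "Misc"


def plan_moves(attic: Path, basement: Path) -> list[tuple[Path, Path]]:
    """Pair each file in the attic with its destination; nothing is moved."""
    moves = []
    for src in sorted(attic.rglob("*")):
        if src.is_file():
            # Keep any existing subdirectories beneath the new category folder.
            dest = basement / categorize(src) / src.relative_to(attic)
            moves.append((src, dest))
    return moves


if __name__ == "__main__" and len(sys.argv) == 3:
    attic_dir, basement_dir = Path(sys.argv[1]), Path(sys.argv[2])
    for src, dest in plan_moves(attic_dir, basement_dir):
        print(f"{src} -> {dest}")
        # Once the plan looks right, uncomment these lines to move the files:
        # dest.parent.mkdir(parents=True, exist_ok=True)
        # shutil.move(src, dest)
```

Sorting by file type is only one possible scheme. Many people will prefer categories by project or by area of life, which a suffix table can't capture, so treat this as a starting point to refine by hand.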
Even though you're not putting in a lot of organizational effort, you do have to put in some (and your time is priceless), so it's time to think about the other B word: Backups. There really should be backups during the attic stage, but the reality is that most people, naively or unknowingly, don't have them. By the time you start investing time and effort into your system, though, you should at least be thinking about catastrophic data recovery. An effective, simple backup strategy I used for many years was an automatic nightly sync of my archive onto another hard drive. As my archive grew, that drive was replaced with a bigger one, and the old one was stored off-site in case the house burnt down. That (along with a RAID 5 array) was more than enough infrastructure to save me from minor fat-fingering mistakes and hard drive failures. The effort required to restore data was roughly proportional to the scale of the disaster.
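A nightly sync like this is usually handled by an existing tool such as rsync run from cron, but the core idea fits in a few lines. This Python sketch (the names mirror and needs_copy are hypothetical, not from any particular tool) copies new or changed files from the archive to a backup drive, judging "changed" by size and modification time, and never deletes anything on the backup side.

```python
import shutil
from pathlib import Path


def needs_copy(src: Path, dst: Path) -> bool:
    """True if dst is missing or differs from src in size or modification time."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    # Compare whole seconds, since some filesystems store coarser timestamps.
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def mirror(source: Path, target: Path) -> list[Path]:
    """One-way sync: copy new or changed files from source into target.

    Nothing is ever deleted from target, so an accidental deletion in the
    archive does not propagate to the backup drive.
    Returns the relative paths that were copied.
    """
    copied = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = target / rel
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also carries over the modification time
            copied.append(rel)
    return copied
```

Not propagating deletions means a fat-fingered delete in the archive can be undone from the backup; the trade-off is that the backup slowly accumulates files you meant to remove. Running it nightly is left to cron, a systemd timer, or your platform's equivalent.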
The Cupboards: Once your data is really big, you're invested. Your storage system is used frequently, maybe even many times throughout the day. From an efficiency standpoint, this is the stage to aim for, because it's where your data becomes most convenient and easiest to access. It should be a fully organized, easily browsed storage system that you and others could work with on a daily basis. Your setup should be pleasant and efficient to work with, with only as much complexity as is called for. Cupboards take more regular maintenance to keep organized and more effort to put things away, but they are the most humane form of organization.
Keeping business data organized at this level can be very challenging given the rate at which most organizations generate and process data, so you may need to collaborate with your workforce to establish something that works for them. Don't just dictate a usage policy.
It takes a lot of time and effort to preserve digital knowledge after a job is over, whether the job is personal or professional. It's overhead that nobody else is going to pay for, so it has to become part of your culture. The payoff is that you avoid wasting time reinventing, relearning, and rediscovering. As philosopher George Santayana eloquently put it, "Those who cannot remember the past are condemned to repeat it." Your organizational investment will pay off in the long term.
A few tips to leave you with:
- Don't store stuff you didn't touch. I too have been tempted to download an entire human genome dataset just for the sake of having it, but data doesn't belong in an archive until you actually do something with it.
- Separate out other people's stuff that you do touch. That way, if you ever need to ditch evidence in a pinch, it's all in one place. I got sophisticated and use a directory named "notmine."
- Make a complete backup at least once a year. Portable USB drives make for great off-site storage.
- Have a solid search mechanism. If you can't find your data quickly, the archive will turn into a graveyard. Be adept at find, grep, and awk. Leverage Spotlight, use virtual folders, set up a file index: whatever works quickly for you.
- Clear out your home directory once a year. Make the archive your central file store: put everything away, then take out only what you're actively working on. It helps to purge your email inbox at the same time.
- Use human-readable directory names. It might have saved you all of 4.2 seconds to create "oldfmpxs" for your old family pictures, but you shouldn't have to perform a mental somersault five years later trying to remember what you were thinking.
- Share your organizational and backup setup with others. Be proud of your data collection. Help others preserve their digital possessions too.
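The search tip is where find and grep earn their keep. For readers who would rather script it, here is a rough Python equivalent of matching file names and then filtering by contents. The function search and its exclude parameter are hypothetical names for this sketch; exclude can be used to skip a folder such as the "notmine" directory from the tips.

```python
import fnmatch
from pathlib import Path
from typing import Optional


def search(root: Path, name_glob: str = "*", text: Optional[str] = None,
           exclude: tuple = ()) -> list[Path]:
    """Find files under root whose name matches name_glob (roughly `find -name`)
    and, if text is given, whose contents contain it (roughly `grep -l`).

    Any path that passes through a directory named in `exclude` is skipped.
    """
    matches = []
    for path in sorted(root.rglob("*")):
        if any(part in exclude for part in path.relative_to(root).parts):
            continue
        if not path.is_file() or not fnmatch.fnmatch(path.name, name_glob):
            continue
        if text is not None:
            try:
                # errors="ignore" lets binary files be scanned without crashing.
                if text not in path.read_text(errors="ignore"):
                    continue
            except OSError:
                continue  # unreadable file: treat as a non-match
        matches.append(path)
    return matches
```

Reading every file on every search gets slow on a million-file archive, which is why a prebuilt index (Spotlight, or a locate-style database) is worth setting up once the archive grows.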