I have data on Compact Disks (CDs) from past projects. The technology was becoming affordable around 1996. CD writers dropped under $100 for the first time somewhere around then, and media started selling for less than $5 a disk. The storage space on a CD was comparable to the size of hard disks available at the time, and optical storage seemed far better than tape as a medium. So now I have cases, drawers, and spindles of CDs dating right back to 1996.
No storage medium is perfect, so archived data is a commitment and not just a static collection. Last month, Sam asked me what I would like for my birthday. I said I wanted a disk for backing up data. Every off-the-shelf external hard drive I looked at had a warranty of one year or less. However, if you buy an internal hard disk and a separate USB enclosure, the warranty on the drive can be much, much longer. Sam and I visited the Newegg site and picked out a Western Digital 1.5 terabyte drive and a Rosewill USB enclosure. The drive comes with a 5-year warranty. I can pair this with another 1.5 terabyte disk, so that I can copy my data off the CDs onto the first drive and then copy it again to the second.
Back when I was about to move from California to Michigan, I had a chat with a fellow who works for the Internet Archive. That is a project whose modest aim is to store the World Wide Web. All of it. You can browse sites as they were in 1995. Well, with a few caveats. My acquaintance said that the Internet Archive’s data storage was based on consumer-grade IDE drives. You can get them cheap and in quantity, and if you store things on multiple disks, the redundancy will help. That’s because disks fail. An organization like the Internet Archive racks up lots of failures. They have to keep swapping out bad drives and attempting to restore content from the remaining copies on other drives. And they couldn’t, he said, quite keep up with the failures. Some data does get lost because failures occur before the redundancy can be exploited to restore some sites.
I figure that, for my purposes, the data I have is already a copy of what my colleagues have, and I aim to have two hard disk copies on top of that. I think that should be sufficiently paranoid. The workflow takes about six to seven minutes per CD: create a directory, copy the files, and mark the CD as copied. I’m working on the third page out of 32 pages in a CD case now. This will take some effort, but then I invested years of my life getting that data in the first place.
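For anyone wanting to script the per-CD steps, here is a minimal Python sketch of the same three-step routine. It is not the procedure I actually follow by hand, just one way it could look. The function name `archive_disc`, the `copied.log` file, and the example paths are all made up for illustration, and the log entry is only a stand-in for physically marking the disk.

```python
import shutil
from pathlib import Path


def archive_disc(source: Path, archive_root: Path, label: str) -> Path:
    """Copy one disc's contents into its own directory and log it as copied.

    `source` is wherever the CD is mounted, `archive_root` is a folder on the
    backup drive, and `label` names the disc. All three are hypothetical.
    """
    dest = archive_root / label
    # copytree creates the destination directory (and any missing parents)
    # and raises FileExistsError if it already exists, which guards against
    # reusing a label for a second disc.
    shutil.copytree(source, dest)
    # Append the label to a plain-text log as a record that this disc is done.
    with open(archive_root / "copied.log", "a", encoding="utf-8") as log:
        log.write(label + "\n")
    return dest
```

A call might look like `archive_disc(Path("/media/cdrom"), Path("/mnt/backup1/cds"), "case01-page03-disc2")`, with the mount point and label adjusted to whatever your system and filing scheme use. Running it a second time with a different `archive_root` on the second drive would give the second copy.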