Refreshing Data Storage

I have data on Compact Disks (CDs) from past projects. The technology started to become affordable around 1996. CD writers dropped under $100 for the first time somewhere around then, and media started selling for less than $5 a disk. The capacity of a CD was comparable to that of the hard disks available at the time, and optical storage seemed far better than tape as a medium. So now I have cases, drawers, and spindles of CDs dating right back to 1996.

No storage medium is perfect, so archived data is a commitment and not just a static collection. Last month, Sam asked me what I would like for my birthday. I said I wanted a disk for backing up data. When I looked at off-the-shelf external hard drives, every model I checked had a warranty of one year or less. However, if you buy an internal hard disk and a separate USB enclosure, the warranty on the drive can be much, much longer. Sam and I visited the Newegg site and picked out a Western Digital 1.5 terabyte drive and a Rosewill USB enclosure. The drive comes with a 5-year warranty. I can pair this with another 1.5 terabyte disk, so that I can copy my data off the CDs onto the first disk and then copy it from there to the second.

Back when I was about to move from California to Michigan, I had a chat with a fellow who works for the Internet Archive. That is a project whose modest aim is to store the World Wide Web. All of it. You can browse sites as they were in 1995. Well, with a few caveats. My acquaintance said that the Internet Archive’s data storage was based on consumer-grade IDE drives. You can get them cheap and in quantity, and if you store things on multiple disks, the redundancy will help. That’s because disks fail. An organization the size of the Internet Archive racks up lots of failures. They are constantly swapping out bad drives and attempting to restore content from the remaining copies on other drives. And they couldn’t, he said, quite keep up with the failures. Some data does get lost, because a second failure can occur before the redundancy has been used to restore the first.

For my purposes, the data I have is already a copy of what my colleagues have, and I aim to keep two hard disk copies of it. I think that should be sufficiently paranoid. The workflow takes about six to seven minutes per CD: create a directory, copy the files, and mark the CD as copied. I’m working on the third page out of 32 pages in a CD case now. This will take some effort, but then I invested years of my life getting that data in the first place.
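The per-CD routine above (create a directory, copy the files, mark the CD as copied) could be scripted along the following lines. This is only a sketch under stated assumptions, not the procedure actually used for this archive: the function names `copy_disc` and `sha256_of`, the checksum manifest used as the "copied" marker, and the example paths are all hypothetical. It assumes the CD is already mounted and readable as an ordinary directory.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_disc(source: Path, archive_root: Path, label: str) -> Path:
    """Copy one mounted disc into archive_root/label and verify each file.

    Hypothetical helper: the manifest file written next to the new
    directory doubles as the "this CD has been copied" marker.
    """
    target = archive_root / label
    if target.exists():
        raise FileExistsError(f"{target} already exists; was this disc copied before?")

    # Step 1 and 2: create the directory and copy the files.
    shutil.copytree(source, target)

    # Re-read both sides and compare checksums, so a bad read is noticed now.
    manifest_lines = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        src_hash = sha256_of(src_file)
        if sha256_of(target / relative) != src_hash:
            raise IOError(f"checksum mismatch for {relative}")
        manifest_lines.append(f"{src_hash}  {relative.as_posix()}")

    # Step 3: mark the disc as copied by writing the manifest.
    manifest = archive_root / f"{label}.sha256"
    manifest.write_text("\n".join(manifest_lines) + "\n")
    return manifest
```

A call might look like `copy_disc(Path("/media/cdrom"), Path("/mnt/archive"), "1996-disc-01")`, where both paths and the label are placeholders. The same manifest could later be used to check the second hard disk copy against the first.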

Wesley R. Elsberry

Falconer. Interdisciplinary researcher: biology and computer science. Data scientist in real estate and econometrics. Blogger. Speaker. Photographer. Husband. Christian. Activist.

2 thoughts on “Refreshing Data Storage”

  • 2010/02/07 at 12:18 am

    Any problems reading old CDs? Were they burned with write-verification?
    [Our X-ray data are backed up on CDs going back to 1998, but a number did not burn properly despite no error messages.]

    So you are keeping two hard disk copies (not RAIDed?) in separate locations?

    Did you consider Blu-ray?
    ————
    Do you know anything about backing up pictures?
    The slide scanners don’t require removing the slide from its holder, but wouldn’t that be better? [I have a bunch of personal slides (usually out of focus or blurred by movement) I’d like to back up sometime.]

  • 2010/02/07 at 8:13 am

    So far, I haven’t had an issue with the data CDs as far back as 1999. I have had some difficulty with a DVD written about the same time. All the CDs were verified for read-back at the time.

    Yes, two hard disk copies that are not a RAID will be kept. Separate locations are a good thing. I have had a RAID where the operating system had a glitch, and it instantly took out data we had on both copies. So my ideal setup for internal storage would be a mirrored two-disk RAID and a third, non-mirrored disk that gets synchronized daily.

    About Blu-ray: Last I checked, the cost per unit was still high, and burners were still a multi-hundred-dollar investment. Going with hard disk also means that the data is readily accessible for analysis, and that is a big plus. Blu-ray would also have meant a significant amount of time spent re-mastering for the new layout per disk, and would have involved temporary hard disk storage of all the data anyway.

    Unless you use some sort of glass platen for scanning film, I don’t see a big difference in flatness between a good slide mount and the 35mm film holders that come with most slide scanners. If you have a warped cardboard slide mount, remount that slide in a Pakon mount. I think you can still get those. I got an Epson V500 scanner a while back that does a nice job. You can certainly do better if you are willing to spend $1000 or more instead of $170, but the Epson lets me actually scan my aging collection of media rather than think that I will get to it someday.
