Archiving Data

  • 10-06-2020 12:10pm
    #1
    Registered Users Posts: 7,407 ✭✭✭


    I'm in the midst of a project collecting video clips, audio, images and text data for a voluntary organisation I work with. The idea was that we would try to get all our volunteers to give us video from our events that they have stored at home, and the drive to get this has been quite successful. I now have thousands of images and about 100hrs of old photo and video material to digitise. That's all fine; over the course of the next year I will process it all. I'd estimate I'll have close to 1TB of data in the end.

    Anyway, it's got me thinking about the effort it's taken and how good it would be to archive it properly, so that someone in, say, 50 years will have a copy of the collection, which in turn has led me to thinking about the durability of long-term storage. While it's backed up to the cloud at the moment, I don't think that's viable long term due to the ongoing payment necessary for upkeep. SSD looks to be out, HDDs seemingly fail after about 10 years, and writable CDs and DVDs rot. Enterprise-grade magnetic tape is relatively exotic: it's not at the consumer level, drives are expensive and the tape is only rated for 30 years. Recordable Blu-ray seems a good option if you get the Verbatim M-Discs (and reasonable value at 25c/GB), but Blu-ray never really took off so I'd be worried about future access to drives and readability. I know the 3-2-1 mantra for backup (three copies, two media types, one offsite location), but none of the above seem ideal, let alone two of them.

    Format rot is something I've thought of, but I'm not so worried as I've been saving in JPEG and MP4 (H.264), both common standards.

    I doubt I'm the first person to think of these issues, so I'd be interested in how others have addressed them.


Comments

  • Registered Users Posts: 14,011 ✭✭✭✭Johnboy1951


    1TB is not much to store.
    Have you considered multiple HDDs for multiple storage in different locations?
    You could then replace one storage device every year or two, on an ongoing 'round robin' basis for very little cost.


  • Registered Users Posts: 7,135 ✭✭✭10-10-20


    I tried tackling this with hundreds of VHS, DV-cam and other tape media which I inherited from a deceased relative about 5 years ago.
    At the time h.264 was becoming mainstream and I did some reading about what other standards were being used for such archiving, but there were only conversations around MPEG2 really, and this does not offer the quality or compression that h.264 offers.

    I seem to recall that DV tapes run at about 25Mbit/s (roughly 13GB per hour) in their native format, and I had 500hrs of tape (roughly).

    What I realised in my case was that there was a three-way trade-off between the encoding standard, the capacity available to me at zero cost (2TB) and the quickly deteriorating quality of the VHS cassettes, which appeared to be losing content rapidly. So I made the decision to encode the most vulnerable VHS tapes into h.264 at a low enough quantiser (Q) value, meaning higher quality, that the result would hopefully be good enough to re-encode at a later point should the need arise.
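
    A rough sketch of the capacity side of that trade-off: given the 2TB and roughly 500hrs mentioned in this post, it works out the highest average bitrate that would still fit. The function name is made up for illustration, and the sum ignores audio, container overhead and any space needed for a second copy, so treat the result as an upper bound.

```python
def max_average_bitrate_mbps(capacity_tb, hours):
    """Highest average bitrate (Mbit/s) at which `hours` of video fits in `capacity_tb`."""
    capacity_bits = capacity_tb * 1e12 * 8  # decimal terabytes, as drives are sold
    seconds = hours * 3600
    return capacity_bits / seconds / 1e6


# Figures from the post: 2TB of free capacity, roughly 500 hours of tape.
budget = max_average_bitrate_mbps(capacity_tb=2, hours=500)
print(f"{budget:.1f} Mbit/s")
```

    Halving the capacity (or doubling the hours) halves the budget, which is the squeeze described above.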

    I was using mencoder and two-pass encoding from the command-line at the time to perform the compression.
    https://wiki.archlinux.org/index.php/MEncoder
    If I were doing it all again I might use OpenShot or HandBrake for the compression.
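
    For anyone wanting to script a two-pass encode today, here is a minimal sketch. Note that it swaps in ffmpeg with libx264 rather than mencoder, so it is not the command line used above. The function names, the 2500k video bitrate and the 128k AAC audio are placeholders chosen for illustration; it assumes ffmpeg is installed and on the PATH.

```python
import os
import subprocess


def two_pass_commands(src, dst, video_kbps=2500):
    """Build the two ffmpeg invocations for a two-pass libx264 encode."""
    common = ["ffmpeg", "-y", "-i", src,
              "-c:v", "libx264", "-b:v", f"{video_kbps}k"]
    # Pass 1 only gathers statistics: no audio, output discarded.
    first = common + ["-pass", "1", "-an", "-f", "null", os.devnull]
    # Pass 2 uses those statistics to hit the target bitrate.
    second = common + ["-pass", "2", "-c:a", "aac", "-b:a", "128k", dst]
    return [first, second]


def encode(src, dst, video_kbps=2500):
    """Run both passes in order, stopping if either one fails."""
    for cmd in two_pass_commands(src, dst, video_kbps):
        subprocess.run(cmd, check=True)
```

    Calling `encode("tape01.avi", "tape01.mp4")` would run both passes; the file names here are hypothetical.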

    I placed the data on some spinning 2TB disks which are written using ZFS (a filesystem with built-in checksumming and redundancy, available on Linux), which I believe will need to be refreshed every few years. I decided not to commit it to DVDs or Blu-ray as some of these are prone to degradation and I just did not have confidence in using that media type at the time. Tape is out: I've seen enough churn in tape standards and enough error-prone drives to know that it gets silly unless you are at the highest end of the tape tier.

    I'll need to refresh my data soon. I'll spin it all up on a system and check for errors. If I can I'll source larger disks and copy the data across so that I have more than two copies.
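
    One filesystem-independent way to do that kind of check is to keep a checksum manifest alongside the files and re-verify it at each refresh. The sketch below is only an illustration of the idea (ZFS has its own scrub for this); the function names are invented here, and SHA-256 is an arbitrary but common choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large videos do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the relative paths that are missing or whose hash has changed."""
    root = Path(root)
    problems = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel)
    return problems
```

    The manifest can be built once on the master copy, saved (for example as JSON) next to the archive, and then passed to `verify` against each copy; an empty list means every file still matches.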

    The ideal endpoint would be that I upload them to a service, such as a historical archiving project, for curating, but in the meantime I intend to keep copies locally as best I can.

    Other opinions are welcome!


  • Registered Users Posts: 7,407 ✭✭✭MrMusician18


    Johnboy1951 wrote: »
    1TB is not much to store.
    Have you considered multiple HDDs for multiple storage in different locations?
    You could then replace one storage device every year or two, on an ongoing 'round robin' basis for very little cost.

    Really, in the grand scheme of things it isn't, but it's larger than most of the older tape formats and significantly larger than DVDs and Blu-rays. I'd prefer to stay away from hard drives as they require ongoing maintenance. I know that once this project is finished, the "master" copies and unedited footage won't be looked at for years and will be left in the back of a dry, dusty cupboard until some other person comes calling for archives. While I'll be involved for the next few years at least, if I move away I know the process will not be done - indeed I could even forget about it myself.

    10-10-20, like yourself, I've encoded with h.264 so far to preserve quality. I won't have as much video as you: unless a new cache of videos comes my way (which is possible), I wouldn't expect to go over 1TB.

    Like I mentioned above, I'm reluctant to use hard disks due to the maintenance involved - I just know the periodic refreshes will be forgotten about, not least by myself. The format is at least accessible though. You make a fair point about tape; we are now at LTO-8 and I can see that the older drives are already hard to come by. I have been doing more and more research on optical media, and Verbatim have a product called M-Disc which looks interesting. Rather than using photoreactive dyes, it uses discs coated with metal via a sputtering process. The data is literally engraved into the disc, and they claim it will be stable for several hundred years. It does need a high-powered, M-Disc-capable burner to write them though. Some BD-Rs use a similar process in their manufacture - you need to look for BD-R HTL (High to Low). Panasonic do them.

    So it looks like my approach will be 2 or 3 copies on Panasonic BD-R HTL and 2 copies on hard drives that will be spun up and refreshed every 3-5 years.
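
    Splitting close to 1TB across discs means deciding which files go on which disc. A minimal sketch of one way to plan that is first-fit decreasing packing, shown below. The function name is invented, and the 25GB figure in the usage note is the nominal single-layer BD-R capacity, which is an assumption here rather than something settled in this thread; real usable space is a little lower, so leave some headroom.

```python
def pack_files(sizes, capacity):
    """First-fit decreasing: place each file on the first disc with room.

    `sizes` maps file name to size; `capacity` is the per-disc limit in
    the same unit. Returns one list of file names per disc.
    """
    discs = []  # each entry tracks remaining space and the files assigned
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > capacity:
            raise ValueError(f"{name} is larger than one disc")
        for disc in discs:
            if disc["free"] >= size:
                disc["free"] -= size
                disc["files"].append(name)
                break
        else:
            # No existing disc had room, so start a new one.
            discs.append({"free": capacity - size, "files": [name]})
    return [disc["files"] for disc in discs]
```

    For example, `pack_files({"a": 15, "b": 10, "c": 20, "d": 5}, capacity=25)` puts `c` and `d` on one disc and `a` and `b` on another. It is a heuristic, so it will not always find the fewest possible discs, and a file bigger than one disc would have to be split or moved to a larger disc type.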


  • Registered Users Posts: 14,011 ✭✭✭✭Johnboy1951


    If you go the M-Disc route you had best ensure you store a suitable player with the discs.
    These are not mainstream and it is likely they will eventually disappear into the mists of time.

    Heck, even finding a VHS cassette player these days is a chore!


  • Registered Users Posts: 36,167 ✭✭✭✭ED E


    1TB is not huge. Any chance one of the educational institutions or national libraries etc. could take on a copy? Putting it on a SAN would be fairly fault tolerant.

    If I'm using the calculator right then Amazon's Glacier is $49 per year for 1TB. Yes, it's not necessarily good for decades, but if you were to use it until 2030 there might be a better long-term archival format around by then. And there's none of the upfront cost of buying 3x LTO drives and associated tapes.


  • Closed Accounts Posts: 22,648 ✭✭✭✭beauf


    Multiple copies and test them periodically.

    No point having a backup if you never test it.

