I. Introduction - The Deliverables
According to Shannon Roach these will be:
- Assess how much content there is, and how quickly the collection is likely to grow
- Figure out the best storage for audio archives (naming conventions, file format, where to store them)
- Figure out how to make these audio archives searchable
- Document all of the above & explain it to Vera staff
Since for whatever reason I decided to take this rather charming picture before Jeff gave me the audio tour, I thought I'd include it here just for fun. For those of you who do not know, Jeff is on the left and Nelson, one of the Vera audio staff, is on the right.
File formats used: WAV, AIF, mp3, ptf (protools file), and digital audio tapes (DAT)
A. The Hard Drives:
1.) 300GB HD - Live Shows & Mixdowns
Main folder: Live Show Mixdowns
>Sub folder: Name of band, date, initials of who recorded it
- File name: whole show, date
- File name: individual song,
>Sub Folder: Live shows mp3s
>Sub Folder: 2007 live shows
2.) 500GB HD – Two partitions: mostly show back-ups
Main Folder: Vera Backup A-M
>Sub Folder: name of band, date, initials
Main Folder: Vera Backup N-Z
>Sub Folder: name of band, date, initials
3.) Vera 1, 2, 3 recording drives
Protools session files organized like so:
-One folder for a whole show named by headlining band, date, initials of who recorded it
This folder includes files for the opening bands as well
-Eventually all bands will have their own separate folder
-The recording drives employ a system of highlighting
- Green: indicates a folder whose contents have been mixed
- Orange: indicates a folder that is ready to be backed up
- Red: indicates a folder that has been backed up
- For live shows and recording sessions, they record to a recording drive
- These files are then mixed
- Then they are backed up onto a DVD (the protools files)
- Then the protools files are transferred from the recording drive to a Back-Up Drive
C. What Jeff wants to see in the future:
- Record all DATs to their backup hard drives
- 3-4 backups
- DVD backup for all protools files,
- One or two terabyte hard drives connected to the server
- I think he may have said he also wanted drives that are not connected to the server
- Hard Drive + CDs for audio backup (this would be for mp3s, podcasting, etc.)
- Copies of records and CDs for bands who recorded at Vera
D. Current Issues:
1. Toast: the standard for CD and DVD burning in the Audio engineering community
- Jeff is concerned with future compatibility because there is a new version of Toast, and it has recently changed the way one can access a DVD or CD. Older CDs and DVDs burned using the old version of Toast cannot be viewed with the new version of Toast.
- I also think he said the new version of Toast requires that you have Toast to view the files. I could be wrong.
- Jeff wants something for DVD burning that will be more future compatible. Jeff also mentioned an issue with Toast wanting to automatically convert WAV files to AIF, and how there is a cross-platform (PC vs. Mac) issue with this.
2. Issue of cross platform compatibility for tagging and metadata
- The audio folks use a Mac. A Mac has a different layout when you right-click a file and check its properties. On a Mac there is a field called "Spotlight comments" where you can type in anything you wish about a file, and it is then easily searchable.
- My personal experience with this is that it is not cross-compatible. I have an external hard drive that I use to store all my internship files and personal files. I own a Mac and use the Spotlight comments, especially for my audio recordings. When I connect the drive to a PC, right-click the file, and choose "Properties," the comments do not show up. Not only that, but if I remember correctly, when I returned to my Mac and viewed the file, the Spotlight comments had been erased. This is a major problem if we go the Windows Explorer tagging and metadata route.
A. Storage
1. File naming conventions:
According to the Colorado Digital Program (CDP):
"Systematic file naming is important for system compatibility, interoperability, and to demonstrate ownership of the digital asset. File naming conventions specific to an institution may be used. These might include protocols that include institutional acronyms, collection identifiers, or part designators, among others. In general it is recommended that the characters in the file names be alpha-numeric, lowercase, and not utilize spaces, tabs, commas, periods or any other characters reserved for computer system use, such as slashes, asterisks, or question marks."
Seems pretty obvious enough to me.
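As a rough illustration of the CDP advice applied to Vera's existing "name of band, date, initials" folder scheme, here is a minimal Python sketch. The function name, the underscore separator, and the sample values are my own assumptions; neither the CDP nor Vera prescribes them.

```python
import re
import unicodedata


def cdp_style_name(band: str, date: str, initials: str, ext: str = "wav") -> str:
    """Build a lowercase, alphanumeric file name from band, date, and initials.

    The underscore separator is an assumption made for readability; the CDP
    text only asks for lowercase alphanumeric characters.
    """
    def clean(part: str) -> str:
        # Strip accents, lowercase, then drop anything that is not a-z or 0-9.
        ascii_part = (
            unicodedata.normalize("NFKD", part)
            .encode("ascii", "ignore")
            .decode("ascii")
        )
        return re.sub(r"[^a-z0-9]", "", ascii_part.lower())

    stem = "_".join(clean(p) for p in (band, date, initials))
    return f"{stem}.{ext.lower()}"


# Hypothetical example values, not taken from Vera's drives.
print(cdp_style_name("Some Band!", "2007-03-14", "JM"))
```

The only period left in the result is the one before the file extension, which most systems need in order to recognize the file type.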
2. Hardware:
Hard Drives and particularly RAIDs (Redundant Array of Independent Disks) seem to be the general consensus for Audio storage.
The CDP defines a RAID as:
"A self-contained collection of disk drives that act as a single large hard drive storage system. These configurations are designed to enable a system to operate when an individual drive within the array fails, thus minimizing and ideally eliminating the potential for the loss of data. There are ten types of RAID configurations. In general, the better choices are RAID 5, 6 or 10."
Gary Louie, University of WA Music School archivist, has the following to say about RAIDs:
- "Hard drives are the only practical method now to store the size and amount of files we have. Hard drives are also poor for reliability, so any HD storage has to be RAID of some sort, and must be duplicated in at least one other physical location."
- "Some things we are looking at for local, small scale archive "starters" are small RAID hard drive towers with removable sleds, duplicated onto similar arrays via gigabit ethernet in another building."
- "You can certainly consider a really small scale HD storage method that will still be helpful to you on all counts at low cost. An external HD holder (like the Newertech Voyager) and a handful of bare HDs, with constant duplicates and stored at different locations. Could be in conjunction with a small server system."
ARSC Technical Committee on RAIDs:
"RAID arrays remove some of the disadvantages of individual hard drives while they provide larger capacities than individual drives. They are extremely easy to use, especially when supplied as network-attached storage, but still should be considered mechanical devices that may fail at some point in time. A single RAID array is not adequate for long-term storage."
For a more in-depth discussion of various RAID configurations go here.
Beyond the consensus on RAIDs as the best way to store digital audio, there is also a consensus that audio be stored in multiple locations and not on a single array.
The ARSC Technical Committee suggests:
"Generate local backup copies of all files as soon as possible after creation
Store backups on a separate device or separate media from the original.
Make at least two, or preferably three sets of archival master digital files of preserved content and store them in different locations, and possibly on different types of media."
The CDP suggests the same:
"It is highly recommended that organizations utilize a strategy employing multiple redundant copies of digital files in separate locations, as a fail-safe strategy for the failure or destruction of the digital media. Acceptable media may be optical disks, hard drives, or magnetic data tapes – all of which have particular strengths and weaknesses."
Depending on the needs and resources of an organization, offsite storage is a potential option. This is where I believe our connection to UW could come in very handy.
The CDP says:
"Projects considering outsourcing should consider the costs of ingestion and ongoing maintenance fees; metadata requirements for outsourced files; workflow requirements; and accessibility requirements."
Vera's consulting "Technical Generalist" Darren White says:
"In terms of hard disk storage, VERA should be looking at a NAS/SAN on the VERA internal network, preferably with offsite backup. That way, if disaster strikes..."
Other options such as DVDs and CDs are not recommended.
The ARSC Technical Committee has the following to say about CDs and DVDs:
- "Optical discs [are an] unsafe preservation storage media. Using this format requires costly test equipment to check the quality of both blank media and digital recordings. Even if this equipment is already available to an archive, the cost of storage media may be higher than the other formats, except in very small volume – individual CDs or DVDs are not expensive, but the cost-per-megabyte is substantially greater than other media types."
- "The format cannot hold files at higher bit depth and sample rate than 16 bit, 44.1 kHz, which is far less than the de facto 24 bit, 96 kHz standard. In addition, migrating an archive stored on optical discs is expensive, because it requires much human intervention."
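To put the two figures in that quote into storage terms, here is a small back-of-the-envelope calculation in Python. It assumes uncompressed stereo PCM (two channels), which is my assumption and not something the ARSC text specifies.

```python
def pcm_bytes_per_minute(sample_rate_hz: int, bit_depth: int, channels: int = 2) -> int:
    """Bytes needed for one minute of uncompressed PCM audio."""
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * 60


cd_quality = pcm_bytes_per_minute(44_100, 16)  # audio CD settings
archival = pcm_bytes_per_minute(96_000, 24)    # the de facto standard in the quote

print(f"16-bit / 44.1 kHz: {cd_quality / 1_000_000:.2f} MB per minute")
print(f"24-bit / 96 kHz:   {archival / 1_000_000:.2f} MB per minute")
```

Under those assumptions the higher-resolution setting needs a little over three times the space per minute (about 34.6 MB versus about 10.6 MB), which is worth keeping in mind when estimating how quickly the collection will grow.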
What the CDP says about CDs:
- "Storage and retrieval costs will escalate as CD collections become larger and more challenging to manage."
- "Limited physical life span (3 to 20 years) and the files stored on them are vulnerable due to physical deterioration, mishandling, improper storage and obsolescence."
- "Obsolete equipment for reading the media poses a threat in the long term."
- "Adhesive labels and permanent ink markers can cause early failure of CDs through chemical interaction with the CD’s recording layer."
Gary Louie, University of WA Music School archivist, says:
"Don't trust DVDs. No one thinks they will last very long. Same for CDs and tapes. Generally, the data written to them is also not very clean, especially audio CDs."
3. Data Migration and Hardware Updating
Current hardware and software seem to go obsolete every 3 to 4 years. It is therefore extremely important to implement a Migration schedule.
The CDP Suggests:
- "Migrate your files frequently - at the very least every five years. Failure to incur these small incremental costs may lead to very large costs down the road when you need to migrate from an obsolete data format or medium."
- "Every sustainable digitization project should include the costs of data migration as a yearly budget line item."
- "At the time of migration, file integrity checks should be conducted... a “checksum” may be associated with a file when it is produced so that the checksum value may then be used to confirm the integrity of the file at the time of migration."
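The integrity check described in the last bullet can be sketched in a few lines of Python. This is only an illustration: the CDP does not name a checksum algorithm, so SHA-256 is my assumption, and the function names are hypothetical.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex checksum of a file, reading it in chunks.

    Chunked reading keeps memory use low for large audio files.
    SHA-256 is an assumed choice; the CDP does not specify an algorithm.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(original_path, copy_path):
    """True when the migrated copy has the same checksum as the original."""
    return file_checksum(original_path) == file_checksum(copy_path)
```

In practice the checksum would be recorded when the file is first created and compared again after each migration or backup, rather than recomputed from the original every time.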
The subject of "checksum" makes for a great point of transition into the subject of quality control.
4. Quality Control
The CDP Recommends the following for quality control:
- All digitized audio files should be sampled for sound quality.
- Technicians charged with quality control should listen for consistency in the audio quality at a number of points in the recording, listening for distortions in the sound, for proper playback speeds, and for artifacts such as hiss and hum.
- Technicians should specifically check that the volume levels are set correctly.
- Recordings should also be checked for completeness.
- Any distortions in the sound or other inconsistencies in the recordings should be noted in the metadata.
- Metadata should also be checked for accuracy and completeness, with special attention paid to accuracy of file names, which can lead to the effective loss of files when recorded inaccurately.
- In addition, file integrity checks may be conducted. A “checksum” may be associated with a file when it is produced so that the checksum value may then be used to confirm the integrity of the file at each step from creation to migration.
"A checksum is an algorithm-based method of determining the integrity and authenticity of a digital data object, such as a digital audio file, used to check whether errors or alterations have occurred during the transmission or storage of a data object. Please note, however, that a checksum will only tell you there is an error. It cannot tell you what the error is, nor can it correct the error."
"It is important to understand that quality control of audio can be a very subjective process. It is recommended that technicians performing this process be selected on the basis of their familiarity with a variety of types of recorded sound, and preferably have experience in a field that has already helped train their ears. Staff with previous audio experience such as audio engineers, are a natural selection, as are staff with formal musical training."
5. File Format, Sample Rate, and Bit Depth Standards
While both WAV and AIF formats are widely used, a newer format appears to be emerging: Broadcast WAV (BWF), which is the same as the WAV format except that it can also store metadata in the file header.
The CDP says the following regarding WAV and AIF:
"The WAV file type was developed by Microsoft, is in widespread use, and is readable by virtually all audio software programs. The AIF file type was developed by Apple Computer and is also in widespread use. Both of these file types are uncompressed and acceptable for long-term file storage. The WAV file type has become a standard and is recommended. In addition, the WAV file type is also available in a professional flavor, broadcast WAV (BWF), which has the capability to store metadata in the file header. Although not all audio software programs are currently capable of reading or writing to the metadata header, the BWF format is emerging as the WAV file type of preference for archival audio projects."
For the sake of avoiding redundancy, I will simply say that all other sources considered thus far are generally on the same page regarding WAV and AIF. However, since Vera also uses protools files (ptf), there are other issues to consider.
Regarding Protools files, Gary Louie said the following:
"If you are thinking long-term (10-50-100 years) who knows if anyone will be able to read ProTools files at all. The international recommendation for audio storage is the Broadcast WAV file, which holds the audio plus metadata for catalog and technical info. But that doesn't help much with PT sessions. We (i.e. University of Washington) don't do any PT work, so you should probably ask elsewhere (maybe the ProSoundWeb or Digidesign's forums) on how people archive their PT material."
I have yet to research archival standards for protools files, if there are any. When I find the answer I will update this post. However, all things considered, the new BWF seems to be the optimal choice for Vera's needs.
Below is a diagram from the CDP regarding Sample Rate and Bit Depth Standards:
6. Metadata Issues and Standards
As established above in my conversation with Jeffery, retention of metadata when working across platforms (Mac to PC) is a major issue. Depending on how thorough BWF is and if it is cross-compatible, there are other options to consider.
Darren White suggested the following:
"For maximum portability of metadata, I would recommend that it be stored externally to the file itself. Even with, say, MP3 files and ID3 tags, cross-platform compatibility issues remain, even after all these years. So WiMP and iTunes might make entirely different sense/nonsense out of the same ID3 tags. I also use a Mac primarily and I think if we are talking Protools and AIFF files, Mac capabilities may be sufficient for basic metadata but to allow for expanding requirements, something like a CMS and database or even just an external XML file may make more sense in the long run."
The Arts and Humanities Data Service Preservation Handbook for Digital Audio suggests:
"Ensure that associated metadata integrated into the file itself can be extracted in the software tool and stored in the preservation format. If not, the information should be manually output to an ASCII text file."
ASCII or XML might be the way to bridge the gap between Mac and PC. If we cannot attach metadata and tags directly to the audio files, we can create a text file with all the pertinent information about each file. Someone can either create one text file to match each audio file, with an identical file name, or just one text file for a group of audio files within one folder. I lean a bit more towards individual files.
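The one-text-file-per-audio-file idea could look something like the Python sketch below. The function names and the "key: value" layout are assumptions of mine for illustration, not a format any of the sources prescribe, and the field names in the usage example are hypothetical.

```python
from pathlib import Path


def write_sidecar(audio_path, fields):
    """Write a plain ASCII text file next to the audio file, sharing its name.

    For example, 'show.wav' gets a companion 'show.txt'. Characters that
    cannot be written as ASCII are replaced with '?'.
    """
    sidecar = Path(audio_path).with_suffix(".txt")
    lines = [f"{key}: {value}" for key, value in fields.items()]
    sidecar.write_text("\n".join(lines) + "\n", encoding="ascii", errors="replace")
    return sidecar


def read_sidecar(sidecar_path):
    """Read the 'key: value' lines back into a dictionary."""
    fields = {}
    for line in Path(sidecar_path).read_text(encoding="ascii").splitlines():
        key, separator, value = line.partition(": ")
        if separator:
            fields[key] = value
    return fields
```

A call such as `write_sidecar("someband_20070314_jm.wav", {"band": "Some Band", "recorded_by": "JM"})` would leave a `someband_20070314_jm.txt` file that opens in any text editor on a Mac or a PC. One caveat of matching on the name alone: a WAV and an AIF with the same name in the same folder would share a single text file.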
Sound Directions, a joint project between Indiana University and Harvard University, has an excellent resource on metadata standards, which is much too large to duplicate on this blog, so I have made it available to you here.
The complete publication is available here.
VI. Online Sources
1.) ARSC Technical Committee on Preservation of Archival Sound Recordings
http://www.arsc-audio.org/pdf/ARSCTC_preservation.pdf
2.) Arts and Humanities Data Service Preservation Handbook for Digital Audio
http://www.ahds.ac.uk/preservation/audio-preservation-handbook.pdf
3.) Colorado Digital Program Audio Best Practices
http://www.bcr.org/dps/cdp/best/digital-audio-bp.pdf
This is great, Tom! When I read about your Spotlight problem, I was going to suggest sticking a text file next to each audio file, but you beat me to it in the end of your blog post. I really, really recommend going with plaintext. XML is great, but even XML can get a bit too technical for the uninitiated ("how do I open this?"), whereas a text file is pretty self-explanatory.
I do highly recommend that you find an off-site storage solution. Also, hard drives are actually somewhat resilient. Constant USE wears down a hard drive very quickly, but if you stick a USB hard drive in a closet and don't plug it in for a decade, it should be just as usable when you pull it out. Plus, with USB having complete backwards compatibility as it evolves, it should still be readable, at least in a 10- or 15-year horizon.
Still, as you've noted here, redundancy is always key when it comes to the preservation of digital data, and no archival solution is complete without multiple copies in multiple sites.
Hey, thanks for the comment! I only just now realized that you posted this. For some reason I don't get notifications when someone posts a comment. I bet there is a setting to change that...