Monday, January 4, 2010

Vera Research Progress Part 4 - Audio

For those of us most concerned with Standards and Best Practices for Audio Preservation, here is the sum-total of my research organized and collated for your viewing pleasure. For your information, anytime I interject in the course of this post, I will use italics.

I. Introduction - The Deliverables

According to Shannon Roach, these will be:
  1. Assess how much content there is, and how quickly the collection is likely to grow
  2. Figure out the best storage for audio archives (naming conventions, file format, where to store them)
  3. Figure out how to make these audio archives searchable
  4. Document all of the above & explain it to Vera staff
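For the first deliverable, a rough count of how much audio sits on each drive could be scripted. Below is a minimal Python sketch, assuming the drives are mounted on one machine. The extension list is my assumption, based on the formats Jeff listed, and `inventory` is a hypothetical helper name.

```python
import os
from collections import defaultdict

# Assumed list of extensions, based on the formats Vera currently uses.
AUDIO_EXTENSIONS = {".wav", ".aif", ".aiff", ".mp3", ".ptf"}

def inventory(root):
    """Return {extension: (file_count, total_bytes)} for audio files under root."""
    totals = defaultdict(lambda: [0, 0])  # ext -> [file_count, total_bytes]
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()  # ".WAV" and ".wav" count together
            if ext in AUDIO_EXTENSIONS:
                totals[ext][0] += 1
                totals[ext][1] += os.path.getsize(os.path.join(dirpath, name))
    return {ext: (count, size) for ext, (count, size) in totals.items()}
```

Running it against something like `/Volumes/Vera Backup A-M` (a hypothetical mount point) every month or so and comparing the totals would give a rough idea of how quickly the collection is growing.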
II. Present Arrangement of Vera Audio Materials - A Conversation with Jeff McNulty

Since for whatever reason I decided to take this rather charming picture before Jeff gave me the audio tour, I thought I'd include it here just for fun. For those of you who do not know, Jeff is on the left and Nelson, one of the Vera audio staff, is on the right.

Formats used: WAV, AIF, MP3, PTF (Pro Tools session files), and digital audio tapes (DAT)

A. The Hard Drives:

1.) 300GB HD - Live Shows & Mixdowns

Main folder: Live Show Mixdowns

>Sub folder: Name of band, date, initials of who recorded it
  • File name: whole show, date
  • File name: individual song,
>Sub folder: Rough mixes (day of show)
>Sub Folder: Live shows mp3s
>Sub Folder: 2007 live shows

2.) 500GB HD – Two partitions: mostly show back-ups

Main Folder: Vera Backup A-M
>Sub Folder: name of band, date, initials

Main Folder: Vera Backup N-Z
>Sub Folder: name of band, date, initials

3.) Vera 1, 2, 3 recording drives
Pro Tools session files, organized like so:

-One folder for a whole show, named by headlining band, date, and initials of who recorded it.
This folder includes files for the opening bands as well.

-Eventually all bands will have their own separate folder

-The recording drives employ a system of highlighting
  • Green: indicates a folder whose contents have been mixed
  • Orange: indicates a folder that is ready to be backed up
  • Red: indicates a folder that has been backed up
B. Basic Workflow:
  1. For live shows and recording sessions, they record to a recording drive
  2. These files are then mixed
  3. Then the Pro Tools files are backed up onto a DVD
  4. Then the Pro Tools files are transferred from the recording drive to a backup drive
They like to keep the recording drives as empty as possible; nothing stays on them permanently.
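The color-coding on the recording drives amounts to a small set of stages that every show folder passes through. Here is a minimal Python sketch of those stages, assuming folders always move through the colors in the order listed (unlabeled, then green, orange, red). The names are hypothetical, and the sketch does not read the actual Finder labels.

```python
from enum import Enum

class Stage(Enum):
    """Recording-drive folder states, valued by the Finder label color Jeff described."""
    RECORDED = "no label"          # assumption: fresh folders carry no color
    MIXED = "green"
    READY_FOR_BACKUP = "orange"
    BACKED_UP = "red"

# Assumed order of the workflow: record, mix, prepare backup, back up.
ORDER = [Stage.RECORDED, Stage.MIXED, Stage.READY_FOR_BACKUP, Stage.BACKED_UP]

def next_stage(stage):
    """Advance a folder one step; folders never skip a step or move backwards."""
    i = ORDER.index(stage)
    if i == len(ORDER) - 1:
        raise ValueError("already backed up; nothing further to do")
    return ORDER[i + 1]

def can_clear_from_recording_drive(stage):
    """Only red (backed-up) folders should leave the recording drive."""
    return stage is Stage.BACKED_UP
```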

C. What Jeff wants to see in the future:

- Record all DATs to their backup hard drives

- 3-4 backups
  1. DVD backup for all protools files,
  2. One or two terabyte hard drives connected to the server
  3. I think he may have said he also wanted drives that are not connected to the server
  4. Hard drive + CDs for audio backup; this would be for MP3s, podcasting, etc.
- All DVD and CD backups organized and cataloged
- Copies of records and CDs for bands who recorded at Vera

D. Current Issues:

1. Toast: the standard for CD and DVD burning in the audio engineering community
  • Jeff is concerned with future compatibility because there is a new version of Toast, and it has recently changed the way one can access a DVD or CD. Older CDs and DVDs burned using the old version of Toast cannot be viewed with the new version of Toast.
  • I also think he said the new version of Toast requires that you have Toast to view the files, but I could be wrong.
  • Jeff wants something for DVD burning that will be more future-compatible. He also mentioned an issue with Toast wanting to automatically convert WAV files to AIF, and a related cross-platform (PC vs. Mac) issue.

2. Issue of cross-platform compatibility for tagging and metadata

  • The audio folks use a Mac. A Mac shows a different layout when you right-click a file and check its properties. On a Mac there is a feature called "Spotlight comments," where you can type anything you wish about a file, and it is then easily searchable.
  • My personal experience is that these comments are not cross-compatible. I have an external hard drive that I use to store all my internship files and personal files. I own a Mac and use Spotlight comments, especially for my audio recordings. When I connected the drive to a PC, right-clicked a file, and chose "Properties," the comments did not show up. Not only that, but if I remember correctly, when I returned to my Mac and viewed the file, the Spotlight comments had been erased. This is a major problem if we go the Windows Explorer tagging and metadata route.
III. Individual Issues in Audio Preservation

A. Storage

1. File naming conventions:

According to the Colorado Digital Program (CDP):

"Systematic file naming is important for system compatibility, interoperability, and to demonstrate ownership of the digital asset. File naming conventions specific to an institution may be used. These might include protocols that include institutional acronyms, collection identifiers, or part designators, among others. In general it is recommended that the characters in the file names be alpha-numeric, lowercase, and not utilize spaces, tabs, commas, periods or any other characters reserved for computer system use, such as slashes, asterisks, or question marks."
Seems obvious enough to me.
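Here is a minimal Python sketch of how the CDP advice could be applied to the folder naming Vera already uses (band, date, initials). The `archival_name` helper is hypothetical, and using underscores as separators is my assumption; the CDP quote itself describes strictly alphanumeric names.

```python
import re
from datetime import date

def archival_name(band, recorded_on, initials, part=None, ext="wav"):
    """Build a name like 'someband_20091204_jm.wav' (hypothetical convention).

    Each piece is lowercased and stripped of anything outside a-z and 0-9, so
    spaces, commas, periods, slashes, etc. never reach the file system.
    """
    def clean(text):
        return re.sub(r"[^a-z0-9]", "", text.lower())

    pieces = [clean(band), recorded_on.strftime("%Y%m%d"), clean(initials)]
    if part is not None:              # e.g. a song number within the show
        pieces.append(clean(str(part)))
    return "_".join(pieces) + "." + clean(ext)

print(archival_name("Some Band!", date(2009, 12, 4), "J.M."))  # someband_20091204_jm.wav
```

Putting the date in YYYYMMDD form also means a plain alphabetical sort lists a band's shows in chronological order.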

2. Hardware:

Hard drives, and particularly RAIDs (Redundant Array of Independent Disks), seem to be the consensus choice for audio storage.

The CDP defines a RAID as:
"A self-contained collection of disk drives that act as a single large hard drive storage system. These configurations are designed to enable a system to operate when an individual drive within the array fails, thus minimizing and ideally eliminating the potential for the loss of data. There are ten types of RAID configurations. In general, the better choices are RAID 5, 6 or 10."
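To give a sense of what RAID 5, 6 and 10 cost in disk space, here is a rough Python calculation, assuming identical drives and ignoring formatting overhead. The function name is hypothetical.

```python
def usable_capacity_tb(level, drives, drive_tb):
    """Approximate usable space for RAID 5, 6 or 10 built from identical drives."""
    if level == 5:
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_tb   # one drive's worth of parity; survives 1 failure
    if level == 6:
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_tb   # two drives' worth of parity; survives 2 failures
    if level == 10:
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return drives // 2 * drive_tb    # every block is mirrored once
    raise ValueError("only RAID 5, 6 and 10 are handled here")
```

With four 1 TB drives, RAID 5 gives about 3 TB usable and survives one drive failure, RAID 6 gives about 2 TB and survives two, and RAID 10 gives about 2 TB and survives at least one (up to one per mirrored pair). None of these protects against the whole array being lost, which is why the sources insist on copies elsewhere.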
Gary Louie, University of WA Music School archivist, has the following to say about RAIDs:

  • "Hard drives are the only practical method now to store the size and amount of files we have. Hard drives are also poor for reliability, so any HD storage has to be RAID of some sort, and must be duplicated in at least one other physical location."
  • "Some things we are looking at for local, small scale archive "starters" are small RAID hard drive towers with removable sleds, duplicated onto similar arrays via gigabit ethernet in another building."
  • "You can certainly consider a really small scale HD storage method that will still be helpful to you on all counts at low cost. An external HD holder (like the Newertech Voyager) and a handful of bare HDs, with constant duplicates and stored at different locations. Could be in conjunction with a small server system."

ARSC Technical Committee on RAIDs:
"RAID arrays remove some of the disadvantages of individual hard drives while they provide larger capacities than individual drives. They are extremely easy to use, especially when supplied as network-attached storage, but still should be considered mechanical devices that may fail at some point in time. A single RAID array is not adequate for long-term storage."
For a more in-depth discussion of various RAID configurations, go here.

Beyond the consensus on RAIDs as the best way to store digital audio, there is also a consensus that audio be stored in multiple locations and not on a single array.

The ARSC Technical Committee suggests:
  • "Generate local backup copies of all files as soon as possible after creation."
  • "Store backups on a separate device or separate media from the original."
  • "Make at least two, or preferably three sets of archival master digital files of preserved content and store them in different locations, and possibly on different types of media."
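The first two ARSC recommendations can be sketched in Python as a copy-and-check step. This is a minimal sketch, assuming the destination folders (a second drive, a mounted NAS share, etc.) are already reachable from the machine running it; the `replicate` name and all paths are hypothetical. Getting a copy into a different building is still a human job.

```python
import filecmp
import shutil
from pathlib import Path

def replicate(source, destinations):
    """Copy one file into each destination folder and confirm each copy byte-for-byte."""
    source = Path(source)
    copies = []
    for dest_dir in map(Path, destinations):
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / source.name
        shutil.copy2(source, target)  # copy2 also carries over timestamps
        # shallow=False compares file contents, not just size and modification time
        if not filecmp.cmp(source, target, shallow=False):
            raise IOError(f"{target} does not match {source}")
        copies.append(target)
    return copies
```

One caveat: reading a file back right after copying it may be served from the operating system's cache rather than the disk, so a later check against stored checksums is still worthwhile.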
The CDP suggests the same:
"It is highly recommended that organizations utilize a strategy employing multiple redundant copies of digital files in separate locations, as a fail-safe strategy for the failure or destruction of the digital media. Acceptable media may be optical disks, hard drives, or magnetic data tapes – all of which have particular strengths and weaknesses."
Depending on the needs and resources of an organization, offsite storage is a potential option. This is where I believe our connection to UW could come in very handy.

The CDP says:
"Projects considering outsourcing should consider the costs of ingestion and ongoing maintenance fees; metadata requirements for outsourced files; workflow requirements; and accessibility requirements."
Vera's consulting "Technical Generalist" Darren White says:
"In terms of hard disk storage, VERA should be looking at a NAS/SAN on the VERA internal network, preferably with offsite backup. That way, if disaster strikes..."
Other options, such as DVDs and CDs, are not recommended.

The ARSC Technical Committee has the following to say about CDs and DVDs:
  • "Optical discs [are an] unsafe preservation storage media. Using this format requires costly test equipment to check the quality of both blank media and digital recordings. Even if this equipment is already available to an archive, the cost of storage media may be higher than the other formats, except in very small volume – individual CDs or DVDs are not expensive, but the cost-per-megabyte is substantially greater than other media types."
  • "The format cannot hold files at higher bit depth and sample rate than 16 bit, 44.1 kHz, which is far less than the de facto 24 bit, 96 kHz standard. In addition, migrating an archive stored on optical discs is expensive, because it requires much human intervention."
What the CDP says about CDs:
  • "Storage and retrieval costs will escalate as CD collections become larger and more challenging to manage."
  • "Limited physical life span (3 to 20 years) and the files stored on them are vulnerable due to physical deterioration, mishandling, improper storage and obsolescence."
  • "Obsolete equipment for reading the media poses a threat in the long term."
  • "Adhesive labels and permanent ink markers can cause early failure of CDs through chemical interaction with the CD’s recording layer."

Gary Louie, University of WA Music School archivist, says:
"Don't trust DVDs. No one thinks they will last very long. Same for CDs and tapes. Generally, the data written to them is also not very clean, especially audio CDs."
3. Data Migration and Hardware Updating

Current hardware and software seem to become obsolete every 3 to 4 years. It is therefore extremely important to implement a migration schedule.

The CDP suggests:
  • "Migrate your files frequently - at the very least every five years. Failure to incur these small incremental costs may lead to very large costs down the road when you need to migrate from an obsolete data format or medium."
  • "Every sustainable digitization project should include the costs of data migration as a yearly budget line item."
  • "At the time of migration, file integrity checks should be conducted... a “checksum” may be associated with a file when it is produced so that the checksum value may then be used to confirm the integrity of the file at the time of migration."
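The CDP's checksum advice could look something like this in Python: record a SHA-256 for every file when a folder is created, then re-check the list after each migration. This is a sketch under my own assumptions (SHA-256 as the algorithm, a manifest named `manifest-sha256.txt`, hypothetical helper names); the CDP does not prescribe any of these.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in 1 MB chunks so large WAV files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(folder, manifest_name="manifest-sha256.txt"):
    """At creation time: record a checksum for every file under folder."""
    folder = Path(folder)
    lines = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        if path.name == manifest_name:
            continue
        rel = path.relative_to(folder).as_posix()
        lines.append(f"{sha256_of(path)}  {rel}")  # hash, two spaces, relative path
    (folder / manifest_name).write_text("".join(l + "\n" for l in lines), encoding="utf-8")

def verify_manifest(folder, manifest_name="manifest-sha256.txt"):
    """After a migration: re-hash every listed file and return those that fail."""
    folder = Path(folder)
    failures = []
    for line in (folder / manifest_name).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, rel = line.split("  ", 1)
        target = folder / rel
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel)
    return failures
```

The manifest lines use the same layout that the `sha256sum -c` command reads on Linux, so the check can also be run from inside the folder without Python.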

The subject of "checksum" makes for a great point of transition into the subject of quality control.

4. Quality Control

The CDP recommends the following for quality control:
  • All digitized audio files should be sampled for sound quality.
  • Technicians charged with quality control should listen for consistency in the audio quality at a number of points in the recording, listening for distortions in the sound, for proper playback speeds, and for artifacts such as hiss and hum.
  • Technicians should specifically check that the volume levels are set correctly.
  • Recordings should also be checked for completeness.
  • Any distortions in the sound or other inconsistencies in the recordings should be noted in the metadata.
  • Metadata should also be checked for accuracy and completeness, with special attention paid to accuracy of file names, which can lead to the effective loss of files when recorded inaccurately.
  • In addition, file integrity checks may be conducted. A “checksum” may be associated with a file when it is produced so that the checksum value may then be used to confirm the integrity of the file at each step from creation to migration.
"A checksum is an algorithm-based method of determining the integrity and authenticity of a digital data object, such as a digital audio file, used to check whether errors or alterations have occurred during the transmission or storage of a data object. Please note, however, that a checksum will only tell you there is an error. It cannot tell you what the error is, nor can it correct the error."
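The point that a checksum only tells you that something is wrong is easy to demonstrate. In this small Python example, a stand-in block of data has a single bit flipped; the SHA-256 values no longer match, but nothing in them points to the damaged byte.

```python
import hashlib

original = bytes(range(256)) * 4096   # stand-in for about 1 MB of audio data
damaged = bytearray(original)
damaged[500_000] ^= 0x01              # flip a single bit somewhere in the middle

good = hashlib.sha256(original).hexdigest()
bad = hashlib.sha256(bytes(damaged)).hexdigest()

print(good == bad)  # prints False: the checksum reveals that something changed...
# ...but neither digest says where the change is or how to undo it.
# Repairing the file means going back to a good copy stored elsewhere.
```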
"It is important to understand that quality control of audio can be a very subjective process. It is recommended that technicians performing this process be selected on the basis of their familiarity with a variety of types of recorded sound, and preferably have experience in a field that has already helped train their ears. Staff with previous audio experience such as audio engineers, are a natural selection, as are staff with formal musical training."
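The CDP's advice to check volume levels can be partly automated, though it does not replace listening. Below is a minimal Python sketch that reports the peak level of a 16- or 24-bit PCM WAV file in dBFS. The function names and the -0.1 dBFS clipping threshold are my assumptions, not CDP figures.

```python
import math
import wave

def peak_dbfs(path):
    """Highest absolute sample level of a 16- or 24-bit PCM WAV, in dBFS (0.0 = full scale)."""
    with wave.open(path, "rb") as wav:
        width = wav.getsampwidth()
        if width not in (2, 3):
            raise ValueError("this sketch only handles 16- and 24-bit PCM")
        frames = wav.readframes(wav.getnframes())  # whole file in memory; fine for a sketch
    full_scale = 2 ** (8 * width - 1)  # 32768 for 16-bit, 8388608 for 24-bit
    peak = 0
    for i in range(0, len(frames), width):  # channels are interleaved; check every sample
        sample = int.from_bytes(frames[i:i + width], "little", signed=True)
        peak = max(peak, abs(sample))
    if peak == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(peak / full_scale)

def looks_clipped(path, threshold_dbfs=-0.1):
    """Flag files whose peak sits at or very near full scale (threshold is an assumption)."""
    return peak_dbfs(path) >= threshold_dbfs
```

A file that peaks at 0 dBFS may well be clipped, and one that peaks very low may have been recorded too quietly; either way, it is a cue for a technician to go listen.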
5. File Format, Sample Rate, and Bit Depth Standards

While both WAV and AIF are widely used, a newer format appears to be emerging: the Broadcast WAV Format (BWF), which is the same as WAV except that it can also carry metadata.

The CDP says the following regarding WAV and AIF:
"The WAV file type was developed by Microsoft, is in widespread use, and is readable by virtually all audio software programs. The AIF file type was developed by Apple Computer and is also in widespread use. Both of these file types are uncompressed and acceptable for long-term file storage. The WAV file type has become a standard and is recommended. In addition, the WAV file type is also available in a professional flavor, broadcast WAV (BWF), which has the capability to store metadata in the file header. Although not all audio software programs are currently capable of reading or writing to the metadata header, the BWF format is emerging as the WAV file type of preference for archival audio projects."
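To show what "metadata in the file header" means in practice, here is a minimal Python sketch that walks the chunks of a WAV file and pulls out the first few fields of the BWF `bext` chunk. The field sizes follow the EBU Broadcast Wave specification (EBU Tech 3285); the helper names are hypothetical, and only the first four fields are read.

```python
import struct

def _field(data, start, length):
    """Decode one fixed-width, NUL-padded ASCII field from the bext chunk."""
    raw = data[start:start + length].split(b"\x00", 1)[0]
    return raw.decode("ascii", errors="replace").strip()

def read_bext(path):
    """Return the first four 'bext' fields of a Broadcast WAV file, or None if it has none."""
    with open(path, "rb") as f:
        riff, _size, form = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or form != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                return None  # reached the end without finding a bext chunk
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"bext":
                data = f.read(chunk_size)
                return {
                    "description": _field(data, 0, 256),
                    "originator": _field(data, 256, 32),
                    "originator_reference": _field(data, 288, 32),
                    "origination_date": _field(data, 320, 10),  # yyyy-mm-dd
                }
            # Skip other chunks (fmt, data, ...); odd sizes are padded to an even length.
            f.seek(chunk_size + (chunk_size & 1), 1)
```

A plain WAV file simply has no `bext` chunk, so the function returns `None`. Software that does not understand the extra chunk is expected to skip it, which is why most programs can still play a BWF file as an ordinary WAV.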
For the sake of avoiding redundancy, I will simply say that all other sources considered thus far are generally on the same page regarding WAV and AIF. However, since Vera also uses Pro Tools files (PTF), there are other issues to consider.

Regarding Pro Tools files, Gary Louie said the following:
"If you are thinking long-term (10-50-100 years) who knows if anyone will be able to read ProTools files at all. The international recommendation for audio storage is the Broadcast WAV file, which holds the audio plus metadata for catalog and technical info. But that doesn't help much with PT sessions. We (i.e. University of Washington) don't do any PT work, so you should probably ask elsewhere (maybe the ProSoundWeb or Digidesign's forums) on how people archive their PT material."
I have yet to research archival standards for Pro Tools files, if there are any. When I find the answer, I will update this post. However, all things considered, BWF seems to be the optimal choice for Vera's needs.

Below is a diagram from the CDP regarding Sample Rate and Bit Depth Standards:

[Diagram: CDP Sample Rate and Bit Depth Standards]

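Since the CDP diagram concerns sample rate and bit depth, here is a rough Python check of a WAV file against the 24 bit, 96 kHz figure from the ARSC quote above, plus the arithmetic for how much space uncompressed audio takes. Treating 24/96 as Vera's target is my assumption, and the function names are hypothetical.

```python
import wave

# Assumed target, taken from the "24 bit, 96 kHz" figure in the ARSC quote.
TARGET_RATE_HZ = 96_000
TARGET_BITS = 24

def meets_archival_target(path):
    """True if a PCM WAV file is at least 24-bit / 96 kHz."""
    with wave.open(path, "rb") as wav:
        bits = wav.getsampwidth() * 8
        rate = wav.getframerate()
    return bits >= TARGET_BITS and rate >= TARGET_RATE_HZ

def bytes_per_hour(rate_hz, bits, channels):
    """Uncompressed PCM size: samples/s x bytes/sample x channels x 3600 s."""
    return rate_hz * (bits // 8) * channels * 3600

print(bytes_per_hour(96_000, 24, 2))  # 2073600000, roughly 2.07 GB per stereo hour
print(bytes_per_hour(44_100, 16, 2))  # 635040000, roughly 0.64 GB per stereo hour
```

So moving from CD-quality to 24/96 roughly triples the storage needed per hour of stereo audio. Python's `wave` module only reads plain PCM headers, and older Python versions reject files written with the "extensible" WAV header that some recorders use for 24-bit audio.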
6. Metadata Issues and Standards

As established above in my conversation with Jeffery, retaining metadata when working across platforms (Mac to PC) is a major issue. Depending on how thorough BWF metadata is and whether it is cross-platform compatible, there are other options to consider.

Darren White suggested the following:
"For maximum portability of metadata, I would recommend that it be stored externally to the file itself. Even with, say, MP3 files and ID3 tags, cross-platform compatibility issues remain, even after all these years. So WiMP and iTunes might make entirely different sense/nonsense out of the same ID3 tags. I also use a Mac primarily and I think if we are talking Protools and AIFF files, Mac capabilities may be sufficient for basic metadata but to allow for expanding requirements, something like a CMS and database or even just an external XML file may make more sense in the long run."
The Arts and Humanities Data Service Preservation Handbook for Digital Audio suggests:
"Ensure that associated metadata integrated into the file itself can be extracted in the software tool and stored in the preservation format. If not, the information should be manually output to an ASCII text file."
ASCII or XML might be the way to bridge the gap between Mac and PC. If we cannot attach metadata and tags directly to the audio files we can create a text file with all the pertinent information about each file. Someone can either create one text file to match each audio file with an identical file name or just one text file for a group of audio files within one folder. I lean a bit more towards individual files.
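Here is a minimal Python sketch of the one-text-file-per-audio-file idea, using XML so the same file reads the same on a Mac or a PC. The element names (band, recorded_by, notes, and so on) and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

def write_sidecar(audio_path, **fields):
    """Write <same base name>.xml next to the audio file, one element per field."""
    audio_path = Path(audio_path)
    root = ET.Element("audio_record", file=audio_path.name)
    for name, value in fields.items():
        ET.SubElement(root, name).text = str(value)
    sidecar = audio_path.with_suffix(".xml")
    ET.ElementTree(root).write(str(sidecar), encoding="utf-8", xml_declaration=True)
    return sidecar

def read_sidecar(audio_path):
    """Load a sidecar back into a plain dict of field name -> text."""
    root = ET.parse(str(Path(audio_path).with_suffix(".xml"))).getroot()
    return {child.tag: child.text for child in root}
```

For an audio file named `someband_20091204_jm.wav`, this writes `someband_20091204_jm.xml` alongside it. One catch: a WAV and an AIF with the same base name would share a sidecar, so the naming convention should keep base names unique.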

Sound Directions, a joint project between Indiana University and Harvard University, has an excellent resource on metadata standards. It is much too large to duplicate on this blog, so I have made it available to you here.

The complete publication is available here.

IV. Online Sources

1.) ARSC Technical Committee on Preservation of Archival Sound Recordings
http://www.arsc-audio.org/pdf/ARSCTC_preservation.pdf

2.) Arts and Humanities Data Service Preservation Handbook for Digital Audio
http://www.ahds.ac.uk/preservation/audio-preservation-handbook.pdf

3.) Colorado Digital Program Audio Best Practices
http://www.bcr.org/dps/cdp/best/digital-audio-bp.pdf