Presenting digital video consistently depends on the design, coordination and quality of all aspects of both the video file and the video player. Specific factors, such as which features of a codec are supported by the decoder and how one colour space is converted to another, affect how videos are presented. Media players are of course developed over time – new features are added and bugs are resolved – but while such changes may improve the quality of a player they also create scenarios where a digital media file may play differently in a new version of a player than in an older one. As a result, the ever-evolving state of media playback technology creates technical challenges for audio-visual conservators who are tasked with ensuring that digital video is presented consistently and as originally intended.
Approaches
In 2014, during the initial discussions that contributed to the development of this paper, conservators and conservation technicians from the Presto4U Video Art Community of Practice Group discussed use cases and difficulties in sustaining particular renderings of digital video over long periods of time. The strategies that were initially considered to help ensure sustained and consistent presentation allied closely with the traditional preservation treatments of emulation, normalisation and migration, but it was also noted that guidance was needed to determine if these treatments were necessary, and which strategy represented the best approach.
In general, implementations of audio-visual files that adhere properly to their associated standards and use the minimum required level of complexity are easier to control technically and would find more consistency in playback. However, many of the tools available to media creators generate files with unnecessary complexities. In addition, the familiar tools used to create media files are often ill equipped to allow for analysis or assessment of the file formats that they create. As a result, an alternate set of tools is necessary for conservators to evaluate digital media and to identify the reasons for possible playback discrepancies.
Emulation
Sustaining video presentations through emulation requires maintaining a player and all of its dependencies. For instance, if a creator determines that a video is intended for presentation through QuickTime Pro 7, this may mean preserving that application along with its underlying set of components, as well as an operating system that can support QuickTime Pro 7's underlying 32-bit QTKit framework. Often video players rely on system codec libraries or audio-visual frameworks working behind the scenes, so it can be challenging to determine what exactly is necessary to document emulation sufficiently.
On a Mac it is possible to review which files are being used by an active application with the lsof (list of open files) command. For instance, while playing a QuickTime file in QuickTime Pro 7,1 open the Terminal application and run the following command:
lsof | grep QuickTime
The substantial list of files produced will include some which may or may not be related to the media being played but are opened by the QuickTime application. Particularly notable will be the files within QuickTime's component libraries, such as those within /Library/QuickTime or /System/Library/QuickTime. The component files within these directories are used to decode or demux (demultiplex) specific audio-visual formats. Some applications will install components within these folders during installation, so these additional features will also affect any application using QuickTime Pro 7. Two computers may both use the same version of QuickTime and the same operating system, but if the component libraries differ the players may work inconsistently.
While playing video in VLC Player and running the following command:
lsof | grep VLC
the report will show that most of the files used by VLC are from within the VLC application itself, with a much lower reliance on system files. Because VLC is much more self-reliant than QuickTime it is much easier to generate an identical VLC playback environment on a different computer. Additionally, the VLC website provides access to an archive of its installation files going back to the early years of its development. In this way, although the playback of a particular file may possibly change from one version of VLC to another (due to bug fixes, new features, etc.), it is feasible to acquire older versions and emulate a playback scenario from the past. The portability of VLC and the availability of older versions make it much better suited to emulation strategies than QuickTime Pro 7.
Normalisation
When following a strategy of normalisation, content could be reformatted to a single format (or possibly to one of a short list of formats) that the conservator is better able to control technically. The disadvantage of normalisation is that as a collection becomes more technically diverse it is more difficult to find normalisation formats that sustain its significant characteristics. Normalising effectively also requires the process to be assessed to identify any possible loss between the original and the resulting normalised copy. If the significant characteristics of the original are manipulated in order to produce a normalised output then playback may be affected. For instance, if an NTSC DV file with 4:1:1 chroma subsampling is normalised by converting it to a 4:2:0 h264 file the colour resolution will be reduced and diagonal lines will appear more block-like than in the original.
For audio-visual normalisation, lossless codecs may be used to prevent additional compression artefacts from affecting the output, but the lossless codecs selected should support the relevant significant characteristics. Codecs such as jpeg2000 and ffv1 offer a great amount of flexibility so that the most popular pixel formats are supported. However, lossless codecs find much lower levels of support compared to lossy ones, and lossless normalised copies of original media may use substantially higher data rates than the original. For instance, a high-definition h264 file may play back properly, but once transcoded to jpeg2000, most modern computers would have difficulty playing high-definition lossless jpeg2000. Playing an HD jpeg2000 file could provide accurate images but playback may stutter or lag. A similar problem exists for uncompressed video, where for large modern frame sizes it may be a challenge for a disk to read data fast enough to provide real-time playback of uncompressed video.
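As a minimal sketch of such an approach (the file names ntsc_dv_source.mov and ntsc_dv_normalised.mkv are hypothetical), FFmpeg can transcode a stream to a lossless codec such as ffv1 while keeping the source's pixel format, and therefore its chroma subsampling, intact:

ffmpeg -i ntsc_dv_source.mov -c:v ffv1 -level 3 -c:a copy ntsc_dv_normalised.mkv

Because ffv1 supports the 4:1:1 pixel format used by NTSC DV, this particular normalisation does not force a chroma subsampling conversion of the kind described above.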
Normalising can offer a more predictable playback for conservators who gain technical familiarity with the specifications used as normalisation targets. However, the process of normalising is itself dependent on the design of the demuxer, decoder, encoder and muxer utilised in the normalisation process. If the decoder uses the wrong colour primaries or misinterprets the aspect ratio then this misinterpretation may become part of the resulting normalised files. The output of normalisation should be assessed to ensure that the result looks like the source, with the hope that the normalised copy finds more consistent and sustainable playback.
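Continuing the hypothetical example above, one way to make such an assessment is to compare the decoded frames of the normalised copy against the original with FFmpeg's psnr filter; for a genuinely lossless normalisation the reported average PSNR should be 'inf':

ffmpeg -i ntsc_dv_normalised.mkv -i ntsc_dv_source.mov -lavfi "[0:v][1:v]psnr" -f null -

The two inputs must share the same frame size and frame rate for this comparison to be meaningful.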
Working with budgetary and technical limitations, conservators may not have complete control over presentation technologies. A work may be intended for an interlaced display but may have to be presented on a progressive monitor because a suitable interlaced monitor is not available. Or, similarly, a variable frame rate may be presented through a display mechanism that only supports a short list of frame rates. For works that do not comply with the technical constraints of the available player, a new access derivative must be made to adhere to those constraints. In this way the design of a normalisation strategy may include the production of standardised exhibition formats linked to the technology used by an institution or particular artists rather than for purely archival purposes. It might be necessary to support a range of different players. Creating such derivatives may be a necessary part of facilitating access or display, but care should be taken to ensure that the significant characteristics are not needlessly manipulated during this process, but only altered to fit the restrictions of a particular access or display environment.
Migration
In a migration strategy the media may be maintained as it is. The conservator would track and define its technical qualities and determine how to play the file back properly with modern players. With this strategy there is no particular need to sustain a specific player (emulation) or change the media to achieve consistency (normalisation); instead a consistent presentation is achieved through selection or manipulation of the player. Sustaining consistent media presentation through migration requires a more in-depth understanding of the significant characteristics of the work and how they are interpreted (or misinterpreted) by a player that is known to present the video as intended.
For instance, if a video file contains metadata that indicates that a BT.709 colour matrix should be used, but the officially preferred 'look' is from a player that presents the colour incorrectly through a BT.601 colour matrix, the discrepancy and target must be identified and documented so that a future presentation can utilise a BT.601 matrix. Another example is where a video file utilises a container that states one aspect ratio whereas the stream states another, and the creator of the media is accustomed to a player that prioritises the stream's aspect ratio. This conflict must be well understood by the conservator so that the presentation intended by the media creator may be recreated. In a strategy of migration it is less important to maintain the specific player and more important to maintain the knowledge of how to achieve a particular type of presentation with the video file.
Significant characteristics
When considering the options of normalisation and migration (and to some extent emulation) the identification and documentation of the significant properties of a video file are crucial to maintaining the intended playback and evaluating whether it reproduces successfully. To the greatest extent feasible the significant characteristics of audio-visual media should be sustained throughout conservation and presentation activities and contexts, including within the digitisation of analogue material or the reformatting of existing digital material.
Aspect ratio
The display aspect ratio refers to a ratio of the width of the presented frame to the height of the presented frame. This is not usually determined by the frame size alone. For instance a video with a 720 x 480 frame size (which contains an encoded image that uses 480 rows of 720 pixels each) may be presented at a wide 16:9 ratio or a narrower 4:3 aspect ratio. When a 720 x 480 frame size is presented at 4:3 it may occupy the space of 640 x 480 on a computer monitor. A 720 x 480 frame presented at 16:9 is often not technically feasible because 480 is not divisible by 9, but the image would roughly occupy 853 x 480 pixels on a monitor.
The pixel aspect ratio expresses the ratio of the presentation width of the pixel to the presentation height of the pixel, so a 720 x 480 image may have a pixel aspect ratio of 8:9 (meaning that the pixel is intended for presentation as a thin rectangle rather than a square) and thus have a display aspect ratio of 4:3. The equation goes like this:
( width / height ) * ( pixel-aspect-ratio ) = display-aspect-ratio
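Applied to the 720 x 480 example above, with 8:9 as the 4:3 pixel aspect ratio and 32:27 as the corresponding widescreen pixel aspect ratio under the same convention, the arithmetic works out as:

( 720 / 480 ) * ( 8 / 9 ) = 4:3

( 720 / 480 ) * ( 32 / 27 ) = 16:9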
Audio-visual files may contain inconsistent aspect ratio information. Some streams store aspect ratio information and some containers do as well, so it is possible for this information to be contradictory. For instance, a DV stream may store an aspect ratio of either 4:3 or 16:9 (only these two are possible). However, this stream may be stored within a container that declares that the DV stream should be presented at 3:2 or 2.35:1, or that the stream should be presented at a ninety-degree counter-clockwise rotation. Usually in the case of such discrepancies the container's declared information will take precedence, but players may vary per codec or per container in this precedence.
Some containers, such as AVI, do not have a standardised way to document aspect ratio. The same is true for many codecs, such as nearly any form of uncompressed video. As a result some combinations of container and stream, such as uncompressed video in AVI, may not declare any particular aspect ratio at all. In most cases such video will present with a square pixel aspect ratio, so the display aspect ratio will simply be the frame width divided by the height, but in many cases this is wrong. For instance a videotape digitised to uncompressed video in an AVI container with a frame size of 720 x 576 may have originally presented at a 4:3 display aspect ratio, but the resulting AVI file will likely present at 720:576, which is equal to 5:4. This will make the video appear somewhat stretched, but this can be compensated for by adjusting the player.
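As a sketch of such a player adjustment (the file name here is hypothetical), FFmpeg-based playback can force the intended display aspect ratio with the setdar filter:

ffplay -vf setdar=4/3 digitised_tape_720x576.avi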
Newer versions of the specification for the QuickTime container use an 'aper' atom to store various aspect ratio instructions for different intentions.2 Here there may be different aspect ratios for 'Clean Mode', 'Production Mode' and 'Classic Mode'. These aspect ratios are in addition to aspect ratios already defined within the track header and possibly the aspect ratios declared within the stream. In these cases the aspect ratio of the QuickTime track header should be considered authoritative, although not all players may agree. To inspect QuickTime container architectures use a QuickTime atom parser such as Dumpster or Atom Inspector and look for the trackWidth and trackHeight values within the 'tkhd' atom (Track Header), or see the 'tapt' atom that contains various alternative track aspect ratios.
Frame size
The frame size refers to the width and height of the pixels encoded in the stream, such as 720 x 480 or 1440 x 720. Certain frame sizes are predominant, although nearly any frame size may exist. Sometimes the presence of chroma subsampling will limit which frame sizes are possible: for instance, 4:2:0 frames must use even-numbered widths and heights, 4:2:2 must use a width that is a multiple of 2, and 4:1:1 must use a width that is a multiple of 4.
Care should be taken when the pixel width or height of the display monitor is substantially different from the frame size of the video, since a large amount of video scaling will be required. For instance, if a work of computer video art with a small frame size is intended to be shown on a high-definition monitor, there are several methods for scaling the video from one size to another.3
If a 192 x 108 image is intended to be shown on a 1920 x 1080 monitor then the width and height must each increase by a factor of 10. The total number of pixels in the original frame is 20,736 but the presentation must use 2,073,600; thus 99% of the pixels in the resulting presentation must be artificially created. Typically when video is scaled from a small size to a larger size the newly introduced pixels will be set to values that average the luminance and colour of their neighbours. In many cases this approach will result in the addition of new colours that never existed in the original image. In the case of computer or pixel video art the effect may appear muddy and artificial.
FFmpeg contains several methods for scaling pixel art to a larger size, such as the 'neighbor' scaling algorithm (-vf scale=flags=neighbor) or the hqx filter (accessed 27 February 2015). With these methods an image may be increased to a larger frame size but retain the pixel art look of the original smaller image.
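A minimal sketch of the first method, assuming a hypothetical 192 x 108 source file and a lossless ffv1 output, might look like this:

ffmpeg -i pixel_art_192x108.mov -vf scale=1920:1080:flags=neighbor -c:v ffv1 pixel_art_1920x1080.mkv

Here each source pixel is simply repeated rather than averaged with its neighbours, preserving the hard edges of the original image.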
Colourspace types (conversion matrices)
Although most computer displays present video pixel data in components of red, green and blue (RGB), most video is stored as YUV.4 Getting from RGB to YUV and from YUV to RGB requires using an equation to convert one colourspace to the other. The rationale behind using YUV is that all of the information about luminosity is moved to one channel (Y) and the colour data is moved to the other two (U and V). Since the eye is less sensitive to colour than to luminosity, colour may be sampled at a lower rate without much effect on the viewer.
A presentation challenge for working with YUV video is that there are several different equations available to convert Y, U and V to R, G and B. If YUV data is interpreted using a BT.601 equation it will have different colours than an interpretation using the BT.709 version of the equation. Generally the same equation should be used to convert YUV back to RGB as was used to originally create the YUV from RGB, although occasionally a video creator may consider that the unintentional use of the wrong colour matrix provides the intended look.
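The difference between the two equations can be seen in the weightings used to derive the luma component from red, green and blue:

Y(BT.601) = 0.299 R + 0.587 G + 0.114 B

Y(BT.709) = 0.2126 R + 0.7152 G + 0.0722 B

The U and V components are derived with correspondingly different weightings, which is why the same YUV values produce different colours depending on which matrix a player applies.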
In general human eyes are not very sensitive to minor colour differences, so without a side-by-side or attentive comparison the differences between the various colour matrices may be difficult to discern. However, the difference will be easiest to identify within the areas containing the most saturated colour.
Some codecs such as h264, ProRes, or VP8 will contain metadata to declare which matrix should be used for decoding to RGB. However, many streams do not provide this information and often neither do containers.
The QuickTime container stores colour matrix information within the 'nclc' section of the 'colr' atom. These parameters are used by ColorSync to generate one of the following video colour spaces:
- HD (Rec. BT.709)
- SD (SMPTE-C / BT.601)
- SD (PAL)
In the absence of 'nclc' data, QuickTime applications default to using SMPTE-C/BT.601 to convert YUV to RGB, which would give incorrect colours if the intention were to use EBU PAL or Rec. 709.
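Whether a file declares such colour information can be checked, for example, with FFmpeg's ffprobe (file.mov is a placeholder); values reported as 'unknown' indicate that nothing is declared and the default behaviours described above will apply:

ffprobe -v error -select_streams v:0 -show_entries stream=color_primaries,color_transfer,color_space file.mov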
The FFmpeg scale filter supports overriding the colour matrix with the in_color_matrix and out_color_matrix options to change how YUV is interpreted to RGB. This feature can be used in normalisation transcoding to explicitly set a colour matrix if a source file does not define this properly.5
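As a sketch of how these options might be applied in a normalisation step (the file names are hypothetical), a source intended for BT.601 could be converted for a BT.709 presentation:

ffmpeg -i input_bt601.mov -vf scale=in_color_matrix=bt601:out_color_matrix=bt709 -c:v ffv1 -c:a copy output_bt709.mkv

Note that this converts the sample values themselves; separately tagging the output with the intended matrix may also be necessary so that players interpret it correctly.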
Chroma subsampling
For YUV video the colour data is often sampled at a lower resolution than the brightness information. Thus a 720 x 480 video may encode brightness information at 720 x 480 but the colour may be subsampled at 360 x 480 (4:2:2), 360 x 240 (4:2:0) or 180 x 480 (4:1:1). Converting video from one lower-resolution chroma subsampling pattern to another (such as 4:1:1 to 4:2:0) will significantly reduce the quality and accuracy of the colour data.6
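The subsampling pattern in use can be checked, for example, with FFmpeg's ffprobe (file.mov is a placeholder); a pixel format such as yuv411p, yuv420p or yuv422p identifies the pattern:

ffprobe -v error -select_streams v:0 -show_entries stream=pix_fmt -of default=noprint_wrappers=1 file.mov

MediaInfo reports the same information as a 'Chroma subsampling' value.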
Interlacement
Interlaced video is recorded as two field-based images per frame. On a progressive monitor these two field images are woven together to form one image from two, and as a result interlaced content may appear to have a slightly jagged, combed horizontal look on a progressive display. If the ordering of the fields is altered so that the images appear in the wrong order the effect can be substantial.
One method to test that the ordering of fields in an interlaced video is correct is to play the fields as their own images without joining the two fields into a whole frame. This can be done with the 'separatefields' filter in FFmpeg, such as:
ffplay {interlaced_video.mov} -vf separatefields
Material that is improperly interlaced would play back in an especially choppy way with this presentation. FFmpeg has additional filters such as 'setfield' and 'fieldorder', which may be used to correct improperly interlaced video or video with incorrect interlacement metadata. Additionally, the interlacement patterns of video frames may be detected through FFmpeg's idet filter. The following command:
ffmpeg -i {interlaced_video.mov} -vf idet -f null -
will provide a summary that states how many frames appear to be interlaced and how many appear to be progressive. Occasionally, FFmpeg's idet filter may conclude that a video file 'looks' interlaced although the metadata of the stream and container either does not declare so or states an incorrect value.
Within QuickTime the 'fiel' atom stores information regarding the arrangement of fields and frames. With this atom the container declares if a video track is intended for progressive or interlaced presentation.
For QuickTime files that do not contain a 'fiel' atom the player may obtain interlacement information from the codec (if such information exists) or presume the file is progressive. For interlaced video within a QuickTime container it is recommended to use the 'fiel' atom to clarify whether the video is intended for progressive presentation or for a type of interlaced arrangement.
The following is an example of 'fiel' showing progressive video (data gathered via 'mediainfo --inform="Details;1" file.mov'):
00D0430D          Field/Frame Information (10 bytes)
00D0430D            Header (8 bytes)
00D0430D              Size:                    10 (0x0000000A)
00D04311              Name:                    fiel
00D04315            fields:                    1 (0x01)
00D04316            detail:                    0 (0x00)
The following is an example of 'fiel' showing interlaced video:
00000ED3          Field/Frame Information (10 bytes)
00000ED3            Header (8 bytes)
00000ED3              Size:                    10 (0x0000000A)
00000ED7              Name:                    fiel
00000EDB            fields:                    2 (0x02)
00000EDC            detail:                    14 (0x0E)
The 'fields' value will contain one of the following values:
- 1 or 0x01 = progressive
- 2 or 0x02 = interlaced
If 'fields' is set to 1, then 'detail' will also be set to 0; otherwise 'detail' will be used to indicate one of the following:
- 0 or 0x00 = progressive
- 1 or 0x01 = field with lower address contains topmost line
- 6 or 0x06 = field with higher address contains topmost line
- 9 or 0x09 = field containing line with lowest address is temporally earlier
- 14 or 0x0E = field containing line with lowest address is temporally later
If the 'fiel' atom is present it may be edited via Atom Inspector as a sub-atom of the 'stsd' Sample Descriptions atom.7
YUV sample range
For 8-bit video a luma or chroma sample may have a value from 0 to 255 (in hexadecimal 0x00–0xFF), while 10-bit video uses a range of 0–1023 (for simplicity this description will provide examples based on an 8-bit expression). Video broadcast standards constrain video sample ranges to 16–235 for luma and 16–240 for chroma. Thus for video in broadcast range a value of 16 is black and 235 is white. However, for video in full range a value of 16 is a dark grey, 235 is a light grey, 255 is white, and 0 is black.
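When broadcast-range values are expanded for display on a full-range device, a common mapping for an 8-bit luma sample is:

full-range value = ( broadcast-range value - 16 ) * 255 / 219

so 16 maps to 0, 235 maps to 255, and values outside 16–235 are clipped.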
Some codecs such as h264 can indicate if they are intended for broadcast range or full-range playback. However, most codecs and containers do not have a mechanism to express this specifically.
Full-range video can be identified in MediaInfo by the presence of the 'Color range' value:
Color range                              : Full
or in MediaInfo's trace report, see the video_full_range_flag:
00308AB1  video_signal_type_present_flag (9 bytes)
00308AB0    video_signal_type_present_flag: Yes
00308AAD    video_format:                   5 (0x05) - (3 bits) -
00308AB4    video_full_range_flag:          1 (0x01) - (1 bits) - Full
In FFmpeg a full-range video is noted by the use of a 'yuvj' prefix for the pixel format:
Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc, bt709), 1280x720 [SAR 1:1 DAR 16:9], 25444 kb/s, 50 fps, 50 tbr, 50k tbn, 100k tbc (default)
The 'j' of yuvj stands for jpeg, which typically uses a full range for sample values without broadcast sample range constraints.
Video using full range is often produced by DSLR cameras. Such video is often difficult to present and process consistently and correctly, as so many video applications are designed specifically for broadcast range. If a full-range video is interpreted as if it were in broadcast range it will appear with a loss of detail in very white or black areas of the image and there will be a loss of contrast.
Full range v. broadcast range
The following command will generate a test file, which uses uncompressed 8-bit UYVY encoding and contains 256 frames:
ffmpeg -f lavfi -i color=r=256,geq=lum_expr=N:cb_expr=128:cr_expr=128 -pix_fmt uyvy422 -vtag 2vuy -c:v rawvideo -r 256 -t 1 test256y.mov
The Y value of each frame will be equal to the frame number, so frame 0 will show black and frame 255 will be white.
Playing the sample test file and using a digital colour meter can show what RGB values are created by the decoding. Here one can see that if it is decoded as broadcast range then frames 0–16 all create the same RGB value (0,0,0) and again frames 235–255 all create the same RGB value (255,255,255). Although frames 0–16 and frames 235–255 contain unique YUV values the RGB data they create is indiscernible on an RGB monitor.
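To make the broadcast-range interpretation explicit, the scale filter's range options can expand the test file from broadcast ('tv') to full ('pc') range; this is a sketch rather than a definitive procedure, and the output file name is arbitrary:

ffmpeg -i test256y.mov -vf scale=in_range=tv:out_range=pc -pix_fmt uyvy422 -vtag 2vuy -c:v rawvideo test256y_expanded.mov

In the expanded copy, frames 0–16 collapse to a single luma value and frames 235–255 do likewise, mirroring the clipping visible with the colour meter.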
Codec-specific considerations
h264
The following command will create a very simple one-second h264 video with 256 frames (one for each possible value of an 8-bit luma expression):
ffmpeg -f lavfi -i color=r=256,geq=lum_expr=N:cb_expr=128:cr_expr=128 -pix_fmt yuv420p -c:v libx264 -r 256 -t 1 test256y_broadcast.mp4
Because of the extreme simplicity of the visual image, the resulting h264 stream will be lossless. Each frame will contain only identical pixels where the value of the luma channel is equal to the frame number and the chroma channels are set to 128 (the mid-point of the 8-bit range). Thus frame 42 will contain only samples where Y equals 42, so when displayed on a monitor R, G and B should be 30, 30 and 30 (a dark grey).
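This losslessness can be checked, as a sketch, by comparing an MD5 checksum of the decoded frames from the original filter source with one from the encoded file; if the two reported values match, the encoding round trip was lossless:

ffmpeg -f lavfi -i color=r=256,geq=lum_expr=N:cb_expr=128:cr_expr=128 -pix_fmt yuv420p -r 256 -t 1 -f md5 -

ffmpeg -i test256y_broadcast.mp4 -f md5 -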
When the file is opened in a libavcodec-based player such as VLC, or in QuickTime X, each pixel of each frame will decode identically, as intended. This can be verified with a digital colour meter. QuickTime Pro 7 does not decode this h264 accurately, but presents pixels with values that may deviate from the original. The overall effect of watching h264 in QuickTime Pro 7 is that a faint layer of noise is added over the image during decoding, which affects both playback and transcoding via QuickTime Pro 7. Because of this, using QuickTime Pro 7 to transcode this h264 sample to an uncompressed format would be lossy and the resulting uncompressed file would contain the noise introduced by the QuickTime Pro 7 h264 decoder.
Additionally, h264 supports YUV encodings at both broadcast range and full range. If a full-range h264 file is played through a player that does not support the proper interpretation of full-range YUV samples then the video will appear to have a reduced level of contrast and whites and blacks will be clipped.
DV
Within QuickTime Pro 7 there is an option (under 'Show Movie Properties/Visual Settings') called 'High Quality'. When 'High Quality' is disabled (which is the default) QuickTime will only decode part of the DV data. Presumably this was intended to allow older, less powerful computers to play DV even if it meant doing so improperly. In QuickTime Pro's preferences there is an option to 'Use high-quality video setting when available'. Checking this will ensure that video is played correctly. When 'High Quality' is unchecked, DV files will play back as a blurry and inaccurate image.
NTSC DV uses a 4:1:1 colour subsampling pattern that subsamples colour horizontally but not vertically. Nearly all modern video for the internet uses a 4:2:0 pattern. Both 4:1:1 and 4:2:0 sample colour data at the same rate but in incompatible patterns. As a result, when NTSC DV is converted from 4:1:1 to 4:2:0 (such as in transcoding for a web presentation) there will be substantial loss of colour detail. The result will be a softer image in which diagonal lines appear jagged.
MediaInfo
MediaInfo assesses digital audio-visual media and reports on various technical characteristics.8 MediaInfo will demultiplex (demux) the container format and interpret the contents. In many cases MediaInfo will also analyse portions of the streams contained within the format to gather additional information. The resulting information is then selected and associated with a technical vocabulary managed by MediaInfo. MediaInfo may then deduce additional pieces of information. For instance, by identifying a particular codec fourcc, MediaInfo may then deduce and report a codec name and other associated attributes.
By default MediaInfo will show a fairly concise report. This can be obtained via the MediaInfo command line program with this command (replace file.mov with the filepath of a file to analyse):
mediainfo file.mov
A more detailed report may be obtained with:
mediainfo -f file.mov
The -f here stands for the 'full' option. In the full report many metadata values appear several times, formatted to serve different use cases.
By default MediaInfo uses human-readable labels for metadata terms. For archival or programmatic use of MediaInfo there is a raw language option, which uses internally unique metadata labels. This option may be obtained like this:
mediainfo -f --language=raw file.mov
As an example of these outputs, here is the duration information of the video track with 'mediainfo file.mov':
Duration                                 : 1mn 21s
Source duration                          : 1mn 21s
with 'mediainfo -f file.mov':
Duration                                 : 81938
Duration                                 : 1mn 21s
Duration                                 : 1mn 21s 938ms
Duration                                 : 1mn 21s
Duration                                 : 00:01:21.938
Duration                                 : 00:01:21;28
Duration                                 : 00:01:21.938 (00:01:21;28)
Source duration                          : 81949
Source duration                          : 1mn 21s
Source duration                          : 1mn 21s 949ms
Source duration                          : 1mn 21s
Source duration                          : 00:01:21.949
and with 'mediainfo -f --language=raw file.mov':
Duration                                 : 81938
Duration/String                          : 1mn 21s
Duration/String1                         : 1mn 21s 938ms
Duration/String2                         : 1mn 21s
Duration/String3                         : 00:01:21.938
Duration/String4                         : 00:01:21;28
Duration/String5                         : 00:01:21.938 (00:01:21;28)
Source_Duration                          : 81949
Source_Duration/String                   : 1mn 21s
Source_Duration/String1                  : 1mn 21s 949ms
Source_Duration/String2                  : 1mn 21s
Source_Duration/String3                  : 00:01:21.949
Note that each duration string type expresses the duration in a different manner. 'Duration' expresses the time in milliseconds, whereas 'Duration/String3' uses HH:MM:SS.mmm (hour, minute, second, millisecond) and 'Duration/String4' uses HH:MM:SS;FF (hour, minute, second, frame).
Identification and playback maintenance risks
In the example above the video track provides two sets of durations called 'Duration' and 'Source duration'. MediaInfo will often start a metadata term with the prefix 'Source_' or 'Original_' to express a conflict between the container and the stream. Here the video stream contains 00:01:21.949 of video but the container presents a duration of 00:01:21.938. In this case, the QuickTime container of the file uses an edit list to show only a portion of the video. Video players that properly interpret QuickTime edit lists will show 00:01:21.938 of video whereas players that do not will play 00:01:21.949.
In MediaInfo's metadata labelling language the presence of a particular tag that is paired with a tag of the same name prefixed by 'Source_' or 'Original_' documents an expected or unexpected difference between the metadata of the container and the metadata of the stream. In some cases players may vary as to which (container or stream) is used to gather essential technical metadata for playback. When significant characteristics of a file vary internally, care should be taken to determine which expression of that characteristic is official.
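The raw field names shown earlier can be queried directly, which makes it straightforward to flag such internal conflicts across a collection; for example (file.mov being a placeholder):

mediainfo --Inform="Video;%Duration% %Source_Duration%" file.mov

A non-empty Source_Duration that differs from Duration indicates that the container and stream disagree and that players may differ in which value they honour.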
Conclusions
Conservation laboratories are filled with tools and long-refined expertise designed for dealing with a variety of artworks and materials. However, for digital video works the tools are only recently being identified and the expertise is in very early development. It is not difficult to imagine that conservators of the near future may be as adept at identifying and resolving presentation and maintenance issues with digital video as conservators are currently with other types of physical material.
Unfortunately the technological diversity and complexity of codecs, containers and implementations within digital video collections makes it difficult to provide a simple set of guidelines that address presentation inconsistencies. However, familiarisation with open tools such as FFmpeg and MediaInfo can reveal many details about digital video that are often unseen and provide more technical control over such content.
A more active relationship between conservators and the developers of their utilities can also benefit both communities. As part of the research for this article the author submitted several tickets and requests to MediaInfo and VLC to make them more suitable for addressing conservation concerns and issues of consistent presentation. Often a small amount of development (whether performed voluntarily or through sponsorship) can have a great impact on conservation workflows. Additionally, as digital video conservation is still in an early stage of development it is crucial for digital video conservators to work as a community, sharing experiences, seeking advice and guiding the development of necessary expertise.