Audio Data Formats

Each Audio System on AJA’s NTV2 devices supports up to 8 or 16 audio channels. (While some devices can be configured to support 6 channels, this may cause problems and should be avoided.)

The format of audio sample data in the host buffer exactly mirrors the format of audio sample data in the Audio System's audio buffer in SDRAM on the device. See Audio System Operation for more information.

Note: These formats apply even if the audio sample data is not PCM. See Audio System Operation for more information about non-PCM audio.

Single-Channel Individual Audio Samples

Each single-channel audio sample requires exactly four bytes of storage.

24-Bit (HD) Audio

Only the upper (most-significant) 24 bits are used, as follows (shown here as “Little-Endian”):

Note that when playing video from a host audio buffer that contains 32-bit audio samples, only the most-significant 24 bits will end up in the transmitted audio packets. The least-significant 8 bits from each 32-bit sample word in the host buffer are ignored.

20-Bit (SD) Audio

Only the upper (most-significant) 20 bits are used, as follows (shown here as “Little-Endian”):

Note that AJA devices don't support audio extended packets. Thus, when capturing SD video that contains 24-bit audio, only the most-significant 20 bits will end up in the captured samples in the host buffer. Likewise, when playing SD video from a host audio buffer that contains 24-bit (or 32-bit) audio samples, only the most-significant 20 bits will end up in the transmitted audio packets. The least-significant 12 bits from each 32-bit sample word in the host buffer are ignored.
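The left-justified storage described above can be sketched as follows. This is a minimal illustration, not SDK code: the function names are hypothetical, and it assumes PCM samples are signed two's-complement values occupying the most-significant bits of a 32-bit word held in host byte order.

```cpp
#include <cassert>
#include <cstdint>

// Place a signed sample with `bits` significant bits (20 or 24) into the
// most-significant bits of a 32-bit word. The unused low bits are zero.
inline std::uint32_t packAudioSample(std::int32_t sample, unsigned bits)
{
    assert(bits == 20 || bits == 24);
    const std::uint32_t mask = (1u << bits) - 1u;
    return (static_cast<std::uint32_t>(sample) & mask) << (32u - bits);
}

// Recover the signed sample from the most-significant `bits` of a word,
// ignoring whatever is stored in the low (32 - bits) bits.
inline std::int32_t unpackAudioSample(std::uint32_t word, unsigned bits)
{
    assert(bits == 20 || bits == 24);
    const std::uint32_t value   = word >> (32u - bits);
    const std::uint32_t signBit = 1u << (bits - 1u);
    // Sign-extend without relying on an arithmetic right shift.
    return static_cast<std::int32_t>(value ^ signBit) - static_cast<std::int32_t>(signBit);
}
```

For example, `packAudioSample(0x123456, 24)` yields `0x12345600`, and `unpackAudioSample(0x123456AB, 20)` yields `0x12345`, because the least-significant 12 bits are ignored.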

6-Channel Mode

This format was used on older AJA devices, and is still provided for backward compatibility.

Note: 6-Channel Mode is deprecated. Please avoid its use, even on older boards.

8-Channel Mode

The layout for two samples of 8-channel audio data is:

Within each 32-bit word, the 20/24-bit audio sample is left-justified, exactly as in the 6-channel format.

16-Channel Mode

The layout for two samples of 16-channel audio data is:

Within each 32-bit word, the 20/24-bit audio sample is left-justified, exactly as in the 6-channel format.
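As a rough sketch of how a multi-channel buffer can be addressed, the helper below computes the byte offset of one channel's sample. It assumes the channels are interleaved, so that each sample frame holds one 4-byte word per channel in channel order. That ordering is an assumption stated here for illustration, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of the 32-bit word holding `channel` (0-based) of sample
// frame `frameIndex` (0-based), assuming channel-interleaved storage.
inline std::size_t audioSampleByteOffset(std::size_t frameIndex,
                                         std::size_t channel,
                                         std::size_t numChannels)
{
    assert(numChannels == 8 || numChannels == 16);
    assert(channel < numChannels);
    const std::size_t bytesPerSample = 4;  // each single-channel sample is four bytes
    return (frameIndex * numChannels + channel) * bytesPerSample;
}
```

Under this assumption, two sample frames of 8-channel audio occupy 64 bytes, and two sample frames of 16-channel audio occupy 128 bytes.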

Video Data Formats

All AJA devices provide and/or accept video, audio and ancillary data to/from the host in several formats. This section details the video formats and device frame buffer (data) formats.

Video Format

A video format describes a particular video signal, which implies a frame geometry, video standard, and frame rate. Each format is identified by a specific NTV2VideoFormat enumeration constant. All AJA devices support the “basic” SD-SDI and HD-SDI video formats that can be accommodated in a single 1.5 Gbps SDI link. Progressively newer AJA devices support 3 Gbps dual-link HD-SDI and 3G-SDI formats, 6 Gbps, 12 Gbps, even up to 8K video.

SD Formats

525i 5994	NTV2_FORMAT_525_5994
525psf 2997	NTV2_FORMAT_525psf_2997
625i 50	NTV2_FORMAT_625_5000
625psf 2500	NTV2_FORMAT_625psf_2500

HD Formats

720p 50	NTV2_FORMAT_720p_5000
720p 5994	NTV2_FORMAT_720p_5994
720p 60	NTV2_FORMAT_720p_6000
1080p 2398	NTV2_FORMAT_1080p_2398
1080p 24	NTV2_FORMAT_1080p_2400
1080p 25	NTV2_FORMAT_1080p_2500
1080p 2997	NTV2_FORMAT_1080p_2997
1080p 30	NTV2_FORMAT_1080p_3000
1080i 50, 1080psf 25	NTV2_FORMAT_1080i_5000
1080i 5994, 1080psf 2997	NTV2_FORMAT_1080i_5994
1080i 60, 1080psf 30	NTV2_FORMAT_1080i_6000

2K×1080 Formats

Note: These formats are unrelated to the ancient 2K “Film” formats — e.g. NTV2_FORMAT_2K_1498, NTV2_FORMAT_2K_1500, NTV2_FORMAT_2K_2398, NTV2_FORMAT_2K_2400, NTV2_FORMAT_2K_2500. These older 2K film formats are obsolete and no longer supported by NTV2 devices.

2K×1080p 2398	NTV2_FORMAT_1080p_2K_2398
2K×1080p 24	NTV2_FORMAT_1080p_2K_2400
2K×1080p 25	NTV2_FORMAT_1080p_2K_2500
2K×1080p 2997	NTV2_FORMAT_1080p_2K_2997
2K×1080p 30	NTV2_FORMAT_1080p_2K_3000
2K×1080p 4795	NTV2_FORMAT_1080p_2K_4795_A
2K×1080p 48	NTV2_FORMAT_1080p_2K_4800_A
2K×1080p 50	NTV2_FORMAT_1080p_2K_5000_A
2K×1080p 5994	NTV2_FORMAT_1080p_2K_5994_A
2K×1080p 60	NTV2_FORMAT_1080p_2K_6000_A

UHD “Square” Formats

4×1920×1080psf 2398	NTV2_FORMAT_4x1920x1080psf_2398
4×1920×1080p 2398	NTV2_FORMAT_4x1920x1080p_2398
4×1920×1080psf 24	NTV2_FORMAT_4x1920x1080psf_2400
4×1920×1080p 24	NTV2_FORMAT_4x1920x1080p_2400
4×1920×1080psf 25	NTV2_FORMAT_4x1920x1080psf_2500
4×1920×1080p 25	NTV2_FORMAT_4x1920x1080p_2500
4×1920×1080p 2997	NTV2_FORMAT_4x1920x1080p_2997
4×1920×1080p 30	NTV2_FORMAT_4x1920x1080p_3000

UHD HFR “Square” Formats

4×1920×1080p 50	NTV2_FORMAT_4x1920x1080p_5000
4×1920×1080p 5994	NTV2_FORMAT_4x1920x1080p_5994
4×1920×1080p 60	NTV2_FORMAT_4x1920x1080p_6000

UHD TSI Formats

3840×2160p 2398	NTV2_FORMAT_3840x2160p_2398
3840×2160p 24	NTV2_FORMAT_3840x2160p_2400
3840×2160p 25	NTV2_FORMAT_3840x2160p_2500
3840×2160p 2997	NTV2_FORMAT_3840x2160p_2997
3840×2160p 30	NTV2_FORMAT_3840x2160p_3000

UHD HFR TSI Formats

3840×2160p 50	NTV2_FORMAT_3840x2160p_5000
3840×2160p 5994	NTV2_FORMAT_3840x2160p_5994
3840×2160p 60	NTV2_FORMAT_3840x2160p_6000

4K “Square” Formats

4×2048×1080psf 2398	NTV2_FORMAT_4x2048x1080psf_2398
4×2048×1080p 2398	NTV2_FORMAT_4x2048x1080p_2398
4×2048×1080psf 24	NTV2_FORMAT_4x2048x1080psf_2400
4×2048×1080p 24	NTV2_FORMAT_4x2048x1080p_2400
4×2048×1080psf 25	NTV2_FORMAT_4x2048x1080psf_2500
4×2048×1080p 25	NTV2_FORMAT_4x2048x1080p_2500
4×2048×1080p 2997	NTV2_FORMAT_4x2048x1080p_2997
4×2048×1080p 30	NTV2_FORMAT_4x2048x1080p_3000

4K HFR “Square” Formats

4×2048×1080p 4795	NTV2_FORMAT_4x2048x1080p_4795
4×2048×1080p 48	NTV2_FORMAT_4x2048x1080p_4800
4×2048×1080p 50	NTV2_FORMAT_4x2048x1080p_5000
4×2048×1080p 5994	NTV2_FORMAT_4x2048x1080p_5994
4×2048×1080p 60	NTV2_FORMAT_4x2048x1080p_6000
4×2048×1080p 11988	NTV2_FORMAT_4x2048x1080p_11988
4×2048×1080p 120	NTV2_FORMAT_4x2048x1080p_12000

4K TSI Formats

4096×2160p 2398	NTV2_FORMAT_4096x2160p_2398
4096×2160p 24	NTV2_FORMAT_4096x2160p_2400
4096×2160p 25	NTV2_FORMAT_4096x2160p_2500
4096×2160p 2997	NTV2_FORMAT_4096x2160p_2997
4096×2160p 30	NTV2_FORMAT_4096x2160p_3000

4K HFR TSI Formats

4096×2160p 4795	NTV2_FORMAT_4096x2160p_4795
4096×2160p 48	NTV2_FORMAT_4096x2160p_4800
4096×2160p 50	NTV2_FORMAT_4096x2160p_5000
4096×2160p 5994	NTV2_FORMAT_4096x2160p_5994
4096×2160p 60	NTV2_FORMAT_4096x2160p_6000
4096×2160p 11988	NTV2_FORMAT_4096x2160p_11988
4096×2160p 120	NTV2_FORMAT_4096x2160p_12000

UHD2 Formats

4×3840×2160p 2398	NTV2_FORMAT_4x3840x2160p_2398
4×3840×2160p 24	NTV2_FORMAT_4x3840x2160p_2400
4×3840×2160p 25	NTV2_FORMAT_4x3840x2160p_2500
4×3840×2160p 2997	NTV2_FORMAT_4x3840x2160p_2997
4×3840×2160p 30	NTV2_FORMAT_4x3840x2160p_3000
4×3840×2160p 50	NTV2_FORMAT_4x3840x2160p_5000
4×3840×2160p 5994	NTV2_FORMAT_4x3840x2160p_5994
4×3840×2160p 60	NTV2_FORMAT_4x3840x2160p_6000

8K Formats

4×4096×2160p 2398	NTV2_FORMAT_4x4096x2160p_2398
4×4096×2160p 24	NTV2_FORMAT_4x4096x2160p_2400
4×4096×2160p 25	NTV2_FORMAT_4x4096x2160p_2500
4×4096×2160p 2997	NTV2_FORMAT_4x4096x2160p_2997
4×4096×2160p 30	NTV2_FORMAT_4x4096x2160p_3000
4×4096×2160p 4795	NTV2_FORMAT_4x4096x2160p_4795
4×4096×2160p 48	NTV2_FORMAT_4x4096x2160p_4800
4×4096×2160p 50	NTV2_FORMAT_4x4096x2160p_5000
4×4096×2160p 5994	NTV2_FORMAT_4x4096x2160p_5994
4×4096×2160p 60	NTV2_FORMAT_4x4096x2160p_6000

To determine if a given device can handle a particular video format, call NTV2DeviceCanDoVideoFormat.

Frame Buffer Geometries

The arrangement of video in memory depends on both the video format and the frame buffer (pixel) format. Consequently, each combination of video standard and pixel format has its own number of bytes per horizontal line (or “line pitch”, when expressed in 32-bit words).

The NTV2FormatDescriptor class is used to inquire about rasters of any NTV2Standard and NTV2FrameBufferFormat. Once constructed, it can tell you the frame pixel dimensions (with or without VANC lines), the number of bytes per row, the byte count required to hold the frame, the byte offset to a particular line, etc.
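To illustrate why the row size differs per pixel format, here is a minimal sketch for a few of the simple packed formats. It is not the NTV2FormatDescriptor implementation: the enum and function names are hypothetical, and it assumes rows carry no padding, which may not hold for every format or raster width. In real code, NTV2FormatDescriptor remains the authoritative source.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for a few packed pixel formats.
enum class PackedFormat { YCbCr8, ARGB8, RGB24, RGB48 };

// Bytes per row, assuming no per-row padding.
inline std::size_t rowBytes(PackedFormat fmt, std::size_t widthPixels)
{
    switch (fmt) {
        case PackedFormat::YCbCr8: return widthPixels * 2;  // 4 bytes per 2 pixels
        case PackedFormat::ARGB8:  return widthPixels * 4;  // 32 bits per pixel
        case PackedFormat::RGB24:  return widthPixels * 3;  // 24 bits per pixel
        case PackedFormat::RGB48:  return widthPixels * 6;  // 48 bits per pixel
    }
    return 0;
}

// Line pitch expressed in 32-bit words.
inline std::size_t linePitchWords(PackedFormat fmt, std::size_t widthPixels)
{
    return rowBytes(fmt, widthPixels) / 4;
}

// Byte offset from the start of the raster to a given 0-based line.
inline std::size_t lineByteOffset(PackedFormat fmt, std::size_t widthPixels, std::size_t line)
{
    return line * rowBytes(fmt, widthPixels);
}
```

Under these assumptions, a 1920-pixel row takes 3,840 bytes (960 words) in 8-bit YCbCr and 7,680 bytes (1,920 words) in 8-bit ARGB.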

Device Frame Buffer Formats

Uncompressed RGB and YCbCr video data in the device frame buffer is always stored full-frame. Interlaced video is always stored in the frame buffer with the first line of Field 1 (F1L1) at the top of the buffer, followed by the first line of Field 2 (F2L1), then F1L2, F2L2, F1L3, F2L3, etc., alternating to the end of the frame. (A very VERY long time ago, AJA made devices that stored all of Field 1’s lines in the top half of the buffer, and all of Field 2’s lines in the bottom half of the buffer. These devices and buffer formats are no longer supported.)
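The interleaved field ordering described above (F1L1, F2L1, F1L2, F2L2, ...) amounts to a simple index mapping. The sketch below uses hypothetical helper names and the same 1-based field and line numbering as the text. The returned frame row is 0-based.

```cpp
#include <cassert>
#include <cstddef>

// 0-based row in the frame buffer for line `lineInField` (1-based) of
// field `field` (1 or 2) of an interlaced frame.
inline std::size_t frameRowForFieldLine(unsigned field, std::size_t lineInField)
{
    assert(field == 1 || field == 2);
    assert(lineInField >= 1);
    return (lineInField - 1) * 2 + (field - 1);
}

// Inverse mapping: which field (1 or 2) a 0-based frame row belongs to.
inline unsigned fieldForFrameRow(std::size_t frameRow)
{
    return static_cast<unsigned>(frameRow % 2) + 1;
}

// Inverse mapping: the 1-based line number within that field.
inline std::size_t fieldLineForFrameRow(std::size_t frameRow)
{
    return frameRow / 2 + 1;
}
```

So F1L1 lands in row 0, F2L1 in row 1, F1L2 in row 2, and F2L3 in row 5.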

The frame buffer format describes what kind of data is stored in each frame and how the data is arranged in memory. Each format is identified by a specific NTV2FrameBufferFormat (aka NTV2PixelFormat) enumeration constant.

Note: The format of video data on the NTV2 device is identical to the format of the video data on the host, whether after transferring captured frames from the device to the host, or before transferring frames to be played from the host to the device.

All AJA devices support these basic frame buffer formats:

10-Bit YCbCr Format	NTV2_FBF_10BIT_YCBCR
8-Bit YCbCr Format	NTV2_FBF_8BIT_YCBCR

Many AJA devices support these additional frame buffer formats:

10-Bit RGB Format	NTV2_FBF_10BIT_RGB
Alternate 8-Bit YCbCr ('YUY2')	NTV2_FBF_8BIT_YCBCR_YUY2
32-Bit ARGB (PC) 8 bpc	NTV2_FBF_ARGB
32-Bit RGBA (Mac) 8 bpc	NTV2_FBF_RGBA
32-Bit ABGR (OpenGL) 8 bpc	NTV2_FBF_ABGR
10-Bit RGB - DPX Format	NTV2_FBF_10BIT_DPX
10-Bit YCbCr - DPX Format	NTV2_FBF_10BIT_YCBCR_DPX
8-Bit DVCPro	NTV2_FBF_8BIT_DVCPRO
8-Bit HDV	NTV2_FBF_8BIT_HDV
24-Bit RGB	NTV2_FBF_24BIT_RGB
24-Bit BGR	NTV2_FBF_24BIT_BGR
10-Bit DPX Little-Endian	NTV2_FBF_10BIT_DPX_LE
48-Bit RGB	NTV2_FBF_48BIT_RGB

New HDR pixel formats:

12-Bit Packed RGB	NTV2_FBF_12BIT_RGB_PACKED

Some AJA devices support planar frame buffer formats:

3-Plane 8-Bit YCbCr 4:2:0 ('I420' a.k.a. 'YUV-P420')	NTV2_FBF_8BIT_YCBCR_420PL3
3-Plane 8-Bit YCbCr 4:2:2 (Weitek 'Y42B' a.k.a. 'YUV-P8')	NTV2_FBF_8BIT_YCBCR_422PL3
3-Plane 10-Bit YCbCr 4:2:0 ('I420_10LE' a.k.a. 'YUV-P420-L10')	NTV2_FBF_10BIT_YCBCR_420PL3_LE
3-Plane 10-Bit YCbCr 4:2:2 ('I422_10LE' a.k.a. 'YUV-P-L10')	NTV2_FBF_10BIT_YCBCR_422PL3_LE
2-Plane 10-Bit YCbCr 4:2:0 ('YUV-P420-10')	NTV2_FBF_10BIT_YCBCR_420PL2
2-Plane 10-Bit YCbCr 4:2:2 ('YUV-P-10')	NTV2_FBF_10BIT_YCBCR_422PL2
8-Bit YCbCr 420 2-Plane	NTV2_FBF_8BIT_YCBCR_420PL2
8-Bit YCbCr 422 2-Plane	NTV2_FBF_8BIT_YCBCR_422PL2

To determine if a given device can handle a particular frame buffer format, call NTV2DeviceCanDoFrameBufferFormat.

The remainder of this section describes how these formats are laid out in memory. Note that a hardware color-space converter will convert the SDI (YCbCr) input/output data to/from RGB as necessary for the RGB formats.

Most AJA devices support RGB formats on SDI inputs and outputs. To determine if the device can support RGB over SDI, check if the device has a dual-link widget (i.e., call NTV2DeviceCanDoWidget, passing it NTV2_WgtDualLinkOut1). If the device can’t handle RGB over SDI, RGB data from the frame buffer must be converted to YCbCr before being output. Similarly, when an RGB frame buffer format is desired, the incoming YCbCr data must go through a color space converter en route to the frame buffer.

8-Bit YCbCr Format

This format, identified by the NTV2_FBF_8BIT_YCBCR enumeration constant, is used by both Windows ('UYVY') and QuickTime ('2vuy') for 8-Bit YCbCr video. Here’s the memory layout of two pixels:
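As a minimal sketch of the two-pixel layout, the helper below packs one pixel pair using the standard 'UYVY' byte order (Cb, Y0, Cr, Y1 in increasing address order). The function name is hypothetical and is not part of the SDK.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pack two horizontally adjacent pixels, which share one Cb/Cr pair,
// into four bytes in 'UYVY' order: Cb, Y0, Cr, Y1.
inline std::array<std::uint8_t, 4> packUyvyPair(std::uint8_t y0, std::uint8_t y1,
                                                std::uint8_t cb, std::uint8_t cr)
{
    return {{ cb, y0, cr, y1 }};
}
```

Each pixel pair therefore occupies exactly four bytes, or two bytes per pixel on average.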

10-Bit YCbCr Format

This format, identified by the NTV2_FBF_10BIT_YCBCR enumeration constant, has twelve 10-bit unsigned components that are packed into four 32-bit little-endian words (i.e. 6 pixels are represented in each 16 bytes). This is the format used in QuickTime movie files to store 10-bit YCbCr video and is referred to (by Apple and in MS-Windows) as the 'v210' format.

Here are the four 32-bit words (six pixels) in increasing address order:

Here are the same six pixels – the four 32-bit little-endian words – in decreasing address order:
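The packing of twelve 10-bit components into four 32-bit words can be sketched as below. This follows the component order of the widely documented 'v210' format, with three components per word in bits 0-9, 10-19 and 20-29, and the top two bits left at zero. The function names are hypothetical, and the words are values in host order; on a big-endian host they would need byte-swapping before being written as little-endian words.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pack six pixels (six luma values plus three Cb/Cr pairs, one pair per
// two pixels) into four 32-bit words in 'v210' component order.
inline std::array<std::uint32_t, 4> packV210(const std::array<std::uint16_t, 6>& y,
                                             const std::array<std::uint16_t, 3>& cb,
                                             const std::array<std::uint16_t, 3>& cr)
{
    // Combine three 10-bit components into one word: low, middle, high.
    auto word = [](std::uint32_t lo, std::uint32_t mid, std::uint32_t hi) -> std::uint32_t {
        return (lo & 0x3FFu) | ((mid & 0x3FFu) << 10) | ((hi & 0x3FFu) << 20);
    };
    return {{ word(cb[0], y[0], cr[0]),
              word(y[1],  cb[1], y[2]),
              word(cr[1], y[3],  cb[2]),
              word(y[4],  cr[2], y[5]) }};
}

// Extract the 10-bit component in `slot` (0 = bits 0-9, 1 = bits 10-19,
// 2 = bits 20-29) of a packed word.
inline std::uint16_t v210Component(std::uint32_t word, unsigned slot)
{
    assert(slot < 3);
    return static_cast<std::uint16_t>((word >> (10u * slot)) & 0x3FFu);
}
```

Because each group of six pixels occupies 16 bytes, a row of W pixels needs at least W ÷ 6 × 16 bytes when W is a multiple of 6.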

8-Bit ARGB, RGBA, ABGR Formats

These formats incorporate 8-bit Red, Green, Blue and Alpha (key) components. The device Color Space Converter(s) will perform the proper conversion to/from 10-Bit YCbCr and Key.

8-Bit ARGB, identified by the NTV2_FBF_ARGB enumeration constant, is used extensively on the Windows platform (and on most AJA devices can be routed to an SDI output for ARGB 4:4:4:4 over-the-wire):

8-Bit RGBA, identified by the NTV2_FBF_RGBA enumeration constant, is used extensively on the MacOS platform:

8-Bit ABGR, identified by the NTV2_FBF_ABGR enumeration constant, is used extensively with OpenGL:

10-Bit RGB Format

This format is identified by the NTV2_FBF_10BIT_RGB enumeration constant. For playout, the AJA device firmware converts the 10-bit RGB video data formatted as in the table below to the expected SMPTE-standard 10-Bit YCbCr output signal. Conversely, for capture/ingest, 10-bit YCbCr input video is converted into this 10-bit RGB pixel format:

The most significant two bits of each 32-bit pixel contain its alpha information:

10-Bit RGB - DPX Format

This format, identified by the NTV2_FBF_10BIT_DPX enumeration constant, is laid out as follows, before byte-swapping:

This is the memory layout after byte-swapping:

10-Bit YCbCr - DPX Format

This format, identified by the NTV2_FBF_10BIT_YCBCR_DPX enumeration constant, is laid out as follows, before byte-swapping:

After byte-swapping:

24-Bit RGB

This format, identified by the NTV2_FBF_24BIT_RGB enumeration constant, is laid out as follows:

24-Bit BGR

This format, identified by the NTV2_FBF_24BIT_BGR enumeration constant, has this single-pixel layout:

10-Bit DPX Little-Endian

This format, identified by the NTV2_FBF_10BIT_DPX_LE enumeration constant, has this single-pixel layout:

48-Bit RGB

This format, identified by the NTV2_FBF_48BIT_RGB enumeration constant, has this single-pixel layout:

Note: The least-significant 4 bits of each color component in this format are zero or irrelevant.

12-Bit Packed RGB

This AJA HDR format, identified by the NTV2_FBF_12BIT_RGB_PACKED enumeration constant, has this layout:

Alternate 8-Bit YCbCr ('YUY2')

This format, identified by the NTV2_FBF_8BIT_YCBCR_YUY2 enumeration constant, has this single-pixel layout:

8-Bit DVCPro

This format, identified by the NTV2_FBF_8BIT_DVCPRO enumeration constant, is a popular lossy 4:1:1 compression scheme.

8-Bit HDV

This format, identified by the NTV2_FBF_8BIT_HDV enumeration constant, is a lossy H.262/MPEG-2 (video) and MPEG-1 Layer 2 (audio) compression scheme.

10-Bit Raw YCbCr (CION)

This format, identified by the NTV2_FBF_10BIT_RAW_YCBCR enumeration constant, is used for raw RGB Bayer capture from the AJA CION camera. Bayer pixels are 10-bit resolution stored in big-endian packed format, as required by the ‘DNG’ (TIFF) file specification. The packing cadence is 16 Bayer pixels in 20 bytes.

3-Plane 8-Bit YCbCr 4:2:0 ('I420' a.k.a. 'YUV-P420')

This format, identified by the NTV2_FBF_8BIT_YCBCR_420PL3 enumeration constant, is a popular planar video encoding format.

For all planes, the left-to-right, top-to-bottom pixel values are laid out in memory in increasing address order.

The luminance plane is a sequence of 8-bit (0-255) luminance values, one byte per pixel. Thus, the size of the luma plane is WxH bytes, where W is the raster width (in pixels), and H is the raster height (in lines). The luma plane should terminate on a 32-bit (4-byte) boundary.

The chroma planes immediately follow the luma plane in memory, each being one-fourth the size of the luma plane, the Cb plane preceding the Cr plane. Chroma values are 8-bit (0-255) values, one per 2x2 pixel quad (horizontal and vertical subsampling).

Since all values are byte values, there are no endianness issues.
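The plane sizes and offsets described above can be worked out as follows. This is a sketch with hypothetical names, assuming even raster dimensions and no padding between or within planes.

```cpp
#include <cassert>
#include <cstddef>

// Sizes and byte offsets of the three planes of an 8-bit 4:2:0 frame.
struct Planar420Layout {
    std::size_t lumaBytes;    // W x H, one byte per pixel
    std::size_t chromaBytes;  // size of EACH chroma plane (one-fourth of luma)
    std::size_t cbOffset;     // Cb plane starts right after the luma plane
    std::size_t crOffset;     // Cr plane starts right after the Cb plane
    std::size_t totalBytes;
};

inline Planar420Layout planar420Layout8(std::size_t width, std::size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    Planar420Layout layout;
    layout.lumaBytes   = width * height;
    layout.chromaBytes = layout.lumaBytes / 4;  // one value per 2x2 pixel quad
    layout.cbOffset    = layout.lumaBytes;
    layout.crOffset    = layout.cbOffset + layout.chromaBytes;
    layout.totalBytes  = layout.crOffset + layout.chromaBytes;
    return layout;
}
```

For a 1920×1080 raster this gives a 2,073,600-byte luma plane, two 518,400-byte chroma planes, and 3,110,400 bytes in total.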

3-Plane 8-Bit YCbCr 4:2:2 (Weitek 'Y42B' a.k.a. 'YUV-P8')

This format, identified by the NTV2_FBF_8BIT_YCBCR_422PL3 enumeration constant, is a popular planar video encoding format.

For all planes, the left-to-right, top-to-bottom pixel values are laid out in memory in increasing address order.

The luminance plane is a sequence of 8-bit (0-255) luminance values, one byte per pixel. Thus, the size of the luma plane is WxH bytes, where W is the raster width (in pixels), and H is the raster height (in lines). The luma plane should terminate on a 32-bit (4-byte) boundary.

The chroma planes immediately follow the luma plane in memory, each half the size of the luma plane, the Cb plane preceding the Cr plane. Chroma values are 8-bit (0-255) values, one per horizontal pixel pair (horizontal-only subsampling).

Since all values are byte values, there are no endianness issues.
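To show how a pixel's three components are located in this layout, here is a small addressing sketch. The names are hypothetical, and it assumes an even raster width and no padding inside or between planes.

```cpp
#include <cassert>
#include <cstddef>

// Byte offsets, from the start of the frame, of the Y, Cb and Cr values
// that apply to one pixel of an 8-bit 3-plane 4:2:2 frame.
struct ComponentOffsets {
    std::size_t y;
    std::size_t cb;
    std::size_t cr;
};

inline ComponentOffsets planar422Offsets8(std::size_t width, std::size_t height,
                                          std::size_t x, std::size_t row)
{
    assert(width % 2 == 0);
    assert(x < width && row < height);
    const std::size_t lumaBytes   = width * height;
    const std::size_t chromaBytes = lumaBytes / 2;  // each chroma plane is half of luma
    const std::size_t chromaIndex = row * (width / 2) + x / 2;  // one value per pixel pair
    ComponentOffsets offsets;
    offsets.y  = row * width + x;
    offsets.cb = lumaBytes + chromaIndex;
    offsets.cr = lumaBytes + chromaBytes + chromaIndex;
    return offsets;
}
```

In an 8×2 raster, for instance, the pixel at x = 5 on the second row has its luma at byte 13, its Cb at byte 22 and its Cr at byte 30.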

3-Plane 10-Bit YCbCr 4:2:0 ('I420_10LE' a.k.a. 'YUV-P420-L10')

This format, identified by the NTV2_FBF_10BIT_YCBCR_420PL3_LE enumeration constant, is a popular planar video encoding format.

For all planes, the left-to-right, top-to-bottom pixel values are laid out in memory in increasing address order.

The luminance plane is a sequence of 10-bit (0-1023) luminance values, each stored in a 16-bit word per pixel in little-endian byte order, with the most-significant 6 bits set to zero. Thus, the size of the luma plane is WxHx2 bytes, where W is the raster width (in pixels), and H is the raster height (in lines). The luma plane should terminate on a 64-bit (8-byte) boundary.

The chroma planes immediately follow the luma plane in memory, each being one-fourth the size of the luma plane, the Cb plane preceding the Cr plane. Chroma values are 10-bit (0-1023) values, stored identically to the luma values, one chroma value per 2x2 pixel quad (horizontal and vertical subsampling).
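The 16-bit little-endian storage of each 10-bit value, and the resulting plane sizes, can be sketched as below. The helper names are hypothetical, and the size functions assume even raster dimensions and no padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Store one 10-bit value (0-1023) as a 16-bit little-endian word.
// The most-significant 6 bits of the word are zero.
inline void store10BitLE(std::uint8_t* dst, std::uint16_t value)
{
    assert(value <= 0x3FFu);
    dst[0] = static_cast<std::uint8_t>(value & 0xFFu);  // low byte first
    dst[1] = static_cast<std::uint8_t>(value >> 8);     // then the high 2 bits
}

// Read one 10-bit value back from its 16-bit little-endian word.
inline std::uint16_t load10BitLE(const std::uint8_t* src)
{
    return static_cast<std::uint16_t>((src[0] | (src[1] << 8)) & 0x3FF);
}

// Luma plane size: two bytes per pixel.
inline std::size_t planar10LumaBytes(std::size_t width, std::size_t height)
{
    return width * height * 2;
}

// Size of EACH chroma plane in the 4:2:0 variant: one-fourth of the luma plane.
inline std::size_t planar10Chroma420Bytes(std::size_t width, std::size_t height)
{
    return planar10LumaBytes(width, height) / 4;
}
```

For a 1920×1080 raster the luma plane is 4,147,200 bytes and each 4:2:0 chroma plane is 1,036,800 bytes.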

3-Plane 10-Bit YCbCr 4:2:2 ('I422_10LE' a.k.a. 'YUV-P-L10')

This format, identified by the NTV2_FBF_10BIT_YCBCR_422PL3_LE enumeration constant, is a popular planar video encoding format.

For all planes, the left-to-right, top-to-bottom pixel values are laid out in memory in increasing address order.

The luminance plane is a sequence of 10-bit (0-1023) luminance values, each stored in a 16-bit word per pixel in little-endian byte order, with the most-significant 6 bits set to zero. Thus, the size of the luma plane is WxHx2 bytes, where W is the raster width (in pixels), and H is the raster height (in lines). The luma plane should terminate on a 64-bit (8-byte) boundary.

The chroma planes immediately follow the luma plane in memory, each half the size of the luma plane, the Cb plane preceding the Cr plane. Chroma values are 10-bit (0-1023) values, stored identically to the luma values, one chroma value per horizontal pixel pair (horizontal-only subsampling).

2-Plane 10-Bit YCbCr 4:2:0 ('YUV-P420-10')

This format, identified by the NTV2_FBF_10BIT_YCBCR_420PL2 enumeration constant, is a two-plane format commonly used with video encoders.

The raster pixel left-to-right, top-to-bottom scan order coincides with increasing address order.

The Luma Plane is a sequence of 10-bit (0-1023) luminance values stored in a packed succession of 8-bit chunks, requiring 5 bytes for every 4 pixels.

Thus, the size, in bytes, of the Luma Plane is W × H × 5 ÷ 4 bytes, where W is the raster width (in pixels), and H is the raster height (in lines). For example, for HD 1920×1080, each line requires exactly 2,400 bytes; 2,592,000 bytes for the entire image.

The Luma Plane should terminate on a 64-bit (8-byte) boundary — i.e., the W × H product should be divisible by 8.

The Chroma Plane immediately follows the Luma Plane in memory, half the size of the Luma plane.

Chroma values are 10-bit (0-1023) values, one Cb/Cr pair per 2×2 pixel quad (horizontal and vertical subsampling).
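The size arithmetic for this packed two-plane format can be checked with a short sketch. The names are hypothetical. It assumes the raster width is a multiple of 4 so that each line is a whole number of bytes, and it computes sizes only, not the bit-level packing of the values.

```cpp
#include <cassert>
#include <cstddef>

// Plane sizes for the 2-plane, packed 10-bit 4:2:0 layout.
struct TwoPlane420Sizes {
    std::size_t lumaLineBytes;  // 5 bytes for every 4 pixels
    std::size_t lumaBytes;      // W x H x 5 / 4
    std::size_t chromaBytes;    // half the size of the luma plane
    std::size_t totalBytes;
};

inline TwoPlane420Sizes twoPlane420Sizes10(std::size_t width, std::size_t height)
{
    assert(width % 4 == 0);
    assert(height % 2 == 0);
    TwoPlane420Sizes sizes;
    sizes.lumaLineBytes = width * 5 / 4;
    sizes.lumaBytes     = sizes.lumaLineBytes * height;
    sizes.chromaBytes   = sizes.lumaBytes / 2;
    sizes.totalBytes    = sizes.lumaBytes + sizes.chromaBytes;
    return sizes;
}
```

For HD 1920×1080 this reproduces the figures above (2,400 bytes per luma line and 2,592,000 bytes for the luma plane) and gives a 1,296,000-byte chroma plane, for 3,888,000 bytes in total.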

2-Plane 10-Bit YCbCr 4:2:2 ('YUV-P-10')

This format, identified by the NTV2_FBF_10BIT_YCBCR_422PL2 enumeration constant, is a two-plane format commonly used with video encoders. It’s identical to its 4:2:0 sibling (above) except the Chroma plane is full-height, not half-height:

Thus, this format only has horizontal chroma sub-sampling – there’s no vertical sub-sampling.