AJA NTV2 SDK
17.1.3.1410
Each Audio System on AJA’s NTV2 devices supports up to 8 or 16 audio channels. (While some devices can be configured for 6 channels, doing so may cause problems and should be avoided.)
The format of audio sample data in the host buffer exactly mirrors the format of audio sample data in the Audio System's audio buffer in SDRAM on the device. See Audio System Operation for more information.
Each single-channel audio sample requires exactly four bytes of storage.
Only the upper (most-significant) 24 bits of each 32-bit word are used. Viewing the word as a little-endian 32-bit value, bits 31 through 8 hold the sample and bits 7 through 0 are unused.
Note that when playing video from a host audio buffer that contains 32-bit audio samples, only the most-significant 24 bits will end up in the transmitted audio packets. The least-significant 8 bits from each 32-bit sample word in the host buffer are ignored.
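The following C++ sketch shows one way to place a signed 24-bit PCM sample into a 32-bit host audio word and read it back. The helper names are hypothetical, not SDK functions, and the right shift assumes two's-complement arithmetic shifting (guaranteed since C++20).

```cpp
#include <cstdint>

// Hypothetical helper: put a signed 24-bit PCM sample (-8388608..8388607)
// into the upper 24 bits of a 32-bit audio word; the low 8 bits stay zero.
inline uint32_t PackPCM24(int32_t sample24)
{
    return static_cast<uint32_t>(sample24) << 8;
}

// Hypothetical helper: recover the signed 24-bit sample from a 32-bit word.
// The low 8 bits are discarded, just as they are on playout.
inline int32_t UnpackPCM24(uint32_t word)
{
    return static_cast<int32_t>(word) >> 8;   // arithmetic shift keeps the sign
}
```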
Only the upper (most-significant) 20 bits of each 32-bit word are used. Viewing the word as a little-endian 32-bit value, bits 31 through 12 hold the sample and bits 11 through 0 are unused.
Note that AJA devices don't support audio extended packets. Thus, when capturing SD video that contains 24-bit audio, only the most-significant 20 bits will end up in the captured samples in the host buffer. Likewise, when playing SD video from a host audio buffer that contains 24-bit (or 32-bit) audio samples, only the most-significant 20 bits will end up in the transmitted audio packets. The least-significant 12 bits from each 32-bit sample word in the host buffer are ignored.
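A minimal sketch of which bits of a host audio word survive transport, using hypothetical constants rather than anything defined by the SDK:

```cpp
#include <cstdint>

// Hypothetical masks for the bits of a 32-bit host audio word that reach the wire.
constexpr uint32_t kUpper24Mask = 0xFFFFFF00u;  // 24-bit audio packets
constexpr uint32_t kUpper20Mask = 0xFFFFF000u;  // SD: no extended packets

// The host word as it would look after SD transport: the low 12 bits are lost.
constexpr uint32_t SDSurvivingBits(uint32_t hostWord) { return hostWord & kUpper20Mask; }

// Number of least-significant bits of each 32-bit word that are ignored.
constexpr int IgnoredLowBits(bool isSD) { return isSD ? 12 : 8; }
```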
The 6-channel format was used on older AJA devices and is still provided for backward compatibility.
The layout for two samples of 8-channel audio data is:
Within each 32-bit word, the 20/24-bit audio sample is left-justified identically to that of the 6-channel format.
The layout for two samples of 16-channel audio data is:
Within each 32-bit word, the 20/24-bit audio sample is left-justified identically to that of the 6-channel format.
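To locate a given channel's sample in the host buffer, a helper like the one below can be used. It assumes a channel-interleaved layout (all channels of one sample, then all channels of the next) with one 32-bit word per channel sample; the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: byte offset of one channel's sample, ASSUMING the buffer
// is channel-interleaved with one 32-bit word per channel sample.
constexpr std::size_t AudioSampleByteOffset(std::size_t sampleIndex,
                                            std::size_t channel,     // 0-based
                                            std::size_t numChannels) // e.g. 8 or 16
{
    return (sampleIndex * numChannels + channel) * sizeof(uint32_t);
}

// Hypothetical helper: bytes needed to hold 'numSamples' multi-channel samples.
constexpr std::size_t AudioBufferBytes(std::size_t numSamples, std::size_t numChannels)
{
    return numSamples * numChannels * sizeof(uint32_t);
}
```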
All AJA devices provide and/or accept video, audio and ancillary data to/from the host in several formats. This section details the video formats and device frame buffer (data) formats.
A video format describes a particular video signal, which implies a frame geometry, video standard, and frame rate. Each format is identified by a specific NTV2VideoFormat enumeration constant. All AJA devices support the “basic” SD-SDI and HD-SDI video formats that can be carried on a single 1.5 Gbps SDI link. Progressively newer AJA devices add support for 3 Gbps dual-link HD-SDI and 3G-SDI formats, 6 Gbps and 12 Gbps formats, and even video up to 8K.
Video Format (rate, e.g. 5994 = 59.94) | NTV2VideoFormat Constant |
525i 5994 | NTV2_FORMAT_525_5994 |
525psf 2997 | NTV2_FORMAT_525psf_2997 |
625i 50 | NTV2_FORMAT_625_5000 |
625psf 2500 | NTV2_FORMAT_625psf_2500 |
720p 50 | NTV2_FORMAT_720p_5000 |
720p 5994 | NTV2_FORMAT_720p_5994 |
720p 60 | NTV2_FORMAT_720p_6000 |
1080p 2398 | NTV2_FORMAT_1080p_2398 |
1080p 24 | NTV2_FORMAT_1080p_2400 |
1080p 25 | NTV2_FORMAT_1080p_2500 |
1080p 2997 | NTV2_FORMAT_1080p_2997 |
1080p 30 | NTV2_FORMAT_1080p_3000 |
1080i 50, 1080psf 25 | NTV2_FORMAT_1080i_5000 |
1080i 5994, 1080psf 2997 | NTV2_FORMAT_1080i_5994 |
1080i 60, 1080psf 30 | NTV2_FORMAT_1080i_6000 |
2K×1080p 2398 | NTV2_FORMAT_1080p_2K_2398 |
2K×1080p 24 | NTV2_FORMAT_1080p_2K_2400 |
2K×1080p 25 | NTV2_FORMAT_1080p_2K_2500 |
2K×1080p 2997 | NTV2_FORMAT_1080p_2K_2997 |
2K×1080p 30 | NTV2_FORMAT_1080p_2K_3000 |
2K×1080p 4795 | NTV2_FORMAT_1080p_2K_4795_A |
2K×1080p 48 | NTV2_FORMAT_1080p_2K_4800_A |
2K×1080p 50 | NTV2_FORMAT_1080p_2K_5000_A |
2K×1080p 5994 | NTV2_FORMAT_1080p_2K_5994_A |
2K×1080p 60 | NTV2_FORMAT_1080p_2K_6000_A |
4×1920×1080psf 2398 | NTV2_FORMAT_4x1920x1080psf_2398 |
4×1920×1080p 2398 | NTV2_FORMAT_4x1920x1080p_2398 |
4×1920×1080psf 24 | NTV2_FORMAT_4x1920x1080psf_2400 |
4×1920×1080p 24 | NTV2_FORMAT_4x1920x1080p_2400 |
4×1920×1080psf 25 | NTV2_FORMAT_4x1920x1080psf_2500 |
4×1920×1080p 25 | NTV2_FORMAT_4x1920x1080p_2500 |
4×1920×1080p 2997 | NTV2_FORMAT_4x1920x1080p_2997 |
4×1920×1080p 30 | NTV2_FORMAT_4x1920x1080p_3000 |
4×1920×1080p 50 | NTV2_FORMAT_4x1920x1080p_5000 |
4×1920×1080p 5994 | NTV2_FORMAT_4x1920x1080p_5994 |
4×1920×1080p 60 | NTV2_FORMAT_4x1920x1080p_6000 |
3840×2160p 2398 | NTV2_FORMAT_3840x2160p_2398 |
3840×2160p 24 | NTV2_FORMAT_3840x2160p_2400 |
3840×2160p 25 | NTV2_FORMAT_3840x2160p_2500 |
3840×2160p 2997 | NTV2_FORMAT_3840x2160p_2997 |
3840×2160p 30 | NTV2_FORMAT_3840x2160p_3000 |
3840×2160p 50 | NTV2_FORMAT_3840x2160p_5000 |
3840×2160p 5994 | NTV2_FORMAT_3840x2160p_5994 |
3840×2160p 60 | NTV2_FORMAT_3840x2160p_6000 |
4×2048×1080psf 2398 | NTV2_FORMAT_4x2048x1080psf_2398 |
4×2048×1080p 2398 | NTV2_FORMAT_4x2048x1080p_2398 |
4×2048×1080psf 24 | NTV2_FORMAT_4x2048x1080psf_2400 |
4×2048×1080p 24 | NTV2_FORMAT_4x2048x1080p_2400 |
4×2048×1080psf 25 | NTV2_FORMAT_4x2048x1080psf_2500 |
4×2048×1080p 25 | NTV2_FORMAT_4x2048x1080p_2500 |
4×2048×1080p 2997 | NTV2_FORMAT_4x2048x1080p_2997 |
4×2048×1080p 30 | NTV2_FORMAT_4x2048x1080p_3000 |
4×2048×1080p 4795 | NTV2_FORMAT_4x2048x1080p_4795 |
4×2048×1080p 48 | NTV2_FORMAT_4x2048x1080p_4800 |
4×2048×1080p 50 | NTV2_FORMAT_4x2048x1080p_5000 |
4×2048×1080p 5994 | NTV2_FORMAT_4x2048x1080p_5994 |
4×2048×1080p 60 | NTV2_FORMAT_4x2048x1080p_6000 |
4×2048×1080p 11988 | NTV2_FORMAT_4x2048x1080p_11988 |
4×2048×1080p 120 | NTV2_FORMAT_4x2048x1080p_12000 |
4096×2160p 2398 | NTV2_FORMAT_4096x2160p_2398 |
4096×2160p 24 | NTV2_FORMAT_4096x2160p_2400 |
4096×2160p 25 | NTV2_FORMAT_4096x2160p_2500 |
4096×2160p 2997 | NTV2_FORMAT_4096x2160p_2997 |
4096×2160p 30 | NTV2_FORMAT_4096x2160p_3000 |
4096×2160p 4795 | NTV2_FORMAT_4096x2160p_4795 |
4096×2160p 48 | NTV2_FORMAT_4096x2160p_4800 |
4096×2160p 50 | NTV2_FORMAT_4096x2160p_5000 |
4096×2160p 5994 | NTV2_FORMAT_4096x2160p_5994 |
4096×2160p 60 | NTV2_FORMAT_4096x2160p_6000 |
4096×2160p 11988 | NTV2_FORMAT_4096x2160p_11988 |
4096×2160p 120 | NTV2_FORMAT_4096x2160p_12000 |
4×3840×2160p 2398 | NTV2_FORMAT_4x3840x2160p_2398 |
4×3840×2160p 24 | NTV2_FORMAT_4x3840x2160p_2400 |
4×3840×2160p 25 | NTV2_FORMAT_4x3840x2160p_2500 |
4×3840×2160p 2997 | NTV2_FORMAT_4x3840x2160p_2997 |
4×3840×2160p 30 | NTV2_FORMAT_4x3840x2160p_3000 |
4×3840×2160p 50 | NTV2_FORMAT_4x3840x2160p_5000 |
4×3840×2160p 5994 | NTV2_FORMAT_4x3840x2160p_5994 |
4×3840×2160p 60 | NTV2_FORMAT_4x3840x2160p_6000 |
4×4096×2160p 2398 | NTV2_FORMAT_4x4096x2160p_2398 |
4×4096×2160p 24 | NTV2_FORMAT_4x4096x2160p_2400 |
4×4096×2160p 25 | NTV2_FORMAT_4x4096x2160p_2500 |
4×4096×2160p 2997 | NTV2_FORMAT_4x4096x2160p_2997 |
4×4096×2160p 30 | NTV2_FORMAT_4x4096x2160p_3000 |
4×4096×2160p 4795 | NTV2_FORMAT_4x4096x2160p_4795 |
4×4096×2160p 48 | NTV2_FORMAT_4x4096x2160p_4800 |
4×4096×2160p 50 | NTV2_FORMAT_4x4096x2160p_5000 |
4×4096×2160p 5994 | NTV2_FORMAT_4x4096x2160p_5994 |
4×4096×2160p 60 | NTV2_FORMAT_4x4096x2160p_6000 |
To determine if a given device can handle a particular video format, call NTV2DeviceCanDoVideoFormat.
Video is arranged differently in memory for each combination of video format and frame buffer (pixel) format, so the number of bytes per horizontal line (the “line pitch”, usually expressed in 32-bit words) differs by frame buffer format and video standard.
The NTV2FormatDescriptor class is used to inquire about rasters of any NTV2Standard and NTV2FrameBufferFormat. Once constructed, it can tell you the frame pixel dimensions (with or without VANC lines), the number of bytes per row, the byte count required to hold the frame, the byte offset to a particular line, etc.
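For illustration only, this standalone sketch computes line pitch and frame size for three formats without the SDK. It is not NTV2FormatDescriptor: the PixFmt enum and helpers are hypothetical, VANC lines are ignored, and the 128-byte line padding for 10-bit YCbCr is an assumption based on the common v210 convention.

```cpp
#include <cstddef>

// Hypothetical stand-in for a few frame buffer formats (not the SDK enum).
enum class PixFmt { YCbCr8, YCbCr10_v210, ARGB8 };

// Bytes per raster line. ASSUMPTION: v210 lines are padded to a multiple of
// 128 bytes (48 pixels); the other formats are unpadded.
constexpr std::size_t BytesPerRow(PixFmt fmt, std::size_t width)
{
    switch (fmt)
    {
        case PixFmt::YCbCr8:       return width * 2;                // 4 bytes per 2 pixels
        case PixFmt::YCbCr10_v210: return (width + 47) / 48 * 128;  // 16 bytes per 6 pixels
        case PixFmt::ARGB8:        return width * 4;                // four 8-bit components
    }
    return 0;
}

// Line pitch in 32-bit words.
constexpr std::size_t LinePitchWords(PixFmt fmt, std::size_t width)
{
    return BytesPerRow(fmt, width) / 4;
}

// Bytes for a full frame of 'lines' rows.
constexpr std::size_t FrameBytes(PixFmt fmt, std::size_t width, std::size_t lines)
{
    return BytesPerRow(fmt, width) * lines;
}
```

Under these assumptions a 1920-pixel 10-bit YCbCr line has a pitch of 1280 words, and a 1280-pixel line 864 words.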
Uncompressed RGB and YCbCr video data in the device frame buffer is always stored full-frame. Interlaced video is always stored in the frame buffer with the first line of Field 1 (F1L1) at the top of the buffer, followed by the first line of Field 2 (F2L1), then F1L2, F2L2, F1L3, F2L3, etc., alternating to the end of the frame. (A very VERY long time ago, AJA made devices that stored all of Field 1’s lines in the top half of the buffer, and all of Field 2’s lines in the bottom half of the buffer. These devices and buffer formats are no longer supported.)
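The interleaving rule above maps field lines to frame rows as sketched below. The helper names are hypothetical; field lines are 1-based to match the F1L1 notation, and frame rows are 0-based.

```cpp
#include <cstddef>

// Frame row (0-based) holding line 'fieldLine' (1-based) of field 1 or 2,
// given the F1L1, F2L1, F1L2, F2L2, ... interleaving.
constexpr std::size_t FrameRowForFieldLine(int field, std::size_t fieldLine)
{
    return (fieldLine - 1) * 2 + (field == 2 ? 1 : 0);
}

// Inverse: the field (1 or 2) that a 0-based frame row belongs to.
constexpr int FieldForFrameRow(std::size_t row) { return row % 2 == 0 ? 1 : 2; }

// Byte offset of a field line, given the frame's bytes per row.
constexpr std::size_t FieldLineByteOffset(int field, std::size_t fieldLine,
                                          std::size_t bytesPerRow)
{
    return FrameRowForFieldLine(field, fieldLine) * bytesPerRow;
}
```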
The frame buffer format describes what kind of data is stored in each frame and how the data is arranged in memory. Each format is identified by a specific NTV2FrameBufferFormat (aka NTV2PixelFormat) enumeration constant.
All AJA devices support these basic frame buffer formats:
Frame Buffer Format | NTV2FrameBufferFormat Constant |
10-Bit YCbCr Format | NTV2_FBF_10BIT_YCBCR |
8-Bit YCbCr Format | NTV2_FBF_8BIT_YCBCR |
Many AJA devices support these additional frame buffer formats:
New HDR pixel formats:
12-Bit Packed RGB | NTV2_FBF_12BIT_RGB_PACKED |
Some AJA devices support planar frame buffer formats:
To determine if a given device can handle a particular frame buffer format, call NTV2DeviceCanDoFrameBufferFormat.
The remainder of this section describes how these formats are laid out in memory. Note that a hardware color-space converter will convert the SDI (YCbCr) input/output data to/from RGB as necessary for the RGB formats.
Most AJA devices support RGB formats on SDI inputs and outputs. To determine if the device can support RGB over SDI, check whether it has a dual-link widget (i.e., call NTV2DeviceCanDoWidget, passing it NTV2_WgtDualLinkOut1). If the device can’t handle RGB over SDI, RGB data from the frame buffer must be converted to YCbCr before being output. Similarly, when an RGB frame buffer format is desired, the incoming YCbCr data must go through a color space converter en route to the frame buffer.
This format, identified by the NTV2_FBF_8BIT_YCBCR enumeration constant, is used by both Windows ('UYVY') and QuickTime ('2vuy') for 8-Bit YCbCr video. Each pair of pixels occupies four bytes, laid out in increasing address order as Cb0, Y0, Cr0, Y1 (both luma samples share Cb0 and Cr0).
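A sketch of reading one pixel pair from an 8-bit YCbCr row, assuming the standard 'UYVY' byte order; the struct and function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Components of one 8-bit 4:2:2 pixel pair (two luma samples share Cb/Cr).
struct YCbCrPair { uint8_t cb, y0, cr, y1; };

// Hypothetical helper: read pixel pair 'pairIndex' (2 pixels each) from a row
// stored in 'UYVY'/'2vuy' order: Cb0 Y0 Cr0 Y1 in increasing address order.
inline YCbCrPair ReadUYVYPair(const uint8_t* row, std::size_t pairIndex)
{
    const uint8_t* p = row + pairIndex * 4;   // 4 bytes per 2 pixels
    return YCbCrPair{p[0], p[1], p[2], p[3]};
}
```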
This format, identified by the NTV2_FBF_10BIT_YCBCR enumeration constant, has twelve 10-bit unsigned components that are packed into four 32-bit little-endian words (i.e., 6 pixels are represented in each 16 bytes). This is the format used in QuickTime movie files to store 10-bit YCbCr video and is referred to (by Apple and in MS-Windows) as the 'v210' format.
Here are the four 32-bit words (six pixels) in increasing address order:
Here are the same six pixels – the four 32-bit little-endian words – in decreasing address order:
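The twelve components of a v210 group can be extracted as in this sketch. It assumes the conventional v210 packing (three 10-bit components per word in bits 0-9, 10-19 and 20-29, in the order Cb0 Y0 Cr0 Y1 Cb2 Y2 Cr2 Y3 Cb4 Y4 Cr4 Y5) and that the words have already been read in host byte order; the function name is hypothetical.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: unpack one v210 group (four 32-bit words = six pixels)
// into twelve 10-bit components in sample order. Bits 30-31 of each word are unused.
inline std::array<uint16_t, 12> UnpackV210Group(const uint32_t words[4])
{
    std::array<uint16_t, 12> out{};
    for (int w = 0; w < 4; ++w)
        for (int c = 0; c < 3; ++c)
            out[w * 3 + c] = static_cast<uint16_t>((words[w] >> (10 * c)) & 0x3FF);
    return out;
}
```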
These formats incorporate 8-bit Red, Green, Blue and Alpha (key) components. The device Color Space Converter(s) will perform the proper conversion to/from 10-Bit YCbCr and Key.
8-Bit ARGB, identified by the NTV2_FBF_ARGB enumeration constant, is used extensively on the Windows platform (and on most AJA devices can be routed to an SDI output for ARGB 4:4:4:4 over-the-wire):
8-Bit BGRA, identified by the NTV2_FBF_RGBA enumeration constant, is used extensively on the macOS platform:
8-Bit ABGR, identified by the NTV2_FBF_ABGR enumeration constant, is used extensively with OpenGL:
This format is identified by the NTV2_FBF_10BIT_RGB enumeration constant. For playout, the AJA device firmware converts the 10-bit RGB video data formatted as in the table below to the expected SMPTE-standard 10-Bit YCbCr output signal. Conversely, for capture/ingest, 10-bit YCbCr input video is converted into this 10-bit RGB pixel format:
The most significant two bits of each 32-bit pixel contain its alpha information:
This format, identified by the NTV2_FBF_10BIT_DPX enumeration constant, is laid out as follows, before byte-swapping:
This is the memory layout after byte-swapping:
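The sketch below shows the byte swap and a component extraction for 10-bit DPX. The bit positions used after swapping (R in bits 31-22, G in 21-12, B in 11-2) are an assumption based on the common DPX "method A" 10-bit packing, not taken from the layout figure; the helper names are hypothetical.

```cpp
#include <cstdint>

struct RGB10 { uint16_t r, g, b; };

// Reverse the byte order of a 32-bit word (the byte-swapping step).
constexpr uint32_t ByteSwap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// ASSUMPTION: R in bits 31-22, G in 21-12, B in 11-2, bits 1-0 unused.
constexpr RGB10 UnpackDPX10(uint32_t swappedWord)
{
    return RGB10{ static_cast<uint16_t>((swappedWord >> 22) & 0x3FF),
                  static_cast<uint16_t>((swappedWord >> 12) & 0x3FF),
                  static_cast<uint16_t>((swappedWord >> 2) & 0x3FF) };
}
```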
This format, identified by the NTV2_FBF_10BIT_YCBCR_DPX enumeration constant, is laid out as follows, before byte-swapping:
After byte-swapping:
This format, identified by the NTV2_FBF_24BIT_RGB enumeration constant, is laid out as follows:
This format, identified by the NTV2_FBF_24BIT_BGR enumeration constant, has this single-pixel layout:
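Because these two 24-bit formats differ only in component order, converting between them is a per-pixel swap of the first and third bytes. This sketch assumes the byte order matches the format names (R,G,B versus B,G,R in increasing address order); the function name is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical helper: convert a row between 24-bit RGB and 24-bit BGR in place
// by swapping the first and third byte of every 3-byte pixel.
inline void SwapRedBlue24(uint8_t* row, std::size_t width)
{
    for (std::size_t x = 0; x < width; ++x)
        std::swap(row[x * 3], row[x * 3 + 2]);
}
```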
This format, identified by the NTV2_FBF_10BIT_DPX_LE enumeration constant, has this single-pixel layout:
This format, identified by the NTV2_FBF_48BIT_RGB enumeration constant, has this single-pixel layout:
This AJA HDR format, identified by the NTV2_FBF_12BIT_RGB_PACKED enumeration constant, has this layout:
This format, identified by the NTV2_FBF_8BIT_YCBCR_YUY2 enumeration constant, has this single-pixel layout:
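Assuming NTV2_FBF_8BIT_YCBCR_YUY2 uses the conventional YUY2 byte order (Y0 Cb Y1 Cr), it differs from the 'UYVY' order of NTV2_FBF_8BIT_YCBCR only by swapping adjacent bytes, as in this hypothetical helper:

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical helper: convert 8-bit 4:2:2 data between UYVY (Cb Y0 Cr Y1) and
// YUY2 (Y0 Cb Y1 Cr) in place. The same call converts in either direction.
inline void SwapUYVYandYUY2(uint8_t* data, std::size_t numBytes)  // numBytes even
{
    for (std::size_t i = 0; i + 1 < numBytes; i += 2)
        std::swap(data[i], data[i + 1]);
}
```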
This format, identified by the NTV2_FBF_8BIT_DVCPRO enumeration constant, is a popular lossy 4:1:1 compression scheme.
This format, identified by the NTV2_FBF_8BIT_HDV enumeration constant, is a lossy H.262/MPEG-2 (video) and MPEG-1 Layer 2 (audio) compression scheme.
This format, identified by the NTV2_FBF_10BIT_RAW_YCBCR enumeration constant, is used for raw RGB Bayer capture from the AJA CION camera. Bayer pixels have 10-bit resolution and are stored in a big-endian packed format, as required by the ‘DNG’ (TIFF) file specification. The packing cadence is 16 Bayer pixels in 20 bytes (16 × 10 bits = 160 bits).
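A sketch of unpacking one 16-pixel, 20-byte cadence, treating the data as a most-significant-bit-first bit stream (the usual meaning of big-endian packing in DNG). The function name is hypothetical, and the Bayer color pattern is not addressed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: unpack 16 10-bit Bayer pixels from 20 bytes, MSB first.
inline std::array<uint16_t, 16> UnpackBayer10(const uint8_t src[20])
{
    std::array<uint16_t, 16> out{};
    uint32_t acc = 0;      // bit accumulator; only the low 'bits' bits matter
    int bits = 0;          // number of unconsumed bits in acc
    std::size_t byte = 0;
    for (int i = 0; i < 16; ++i)
    {
        while (bits < 10) { acc = (acc << 8) | src[byte++]; bits += 8; }
        out[i] = static_cast<uint16_t>((acc >> (bits - 10)) & 0x3FF);
        bits -= 10;
    }
    return out;   // exactly 20 bytes consumed
}
```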
This format, identified by the NTV2_FBF_8BIT_YCBCR_420PL3 enumeration constant, is a popular planar video encoding format.
For all planes, the left-to-right, top-to-bottom pixel values are laid out in memory in increasing address order.
The luminance plane is a sequence of 8-bit (0-255) luminance values, one byte per pixel. Thus, the size, in bytes, of the luma plane is WxH bytes, where W is the raster width (in pixels), and H is the raster height (in lines). The Luma Plane should terminate on a 32-bit (4-byte) boundary.
The chroma planes immediately follow the luma plane in memory, each being one-fourth the size of the luma plane, the Cb plane preceding the Cr plane. Chroma values are 8-bit (0-255) values, one per 2x2 pixel quad (horizontal and vertical subsampling).
Since all values are byte values, there are no endianness issues.
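Plane offsets for this format follow directly from the sizes above. In this sketch the names are hypothetical, and W and H are assumed even with W × H a multiple of 4.

```cpp
#include <cstddef>

struct PlaneLayout { std::size_t yOffset, cbOffset, crOffset, totalBytes; };

// Luma W*H bytes, then Cb and Cr planes of W*H/4 bytes each.
constexpr PlaneLayout Layout420PL3_8bit(std::size_t w, std::size_t h)
{
    return PlaneLayout{ 0, w * h, w * h + w * h / 4, w * h + 2 * (w * h / 4) };
}

// Byte offset of the Cb value for pixel (x, y): one chroma value per 2x2 quad.
constexpr std::size_t CbIndex420(std::size_t w, std::size_t h, std::size_t x, std::size_t y)
{
    return w * h + (y / 2) * (w / 2) + (x / 2);
}
```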
This format, identified by the NTV2_FBF_8BIT_YCBCR_422PL3 enumeration constant, is a popular planar video encoding format.
For all planes, the left-to-right, top-to-bottom pixel values are laid out in memory in increasing address order.
The luminance plane is a sequence of 8-bit (0-255) luminance values, one byte per pixel. Thus, the size, in bytes, of the luma plane is WxH bytes, where W is the raster width (in pixels), and H is the raster height (in lines). The luma plane should terminate on a 32-bit (4-byte) boundary.
The chroma planes immediately follow the luma plane in memory, each half the size of the luma plane, the Cb plane preceding the Cr plane. Chroma values are 8-bit (0-255) values, one per horizontal pixel pair (horizontal-only subsampling).
Since all values are byte values, there are no endianness issues.
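For the 4:2:2 variant, only the chroma plane size and indexing change, as in this sketch (hypothetical names, W assumed even):

```cpp
#include <cstddef>

// Luma W*H bytes, then Cb and Cr planes of W*H/2 bytes each.
constexpr std::size_t CbOffset422PL3(std::size_t w, std::size_t h) { return w * h; }
constexpr std::size_t CrOffset422PL3(std::size_t w, std::size_t h) { return w * h + w * h / 2; }

// Byte offset of the Cr value shared by pixel (x, y) and its horizontal neighbor;
// every row keeps its own W/2 chroma values (no vertical subsampling).
constexpr std::size_t CrIndex422PL3(std::size_t w, std::size_t h, std::size_t x, std::size_t y)
{
    return CrOffset422PL3(w, h) + y * (w / 2) + x / 2;
}
```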
This format, identified by the NTV2_FBF_10BIT_YCBCR_420PL3_LE enumeration constant, is a popular planar video encoding format.
For all planes, the left-to-right, top-to-bottom pixel values are laid out in memory in increasing address order.
The luminance plane is a sequence of 10-bit (0-1023) luminance values, each stored in a 16-bit word per pixel in little-endian byte order, with the most-significant 6 bits set to zero. Thus, the size, in bytes, of the luma plane is WxHx2 bytes, where W is the raster width (in pixels), and H is the raster height (in lines). The Luma Plane should terminate on a 64-bit (8-byte) boundary.
The chroma planes immediately follow the luma plane in memory, each being one-fourth the size of the luma plane, the Cb plane preceding the Cr plane. Chroma values are 10-bit (0-1023) values, stored identically to the luma values, one chroma value per 2x2 pixel quad (horizontal and vertical subsampling).
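Because the 16-bit words are little-endian, assembling each value from its two bytes keeps the code independent of host byte order. A sketch with hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>

// Read the 10-bit value stored in little-endian 16-bit word 'index' of a plane.
inline uint16_t ReadLE10(const uint8_t* plane, std::size_t index)
{
    const uint8_t* p = plane + index * 2;
    return static_cast<uint16_t>((p[0] | (p[1] << 8)) & 0x3FF);
}

// Luma W*H*2 bytes, then Cb and Cr planes of W*H/2 bytes each (one-fourth of luma).
constexpr std::size_t CbOffset420PL3_LE(std::size_t w, std::size_t h) { return w * h * 2; }
constexpr std::size_t CrOffset420PL3_LE(std::size_t w, std::size_t h) { return w * h * 2 + w * h / 2; }
```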
This format, identified by the NTV2_FBF_10BIT_YCBCR_422PL3_LE enumeration constant, is a popular planar video encoding format.
For all planes, the left-to-right, top-to-bottom pixel values are laid out in memory in increasing address order.
The luminance plane is a sequence of 10-bit (0-1023) luminance values, each stored in a 16-bit word per pixel in little-endian byte order, with the most-significant 6 bits set to zero. Thus, the size, in bytes, of the luma plane is WxHx2 bytes, where W is the raster width (in pixels), and H is the raster height (in lines). The luma plane should terminate on a 64-bit (8-byte) boundary.
The chroma planes immediately follow the luma plane in memory, each half the size of the luma plane, the Cb plane preceding the Cr plane. Chroma values are 10-bit (0-1023) values, stored identically to the luma values, one chroma value per horizontal pixel pair (horizontal-only subsampling).
This format, identified by the NTV2_FBF_10BIT_YCBCR_420PL2 enumeration constant, is a two-plane format commonly used with video encoders.
The raster pixel left-to-right, top-to-bottom scan order coincides with increasing address order.
The Luma Plane is a sequence of 10-bit (0-1023) luminance values stored in a packed succession of 8-bit chunks, requiring 5 bytes for every 4 pixels.
Thus, the size, in bytes, of the Luma Plane is W × H × 5 ÷ 4 bytes, where W is the raster width (in pixels), and H is the raster height (in lines). For example, for HD 1920×1080, each line requires exactly 2,400 bytes; 2,592,000 bytes for the entire image.
The Luma Plane should terminate on a 64-bit (8-byte) boundary, i.e., W × H × 5 ÷ 4 should be divisible by 8, which holds exactly when the W × H product is divisible by 32.
The Chroma Plane immediately follows the Luma Plane in memory, half the size of the Luma plane.
Chroma values are 10-bit (0-1023) values, one Cb/Cr pair per 2×2 pixel quad (horizontal and vertical subsampling).
This format, identified by the NTV2_FBF_10BIT_YCBCR_422PL2 enumeration constant, is a two-plane format commonly used with video encoders. It’s identical to its 4:2:0 sibling (above) except that the Chroma plane is full-height rather than half-height, making it the same size as the Luma Plane.
Thus, this format only has horizontal chroma sub-sampling – there’s no vertical sub-sampling.
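The plane sizes for both packed 10-bit two-plane formats, and the 8-byte luma alignment condition, can be checked with this sketch (hypothetical names):

```cpp
#include <cstddef>

struct TwoPlaneSizes { std::size_t lumaBytes, chromaBytes; };

// Luma: 5 bytes per 4 pixels. Chroma: half the luma size for 4:2:0,
// the same size as the luma plane for full-height 4:2:2.
constexpr TwoPlaneSizes PL2Sizes(std::size_t w, std::size_t h, bool is422)
{
    return TwoPlaneSizes{ w * h * 5 / 4, is422 ? w * h * 5 / 4 : w * h * 5 / 8 };
}

// 5*W*H/4 is a multiple of 8 exactly when W*H is a multiple of 32.
constexpr bool LumaEndsOn8ByteBoundary(std::size_t w, std::size_t h)
{
    return (w * h) % 32 == 0;
}
```

For 1920×1080, the luma plane is 2,592,000 bytes, and the chroma plane is 1,296,000 bytes for 4:2:0 or 2,592,000 bytes for 4:2:2.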