This project presents a lightweight and efficient approach to MP4 video playback compatible with modern graphics APIs, including DirectX 11, DirectX 12, and Vulkan. Using the Media Foundation API, video frames are decoded into the IYUV format for optimal performance. The YUV data is stored in a single-channel dynamic texture, structured to separate luminance (Y) and chrominance (UV) information. A custom pixel shader performs real-time YUV-to-RGB conversion directly on the GPU, ensuring minimal overhead and seamless integration with diverse rendering pipelines. This method avoids the performance penalties of RGBA conversions and relies only on a simple single-channel texture and shader logic, making it adaptable to any GPU-based rendering API. The result is a robust and highly portable solution for real-time MP4 video playback in graphics-intensive applications.
Decoding video files in MP4 format to RGBA space using the Media Foundation API introduces a significant performance overhead. However, decoding directly into the IYUV format offers excellent performance. This project addresses the final step by reorganizing the UV data layout and adding padding to simplify GPU processing. The YUV data is then uploaded as a single 1-byte-per-texel texture, which minimizes memory usage and ensures efficient data transfer. A simple pixel shader on the GPU samples the YUV information for each pixel and performs the YUV-to-RGB conversion in real time. This approach seamlessly integrates decoded video frames into the rendering pipeline as standard textures, providing an easy-to-implement, high-performance solution compatible with modern graphics APIs.
As a reference, a raw 8x4 frame returned by the MF library in IYUV has the following layout (see https://docs.microsoft.com/en-us/windows/win32/medfound/recommended-8-bit-yuv-formats-for-video-rendering):
Raw layout: luminance Y 8x4, followed by U 4x2, followed by V 4x2 (left). Because the chroma planes have half the stride of the Y plane, viewing the same buffer as an 8-wide single-channel image packs two chroma rows into each texture row (right):
Y00 Y01 Y02 Y03 Y04 Y05 Y06 Y07    Y00 Y01 Y02 Y03 Y04 Y05 Y06 Y07
Y10 Y11 Y12 Y13 Y14 Y15 Y16 Y17    Y10 Y11 Y12 Y13 Y14 Y15 Y16 Y17
Y20 Y21 Y22 Y23 Y24 Y25 Y26 Y27    Y20 Y21 Y22 Y23 Y24 Y25 Y26 Y27
Y30 Y31 Y32 Y33 Y34 Y35 Y36 Y37 => Y30 Y31 Y32 Y33 Y34 Y35 Y36 Y37
U00 U02 U04 U06                     U00 U02 U04 U06 U20 U22 U24 U26
U20 U22 U24 U26                     V00 V02 V04 V06 V20 V22 V24 V26
V00 V02 V04 V06
V20 V22 V24 V26
The decoded YUV data is copied directly from the buffer provided by the MF library into a GPU buffer. The luminance (Y) channel is copied without modification, while padding is added to each chrominance (UV) row to simplify texture access. The padding slightly increases texture memory usage and rules out wrap-mode texture addressing, but it yields a more GPU-friendly layout:
New layout: Y 8x4, followed by each half-width U row padded to width 8, then each half-width V row padded to width 8:
Y00 Y01 Y02 Y03 Y04 Y05 Y06 Y07
Y10 Y11 Y12 Y13 Y14 Y15 Y16 Y17
Y20 Y21 Y22 Y23 Y24 Y25 Y26 Y27
Y30 Y31 Y32 Y33 Y34 Y35 Y36 Y37
U00 U02 U04 U06 0 0 0 0
U20 U22 U24 U26 0 0 0 0
V00 V02 V04 V06 0 0 0 0
V20 V22 V24 V26 0 0 0 0
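The repacking above amounts to three row-wise copies: Y verbatim, then each half-width U and V row zero-padded out to the full texture width. A minimal sketch of that step follows; this is illustrative code, not the project's implementation, and the function and parameter names are hypothetical:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Repack a raw IYUV frame (Y at WxH, then U and V at W/2 x H/2) into the
// padded single-channel layout: every output row is W bytes wide, and each
// half-width chroma row is zero-padded on the right. The result is a
// W x (2*H) byte image suitable for upload as a 1-byte-per-texel texture.
// (Illustrative sketch; names are not from the project.)
std::vector<uint8_t> RepackIyuvPadded(const uint8_t* src, int w, int h)
{
    const int cw = w / 2;   // chroma plane width
    const int ch = h / 2;   // chroma plane height
    std::vector<uint8_t> dst(static_cast<size_t>(w) * (h + 2 * ch), 0);

    const uint8_t* yPlane = src;
    const uint8_t* uPlane = src + w * h;
    const uint8_t* vPlane = uPlane + cw * ch;

    uint8_t* out = dst.data();
    std::memcpy(out, yPlane, static_cast<size_t>(w) * h);  // Y: verbatim
    out += w * h;
    for (int r = 0; r < ch; ++r, out += w)                 // U: pad each row
        std::memcpy(out, uPlane + r * cw, cw);
    for (int r = 0; r < ch; ++r, out += w)                 // V: pad each row
        std::memcpy(out, vPlane + r * cw, cw);
    return dst;
}
```

For the 8x4 example above, the function turns a 48-byte IYUV buffer into the 64-byte (8x8) padded image, with zeros in the right half of the four chroma rows.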
This layout is highly convenient for pixel shader processing, as it aligns neatly with the normalized texture coordinates of each pixel. The grid structure also allows for efficient bilinear interpolation if required. However, minor artifacts may appear along the top and bottom lines due to potential mixing of U and V data when sampling near these regions.
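The conversion the pixel shader performs is a handful of multiply-adds per pixel. As an illustrative CPU reference of that math (not the project's shader), assuming BT.601 limited-range coefficients — streams tagged BT.709 would need different constants:

```cpp
#include <algorithm>
#include <cstdint>

struct Rgb { uint8_t r, g, b; };

// Clamp a float to [0, 255] with rounding, as the GPU's UNORM write would.
static uint8_t Clamp8(float v)
{
    return static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, v + 0.5f)));
}

// BT.601 limited-range YUV -> RGB, mirroring the per-pixel shader logic:
// the shader samples Y at the pixel's coordinate, samples U and V from the
// padded chroma rows below the Y region, then applies these multiply-adds.
Rgb YuvToRgb(uint8_t y, uint8_t u, uint8_t v)
{
    const float yf = 1.164f * (static_cast<float>(y) - 16.0f);
    const float uf = static_cast<float>(u) - 128.0f;
    const float vf = static_cast<float>(v) - 128.0f;
    return {
        Clamp8(yf + 1.596f * vf),
        Clamp8(yf - 0.813f * vf - 0.391f * uf),
        Clamp8(yf + 2.018f * uf),
    };
}
```

For example, limited-range black (Y=16, U=V=128) maps to RGB (0, 0, 0) and limited-range white (Y=235, U=V=128) maps to (255, 255, 255).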
The CPU cost of decoding a 1080p MP4 video and transferring it to the GPU is reduced to just 0.5ms per frame. The sample implementation demonstrates this solution using the DirectX 11 API, but the approach is fully adaptable to any modern graphics pipeline, including DirectX 12, Vulkan, or OpenGL. The method requires only a single-channel dynamic texture, making it lightweight, efficient, and highly portable.