Metrics
Building on the ReVQ-2k dataset, we develop a new two-stream NR-VQA metric to predict rendered video quality. The architecture of our proposed model is divided into two main components: image quality assessment and temporal stability analysis.
Image Quality Assessment: In this stream, we evaluate the overall image quality of videos. Given the extensive analysis in the existing literature, we adopt the cropping strategy and Swin Transformer (Swin-T) backbone employed in FAST-VQA. This stream considers factors such as clarity and appropriate exposure, in alignment with existing NR-VQA methods, and also evaluates static rendering artifacts such as Moiré patterns.
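To make the cropping step concrete, the following is a minimal sketch of fragment-style sampling in the spirit of FAST-VQA's grid mini-patch strategy; the function name `sample_fragments` and the grid and patch sizes are illustrative assumptions, not the exact FAST-VQA implementation. The stitched fragment image would then be fed to the Swin-T backbone.

```python
import torch

def sample_fragments(frame: torch.Tensor, grid: int = 7, patch: int = 32) -> torch.Tensor:
    """Crop a grid x grid set of small patches from a frame (C, H, W) and
    stitch them into one fragment image, FAST-VQA style. Each patch's
    top-left corner is sampled uniformly at random inside its grid cell."""
    c, h, w = frame.shape
    cell_h, cell_w = h // grid, w // grid
    rows = []
    for gy in range(grid):
        row = []
        for gx in range(grid):
            # Random offset inside the cell, clamped so the patch fits.
            y0 = gy * cell_h + torch.randint(0, max(cell_h - patch, 1), (1,)).item()
            x0 = gx * cell_w + torch.randint(0, max(cell_w - patch, 1), (1,)).item()
            row.append(frame[:, y0:y0 + patch, x0:x0 + patch])
        rows.append(torch.cat(row, dim=2))
    return torch.cat(rows, dim=1)  # (C, grid*patch, grid*patch)
```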
Temporal Stability Analysis: In this stream, we crop a series of images from consecutive video frames, align them via motion estimation, and assess their temporal stability using image differencing. The results from both streams are then combined through a multilayer perceptron (MLP) to regress the overall video quality.
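As a minimal sketch of the fusion step, the module below concatenates the feature vectors produced by the two streams and regresses a scalar quality score with a small MLP; the class name, hidden width, and activation are assumptions, since the paper does not specify these hyperparameters here.

```python
import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    """Fuse the image-quality feature and the temporal-stability feature
    into a single quality score via a small MLP (hidden width is a guess)."""
    def __init__(self, dim_quality: int, dim_stability: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim_quality + dim_stability, hidden),
            nn.GELU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, f_quality: torch.Tensor, f_stability: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([f_quality, f_stability], dim=-1)
        return self.mlp(fused).squeeze(-1)  # (B,) predicted quality scores
```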
Figure: Overview of our NR-VQA model.
Image Differencing
Using the generated motion vectors and disocclusion maps, each subset \( K_i \) is backward-warped and its disoccluded pixels are removed, yielding an aligned subset. We compute image differences between adjacent frames and between frames separated by one, two, and three intervals, enabling detection of flickering at various frequencies. The image differences are then fed into an image difference detector built from depth-wise separable convolutions to evaluate pixel-level temporal stability. Finally, the stability maps from all subsets are subjected to average pooling and an MLP to regress the temporal stability score.
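The sketch below illustrates these steps under stated assumptions: a backward warp driven by per-pixel motion vectors (flow in pixels, channel 0 horizontal, channel 1 vertical), multi-interval absolute differencing with disoccluded pixels masked out, and a small detector built from depth-wise/point-wise convolution pairs. Function names, channel counts, and the exact flow convention are illustrative, not the paper's verified implementation; intervals (1, 2, 3, 4) correspond to adjacent frames and frames separated by one, two, and three intervals.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def backward_warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `frame` (B, C, H, W) using motion vectors
    `flow` (B, 2, H, W) given in pixels."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(frame.device)  # (2, H, W)
    coords = base.unsqueeze(0) + flow                              # (B, 2, H, W)
    # Normalize sampling coordinates to [-1, 1] for grid_sample.
    coords[:, 0] = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords[:, 1] = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(frame, coords.permute(0, 2, 3, 1), align_corners=True)

def multi_interval_diffs(aligned: torch.Tensor, masks: torch.Tensor,
                         intervals=(1, 2, 3, 4)):
    """Absolute differences between aligned frames `d` steps apart.
    `aligned`: (T, C, H, W) warped frames; `masks`: (T, 1, H, W) valid-pixel
    maps that are 0 at disoccluded pixels, which are zeroed in the output."""
    diffs = []
    for d in intervals:
        valid = masks[d:] * masks[:-d]
        diffs.append((aligned[d:] - aligned[:-d]).abs() * valid)
    return diffs

class DiffDetector(nn.Module):
    """Pixel-level stability detector built from depth-wise separable
    convolutions (channel counts here are illustrative)."""
    def __init__(self, in_ch: int, mid_ch: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),    # depth-wise
            nn.Conv2d(in_ch, mid_ch, 1),                            # point-wise
            nn.GELU(),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1, groups=mid_ch), # depth-wise
            nn.Conv2d(mid_ch, 1, 1),                                # point-wise
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # (B, 1, H, W) per-pixel stability map
```

The stability maps from this detector would then be average-pooled and passed through the MLP regressor described above.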