No-Reference Rendered Video Quality Assessment:
Dataset and Metrics
Jiayu Ji1
Qingchuan Zhu1
Zhiyao Yang2
1State Key Lab of CAD&CG, Zhejiang University
2OPPO Computing & Graphics Research Institute
Abstract
Quality assessment of videos is crucial for many computer graphics applications, including video games, virtual reality, and augmented reality, where visual performance has a significant impact on user experience. When test videos cannot be perfectly aligned with references, or when references are unavailable altogether, no-reference video quality assessment (NR-VQA) methods become indispensable. However, existing NR-VQA datasets and metrics focus primarily on camera-captured videos; applying them directly to rendered videos yields biased predictions, since rendered videos are more prone to temporal artifacts.

To address this, we present a large rendering-oriented video dataset with subjective quality annotations, together with an NR-VQA metric designed specifically for rendered videos. The proposed dataset includes a wide range of 3D scenes and rendering settings, with quality scores annotated on different display types to better reflect real-world application scenarios. Building on this dataset, we calibrate our NR-VQA metric to assess rendered video quality by considering both image quality and temporal stability. We compare our metric to existing NR-VQA metrics, demonstrating its superior performance on rendered videos. Finally, examples illustrate how our metric can be used to benchmark supersampling methods and evaluate frame generation strategies for real-time rendering.

Dataset

The proposed dataset, ReVQ-2k:
  • Features a diverse range of 3D scenes.
  • Applies a variety of rendering configurations, supplemented by various supersampling and post-processing techniques.
  • Includes perceptual quality scores annotated on both smartphone screens and desktop displays to replicate real-world scenarios.

3D Scenes

Examples of 3D scenes from the ReVQ-2k dataset: Showcases, Urban, Interior, Landscapes, Cartoon, and Weather.

Rendering Configuration

Our rendering settings follow the scalability reference of Unreal Engine; each setting group is varied across three quality levels, as summarized below.
Resolution Scale (levels 0 / 1 / 2):
  • r.ScreenPercentage = 50 / 75 / 100

View Distance (levels 0 / 1 / 2):
  • r.ViewDistanceScale = 0.4 / 0.6 / 1.0

Anti-Aliasing (levels 0 / 1 / 2):
  • r.PostProcessAAQuality = 0 / 2 / 4

Post Process (levels 0 / 1 / 2):
  • r.MotionBlurQuality = 0 / 3 / 4
  • r.BlurGBuffer = 0 / 0 / 1
  • r.AmbientOcclusionLevels = 0 / 1 / 3
  • r.AmbientOcclusionRadiusScale = 1.7 / 1.7 / 1
  • r.DepthOfFieldQuality = 0 / 1 / 2
  • r.RenderTargetPoolMin = 300 / 350 / 400
  • r.LensFlareQuality = 0 / 0 / 2
  • r.SceneColorFringeQuality = 0 / 0 / 1
  • r.EyeAdaptationQuality = 0 / 0 / 2
  • r.BloomQuality = 4 / 4 / 5
  • r.FastBlurThreshold = 0 / 2 / 7
  • r.Upscale.Quality = 1 / 2 / 3
  • r.Tonemapper.GrainQuantization = 0 / 0 / 1

Texture Quality (levels 0 / 1 / 2):
  • r.Streaming.MipBias = 2.5 / 1 / 0
  • r.MaxAnisotropy = 0 / 2 / 8
  • r.Streaming.PoolSize = 200 / 400 / 1000

Effects Quality (levels 0 / 1 / 2):
  • r.TranslucencyLightingVolumeDim = 24 / 32 / 64
  • r.RefractionQuality = 0 / 0 / 2
  • r.SSR = 0 / 0 / 1
  • r.SceneColorFormat = 3 / 3 / 4
  • r.DetailMode = 0 / 1 / 2
  • r.TranslucencyVolumeBlur = 0 / 0 / 1
  • r.MaterialQualityLevel = 0 / 1 / 1

Supersampling (FSR2 / DLSS / TAAU):
  • FSR2: r.FidelityFX.FSR2.QualityMode=2
  • DLSS: r.NGX.DLSS.Quality=-1
  • TAAU: r.TemporalAA.Upsampling=1
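As an aside, the sketch below shows one way a quality level from the table above could be turned into a list of console commands. The dictionary is only an excerpt covering the Texture Quality group, and how the commands are passed to Unreal Engine (in-game console, configuration file, or command line) is left open rather than prescribed by our pipeline.

# Minimal sketch: emit the console variables for one quality level of one
# setting group. Values are copied from the Texture Quality rows above
# (excerpt only); feeding them to Unreal Engine is left to the reader.
TEXTURE_QUALITY = {
    0: {"r.Streaming.MipBias": 2.5, "r.MaxAnisotropy": 0, "r.Streaming.PoolSize": 200},
    1: {"r.Streaming.MipBias": 1, "r.MaxAnisotropy": 2, "r.Streaming.PoolSize": 400},
    2: {"r.Streaming.MipBias": 0, "r.MaxAnisotropy": 8, "r.Streaming.PoolSize": 1000},
}

def console_commands(level):
    """Return 'cvar=value' strings for the requested texture-quality level."""
    return [f"{cvar}={value}" for cvar, value in TEXTURE_QUALITY[level].items()]

print("\n".join(console_commands(2)))
# r.Streaming.MipBias=0
# r.MaxAnisotropy=8
# r.Streaming.PoolSize=1000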
Labeling Software
The labeling software on the desktop and smartphone platforms.

Quality Scores

  • OA-MOS: The overall mean opinion score reflects the overall quality of videos, considering factors such as colorfulness, block artifacts, visual blurring, and temporal disruptions like flickering.
  • TS-MOS: The temporal stability mean opinion score reflects the temporal stability of videos, focusing on issues like flickering, moving jaggies, and other temporal artifacts that critically affect rendered video quality.
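
Both are standard mean opinion scores: if a video \( v \) receives ratings \( r_{i,v} \) from \( N \) participants, its score is the average of those ratings, as shown below. Any additional normalization or outlier screening (e.g., following ITU-R BT.500) is an assumption about the protocol rather than something specified here.

\[ \mathrm{MOS}_v = \frac{1}{N} \sum_{i=1}^{N} r_{i,v} \]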

Quality scores are reported separately for the 720p, 1080p, and 2K videos, as well as for the dataset as a whole.

Metrics
Building on the ReVQ-2k dataset, we develop a new two-stream NR-VQA metric to predict rendered video quality. The architecture of our proposed model is divided into two main components: image quality assessment and temporal stability analysis.
Image Quality Assessment: In this stream, we evaluate the overall image quality of videos. Given the extensive analysis in existing literature, we adopt the cropping strategy and Swin transformer (Swin-T) model employed in FAST-VQA. This stream considers factors such as clarity and appropriate exposure, in alignment with existing NR-VQA methods, and evaluates static rendering artifacts like Moiré patterns.
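As a rough illustration of this cropping strategy, the sketch below splices small patches sampled from a uniform grid into a single fragment, in the spirit of FAST-VQA's grid mini-patch sampling; the grid size, patch size, and sampling details are illustrative assumptions rather than the exact configuration of our model.

import numpy as np

def sample_fragment(frame, grid=7, patch=32, rng=None):
    """Splice one patch from each cell of a grid x grid partition of `frame`
    (H, W, C) into a (grid*patch, grid*patch, C) fragment, a simplified take
    on FAST-VQA-style grid mini-patch sampling. Assumes every grid cell is at
    least `patch` pixels on each side."""
    rng = rng or np.random.default_rng()
    h, w, c = frame.shape
    cell_h, cell_w = h // grid, w // grid
    fragment = np.zeros((grid * patch, grid * patch, c), dtype=frame.dtype)
    for gy in range(grid):
        for gx in range(grid):
            # Random top-left corner of the mini-patch inside this grid cell.
            y0 = gy * cell_h + rng.integers(0, cell_h - patch + 1)
            x0 = gx * cell_w + rng.integers(0, cell_w - patch + 1)
            fragment[gy * patch:(gy + 1) * patch,
                     gx * patch:(gx + 1) * patch] = frame[y0:y0 + patch, x0:x0 + patch]
    return fragment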
Temporal Stability Analysis: In this stream, we crop a series of images from consecutive video frames, align them via motion estimation, and assess their temporal stability using image differencing. The results from both streams are then combined through a multilayer perceptron (MLP) to regress the overall video quality.
Overview of our NR-VQA model
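The two-stream fusion itself can be pictured with the minimal sketch below: pooled features from the image quality stream and the temporal stability stream are concatenated and regressed to a quality score by an MLP. The feature dimensions, hidden width, and activation are illustrative assumptions; in our model the image quality features come from the Swin-T backbone adopted from FAST-VQA.

import torch
import torch.nn as nn

class TwoStreamFusion(nn.Module):
    """Fuse image-quality and temporal-stability features with an MLP.
    Feature dimensions and hidden width are illustrative assumptions."""
    def __init__(self, iq_dim=768, ts_dim=256, hidden=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(iq_dim + ts_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, 1),   # regressed overall quality score
        )

    def forward(self, iq_feat, ts_feat):
        # iq_feat: (B, iq_dim) pooled features from the image-quality stream
        # ts_feat: (B, ts_dim) pooled features from the temporal-stability stream
        return self.head(torch.cat([iq_feat, ts_feat], dim=-1)).squeeze(-1)

# Usage with random placeholder features:
model = TwoStreamFusion()
score = model(torch.randn(4, 768), torch.randn(4, 256))  # -> shape (4,)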
Image Differencing
With generated motion vectors and disocclusion maps, each subset \( K_i \) is subjected to backward warping and disoccluded-pixel removal, resulting in an aligned subset. We compute image differences between adjacent frames and between frames separated by one, two, and three intervals, enabling detection of flickering at various frequencies. The image differences are then fed into an image-difference detector built on depth-wise separable convolutions to evaluate pixel-level temporal stability. Finally, the stability maps from all subsets are passed through average pooling and an MLP to regress the temporal stability score.
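The sketch below illustrates the alignment and multi-interval differencing steps with NumPy, using a nearest-neighbour backward warp for simplicity; the learned image-difference detector and the exact motion-vector conventions are omitted and should be treated as assumptions.

import numpy as np

def backward_warp(frame, motion, disocclusion):
    """Nearest-neighbour backward warp of `frame` (H, W, C) by per-pixel
    motion vectors (H, W, 2, in pixels); `disocclusion` (H, W, bool) marks
    pixels with no valid correspondence. A simplification of the alignment
    performed with the renderer's motion vectors."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + motion[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + motion[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x], ~disocclusion

def multi_interval_differences(aligned, valid, offsets=(1, 2, 3, 4)):
    """Absolute differences between frame t and frame t+k for several temporal
    offsets; disoccluded pixels are zeroed out. `aligned` is (T, H, W, C) and
    `valid` is (T, H, W). Offsets 1-4 cover adjacent frames and frames
    separated by one, two, and three intervals."""
    diffs = []
    for k in offsets:
        mask = (valid[k:] & valid[:-k])[..., None]
        diffs.append(np.abs(aligned[k:] - aligned[:-k]) * mask)
    return diffs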
Comparison
The quantitative analysis in this table clearly demonstrates that our proposed model significantly outperforms existing SOTA methods. Without temporal stability supervision, our model exceeds the top-performing baseline by 1.1% and 0.8% in PLCC and SRCC, respectively. This margin increases to 3.9% and 3.2% when our model incorporates temporal stability scores for training.
Quantitative comparison of various NR-VQA methods on the ReVQ-2k dataset
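For reference, the PLCC and SRCC values reported in the comparison can be computed as in the sketch below; whether a nonlinear (e.g., four-parameter logistic) mapping is fitted before PLCC, as some VQA protocols do, is an assumption not specified here.

import numpy as np
from scipy.stats import pearsonr, spearmanr

def plcc_srcc(predicted, mos):
    """Pearson (PLCC) and Spearman (SRCC) correlations between predicted
    scores and subjective MOS values."""
    predicted, mos = np.asarray(predicted), np.asarray(mos)
    return pearsonr(predicted, mos)[0], spearmanr(predicted, mos)[0]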