Real-Time Novel-View Synthesis for the Web
Using 3D Gaussian Splatting
Exploring Mesh-Supervised 3D Gaussian Scene Optimization
and Efficient Web Rendering for Product Visualization
Master’s Thesis in Computer Science and Engineering
BENJAMIN SANNHOLM
Department of Computer Science and Engineering
CHALMERS UNIVERSITY OF TECHNOLOGY
UNIVERSITY OF GOTHENBURG
Gothenburg, Sweden 2024

© BENJAMIN SANNHOLM, 2024.
Supervisor: Erik Sintorn, Department of Computer Science and Engineering
Advisor: Pontus Holmertz Liljekvist, Rapid Images
Examiner: Ulf Assarsson, Department of Computer Science and Engineering
Master’s Thesis 2024
Department of Computer Science and Engineering
Chalmers University of Technology and University of Gothenburg
SE-412 96 Gothenburg
Telephone +46 31 772 1000
Typeset in LaTeX
Gothenburg, Sweden 2024
Abstract
This thesis explores real-time novel-view synthesis for web applications using 3D
Gaussian Splatting (3DGS), with a focus on enhancing product visualization. The
study investigates two primary research questions: the impact of utilizing classical
scene representations (i.e., polygonal meshes) on the optimization process and results
of 3D Gaussian Splatting, and the efficient rendering of 3D Gaussians within web
constraints.
Firstly, a method for initializing a 3D Gaussian scene from existing scene geometry
is proposed. Evaluation across various synthetic scenes suggests that while there is
noticeable quality improvement in some cases, the average improvement is marginal.
Secondly, multiple WebGPU-based rendering methods for 3D Gaussian scenes are
implemented and evaluated. Results indicate that using the original 3DGS architecture
on the web is viable, with a geometry-based rendering method significantly
outperforming the original renderer in terms of frame-time speed-up. An optimization
technique to tighten 3D Gaussian screen-space bounding boxes further enhances
performance.
Overall, the findings demonstrate that 3D Gaussian Splatting can be effectively
applied to real-time web-based novel-view synthesis, offering a potential avenue for
interactive and high-quality product visualization.
Keywords: 3D Gaussian Splatting, novel-view synthesis, web applications, real-time
rendering, mesh-supervised optimization, product visualization, computer graphics.

Acknowledgments
Thank you to Rapid Images for allowing me to be in an inspiring and motivating
workplace with such friendly and helpful colleagues, for letting me freely explore my
topic of interest, and for providing the technical resources that enabled my thesis
work. Furthermore, I would like to express my gratitude to my academic supervisor
Erik Sintorn and my company advisor Pontus Holmertz Liljekvist for your guidance,
our insightful technical discussions, and your constructive feedback. Last but far
from least, thank you to my mother, father, sister, and friends for your unwavering
support, thoughtful encouragement, and feedback throughout my master’s program
and this concluding academic milestone.
Benjamin Sannholm, Gothenburg, 2024-07-01

Contents
List of Figures
List of Tables
1 Introduction
2 Theory
2.1 Novel-View Synthesis
2.1.1 Neural Radiance Fields
2.2 3D Gaussian Splatting (3DGS)
2.2.1 Scene Representation
2.2.2 Optimization
2.2.3 Differentiable CUDA-Driven Tile-Based Renderer
3 Method
3.1 Mesh-Supervised 3D Gaussian Optimization
3.2 Web-Based Renderer
3.2.1 Original 3DGS Architecture in WebGPU
3.2.2 Geometry-Based Renderer
3.2.3 Optimization
4 Results
4.1 Mesh-Supervised 3D Gaussian Optimization
4.1.1 Evaluation Methodology
4.1.2 Reconstruction Quality
4.2 Web-Based Renderer
4.2.1 Evaluation Methodology
4.2.2 Image Quality
4.2.3 Run-Time Performance
5 Discussion
5.1 Mesh-Supervised 3D Gaussian Optimization
5.1.1 Limitations
5.1.2 Future Work
5.2 Web-Based Renderer
5.2.1 Limitations
5.2.2 Future Work
5.3 Risks and Ethics
Bibliography
A Additional Evaluation Results
A.1 Web-Based Renderer
A.1.1 Run-Time Performance for Medium Views
A.1.2 High-Resolution Output Images
List of Figures
2.1 An illustration exemplifying the characteristics of a 3D Gaussian.
2.2 An overview of the CUDA kernels executed by 3DGS’ differentiable CUDA-driven tile-based renderer.
3.1 An overview of the logical steps performed by the geometry renderer.
3.2 A comparison of the square axis-aligned bounding box used by the 3DGS renderer and our tight bounding box used by the 3DGS-web-opt and geometry-opt renderers.
4.1 An example of two scenes where our method exhibits the greatest reconstruction quality improvement compared to 3DGS.
4.2 An example of test cases with the three proximity levels used for each camera angle during evaluation.
4.3 The worst performing test-case, with regard to similarity, for the 3DGS-web renderer in comparison to the 3DGS renderer.
4.4 An example illustrating how images of scenes with sub-pixel Gaussians exhibit a noticeable difference when using the geometry-opt renderer, in comparison with the 3DGS renderer.
List of Tables
4.1 Reconstruction quality of our mesh-supervised 3D Gaussian optimization method compared to the original 3DGS optimization method.
4.2 Image similarity of rendered images from the web-based renderers compared to the 3DGS renderer averaged over all test cases.
4.3 Run-time performance of the web-based renderers compared to the 3DGS renderer.
A.1 Run-time performance of the web-based renderers compared to the 3DGS renderer for medium views of each scene.
A.2 Image similarity of rendered images from the web-based renderers compared to the 3DGS renderer averaged over all test cases, but using output image dimensions of 2000 × 2000 pixels.
A.3 Run-time performance of the web-based renderers compared to the 3DGS renderer, but using output image dimensions of 2000 × 2000 pixels.
1
Introduction
The use of computer graphics to render and display products to consumers in a
web-based environment is presently ubiquitous. However, due to requirements of high
quality and therefore use of complex geometry combined with intricate materials,
the rendering process is often prohibitively computationally expensive to perform
at interactive, let alone real-time, rates. In cases where multiple views of the same
product need to be shown, a small set of images from a few fixed angles is typically
rendered, and the user can cycle through them. For a more dynamic experience, an
image sequence in which the camera moves around an object could be rendered and
even interactively scrubbed through using user controls. However, both approaches
produce a fixed set of observable views. They cannot allow the user to interactively
observe a product from any desired viewpoint among the infinite set of possible
views or to smoothly transition between viewpoints. Ideally, it would be possible to
observe the scene from any viewpoint in real time without compromising quality.
Novel-view synthesis is the problem of generating previously unobserved views of an
existing scene given a limited set of images depicting the scene. With the advent
of neural radiance fields (NeRF) [1], it was shown that neural networks (in this
case, used to represent a scene in the form of a radiance field) are an effective tool
for achieving novel-view synthesis. Since NeRF, it has been shown more generally
that neural fields [2] and other rendering methods that use machine learning for
scene optimization are effective for novel-view synthesis, either as a direct scene
representation or as a way of deriving a traditional scene representation from a set
of images.
Early methods using machine learning for scene optimization were fairly restrictive,
slow to train, and far from performing in real time. However, much research has
been conducted to bring the methods closer to being useful in a broader context,
for example by improving performance, enabling scene editing, relighting, composition,
dynamic scenes, and large-scale scenes, and generalizing training to multiple scenes.
Successful usage in a real-world application is demonstrated by Google’s Immersive
View [3], where they synthesize fly-through videos of indoor environments using
neural fields. Additionally, recent works have shown methods for training and
rendering to achieve novel-view synthesis in real time, such as in Nvidia’s “Instant
Neural Graphics Primitives” [4] and 3D Gaussian Splatting [5].
Seeing this successful innovation with methods using machine learning for scene
optimization, we choose to use the methods presented in “3D Gaussian Splatting for
Real-Time Radiance Field Rendering” (3DGS) [5] to perform novel-view synthesis
in the context of a web-based environment to allow a more interactive product
experience for the user without significant loss of visual quality compared to original
still image renders. 3D Gaussian Splatting is one of the few novel-view synthesis
methods that maintain state-of-the-art quality while producing frames at real-time
rates [6]. Their use of an explicit-volume scene representation in the form of 3D
Gaussians allows efficient GPU-accelerated rasterization, unlike previous methods
based on radiance fields, which use an implicit-volume scene representation, requiring
expensive integration along camera rays.
We address two aspects of 3DGS in parallel. Firstly, in a typical novel-view synthesis
scenario, nothing but a set of 2D images depicting the scene is assumed to be known.
However, since in our problem domain the set of input images that novel-view
synthesis will be performed on originates from a virtual 3D scene, we recognize that
additional information from the original scene representation could be used to improve
the result. Secondly, the renderer presented in the 3DGS paper is implemented using
Nvidia’s CUDA platform, meaning it is not immediately runnable in a portable way,
let alone runnable on the web platform. Successful attempts to render 3D Gaussians
on the web have been made [7]–[14]. However, we have not found a clear comparison
of different methods targeted specifically at the web platform. Accordingly, we
explore the following research questions:
1. Given a classical scene representation (i.e., polygonal mesh), what effect does using
the existing scene geometry to inform initialization have on the optimization
process and results of 3D Gaussian Splatting?
2. How can 3D Gaussians be rendered efficiently within the constraints of the
web?
We propose a method for initializing a 3D Gaussian scene from existing scene
geometry. The method is evaluated across a variety of small-scale synthetic scenes.
Our results suggest that the method provides noticeable quality improvement for
some scenes in some cases; however, only a marginal improvement is seen on average.
Furthermore, we implement multiple WebGPU-based methods for rendering 3D
Gaussian scenes. Our evaluation across multiple small-scale synthetic scenes using
a variety of camera angles shows that using the architecture of the original 3DGS
renderer is viable on the web. Moreover, our geometry-based rendering method
mostly outperforms the original 3DGS renderer significantly, with a frame-time
“speed-up” of ∼0.5×, ∼1.5×, and ∼5.1× in the worst, average, and best case,
respectively. Finally, we recognize that the size of the 3D Gaussian screen-space
bounding boxes used by the original 3DGS renderer is overly large, causing an
unnecessarily large workload. We introduce an augmented method providing tighter
screen-space bounding boxes, achieving an average frame-time speed-up of ∼1.2–2.0×
and ∼1.2–2.1× for our WebGPU adaptation of the 3DGS renderer and our geometry-based
renderer, respectively. With our optimization applied, compared to the original
3DGS renderer, our 3DGS-based and geometry-based WebGPU renderers achieve an
average frame-time speed-up of ∼1.0–1.3× and ∼1.0–6.3×, respectively.
In summary, our main contributions are
• a method for initializing a 3D Gaussian scene from existing scene geometry
along with an evaluation of the method,
• an implementation and comparison of the 3DGS renderer’s original architecture
and our geometry-based rendering architecture on the web using WebGPU,
• a performance optimization by making 3D Gaussian screen-space bounding
boxes tighter.
2
Theory
The following chapter gives an overview of the topics fundamental for understanding
our method and our discussion. Section 2.1 introduces the problem of novel-view
synthesis, and Section 2.2 covers how the paper “3D Gaussian Splatting for Real-Time
Radiance Field Rendering” [5] approaches solving this problem.
2.1 Novel-View Synthesis
The problem of novel-view synthesis can be defined as follows: Given only a set of
2D raster images of the same scene taken from arbitrary viewpoints, how can novel
views consistent with the previously observed ones be synthesized?
2.1.1 Neural Radiance Fields
The paper “Representing Scenes as Neural Radiance Fields for View Synthesis”
(NeRF) [1] popularized a method for novel-view synthesis from which many state-of-
the-art works are derived. This method uses a neural network to encode a continuous
volumetric scene representation. To create an image, rays are marched at fixed
intervals for each pixel of the output image. The neural network is queried at each
sample along the ray to determine the density and outgoing radiance in the camera’s
direction. These samples are accumulated to determine the final incoming radiance
to the camera. To train the neural network, gradient descent is used with a loss
function that quantifies the reconstruction error between the input reference images
and images rendered of the NeRF scene.
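As a rough illustration of the accumulation step described above, the following sketch composites the samples of a single ray. It assumes the standard volume-rendering quadrature, in which a sample with density σ over a step of length δ has opacity 1 − exp(−σδ). The neural network queries are replaced here by precomputed lists, and the names `composite_ray`, `densities`, `colors`, and `step` are hypothetical.

```python
import math

def composite_ray(densities, colors, step):
    """Accumulate samples taken at fixed intervals along one camera ray.

    densities: per-sample volume density (would come from the network).
    colors: per-sample RGB radiance toward the camera (also from the network).
    step: fixed distance between consecutive samples.
    """
    transmittance = 1.0  # fraction of light not yet absorbed along the ray
    radiance = [0.0, 0.0, 0.0]
    for sigma, rgb in zip(densities, colors):
        alpha = 1.0 - math.exp(-sigma * step)  # opacity of this ray segment
        weight = transmittance * alpha
        radiance = [acc + weight * c for acc, c in zip(radiance, rgb)]
        transmittance *= 1.0 - alpha
    return radiance, transmittance
```

Samples with zero density leave the ray untouched, while a very dense sample absorbs the remaining transmittance and hides everything behind it.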
2.2 3D Gaussian Splatting (3DGS)
Following the ideas and success of NeRF [1] and its many derivatives, the paper “3D
Gaussian Splatting for Real-Time Radiance Field Rendering” [5] (3DGS) proposes a
more direct and simplified method to achieve novel-view synthesis. The key pieces of
the 3DGS method are a novel explicit, continuous volumetric scene representation
based on 3D Gaussians, an optimization method for fitting 3D Gaussian properties to
match input views, and a differentiable CUDA-driven tile-based renderer for efficient
rasterization of 3D Gaussians. The following sections describe these three pieces in
more detail.
2.2.1 Scene Representation
As opposed to previous approaches to novel-view synthesis or scene reconstruction,
such as NeRF [1] or photogrammetry, which typically use an explicit surface (e.g.,
triangular meshes), an implicit surface (e.g., signed distance fields), or an implicit
volume (e.g., neural fields) for scene representation, 3DGS instead uses an explicit
volume representation to describe a scene [5]. While the smallest primitive for a
triangular mesh scene representation is a triangle, the primitive of a 3DGS scene is a
3D Gaussian.
[Figure 2.1 labels: Cut-Off Boundary; center µ; axes x, y, z]
Figure 2.1: An illustration exemplifying the characteristics of a 3D Gaussian.
In this context, as illustrated in Figure 2.1, a 3D Gaussian can be thought of as an
ellipsoid with an arbitrary position, potentially non-uniform scale, and rotation in
3D space [5]. Additionally, the ellipsoid is not solid; rather, it can be seen as being
filled with gas where the center is the most dense, and the density decreases toward
the ellipsoid’s surface. More specifically, the fall-off is proportional to a Gaussian
function.
A Gaussian function is continuous and differentiable over its whole domain [5]. Therefore, a scene
constructed using 3D Gaussians is a continuous and differentiable function describing
volume density. This differentiability makes 3D Gaussians a good candidate for
use with gradient descent techniques to optimize the scene toward some optimal
configuration, which is further discussed in Section 2.2.2.
However, a Gaussian function never reaches zero; it only approaches zero in the limit toward
positive and negative infinity [5]. In practice, this unlimited extent poses a problem
for drawing 3D Gaussians since, in theory, every Gaussian contributes a non-zero value to
every point in space and, therefore, to every pixel. This problem is solved by bounding
the Gaussian function at three standard deviations from its center, seen in Figure 2.1
as the “Cut-Off Boundary”. Using this bound, less than 1% of the Gaussian’s
contribution is lost [15], and the 3D Gaussians can be treated as ellipsoids.
Additionally, to facilitate modeling view-dependent appearance of objects in the scene,
the color of a 3D Gaussian for a given viewing direction is described by spherical
harmonic functions of degree 0 to 3 [5]. This description requires 48 real-valued
spherical harmonic coefficients to be stored per 3D Gaussian.
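The count of 48 can be checked with a short calculation. The sketch below assumes one set of coefficients per RGB color channel and uses the fact that degree l contributes 2l + 1 basis functions; the function name `sh_coefficient_count` is hypothetical.

```python
def sh_coefficient_count(max_degree, channels=3):
    """Number of real spherical harmonic coefficients for degrees 0..max_degree."""
    # Degree l has 2l + 1 basis functions, so degrees 0..L give (L + 1)^2 in total.
    per_channel = sum(2 * l + 1 for l in range(max_degree + 1))
    return per_channel * channels
```

For degrees 0 to 3 this gives 1 + 3 + 5 + 7 = 16 coefficients per channel, and 16 × 3 = 48 per 3D Gaussian.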
To render an image from a scene constructed out of 3D Gaussians as volume elements,
the density function for the scene needs to be integrated along each camera ray
to determine the total light transmission through the volume [5]. Since each 3D
Gaussian is artificially bounded, assuming no 3D Gaussians overlap, each Gaussian
can be integrated separately in order along the ray. However, for efficiency, 3DGS
instead approximates the transmission through a single 3D Gaussian as a screen-space
2D Gaussian. The practical details of this approximation are seen in Section 2.2.3.
Formally, the configuration of a 3D Gaussian, centered at µ with rotation q, scale s,
spherical harmonic coefficients SH, and opacity at the center α, can be defined as
$g = (\mu, s, q, SH, \alpha)$, (2.1)
where $\mu \in \mathbb{R}^3$, $s \in \mathbb{R}^3$, q is a unit quaternion, $SH \in \mathbb{R}^{48}$, and $\alpha \in [0, 1]$. The scale and
rotation of a 3D Gaussian can also be jointly described by a matrix $\Sigma_{3D} = R S S^T R^T$,
where S and R are the corresponding scale and rotation matrices derived from s and
q, respectively [5].
Furthermore, the density $D_\Sigma : \mathbb{R}^n \to [0, 1]$ at point $x \in \mathbb{R}^n$ for a Gaussian centered at
$\mu \in \mathbb{R}^n$ with transformation matrix $\Sigma \in \mathbb{R}^{n \times n}$ and center opacity α is defined as
$D_\Sigma(x) = \alpha \cdot G_\Sigma(x)$, (2.2)
where $G_\Sigma : \mathbb{R}^n \to (0, 1]$ is the Gaussian function for the Gaussian, defined as
$G_\Sigma(x) = e^{-\frac{1}{2} v^T \Sigma^{-1} v}$, (2.3)
where $v = x - \mu$.
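To make Equations (2.2) and (2.3) concrete, the following sketch evaluates the density of a single 3D Gaussian given as (µ, s, q, α). It assumes the quaternion is ordered (w, x, y, z) and avoids an explicit matrix inverse: since R is orthogonal and S is diagonal, the exponent reduces to the squared length of the offset expressed in the Gaussian's local frame and divided by the scale. The function names are hypothetical.

```python
import math

def quat_to_rotation(q):
    """Row-major 3x3 rotation matrix for a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def density(x, mu, s, q, alpha):
    """D(x) = alpha * exp(-0.5 * v^T Sigma^-1 v) with Sigma = R S S^T R^T."""
    R = quat_to_rotation(q)
    v = [xi - mi for xi, mi in zip(x, mu)]
    # Sigma^-1 = R S^-2 R^T, so v^T Sigma^-1 v = sum(((R^T v)_i / s_i)^2).
    local = [sum(R[r][c] * v[r] for r in range(3)) for c in range(3)]
    mahalanobis_sq = sum((li / si) ** 2 for li, si in zip(local, s))
    return alpha * math.exp(-0.5 * mahalanobis_sq)
```

At the center the density equals α, and one standard deviation away along any principal axis it equals α · exp(−0.5), regardless of how the Gaussian is rotated.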
2.2.2 Optimization
The second key piece of the 3DGS method is its optimization process, which is
responsible for configuring a set of 3D Gaussians such that they together resemble
the shape and appearance of the scene observable in a given set of input images V
[5]. The process is performed in two stages: the initialization stage and the iterative
refinement stage.
Initialization The optimization process begins by creating an initial set of Gaussians
$S_0$, roughly representing the shape of the scene [5]. The positions of the
Gaussians in this set are initialized in one of two ways:
1. For “Real-World Scenes”, such as those found in the Mip-NeRF 360 paper [16], the
Tanks and Temples dataset [17], and the Deep Blending paper [18], COLMAP [19]
(a widely used Structure-from-Motion library for estimating camera parameters and
producing a 3D point cloud from a set of 2D images) is run with the images of V as
input. COLMAP produces a sparse point cloud with points in places where common
features are found in the views of the input set. A single Gaussian is placed at
every point in the point cloud.
2. For “Synthetic Bounded Scenes”, such as in NeRF’s Realistic Synthetic 360°
dataset [1], N positions are uniformly sampled in a cuboid with fixed dimensions
covering the contents of all scenes in the dataset. A single Gaussian is placed
at each sampled position.
No rotation is initially applied, and each Gaussian is uniformly scaled proportional
to the mean distance to its three closest neighbors [5]. Each Gaussian’s first three
spherical harmonic coefficients are randomly selected uniformly within valid ranges,
and the rest are set to zero, making each Gaussian initially have the same color no
matter which side it is observed from. Meanwhile, opacity is set to 0.1.
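The scale initialization can be sketched as follows. This is a brute-force illustration that follows the description above literally (mean distance to the three closest neighbors); the proportionality constant `factor` and the function name are hypothetical, and the original implementation may compute the neighbor distances differently and far more efficiently.

```python
import math

def initial_scales(positions, k=3, factor=1.0):
    """Uniform initial scale per Gaussian from its k nearest neighbors.

    positions: list of (x, y, z) tuples, at least two entries.
    factor: hypothetical proportionality constant.
    """
    scales = []
    for i, p in enumerate(positions):
        # Brute-force O(N^2) neighbor search; fine for a sketch only.
        dists = sorted(math.dist(p, q) for j, q in enumerate(positions) if j != i)
        nearest = dists[:k]
        scales.append(factor * sum(nearest) / len(nearest))
    return scales
```

Gaussians in sparsely sampled regions therefore start out larger than Gaussians in densely sampled regions, so the initial set roughly covers the sampled volume.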
Finally, for later parts of the optimization process, the camera parameters (e.g.,
position, orientation, and field-of-view) for each input view need to be known [5].
These parameters are determined in two different ways for the previously mentioned
types of datasets. For the “Real-World Scenes” where the images typically come
from real-world pictures, the camera parameters are usually not known ahead of
time. However, as part of the COLMAP process, the camera parameters for each
input view are estimated and used as is for this case. Meanwhile, for the Realistic
Synthetic 360° dataset where the images are produced from Blender scenes, the
camera parameters for each view are known ahead of time and can be used directly.
Iterative Refinement The initial set of Gaussians ($S_0$) is a rough approximation
of the scene observable in the input views and will typically not resemble it very
well. To bring the set of Gaussians closer to a configuration that matches the scene
observable in the input views, 3DGS uses an iterative approach based on stochastic
gradient descent.
The iterative refinement process takes the current set of Gaussians $S_{i-1}$ and produces
an augmented set of Gaussians $S_i$ at each step, where i is the number of iterations
performed so far. For each iteration, a view is randomly selected from the set of input
views V. Let $I_i$ be the input image, $D_i$ be the dimensions of the input image, and $C_i$
be the camera parameters for the selected view at iteration i. The set of Gaussians
is then rendered using 3DGS’s differentiable CUDA-driven renderer (further detailed
in Section 2.2.3) with dimensions $D_i$, Gaussians $S_{i-1}$, and camera parameters $C_i$ as
input, producing an image $R_i$.
The optimization process aims to minimize the difference between the rendered image
and the ground truth image. To quantify how much the images differ for the current
view, 3DGS uses the following loss function:
$L(R_i, I_i) = 0.8 \cdot L_1(R_i, I_i) + 0.2 \cdot L_{D\text{-}SSIM}(R_i, I_i)$, (2.4)
where $L_1$ is the widely known mean absolute error metric (computed per channel of
each pixel), and $L_{D\text{-}SSIM}(X, Y) = 1 - SSIM(X, Y)$, with SSIM being the commonly
used objective image similarity metric Structural Similarity Index [20].
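The weighting in Equation (2.4) can be sketched as below. Images are assumed to be flat sequences of per-channel pixel values, and SSIM itself is not implemented here: the `ssim` argument is a hypothetical caller-supplied function returning a similarity value, standing in for whichever SSIM implementation is used.

```python
def l1_loss(rendered, reference):
    """Mean absolute error over flat sequences of per-channel pixel values."""
    return sum(abs(r - t) for r, t in zip(rendered, reference)) / len(reference)

def combined_loss(rendered, reference, ssim):
    """Equation (2.4): 0.8 * L1 + 0.2 * (1 - SSIM).

    ssim: hypothetical callable (rendered, reference) -> similarity value.
    """
    d_ssim = 1.0 - ssim(rendered, reference)
    return 0.8 * l1_loss(rendered, reference) + 0.2 * d_ssim
```

For identical images both terms vanish and the loss is zero; the 0.8 and 0.2 weights control how much per-pixel error and structural dissimilarity each contribute.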
Furthermore, in typical gradient descent fashion, the loss function is used to determine
how the set of Gaussians should be augmented to approach a configuration where
the ground truth image and the rendered image are as similar as possible. Since, as
previously mentioned, the scene representation and the renderer are differentiable, the
resulting colors of each pixel in the rendered image can be differentiated with respect
to any input parameter. More importantly, the loss function L can be differentiated
with respect to any Gaussian parameter. For example, if the partial derivative of
L with respect to $\mu_x$ for some Gaussian is positive, a small decrease of
the x-component of that Gaussian’s position decreases the loss,
which in turn, on average, makes the rendered image more similar to the ground
truth image.
Using the derivatives of the loss function (collectively called the function’s gradient),
the update step producing a new set of Gaussians with augmented parameters can
approximately be described by the following oversimplified relation:
$S_i = \{\, g - \lambda \nabla_g L(R_i, I_i) \mid g \in S_{i-1} \,\}$, (2.5)
where i > 0, λ is the step size (controlling how much the parameters change each
step), and $\nabla_g L$ is the gradient of L with respect to the parameters of Gaussian g.
In reality, however, 3DGS uses the Adam optimizer [21] which employs a slightly
more sophisticated approach to gradient descent optimization. Additionally, 3DGS
uses separate step sizes for different Gaussian parameters, and exponential decay
is used for the step size of Gaussian positions. Furthermore, some parameters are
fixed until a configurable number of iterations have passed. Finally, not all Gaussian
parameters are directly optimized as is. Rather, opacity and scale use a sigmoid and
an exponential activation function, respectively, to facilitate easier optimization. For
complete details regarding the optimization schedule, see the 3DGS paper [5].
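The role of the activation functions can be illustrated with a small sketch. It uses the plain gradient descent step of Equation (2.5) rather than Adam, keeps one step size per parameter, and stores opacity and scale as unconstrained raw values that are mapped to valid values only when used. All names and the dictionary layout are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gradient_step(raw_params, raw_grads, step_sizes):
    """Plain gradient descent as in Equation (2.5), one step size per parameter.

    Note: 3DGS uses the Adam optimizer; this is the simplified relation only.
    """
    return {name: raw_params[name] - step_sizes[name] * raw_grads[name]
            for name in raw_params}

def activate(raw_params):
    """Map unconstrained raw values to valid ones: opacity in (0, 1), scale > 0."""
    return {"opacity": sigmoid(raw_params["opacity"]),
            "scale": math.exp(raw_params["scale"])}
```

Because the optimizer only ever updates the raw values, no update step can push the opacity outside (0, 1) or make the scale non-positive.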
The 3DGS optimization process additionally adapts to scenes of varying complexity
and ensures not too many Gaussians are created by introducing a densification and
pruning scheme. After a warm-up period, at regular intervals of 100 iterations, a
densification step creates new Gaussians in areas where there are too
few Gaussians to reconstruct the scene well, and Gaussians that have a low α or are
overly large in world or view space are pruned. The densification step is performed
for Gaussians with large view-space position gradients (indicating a lack of Gaussians
nearby). These Gaussians are either cloned and the new Gaussian is moved in the
direction of the view-space position gradient, or the Gaussian is split and the original
and the new Gaussian are both scaled down and moved to cover the same area as the
original Gaussian occupied. Finally, to ensure that only Gaussians with meaningful
contribution are kept, every 3000th iteration, the α of all Gaussians is lowered to a
small value. The optimization will then increase the α of Gaussians that are needed
and, as previously described, the Gaussians whose α stays low will be pruned.
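The per-Gaussian decision made by this scheme can be sketched as a single function. Choosing between cloning and splitting by Gaussian size is an assumption taken from the 3DGS paper [5] rather than from the description above, pruning is checked first purely for simplicity, and every threshold is a hypothetical parameter that the caller must supply.

```python
def densify_action(grad_norm, max_scale, alpha, *, grad_threshold,
                   size_threshold, min_alpha, max_allowed_scale):
    """Return "prune", "keep", "clone", or "split" for one Gaussian.

    grad_norm: magnitude of the view-space position gradient.
    max_scale: largest scale component of the Gaussian.
    All thresholds are hypothetical, caller-supplied values.
    """
    if alpha < min_alpha or max_scale > max_allowed_scale:
        return "prune"  # nearly transparent or overly large
    if grad_norm < grad_threshold:
        return "keep"  # the area is already reconstructed well enough
    # Assumed rule: large Gaussians are split, small Gaussians are cloned.
    return "split" if max_scale > size_threshold else "clone"
```

The actual implementation additionally creates the new Gaussians and adjusts their positions and scales, which this sketch leaves out.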
2.2.3 Differentiable CUDA-Driven Tile-Based Renderer
The third and final key contribution of 3DGS is a method for efficiently and differentiably
producing an image depicting a set of 3D Gaussians. The 3DGS paper
proposes a tile-based rasterizer that projects 3D Gaussians into screen space and
draws them using differentiable operations. To allow for GPU acceleration and ease
of integration with PyTorch, which is used for the optimization process, the rasterizer
is implemented as PyTorch modules using C++ and CUDA kernels. This method
will hereafter be referred to as the 3DGS renderer.
[Figure 2.2 stages, in order: Pre-Process 3D Gaussians; Allocate 2D Gaussian Instances; Create 2D Gaussian Instances; Sort 2D Gaussian Instances; Identify Tile Ranges; Accumulate 2D Gaussians]
Figure 2.2: An overview of the CUDA kernels executed by 3DGS’ differentiable
CUDA-driven tile-based renderer.
The inputs to the renderer are a set of 3D Gaussians S and camera parameters
C. Let $S = \{g_0, g_1, \ldots, g_{N-1}\}$, where $g_i$ is a 3D Gaussian with ID i and N is the
total number of 3D Gaussians. As seen in Figure 2.2, broadly, the rasterization is
performed by projecting each 3D Gaussian into a screen-space 2D Gaussian using
the given camera parameters and calculating view-dependent properties, such as
color. Additionally, the 2D Gaussians are instantiated into tiles in a screen-space
grid. The 2D Gaussian instances are thereafter grouped per tile and sorted by depth
to allow for efficient front-to-back access per tile during drawing. Finally, tile-by-tile
and pixel-by-pixel, the 2D Gaussians assigned to the current pixel’s corresponding
grid tile are accumulated using alpha blending. The following sections detail how
these logical operations are implemented in practice as consecutive executions of
CUDA kernels.
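The final accumulation for one pixel can be sketched as follows, assuming the 2D Gaussians of the pixel's tile are already sorted front to back and that each entry carries the color and the opacity evaluated at this pixel. Stopping early once almost no light passes through is an assumption of this sketch, and `min_transmittance` is a hypothetical parameter.

```python
def blend_front_to_back(sorted_splats, min_transmittance=1e-4):
    """Alpha-blend (rgb, alpha) pairs that are ordered nearest first."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of the background still visible
    for rgb, alpha in sorted_splats:
        color = [acc + transmittance * alpha * c for acc, c in zip(color, rgb)]
        transmittance *= 1.0 - alpha
        if transmittance < min_transmittance:
            break  # assumed early exit: later splats are effectively hidden
    return color, transmittance
```

Each Gaussian's contribution is weighted by the product of (1 − α) over all Gaussians in front of it, which is why the per-tile depth sorting is required.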
Pre-Process 3D Gaussians Rendering a given set of Gaussians S begins by
executing a kernel with thread groups of dimensions 256 × 1 × 1, where each thread
handles one 3D Gaussian. The 3D Gaussian’s position µ is transformed into
camera space and then projected into continuous pixel coordinates $\mu_{2D}$ using the
camera’s world-to-camera and camera-to-clip matrices. With the Gaussian’s camera-space
coordinates now known, the Gaussian is culled if its center is behind the
camera’s near plane.
Subsequently, the 3D Gaussian’s transformation matrix $\Sigma_{3D}$, jointly describing its
rotation and scale in world space, must be transformed into screen space to describe
the rotation and scale of a corresponding 2D Gaussian in pixel coordinates. To
ensure the resulting transformation is affine, 3DGS combines the already affine world-to-camera
transform W with a local affine approximation of the camera’s projective
transformation, derived from the first two terms of the perspective transformation’s
Taylor series, as proposed by Zwicker et al. [22]. The resulting transformation matrix
for a 2D Gaussian becomes
$\Sigma_{2D} = J W \Sigma_{3D} W^T J^T$, (2.6)
where J is the Jacobian matrix of the local affine approximation.
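Equation (2.6) can be sketched in a few lines. The sketch assumes a pinhole camera looking along +z with focal lengths `fx` and `fy` given in pixels, takes W as the 3 × 3 rotational part of the world-to-camera transform, and expects the Gaussian's center `cam_pos` in camera space with z > 0. Under these assumptions, J is the 2 × 3 Jacobian of the mapping (x, y, z) to (fx · x / z, fy · y / z). All names are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def project_covariance(sigma3d, W, cam_pos, fx, fy):
    """Sigma_2D = J W Sigma_3D W^T J^T for a Gaussian centered at cam_pos."""
    x, y, z = cam_pos
    # Jacobian of the perspective projection, evaluated at the Gaussian's center.
    J = [[fx / z, 0.0, -fx * x / (z * z)],
         [0.0, fy / z, -fy * y / (z * z)]]
    T = matmul(J, W)
    return matmul(matmul(T, sigma3d), transpose(T))
```

For a unit sphere on the optical axis at depth 2 with a focal length of 100 pixels, the result is a circular 2D Gaussian with a standard deviation of 50 pixels, matching the intuition that projected size scales with focal length over depth.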
With the Gaussian’s screen-space position and transform determined, the renderer
can now calculate the 2D Gaussian’s opacity for any pixel in the image using the function
$D_{\Sigma_{2D}}$, described in Equation (2.2). However, it would be wasteful to iterate through
all the scene’s Gaussians for every pixel. Therefore, a uniform screen-space grid
with tiles occupying 16× 16 pixels is introduced. To know which tiles the current 2D
Gaussian overlaps, the kernel additionally determines the Gaussian’s screen-space
extents. The extents are found by constructing a square axis-aligned bounding box
centered on the Gaussian’s center. The width and height of the box are set to
twice three standard deviations along the Gaussian’s longest axis, where that
standard deviation is the square root of the largest eigenvalue of $\Sigma_{2D}$, as follows:
$E_{square}(\Sigma_{2D}) = \begin{bmatrix} s & s \end{bmatrix}^T, \quad s = 2 \left\lceil 3 \sqrt{\max(\lambda_1, \lambda_2)} \right\rceil$, (2.7)
where $\lambda_1$ and $\lambda_2$ are the two eigenvalues of $\Sigma_{2D}$.
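As a small worked example of Equation (2.7), the side length s can be computed with the closed-form eigenvalues of a symmetric 2 × 2 matrix. The function name `square_extent` and the nested-list matrix layout are hypothetical.

```python
import math

def square_extent(cov2d):
    """Side length s = 2 * ceil(3 * sqrt(max eigenvalue)) of the square box.

    cov2d: symmetric 2x2 matrix [[a, b], [b, c]] in pixel units.
    """
    (a, b), (_, c) = cov2d
    mid = 0.5 * (a + c)
    det = a * c - b * b
    # Eigenvalues of a symmetric 2x2 matrix are mid +/- sqrt(mid^2 - det).
    lam_max = mid + math.sqrt(max(0.0, mid * mid - det))
    return 2 * math.ceil(3.0 * math.sqrt(lam_max))
```

For a 2D Gaussian with eigenvalues 4 and 1, the longest axis has a standard deviation of 2 pixels, so the box is 12 × 12 pixels even though the shorter axis would only need 6 pixels.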
Finally, using the current 3D Gaussian’s spherical harmonic coefficients, its view-dependent
color is calculated using the direction from the camera to the Gaussian’s
center.
Once this kernel has finished, the screen-space position $\mu_{2D}$, camera-space depth d,
inverse of the 2D transformation matrix $\Sigma_{2D}^{-1}$, number of tiles overlapped o, and
view-dependent color c will have been stored in global memory, in individual buffers
with one slot for each Gaussian.
Allocate 2D Gaussian Instances. For each tile that a 2D Gaussian overlaps,
the renderer creates a 2D Gaussian instance stored in a single contiguous buffer.
To determine how many instance slots should be allocated per 2D Gaussian, the
previously written “number of tiles overlapped” buffer is used. Let this buffer be
Bo = [o0, o1, . . . , oN−1], where oi is the number of tiles overlapped by the Gaussian
with ID i. Furthermore, to create the 2D Gaussian instances it also needs to be
known at what offset in the instances buffer each 2D Gaussian’s instances should
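be written. This is determined using an inclusive prefix sum over Bo, resulting in a
buffer Boffset = [offset0, offset1, . . . , offsetN−1], where
offset_k = \sum_{i=0}^{k} o_i.
Because the sum is inclusive, offset_k marks the end of the slots belonging to
Gaussian k; its first slot is at offset_{k−1} (or at 0 for k = 0).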
In practice, the prefix sum is performed efficiently in parallel using “Single-pass
Parallel Prefix Scan with Decoupled Look-back” [23] through Nvidia’s CUB library
[24], [25]. Furthermore, the prefix sum is performed in place so no additional memory
is allocated for buffer Boffset.
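The offset computation can be sketched sequentially as follows. This only illustrates the arithmetic; the actual renderer performs the scan in parallel on the GPU, and the function name instance_range is hypothetical.

```python
from itertools import accumulate

tiles_overlapped = [2, 0, 3, 1]                 # B_o, one entry per Gaussian
offsets = list(accumulate(tiles_overlapped))    # inclusive prefix sum


def instance_range(k):
    """Half-open range of instance slots owned by Gaussian k."""
    start = offsets[k - 1] if k > 0 else 0
    return start, offsets[k]


total_instances = offsets[-1]
```

Gaussian 1 overlaps no tiles, so its range is empty (start equals end), which is how later phases can tell that it has been culled.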
Create 2D Gaussian Instances. With the offsets and total number of instances
now known, two buffers, one for keys and one for values, are dynamically allocated at
run-time with a capacity equal to the total number of instances. After that, the 2D
Gaussian instances are created by executing a kernel with thread groups of dimensions
256 × 1 × 1, where each thread handles one 2D Gaussian. If the current 2D
Gaussian overlaps no tiles, as indicated by the offsets buffer, no instances are created
and the 2D Gaussian is effectively culled from all future steps. In any other case, for
every overlapped tile a 64-bit integer sorting key and value are separately written into
the two buffers. The value is merely the ID of the Gaussian. Meanwhile, the key is a
64-bit integer where the 32 most significant bits equal the tile ID of the overlapped
tile and the 32 least significant bits hold the bit pattern of the 32-bit floating-point
camera-space depth d of the current 2D Gaussian.
Sort 2D Gaussian Instances. Next, the 2D Gaussian instance key and value
buffers are sorted in ascending order with respect to the values in the key buffer.
This sorting step is efficiently performed using Nvidia’s parallel radix sort Onesweep
[26] through Nvidia’s CUB library [24], [27]. Since the keys are composed of the
tile ID and depth of each instance, the instances will be arranged such that all
instances belonging to the same tile are placed at consecutive indices, and within
each group, the instances will be ordered from lowest to highest camera-space depth.
Using this scheme the relevant 2D Gaussians for each tile can efficiently be fetched
in front-to-back order.
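The key layout and its effect on the sort can be illustrated with a short CPU-side sketch. It relies on the fact that non-negative IEEE 754 floats keep their ordering when their bit patterns are compared as unsigned integers; the function names are hypothetical and Python's built-in sort stands in for the GPU radix sort.

```python
import struct


def float_bits(value):
    """Bit pattern of a 32-bit float, read as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def make_key(tile_id, depth):
    """64-bit key: tile ID in the upper 32 bits, depth bits in the lower 32."""
    return (tile_id << 32) | float_bits(depth)


# (Gaussian ID, tile ID, camera-space depth) for four instances.
raw = [(0, 3, 2.5), (1, 1, 7.0), (2, 3, 0.5), (3, 1, 1.25)]
instances = sorted((make_key(tile, depth), gid) for gid, tile, depth in raw)
order = [gid for _, gid in instances]
tiles = [key >> 32 for key, _ in instances]
```

After sorting, the two instances of tile 1 come first, followed by those of tile 3, and within each tile the nearer Gaussian precedes the farther one.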
Identify Tile Ranges. Before the 2D Gaussians in each tile can be
accumulated, it needs to be determined how many instances have been assigned to
each tile and where each range of instances is located in the sorted instance buffers.
To determine the ranges, a kernel with thread groups of dimensions 256 × 1 × 1,
where each thread handles one 2D Gaussian instance, is executed. Let i be the
index of the current instance and t_i be the tile ID of the current instance.
Collaboratively, the threads fill a buffer Branges = [(rs_0, re_0), (rs_1, re_1), . . . ,
(rs_{T−1}, re_{T−1})], where T is the total number of screen-space tiles, and rs_k and
re_k are the inclusive start and exclusive end indices in the instances buffers for the
tile with ID k, respectively. The buffer is initially cleared to all zeroes. To fill the
buffer, each thread checks the following conditions for its 2D Gaussian instance. If
the instance is the first overall, i.e., i = 0, the start index of its tile rs_{t_i} is set
to 0. Otherwise, if the instance is the first in a new tile, i.e., i ≥ 1 and t_i ≠ t_{i−1},
the start index of the new tile rs_{t_i} and the end index of the preceding instance’s
tile re_{t_{i−1}} are both set to i. Additionally, if the instance is the last overall, i.e.,
i = N − 1 where N is the total number of instances, the end index of its tile re_{t_i}
is set to N. If none of these conditions hold, meaning the instance is not at the
boundary of any tile’s instances range, the current thread writes no value.
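The boundary checks above can be sketched as a sequential loop in which every iteration plays the role of one GPU thread. The function name is hypothetical; the logic follows the conditions as described in the text.

```python
def identify_tile_ranges(tile_ids, num_tiles):
    """tile_ids: tile ID of every instance, in sorted order."""
    ranges = [[0, 0] for _ in range(num_tiles)]   # cleared to all zeroes
    n = len(tile_ids)
    for i, tile in enumerate(tile_ids):           # one thread per instance
        if i == 0:
            ranges[tile][0] = 0                   # first instance overall
        elif tile != tile_ids[i - 1]:
            ranges[tile_ids[i - 1]][1] = i        # close the previous tile
            ranges[tile][0] = i                   # open the new tile
        if i == n - 1:
            ranges[tile][1] = n                   # last instance overall
    return [tuple(r) for r in ranges]


ranges = identify_tile_ranges([1, 1, 3, 3, 3, 4], num_tiles=6)
```

Tiles without any instances (0, 2, and 5 in the example) keep the cleared range (0, 0), which yields zero Gaussians to accumulate.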
Accumulate 2D Gaussians. Finally, to calculate and write the color of each
pixel in the output image, a kernel where each thread group handles one tile and
each thread corresponds to a pixel in that tile is executed. The thread groups have
dimensions 16 × 16 × 1 to match the size of the grid tiles. Let t be the ID of the
current thread group’s corresponding tile.
Firstly, the 2D Gaussian instances range (rst, ret) for the current tile is fetched. Let
N = ret − rst be the number of 2D Gaussians in the current tile. Thereafter, all
threads in the thread group will alternate between two roles: Fetching 2D Gaussians
and accumulating 2D Gaussians. All threads synchronize using a barrier to wait for
all other threads to finish their current work before switching roles. This process
will be repeated either until all N Gaussians have been accumulated or until all
threads in the thread group report themselves as having finished early. A thread
can finish early if for its corresponding pixel the accumulated transmission of all
Gaussians processed so far goes below a low threshold (0.0001). In this case, since
the 2D Gaussians are processed from front to back, accumulating the color of more
Gaussians would make no meaningful difference to the pixel’s color.
In the role of fetching 2D Gaussians, all threads in the thread group will collaboratively
fetch, at most, the 256 frontmost remaining 2D Gaussian instances and the properties
of their corresponding 2D Gaussians by loading one each into thread group shared
memory. If there are fewer than 256 Gaussian instances left, a subset of the group’s
256 threads will stay idle. The properties loaded into shared memory are the
Gaussian’s ID, screen-space position µ2D, inverse of the 2D transformation matrix
Σ2D⁻¹, and opacity α. View-dependent color c is loaded upon demand by each
individual thread during accumulation and not ahead of time.
In the role of accumulating 2D Gaussians, each thread in the thread group handles
one pixel each. Let xp be the center of the pixel. Each thread keeps track of its
corresponding pixel’s current color and transmission. If not already marked as having
finished early, the thread accumulates the contribution of the currently loaded chunk
of 2D Gaussians using a standard alpha-blending model such that
C_0 = 0, \qquad C_i = C_{i-1} + T_{i-1} \alpha_i c_i, \qquad (2.8)
T_0 = 1, \qquad T_i = T_{i-1} \cdot (1 - \alpha_i), \qquad (2.9)
where C_i and T_i are the pixel’s color and transmission, respectively, after the i
frontmost Gaussians have been accumulated. Here \alpha_i = D_{\Sigma_{2D}^{i}}(x_p), where
\Sigma_{2D}^{i} is the transformation matrix and c_i is the view-dependent color of the
ith 2D Gaussian in the current tile.
Once a thread has finished loading and accumulating Gaussians, it writes the final
color CN to its corresponding pixel’s location in the output image.
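For a single pixel, the recurrences in Equations (2.8) and (2.9) together with the early-termination threshold can be sketched as follows. The per-Gaussian alpha values are taken as given inputs here, and the function name is hypothetical. In the actual kernel a thread that finishes early keeps taking part in fetching until every thread in its group has finished.

```python
def accumulate_pixel(gaussians, threshold=0.0001):
    """gaussians: (alpha, (r, g, b)) pairs for one pixel, sorted front to back."""
    color = [0.0, 0.0, 0.0]   # C_0
    transmission = 1.0        # T_0
    used = 0
    for alpha, c in gaussians:
        if transmission < threshold:
            break             # further Gaussians would not change the pixel
        color = [acc + transmission * alpha * ch for acc, ch in zip(color, c)]
        transmission *= 1.0 - alpha
        used += 1
    return color, transmission, used


front_to_back = [
    (0.5, (1.0, 0.0, 0.0)),
    (0.5, (0.0, 1.0, 0.0)),
    (1.0, (0.0, 0.0, 1.0)),   # fully opaque: transmission drops to 0
    (0.8, (1.0, 1.0, 1.0)),   # never reached
]
color, transmission, used = accumulate_pixel(front_to_back)
```

The fourth Gaussian is skipped because the transmission has already fallen below the threshold after the third.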
3 Method
The following chapter introduces and details our approach to answering the previously
presented research questions. Sections 3.1 and 3.2 correspond to our methods for
the first and second research questions, respectively.
3.1 Mesh-Supervised 3D Gaussian Optimization
To approach answering the first research question, 3D Gaussian Splatting is aug-
mented with an extension to supervise the optimization process using the original
geometry of a scene. Our method creates an initial set of 3D Gaussians with param-
eters such that the Gaussians approximate the original geometry at the start of the
optimization process. The method is implemented directly on top of the PyTorch and
CUDA implementation [28] provided by the authors of the “3D Gaussian Splatting
for Real-Time Radiance Field Rendering” [5] paper.
As described in Section 2.2.2, 3DGS begins the optimization process with an ini-
tialization stage in which an initial set of N 3D Gaussians S0 is created. Let
S0 = {g0, g1, . . . , gN−1} where gi = (µi, si, qi, SHi, αi) is the configuration of
Gaussian i. Our method changes how the position µi, scale si, and orientation qi are
initialized. All other parts
of the initialization stage are unmodified. The only additional input to our method
is a triangular mesh whose coordinates are assumed to be in the same coordinate
space as the given cameras.
To initialize the set of 3D Gaussians, the following steps are performed for each 3D
Gaussian. Firstly, a point p is sampled uniformly at random on the surface of the
mesh and used as the position of the Gaussian, i.e., µ = p.
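The thesis text does not state how the uniform surface sample is drawn. One common way, shown here purely as an assumed sketch, is to pick a face with probability proportional to its area and then pick a point inside it using square-root-warped barycentric coordinates. All function names are hypothetical.

```python
import math
import random


def triangle_area(v1, v2, v3):
    e1 = [b - a for a, b in zip(v1, v2)]
    e2 = [b - a for a, b in zip(v1, v3)]
    cross = [e1[1] * e2[2] - e1[2] * e2[1],
             e1[2] * e2[0] - e1[0] * e2[2],
             e1[0] * e2[1] - e1[1] * e2[0]]
    return 0.5 * math.sqrt(sum(c * c for c in cross))


def sample_point_on_mesh(vertices, faces, rng):
    """Return (point, face_index) with the point uniform over the mesh surface."""
    areas = [triangle_area(*(vertices[i] for i in face)) for face in faces]
    face_index = rng.choices(range(len(faces)), weights=areas)[0]
    v1, v2, v3 = (vertices[i] for i in faces[face_index])
    r1, r2 = rng.random(), rng.random()
    s = math.sqrt(r1)                      # warp so that area is covered evenly
    w1, w2, w3 = 1.0 - s, s * (1.0 - r2), s * r2
    point = tuple(w1 * a + w2 * b + w3 * c for a, b, c in zip(v1, v2, v3))
    return point, face_index


# A unit square in the z = 0 plane, split into two triangles.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2), (0, 2, 3)]
rng = random.Random(0)
samples = [sample_point_on_mesh(vertices, faces, rng) for _ in range(200)]
```

Returning the face index alongside the point makes it straightforward to count how many Gaussians landed on each face, which the scale initialization needs later.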
Secondly, the Gaussian is oriented such that its major and minor axes are aligned with
the major and minor axes of the face the point was sampled on. Let v1, v2, and v3
be the positions of the face’s three vertices and let V = [v1 − v̄  v2 − v̄  v3 − v̄]^T,
where v̄ is the mean of the three vertices. The major axis x and minor axis y in the
plane defined by the three vertices are derived from the eigenvectors of the covariance
matrix of V, in standard Principal Component Analysis (PCA) fashion [29]. To
describe the final orientation of the 3D Gaussian, a local-to-world rotation matrix R
is constructed as follows:
R = [\, x \;\; y \;\; n \,], \qquad (3.1)
where n = x × y is the normal of the face. Furthermore, the rotation matrix is
converted into a quaternion which is used for the 3D Gaussian’s q property.
Finally, the scale of the 3D Gaussian is set such that it roughly covers the same
area as the original face it is reconstructing. To determine how much the Gaussian
should be scaled along its major and minor axes, the corresponding eigenvalues of
the previously mentioned eigenvectors are used. Let λx and λy be the eigenvalues of
the major and the minor axes, respectively. However, it has to be taken into account
that many 3D Gaussians might be randomly placed on the same face. In this case
Gaussians that are not placed near the middle of the face are likely to span outside
the face if scaled to a size similar to the whole face. Therefore, all Gaussians whose
position ended up on the same face are scaled down proportional to the total number
of Gaussians on that face. Let k be the total number of Gaussians placed on the
same face as the current one. The final scale of the 3D Gaussian is then
s = [\, \lambda_x / k \;\; \lambda_y / k \;\; \epsilon \,], \qquad (3.2)
where ϵ is a small value larger than 0 to ensure the 3D Gaussian is flat. Note that
a value of exactly 0 cannot be used: because the parameter used in the optimization
process is the logarithm of the scale, it would become −∞, which cannot be further
optimized.
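The orientation and scale initialization of Equations (3.1) and (3.2) can be sketched for a single face as follows. Since the three centered vertices lie in one plane, the PCA is carried out here in a 2D basis of that plane, which gives the same in-plane axes as a 3D decomposition. Dividing the covariance by 3, the value of ϵ, and all function names are assumptions of this sketch; the conversion of R to a quaternion is omitted.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def normalize(a):
    length = math.sqrt(dot(a, a))
    return [x / length for x in a]


def face_axes(v1, v2, v3):
    """Return (x, y, n, lam_x, lam_y) for a non-degenerate triangle."""
    mean = [(a + b + c) / 3.0 for a, b, c in zip(v1, v2, v3)]
    centered = [sub(v, mean) for v in (v1, v2, v3)]
    # Orthonormal basis (u, w) spanning the plane of the triangle.
    normal = normalize(cross(sub(v2, v1), sub(v3, v1)))
    u = normalize(sub(v2, v1))
    w = cross(normal, u)
    # 2x2 covariance [[a, b], [b, c]] of the vertices expressed in that basis.
    pts = [(dot(c, u), dot(c, w)) for c in centered]
    a = sum(p[0] * p[0] for p in pts) / 3.0
    b = sum(p[0] * p[1] for p in pts) / 3.0
    c = sum(p[1] * p[1] for p in pts) / 3.0
    mid, radius = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    theta = 0.5 * math.atan2(2.0 * b, a - c)   # angle of the major axis
    x = [math.cos(theta) * ui + math.sin(theta) * wi for ui, wi in zip(u, w)]
    y = [-math.sin(theta) * ui + math.cos(theta) * wi for ui, wi in zip(u, w)]
    return x, y, cross(x, y), mid + radius, mid - radius


def initial_scale(lam_x, lam_y, k, eps=1e-6):
    """Equation (3.2): shrink by the number k of Gaussians on the face."""
    return [lam_x / k, lam_y / k, eps]


x, y, n, lam_x, lam_y = face_axes([0.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, 1.0, 0.0])
rotation = [[x[i], y[i], n[i]] for i in range(3)]   # columns are x, y, n
scale = initial_scale(lam_x, lam_y, k=2)
log_scale = [math.log(s) for s in scale]            # finite because eps > 0
```

For the elongated example triangle the major axis points mostly along the long edge, and the logarithm of every scale component stays finite, which would not be the case with a third component of exactly 0.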
3.2 Web-Based Renderer
To take steps toward answering the research question of how 3D Gaussians can
efficiently be rendered on the web, two methods were implemented and examined.
The first method, presented in Section 3.2.1, and the second method, presented in
Section 3.2.2, will hereafter be referred to as 3DGS-web and geometry, respec-
tively. Furthermore, variants of the two renderers with additional optimization were
implemented. These are referred to as 3DGS-web-opt and geometry-opt, and
will be presented in Section 3.2.3.
All our methods are implemented in Rust and compile to a WebAssembly module
that, together with a tiny amount of JavaScript for initialization, runs in the browser.
For hardware-accelerated graphics processing, WebGPU, through the wgpu Rust
library, is used. A key benefit of WebGPU, as opposed to WebGL2, is the availability
of compute shaders, which are required by all our methods.
3.2.1 Original 3DGS Architecture in WebGPU
Our first method of rendering 3D Gaussians on the web (the 3DGS-web renderer)
directly takes inspiration from the architecture of the 3DGS renderer, presented
in Section 2.2.3. The idea was to assess whether or not using this architecture in
a web-based context is feasible and where compromises need to be made due to
the limitations of the web platform. This section covers how the 3DGS renderer’s
architecture was adapted to a WebGPU implementation and where the original and
our implementation differ.
Similarly to the 3DGS renderer, our implementation renders a set of 3D Gaussians
through six phases: Pre-Process 3D Gaussians, Allocate 2D Gaussian Instances,
Create 2D Gaussian Instances, Sort 2D Gaussian Instances, Identify Tile Ranges, and
Accumulate 2D Gaussians. All phases are implemented as one or more executions
of a WebGPU compute shader, with work-group sizes equivalent to the ones used
for the thread groups of the corresponding CUDA kernels. The following sections
highlight notable differences for each phase compared to the 3DGS renderer, if any.
Allocate 2D Gaussian Instances. Since Nvidia’s CUB library is made specifically
for the CUDA platform, it could not be reused for our WebGPU implementation.
Therefore, the parallel prefix sum is performed using an implementation [30], by Raph
Levien and Reese Levine, based on the state-of-the-art method “Single-pass Parallel
Prefix Scan with Decoupled Look-back” [23]. This implementation was chosen
because it is one of the few based on the same paper as the CUB implementation.
However, according to Levien [31], due to a lack of inter-workgroup synchronization
primitives and a forward-progress guarantee for WebGPU compute shaders, the
original method cannot be fully implemented in WebGPU and still be guaranteed to
work cross-platform.
Create 2D Gaussian Instances. As mentioned in Section 2.2.3, during the
rendering of a frame, the 3DGS renderer dynamically allocates appropriately sized
buffers for 2D Gaussian instances in GPU memory. However, to perform this
allocation the total number of instances has to be copied from the offsets buffer
(produced in the Allocate 2D Gaussian Instances phase) in GPU memory to main
memory. For simplicity and to avoid this copy, we define an upper limit of 10 000 000
instances and allocate a single buffer for these before rendering a frame. Any 2D
Gaussian instance that does not fit in the buffer is never created and, therefore, not
rendered.
With each instance occupying 8 bytes (as seen in the following paragraph), this buffer
takes a constant amount of ∼76 MiB (80 000 000 bytes) in GPU memory. This upper limit was chosen
experimentally such that all test cases used for evaluation (see Section 4.2.1) render
without artifacts. However, it should be noted that the number of instances can
exceed the limit in extreme cases (not included in our evaluation set), such as when
a large number of 2D Gaussians each cover a large portion of the screen, especially if
a high-resolution image is rendered. Nevertheless, for most scenes and views the 2D
Gaussians are too small in screen space for this to occur.
Additionally, in the original 3DGS implementation, the sorting key and value for
each 2D Gaussian instance are 64-bit integers. However, due to a lack of support
for 64-bit keys in the used sorting algorithm implementation (presented in the next
section) and a lack of a 64-bit integer type in WGSL, 32-bit integers were used for
simplicity. In our implementation, the key is therefore split into two 16-bit chunks.
In the 16 most significant bits, the ID of the instance’s tile is stored, just like in the
3DGS renderer. Consequently, due to the limited space of 16 bits, at most 65 536
tiles can be used. Assuming a square image is rendered, the maximum dimensions of
the rendered image are 4096 × 4096 pixels (256 × 256 tiles of 16 × 16 pixels each).
In the 16 least significant bits, the 32-bit floating-point camera-space depth d of the
Gaussian is quantized and encoded. To mitigate issues due to the low precision of 16
bits when sorting 2D Gaussians by depth, unlike the 3DGS renderer, we linearly
encode the camera-space depth into the 16-bit integer dencoded as follows:
d_{normalized} = \frac{d - z_{near}}{z_{far} - z_{near}},
d_{encoded} = \lfloor 0.5 + 65535 \cdot d_{normalized} \rfloor, \qquad (3.3)
where znear and zfar are the camera-space distances to the current view’s near and
far plane, respectively.
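A minimal sketch of the 32-bit key layout and the depth encoding of Equation (3.3) follows. It assumes that d lies between the near and far planes; the function names and the range check are illustrative and not taken from our WGSL shaders.

```python
import math


def encode_depth(d, z_near, z_far):
    """Equation (3.3): linear quantization of depth to a 16-bit integer."""
    d_normalized = (d - z_near) / (z_far - z_near)
    return math.floor(0.5 + 65535 * d_normalized)


def make_key32(tile_id, d, z_near, z_far):
    """Tile ID in the 16 most significant bits, encoded depth in the 16 least."""
    if not 0 <= tile_id < 65536:
        raise ValueError("tile ID does not fit in 16 bits")
    return (tile_id << 16) | encode_depth(d, z_near, z_far)


z_near, z_far = 1.0, 101.0
nearest = encode_depth(1.0, z_near, z_far)       # 0
middle = encode_depth(51.0, z_near, z_far)       # 32768
farthest = encode_depth(101.0, z_near, z_far)    # 65535
key = make_key32(2, 51.0, z_near, z_far)

# Largest square image addressable with 65 536 tiles of 16 x 16 pixels.
max_side = math.isqrt(65536) * 16
```

With a linear encoding, two Gaussians receive the same depth code whenever their depths differ by less than roughly (zfar − znear) / 65535, so their relative order within a tile is then left to the sort.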
Sort 2D Gaussian Instances. For sorting 2D Gaussian instances, ideally Nvidia’s
state-of-the-art Onesweep algorithm [26] for parallel radix sorting would have been
used to facilitate comparison of the 3DGS renderer and our renderer. Since Nvidia’s
CUB library cannot be used outside the CUDA platform, a WebGPU implementation
of the Onesweep algorithm would have had to be used. Unfortunately, Onesweep
cannot be correctly implemented using WebGPU due to a lack of compute shader
sub-group operations (warp and wavefront in Nvidia and AMD parlance, respectively)
[32].
Instead, we used a hybrid parallel radix sort implementation [33] by Raph Levien.
This implementation is primarily based on AMD’s FidelityFX radix sort [34] and
additionally mixes in a technique called warp-level multi-split from Onesweep [26],
[32], [35]. However, due to the lack of sub-group operations in WebGPU, the warp-
level multi-split cannot be as efficiently performed as in Onesweep. Instead, it is
implemented using work-group shared memory [32].
Accumulate 2D Gaussians. As previously mentioned, while accumulating 2D
Gaussian instances for a tile, each thread in the 3DGS renderer continues until
all threads in its group have voted to finish early. This collaborative cross-thread
vote on whether to stop is implemented in the original renderer using the CUDA
synchronization function __syncthreads_count [36], [37, p. 173]. WebGPU lacks
a corresponding synchronization function. An attempt was made to emulate this
function using atomics, work-group barriers, and work-group shared memory.
However, this turned out to be slower than omitting the check altogether. Therefore,
our renderer does not check for this early termination condition and all 2D Gaussian
instances in the tile will always be loaded even if all pixels are finished.
3.2.2 Geometry-Based Renderer
[Diagram: Sort 3D Gaussians → Project 3D Gaussians → Rasterize 2D Gaussian Quads]
Figure 3.1: An overview of the logical steps performed by the geometry renderer.
Our second method of rendering 3D Gaussians on the web (the geometry renderer)
utilizes the traditional graphics rendering pipeline by drawing each 3D Gaussian
through rasterizing geometry. The inputs to the renderer are a set of 3D Gaussians
S and camera parameters C. Let S = {g0, g1, . . . , gN−1}, where gi is a 3D Gaussian
with ID i and N is the total number of 3D Gaussians.
As seen in Figure 3.1, the rasterization begins by sorting the given 3D Gaussians by
camera-space depth. After that, each 3D Gaussian is projected into a screen-space
2D Gaussian using the given camera parameters, and the Gaussian’s view-dependent
color is calculated. For each 2D Gaussian, a bounding screen-space quadrilateral is
constructed. Finally, each quadrilateral is rasterized and alpha blended in back-to-
front order. The following sections detail how these logical operations are implemented
using compute shaders and the traditional graphics processing pipeline through
WebGPU.
Sort 3D Gaussians. For the 3D Gaussians to be rasterized and blended in
back-to-front order, they need to be sorted. Therefore, rendering a given set of Gaussians
S begins by executing a compute shader with workgroups of dimensions 256 × 1 × 1,
where each invocation (the WebGPU equivalent of a CUDA thread) handles one
3D Gaussian. Each 3D Gaussian’s world-space position µ is transformed into
camera-space position µcam using the camera’s world-to-camera matrix, where the
camera’s forward direction is along the positive z-axis. Let z be the z-component
of µcam. Finally, the 32-bit floating-point number z is reinterpreted as a 32-bit
integer and stored along with the Gaussian’s ID in sorting entries buffer Bunsorted =
[(z0, ID0), (z1, ID1), . . . , (zN−1, IDN−1)].
Thereafter, buffer Bunsorted is sorted in ascending order by zi, producing buffer Bsorted.
The sorting is performed using the parallel radix sort WebGPU implementation by
Raph Levien presented in Section 3.2.1.
Project 3D Gaussians. The next key step is to construct a 2D quadrilateral for
each 3D Gaussian such that it bounds the 3D Gaussian in screen space. The purpose
of the quad is to cause fragment shader invocations for, at least, every pixel to which
the corresponding 3D Gaussian can contribute.
This next step is started by a single draw call that executes a WebGPU render
pass with camera parameters, buffers for all 3D Gaussian properties, and the buffer
Bsorted as input. No vertex or index buffers are used for the draw call; instead, it is
instructed to use instanced drawing where four vertices are to be constructed by the
vertex shader and this is repeated for N instances.
Let i_instance ∈ {0, 1, . . . , N − 1} be the ID of the current geometry instance. To
ensure the 3D Gaussians are drawn back to front, instance i_instance corresponds to
the sorting entry of Bsorted at index N − i_instance − 1 and, therefore, to the 3D
Gaussian with ID i = ID_{N − i_instance − 1}.
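The sorting entries and the reversed instance lookup can be illustrated on the CPU as follows. The sketch assumes a row-major 4 × 4 world-to-camera matrix and Gaussians in front of the camera (positive z), for which reinterpreting the float bits as an unsigned integer preserves the ordering. The names are hypothetical and Python's built-in sort stands in for the GPU radix sort.

```python
import struct


def float_as_u32(value):
    """Reinterpret a 32-bit float as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def camera_space_z(world_to_camera, position):
    """z-component of the camera-space position (third row of the matrix)."""
    x, y, z = position
    row = world_to_camera[2]
    return row[0] * x + row[1] * y + row[2] * z + row[3]


world_to_camera = [[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 2.0],
                   [0.0, 0.0, 0.0, 1.0]]
positions = [(0.0, 0.0, 3.0), (0.0, 0.0, -0.5), (1.0, 0.0, 1.25)]

b_unsorted = [(float_as_u32(camera_space_z(world_to_camera, p)), gid)
              for gid, p in enumerate(positions)]
b_sorted = sorted(b_unsorted, key=lambda entry: entry[0])   # ascending z

n = len(b_sorted)
# Geometry instance i_instance draws the entry at index n - i_instance - 1.
draw_order = [b_sorted[n - i_instance - 1][1] for i_instance in range(n)]
```

The camera-space depths are 5.0, 1.5, and 3.25, so the farthest Gaussian (ID 0) is drawn first and the nearest (ID 1) last.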
Using the same procedure as described in the “Pre-Process 3D Gaussians” phase of
the original 3DGS renderer (presented in Section 2.2.3), the vertex shader calculates
the corresponding 2D Gaussian’s center µ2D in continuous pixel coordinates and
transformation matrix Σ2D. Furthermore, the extents of the 2D Gaussian’s axis-
aligned bounding box centered at µ2D are determined as e = Esquare(Σ2D) (see
Equation (2.7)). Using µ2D and e, the framebuffer coordinates of the four vertices
for the current instance are determined and then transformed into clip space.
Finally, using the current 3D Gaussian’s spherical harmonic coefficients, its view-
dependent color is calculated using the direction from the camera to the Gaussian’s
center.
Once the vertex processing stage has finished, screen-space position µ2D, the inverse
of the 2D transformation matrix Σ2D⁻¹, and view-dependent color c are passed on to
the fragment processing stage.
Rasterize 2D Gaussian Quads. With one screen-space quadrilateral now constructed
for each 2D Gaussian, the fragment shader of the render pass will be invoked
for every pixel overlapped by each Gaussian’s corresponding quadrilateral. Let xp
be the center of the pixel handled by the current fragment shader invocation. The
fragment shader outputs color C and alpha α of the current pixel for the current 2D
Gaussian as follows:
C = c \cdot \alpha,
\alpha = D_{\Sigma_{2D}}(x_p), \qquad (3.4)
where function D is from Equation (2.2).
Finally, the rasterized 2D Gaussians are blended in back-to-front order by the GPU
by appropriately setting the blending parameters of the WebGPU render pass. The
render pass is configured to achieve the following standard pre-multiplied alpha
blending operation:
Cresult = Csource +Cdestination · (1− αsource) (3.5)
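To illustrate why this blending configuration yields the same image as the front-to-back accumulation of Equations (2.8) and (2.9), the following sketch applies both orders to the same fragments for one pixel. It starts from a black, fully transparent destination and takes the per-fragment alpha values as given; the function names are hypothetical.

```python
def blend_back_to_front(fragments):
    """fragments: ((r, g, b), alpha) pairs for one pixel, listed front to back."""
    destination = [0.0, 0.0, 0.0]
    for color, alpha in reversed(fragments):          # farthest fragment first
        source = [ch * alpha for ch in color]         # Equation (3.4)
        destination = [s + d * (1.0 - alpha)          # Equation (3.5)
                       for s, d in zip(source, destination)]
    return destination


def accumulate_front_to_back(fragments):
    """Reference result using Equations (2.8) and (2.9)."""
    color, transmission = [0.0, 0.0, 0.0], 1.0
    for c, alpha in fragments:
        color = [acc + transmission * alpha * ch for acc, ch in zip(color, c)]
        transmission *= 1.0 - alpha
    return color


fragments = [((1.0, 0.0, 0.0), 0.5), ((0.0, 1.0, 0.0), 0.5), ((0.0, 0.0, 1.0), 1.0)]
back_to_front = blend_back_to_front(fragments)
front_to_back = accumulate_front_to_back(fragments)
```

Both orders produce the color (0.5, 0.25, 0.25) for this example. The practical difference is that the back-to-front variant cannot stop early once a pixel has become opaque.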
3.2.3 Optimization
A key observation regarding the workload of both the 3DGS-web renderer and
the geometry renderer is that the number of tiles (in 3DGS-web) and the
number of pixels (in geometry) for which a 2D Gaussian’s contribution has to be
calculated are directly related to the size of each 2D Gaussian’s bounding box.
As introduced in Section 2.2.1, a Gaussian never reaches zero over its whole domain
and, therefore, contributes to every point in space and every pixel. Thus, to achieve
efficient rendering, it is limited to three standard deviations from its center. The
elliptical boundary corresponding to this limit for a 2D Gaussian can be seen in
Figure 3.2 as a dotted black line.
The 3DGS renderer approximates the bounding ellipse using an axis-aligned bound-
ing box with square extents as described by Esquare (defined in Equation (2.7)). As
seen in Figure 3.2a, the bounding box is dimensioned such that the bounding ellipse’s
longest axis will fit regardless of orientation. This makes the extent of the bounding
box unnecessarily large.
[Figure panels: (a) 3DGS, (b) Ours]
Figure 3.2: A comparison of the square axis-aligned bounding box used by the
3DGS renderer and our tight bounding box used by the 3DGS-web-opt and
geometry-opt renderers.
Therefore, we propose replacing 3DGS’s square bounding box with an axis-aligned
bounding box with dimensions such that it tightly bounds a 2D Gaussian’s elliptical
boundary, as seen in Figure 3.2b. The two axes of the bounding ellipse are a and b,
defined as
a = \frac{v_1}{\|v_1\|} l_1, \qquad l_1 = \lceil 3 \sqrt{\lambda_1} \rceil,
b = \frac{v_2}{\|v_2\|} l_2, \qquad l_2 = \lceil 3 \sqrt{\lambda_2} \rceil, \qquad (3.6)
where v1 and v2 are the two eigenvectors of Σ2D with eigenvalues λ1 and λ2,
respectively. Finally, as shown by Quílez [38], using the axes of the bounding ellipse our
extents function can be defined as
E_{tight}(\Sigma_{2D}) = 2 \, [\, \sqrt{a_x^2 + b_x^2} \;\; \sqrt{a_y^2 + b_y^2} \,]^{T}, \qquad (3.7)
where a = [a_x  a_y]^T and b = [b_x  b_y]^T. This new extents function for a 2D
Gaussian’s axis-aligned bounding box is used in the 3DGS-web-opt and
geometry-opt renderers.
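The difference between the square and the tight extents can be sketched numerically as follows, using the closed-form eigen-decomposition of a symmetric 2 × 2 matrix. The function names are hypothetical and the sketch is not taken from our shader code.

```python
import math


def eigen_2x2(sigma):
    """Eigenpairs (largest first) of a symmetric 2x2 matrix [[a, b], [b, c]]."""
    a, b, c = sigma[0][0], sigma[0][1], sigma[1][1]
    mid, radius = 0.5 * (a + c), math.hypot(0.5 * (a - c), b)
    theta = 0.5 * math.atan2(2.0 * b, a - c)      # angle of the major axis
    v1 = (math.cos(theta), math.sin(theta))       # unit eigenvectors
    v2 = (-math.sin(theta), math.cos(theta))
    return (mid + radius, v1), (max(mid - radius, 0.0), v2)


def square_extents(sigma_2d):
    """Equation (2.7)."""
    (lam1, _), _ = eigen_2x2(sigma_2d)
    s = 2 * math.ceil(3.0 * math.sqrt(lam1))
    return (s, s)


def tight_extents(sigma_2d):
    """Equations (3.6) and (3.7)."""
    (lam1, v1), (lam2, v2) = eigen_2x2(sigma_2d)
    l1 = math.ceil(3.0 * math.sqrt(lam1))
    l2 = math.ceil(3.0 * math.sqrt(lam2))
    a = (v1[0] * l1, v1[1] * l1)
    b = (v2[0] * l2, v2[1] * l2)
    return (2.0 * math.sqrt(a[0] ** 2 + b[0] ** 2),
            2.0 * math.sqrt(a[1] ** 2 + b[1] ** 2))


axis_aligned = [[100.0, 0.0], [0.0, 25.0]]        # std. dev. 10 px and 5 px
rotated_45 = [[62.5, 37.5], [37.5, 62.5]]         # same ellipse, rotated 45 degrees

square = square_extents(axis_aligned)             # 60 x 60
tight = tight_extents(axis_aligned)               # about 60 x 30
tight_rotated = tight_extents(rotated_45)         # about 47.4 x 47.4
```

For the axis-aligned ellipse the tight box covers half the area of the square one, and even for the 45-degree rotation, the least favorable orientation for an axis-aligned box, it remains smaller than 60 × 60.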
4 Results
The following chapter details how the previously presented methods were evaluated
and highlights key insights from the results. Sections 4.1 and 4.2 correspond to the
methods presented in Sections 3.1 and 3.2, respectively.
4.1 Mesh-Supervised 3D Gaussian Optimization
4.1.1 Evaluation Methodology
To evaluate our method of initializing a 3D Gaussian scene using existing scene
geometry, the following evaluation methodology was followed. A training and
evaluation set consisting of images depicting the eight scenes from NeRF’s Realistic
Synthetic 360° dataset [1] was used. The exact selection of training and evaluation
views provided in NeRF’s dataset was used. Additionally, the corresponding Blender
scenes provided in the dataset were used to export triangular meshes for each scene.
Using the set of training views and meshes as input, the optimization process was
run using the original 3DGS method and our method. Both methods were initialized
with a set of 100 000 3D Gaussians. To facilitate comparison with the original 3DGS
paper, the process was run for 30 000 iterations. The 3DGS scene was checkpointed
at 7000 and 30 000 iterations. Finally, objective reconstruction quality, i.e., the
difference between an original image and a reconstructed image from the same
viewpoint, was measured for every view in the evaluation set using three commonly
used objective image similarity metrics: Structural Similarity Index (SSIM) [20],
Peak Signal-to-Noise Ratio (PSNR), and Learned Perceptual Image Patch Similarity
(LPIPS) [39].
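Of the three metrics, PSNR is the simplest to state explicitly. The following sketch shows the standard definition for images flattened to lists of values in [0, 1]; the peak value of 1.0 and the function name are assumptions, and this is not the evaluation code used for the tables below.

```python
import math


def psnr(reference, reconstruction, max_value=1.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized images."""
    if len(reference) != len(reconstruction):
        raise ValueError("images must have the same number of values")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0.0:
        return math.inf                      # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [0.0, 0.5, 1.0, 0.25]
reconstruction = [0.1, 0.4, 0.9, 0.35]       # every value is off by 0.1
value = psnr(reference, reconstruction)      # about 20 dB
```

An error of 0.1 on every value gives a mean squared error of 0.01 and thus about 20 dB. Each additional 10 dB corresponds to a tenfold reduction in mean squared error.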
4.1.2 Reconstruction Quality
Table 4.1 shows the results of measuring objective reconstruction quality for 3DGS
and our method. The metrics have been averaged per method and scene across all
views of the evaluation set. Furthermore, following the approach of the original 3DGS
paper, the optimized 3D Gaussian scene is evaluated at 7000 and 30 000 training
iterations. The metrics at these two points are shown in Table 4.1a and Table 4.1b,
respectively.
Table 4.1: Reconstruction quality of our mesh-supervised 3D Gaussian optimization
method compared to the original 3DGS optimization method. For the three objective
image similarity metrics SSIM, PSNR, and LPIPS, the upward and downward arrows
indicate whether larger or smaller values, respectively, correspond to higher similarity.
The colored backgrounds indicate the “best” method for each metric and scene. The
“#” column shows the number of 3D Gaussians in the scene in kilo-Gaussians.
Scene      Method  SSIM↑  PSNR↑  LPIPS↓  #
Chair      3DGS    0.980  33.19  0.020   317
Chair      Ours    0.983  33.57  0.016   490
Drums      3DGS    0.947  25.41  0.050   247
Drums      Ours    0.948  25.46  0.045   390
Ficus      3DGS    0.984  34.19  0.016   191
Ficus      Ours    0.984  34.20  0.015   228
Hotdog     3DGS    0.981  35.86  0.030   145
Hotdog     Ours    0.982  36.08  0.025   253
Lego       3DGS    0.975  33.66  0.027   263
Lego       Ours    0.979  34.28  0.020   370
Materials  3DGS    0.950  28.75  0.051   136
Materials  Ours    0.950  28.72  0.049   227
Mic        3DGS    0.985  33.17  0.014   133
Mic        Ours    0.989  34.72  0.009   274
Ship       3DGS    0.898  30.57  0.129   210
Ship       Ours    0.902  30.80  0.112   365
(a) At 7000 training iterations
Scene      Method  SSIM↑  PSNR↑  LPIPS↓  #
Chair      3DGS    0.988  35.58  0.010   489
Chair      Ours    0.987  35.30  0.011   645
Drums      3DGS    0.955  26.28  0.037   390
Drums      Ours    0.954  26.25  0.036   526
Ficus      3DGS    0.987  35.50  0.012   266
Ficus      Ours    0.986  35.39  0.012   299
Hotdog     3DGS    0.985  38.06  0.020   188
Hotdog     Ours    0.985  37.80  0.018   291
Lego       3DGS    0.983  36.06  0.016   344
Lego       Ours    0.983  36.09  0.015   438
Materials  3DGS    0.960  30.50  0.037   160
Materials  Ours    0.958  30.42  0.036   237
Mic        3DGS    0.993  36.74  0.006   196
Mic        Ours    0.993  37.03  0.006   285
Ship       3DGS    0.906  31.69  0.106   278
Ship       Ours    0.905  31.66  0.096   411
(b) At 30 000 training iterations
At 7000 iterations, our method shows a minor improvement in all metrics for almost
all scenes. On the other hand, at 30 000 iterations, the metrics are marginally better
for some scenes and marginally worse for others. On average, both methods produce
images of similar reconstruction quality after this many iterations. Finally, using our
method the number of 3D Gaussians in each scene is significantly higher overall.
Figure 4.1 shows two scenes where the difference between our method’s and 3DGS’
reconstruction quality (PSNR) is large, i.e., where our method exhibits the greatest
improvement. In the first scene at 7000 iterations (see Figure 4.1a), an improvement
is seen in the microphone’s mesh. In this case, 3DGS could not reconstruct the
mesh’s high-frequency details on the underside and front of the microphone, while our
method gave a result much closer to the ground truth image. However, for the same
scene at 30 000 iterations (see Figure 4.1b), little difference is seen using our method
compared to 3DGS. This is consistent with the results shown in Table 4.1. Moreover,
in the second scene at 7000 iterations (see Figure 4.1c), our method resulted in
a reconstruction much closer to the ground truth image compared to 3DGS. The
difference is most clearly seen in the long gray jagged Lego piece in the middle of the
bulldozer. Furthermore, the same difference is also, surprisingly, seen in Figure 4.1d
at 30 000 iterations.
[Image grid with columns Ground Truth, Ours 7k, 3DGS 7k, Ours 30k, 3DGS 30k and panels (a), (b), (c), (d)]
Figure 4.1: An example of two scenes where our method exhibits the greatest
reconstruction quality improvement compared to 3DGS. The columns labeled “7k”
and “30k” correspond to images of each scene after 7000 and 30 000 training iterations
have passed, respectively.
4.2 Web-Based Renderer
4.2.1 Evaluation Methodology
(a) Wide view 0.25× (b) Medium view 1× (c) Close-up view 4×
Figure 4.2: An example of test cases with the three proximity levels used for each
camera angle during evaluation.
To evaluate the web-based renderers (with and without optimizations) compared
to the 3DGS renderer, two aspects were considered: Image quality and run-time
performance. For all renderers, a test set of 240 test cases was used. The test set
is based on the eight scenes from NeRF’s Realistic Synthetic 360° dataset [1]. For
all scenes, a single set of ten randomly selected camera angles was used, where the
camera faces the scene’s center and the camera’s position is uniformly sampled from
a sphere centered on the scene’s center. Additionally, three proximity variants of
each camera angle were used, referred to as wide, medium, and close-up views with
zoom levels of 0.25×, 1×, 4×, respectively. Here, 1× roughly corresponds to the
object of the scene being close but still fully visible with some margin around the
edges of the frame. An example of test cases showing the three proximity levels can
be seen in Figure 4.2. To facilitate comparison with the original 3DGS paper, all
test cases were rendered using dimensions of 800 × 800 pixels.
To ensure that the performance comparison between the web-based renderers and the
3DGS renderer is as fair as possible, the images produced by the web-based renderers
were compared to the images produced by the 3DGS renderer using three commonly
used objective image similarity metrics: Structural Similarity Index (SSIM) [20],
Peak Signal-to-Noise Ratio (PSNR), and Learned Perceptual Image Patch Similarity
(LPIPS) [39]. These similarity metrics were computed for all test cases individually,
then averaged and grouped by renderer.
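Of the three metrics, PSNR is simple enough to sketch directly. The following is a minimal illustration assuming 8-bit pixel values (peak value 255) stored as flat sequences; the thesis does not state which implementation or value range was used, so both are assumptions here.

```python
import math


def psnr(reference, candidate, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def average_by_renderer(results):
    """Average per-test-case scores, grouped by renderer name.

    `results` is an iterable of (renderer, score) pairs (a hypothetical layout).
    """
    grouped = {}
    for renderer, score in results:
        grouped.setdefault(renderer, []).append(score)
    return {name: sum(scores) / len(scores) for name, scores in grouped.items()}
```

Higher PSNR means a smaller mean squared error relative to the reference image, which here is the image produced by the 3DGS renderer.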
To evaluate the run-time performance of the 3DGS renderer and the web-based
renderers, GPU frame time was measured and approximate GPU memory usage was
estimated for all test cases. For all renderers, to minimize the influence of external
factors on measured frame time, GPU frame time and GPU memory usage were
collected over 90 consecutive frames, preceded by 30 warm-up frames rendered
immediately after starting the renderer. Additionally, each renderer was fully restarted
between running each test case. This means restarting the operating system process
for the 3DGS renderer and performing a full page reload for the web renderers.
Finally, the collected performance metrics of the 90 rendered frames were aggregated
into an average GPU frame time and the maximum GPU memory usage. All test
cases were executed using an Nvidia GeForce RTX 3070 Ti graphics card.
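The per-test-case aggregation described above can be sketched as follows. The function and field names are hypothetical; only the 30 warm-up frames, the 90 measured frames, and the choice of mean frame time and maximum memory come from the text.

```python
from statistics import mean

WARMUP_FRAMES = 30    # frames rendered but not measured
MEASURED_FRAMES = 90  # consecutive frames that contribute to the result


def aggregate_test_case(frame_times_ms, memory_mib):
    """Return the mean frame time and peak memory over the measured frames."""
    needed = WARMUP_FRAMES + MEASURED_FRAMES
    if len(frame_times_ms) < needed or len(memory_mib) < needed:
        raise ValueError("not enough frames recorded for this test case")
    # Drop the warm-up frames, then keep exactly the measured window.
    times = frame_times_ms[WARMUP_FRAMES:needed]
    memory = memory_mib[WARMUP_FRAMES:needed]
    return {"avg_time_ms": mean(times), "max_mem_mib": max(memory)}
```

Discarding the warm-up frames keeps one-time costs such as shader compilation and initial resource uploads out of the averaged frame time.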
GPU frame time and memory usage for the 3DGS renderer were captured using the
real-time 3D Gaussian viewer [40] published by the authors of the original 3DGS
paper. To measure GPU frame time, the viewer was modified to use CUDA events
and cudaEventElapsedTime. However, no built-in way was found to measure the
GPU memory usage of a single operating system process using CUDA APIs.
Therefore, using cudaMemGetInfo, total GPU memory usage was measured at the start
of the application and once again after each finished frame, with the difference
attributed to the renderer. It should be noted that
if any other process allocates or frees GPU memory between these measurements,
the resulting renderer GPU memory usage could be incorrect. Hence, we refer to it
as an estimate, and the amount is not guaranteed to be exact.
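The baseline-and-difference scheme can be sketched in a few lines. In this illustration, `query_mem_info` is a hypothetical callable standing in for cudaMemGetInfo and is assumed to return a `(free_bytes, total_bytes)` pair; the class name and structure are likewise assumptions rather than the modified viewer's code.

```python
class MemoryUsageEstimator:
    """Estimate one process's GPU memory use from device-wide readings."""

    def __init__(self, query_mem_info):
        self._query = query_mem_info
        free_bytes, total_bytes = self._query()
        # Memory already in use at application start is treated as the baseline.
        self._baseline_used = total_bytes - free_bytes
        self.peak_bytes = 0

    def sample(self):
        """Call after each finished frame; returns the current estimate in bytes."""
        free_bytes, total_bytes = self._query()
        estimate = max(0, (total_bytes - free_bytes) - self._baseline_used)
        self.peak_bytes = max(self.peak_bytes, estimate)
        return estimate
```

Because the readings are device-wide, any allocation made by another process between the baseline and a sample is wrongly attributed to the renderer, which is why the value is only an estimate.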
GPU frame time and memory usage for the web-based renderers were captured
by running each of the WebGPU-based renderers in Google Chrome (version 125
Beta) with the enable-webgpu-developer-features and enable-unsafe-webgpu
feature flags enabled. To capture GPU frame time, the WebGPU timestamp-query
[41] extension was used. The enable-webgpu-developer-features flag was enabled
to ensure timestamps are not quantized [42] (to multiples of 100 µs) and remain as
accurate as possible. The enable-unsafe-webgpu flag was enabled to allow usage of
the GPUCommandEncoder.writeTimestamp [43] function, which was recently removed
from the WebGPU specification [44] but is still available in Chrome under the flag [45].
[Figure 4.3 images omitted. Panels: (a) 3DGS, (b) 3DGS-web, (c) absolute difference.]
Figure 4.3: The worst-performing test case, with regard to similarity, for the
3DGS-web renderer in comparison to the 3DGS renderer.
With regard to GPU memory usage, there is currently no way to query the exact
usage on the web platform. Therefore, the GPU memory usage was estimated using
the webgpu-memory JavaScript library, which keeps track of every resource allocated
through the WebGPU API and estimates their total size based on the parameters
used to construct them [46]. It should be noted that this does not necessarily count
the exact memory usage on the GPU since it is up to the GPU driver and GPU to
decide how resources are laid out in memory.
4.2.2 Image Quality
Table 4.2: Image similarity of rendered images from the web-based renderers
compared to the 3DGS renderer averaged over all test cases. For the three objective
image similarity metrics SSIM, PSNR, and LPIPS, the upward and downward arrows
indicate whether larger or smaller values, respectively, correspond to higher similarity.
The colored backgrounds indicate the 1st, 2nd, and 3rd “best” method for
each metric.
Method         SSIM↑   PSNR↑   LPIPS↓
3DGS-web       0.999   53.02   0.002
3DGS-web-opt   0.999   52.49   0.002
geometry       1.000   61.68   0.000
geometry-opt   0.999   51.41   0.001
The averaged similarity metrics for all test cases can be seen in Table 4.2. Since all
SSIM and LPIPS values are close to one and zero, respectively, these results suggest
that the web-based renderers on average produce images with high similarity
to the corresponding images produced by the 3DGS renderer. The geometry
renderer produces the most similar images, with effectively no noticeable difference,
while the other three renderers all produce images containing some minor differences
compared to the 3DGS renderer.
Figure 4.3 shows the image produced by the 3DGS-web (and 3DGS-web-opt¹)
renderer for its test case with the lowest PSNR value, i.e., the worst case. As the
absolute difference shows, the error is small and almost perceptually insignificant.
In general, differences in images produced by 3DGS-web and 3DGS-web-opt, compared to
3DGS, are caused by the lower precision of the 16-bit floating-point depth values used
for sorting the Gaussians, causing them to be drawn in a different order than in
images produced by the 3DGS renderer.
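The effect of 16-bit depth keys on draw order can be illustrated with a small sketch. The depth values and helper names below are made up for illustration; the only assumption carried over from the text is that sort keys are rounded to IEEE 754 half precision.

```python
import struct


def to_float16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def sort_order(depths, quantize=None):
    """Return Gaussian indices sorted front to back by (optionally quantized) depth."""
    keys = [quantize(d) if quantize else d for d in depths]
    # sorted() is stable, so equal keys keep their original relative order.
    return sorted(range(len(depths)), key=lambda i: keys[i])


# Three Gaussians whose depths differ by less than the half-precision spacing
# near 4.0, which is 2**-8 (about 0.0039).
depths = [4.0015, 4.0010, 4.0005]
exact_order = sort_order(depths)              # sorted by true depth
half_order = sort_order(depths, to_float16)   # all keys collapse to 4.0
```

With full-precision keys the indices come out as `[2, 1, 0]`, while with half-precision keys all three depths round to the same value and the result falls back to the input order `[0, 1, 2]`. On a GPU, the order among such ties depends on the sorting implementation, so overlapping Gaussians can be blended in a different order than in the reference renderer.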
[Figure 4.4 images omitted. Panels: (a) 3DGS, (b) geometry-opt, (c) absolute difference.]
Figure 4.4: An example illustrating how images of scenes with sub-pixel Gaussians
exhibit a noticeable difference when using the geometry-opt renderer, in comparison
with the 3DGS renderer. The difference occurs in the mesh of the microphone and
the detailed patterns of the chair where Gaussians are smaller than a pixel. Note
that the chair is from a wide-view test case but magnified 4× for clarity.
Furthermore, Figure 4.4 exemplifies the difference in images for some of the worst-
case test cases for the geometry-opt renderer. As can be seen in the absolute
difference, the deviation occurs in the mesh of the microphone and the detailed
patterns of the chair. In these high-frequency areas, there are many tiny screen-space
Gaussians. Tiny Gaussians that previously used a square screen-space bounding box
in the “non-optimized” renderers and were trained for this case are likely to become
smaller than a pixel in the optimized version of the geometry-based renderer. The
3DGS renderer draws these sub-pixel Gaussians larger than a single pixel while
the geometry-opt renderer clamps the bounding box to cover at least one pixel.
This differing approach is the cause of the slight, nearly imperceptible image
difference.
¹ Since the two images are practically identical, only the image produced by 3DGS-web is
shown.
4.2.3 Run-Time Performance
Table 4.3: Run-time performance of the web-based renderers compared to the
3DGS renderer. The “Time” metric denotes GPU frame time in milliseconds and
the “Mem” metric denotes approximate maximum GPU memory usage in mebibytes.
The colored backgrounds indicate the 1st, 2nd, and 3rd “best” method for
each metric and scene.
               Chair       Drums       Ficus       Hotdog      Lego        Materials   Mic         Ship        Avg.
Method         Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem
3DGS           6.16  659   5.79  642   4.11  436   2.74  371   4.09  573   3.47  486   5.60  457   4.28  661   4.53  536
3DGS-web       8.09  398   6.97  419   4.07  326   3.25  300   5.53  369   3.80  367   5.71  346   7.06  388   5.56  365
3DGS-web-opt   5.41  398   5.15  419   3.15  326   2.37  300   3.75  369   3.00  367   3.98  346   4.09  388   3.87  365
geometry       3.80  158   3.03  177   1.78   91   2.61   68   3.42  131   1.80  129   1.85  110   5.12  149   2.93  127
geometry-opt   2.05  158   1.96  177   1.25   91   1.49   68   1.95  131   1.31  129   1.43  110   2.41  149   1.73  127
(a) All test cases
               Chair       Drums       Ficus       Hotdog      Lego        Materials   Mic         Ship        Avg.
Method         Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem
3DGS           3.18  659   2.77  642   2.16  436   2.18  371   2.83  573   1.95  484   1.76  457   3.69  661   2.56  536
3DGS-web       7.88  398   4.76  419   2.94  326   3.94  300   6.18  369   2.47  367   2.50  346  10.19  388   5.11  365
3DGS-web-opt   3.28  398   2.69  419   1.89  326   2.15  300   2.83  369   1.77  367   1.62  346   4.08  388   2.54  365
geometry       7.13  158   4.87  177   2.46   91   4.71   68   6.40  131   2.31  129   2.71  110  10.89  149   5.18  127
geometry-opt   3.01  158   2.56  177   1.53   91   2.07   68   2.85  131   1.50  129   1.58  110   4.14  149   2.41  127
(b) Only test cases with close-up views
               Chair       Drums       Ficus       Hotdog      Lego        Materials   Mic         Ship        Avg.
Method         Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem
3DGS          11.47  561  11.08  608   7.60  368   3.98  307   6.55  479   5.87  472  11.99  421   6.21  535   8.09  469
3DGS-web      11.35  398  11.53  419   6.71  326   3.37  300   6.61  369   6.16  367  11.37  346   6.73  388   7.98  365
3DGS-web-opt   8.99  398   8.94  419   5.41  326   3.10  300   5.48  369   4.87  367   7.72  346   5.09  388   6.20  365
geometry       1.91  158   2.00  177   1.53   91   1.04   68   1.53  131   1.56  129   1.43  110   1.59  149   1.57  127
geometry-opt   1.47  158   1.57  177   1.05   91   0.92   68   1.37  131   1.20  129   1.38  110   1.28  149   1.28  127
(c) Only test cases with wide views
Table 4.3 shows the results of measuring GPU frame time and estimating maximum
GPU memory usage for the 3DGS renderer as well as all the web-based renderers.
The metrics have been aggregated (average for frame time and maximum for memory
usage) per renderer and scene, and the average across all scenes for a specific renderer
is displayed in the rightmost column. Furthermore, different subsets of test cases are
considered in the three sub-tables. Table 4.3a includes all test cases, while Table 4.3b
and Table 4.3c only include test cases with close-up and wide views, respectively.
Results for medium views alone are omitted here for brevity, as they offer few
additional insights, but can be found in Appendix A.1.1.
Based on Table 4.3a, the following insights concerning frame time can be derived: The
3DGS-web renderer is on average somewhat slower than the 3DGS renderer (5.56 ms
versus 4.53 ms), despite having roughly the same architecture. Meanwhile, the
3DGS-web-opt renderer is for most scenes on par with or marginally faster than
the 3DGS renderer. Furthermore, the geometry renderer is on average considerably
faster than the 3DGS renderer and its derivatives for almost all scenes. Finally,
the geometry-opt renderer is on average roughly 1.7 times as fast as the geometry
renderer (1.73 ms versus 2.93 ms).
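As a worked check of these numbers, the "Avg." column of Table 4.3a can be reproduced from the per-scene frame times. The sketch below copies the values for three renderers from the table; the variable names are illustrative only.

```python
from statistics import mean

# Per-scene GPU frame times in ms from Table 4.3a, in the order
# Chair, Drums, Ficus, Hotdog, Lego, Materials, Mic, Ship.
times_ms = {
    "3DGS": [6.16, 5.79, 4.11, 2.74, 4.09, 3.47, 5.60, 4.28],
    "geometry": [3.80, 3.03, 1.78, 2.61, 3.42, 1.80, 1.85, 5.12],
    "geometry-opt": [2.05, 1.96, 1.25, 1.49, 1.95, 1.31, 1.43, 2.41],
}

# Average across scenes, as shown in the rightmost column of the table.
averages = {name: mean(values) for name, values in times_ms.items()}

# Relative speed of the optimized geometry renderer.
speedup_vs_geometry = averages["geometry"] / averages["geometry-opt"]
speedup_vs_3dgs = averages["3DGS"] / averages["geometry-opt"]
```

Rounded to two decimals, the averages match the table (4.53, 2.93, and 1.73 ms), and the ratio between the geometry and geometry-opt averages comes out at roughly 1.7.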
Moreover, concerning frame time for close-up and wide views, the 3DGS, 3DGS-web,
and 3DGS-web-opt renderers show an inverse trend compared to the geometry
and geometry-opt renderers. On average, 3DGS, 3DGS-web, and 3DGS-web-
opt perform significantly slower for wide views while geometry and geometry-opt
perform somewhat slower for close-up views. For close-up views, it can be seen that
3DGS and 3DGS-web-opt even significantly outperform the geometry renderer
for many scenes. The geometry-opt renderer on the other hand is only marginally
outperformed for a few scenes using close-up views.
Concerning memory usage, the 3DGS-web and 3DGS-web-opt renderers use at
least 100 MiB less GPU memory than the 3DGS renderer for almost all scenes.
Furthermore, the geometry and geometry-opt renderers
use at least 200 MiB less GPU memory than the 3DGS-web and 3DGS-web-opt
renderers for all scenes. Additionally, the 3DGS renderer requires noticeably
more memory for close-up views than wide views for all scenes. Meanwhile, the
amount of memory consumed by all the web-based renderers is independent of how
Gaussians are distributed and sized in screen space. It should be noted that
3DGS-web’s and 3DGS-web-opt’s GPU memory usage does not vary
based on proximity, despite being based on the architecture of the 3DGS renderer,
because a fixed upper limit is used for the number of 2D Gaussian instances,
as detailed in Section 3.2.1.
5 Discussion
Our work is motivated by the use of recent novel-view synthesis methods, specifically
3D Gaussian Splatting (3DGS), to allow for real-time interaction with scenes that
would typically be prohibitively expensive to render in real time. Specifically, our
work targets the web to allow for applications such as product visualization on
consumer hardware in a portable way. We have explored two questions that build
on 3DGS. Firstly, given a classical scene representation (i.e., polygonal mesh), what
effect does using the existing scene geometry to inform initialization have on the
optimization process and results of 3D Gaussian Splatting? Secondly, how can 3D
Gaussians be rendered efficiently within the constraints of the web?
5.1 Mesh-Supervised 3D Gaussian Optimization
To assess the effect of using existing scene geometry to inform the 3DGS optimization
process, we proposed and implemented a method for initializing a 3D Gaussian scene
from a triangular mesh, as presented in Section 3.1. We evaluated our method
compared to 3DGS across eight small-scale synthetic scenes and measured objective
reconstruction quality for multiple views of each scene, as described in Section 4.1.1.
The evaluation results show a distinct difference in reconstruction quality
improvement when observing the scene after only 7000 training iterations as opposed to
30 000 iterations. At 30 000 iterations, both our method and 3DGS on average result
in roughly the same reconstruction quality. Presumably, since only the initialization
stage was augmented, the optimization processes converge after a large number of
iterations. Consequently, both methods arrive at roughly the same 3D Gaussian scene
and nearly the same reconstruction quality.
However, at 7000 iterations, our method consistently exhibits improvement across
almost all scenes. These results suggest that our method somewhat accelerates the
optimization to reach higher reconstruction quality with fewer iterations. This can
also be seen in Figure 4.1 where examples of our method’s largest improvement
are shown. Here our method was able to reconstruct certain small-scale details at
only 7000 iterations better than 3DGS at 30 000 iterations. These examples should,
however, be taken with a grain of salt since they are the very best cases. On average,
our method gives only a marginal improvement after 7000 iterations.
Finally, despite both methods being initialized with 100 000 Gaussians, we observe
that using our method results in scenes with more 3D Gaussians than 3DGS. The
higher reconstruction quality at 7000 iterations could be partly
due to this increase. It is unclear whether this increase in Gaussians is required to
reconstruct the scenes well. For example, in Figure 4.1b with the microphone at
30 000 iterations, no significant improvement can be seen despite our method using
almost 100 000 Gaussians more. Ideally, the number of 3D Gaussians should be as
low as possible, since a greater number of Gaussians results in higher memory usage
and typically slower rendering performance.
Overall, these results suggest that there is some merit in supervising the
3D Gaussian Splatting optimization process using existing scene geometry. However,
our rather simple initialization method is insufficient to create noticeable
visual differences across a variety of scenes.
5.1.1 Limitations
A major limitation of our method is that it will not adapt to differing geometrical
complexity, since it always creates a fixed number of 3D Gaussians. Furthermore, it
does not utilize the fact that a 3D Gaussian has three dimensions since each Gaussian
is flattened to align with each face of the triangular mesh. Finally, the heuristic used
to scale each Gaussian does not accurately cover each face’s area.
5.1.2 Future Work
For future work, it could be interesting to attempt a more sophisticated method
for initializing the 3D Gaussian scene. For example, a different method could
derive volume elements from the mesh and construct 3D Gaussians for each element.
Alternatively, a signed-distance field could be constructed from the mesh and used
to sample points inside the mesh’s volume.
Furthermore, examining the effect of performing supervision of the optimization
process beyond initialization could be interesting. For example, the loss function
could be augmented to “encourage” the Gaussians to follow the surface or volume of
the mesh, or at least penalize Gaussians that are outside the volume of the mesh.
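The two ideas above, sampling initial points inside the volume via a signed-distance field and penalizing Gaussians outside it, can be sketched as follows. This is purely illustrative and was not implemented in the thesis: an analytic sphere stands in for a field built from a mesh, and all function names and the penalty formulation are assumptions.

```python
import math
import random


def sphere_sdf(point, radius=1.0):
    """Signed distance to a sphere at the origin: negative inside, positive outside."""
    return math.sqrt(sum(c * c for c in point)) - radius


def sample_inside(sdf, count, half_extent, rng):
    """Rejection-sample points with negative signed distance from a bounding cube."""
    points = []
    while len(points) < count:
        p = tuple(rng.uniform(-half_extent, half_extent) for _ in range(3))
        if sdf(p) < 0.0:
            points.append(p)
    return points


def outside_penalty(sdf, centers):
    """Mean positive signed distance of the centers; zero if all lie inside."""
    return sum(max(0.0, sdf(c)) for c in centers) / len(centers)


rng = random.Random(0)  # hypothetical seed for repeatability
initial_centers = sample_inside(sphere_sdf, 100, 1.0, rng)
```

A term like `outside_penalty` could be added to the loss with a small weight; Gaussians inside the volume contribute nothing, while those outside are pulled back in proportion to their distance from the surface.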
As a slightly different approach, it could also be interesting to supervise the Gaussians
to follow the scene’s original geometry using additional per-view information (e.g.,
depth and normal render passes) rather than the mesh itself. For example, a depth
render pass per input view could be used to sample points on surfaces for initialization
or in the loss function to “encourage” the Gaussians to stay near the surface or at
least inside the scene’s volume. This approach to initialization could additionally
use the color and potentially a normal render pass for each view to approximate an
initial color and orientation for each Gaussian.
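Sampling surface points from a depth render pass amounts to unprojecting pixels into 3D. The following minimal sketch assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, depth measured along the viewing axis, and non-positive or infinite depth marking background pixels; none of these conventions are specified in the thesis.

```python
import math


def unproject_pixel(u, v, depth, fx, fy, cx, cy):
    """Camera-space point for image position (u, v) at the given depth."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def points_from_depth_map(depth_map, fx, fy, cx, cy, stride=1):
    """Unproject every `stride`-th pixel center that hits a surface."""
    points = []
    for v in range(0, len(depth_map), stride):
        row = depth_map[v]
        for u in range(0, len(row), stride):
            depth = row[u]
            # Skip background pixels (assumed encoded as <= 0 or infinity).
            if depth > 0.0 and math.isfinite(depth):
                points.append(unproject_pixel(u + 0.5, v + 0.5, depth, fx, fy, cx, cy))
    return points
```

The resulting camera-space points would still need to be transformed by each view's camera-to-world matrix before being merged into one point set for initialization.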
5.2 Web-Based Renderer
To assess different ways of rendering 3D Gaussians on the web, we implemented two
methods (presented in Section 3.2), named 3DGS-web and geometry, and
identified a key optimization that is applied to both methods, yielding the variants
3DGS-web-opt and geometry-opt (presented in Section 3.2.3). The first method takes direct
inspiration from the architecture of the original 3D Gaussian renderer presented in
the 3DGS paper (see the 3DGS renderer in Section 2.2.3). Meanwhile, the second
method utilizes rasterization of proxy geometry through the traditional graphics
pipeline.
We evaluated image quality and run-time performance compared to the 3DGS
renderer through a set of 240 test cases encompassing a variety of small-scale scenes
from multiple camera angles with multiple proximity levels (close-up, medium, and
wide views), as presented in Section 4.2.1.
The evaluation results suggest that using the architecture of the original 3DGS
renderer to synthesize novel views in real time in a web-based environment is viable.
However, due to the limitations of the web platform, specifically WebGPU, the
original architecture could not be implemented identically, and compromises were
made. These compromises caused the images rendered with 3DGS-web to have a loss
of quality compared to the 3DGS renderer in some cases. However, the differences
are largely imperceptible. Furthermore, seemingly due to parts of the architecture
that could not be implemented in WebGPU, the same run-time performance with
regard to frame time was not achieved.
On the other hand, on average our geometry-based 3D Gaussian renderer noticeably
outperforms the original 3DGS renderer, as well as our WebGPU adaptations of it,
with regard to frame time and GPU memory usage. Presumably, due to the highly
optimized hardware in contemporary GPUs for geometry rasterization and shading,
it is in most cases more efficient to render 3D Gaussians as proxy geometry than to
use compute passes as in the 3DGS renderer’s architecture. However, the geometry
renderer performs worse than 3DGS for close-up views. In the case of a close-up
view, many 2D Gaussians will have a large mutual screen-space overlap. In this case,
as described in Section 2.2.3, the 3DGS renderer will terminate accumulation early
for pixels whose transmission becomes low. The observed performance inversion is
likely caused by the geometry renderer not having a way of discarding fragments
from highly occluded 2D Gaussians in this situation. Similarly, the 3DGS-web
renderer does not perform an early termination either, and as seen in the results, it
performs equally poorly for close-up views.
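The benefit of early termination can be illustrated with a simplified single-pixel compositing loop. This sketch is not the 3DGS renderer's code: the threshold value, the scalar color, and the exact placement of the termination check are assumptions made for brevity.

```python
def blend_front_to_back(fragments, min_transmittance=None):
    """Composite (color, alpha) fragments, sorted front to back, for one pixel.

    Returns the accumulated color and the number of fragments processed.
    If `min_transmittance` is given, accumulation stops once the remaining
    transmittance falls below it.
    """
    color = 0.0
    transmittance = 1.0
    processed = 0
    for fragment_color, alpha in fragments:
        if min_transmittance is not None and transmittance < min_transmittance:
            break  # later fragments can no longer change the pixel noticeably
        color += transmittance * alpha * fragment_color
        transmittance *= 1.0 - alpha
        processed += 1
    return color, processed


# 100 overlapping fragments, as might occur for a pixel in a close-up view.
fragments = [(1.0, 0.5)] * 100
full_color, full_count = blend_front_to_back(fragments)
early_color, early_count = blend_front_to_back(fragments, min_transmittance=1e-4)
```

In this example the early-terminating loop stops after 14 of the 100 fragments while producing nearly the same color. A renderer that rasterizes proxy geometry has no comparable per-pixel exit and blends every fragment.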
Additionally, with our rather straightforward optimization of using more accurate
2D Gaussian bounding boxes, the 3DGS-web-opt renderer is at least as performant
as the original 3DGS renderer. This can be explained by the fact that a smaller
bounding box means that each 2D Gaussian is likely to overlap fewer screen-space
tiles, and consequently, each thread corresponding to a pixel of a tile needs to load
fewer Gaussians from global GPU memory and do less work to accumulate them.
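The relationship between bounding-box size and tile overlap can be made concrete with a small sketch. It assumes 16 × 16 pixel tiles and a 3-sigma cutoff, compares a square box derived from the largest eigenvalue of the 2D covariance with the tight axis-aligned box of the same ellipse, and ignores clamping to the screen. The tight-box formula is the standard axis-aligned bound of an ellipse and is only an assumed stand-in; the bounding boxes actually used are defined in Section 3.2.3.

```python
import math

TILE_SIZE = 16  # pixels per tile side (assumption)


def tiles_covered(center, half_width, half_height, tile_size=TILE_SIZE):
    """Number of screen tiles overlapped by an axis-aligned box around `center`."""
    cx, cy = center
    x0 = math.floor((cx - half_width) / tile_size)
    x1 = math.floor((cx + half_width) / tile_size)
    y0 = math.floor((cy - half_height) / tile_size)
    y1 = math.floor((cy + half_height) / tile_size)
    return (x1 - x0 + 1) * (y1 - y0 + 1)


def square_box(cov, sigmas=3.0):
    """Square half-extents from the largest eigenvalue of [[a, b], [b, c]]."""
    a, b, c = cov
    mid = 0.5 * (a + c)
    disc = math.sqrt(max(0.0, mid * mid - (a * c - b * b)))
    radius = math.ceil(sigmas * math.sqrt(mid + disc))
    return radius, radius


def tight_box(cov, sigmas=3.0):
    """Tight axis-aligned half-extents of the same ellipse."""
    a, _, c = cov
    return sigmas * math.sqrt(a), sigmas * math.sqrt(c)


# An elongated, axis-aligned 2D Gaussian: sigma_x = 20 px, sigma_y = 2 px.
cov = (400.0, 0.0, 4.0)
center = (100.0, 100.0)
square_tiles = tiles_covered(center, *square_box(cov))
tight_tiles = tiles_covered(center, *tight_box(cov))
```

For this elongated Gaussian the square box touches 81 tiles while the tight box touches 18, so far fewer tiles would list the Gaussian for accumulation.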
Along the same lines, with our more accurate 2D Gaussian bounding boxes applied
to our geometry-based renderer, it on average outperforms all other renderers in our
comparison with regard to frame time, while sharing the lowest GPU memory usage
with the unoptimized geometry renderer. In this case, the
proxy geometry will have a smaller footprint in screen space, and naturally fewer
fragments need to be computed and blended.
Overall, these results suggest that 3D Gaussian Splatting as a scene representation
and rendering method can effectively be used on the web. When comparing whether
to use the original 3DGS architecture or the geometry-based method, we recommend
the geometry-based method due to lower memory usage and frame time in almost all
cases, as well as a simpler pipeline for implementation. Furthermore, the geometry-
based renderer’s memory usage is constant, unlike the 3DGS rendering architecture,
whose memory usage depends on the number of 2D Gaussian instances, usually
determined by the camera’s proximity to the scene. Finally, we conclude that these
methods could likely allow real-time interaction for scenes that would typically
be prohibitively expensive to render in real time, offering a potential avenue for
interactive and high-quality product visualization on the web.
5.2.1 Limitations
In our study of different ways of rendering 3D Gaussians on the web, we have
only evaluated our methods for a single type of GPU, a single set of output image
dimensions, and a single type of scene. The observations made might not hold under
a different set of conditions.
Regarding our methods, one major limitation of the 3DGS-web and 3DGS-web-opt
renderers is that a maximum number of 2D Gaussian instances has to be
manually configured. If this limit is set too low, some Gaussians might not be rendered
for some close-up views, and if it is set too high, a large amount of GPU memory will
be allocated despite not being needed for all views.
5.2.2 Future Work
For future work, it would be interesting to see how the compared methods scale
when evaluated using large-scale environmental scenes, image dimensions closer to
what would be used for a real-world application¹, and lower-end devices.
Moreover, several ideas could be explored for the geometry-based renderer. Firstly, as
previously discussed, the geometry-based renderer presumably performs poorly for
close-up views due to the lack of early termination when blending 2D Gaussians. It
would be interesting to address this issue to see if it can reduce the computation per
pixel. Secondly, it would be interesting to examine the effect of using more vertices
for a 2D Gaussian’s screen-space proxy geometry to more closely approximate its
elliptical boundary. Hopefully, this would further decrease the number of fragments
rasterized per 2D Gaussian.
¹ Results for this case were measured but not discussed due to lack of time. These results can be
seen in Appendix A.1.2.
5.3 Risks and Ethics
When considering the risks and ethics of research, two aspects should be considered:
The research methodology and the research results. Our research methodology is
isolated to the theoretical study of computer graphics with small-scale empirical
experiments. Due to the scale of the study, and the lack of interaction with any
human participants, we see no meaningful negative economic, legal, societal, privacy,
security, ecological, or environmental ethical concerns.
Regarding our research results, we likewise see no reason to expect a negative ethical impact.
Our immediate results are not significant enough to cause any ground-breaking effects
that lead to immediate ethical issues. However, in a broader sense, advancement in
novel-view synthesis could be misused for malicious purposes, such as generating
fake imagery with the intent of misleading.
Bibliography
[1] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi,
and R. Ng, “NeRF: Representing scenes as neural radiance fields for view
synthesis,” in Computer Vision – ECCV 2020, A. Vedaldi, H. Bischof, T.
Brox, and J.-M. Frahm, Eds., ser. Lecture Notes in Computer Science, Cham:
Springer International Publishing, 2020, pp. 405–421, isbn: 978-3-030-58452-8.
doi: 10.1007/978-3-030-58452-8_24.
[2] Y. Xie, T. Takikawa, S. Saito, et al., “Neural fields in visual computing and
beyond,” Computer Graphics Forum, 2022, issn: 1467-8659. doi: 10.1111/
cgf.14505.
[3] M. Seefelder and D. Duckworth. “Reconstructing indoor spaces with NeRF,”
Google Research. (Jun. 14, 2023), [Online]. Available: https://blog.research.
google/2023/06/reconstructing-indoor-spaces-with-nerf.html
(visited on 08/29/2023).
[4] T. Müller, A. Evans, C. Schied, and A. Keller, “Instant neural graphics
primitives with a multiresolution hash encoding,” ACM Trans. Graph., vol. 41, no. 4,
102:1–102:15, Jul. 2022. doi: 10.1145/3528223.3530127. [Online]. Available:
https://doi.org/10.1145/3528223.3530127.
[5] B. Kerbl, G. Kopanas, T. Leimkuehler, and G. Drettakis, “3d gaussian splatting
for real-time radiance field rendering,” ACM Transactions on Graphics, vol. 42,
no. 4, 139:1–139:14, Jul. 26, 2023, issn: 0730-0301. doi: 10.1145/3592433.
[Online]. Available: https://dl.acm.org/doi/10.1145/3592433 (visited on
11/06/2023).
[6] G. Chen and W. Wang, A survey on 3d gaussian splatting, Jan. 8, 2024. arXiv:
2401.03890[cs]. [Online]. Available: http://arxiv.org/abs/2401.03890
(visited on 01/09/2024).
[7] S. Niedermayr, J. Stumpfegger, and R. Westermann, Compressed 3d gaussian
splatting for accelerated novel view synthesis, Jan. 22, 2024. doi: 10.48550/
arXiv.2401.02436. arXiv: 2401.02436[cs]. [Online]. Available: http:
//arxiv.org/abs/2401.02436 (visited on 05/16/2024).
[8] K. Kwok, Antimatter15/splat, Jan. 9, 2024. [Online]. Available: https://
github.com/antimatter15/splat (visited on 06/06/2024).
[9] A. Meißner, Lichtso/splatter, Oct. 19, 2023. [Online]. Available: https://
github.com/Lichtso/splatter (visited on 06/06/2024).
[10] kishimisu, Kishimisu/gaussian-splatting-WebGL, Oct. 26, 2023. [Online].
Available: https://github.com/kishimisu/Gaussian-Splatting-WebGL (visited
on 06/06/2024).
[11] Y. Sato, BladeTransformerLLC/gauzilla, May 14, 2024. [Online]. Available:
https://github.com/BladeTransformerLLC/gauzilla (visited on
06/06/2024).
[12] M. Svensson, MarcusAndreasSvensson/gaussian-splatting-webgpu, Oct. 26, 2023.
[Online]. Available:
https://github.com/MarcusAndreasSvensson/gaussian-splatting-webgpu
(visited on 06/06/2024).
[13] M. Kellogg, Mkkellogg/GaussianSplats3d, May 9, 2024. [Online]. Available:
https://github.com/mkkellogg/GaussianSplats3D (visited on 06/06/2024).
[14] M. Tyszkiewicz and A. Islamov, Cvlab-epfl/gaussian-splatting-web, Sep. 22,
2023. [Online]. Available: https://github.com/cvlab-epfl/gaussian-
splatting-web (visited on 06/06/2024).
[15] Wikipedia contributors, 68–95–99.7 rule, in Wikipedia, Page Version ID:
1214533040, Mar. 19, 2024. [Online]. Available: https://en.wikipedia.
org/w/index.php?title=68%E2%80%9395%E2%80%9399.7_rule&oldid=
1214533040#Table_of_numerical_values (visited on 04/24/2024).
[16] J. T. Barron, B. Mildenhall, D. Verbin, P. P. Srinivasan, and P. Hedman, “Mip-
NeRF 360: Unbounded anti-aliased neural radiance fields,” in 2022 IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR), ISSN: 2575-
7075, Jun. 2022, pp. 5460–5469. doi: 10.1109/CVPR52688.2022.00539.
[Online]. Available: https://ieeexplore.ieee.org/document/9878829
(visited on 06/06/2024).
[17] A. Knapitsch, J. Park, Q.-Y. Zhou, and V. Koltun, “Tanks and temples:
Benchmarking large-scale scene reconstruction,” ACM Transactions on Graphics,
vol. 36, no. 4, 78:1–78:13, Jul. 20, 2017, issn: 0730-0301. doi:
10.1145/3072959.3073599. [Online]. Available:
https://dl.acm.org/doi/10.1145/3072959.3073599 (visited on 04/30/2024).
[18] P. Hedman, J. Philip, T. Price, J.-M. Frahm, G. Drettakis, and G. Brostow,
“Deep blending for free-viewpoint image-based rendering,” ACM Transactions
on Graphics, vol. 37, no. 6, 257:1–257:15, Dec. 4, 2018, issn: 0730-0301. doi:
10.1145/3272127.3275084. [Online]. Available: https://dl.acm.org/doi/
10.1145/3272127.3275084 (visited on 04/30/2024).
[19] J. L. Schönberger and J.-M. Frahm, “Structure-from-motion revisited,” in
Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[20] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment:
From error visibility to structural similarity,” IEEE Transactions on Image
Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004, Conference Name: IEEE
Transactions on Image Processing, issn: 1941-0042. doi: 10.1109/TIP.2003.
819861. [Online]. Available: https://ieeexplore.ieee.org/document/
1284395 (visited on 04/12/2024).
[21] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, Jan. 29,
2017. doi: 10.48550/arXiv.1412.6980. arXiv: 1412.6980[cs]. [Online].
Available: http://arxiv.org/abs/1412.6980 (visited on 05/02/2024).
[22] M. Zwicker, H. Pfister, J. van Baar, and M. Gross, “EWA volume splatting,”
in Proceedings Visualization, 2001. VIS ’01., Oct. 2001, pp. 29–538. doi:
10.1109/VISUAL.2001.964490. [Online]. Available: https://ieeexplore.
ieee.org/abstract/document/964490 (visited on 12/20/2023).
[23] D. Merrill and M. Garland, “Single-pass parallel prefix scan with decoupled look-
back,” NVIDIA Corporation, NVR-2016-002, Mar. 1, 2016. [Online]. Available:
https://research.nvidia.com/publication/2016-03_single-pass-
parallel-prefix-scan-decoupled-look-back.
[24] CCCL Development Team, CCCL: CUDA c++ core libraries, Jun. 6, 2024.
[Online]. Available: https://github.com/NVIDIA/cccl (visited on 06/06/2024).
[25] CCCL Development Team. “Cub::DeviceScan.” (Jun. 6, 2024), [Online].
Available: https://nvidia.github.io/cccl/cub/api/structcub_1_1DeviceScan.html
(visited on 06/06/2024).
[26] A. Adinets and D. Merrill, Onesweep: A faster least significant digit radix
sort for GPUs, Jun. 3, 2022. doi: 10.48550/arXiv.2206.01784. arXiv:
2206.01784[cs]. [Online]. Available: http://arxiv.org/abs/2206.01784
(visited on 11/06/2023).
[27] CCCL Development Team. “Cub::DeviceRadixSort.” (Jun. 6, 2024), [Online].
Available: https://nvidia.github.io/cccl/cub/api/structcub_1_
1DeviceRadixSort.html (visited on 06/06/2024).
[28] B. Kerbl, G. Kopanas, T. Leimkuehler, and G. Drettakis, Graphdeco-
inria/gaussian-splatting, Nov. 1, 2023. [Online]. Available: https://github.
com/graphdeco-inria/gaussian-splatting/tree/2eee0e26d2d5fd00ec
462df47752223952f6bf4e (visited on 06/07/2024).
[29] I. T. Jolliffe and J. Cadima, “Principal component analysis: A review and
recent developments,” Philosophical Transactions of the Royal Society A:
Mathematical, Physical and Engineering Sciences, vol. 374, no. 2065, p. 20150202,
Apr. 13, 2016, Publisher: Royal Society. doi: 10.1098/rsta.2015.0202.
[Online]. Available: https://royalsocietypublishing.org/doi/10.1098/
rsta.2015.0202 (visited on 06/03/2024).
[30] R. Levine and R. Levien, Prefix-sum.wgsl, May 25, 2022. [Online]. Available:
https://github.com/reeselevine/webgpu-litmus/blob/67e61fd6e
6130a62a9f0af28d3932b0b9418c02c/shaders/prefix-sum.wgsl (visited
on 06/07/2024).
[31] R. Levien. “Prefix sum on portable compute shaders,” Raph Levien’s blog.
(Nov. 17, 2021), [Online]. Available: https://raphlinus.github.io/gpu/
2021/11/17/prefix-sum-portable.html (visited on 06/07/2024).
[32] R. Levien and R. Dodd. “Sorting,” Sorting. (Jan. 28, 2024), [Online]. Available:
https://github.com/linebender/linebender.github.io/blob/34ee60d6eecc08249c8930ed8e968ed39769492f/content/wiki/gpu/sorting.md
(visited on 05/13/2024).
[33] R. Levien, Googlefonts/compute-shader-101, Dec. 27, 2023. [Online]. Available:
https://github.com/googlefonts/compute-shader-101/blob/9f882d8d7d2fad98372d04350020c6cd672c1a72/compute-shader-hello/src/shader.wgsl
(visited on 06/07/2024).
[34] T. Harada and L. Howes, “Introduction to GPU radix sort,” Advanced Micro
Devices, Inc., 2011. [Online]. Available:
https://gpuopen.com/download/publications/Introduction_to_GPU_Radix_Sort.pdf
(visited on 05/13/2024).
[35] S. Ashkiani, A. Davidson, U. Meyer, and J. D. Owens, “GPU multisplit:
An extended study of a parallel algorithm,” ACM Transactions on Parallel
Computing, vol. 4, no. 1, 2:1–2:44, Aug. 23, 2017, issn: 2329-4949.
doi: 10.1145/3108139. [Online]. Available:
https://dl.acm.org/doi/10.1145/3108139 (visited on 05/13/2024).
[36] B. Kerbl, G. Kopanas, T. Leimkuehler, and G. Drettakis,
Graphdeco-inria/diff-gaussian-rasterization, Aug. 23, 2023. [Online]. Available:
https://github.com/graphdeco-inria/diff-gaussian-rasterization
(visited on 06/06/2024).
[37] NVIDIA Corporation, CUDA C++ programming guide, Release 12.5, May 20,
2024. [Online]. Available:
https://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf
(visited on 06/07/2024).
[38] Í. Quílez. “Working with ellipses.” (2006), [Online]. Available:
https://iquilezles.org/articles/ellipses/ (visited on 05/23/2024).
[39] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, “The unreasonable
effectiveness of deep features as a perceptual metric,” in 2018 IEEE/CVF
Conference on Computer Vision and Pattern Recognition, ISSN: 2575-7075,
Jun. 2018, pp. 586–595. doi: 10.1109/CVPR.2018.00068. [Online]. Available:
https://ieeexplore.ieee.org/document/8578166 (visited on 04/12/2024).
[40] B. Kerbl, G. Kopanas, T. Leimkuehler, and G. Drettakis, SIBR gaussian
viewer, Nov. 1, 2023. [Online]. Available:
https://gitlab.inria.fr/sibr/sibr_core/-/tree/4ae964a/src/projects/gaussianviewer
(visited on 06/07/2024).
[41] B. Jones, K. Ninomiya, and J. Blandy, “WebGPU: Timestamp query,” W3C,
W3C Working Draft, Jun. 2024. [Online]. Available:
https://www.w3.org/TR/2024/WD-webgpu-20240606/#timestamp.
[42] F. Beaufort. “What’s new in WebGPU (Chrome 120),” Chrome for Developers.
(Dec. 8, 2023), [Online]. Available:
https://developer.chrome.com/blog/new-in-webgpu-120#timestamp_queries_quantization
(visited on 06/07/2024).
[43] MDN contributors. “GPUCommandEncoder: writeTimestamp(),” Web APIs
| MDN. (Mar. 30, 2024), [Online]. Available:
https://developer.mozilla.org/en-US/docs/Web/API/GPUCommandEncoder/writeTimestamp
(visited on 06/07/2024).
[44] F. Beaufort. “Remove GPUCommandEncoder.writeTimestamp,” GitHub.
(Nov. 23, 2023), [Online]. Available:
https://github.com/gpuweb/gpuweb/commit/6402899da70eed1379ec002e37d7e6e2273d09f9
(visited on 06/07/2024).
[45] F. Beaufort. “Gate GPUCommandEncoder.writeTimestamp behind
allow_unsafe_apis,” Google Git - Dawn. (Nov. 9, 2023), [Online]. Available:
https://dawn.googlesource.com/dawn/+/d61514719334478955a230b597f57efec273e983
(visited on 06/07/2024).
[46] G. Tavares, Webgpu-memory, version 1.4.2, Oct. 15, 2023. [Online]. Available:
https://www.npmjs.com/package/webgpu-memory/v/1.4.2 (visited on
06/07/2024).
A Additional Evaluation Results
A.1 Web-Based Renderer
A.1.1 Run-Time Performance for Medium Views
Table A.1: Run-time performance of the web-based renderers compared to the
3DGS renderer for medium views of each scene. The “Time” metric denotes GPU
frame time in milliseconds and the “Mem” metric denotes approximate maximum
GPU memory usage in mebibytes. The colored backgrounds indicate the 1st,
2nd, and 3rd “best” method for each metric and scene.

Method          Chair       Drums       Ficus       Hotdog      Lego        Materials   Mic         Ship        Avg.
                 Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem
3DGS             3.84  585   3.51  630   2.58  382   2.05  345   2.90  515   2.59  486   3.06  433   2.94  565   2.94  493
3DGS-web         5.04  398   4.63  419   2.54  326   2.44  300   3.80  369   2.78  367   3.26  346   4.27  388   3.59  365
3DGS-web-opt     3.96  398   3.81  419   2.17  326   1.87  300   2.95  369   2.37  367   2.60  346   3.11  388   2.85  365
geometry         2.35  158   2.23  177   1.34   91   2.08   68   2.34  131   1.52  129   1.41  110   2.89  149   2.02  127
geometry-opt     1.68  158   1.74  177   1.18   91   1.47   68   1.63  131   1.22  129   1.33  110   1.80  149   1.51  127
A.1.2 High-Resolution Output Images
The following results show the image quality (Table A.2) and run-time performance
(Table A.3) for the web-based renderers and the 3DGS renderer when using output
image dimensions of 2000 × 2000 pixels.
Table A.2: Image similarity of rendered images from the web-based renderers
compared to the 3DGS renderer averaged over all test cases, but using output
image dimensions of 2000 × 2000 pixels. For the three objective image similarity
metrics SSIM, PSNR, and LPIPS, the upward and downward arrows indicate whether
larger or smaller values, respectively, correspond to higher similarity. The colored
backgrounds indicate the 1st, 2nd, and 3rd “best” method for each metric.

Method         SSIM↑   PSNR↑   LPIPS↓
3DGS-web       0.964   49.32   0.033
3DGS-web-opt   0.999   54.29   0.002
geometry       1.000   65.73   0.000
geometry-opt   1.000   57.82   0.000
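
As a reading aid for the PSNR column, the following minimal Python sketch shows why larger values indicate higher similarity. It uses the standard definition PSNR = 10 log10(peak² / MSE). The function name `psnr`, the flat-list image representation, and the assumed pixel range of [0, 1] are illustrative choices and are not taken from the evaluation code used to produce Table A.2.

```python
import math


def psnr(reference, rendered, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists.

    Assumes (hypothetically) that pixel values lie in [0, peak].
    """
    if not reference or len(reference) != len(rendered):
        raise ValueError("images must be non-empty and of equal size")
    # Mean squared error over all pixel samples.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy data: `close` deviates less from `reference` than `far` does.
reference = [0.0, 0.25, 0.5, 1.0]
close = [0.0, 0.26, 0.5, 1.0]
far = [0.1, 0.35, 0.4, 0.9]

psnr_close = psnr(reference, close)
psnr_far = psnr(reference, far)
```

In this toy example the image with the smaller per-pixel deviation receives the larger PSNR, which is the direction indicated by the upward arrow in the table header.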
Table A.3: Run-time performance of the web-based renderers compared to the
3DGS renderer, but using output image dimensions of 2000 × 2000 pixels. The
“Time” metric denotes GPU frame time in milliseconds and the “Mem” metric
denotes approximate maximum GPU memory usage in mebibytes. The colored
backgrounds indicate the 1st, 2nd, and 3rd “best” method for each metric
and scene.

Method          Chair       Drums       Ficus       Hotdog      Lego        Materials   Mic         Ship        Avg.
                 Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem
3DGS             6.43 1430   5.40 1211   3.89  988   4.52  960   5.86 1368   3.38  897   3.76  986   7.93 1680   5.15 1191
3DGS-web        10.99  449   9.85  470   6.15  377   8.14  352  10.31  420   5.50  418   6.49  398  11.89  440   8.67  416
3DGS-web-opt     6.90  449   5.72  470   3.54  377   4.19  352   6.39  420   3.28  418   3.34  398   8.84  440   5.27  416
geometry        14.85  222   9.93  241   5.10  156  12.03  132  14.56  195   4.67  193   5.74  175  23.38  213  11.28  191
geometry-opt     5.89  222   4.71  241   2.57  156   4.60  132   5.81  195   2.53  193   2.50  175   8.91  213   4.69  191
(a) All test cases

Method          Chair       Drums       Ficus       Hotdog      Lego        Materials   Mic         Ship        Avg.
                 Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem
3DGS             9.88 1430   7.44 1211   5.30  988   7.44  960   9.53 1368   4.00  897   3.85  986  14.85 1680   7.79 1191
3DGS-web        17.58  449  15.99  470  11.64  377  14.18  352  16.99  420   8.86  418  10.88  398  18.49  440  14.33  416
3DGS-web-opt    11.53  449   8.49  470   5.29  377   7.22  352  10.79  420   3.98  418   3.87  398  16.70  440   8.48  416
geometry        35.51  222  22.24  241  11.39  156  26.08  132  33.81  195   9.44  193  12.91  175  56.41  213  25.97  191
geometry-opt    12.98  222   9.59  241   4.71  156   8.98  132  12.64  195   4.27  193   4.37  175  20.47  213   9.75  191
(b) Only test cases with close-up views

Method          Chair       Drums       Ficus       Hotdog      Lego        Materials   Mic         Ship        Avg.
                 Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem
3DGS             4.47  952   4.22  979   2.90  702   3.72  762   4.63  914   3.22  807   2.77  748   5.49  982   3.93  856
3DGS-web         9.87  449   8.53  470   3.74  377   8.05  352  10.28  420   4.55  418   3.91  398  13.06  440   7.75  416
3DGS-web-opt     4.70  449   4.52  470   2.66  377   3.44  352   5.19  420   3.19  418   2.60  398   6.60  440   4.11  416
geometry         7.04  222   5.55  241   2.74  156   8.51  132   8.11  195   3.21  193   3.02  175  11.66  213   6.23  191
geometry-opt     3.12  222   2.89  241   1.76  156   3.62  132   3.41  195   2.01  193   1.80  175   4.80  213   2.93  191
(c) Only test cases with medium views

Method          Chair       Drums       Ficus       Hotdog      Lego        Materials   Mic         Ship        Avg.
                 Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem   Time  Mem
3DGS             4.95  844   4.55  891   3.48  650   2.39  598   3.42  768   2.91  753   4.67  700   3.44  824   3.73  754
3DGS-web         5.51  449   5.04  470   3.09  377   2.20  352   3.67  420   3.09  418   4.69  398   4.12  440   3.92  416
3DGS-web-opt     4.48  449   4.14  470   2.67  377   1.93  352   3.19  420   2.67  418   3.54  398   3.21  440   3.23  416
geometry         2.02  222   2.00  241   1.16  156   1.49  132   1.77  195   1.36  193   1.29  175   2.08  213   1.65  191
geometry-opt     1.58  222   1.66  241   1.25  156   1.21  132   1.39  195   1.30  193   1.32  175   1.46  213   1.40  191
(d) Only test cases with wide views