
Commit 94ab78d

Merge pull request #266 from yaw2014/master
Pull request for NeRF course notes
2 parents 9ed75ce + 9e95fb1 commit 94ab78d

File tree

5 files changed

+145
-0
lines changed


CS_231n__NeRF_write_up.pdf

1.02 MB

assets/NeRFresults.png

259 KB

assets/fourier.png

370 KB

assets/raydiagram.png

153 KB

nerf.md

Lines changed: 145 additions & 0 deletions

Tl;dr What is NeRF and what does it do
======================================

NeRF stands for Neural Radiance Fields. It solves view interpolation: taking a set of input views (in this case a sparse set) and synthesizing novel views of the same scene. Existing RGB volume rendering models are great for optimization but require extensive storage space (1-10 GB). One side benefit of NeRF is that the weights generated by the neural network are $\sim$6000$\times$ smaller than the original images.

Helpful Terminology
===================

**Rasterization**: Computer graphics uses this technique to display a 3D object on a 2D screen. Objects on the screen are built from virtual triangles/polygons that form 3D models. The computer converts these triangles into pixels, each of which is assigned a color. Overall, this is a computationally intensive process.\
**Ray Tracing**: In the real world, the 3D objects we see are illuminated by light, which may be blocked, reflected, or refracted. Ray tracing captures those effects. It is also computationally intensive, but it creates more realistic results.\
**Ray**: A ray is a line cast from the camera center, whose position is determined by the camera position parameters and whose direction is determined by the camera angle.\
**NeRF uses ray tracing rather than rasterization for its models.**\
**Neural Rendering**: As of 2020/2021, this term is used when a neural network is a black box that models the geometry of the world and a graphics engine renders it. Other terms commonly used are *scene representations* and, less frequently, *implicit representations*. In this case, the neural network is just a flexible function approximator and the rendering machine does not learn at all.

Approach
========

A continuous scene is represented as a function whose input is a 3D location *x* = (x, y, z) and a 2D viewing direction $(\theta,\phi)$, and whose output is an emitted color c = (r, g, b) and a volume density $\sigma$. The density at each point acts like a differential opacity controlling how much radiance is accumulated by a ray passing through point *x*. In other words, an opaque surface will have a density of $\infty$ while a transparent surface has $\sigma = 0$. In layman's terms, the neural network is a black box that is repeatedly asked what the color and the density are at a given point, and it provides responses such as "red, dense."\
This neural network is wrapped in volumetric ray tracing: you start at the back of the ray (furthest from you) and walk closer, querying the color and density. The expected color $C(r)$ of a camera ray $r(t) = o + td$ with near and far bounds $t_n$ and $t_f$ is calculated as follows:

$$C(r) = \int_{t_n}^{t_f} T(t)\,\sigma(r(t))\,c(r(t), d)\,dt, \quad \text{where} \quad T(t) = \exp\!\left(-\int_{t_n}^{t}\sigma(r(s))\,ds\right)$$

To actually calculate this, the authors use a stratified sampling approach: partition $[t_n, t_f]$ into $N$ evenly spaced bins and draw one sample uniformly from each bin:

$$\hat{C}(r) = \sum_{i=1}^{N} T_{i}\,(1 - \exp(-\sigma_{i}\delta_{i}))\,c_{i}, \quad \text{where} \quad T_{i} = \exp\!\Big(-\sum_{j=1}^{i-1}\sigma_{j}\delta_{j}\Big)$$

where $\delta_{i} = t_{i+1} - t_{i}$ is the distance between adjacent samples.
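
To make the quadrature concrete, here is a minimal NumPy sketch of the stratified sampling and the $\hat{C}(r)$ sum. The helper names (`stratified_samples`, `render_ray`) are hypothetical, not the authors' code:

    import numpy as np

    def stratified_samples(t_n, t_f, N, rng=np.random):
        # Partition [t_n, t_f] into N even bins; draw one uniform sample per bin.
        edges = np.linspace(t_n, t_f, N + 1)
        return edges[:-1] + rng.uniform(size=N) * (edges[1:] - edges[:-1])

    def render_ray(t, sigma, rgb):
        # t: (N,) sample depths; sigma: (N,) densities; rgb: (N, 3) colors.
        delta = np.diff(t, append=1e10)       # delta_i = t_{i+1} - t_i (last ~infinite)
        alpha = 1.0 - np.exp(-sigma * delta)  # opacity contributed by each segment
        # T_i = exp(-sum_{j<i} sigma_j * delta_j): transmittance reaching sample i
        T = np.exp(-np.concatenate([[0.0], np.cumsum(sigma * delta)[:-1]]))
        weights = T * alpha                   # w_i from the sum above
        return weights @ rgb                  # expected color C_hat(r)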

The volume rendering is differentiable, so you can train the model by minimizing the rendering loss:

$$\min_{\Theta}\sum_{i}\left\| \mathrm{render}_{i}(F_{\Theta}) - I_{i}\right\|^{2}$$
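
As a hypothetical sketch of that objective (assuming a `render` function that bundles $F_\Theta$ with `render_ray` above for a batch of rays):

    def rendering_loss(params, rays, target_rgb):
        pred_rgb = render(params, rays)              # (num_rays, 3) predicted colors
        return np.sum((pred_rgb - target_rgb) ** 2)  # squared L2 rendering loss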

![Figure 1: In this illustration taken from the paper, the five input variables are fed into the MLP to produce color and volume density. $F_\Theta$ has 9 layers and 256 channels.](assets/raydiagram.png)

In practice, the viewing direction is expressed as a Cartesian unit vector d. You can approximate this representation with an MLP, $F_\Theta : (x, d) \rightarrow (c, \sigma)$.\
**Why does NeRF use an MLP rather than a CNN?** A multilayer perceptron (MLP) is a feed-forward neural network. The model does not need to preserve every spatial feature, so a CNN is not necessary.
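
To show the shape contract, here is a toy NumPy forward pass for $F_\Theta$. The two-layer structure and the weight names `W1`, `W2` are hypothetical simplifications (the real MLP is much deeper):

    def f_theta(x, d, W1, W2):
        h = np.maximum(0.0, np.concatenate([x, d]) @ W1)  # ReLU hidden layer
        out = h @ W2                                      # 4 outputs: r, g, b, sigma
        c = 1.0 / (1.0 + np.exp(-out[:3]))                # colors squashed to [0, 1]
        sigma = np.maximum(0.0, out[3])                   # density is non-negative
        return c, sigma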

Common issues and mitigation
============================

A naive implementation of a neural radiance field produces blurry results. To fix this, the 5D coordinates are transformed with a positional encoding (terminology borrowed from the transformer literature). $F_\Theta$ is the composition of two functions, $F_\Theta = F'_\Theta \circ \gamma$, which significantly improves performance:

$$\gamma(p) = \big(\sin(2^{0}\pi p), \cos(2^{0}\pi p), \ldots, \sin(2^{L-1}\pi p), \cos(2^{L-1}\pi p)\big)$$

$L$ determines how many frequency levels there are in the positional encoding, and it is used for regularizing NeRF (low $L$ = smooth). This is also known as a Fourier feature, and it turns your MLP into an interpolation tool. Another way of looking at this: a Fourier-feature-based neural network is just a tiny lookup table with extremely high resolution. Here is an example of applying Fourier features in your code:

    # Requires: import numpy as np; from flax import linen as nn
    # Random Fourier features: project the input with a random Gaussian matrix B,
    # then map it through sin/cos before the first dense layer.
    B = SCALE * np.random.normal(size=(input_dims, NUM_FEATURES))
    x = np.concatenate([np.sin(x @ B), np.cos(x @ B)], axis=-1)
    x = nn.Dense(features=256)(x)
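
For contrast with the random features above, here is a minimal NumPy sketch of the deterministic encoding $\gamma$ itself (the name `positional_encoding` is hypothetical):

    def positional_encoding(p, L):
        freqs = (2.0 ** np.arange(L)) * np.pi   # 2^0 pi, ..., 2^(L-1) pi
        angles = p[..., None] * freqs           # (..., dims, L) phase terms
        enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
        return enc.reshape(*p.shape[:-1], -1)   # flatten to (..., dims * 2L)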

![Figure 2: Mapping how Fourier features are related to NeRF's positional encoding. Taken from Jon Barron's CS 231n talk in Spring 2021.](assets/fourier.png)

NeRF also uses hierarchical volume sampling, with a coarse network and a fine network. This allows NeRF to run the model more efficiently and deprioritize areas of the camera ray that are free space or occluded. The coarse network uses $N_{c}$ sample points to evaluate the expected color of the ray with the stratified sampling. Based on these results, the samples are biased toward the more relevant parts of the volume:

$$\hat{C}_c(r) = \sum_{i=1}^{N_{c}} w_{i} c_{i}, \quad w_{i} = T_{i}\,(1 - \exp(-\sigma_{i}\delta_{i}))$$

A second set of $N_{f}$ locations is then sampled from this distribution using inverse transform sampling. This method allocates more samples to regions where we expect visible content.
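
As a rough NumPy illustration of that inverse transform step (`sample_fine` and its arguments are hypothetical names, not the released code):

    def sample_fine(t_bins, weights, N_f, rng=np.random):
        # Treat the normalized coarse weights as a piecewise-constant PDF over
        # the bins, then invert its CDF to place N_f fine samples along the ray.
        pdf = weights / weights.sum()
        cdf = np.concatenate([[0.0], np.cumsum(pdf)])
        u = rng.uniform(size=N_f)                        # uniform draws to invert
        idx = np.searchsorted(cdf, u, side="right") - 1  # bin containing each u
        frac = (u - cdf[idx]) / np.maximum(pdf[idx], 1e-10)
        return t_bins[idx] + frac * (t_bins[idx + 1] - t_bins[idx])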

Results
=======

The paper goes in depth on quantitative measures of the results, on which NeRF outperforms existing models. A visual comparison is shared below:

![Figure 3: Example of NeRF results versus existing SOTA results.](assets/NeRFresults.png)

Additional references
=====================

[What's the difference between ray tracing and rasterization?](https://blogs.nvidia.com/blog/2018/03/19/whats-difference-between-ray-tracing-rasterization/) Self-explanatory title; an excellent write-up helping the reader differentiate between the two concepts.\
[Matthew Tancik NeRF ECCV 2020 Oral](https://www.matthewtancik.com/nerf) Videos showcasing NeRF-produced images.\
[NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis](https://towardsdatascience.com/nerf-representing-scenes-as-neural-radiance-fields-for-view-synthesis-ef1e8cebace4) A simple alternative explanation of NeRF.\
[NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis](https://arxiv.org/pdf/2003.08934.pdf) The arXiv paper.\
[CS 231n Spring 2021 Jon Barron Guest Lecture](https://stanford-pilot.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=66a23f12-764c-4787-a48a-ad330173e4b5)
