CUDA Path Tracer

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3

Weiyu Du
Tested on: CETS Virtual Lab

Part 2

Refraction

Refraction rendering with Fresnel effects using Schlick's approximation
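As a rough illustration of Schlick's approximation, the reflectance is estimated as R = R0 + (1 - R0)(1 - cos θ)^5, with R0 = ((n1 - n2) / (n1 + n2))^2. The sketch below is plain C++ rather than the project's CUDA kernel code, and the function name `schlickReflectance` and its parameters are hypothetical, not taken from the repository.

```cpp
#include <cassert>
#include <cmath>

// Schlick's approximation of the Fresnel reflectance (hypothetical helper).
// cosTheta: cosine of the angle between the incoming ray and the surface normal.
// etaI, etaT: indices of refraction on the incident and transmitted sides.
inline float schlickReflectance(float cosTheta, float etaI, float etaT) {
    assert(cosTheta >= 0.0f && cosTheta <= 1.0f);
    float r0 = (etaI - etaT) / (etaI + etaT);
    r0 = r0 * r0;                       // reflectance at normal incidence
    float m = 1.0f - cosTheta;
    return r0 + (1.0f - r0) * m * m * m * m * m;
}
```

A common way to use this value in a path tracer is to reflect the ray when a uniform random number falls below the reflectance and to refract it otherwise; whether this project does exactly that is an assumption.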

Depth of Field

From left to right: focus on foreground, focus on background
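Depth of field is commonly implemented with a thin-lens camera model; the sketch below assumes that model, which the text above does not confirm. It works in camera space with the camera looking along +z, and the names `thinLensRay`, `lensRadius`, and `focalDistance` are hypothetical. Every ray generated for a pixel passes through the same point on the plane of focus, so objects at that distance stay sharp while others blur.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin; Vec3 dir; };

// Turn a pinhole ray (starting at the camera origin) into a thin-lens ray.
// u1, u2 are uniform random numbers in [0, 1] that pick a point on the lens.
inline Ray thinLensRay(Vec3 pinholeDir, float lensRadius, float focalDistance,
                       float u1, float u2) {
    assert(pinholeDir.z > 0.0f);
    // Point where the pinhole ray crosses the plane of focus z = focalDistance.
    float t = focalDistance / pinholeDir.z;
    Vec3 focus = { pinholeDir.x * t, pinholeDir.y * t, focalDistance };
    // Uniformly sample a point on the lens disk (polar mapping).
    float r = lensRadius * std::sqrt(u1);
    float phi = 2.0f * 3.14159265f * u2;
    Vec3 o = { r * std::cos(phi), r * std::sin(phi), 0.0f };
    // Aim the new ray from the lens sample at the focus point.
    Vec3 d = { focus.x - o.x, focus.y - o.y, focus.z - o.z };
    float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    return { o, { d.x / len, d.y / len, d.z / len } };
}
```

Under this model, changing `focalDistance` moves the sharp region between foreground and background, and a larger `lensRadius` gives a stronger blur.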

Stochastic Sampled Antialiasing

From left to right: rendering with antialiasing, rendering without antialiasing. Zoom in to see the difference along the edge of the sphere: the left image has a smooth edge, while the right one is more jagged.
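A minimal sketch of stochastic antialiasing, assuming the usual approach of jittering the sample position inside each pixel on every iteration. It is host-side C++ using `std::mt19937`; a CUDA kernel would use a device-side random number generator instead, and `jitterPixel` is a hypothetical name.

```cpp
#include <cassert>
#include <random>

struct PixelSample { float x, y; };

// Pick a random position inside pixel (px, py) instead of always using its center.
inline PixelSample jitterPixel(int px, int py, std::mt19937& rng) {
    assert(px >= 0 && py >= 0);
    std::uniform_real_distribution<float> u01(0.0f, 1.0f);
    float jx = u01(rng);   // horizontal offset within the pixel
    float jy = u01(rng);   // vertical offset within the pixel
    return { static_cast<float>(px) + jx, static_cast<float>(py) + jy };
}
```

Because each iteration shoots the camera ray through a slightly different point, averaging many iterations blends the colors on both sides of an edge, which is what smooths the silhouette of the sphere.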

Arbitrary OBJ Mesh Loader

Performance comparison for bounding volume intersection culling (measured in time per iteration):

OBJ file	bounding volume intersection culling	naive implementation
Sphere	98.122	129.479
Wahoo	1068.55	1453.84
Stanford Bunny	11970.6	22964.9

Bounding volume intersection culling consistently reduces the run time per iteration across all three OBJ files. The more vertices an OBJ file has, the larger the improvement: the run time drops by roughly 24% for the sphere, 27% for Wahoo, and 48% for the Stanford Bunny.
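The culling idea can be sketched as a ray versus axis-aligned bounding box test run before any triangle tests. The code below uses the standard slab method and assumes the bounding volume is an axis-aligned box around the whole mesh; the names `AABB` and `rayHitsAABB` are hypothetical and the project's actual test may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <utility>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 min, max; };

// Slab test: returns true if the ray origin + t * dir (t >= 0) overlaps the box.
inline bool rayHitsAABB(Vec3 origin, Vec3 dir, const AABB& box) {
    assert(box.min.x <= box.max.x && box.min.y <= box.max.y && box.min.z <= box.max.z);
    const float o[3]  = { origin.x, origin.y, origin.z };
    const float d[3]  = { dir.x, dir.y, dir.z };
    const float lo[3] = { box.min.x, box.min.y, box.min.z };
    const float hi[3] = { box.max.x, box.max.y, box.max.z };
    float tMin = 0.0f;
    float tMax = std::numeric_limits<float>::infinity();
    for (int axis = 0; axis < 3; ++axis) {
        // A zero direction component gives +/- infinity here (IEEE floats),
        // which the min/max logic below handles.
        float inv = 1.0f / d[axis];
        float t0 = (lo[axis] - o[axis]) * inv;
        float t1 = (hi[axis] - o[axis]) * inv;
        if (t0 > t1) std::swap(t0, t1);
        tMin = std::max(tMin, t0);   // latest entry into a slab
        tMax = std::min(tMax, t1);   // earliest exit from a slab
        if (tMin > tMax) return false;
    }
    return true;
}
```

Only rays that pass this test need to be checked against the mesh triangles, which would explain why the saving grows with the triangle count.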

Stratified Sampling

Comparison of stratified sampling (10x10 grid, left) and uniform random sampling (right) at 5000 iterations

Comparison of stratified sampling (10x10 grid, left) and uniform random sampling (right) at 100 iterations
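A minimal sketch of how a jittered 10x10 stratified sample could be generated, assuming each iteration is assigned to one grid cell and jittered inside it. The text above does not say which quantity is stratified or how cells are assigned, so both are assumptions, and `stratifiedSample` is a hypothetical name.

```cpp
#include <cassert>

struct Sample2D { float u, v; };

// Map a sample index to a cell of a gridSize x gridSize grid over [0, 1]^2,
// then place the sample inside that cell using the jitter values (each in [0, 1]).
inline Sample2D stratifiedSample(int index, int gridSize, float jitterU, float jitterV) {
    assert(gridSize > 0 && index >= 0);
    int cell = index % (gridSize * gridSize);  // wrap around after one full pass
    int cx = cell % gridSize;                  // column of the cell
    int cy = cell / gridSize;                  // row of the cell
    return { (cx + jitterU) / gridSize, (cy + jitterV) / gridSize };
}
```

Compared with purely uniform random numbers, this spreads the samples evenly over the unit square, which tends to matter most at low iteration counts such as the 100-iteration comparison.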

Motion Blur

Defined motion in scene file

User input camera motion (the user drags the camera while rendering)
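For motion defined in a scene file, one simple approach is to sample a random shutter time on each iteration and move the object to its position at that time. The sketch below assumes a linear translation between a start and an end position; the scene file format is not described here, so that is an assumption, and `positionAtTime` is a hypothetical name.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Position of a moving object at shutter time t in [0, 1],
// linearly interpolated between its start and end positions.
inline Vec3 positionAtTime(Vec3 start, Vec3 end, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    return { start.x + t * (end.x - start.x),
             start.y + t * (end.y - start.y),
             start.z + t * (end.z - start.z) };
}
```

Averaging iterations rendered at different times produces the blurred trail along the motion path.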

Part 1

Render Result

Analysis

Plot of elapsed time per iteration versus max ray depth (timed when sorting_material set to true)

We expected that sorting the rays/path segments by material would improve performance, because it makes threads in the same warp more likely to finish at around the same time, reducing the time they spend waiting on each other. In practice, however, rendering without sorting is significantly faster. This may be because the scene does not contain many different materials. Since we sort the entire set of rays, the sort takes much more time than it saves.
From the plot above, we see that increasing the max ray depth results in a longer run time per iteration. Rendering with the first-bounce cache is consistently faster than rendering without it, though not by a large margin. This is expected, as caching saves time by avoiding the initial intersection computation.
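The material sorting discussed above can be sketched on the CPU as a keyed sort that reorders path segments and their intersections together. On the GPU this would typically be done with a library routine such as Thrust's `sort_by_key`; the struct layouts and the name `sortByMaterial` below are hypothetical, not the project's actual types.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

struct PathSegment  { int pixelIndex; int remainingBounces; };
struct Intersection { float t; int materialId; };

// Reorder paths and hits together so entries with the same material are contiguous.
inline void sortByMaterial(std::vector<PathSegment>& paths,
                           std::vector<Intersection>& hits) {
    assert(paths.size() == hits.size());
    // Sort a permutation of indices by material id.
    std::vector<int> order(paths.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(), [&](int a, int b) {
        return hits[a].materialId < hits[b].materialId;
    });
    // Apply the permutation to both arrays.
    std::vector<PathSegment> sortedPaths;
    std::vector<Intersection> sortedHits;
    sortedPaths.reserve(paths.size());
    sortedHits.reserve(hits.size());
    for (int i : order) {
        sortedPaths.push_back(paths[i]);
        sortedHits.push_back(hits[i]);
    }
    paths.swap(sortedPaths);
    hits.swap(sortedHits);
}
```

The sort touches every active path, so its cost is paid regardless of how many distinct materials exist. That is consistent with the observation that it does not pay off in a scene with few materials.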
