WeiyuDu/Project3-CUDA-Path-Tracer

CUDA Path Tracer

University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3

  • Weiyu Du
  • Tested on: CETS Virtual Lab

Part 2

Refraction

Refraction rendered with Fresnel effects using Schlick's approximation.
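As an illustration of the idea, here is a minimal host-side C++ sketch of Schlick's approximation. The function names `schlickReflectance` and `chooseReflection` are hypothetical and are not taken from this repository; total internal reflection is not handled here and would need a separate check.

```cpp
#include <cmath>

// Schlick's approximation of the Fresnel reflectance.
// cosTheta: cosine of the angle between the incident direction and the
//           surface normal, expected in [0, 1].
// n1, n2:   refractive indices of the incident and transmitted media.
inline float schlickReflectance(float cosTheta, float n1, float n2) {
    float r0 = (n1 - n2) / (n1 + n2);
    r0 *= r0;                       // reflectance at normal incidence
    float m = 1.0f - cosTheta;
    return r0 + (1.0f - r0) * m * m * m * m * m;
}

// Stochastically pick reflection over refraction.
// u is assumed to be a uniform random number in [0, 1).
inline bool chooseReflection(float cosTheta, float n1, float n2, float u) {
    return u < schlickReflectance(cosTheta, n1, n2);
}
```

For an air-to-glass interface (n1 = 1.0, n2 = 1.5), the reflectance is about 4% at normal incidence and approaches 100% at grazing angles, which is what produces the brighter rim on refractive spheres.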

Depth of Field

From left to right: focus on foreground, focus on background
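Depth of field is commonly implemented with a thin-lens camera model. The sketch below shows that general technique under stated assumptions; it is not necessarily how this project does it. `Vec3`, `Ray`, `thinLensRay`, `lensRadius`, and `focalDistance` are hypothetical names, and the focus point is taken along the pinhole ray at `focalDistance` for simplicity.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
inline Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return v * (1.0f / len);
}

struct Ray { Vec3 origin; Vec3 direction; };

// Turn a pinhole camera ray into a thin-lens ray.
// right, up: unit vectors spanning the lens plane.
// u1, u2:    uniform random numbers in [0, 1).
inline Ray thinLensRay(Ray pinhole, Vec3 right, Vec3 up,
                       float lensRadius, float focalDistance,
                       float u1, float u2) {
    // Uniformly sample a point on the lens disk.
    float r = lensRadius * std::sqrt(u1);
    float phi = 6.2831853f * u2;
    Vec3 offset = right * (r * std::cos(phi)) + up * (r * std::sin(phi));

    // Every lens sample aims at the same point, so that point stays sharp.
    Vec3 focusPoint = pinhole.origin + pinhole.direction * focalDistance;
    Vec3 origin = pinhole.origin + offset;
    return {origin, normalize(focusPoint - origin)};
}
```

With this model, changing `focalDistance` moves the sharp region between foreground and background, while a larger `lensRadius` increases the blur elsewhere.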

Stochastic Sampled Antialiasing

From left to right: rendering with antialiasing, rendering without antialiasing. Zoom in on the edge of the sphere to see the difference: the left image has a smooth edge, while the right one is more jagged.
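Stochastic antialiasing is usually achieved by jittering the camera ray inside each pixel on every iteration, so that the accumulated image averages over the pixel footprint. A minimal sketch, assuming a hypothetical `jitteredSample` helper and a host-side `std::mt19937` generator in place of a per-thread GPU random number generator:

```cpp
#include <random>

struct PixelSample { float x; float y; };

// Return a sample position inside pixel (px, py), jittered uniformly.
// Without antialiasing, the sample would always be the pixel center
// (px + 0.5, py + 0.5).
inline PixelSample jitteredSample(int px, int py, std::mt19937& rng) {
    std::uniform_real_distribution<float> u01(0.0f, 1.0f);
    float x = px + u01(rng);
    float y = py + u01(rng);
    return {x, y};
}
```

Because each iteration shoots the ray through a different sub-pixel position, edge pixels converge to a blend of the sphere and the background instead of a hard step.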

Arbitrary OBJ Mesh Loader

Performance comparison for bounding volume intersection culling (measured in time per iteration):

| OBJ file | With bounding volume intersection culling | Naive implementation |
| --- | --- | --- |
| Sphere | 98.122 | 129.479 |
| Wahoo | 1068.55 | 1453.84 |
| Stanford Bunny | 11970.6 | 22964.9 |

We observe that this optimization consistently reduces the run time per iteration across the different OBJ files. The more vertices an OBJ file has, the larger the improvement from bounding volume intersection culling: roughly 24% for the sphere, 27% for Wahoo, and 48% for the Stanford Bunny.
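The culling idea can be sketched as a ray versus axis-aligned bounding box (AABB) slab test that runs before any per-triangle test. This is a minimal sketch that assumes the bounding volume is an AABB; `Vec3`, `AABB`, and `rayHitsAABB` are hypothetical names, not identifiers from this repository.

```cpp
#include <algorithm>
#include <limits>
#include <utility>

struct Vec3 { float x, y, z; };
struct AABB { Vec3 min; Vec3 max; };

// Slab test: returns true if the ray (origin o, direction d) hits the box
// at some parameter t >= 0.
inline bool rayHitsAABB(Vec3 o, Vec3 d, const AABB& box) {
    const float orig[3] = {o.x, o.y, o.z};
    const float dir[3]  = {d.x, d.y, d.z};
    const float lo[3]   = {box.min.x, box.min.y, box.min.z};
    const float hi[3]   = {box.max.x, box.max.y, box.max.z};

    float tMin = 0.0f;
    float tMax = std::numeric_limits<float>::max();
    for (int a = 0; a < 3; ++a) {
        if (dir[a] == 0.0f) {
            // Ray is parallel to this slab: it must start inside it.
            if (orig[a] < lo[a] || orig[a] > hi[a]) return false;
            continue;
        }
        float inv = 1.0f / dir[a];
        float t0 = (lo[a] - orig[a]) * inv;
        float t1 = (hi[a] - orig[a]) * inv;
        if (t0 > t1) std::swap(t0, t1);
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMax < tMin) return false;  // slab intervals do not overlap
    }
    return true;
}
```

A mesh intersection routine would call `rayHitsAABB` first and loop over the triangles only when it returns true, so rays that miss the mesh entirely cost one box test instead of one test per triangle.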

Stratified Sampling

  1. Comparison of stratified sampling (10x10 grid, left) and uniform random sampling (right) at 5000 iterations

  2. Comparison of stratified sampling (10x10 grid, left) and uniform random sampling (right) at 100 iterations
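A stratified sampler on an N x N grid can be sketched as follows: each sample index is mapped to one grid cell, and the sample is jittered inside that cell. `Sample2D` and `stratifiedSample` are hypothetical names, and how the resulting (u, v) pair is used (for example, to pick a scattering direction) is an assumption left to the caller.

```cpp
#include <random>

struct Sample2D { float u; float v; };

// Return a point in [0, 1)^2 that lies in the grid cell selected by index.
// Successive indices walk through all gridSize * gridSize cells in turn.
inline Sample2D stratifiedSample(int index, int gridSize, std::mt19937& rng) {
    std::uniform_real_distribution<float> u01(0.0f, 1.0f);
    int cell = index % (gridSize * gridSize);
    int cx = cell % gridSize;   // column of the cell
    int cy = cell / gridSize;   // row of the cell
    float u = (cx + u01(rng)) / gridSize;
    float v = (cy + u01(rng)) / gridSize;
    return {u, v};
}
```

With `gridSize = 10`, every block of 100 consecutive samples covers each cell exactly once, which spreads samples more evenly than uniform random sampling; this tends to matter most at low iteration counts.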

Motion Blur

  1. Motion defined in the scene file

  2. User-driven camera motion (the user drags the camera while rendering)
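Motion blur for a moving object is typically obtained by sampling a random time within the shutter interval on each iteration and placing the object at the corresponding position. The sketch below assumes linear motion between a start and an end position; the scene file format used by this project is not shown here, so `positionAtTime` and `sampleShutterTime` are hypothetical helpers.

```cpp
#include <random>

struct Vec3 { float x, y, z; };

// Position of an object moving linearly from start (t = 0) to end (t = 1).
inline Vec3 positionAtTime(Vec3 start, Vec3 end, float t) {
    return {start.x + (end.x - start.x) * t,
            start.y + (end.y - start.y) * t,
            start.z + (end.z - start.z) * t};
}

// Pick a random time within the normalized shutter interval.
inline float sampleShutterTime(std::mt19937& rng) {
    std::uniform_real_distribution<float> u01(0.0f, 1.0f);
    return u01(rng);
}
```

Averaging many iterations, each rendered with the object at `positionAtTime(start, end, sampleShutterTime(rng))`, smears the object along its path.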

Part 1

Render Result

Analysis

  1. Plot of elapsed time per iteration versus max ray depth (timed when sorting_material set to true)

  • We expected that sorting the rays/path segments by material would improve performance, because threads in the same warp would then be more likely to finish at around the same time, reducing the time they spend waiting on each other. In practice, however, rendering without sorting was significantly faster. This may be because the scene does not contain a wide variety of materials. Since we sort the entire set of rays, the sort takes much more time than it saves.
  • From the plot above we see that increasing the max ray depth results in a longer run time per iteration. Rendering with the first-bounce cache is consistently faster than rendering without it, though not by a large margin. This is expected, as the cache saves time by avoiding the initial intersection computation.
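The material sort discussed above can be illustrated with a host-side analogue. This sketch uses `std::stable_sort` on a hypothetical `PathSegment` struct; a GPU implementation would more likely use a key-value sort such as Thrust's `thrust::sort_by_key`, and which call this project uses is not stated here.

```cpp
#include <algorithm>
#include <vector>

struct PathSegment {
    int pixelIndex;   // pixel this path contributes to
    int materialId;   // material hit at the current bounce
};

// Group paths by material so that neighboring entries (and therefore
// neighboring GPU threads) would shade the same material.
inline void sortByMaterial(std::vector<PathSegment>& paths) {
    std::stable_sort(paths.begin(), paths.end(),
                     [](const PathSegment& a, const PathSegment& b) {
                         return a.materialId < b.materialId;
                     });
}
```

The sort itself costs O(n log n) comparisons on the host (or a full radix sort pass on the GPU) per bounce, which helps explain why it can outweigh the divergence it removes in a scene with few materials.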
