Surface Normals & Properties Estimation on Point Sets
Point sets, or point clouds (PCs), are collections of points that represent a 3D shape. A point cloud is an inherently discontinuous primitive compared to triangle and volumetric mesh representations. PCs are often acquired without normals from 3D scanners for engineering, medical, or visual effects applications. Point normals are crucial for shading, determining connectivity among neighboring points, forming surfaces, and geometry processing operations such as denoising and resampling. The purpose of this project is to implement normal estimation techniques for point clouds and to explore the relevant surface parameters.
NB: Posts are excerpts from ongoing thesis reports.
Summary video of completed thesis
3 new edit brushes add new structured points & retopologize the existing pointcloud:
- Sharpen: adds creases like a NURBS weight, except that it works along a curve drawn on an arbitrary point surface. Existing features can be sharpened using the earlier brushes.
- Uniform: simple color line.
- Tapered: brush width grows & tapers by a remap function, similar to Photoshop shape fade brush.
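As a rough illustration of the Tapered brush, the width along a stroke can be written as a remap of the normalized stroke position. This is a minimal sketch, not the thesis code: the smoothstep remap and the `grow` and `fade` parameters are assumptions chosen for the example.

```python
def smoothstep(t):
    """Cubic remap of t, clamped to [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def tapered_width(t, max_width, grow=0.2, fade=0.3):
    """Brush width at normalized stroke position t in [0, 1].

    The width ramps up over the first `grow` fraction of the stroke and
    tapers off over the last `fade` fraction. Both knobs are hypothetical.
    """
    ramp_in = smoothstep(t / grow) if grow > 0 else 1.0
    ramp_out = smoothstep((1.0 - t) / fade) if fade > 0 else 1.0
    return max_width * min(ramp_in, ramp_out)


def uniform_width(t, max_width):
    """Uniform brush: the width ignores the stroke position."""
    return max_width
```

Swapping `smoothstep` for another remap curve changes the taper profile without touching the rest of the brush.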
Added real-time smoothing of the drawing curve. Editing in a raytracing GPGPU pipeline is a bit odd, so each brush has specific requirements scattered throughout the raytracer. For example, the smoothing must be real-time to smooth each latest point because the neighbors disappear once a ray finishes.
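One way to smooth each latest point without access to later neighbors is a causal filter that keeps only a running state. The sketch below uses an exponential moving average; the filter choice, the class name and `alpha` are assumptions for illustration, and the raytracer may smooth differently.

```python
class StrokeSmoother:
    """Causal smoothing: each new point is blended with the running state only."""

    def __init__(self, alpha=0.5):
        self.alpha = alpha  # 1.0 = no smoothing, smaller = smoother
        self.state = None   # last smoothed point, or None before the first push

    def push(self, point):
        """Feed the newest stroke point and return its smoothed position."""
        if self.state is None:
            self.state = tuple(float(c) for c in point)
        else:
            a = self.alpha
            self.state = tuple(a * c + (1.0 - a) * s
                               for c, s in zip(point, self.state))
        return self.state
```

Only the previous smoothed point is stored, so nothing needs to survive beyond the current stroke sample.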
The implicit surface updates in real time during editing.
Edit functions: change radius for smoothness or sharpness, draw & visualize point attributes, draw new points along surface or extend to fill holes.
The octree leaves artifacts during editing; the surface returns to normal after the file is saved and reopened. It's a tradeoff of speed for blockiness. I'm considering whether the brush effect can remain consistent across voxels without sharing data. Thread syncing is fast, but threads/rays on different SMs that hit adjacent voxels can't sync.
Adding read/write ability to data on GPU does cost some overhead, along with extra management of new points and changing data. Fortunately some tweaks and cleanup sped up the original timing, such that the view updates in realtime while editing.
The new points appear only after a stroke is drawn, since updating immediately lets point strokes fly off the surface. That would be good for drawing fur, like one of the tangent brushes in ZBrush, but the purpose here is to add points on the implicit surface.
Two other methods to control feature sharpness will be shown later if I can get them to cooperate with the editable setup.
Houdini's File SOP uses the external gply command (in the Houdini bin directory) to convert [b]geo geometry to PLY. Partly due to PLY's flexibility, the output isn't compatible with the PLY reader in Point Cloud Library (PCL), and possibly with other common PLY readers. The PLY_exporter I wrote today is an alternative that runs in a Python SOP node. The file header is kept in a simple heredoc variable in the script, so users can easily customize it to exactly what their target software requires.
It's bare-bones, so it assumes P[xyz] and N[xyz] point attributes. Primitives should behave the same as with gply. For my raytracer application, I transfer primitive normals to point normals, delete the primitives (faces), then export the points & normals. Actually, the normals are not necessary since the app can estimate them. In the figure, notice that most apps don't use Pw (weight) for polygon/point data, but there seems to be no way to stop Houdini from outputting Pw.
(Left: PLY_exporter. Right: gply)
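The idea of a header template plus one line per point can be sketched as below. This is not the PLY_exporter script itself: it takes plain Python lists instead of reading Houdini geometry, and the function names and the six-decimal formatting are assumptions.

```python
import io

# Header kept in one template string so it is easy to adapt per reader.
PLY_HEADER = """ply
format ascii 1.0
element vertex {count}
property float x
property float y
property float z
property float nx
property float ny
property float nz
end_header
"""


def write_ply(stream, positions, normals):
    """Write points with normals as ASCII PLY (no faces, no Pw)."""
    if len(positions) != len(normals):
        raise ValueError("positions and normals must have the same length")
    stream.write(PLY_HEADER.format(count=len(positions)))
    for (x, y, z), (nx, ny, nz) in zip(positions, normals):
        stream.write(f"{x:.6f} {y:.6f} {z:.6f} {nx:.6f} {ny:.6f} {nz:.6f}\n")


def ply_text(positions, normals):
    """Convenience wrapper that returns the PLY file content as a string."""
    buf = io.StringIO()
    write_ply(buf, positions, normals)
    return buf.getvalue()
```

Inside a Python SOP the two lists would be filled from the node's point attributes before calling `write_ply` on an open file.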
I'm not familiar with ways to share tools other than email, so the host could be slow:
xyzrgb dragon 1.2 million pointcloud
Count: 1.2 million points (1204929)
Model: XYZRGB dragon decimation (nonuniform point spacing)
Speed: similar to bunny model
Implicit surface & disk renders of pointcloud with normals, diffuse & shadow.
Octree construction memory decreased by 1/1600 to allow models greater than 400k points.
Normals estimation: 2 learning iterations
Normals unification: Multi-degree with metrics confidence, angle & residual.
Radius for shading: geometric pair fill & triangle fill
Reflection & Radius fix
Fixed major bugs in the shading radius. Speed is now almost constant regardless of point count.
Fixed a reflection bug. Larger models are slower in reflection.
Models: 36k, 435k & 1204k points.
Varying radius affects both shading & reflection smoothness.
The GTX 465 is quite old by now, but after the bug fixes a frame takes around 30ms at the default far view, then gets slower with more zoom. Reflection is about 4x slower, which seems reasonable with 5 rays per pixel.
Implicit surface: higher quality than disk render, more accurate surface silhouettes.
Raytracing at interactive rates (video sped up)
1) No octree
2) In octree, radius 0.1: takes 60% of the time, a very small speedup (compared to disk) since a smooth surface requires more points. A denser model will benefit more from the octree.
3) Radius 0.05: takes 36% of the time of radius 0.1, a significant speedup. A smaller radius shows fewer octree artifacts due to less overlap, but fewer points make it statistically less stable, as shown by the noise around the ears.
Clustering – Point Cloud Simplification
Iteratively project point x onto implicit surface, until x converges:
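The exact projection formula is not reproduced in this excerpt. As a hedged illustration only, the sketch below uses one common point-set implicit function, f(x) = n(x) · (x − a(x)), where a(x) is a Gaussian-weighted average of nearby points and n(x) the normalized weighted average of their normals, and steps x ← x − f(x) n(x) until |f| is small. The weight function, the `h` support size and all names are assumptions, not necessarily the definition used in the thesis.

```python
import math


def _weights(x, points, h):
    """Gaussian weight of every point relative to x (assumed kernel)."""
    return [math.exp(-sum((xc - pc) ** 2 for xc, pc in zip(x, p)) / (h * h))
            for p in points]


def project(x, points, normals, h=1.0, tol=1e-9, max_iter=50):
    """Iteratively move x onto the zero set of f(x) = n(x) . (x - a(x))."""
    x = tuple(float(c) for c in x)
    for _ in range(max_iter):
        w = _weights(x, points, h)
        total = sum(w)
        if total == 0.0:
            break  # no point has numerical influence on x
        # Weighted average position a(x) and weighted normal n(x).
        a = [sum(wi * p[k] for wi, p in zip(w, points)) / total for k in range(3)]
        n = [sum(wi * nrm[k] for wi, nrm in zip(w, normals)) for k in range(3)]
        length = math.sqrt(sum(c * c for c in n))
        if length == 0.0:
            break  # normals cancel out; direction undefined
        n = [c / length for c in n]
        f = sum(nc * (xc - ac) for nc, xc, ac in zip(n, x, a))
        x = tuple(xc - f * nc for xc, nc in zip(x, n))
        if abs(f) < tol:
            break  # converged
    return x
```

For points sampled from a plane with consistent normals, a single step already lands on the plane; curved regions need several iterations.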
Video details: Diffuse pass, 1 directional light.
1) Octree partition artifact: overlap points duplicated across octree partitions to remove holes.
2) Redundant points from different leaves: fixed to avoid shading an overlap point twice.
3) Some backface and frontface fixes (a bit hacky), so that the far frontface gets occluded by the near frontface, regardless of how large a leaf is.
4) Smooth Silhouette: removes the “disk” look.
Diffuse 01: previous artifacts
Octree Partition artifact:
Any duplicate points may cause discrete shading, which isn’t addressed in any papers that suggest redundant points in voxels. Tracing 2 rays 1 pixel apart, pixel 85 hits 4 leaves while pixel 84 hits 5 leaves. 2 overlap points appear in 3 leaves. Each point can appear in up to 7 leaves, and both rays hit 3 of the leaves.
p2 [85, 256]: dist prune IN. pos: 1.398461 1.886459 1.376074
p1 [85, 256]: dist prune IN. pos: 1.395789 1.871096 1.369734
Pixel 85 gathers 2 duplicates per point, while pixel 84 gathers 3. So the former shades 17 points while the latter shades 19.
Problem 1: Different nnr_count (final shaded points count) results in different shading. I didn’t find a shading method that ignores duplicates (without explicit duplicates removal).
Problem 2: Overlap points are weighted multiple times if they exist in multiple leaves the ray hits.
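Both problems can be reproduced with a toy example. The sketch below gathers samples from the leaves a ray hits and shades them with a plain weighted average, with and without explicit removal of duplicates by point id. The data layout and function names are hypothetical, and the per-ray `set` is only a CPU-side baseline, not a suggestion for the GPU kernel.

```python
def gather(leaves, ray_leaf_ids, unique=True):
    """Collect (point_id, value, weight) samples from the leaves a ray hits.

    With unique=True a point that is stored in several leaves is kept once.
    """
    seen = set()
    samples = []
    for leaf_id in ray_leaf_ids:
        for point_id, value, weight in leaves[leaf_id]:
            if unique:
                if point_id in seen:
                    continue
                seen.add(point_id)
            samples.append((point_id, value, weight))
    return samples


def shade(samples):
    """Weighted average of the sample values."""
    total = sum(w for _, _, w in samples)
    return sum(v * w for _, v, w in samples) / total


# Point 1 overlaps three leaves; neighbouring rays hit 2 and 3 of them.
leaves = {
    0: [(1, 1.0, 1.0), (2, 0.0, 1.0)],
    1: [(1, 1.0, 1.0), (3, 0.0, 1.0)],
    2: [(1, 1.0, 1.0)],
}
ray_a = [0, 1]
ray_b = [0, 1, 2]
```

With `unique=False` the two rays shade different sample counts and get different values because point 1 is weighted two or three times; with `unique=True` both rays see the same three points and agree.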
I considered over three methods beforehand, but assumed the problem wouldn’t be visible and trusted the way mentioned in the papers.
Solutions: either ensure overlap points are "unique" at the octree construction phase, or include the entire voxel that a point overlaps, forming a kind of nearest-neighbor voxel structure.
A high percentage of extra overlap points makes storage inefficient. Previous papers cite a 3x-8x increase. Overlap points grow at more than 2x per level when the level scale is small, so I need to detect when to terminate a branch. Yet point count has a great effect on time, e.g. 150 points take 160ms and 75 points take 90ms. So overlap redundancy has to be balanced against point count per voxel.
video speed 4x. Screenshot explanations below.
Interactive octree visualization of the bunny ears. Red shows leaf nodes, green internal nodes. Various pointers and memory issues finally fixed after a whole week merging the Nvidia traversal algorithm with the system.
device time: 8.40~20.55ms
Raytracing oriented disks. Silhouette shows disk primitives, but has the least artifact from octree partitioning.
device time: 17~71ms
With nnr_count>=3, silhouette shows the artifacts. Compare with non-octree renders (previous post).
device time: 71ms
Octree partitioning artifact is more obvious for large radius. Points in a voxel are missing neighbors required to calculate a continuous silhouette & shading.
device time: 73ms
Each ray merges all points from leafnodes traversed, so two ears appear stacked on top transparently.
Non-octree raytrace times:
device time: 928~950ms
host time: 935~1000ms
Octree traversal overhead is small (<21ms), and the octree PCD render takes 8% of the time, a ~13x speedup due to the octree. The host time, however, is highly unusual: it seems to be 7~8x the device time. Device code supposedly does not send image data to the host. There are significantly more CUDA printf calls than in the non-octree version, but shouldn't those be counted as part of the CUDA event time?
Speed is highly sensitive to the number of rays that hit, suggesting that shading each ray is slow.
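The quoted ratios can be checked with a few lines of arithmetic. Pairing the 71ms octree render with the 928ms non-octree render is an assumption, since the post does not say exactly which measurements were compared.

```python
def speedup(baseline_ms, new_ms):
    """How many times faster the new timing is than the baseline."""
    return baseline_ms / new_ms


def time_fraction(baseline_ms, new_ms):
    """Fraction of the baseline time that the new timing takes."""
    return new_ms / baseline_ms


non_octree_ms = 928.0  # lower bound of the non-octree device time
octree_ms = 71.0       # octree render with nnr_count >= 3

print(f"speedup: {speedup(non_octree_ms, octree_ms):.1f}x")
print(f"time fraction: {100.0 * time_fraction(non_octree_ms, octree_ms):.1f}%")
```

Using the upper bounds instead (950ms and 73ms) gives nearly the same ratio.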
1) Diffuse shading. Currently only flat point color rendered. Diffuse exposes artifacts and noise better.
2) Octree partitioning artifact:
A common method, described in [Kashyap10], is to insert duplicate points into every voxel that the point touches given its radius. They further implement culling rays that remove the added duplicates if they are not visible from test rays. The resulting point data is 2~3x the original. Both the voxel-sphere hit test and the culling take significant preprocessing time.
Another method, briefly suggested, is for each leaf to maintain its neighbor leaves, but how a leaf discovers and stores neighbors from the same or a different level requires further research.
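The voxel-sphere hit test behind this kind of duplication can be sketched with the standard closest-point test between a sphere and an axis-aligned box. The code below is an illustration under assumptions (the child ordering by bit pattern and all function names are mine), not the procedure from [Kashyap10], and it leaves out the culling rays.

```python
def sphere_overlaps_box(center, radius, box_min, box_max):
    """True if the sphere touches the axis-aligned box."""
    d2 = 0.0
    for c, lo, hi in zip(center, box_min, box_max):
        if c < lo:
            d2 += (lo - c) ** 2
        elif c > hi:
            d2 += (c - hi) ** 2
    return d2 <= radius * radius


def child_boxes(box_min, box_max):
    """Split a voxel into 8 children; bit k of the index selects the upper half on axis k."""
    mid = tuple((lo + hi) / 2.0 for lo, hi in zip(box_min, box_max))
    boxes = []
    for i in range(8):
        lo, hi = [], []
        for k in range(3):
            upper = ((i >> k) & 1) == 1
            lo.append(mid[k] if upper else box_min[k])
            hi.append(box_max[k] if upper else mid[k])
        boxes.append((tuple(lo), tuple(hi)))
    return boxes


def overlapped_children(center, radius, box_min, box_max):
    """Indices of the child voxels that would receive a copy of the point."""
    return [i for i, (lo, hi) in enumerate(child_boxes(box_min, box_max))
            if sphere_overlaps_box(center, radius, lo, hi)]
```

A point well inside one child is stored once, while a point within one radius of a split plane is copied into the child on the other side as well, which is where the extra storage comes from.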
3) Leaf occlusion: distance threshold heuristics?
4) Discover device and host time relationship
This is the first version to actually render using PCD and the octree, so there is no optimization yet. The focus is on making the parts work and fixing artifacts first. A later priority is to simplify the multiple for loops in gathering leaf points and shading.
One source of inefficiency is that a few points (e.g. the right ear tip) can occupy a large voxel at a coarser level (e.g. level 2). Many rays then hit a large voxel whose points lie in a much smaller bounding region. Here a KD-tree or BVH has a pruning advantage, though the octree's suitability for the GPU is a strength of its own, so benchmark comparisons may be needed.
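How loosely a voxel fits its points can be measured by comparing the tight bounding box of the points with the voxel extent on each axis. This is a small diagnostic sketch with hypothetical names, not part of the renderer.

```python
def tight_bounds(points):
    """Axis-aligned bounding box that tightly encloses the points."""
    mins = tuple(min(p[k] for p in points) for k in range(3))
    maxs = tuple(max(p[k] for p in points) for k in range(3))
    return mins, maxs


def extent_ratio(points, voxel_min, voxel_max):
    """Per-axis ratio of the tight extent to the voxel extent (1.0 = snug fit)."""
    mins, maxs = tight_bounds(points)
    return tuple((maxs[k] - mins[k]) / (voxel_max[k] - voxel_min[k])
                 for k in range(3))
```

Small ratios flag leaves where most rays that hit the voxel miss the points, which is the case a BVH or KD-tree would prune.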