New NVidia Cards

OK, not exactly a feature request. With the new Volta Nvidia cards coming to market soonish, and their ability to do “real-time ray tracing”, I'm thinking this could directly impact Reality Capture's performance. I believe the ability is enabled by the tensor cores, and the hardware is already available on the market in the latest Tesla and Titan GPUs. Is there anything in testing? Is the tech incompatible with your current algorithms? Has this been considered? Or am I just way off the mark here?

How much time will it save? I find with my two GTX 980 Tis, most of the time is spent on CPU computation or hard-drive operations. My GPU only really works for about 30% of the reconstruction stage, and that's it. The rest of the work seems to come from the CPU, or maybe I'm mistaken? (Or I should be caching files on an SSD.)

I believe the biggest chunk is reconstruction up to 70%, then a sprinkling of GPU acceleration here and there (the remainder of reconstruction, coloring, texturing, etc.).

From observing which files are accessed, it looks like up to about 69% RC feeds your images into the GPU and stores the results as small chunks in the cache; from 70% on, it feeds those cache files back into memory to be aggregated into larger chunks.
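Just to illustrate the pattern I think I'm seeing (this is purely my guess from watching file access, not RC's actual code; the folder name, file names, and layout are all made up), a minimal sketch:

```python
# Hypothetical two-phase cache pattern (an illustration only, NOT RC's code).
# Phase 1 (up to ~69%): GPU results are dumped as many small cache files.
# Phase 2 (70% and up): the small chunks are read back and merged.
import glob
import os

CACHE_DIR = "rc_cache"  # placeholder path for the cache folder


def write_chunk(index: int, data: bytes) -> None:
    """Phase 1: store one small partial result as its own cache file."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(os.path.join(CACHE_DIR, f"part_{index:05d}.bin"), "wb") as f:
        f.write(data)


def aggregate_chunks(out_path: str) -> None:
    """Phase 2: stream the small chunks back and merge them into one file."""
    with open(out_path, "wb") as out:
        for path in sorted(glob.glob(os.path.join(CACHE_DIR, "part_*.bin"))):
            with open(path, "rb") as f:
                out.write(f.read())
```

The point is that both phases hammer the cache drive with lots of small file operations, which is where drive choice starts to matter.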

Copy is data going in and out of the GPU, and it is still being used. I thought about this part a lot, as I don't see any particular bottleneck (HDD, SSD, GPU, CPU, etc.) and just assume this part of the calculation must be done on both the GPU and CPU. It might also be that this computation is not very friendly to parallel processing, and one component must wait for the results from the other before it can continue. Don't take this too literally, as only the RC devs could know for sure.

Task Manager is a good way to see what the bottleneck in your system is at a particular point in the process. Is the CPU/GPU pegged close to 100%, or is the HDD with the cache on it at 100% with low CPU or GPU utilization?
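If you want more than an eyeball check, here's a rough logger sketch (it assumes Python with the psutil package installed and nvidia-smi on your PATH; both are assumptions on my part, nothing RC-specific):

```python
# Rough CPU/GPU/disk utilization logger. Requires psutil (pip install psutil)
# and nvidia-smi on the PATH -- both assumptions, not RC requirements.
import subprocess
import psutil

prev = psutil.disk_io_counters()
while True:
    cpu = psutil.cpu_percent(interval=1.0)  # average % over the last second
    cur = psutil.disk_io_counters()
    disk_mb_s = (cur.read_bytes - prev.read_bytes
                 + cur.write_bytes - prev.write_bytes) / 1e6
    prev = cur
    gpu = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        text=True).strip()
    print(f"CPU {cpu:5.1f}%  GPU {gpu:>3}%  Disk {disk_mb_s:7.1f} MB/s")
```

Whichever number sits near 100% during a given stage is your bottleneck for that stage.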

Mike, I just did some hard-drive benchmarking.

Results are in seconds

https://docs.google.com/spreadsheets/d/12IcYfNLF6szGXG2qcaJFregFYY6SoaFbznerD0THl1E/edit?usp=sharing

Under tests 2 & 3, look at the mechanical drive. If two cells are merged, it's because I didn't catch it at the right time and had to just use the total. I know that might not make sense.

So your test proves that putting the cache on an SSD speeds up total processing 2 times?
Cache on SSD = 2 times faster processing. Do I understand correctly? For your dataset and your PC specs, of course.


You have to understand that this test was designed to stress the drive more than your typical workload would, but I would feel comfortable saying that with a decent CPU and GPU, getting an SSD would make things run at least 1.5 times faster. If you have a 10-year-old CPU, you're not going to see any speed increase.

The random read and write specs are what's important here, as the cache seems to be just a ton of small files.


NVMe SSD

You can see the cache files are quite small, and that the 4KiB Q1T1 result is about the performance you can expect from this drive when reading these files.


Mechanical HDD

As you can see, a mechanical HDD is much slower at reading these files.
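If anyone wants to sanity-check their own cache drive, here's a minimal random-read sketch in the same spirit as the 4KiB Q1T1 test (the file name and sizes are placeholders, and the OS page cache will inflate the numbers unless the test file is much bigger than your free RAM):

```python
# Minimal 4 KiB random-read micro-benchmark: a rough analogue of the
# 4KiB Q1T1 test -- queue depth 1, single thread, one small read at a time.
import os
import random
import time

TEST_FILE = "testfile.bin"  # put this on the drive you want to test
FILE_SIZE = 1 << 30         # 1 GiB
BLOCK = 4096                # 4 KiB per read, like the benchmark
READS = 10_000

# Create the test file once (sequential write, in 4 MiB pieces).
if not os.path.exists(TEST_FILE):
    with open(TEST_FILE, "wb") as f:
        for _ in range(FILE_SIZE // (4 << 20)):
            f.write(os.urandom(4 << 20))

# Unbuffered reads at random offsets, timed as one batch.
with open(TEST_FILE, "rb", buffering=0) as f:
    start = time.perf_counter()
    for _ in range(READS):
        f.seek(random.randrange(FILE_SIZE - BLOCK))
        f.read(BLOCK)
    elapsed = time.perf_counter() - start

print(f"{READS * BLOCK / elapsed / 1e6:.2f} MB/s random 4 KiB reads")
```

On a mechanical HDD each of those reads costs a seek, which is exactly why it falls so far behind an SSD on this workload.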


Thank you for this test, Steven Smith. That's what I was trying to figure out.

I have good PCs and good SSDs, but I wasn't using the SSDs optimally. Now I know what to do!

You're welcome :wink:

I’m now guessing there will be a speed improvement from the tensor cores, but not from the ray tracing aspect. :wink:

$3,600 Nvidia Titan V
5,120 CUDA cores
640 Tensor cores
Let's test it!
https://www.amazon.com/NVIDIA-TITAN-VOLTA-12GB-VIDEO/dp/B078G1VHYN

Yes sir, send me half the money and I’ll start testing right away!

By the way, those are $3,500.

Volta allegedly has more, faster cores, but it's not enough for “real time”. We'll perform some tests, though, and we'll see.

Yeah, “real-time” ray tracing, I believe, is selectively tracing a few rays and then using noise reduction to smooth out the gaps. It also appears that this is enabled by the tensor cores.

I have no expertise in this field. That being said, I find myself wondering how much of the calculus and trig is similar between these two applications. I imagine there might be another layer of optimization from this new architecture, not just a raw compute increase.

I believe this was geared towards games, and the APIs might not be released yet. I would imagine you guys have good contacts with Nvidia's engineers and might be able to snag a preview CUDA SDK or the GameWorks APIs in a beta-testing version.

I know that if you did, you most likely couldn't spill the beans. Just fishing for a teaser, a “Something Good is Coming”.

Anyone have the new RTX 20XX series benchmarks?