Scalability

Hi there!

I have been experimenting with RC for 9 months now. I am processing on a ROG laptop with a GTX 1070 and 64 GB RAM that is quite fast and (almost) portable. I am also using a lighter ROG with a 980M and 32 GB RAM.
Now that I am experimenting with high-resolution images (42 MP and 50 MP), processing times (even with RealityCapture) tend to be much longer, even for a limited number of images (around 1000-1500). And I guess that the memory is a little bit low (I hit the limit and had a hard RAM swap on a reconstruction, and had to force a reboot and start again).

What computer could I get to improve processing speed? SLI with 1080 Tis? 128 GB of RAM, or even more (256 GB)?
I guess that the limit is set by the license: 3 GPUs and 32 CPU cores.

Do you have any examples of real-world configurations that are actually for sale?

Of course, cluster processing would be great, but it doesn’t seem to be ready yet: Cloud computing support (distributed rendering)?

Thanks for any experience or advice.

Hi Jonathan_Tanant

First, take a look at the HW + SW specifications here: OS and hardware requirements

RAM is mostly limiting in the ALIGNMENT step, but not so much in the other processing steps. In most cases RAM is used only as a CACHE to speed up the process, not so much as a hard requirement.

What computer could I get to improve processing speed? SLI with 1080 Tis? 128 GB of RAM, or even more (256 GB)?

For a recommendation, please click on the MODEL in the 1Ds view so we can see the processing times for the particular steps (DEPTH MAPS, MESHING, Tx, etc.).

Hehe, “a little bit low” is quite subjective! :slight_smile:
I would also like to know what you are looking at exactly.
Have you tried the same set with other SW?

Thanks for the advice and the link.

So if I understand correctly, memory is only an issue during alignment; it shouldn’t be during reconstruction and texturing?
And the GPU does the work during reconstruction.

What I am basically trying to do is raise the quality of the reconstruction and at the same time lower the processing time (yes, I know, raising quality should raise the processing time, and that is why I need more powerful hardware).
And I know that the same alignment in Photoscan would have taken maybe 10x the time.
With 50 MP or 42 MP pictures, I have about 400,000 features per image (low sensitivity, small overlap). Maybe that is a little bit too much and I should align with a bigger overlap and/or a 2x downsampling? I already tried that (it is much faster) and I still have to do more tests to look at the quality impact.
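As a rough sanity check on what that downsampling buys, here is a back-of-envelope sketch. It assumes the downscale factor applies per axis (so a factor of 2 quarters the pixel count) and that feature detection time roughly tracks the pixel count; both are assumptions on my part.

```python
# Back-of-envelope only: assumes the downscale factor is per axis,
# so a 2x downscale quarters the pixel count.
megapixels = 42            # 42 MP source images (50 MP behaves the same way)
downscale = 2              # per-axis downscale factor for alignment

effective_mp = megapixels / downscale ** 2
print(f"~{effective_mp:.1f} MP effective per image")   # -> ~10.5 MP, 4x fewer pixels
```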

So the hard memory swap I had during reconstruction can’t be explained by a lack of memory? It happened twice, at the end of processing (when the very last parts were being reconstructed, with a small number of vertices, a few hundred). This was on a 400-part model and the calculation took about 8 hours.

What I am looking for are real configurations that are actually for sale. I just can’t find a config with 128/256 GB of RAM, multiple CPUs (up to the 32-core max, including hyper-threading?) and 3 GPUs for sale… I guess I should purpose-build it. Any advice on motherboards with 128/256 GB RAM and 3 fast (x16) PCI Express slots is welcome!

Thanks !

I found this motherboard:
https://www.asus.com/Motherboards/Z10PED8_WS/

- 2 CPUs (with two 8-core/16-thread i7s this would give us 32 cores?)
- 512 GB RAM.
- 4 fast (x16) PCI Express slots; let’s put in 3x 1080 Ti.
It would be quite a monster, what do you think?
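A quick sanity check on that core math (a minimal sketch; whether the 32-core license limit counts physical or logical cores is an assumption here):

```python
# Rough core/thread arithmetic for the dual-socket build above.
# Assumption: the 32-core license limit refers to logical cores (threads).
sockets = 2
cores_per_cpu = 8            # an 8-core/16-thread CPU per socket
threads_per_core = 2         # hyper-threading

physical_cores = sockets * cores_per_cpu             # 16 physical cores
logical_cores = physical_cores * threads_per_core    # 32 logical cores (threads)
print(physical_cores, logical_cores)                 # -> 16 32
```

So two 8C/16T chips give 16 physical cores and 32 threads; that only hits the 32 limit if the license counts threads.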

Hey Jonathan,

Don’t have the time right now to follow up on that but if you find a nice setup with good value/cost ratio, let me know!
BTW, what is your budget? Because feeding those 512 GB will leave a big hole in it! :lol:

I think I’d rather put that money into one or two GPUs with as many CUDA cores as possible.
But sadly, a second GPU will only result in about 30-40% more speed…

Yes, you are right, 512 GB of RAM is going to be a little bit costly :wink:
So you think that doubling the GPUs does not double the processing power? Only a 30-40% increase? Can someone from CR confirm?

Hi Jonathan,
I’m new to RC, but my experience so far is that the GPU portions of the workload seem to get done very quickly and then we are back to waiting for the CPU-bound tasks.
I’m using a Dell T7910 (dual 4-core Xeons @ 3 GHz, 128 GB RAM, twin GTX 1080s, NOT SLI-configured) and at the moment pushing 300-ish photos in per model.
With this workload/configuration RC seems to peak at about 50 GB of RAM, and generally uses more like 20 GB most of the time. My modelling runs take 30 minutes or so using the default “Workflow” settings. Aligning images usually takes only 5 minutes or so (I’m still getting used to the types of pictures and required control points, so this may change).

Given that the GPUs get their part of the job done very quickly anyway, I’d suggest spending more on RAM and CPU than on lots of GPUs, for a better balance across the whole workflow.

Just my $0.02
Jen

Jonathan_Tanant wrote:

I found this motherboard:
https://www.asus.com/Motherboards/Z10PED8_WS/

- 2 CPUs (with two 8-core/16-thread i7s this would give us 32 cores?)
- 512 GB RAM.
- 4 fast (x16) PCI Express slots; let’s put in 3x 1080 Ti.
It would be quite a monster, what do you think?

For the record, no i7s there; you need Xeon chips for this board. But on the upside, with this design you get more processor-attached PCIe lanes. For most CUDA processing the bus bandwidth isn’t that important, though, so whatever you can get. High-clock-speed Xeons tend to be pricey and server mainboards prefer ECC RAM (also pricey). Having said that, you can stick 18-core processors in each socket, 4x GTX 1080 (or even 4x Titan Xp) and up to 256 GB of RAM. Much go fast… almost as fast as it would depreciate :smiley:

Hey, somebody with serious HW knowledge!
Watch out Jennifer, soon you’ll get PMs with advice requests! :lol:

Jonathan, these numbers don’t come from me but *drumroll and fanfare* the Internet.
Some of it in this forum.
It’s almost impossible to get a real number because it varies greatly with the setup and image set etc.
But as a general rule you get a notable increase with the second, a tiny one with the 3rd and from then on it’s homeopathic.
So it’s nowhere near double or triple.
I have no statistics, this is from memory of my own research(es).

I would be inclined to agree with Jennifer that more CPU is probably more sensible.
But maybe you are lucky and Wishgranter will leave a post…

But then again, I am doing sizeable stuff (up to 2500 images) and my setup is pitiful, as you can see.
Yes, the calculation times can get long, but do you want to spend 10 grand only so your model will be finished in 5 instead of 12 hours? Computers don’t get paid overtime or a weekend bonus… :wink:

Wow, this is getting serious…
Thanks Jennifer and Götz, this information is very valuable!
The Dell T7910 looks really good, I am trying to see what would give the best value/cost ratio.
Putting in my numbers I am getting a config of nearly €20k… Maybe a little bit expensive :slight_smile:

So I guess that, based on what you said, a good compromise (that would still be a beast) would be:
256 GB RAM
DUAL XEON E5 16C
DUAL 1080 Ti (I can’t see the option on the website, did you install them yourself, Jennifer?)

My typical project in RC:
- 1000-2000 pictures to align at once (42 MP - 50 MP, DSLR/mirrorless)
- merge about 10 to 20 components and reconstruct models of around 10,000-30,000 pictures.

Milos, any idea?

Jeez, 20k?
What are you still thinking about?
For that money you can’t do so much wrong.
Surely you know this page already?
https://www.pugetsystems.com/recommende … mendations
It’s for a competitor but I guess the general requirements are similar…

256 GB RAM
DUAL XEON E5 16C
is enough for a 100K-image project with the components workflow.

Check the Nvidia presentation where Capturing Reality shows such a project with 100K images.

Jonathan_Tanant wrote:

Wow, this is getting serious…
Thanks Jennifer and Götz, this information is very valuable!
The Dell T7910 looks really good, I am trying to see what would give the best value/cost ratio.
Putting in my numbers I am getting a config of nearly €20k… Maybe a little bit expensive :slight_smile:

Hi Jonathan… yes, the GTX 1080s were installed after purchase.
The base machine was about AUD 6000 and the graphics cards were about $1k each.
Generally Dell sell these systems to companies via value-adding suppliers who get a much better price than the one listed on the website. You can also watch their factory outlet, if they have one in your country, for good deals on Dell systems.

You can’t see the option for the GTX graphics because these machines are usually supplied with Quadro-series cards (professional graphics rather than gaming). If you just want GPU compute performance, the Titan X Pascal / Titan Xp give you better compute and a bit more memory on a per-slot basis. PSA: you can’t close the T7910 case with a full-height double-slot card in the upper bank of PCIe slots. For what it’s worth, my system is configured for deep learning, but it turns out it works pretty well for VR and this image-processing stuff too.

Jonathan_Tanant wrote:

So I guess that, based on what you said, a good compromise (that would still be a beast) would be:
256 GB RAM
DUAL XEON E5 16C
DUAL 1080 Ti (I can’t see the option on the website, did you install them yourself, Jennifer?)

My typical project in RC:
- 1000-2000 pictures to align at once (42 MP - 50 MP, DSLR/mirrorless)
- merge about 10 to 20 components and reconstruct models of around 10,000-30,000 pictures.

Milos, any idea?

Not much compromise in there :smiley: The high-core-count Xeons are adding a lot to your cost. You could probably pull back to dual 6-8 core chips at a slightly higher clock rate and still be happy (and save $10k). But if you don’t need the extra PCIe bandwidth, maybe even wait a bit and see how well AMD does with the Threadripper/X399 systems.

Happy shopping :slight_smile:
Jennifer

Hi all

I highly recommend waiting for the AMD offerings (Threadripper and EPYC), as they will shake up Intel’s prices and we will see some interesting solutions (price/performance).

For now the AMD solutions are focused on storage ( https://www.supermicro.nl/products/nfo/ … .cfm?pg=SS ) and they will release workstation/HPC solutions soon (1-2 months away at most).

But it’s better to have a lower core count at a higher clock than many cores at a low frequency (low single-thread performance).
For GPUs, the sweet spot for now is 3 GPUs; adding a 4th adds only approx. 10-12% speedup, and this is a well-known issue with PCIe latency that is sometimes hard to overcome.
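Putting the ballpark figures from this thread together (the +30-40% for a second card from above and the +10-12% for a fourth here; the third-card gain of roughly +20% is purely my assumption to fill the gap), the cumulative scaling looks very roughly like this:

```python
# Very rough cumulative GPU scaling using the ballpark gains quoted in this
# thread; the 3rd-GPU figure (~+20%) is an assumption, not a measured number.
gains = [1.00, 0.35, 0.20, 0.11]    # relative gain contributed by GPU #1..#4

total = 0.0
for n, gain in enumerate(gains, start=1):
    total += gain
    print(f"{n} GPU(s): ~{total:.2f}x a single card")
# -> ~1.00x, ~1.35x, ~1.55x, ~1.66x: nowhere near 2x or 3x
```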

Though we are getting out of commonly available hardware territory, the multi-CPU systems do help with PCIe loading, since you have buses distributed across the CPUs (provided your affinity code works correctly, anyway).
And not all video cards are the same when it comes to CUDA processing… the new Pascal cards are much faster for GPU processing than previous generations. And the Titans have more memory and better bandwidth per card. P100s are better yet.

It just depends on how much your time is worth… as Götz said, sometimes it’s better to just let it run over a weekend than to over-capitalise. Also, no one has mentioned using cloud resources… All the big cloud vendors have GPU-accelerated instances. You can pay for your hardware by the minute rather than spending $10K on a box you only keep busy 5% of the time.
Cheap as chips, and you just pay for what you need that day, or even hour by hour… save more if you can wait and get spot-priced instances. (Also a great way to get into machine learning.)

For me, this is a basic deep learning system that I’m just hijacking to do some extra prototyping for some workflows around the company. If we do more of it, I’ll probably foist it off onto Azure rather than buy hardware.
Fun discussion…
Jennifer

Hey Jennifer,

yeah, some nice info here!
With Jonathan’s estimated project size I think he needs to dip a bit into the higher end though… :slight_smile:

How does that cloud GPU stuff work? I’ve been looking on and off for something like that because I also thought: wouldn’t it be nice to just rent some processing capacity? No worries about copyright or anything, since only bits and bytes will be flowing through the pipes instead of whole models (good for paranoid users like me).
But I guess the software needs to be able to support it from within, right?

Götz Echtenacher wrote:

Hey Jennifer,

How does that cloud GPU stuff work? I’ve been looking on and off for something like that because I also thought: wouldn’t it be nice to just rent some processing capacity? No worries about copyright or anything, since only bits and bytes will be flowing through the pipes instead of whole models (good for paranoid users like me).
But I guess the software needs to be able to support it from within, right?

Most of the current cloud providers just rent you “systems” of various sizes and optimizations (compute, storage bandwidth, GPU, etc.). So you could grab a tiny instance (1 core, a couple of GB of RAM, but a good network allowance) to load your images and maybe do some control pointing. Then get a large instance with lots of CPU and RAM for image alignment (24 cores, 600 GB RAM?), then swap to a GPU-enhanced instance for meshing. They make it easy by having preconfigured systems, so you can grab a pre-made Windows 10 image (no worries about OS licence or configuration). We’re a Microsoft house so we tend to use Azure, but a lot of the machine learning people are using Google or AWS, and there are pre-made configs for most of the machine learning platforms.
At the moment the GPU-optimized instances tend to use K80 GPUs and you pick a size (1/2 K80 - 2x K80) for the job. Google is promising P100-accelerated instances (~40% faster). For the more expensive instance types, if you aren’t in a hurry, you can wait and pick up spot pricing (spare capacity on sale if someone hasn’t booked it). There are apps to help you save money on your instance bookings if you are doing this a lot. (The machine learning beginners are always looking to suck up all the low-cost compute they can get :slight_smile: )
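As a purely hypothetical break-even sketch along the lines of the “$10K box you only keep busy 5% of the time” comment above (the ~$0.90/hr K80-class rate and the 5% utilisation are illustrative assumptions, not quotes):

```python
# Illustrative only: the hourly rate and the utilisation figure are assumptions.
workstation_cost = 10_000      # up-front cost of a dedicated box, USD
hourly_rate = 0.90             # assumed on-demand rate for a K80-class instance, USD/hr
utilisation = 0.05             # fraction of the year the box would actually be busy

busy_hours = 365 * 24 * utilisation        # ~438 hours of real work per year
cloud_cost = busy_hours * hourly_rate      # ~$394 per year rented
break_even_years = workstation_cost / cloud_cost
print(f"~${cloud_cost:.0f}/yr rented, ~{break_even_years:.0f} years to break even")
```

Spot pricing would push the break-even point out even further, of course.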
Yell if you want specifics… I’m hesitant to get too far off the mainstream of Jonathan’s discussion.
Jennifer