Giving up... for now

Wishgranter, I cleaned the inside of the laptop not too long ago when I upgraded the RAM, so I don't think that is the issue.

Out of interest, I re-ran a small project that I had successfully processed a couple of months ago (so with a previous build of RC).
Only 105 images from a small flight over a couple of stockpiles, and the output was fantastic.

Guess what… it failed this time! Same thing: “Unknown error, CUDA error : 30 :unknown error”
Has something changed in the latest build that doesn't like my hardware or OS?

Can someone reset my activation period so I can try this on the same machine but with Windows 10?

Are you running dual GPUs in a laptop?

That looks like a GPU overheating error.

If you are running dual GPUs, I'd disable one.

Yes. But it doesn't make sense that it worked previously with 2x GPU for this small project and now it doesn't?? I'll give it a go, however.

Hey,

I am running Windows 7…

Tried both combinations, 1x GPU and 2x GPU: no luck, it fails in both instances.

Hi
We have NOT touched the reconstruction algorithms for over 2 years… so it does not depend on the current RC build.

What is your PRECISE notebook model? I would like to check its cooling capabilities.

The thing you need to understand is that NVIDIA/AMD segment their GPUs by use case.
The x80 and x70 are the power kings,
the x60 and x50 are the middle class,
the x40, x20 and x10 are the LOW-end class.

This classification is based on power (TDP), performance and a few other factors.
What is rarely stated is that their COOLING requirements are also set according to this segmentation. Mostly you will only find specifications along the lines of occasional gaming, fast processing, etc.
This means the x40 to x10 GPU series is designed for a certain power budget AND workload. The x40 to x10 are meant for OCCASIONAL gaming or work, nothing that lasts a LONG time, as their DESIGN is not capable of sustained, long-duration use. In short, their cooling and power delivery are NOT designed to run under load for longer periods (30+ mins).
This can be clearly seen in the user reports: CUDA error 30 (and a few others) shows up ONLY on the low-end GPUs.
We have NO reports of these issues on x70/x80 cards in desktop PCs, provided they are properly cooled!!
There have been a few x70/x80 reports from notebooks, but when we inspected the THERMAL tests, we found that they have PROBLEMS with a properly sustained workload, because those notebooks, mostly SLIM or very lightweight models, were NOT designed for this sort of work.
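One way to test the overheating theory, rather than argue about it, is to log GPU temperatures while a reconstruction runs and look at what they were just before the CUDA error. The Python sketch below is only an illustration and has nothing to do with RC itself: it assumes the `nvidia-smi` tool that ships with the NVIDIA driver is installed and on the PATH (on older Windows installs it may live in C:\Program Files\NVIDIA Corporation\NVSMI instead), and the 90 C `WARN_TEMP_C` threshold is a hypothetical value, not an NVIDIA specification. Some mobile GPUs report fields as "[Not Supported]", which the parser turns into None.

```python
import csv
import io
import subprocess
import time

# Assumes the nvidia-smi tool that ships with the NVIDIA driver is on PATH.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,temperature.gpu,utilization.gpu",
    "--format=csv,noheader,nounits",
]

# Hypothetical warning threshold in degrees C; not an NVIDIA specification.
WARN_TEMP_C = 90


def to_int(field):
    """Convert one CSV field to int, or None for values like '[Not Supported]'."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_readings(text):
    """Parse nvidia-smi CSV output into (gpu_index, temp_c, util_pct) tuples."""
    readings = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        index, temp, util = (to_int(field) for field in row)
        readings.append((index, temp, util))
    return readings


def hot_gpus(readings, limit=WARN_TEMP_C):
    """Return the indices of GPUs whose temperature is at or above limit."""
    return [index for index, temp, _ in readings
            if temp is not None and temp >= limit]


def log_temperatures(interval_s=5.0):
    """Print a timestamped line per GPU every interval_s seconds until Ctrl+C."""
    while True:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
        readings = parse_readings(result.stdout)
        stamp = time.strftime("%H:%M:%S")
        for index, temp, util in readings:
            print(f"{stamp} GPU{index}: temp={temp} C, load={util}%", flush=True)
        for index in hot_gpus(readings):
            print(f"{stamp} WARNING: GPU{index} >= {WARN_TEMP_C} C", flush=True)
        time.sleep(interval_s)
```

Start `log_temperatures()` from a Python prompt, then kick off the reconstruction. If the last lines before the failure show temperatures near the limit, the thermal explanation looks likely; if they stay moderate, something else is probably going on.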

Take a look here: http://www.notebookcheck.net/Review-Ali … 257.0.html
Scroll down to the TEMPERATURE section:
However, the reader should note that this is an extreme scenario (full load for several hours) which will probably not occur in everyday use.
BUT keep in mind that you have DUAL 460M cards there!!! So that raises the temperature a lot…

Wishgranter, thanks for the info.
The laptop model is an Alienware M18x R1. I might look at upgrading the GPUs to something like a GTX 765.

Or 2 of these…
http://www.ebay.com.au/itm/Top-Nvidia-GTX-770M-3GB-GDDR5-MXM-3-0b-Video-Card-for-DELL-ALIENWARE-M17-M18X-R1-/252512168188

What I find odd is that this machine had been working OK. Yes, it would fail on bigger projects or at higher detail, but now it fails almost every time; I can't even reconstruct data that I had successfully processed before!? It also seems odd that the GPU would overheat (if that is the issue) so quickly now, within a minute or 2 of starting the reconstruction. It's not as if it runs for 30+ mins before it fails.

Hi
Yup, a better solution than the 460M GPU…

I also find it odd that projects that were working for him before no longer work, while projects that weren't working for me work now. I know you say the algorithm hasn't changed, but what did change between then and now? Surely some bug fixes? I know I let RC send crash reports for many of my crashes.

Also, as a developer of CUDA-based programs and a speaker at GTC, wouldn't you want to lobby NVIDIA to provide better reporting for this type of failure? I mean, “unknown error” isn't really acceptable. If the driver restarted because of a thermal fail-safe, it should report that. The projects I have run recently are running at the SAME temperature as when they were failing. A few nights ago I ran a project for 6 hrs at sustained temperatures, and it didn't give up on me.

I would say: provide him with an earlier build and have him test it. Sometimes seemingly unrelated things are indeed related.

BTW, I am happy to have stability now; I am just the type of person who wants to know how and why something works :)

I waited for the licence to renew and loaded RC on my other work laptop. It is an HP ZBook with an i7-4800MQ @ 2.7GHz, 32GB RAM and an NVIDIA Quadro K3100M GPU.

I loaded up a new project with 335 images; it ran for a couple of hours, then failed with an out-of-memory error. I can't seem to get anything to work with this program anymore.

Hi
Contact me at milos.lukac@capturingreality.com. We have an internal build that could be the problem solver for this… so yours is a perfect testing environment…

Email sent, thanks.
pete

That looks like you need to make your pagefile larger.

Don't let Windows manage it; the program will crash before Windows makes it bigger.

Chris, it could well be. I noticed Windows hasn't got a very large allocation for this, and at present I've only got one conventional HDD on this machine, so I'm limited on space, full stop.

How would you recommend the pagefile be managed? I can't really change the boot/OS drive on this laptop as it's a corporate machine, but I can add an extra SSD as a scratch disk. Would you put the pagefile on that?

cheers
pete

Yeah, if you're running out of RAM and need the pagefile, then the faster the drive you can get, the better.

And hard drives are very bad for it.

I'm running 2 x 950 in RAID 0 for my pagefile, but just getting a single SSD will make a huge difference.

In the past I've run it on a hard drive and on a single SSD. The biggest jump is getting away from hard drives and their terrible response times.

Having the temp drive on an SSD is also good, but it's more important for the pagefile to be fast. You could probably get away with both the pagefile and temp on the same drive if it has enough space; it doesn't seem to hit them both hard at the same time.
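To make the fixed-size advice concrete, here is a rough Python sketch for picking a pagefile size (entered as both the Initial and Maximum size in the Windows Virtual Memory dialog, so Windows never has to grow it) and checking that it fits on a chosen drive. It rests on one fact: Windows refuses new allocations once the commit charge reaches the commit limit, which is roughly RAM plus pagefile. Everything else is an assumption. The peak commit figure is one you measure yourself (Sysinternals Process Explorer reports a peak commit charge, for example), and `HEADROOM`, `MIN_SIZE_MB` and `RESERVE_GIB` are hypothetical values, not RC or Microsoft recommendations.

```python
import math
import shutil

GIB = 1024 ** 3

# Hypothetical tuning values for illustration; adjust to your own workload.
HEADROOM = 0.25      # extra 25% on top of the measured shortfall
MIN_SIZE_MB = 4096   # never go below 4 GB
RESERVE_GIB = 10.0   # free space to leave on the drive, e.g. for temp files


def fixed_pagefile_mb(ram_gib, peak_commit_gib,
                      headroom=HEADROOM, min_size_mb=MIN_SIZE_MB):
    """Size to use for both 'Initial' and 'Maximum' in the Virtual Memory dialog.

    Allocations fail once the commit charge reaches the commit limit
    (roughly RAM + pagefile), so the pagefile must cover peak commit - RAM.
    """
    shortfall_gib = max(peak_commit_gib - ram_gib, 0.0)
    size_mb = math.ceil(shortfall_gib * (1 + headroom) * 1024)
    return max(size_mb, min_size_mb)


def fits(size_mb, free_bytes, reserve_gib=RESERVE_GIB):
    """True if a pagefile of size_mb fits and still leaves reserve_gib free."""
    return free_bytes / GIB - size_mb / 1024 >= reserve_gib


def fits_on_drive(size_mb, drive_root, reserve_gib=RESERVE_GIB):
    """Check a real drive, e.g. 'D:\\' for an added scratch SSD."""
    return fits(size_mb, shutil.disk_usage(drive_root).free, reserve_gib)
```

With 32 GB of RAM and an illustrative measured peak commit of 48 GB, `fixed_pagefile_mb(32, 48)` returns 20480 MB, and `fits_on_drive(20480, "D:\\")` tells you whether a scratch SSD mounted as D: has room for it. Bear in mind that if RC crashed at the old commit limit, the peak you measured understates what it actually needed.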