While upgrading from 5.5 to 5.6, it appears bExperimentalVMEnabled was removed and with it the legacy ExecVM code path, forcing use of the newer code path. This revealed a crash caused by a memory stomp in a few of our Niagara Systems. The crash is also present in 5.5, but we weren’t seeing it because we were using the legacy code path.
The crash occurs at this line because ConstMapCacheIdx[i] is out of bounds:
ConstMapCacheIdx gets stomped because, in AllocVectorVMState(), ExtFunctionTable and ConstMapCacheIdx overlap: the former is allocated with fewer bytes than the actual size of the function table.
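To illustrate the failure mode (this is a standalone sketch, not the engine’s AllocVectorVMState() code, and the counts/names are made up), carving two sub-buffers out of one block where the first is sized in entries instead of bytes puts the second buffer inside the first:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Standalone sketch of the overlap bug: one block is carved into two
// sub-buffers, but the first is sized as if each entry were 1 byte
// instead of sizeof(void*), so the second sub-buffer starts inside it.
int main()
{
    const int NumExtFns    = 304; // hypothetical function-table entry count
    const int NumConstMaps = 8;   // hypothetical const-map cache entry count

    // Buggy sizing: should be NumExtFns * sizeof(void*), not NumExtFns bytes.
    const size_t ExtFnBytes    = NumExtFns; // too small
    const size_t ConstMapBytes = NumConstMaps * sizeof(uint16_t);

    uint8_t* Block = (uint8_t*)malloc(ExtFnBytes + ConstMapBytes);

    void**    ExtFunctionTable = (void**)Block;                    // needs 304 * 8 bytes
    uint16_t* ConstMapCacheIdx = (uint16_t*)(Block + ExtFnBytes);  // starts far too early

    // Writing function pointers past entry ~38 (304 bytes / 8 bytes per pointer)
    // now stomps ConstMapCacheIdx.
    printf("ExtFunctionTable spans [%p, %p), ConstMapCacheIdx begins at %p\n",
           (void*)ExtFunctionTable,
           (void*)(ExtFunctionTable + NumExtFns),
           (void*)ConstMapCacheIdx);

    free(Block);
    return 0;
}
```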
The system causing this crash generated a shader with 304 function calls and over 11k lines. OptimizeVectorVMScript() does not handle a function table exceeding 256 entries, as it uses uint8 variables without checking for overflow:
Another issue I noticed is what looks like a missed optimization opportunity. The function table passed into OptimizeVectorVMScript() lists every single function call made in the shader, once per call site, which introduces duplicate entries; in this ~11k-line shader all of the entries were the same function. I only took a brief look at OptimizeVectorVMScript() without digging deeper for now, but it may be possible to de-duplicate the table and remap the function indices (sketched below), although that wouldn’t remove the need to handle >255 functions in case a script manages to make that many unique function calls.
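As a rough sketch of that de-duplication idea (this is not OptimizeVectorVMScript() and the names are illustrative), collapsing the per-call-site list into unique entries and remapping call-site indices could look like:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Sketch: collapse a per-call-site function list into unique entries and
// remap each call site's index into the de-duplicated table.
struct FDedupResult
{
    std::vector<std::string> UniqueFunctions; // one entry per distinct function
    std::vector<uint16_t>    RemappedIndices; // per call site, index into UniqueFunctions
};

FDedupResult DeduplicateFunctionTable(const std::vector<std::string>& PerCallFunctions)
{
    FDedupResult Result;
    std::unordered_map<std::string, uint16_t> SeenIndex;

    Result.RemappedIndices.reserve(PerCallFunctions.size());
    for (const std::string& Fn : PerCallFunctions)
    {
        auto It = SeenIndex.find(Fn);
        if (It == SeenIndex.end())
        {
            It = SeenIndex.emplace(Fn, (uint16_t)Result.UniqueFunctions.size()).first;
            Result.UniqueFunctions.push_back(Fn);
        }
        Result.RemappedIndices.push_back(It->second);
    }
    return Result;
}

// Example: 304 call sites that all target the same function would collapse to a
// single table entry, with every call site remapped to index 0.
```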
1. Create a particle system using NDC with a large number of user attributes and several nodes that end up producing more than 256 function calls in the SystemUpdate shader.
2. Compile the system.
3. Attempt to render the system in an editor viewport or a cooked build.
Thanks for the report Timothy, will need to chat to the people closer to the VM to figure out how we should be handling this case better.
I can’t imagine changing from uint8 to uint16 is going to increase byte code memory considerably, although I’m not sure how this spreads around the code without digging a bit more. Either way, we should at least fail compilation when we hit this implicit limit rather than crashing.
I’m curious how this worked under the old VM, or if you found the old VM didn’t perhaps produce the correct data?
The op output from the shader compiler into the external function table has always been a byte, so the old VM would also have been using the wrong function.
I haven’t looked at what was being generated for the old VM, but I’m guessing it was doing something similar: generating a function call table of all calls in the order they’re made, without checking for duplicates.
The script that’s causing the memory stomp on the new VM happens to call the same function numerous times with different parameters, so it likely just happened to work: the entry at the truncated function index happened to be the same function.
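To spell out why the truncation could go unnoticed (illustrative numbers and names only, not the actual table), an index above 255 wraps when stored in a byte, and if the table is full of duplicates the wrapped slot still holds the same function pointer:

```cpp
#include <cstdint>
#include <cstdio>

using FnPtr = void (*)();
static void SampleCurve() {} // stand-in for the one function the script calls

int main()
{
    // Illustrative only: a table of 304 call-site entries that all happen to
    // point at the same function, as described for the reported script.
    FnPtr Table[304];
    for (FnPtr& Entry : Table) { Entry = &SampleCurve; }

    const uint16_t ActualIndex    = 300;
    const uint8_t  TruncatedIndex = (uint8_t)ActualIndex; // wraps to 44

    // The truncated slot still holds &SampleCurve, so the call "works"
    // despite reading the wrong index.
    printf("same function? %d\n", Table[TruncatedIndex] == Table[ActualIndex]);
    return 0;
}
```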
Here’s the link to the Jira case; we have someone looking into it, so it should be fixed for 5.7. My initial thought is we’ll use a bit to decide whether we need to load an extra byte, or have an ‘extended’ op, to avoid increasing the size of all existing byte code, but that’s TBD.
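For what it’s worth, a minimal sketch of that variable-length idea (an assumed encoding, not the shipped fix): use the top bit of the first byte to flag that a second byte follows, so indices below 128 stay one byte:

```cpp
#include <cstdint>
#include <vector>

// Sketch of one possible variable-length encoding (assumed, not the actual fix):
// indices < 128 use one byte; otherwise the high bit flags that a second byte
// carries the low bits, giving a 15-bit index range without growing common ops.
void EncodeFunctionIndex(std::vector<uint8_t>& ByteCode, uint16_t Index)
{
    if (Index < 0x80)
    {
        ByteCode.push_back((uint8_t)Index);
    }
    else
    {
        ByteCode.push_back((uint8_t)(0x80 | (Index >> 8))); // high bit set + top bits
        ByteCode.push_back((uint8_t)(Index & 0xFF));        // low byte
    }
}

uint16_t DecodeFunctionIndex(const uint8_t*& Cursor)
{
    const uint8_t First = *Cursor++;
    if ((First & 0x80) == 0)
    {
        return First; // single-byte index
    }
    const uint8_t Second = *Cursor++;
    return (uint16_t)(((First & 0x7F) << 8) | Second); // two-byte index
}
```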
We did audit all of FN’s content and the max we hit was 41, which was due to a lot of curve samples inside the asset at the system/emitter level, so I’m curious how well this system performs?
Thanks for the JIRA! Looks like it’s already been fixed. The CL in the JIRA is restricted but I found it in main as CL 44864527 so I’ll cherry pick that fix.
Yeah, I suspected FN wasn’t hitting anywhere near 255 function calls. This system doesn’t perform particularly well (the generated script being >11k LOC is another red flag), but it had previously slipped notice because it’s a short-lived system, so its hit on GPU time isn’t sustained. The large number of function calls appears to be related to the number of parameters being multiplied by the nodes in the system; it has parameters for each material type it’s intended to handle, which isn’t ideal. The system will need to be redesigned to address its performance issues, and we’re going to put together some guidance for our VFX artists on how to approach these types of systems to avoid this in the future.
Fortunately the new VM was already storing the index as a u16, so we just opted to use a u16 during shader compilation. We also now fail compilation if you blow past the limit.
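For completeness, a rough sketch of what that compile-time guard amounts to (the function and parameter names here are illustrative, not the actual engine check): emit the index as a uint16 and fail the script compile if the table exceeds what a uint16 can address:

```cpp
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Illustrative guard, not the actual engine change: the external function
// index is written as a uint16, and compilation fails outright if the table
// grows past what a uint16 can address instead of silently truncating.
bool TryEmitExternalFunctionIndex(std::vector<uint8_t>& ByteCode,
                                  size_t FunctionTableSize,
                                  size_t FunctionIndex,
                                  std::string& OutError)
{
    if (FunctionTableSize > std::numeric_limits<uint16_t>::max() + 1u ||
        FunctionIndex > std::numeric_limits<uint16_t>::max())
    {
        OutError = "External function table exceeds the 16-bit index limit";
        return false; // fail compilation rather than truncate the index
    }

    const uint16_t Index = (uint16_t)FunctionIndex;
    ByteCode.push_back((uint8_t)(Index & 0xFF));        // low byte
    ByteCode.push_back((uint8_t)((Index >> 8) & 0xFF)); // high byte
    return true;
}
```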