Lumen & Nanite on macOS

Hi all, I have some news to report from Epic. A check-in just landed in both the 5.2 and ue5-main branches on GitHub that adds very experimental and (for now) unsupported, disabled-by-default capability for Nanite on M2-based Macs. Here’s a link to the 5.2 commit. (If you don’t have GitHub access, that link will give you a 404 error.) There are many caveats about this, but someone was bound to notice it, so we may as well stay ahead of the messaging curve here.

The only way to enable it is to modify the code and build the engine from source. The modifications are as follows:

  1. Set PLATFORM_MAC_ENABLE_EXPERIMENTAL_NANITE_SUPPORT=1 in the UEBuildMac.cs source.
  2. Enable UE_EXPERIMENTAL_MAC_NANITE_SUPPORT in spirv_msl.hpp.
  3. Rebuild ShaderConductor per the instructions in UEBuildMac.cs where the Nanite define is.
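
For step 2, the change amounts to flipping a preprocessor define in spirv_msl.hpp before rebuilding ShaderConductor. A hypothetical sketch only — the define name comes from the post above, but the surrounding code in the real header will differ:

```cpp
// spirv_msl.hpp (sketch, not the actual header contents):
// enable the experimental Mac Nanite path, then rebuild ShaderConductor
// per the instructions in UEBuildMac.cs.
#ifndef UE_EXPERIMENTAL_MAC_NANITE_SUPPORT
#define UE_EXPERIMENTAL_MAC_NANITE_SUPPORT 1 // presumably 0 by default
#endif
```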

And just to reiterate, this will only work on M2 hardware.

There are no guarantees about performance, or even that it works at all. It is really just there as a foundational step for further development. That said, you are welcome to try it out and report any feedback here.

Thanks for your patience about this topic.

7 Likes

Well, a small step forward is better than no step at all! Thanks! :grin:

1 Like

How do I get GitHub access anyhoo?

1 Like

Hey there @SpideyComic_1511! Just follow the steps here and the system will automatically invite your account; just remember to accept the invite via email or GitHub notification. Unreal Engine on GitHub - Unreal Engine

2 Likes

Many thanks! :heart:

2 Likes

Thanks for finally getting this started! The hardware atomics would be easiest to begin with, but I do hope you expand to older Macs. The current commit added the logic for accessing from a buffer instead of a texture.

Next, I would recommend trying to debug my in-place emulated 64-bit atomics, then incorporating them into UE5. They were working in my other repo, but I cannot understand why they’re failing when isolated from that repo (I haven’t had enough time to investigate). My original idea (the only emulated atomic64 currently working, which splits data into a non-contiguous representation) does not seem maintainable from a software engineering perspective. However, it is also the only one that would run on Intel Macs.

5 Likes

is there any hope this could ever be implemented for M1 devices?

5 Likes

It has been a month since we last heard about M1 Nanite support. I think the original, more compatible, non-buggy atomics are better after all. This workaround should also make Nanite work on DirectX 11 for Windows. Can a staff member update us on this issue?

Here are the 72 lines of shader code:
// Returns 0 if the test failed. Otherwise, the return value must be >= 1
// because we first increment the pixel's counter, then return it.
inline ushort test_depth(uint index,
                         float depth,
                         device atomic_uint *depthBuffer,
                         device atomic_uint *countBuffer,
                         DataRaces dataRaces) {
  // Represent depth as 24-bit normalized integer.
  uint clamped_depth( saturate(depth) * float(MAX_DEPTH) );
  
  // Masks the lower 8 bits with zeroes. This means it can't be
  // considered "larger" than another value with the same depth, but a
  // different lower 8 bits.
  uint comparison_depth = clamped_depth << 8;
  device atomic_uint* depth_ptr = depthBuffer + index;
  
  // Spin until you access the atomic value in a sanitized way.
  ushort output;
  ushort num_data_races_1 = 0; // for profiling
  ushort num_data_races_2 = 0; // for profiling
  while (true) {
    uint current_depth = atomic_load_explicit(depth_ptr, memory_order_relaxed);
    if (comparison_depth <= current_depth) {
      output = 0;
      break;
    }
    
    uchar current_counter(current_depth);
    uchar next_counter = current_counter + 1;
    
    uint next_depth = comparison_depth | next_counter;
    bool atomic_succeeded = atomic_compare_exchange_weak_explicit(
      depth_ptr, &current_depth, next_depth,
      memory_order_relaxed, memory_order_relaxed);
    if (!atomic_succeeded) {
      // If there's a data race, the atomic cmpxchg fails.
      num_data_races_1 += 1;
      continue;
    }
    
    // Although we could pack two 16-bit counters into one word, that
    // creates a theoretical possibility to overflow the lower half, leaking
    // into the upper half. A thread accessing the upper half would be
    // thrown into an infinite loop because the 16-bit counter is always 1
    // ahead of the lock's 8-bit counter.
    //
    // Since it's 32-bit instead, we could store the lock and count
    // contiguously in memory. That decreases bandwidth utilization in the
    // texture reconstruction pass, so it's not a good idea.
    device atomic_uint* count_ptr = countBuffer + index;
    uint current_count = atomic_fetch_add_explicit(
      count_ptr, 1, memory_order_relaxed);
    if ((current_count & 255) != current_counter) {
      // If there's a data race, the counter isn't what you expect.
      num_data_races_2 += 1;
      continue;
    }
    
    // Exit the loop with a success.
    output = ushort(current_count + 1);
    break;
  }
  
  if (num_data_races_1 > 0) {
    dataRaces.recordEvent(1, num_data_races_1);
  }
  if (num_data_races_2 > 0) {
    dataRaces.recordEvent(2, num_data_races_2);
  }
  
  return output;
}
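
For anyone who wants to poke at the packing scheme without a Metal toolchain, here is a small single-threaded host-side C++ mock (my own sketch, not UE or shader code) of the same idea: a 24-bit normalized depth in the upper bits of a 32-bit word and an 8-bit wrap-around counter in the lower byte.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Mirror of the shader's packing: depth occupies the upper 24 bits of
// the word, an 8-bit wrap-around counter occupies the lower byte.
constexpr uint32_t MAX_DEPTH = (1u << 24) - 1;

uint32_t pack_depth(float depth, uint8_t counter) {
  float clamped = depth < 0.f ? 0.f : (depth > 1.f ? 1.f : depth);
  uint32_t d = uint32_t(clamped * float(MAX_DEPTH));
  return (d << 8) | counter;
}

// One round of the shader's depth test: accept the fragment only if its
// masked depth (lower 8 bits zeroed) beats the currently stored word.
bool try_update(std::atomic<uint32_t>& stored, float depth) {
  uint32_t comparison = pack_depth(depth, 0);
  uint32_t current = stored.load(std::memory_order_relaxed);
  if (comparison <= current) return false;       // failed the depth test
  uint8_t next_counter = uint8_t(current) + 1;   // bump the 8-bit counter
  return stored.compare_exchange_strong(current, comparison | next_counter,
                                        std::memory_order_relaxed);
}
```

The real shader additionally guards the separate count buffer against data races, which this single-threaded mock omits.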

Once you get the atomically generated counter (and if the counter is nonzero), you use it as the upper 16 bits of another atomic32 value, much like how depth is the upper 32 bits of an atomic64 value in the current Nanite. The only difference: we have to split the color into two 32 / 2 = 16-bit parts, whereas the current Nanite can keep it as one solid 32-bit piece. Hence, we need to “reconstruct” the split color data into solid pieces.
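
As a concrete illustration of the splitting and reconstruction (a simplified, hypothetical sketch — the names are mine, not UE’s): each 32-bit color is split into two 16-bit halves, each half stored with the winning counter in the upper 16 bits of its own 32-bit word, and the reconstruction pass re-joins halves whose counters match.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: for each pixel, two 32-bit words, each holding
// (counter << 16) | one 16-bit half of the color.
struct SplitColor {
  uint32_t lo_word;  // counter | lower 16 bits of the color
  uint32_t hi_word;  // counter | upper 16 bits of the color
};

SplitColor split_color(uint32_t color, uint16_t counter) {
  return { (uint32_t(counter) << 16) | (color & 0xFFFF),
           (uint32_t(counter) << 16) | (color >> 16) };
}

// The "reconstruction" pass: re-join the halves into one solid 32-bit
// color, but only if both halves carry the same (winning) counter.
bool reconstruct(SplitColor s, uint32_t& color_out) {
  if ((s.lo_word >> 16) != (s.hi_word >> 16)) return false;  // torn write
  color_out = (s.hi_word << 16) | (s.lo_word & 0xFFFF);
  return true;
}
```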

is there any hope this could ever be implemented for M1 devices?

That reconstruction requires non-trivial but straightforward source changes (I could list each one if you ask), so it isn’t plug-and-play like the M2 solution. That’s why I suggested the in-place atomic64 that only works on M1. That is also why only people with extensive experience with engine internals (a.k.a. staff) can implement the more cross-vendor-compatible workaround.

2 Likes

At this time the scope of the planned work is limited to M2. If that ever changes we will be sure to let you all know here, but it is unlikely that performance via emulated 64-bit atomics would be acceptable for shipping the feature.

This workaround should also make Nanite work on DirectX 11 for Windows

Lack of Nanite support in DX11 is not a technical limitation; in fact, it used to be supported in 5.0 Early Access via extensions that expose atomics. However, that path introduced unsustainable technical debt and complications for a number of other reasons.

1 Like

A lot of users got M1 machines recently and are going to keep them for numerous years. In addition, not a single iOS device (except iPad Pro) currently has native 64-bit atomics.

Has this assumption been tested and quantified? For example, if L2/L3 bandwidth is not the primary bottleneck, the slowdown could be only a factor of 2. I also suggested an M1-specific variant with fewer instructions and work redistribution to reduce divergence.

I agree that using an older API is not a wise choice, but using older hardware is something separate. By DX11, I meant Windows GPUs that do not support Shader Model 6.6. The workaround could run on DX12, iPhone 14 Pro, Android devices, Linux computers, etc.

4 Likes

I installed UE5.2 on my Air 2020 once I found out about the Apple Silicon support. It runs better than I expected for having a “low spec” machine. Any tutorial covering Lumen has worked as expected (very simple scenes probably).

With Nanite it’s a different story. I understand I’m not on super-fast hardware, but it’s odd that all the menu items are present and ostensibly active: they’re not grayed out and are selectable, yet it took an internet search to discover that Nanite isn’t actually active. I may have overlooked some fine print in the Apple Silicon announcement, but one would think the Nanite menu items would either be grayed out or produce a dialog saying “Not supported on macOS” if they in fact do nothing.

3 Likes

@jon_eg, with all due respect, the main point I think many would agree with is that we just want access to this feature. I remember the UE5 trailer in 2020 and how futuristic it was, thinking I could combine it with AR on the iPhone. Even if it’s not the fastest, the fact that it works at all, at anything over 1 FPS, would be awesome :slight_smile:

A good analogy is ML. Often you can only train on Nvidia/CUDA GPUs because they are the fastest; they have tensor cores that are 10x faster than AMD’s. But it’s not fair that AMD GPUs can’t be used at all, because even slower GPUs are faster than a CPU. That’s how I view this situation: a few million Windows gaming rigs, compared to billions of mobile devices with not-too-shabby GPUs.

5 Likes

@philipturner We do appreciate the feedback. If anything changes we’ll be sure to let you know.

3 Likes

With Nanite it’s a different story. I understand I’m not on super-fast hardware, but it’s odd that all the menu items are present and ostensibly active: they’re not grayed out and are selectable, yet it took an internet search to discover that Nanite isn’t actually active.

This is a fair point. In theory you could be opening a project that is also being worked on by people on Nanite-capable machines, and might want to see and even modify the various settings whether or not you are using Nanite. In fact the same goes for just setting “r.Nanite 0” in the editor on any machine. That said, there ought to be some way to easily distinguish whether or not Nanite is available on the current hardware/RHI. I will log a request for this.

3 Likes

Well, it appears Lumen is broken. I get purple spots that disappear if I turn off Lumen. With Lumen on, the purple spots are really bad in a dark room.

This is very disappointing. Just to hunt down the cause I created a 3rd Person Shooter default level.

My Mac Specs

What does the Lumen debug scene look like under Lit?

Nada. Much of the debug stuff doesn’t appear to be working. I filed a bug and Epic said it’s a dupe of an already filed bug. The report says that it’s most likely to be fixed in 5.3. That would be nice.

2 Likes

As noted in the roadmap for the upcoming 5.3 release, UE will support Nanite on all Macs with at least an M2 chip. Most likely the M1 is excluded due to the 64-bit atomics restriction!

Still, this is a huge leap forward toward making MacBooks portable, efficient game development and gaming machines!

See more details here:
Support for Nanite on Apple M2 Devices - Unreal Engine Public Roadmap | Product Roadmap (productboard.com)

1 Like

I don’t have an M2 and plan to keep my M1-based hardware for the greater part of a decade, so I guess I’m going to be left out of this. At least my rendering needs are not in gaming.

3 years waiting, time to give up on Lumen and Nanite.

4 Likes

Can you list the non-trivial-but-straightforward source changes needed to implement Nanite on M1 hardware? I’d like to implement this myself.

1 Like