DDC - permissions problem

Hi,

We’re using the DDC to speed up load times for project files that are shared across projects and used by multiple artists. The cache is stored on an NFS server. After an artist opens a scene for the first time (and therefore triggers the caching), the data is stored correctly; however, when other artists open the scene afterwards (and in theory should read from the cache), we get permission issues. Example error message:

LogDerivedDataCache: Warning: FFileSystemDerivedDataBackend: Could not write temp file /jobs/KVA/cache_seq/unreal/DDC/9/0/1/temp.6B57DF628A794495979C568FDA50F471! Error: 0 (errno=13 (Permission denied))

The permissions on the directories are world read/write/execute, i.e. drwxrwxrwx. Any ideas why this might be happening?

Any help much appreciated!

Cam

When this happens, try writing a file manually to the same location as a test, and see if you run into the same permission issue. You could also examine the NFS logs for more information about the denial.

We’ve tried this, and the clients showing this issue can write files to these locations. Our NFS server isn’t logging denials (due to the volume of traffic on the server). A bit of speculation: does UE fully support POSIX permissions? Just wondering if it might be expecting ACL-based permissions (which are disabled on our NFS server).

UE uses the POSIX file API on Linux when writing files, and doesn’t really expect much of the permissions. However, having taken a look at the code (https://github.com/EpicGames/UnrealEngine/blob/4.25/Engine/Source/Runtime/Core/Private/Unix/UnixPlatformFile.cpp#L930), there are other failure points in the file-opening process that could have resulted in the permissions error.

A particular suspect is the flock() call that Unreal makes to imitate Windows’ exclusive open mode. Sometimes multiple instances of UE will attempt to open the same file for writing, expecting to fail if the file is already being written to; log files are the prime example of this behavior: each UE instance tries to open the same name.log, then name_2.log, then name_3.log, and so on. Without exclusive file access they would all be writing to name.log without any synchronization. flock() may be unsupported over NFS, and it may be that call that is producing the permission error.
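To make the pattern concrete, here is a minimal sketch of that exclusive-open idea (an approximation for illustration, not the actual UnixPlatformFile.cpp code):

    #include <sys/file.h>  // flock()
    #include <fcntl.h>     // open()
    #include <unistd.h>    // close()
    #include <cerrno>
    #include <cstdio>

    // Approximation of an "exclusive write" open on Unix: take a
    // non-blocking exclusive flock() and treat a lock failure the
    // same as a failed open(). This is what lets the name.log,
    // name_2.log, name_3.log fallback behavior work.
    int OpenForExclusiveWrite(const char* Path)
    {
        int Fd = open(Path, O_CREAT | O_WRONLY, 0666);
        if (Fd < 0)
        {
            return -1; // open() itself failed; errno says why
        }
        if (flock(Fd, LOCK_EX | LOCK_NB) != 0)
        {
            // Another process holds the lock, or the filesystem
            // rejected flock() outright (a possibility over NFS).
            fprintf(stderr, "flock failed: errno=%d\n", errno);
            close(Fd);
            return -1;
        }
        return Fd;
    }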

You can test the above conjecture, and if it holds, you can try commenting out the flock() call to get it to work over NFS. The DDC backend first writes to a temporary file, then attempts to move it to the proper destination. The temporary file has a random GUID in its name, so hopefully, for the DDC use case, even without exclusive write access two Unreal instances are very unlikely to cause DDC corruption by writing to the same temporary file. Your local log files, though, will be messed up if you start more than one Unreal instance…
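For reference, the write-then-move pattern looks roughly like this (a sketch only; the temp naming below is invented for illustration, whereas the real backend produces names like temp.<GUID>, as in the warning you quoted):

    #include <cstdio>   // std::rename()
    #include <fstream>
    #include <string>

    // Sketch of the DDC write pattern: write the payload to a
    // uniquely named temp file, then move it into place. rename()
    // is atomic within a filesystem, so readers never observe a
    // half-written cache entry.
    bool PutCacheEntry(const std::string& FinalPath,
                       const std::string& Payload,
                       const std::string& Guid)
    {
        const std::string TempPath = FinalPath + ".temp." + Guid;
        {
            std::ofstream Out(TempPath, std::ios::binary);
            if (!Out.write(Payload.data(), Payload.size()))
            {
                return false; // the stage the quoted warning corresponds to
            }
        }
        return std::rename(TempPath.c_str(), FinalPath.c_str()) == 0;
    }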

Hi,

Thanks for the answer. flock() does support remote locking over NFS (man 2 flock, man 5 nfs) since kernel 2.6.37, and does so by default (local locks have to be specified as a mount option). Assuming a scenario where a client wanted to modify a file, I assume the writes are made to the cache directly (since there is no single source filesystem this is caching) and, as you describe, the file is written first as a temporary file, then moved to the original file name. However, I would have expected the error not to be that it couldn’t write the temp file (as those names are very likely to be unique, with a random GUID in them), but rather to occur when trying to overwrite the original file, which would have the lock on it.
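For reference, the mount option in question is local_lock (man 5 nfs, kernels since 2.6.37). If you wanted flock() locks to be purely client-local rather than going to the server, the /etc/fstab entry would look something like this (server name and paths are placeholders):

    nfsserver:/jobs  /jobs  nfs  defaults,local_lock=flock  0  0

By default (local_lock=none) flock() requests are sent to the NFS server, which is the remote locking behaviour described above.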

I’ll take a look at the source code when I get a chance (should have probably done that first really…).

Thanks,

Cam

I should also mention that this error comes from other users trying to read the files, which shouldn’t place an exclusive lock on them.

The locking happens at a much lower level than the DDC, and it is not needed for the DDC. Any time Unreal opens a file for writing, it is flock()ed… so it’s the temp file that will be locked. If flock() doesn’t return the permission error, then I’m not sure what else would; you’d need to debug locally. The error you quoted comes from writing a file, and there are three libc functions involved: open(), flock() and ftruncate().
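If it helps, a small standalone test along these lines (a sketch; the default path is a placeholder) would show which of the three calls fails, and with which errno, when pointed at a file under the NFS mount:

    #include <sys/file.h>  // flock()
    #include <fcntl.h>     // open()
    #include <unistd.h>    // ftruncate(), close()
    #include <cerrno>
    #include <cstdio>
    #include <cstring>

    // Run the same three libc calls the write-open path uses and
    // report which one fails. Point it at a file on the NFS mount,
    // e.g. /jobs/KVA/cache_seq/unreal/DDC/flock_test
    int main(int argc, char** argv)
    {
        const char* Path = argc > 1 ? argv[1] : "./flock_test";

        int Fd = open(Path, O_CREAT | O_WRONLY, 0666);
        if (Fd < 0)
        {
            printf("open: errno=%d (%s)\n", errno, strerror(errno));
            return 1;
        }
        printf("open: ok\n");

        if (flock(Fd, LOCK_EX | LOCK_NB) != 0)
            printf("flock: errno=%d (%s)\n", errno, strerror(errno));
        else
            printf("flock: ok\n");

        if (ftruncate(Fd, 0) != 0)
            printf("ftruncate: errno=%d (%s)\n", errno, strerror(errno));
        else
            printf("ftruncate: ok\n");

        close(Fd);
        return 0;
    }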

The log message you quoted in the question is in PutCachedData(), i.e. a write operation to the DDC. Also, the failing function attempts to save to a (temporary) file. This doesn’t agree with that latest statement about the problem happening when users try to read files. In other words, the artist may be expecting the DDC files to only be read, but there’s definitely something missing in the cache that makes the artist’s machine generate new derived data and (attempt to) put it into the cache (which fails).

Thanks for the reply. Sorry, yes, of course it is a write operation; not sure what I was thinking at the time. I’ll look at testing without flock() and see what happens.