Cloud DDC Support for AWS Keyspaces

Would like Cloud DDC to support AWS Keyspaces for Cassandra (Scylla) infrastructure.

[Attachment Removed]

Steps to Reproduce

    Scylla:
      ConnectionString: Contact Points=cassandra.us-west-2.amazonaws.com;Port=9142;Default
        Keyspace=ddc;Username=$IAM_KEYSPACES_USERNAME;Password=$IAM_KEYSPACES_PASSWORD
      KeyspaceReplicationStrategy:
        class: SimpleStrategy
        replication_factor: "3"
      LocalDatacenterName: us-west-2
      LocalKeyspaceSuffix: ddc
      UseAzureCosmosDB: true
      UseSSL: true

Working on getting Cloud DDC connected to Keyspaces. It was able to connect and create the keyspaces / tables, but then the webserver fell over with the following messages.

{"Timestamp":"2025-11-13T22:21:25.3375883+00:00","Level":"Fatal","MessageTemplate":"Host terminated unexpectedly","Exception":"Cassandra.InvalidQueryException: DISTINCT is not yet supported\n at Cassandra.Requests.PrepareHandler.SendRequestToOneNode(IInternalSession session, IEnumerator`1 queryPlan, PrepareRequest request)\n

[Attachment Removed]

Hey

There is only one place where we use the DISTINCT keyword, and that query is not used very often. As such I think it’s totally realistic to simply do the unique filtering on the client side behind an option, probably something similar to UseAzureCosmosDB but one for Keyspaces.
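
A minimal sketch of what client-side unique filtering could look like, written in Python for brevity (Cloud DDC itself is C#). The row shape and the column names `namespace` and `bucket` are hypothetical and not taken from the Cloud DDC schema. The idea is to run the query without `DISTINCT` and drop duplicate keys while iterating over the rows.

```python
from typing import Iterable, Iterator, Tuple


def distinct_keys(rows: Iterable[dict], key_columns: Tuple[str, ...]) -> Iterator[tuple]:
    """Yield each unique combination of key_columns once, in first-seen order."""
    seen = set()
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            yield key


# Hypothetical rows, as they might come back from a plain SELECT without DISTINCT.
rows = [
    {"namespace": "test-game", "bucket": "asset"},
    {"namespace": "test-game", "bucket": "asset"},
    {"namespace": "test-game", "bucket": "shader"},
]
unique = list(distinct_keys(rows, ("namespace", "bucket")))
print(unique)
```

The trade-off is that the client reads every row and holds one entry per unique key in memory, which is usually acceptable for a query that runs rarely.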

Might need a bit of back and forth with you to actually identify and iron out all the issues we would see, as we have no intention of actually running against Keyspaces ourselves.

I will be out for the holidays as of today but back in January, so I will get something added once I am back. Are you okay testing this by building your own Docker images so we can iterate on it without having official releases in between?

[Attachment Removed]

Good question, there were a few key reasons why we preferred Scylla over a managed offering.

  • First and foremost because we wanted to run something that we knew licensees could run no matter if they were on premises or in clouds other than AWS. We have typically always chosen options that mean our setup can be replicated by licensees should they want to (but we try to avoid forcing it to be identical).
  • Secondly we did consider AWS Keyspaces for a while, after having set up and proven Scylla, as a way to simplify our management (once we knew Scylla worked we felt confident we could use something different and still be certain it would work for licensees). When we did the napkin math for Keyspaces with the data we had in Scylla it ended up costing quite a lot more, and this was in line with what we saw in our first prototype for Cloud DDC, which used DynamoDB.

Our current thinking is that we would rather invest in removing the need for a DB layer at all and rely exclusively on the object store (S3), so that we could simplify the setup for all licensees instead of making the operational aspects of the DB easier and cheaper. This isn’t something we are actively pursuing, though, and even once we do it’s likely to be a fairly long-term project (the fundamental problem is that S3’s operations are too slow for a lot of the operations we need, so we need a faster index).

[Attachment Removed]

Hey friends! I work with Taylor and have been iterating on setting up Cloud DDC with a Keyspaces backend using the provided CL. We’re running into a couple new issues now:

  1. SSL settings are not applied unless UseAzureCosmosDB is enabled. Jupiter won’t connect unless both settings are true (Cosmos + Keyspaces).
  2. Username / Password need to be hardcoded into the Scylla connection string (via Helm values). Using environment variables would be preferred; is that possible?
  3. New fatal query error:

```

{“Timestamp”:“2026-01-13T19:25:38.9059553+00:00”,“Level”:“Fatal”,“MessageTemplate”:“Host terminated unexpectedly”,“Exception”:“Cassandra.SyntaxError: line 1:67 mismatched input ‘(’ expecting ‘)’ (… EXISTS block_context_by_time ON block_context ([(]…)\n at Cassandra.Tasks.TaskHelper.WaitToComplete(Task task, Int32 timeout)\n at Cassandra.Tasks.TaskHelper.WaitToCompleteWithMetrics(IMetricsManager manager, Task task, Int32 timeout)\n at Cassandra.Tasks.TaskHelper.WaitToCompleteWithMetrics[T](IMetricsManager manager, Task`1 task, Int32 timeout)\n at Cassandra.Session.Execute(IStatement statement, String executionProfileName)\n at Cassandra.Session.Execute(IStatement statement)\n at Jupiter.Implementation.ScyllaBlockStore..ctor(IScyllaSessionManager scyllaSessionManager, IOptionsMonitor`1 scyllaSettings, Tracer tracer) in /app/Programs/UnrealCloudDDC/Jupiter/Implementation/Builds/ScyllaBlockStore.cs:line 47\n at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)\n at System.Reflection.MethodBaseInvoker.InvokeDirectByRefWithFewArgs(Object obj, Span`1 copyOfArgs, BindingFlags invokeAttr)\n at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)\n at Microsoft.Extensions.DependencyInjection.ActivatorUtilities.ConstructorMatcher.CreateInstance(IServiceProvider provider)\n at Microsoft.Extensions.DependencyInjection.ActivatorUtilities.CreateInstance(IServiceProvider provider, Type instanceType, Object[] parameters)\n at Microsoft.Extensions.DependencyInjection.ActivatorUtilities.CreateInstance[T](IServiceProvider provider, Object[] parameters)\n at Jupiter.JupiterStartup.BlockIndexFactory(IServiceProvider provider) in /app/Programs/UnrealCloudDDC/Jupiter/Jupiter.Startup.cs:line 248\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSiteMain(ServiceCallSite callSite, TArgument argument)\n 
at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitRootCache(ServiceCallSite callSite, RuntimeResolverContext context)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitConstructor(ConstructorCallSite constructorCallSite, RuntimeResolverContext context)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSiteMain(ServiceCallSite callSite, TArgument argument)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitRootCache(ServiceCallSite callSite, RuntimeResolverContext context)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.Resolve(ServiceCallSite callSite, ServiceProviderEngineScope scope)\n at Microsoft.Extensions.DependencyInjection.ServiceProvider.CreateServiceAccessor(ServiceIdentifier serviceIdentifier)\n at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)\n at Microsoft.Extensions.DependencyInjection.ServiceProvider.GetService(ServiceIdentifier serviceIdentifier, ServiceProviderEngineScope serviceProviderEngineScope)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope.GetService(Type serviceType)\n at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetService[T](IServiceProvider provider)\n at Jupiter.Implementation.BlobCleanupService..ctor(IServiceProvider provider, IOptionsMonitor`1 settings, ILogger`1 logger) in /app/Programs/UnrealCloudDDC/Jupiter/Implementation/GC/BlobCleanupService.cs:line 42\n at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean 
isConstructor)\n at System.Reflection.MethodBaseInvoker.InvokeDirectByRefWithFewArgs(Object obj, Span`1 copyOfArgs, BindingFlags invokeAttr)\n at System.Reflection.MethodBaseInvoker.InvokeWithFewArgs(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSiteMain(ServiceCallSite callSite, TArgument argument)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitRootCache(ServiceCallSite callSite, RuntimeResolverContext context)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.Resolve(ServiceCallSite callSite, ServiceProviderEngineScope scope)\n at Microsoft.Extensions.DependencyInjection.ServiceProvider.CreateServiceAccessor(ServiceIdentifier serviceIdentifier)\n at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)\n at Microsoft.Extensions.DependencyInjection.ServiceProvider.GetService(ServiceIdentifier serviceIdentifier, ServiceProviderEngineScope serviceProviderEngineScope)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.ServiceProviderEngineScope.GetService(Type serviceType)\n at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetService[T](IServiceProvider provider)\n at Jupiter.JupiterStartup.<>c.<OnAddService>b__2_14(IServiceProvider p) in /app/Programs/UnrealCloudDDC/Jupiter/Jupiter.Startup.cs:line 136\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSiteMain(ServiceCallSite callSite, TArgument argument)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitRootCache(ServiceCallSite callSite, RuntimeResolverContext context)\n at 
Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitIEnumerable(IEnumerableCallSite enumerableCallSite, RuntimeResolverContext context)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSiteMain(ServiceCallSite callSite, TArgument argument)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.VisitRootCache(ServiceCallSite callSite, RuntimeResolverContext context)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteVisitor`2.VisitCallSite(ServiceCallSite callSite, TArgument argument)\n at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteRuntimeResolver.Resolve(ServiceCallSite callSite, ServiceProviderEngineScope scope)\n at Microsoft.Extensions.DependencyInjection.ServiceProvider.CreateServiceAccessor(ServiceIdentifier serviceIdentifier)\n at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)\n at Microsoft.Extensions.DependencyInjection.ServiceProvider.GetService(ServiceIdentifier serviceIdentifier, ServiceProviderEngineScope serviceProviderEngineScope)\n at Microsoft.Extensions.DependencyInjection.ServiceProvider.GetService(Type serviceType)\n at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService(IServiceProvider provider, Type serviceType)\n at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService[T](IServiceProvider provider)\n at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)\n at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.Start(IHost host)\n at Jupiter.BaseProgram`1.BaseMain(String[] args) in /app/Programs/UnrealCloudDDC/Jupiter.Common/BaseProgram.cs:line 73”}

```

Thank you for the support!

[Attachment Removed]

Hey

  1. I changed it so that the SSL configuration also applies to Keyspaces in change 49803386.
  2. We follow .NET’s conventions for options, which means everything can be set as an environment variable; in this case setting `SCYLLA__CONNECTIONSTRING=<string>` should work. Generally the convention is to replace the separator between nesting levels (written as `.` here, `:` in .NET configuration keys) with double underscores for environment variables.
  3. I am not sure why this error happens; it looks like Keyspaces is not happy with the clustered primary key for the index we are creating. We use clustered primary keys for a bunch of tables without issues, but it might not support that for indexes. The only workaround I can see is disabling the feature that attempts to create this index, as it’s not needed for DDC. This will prevent you from using the new build upload features (cooked output storage and distribution of whole game builds), but as a way of testing this out for DDC that seems like a good path forward, and we can figure out a proper solution later if we need to. You can do that by setting BuildStoreImplementation=Memory in your Helm values directly under the config object.
    [Attachment Removed]
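
The naming convention from point 2 can be sketched as a small helper. This is an illustration in Python rather than anything from the Cloud DDC code base, and `to_env_var` is a hypothetical name. Upper-casing is only a common habit, since .NET configuration keys are case-insensitive.

```python
def to_env_var(config_path: str) -> str:
    """Map a nested option path to the matching environment variable name.

    Accepts either '.' or ':' between nesting levels and joins the levels
    with a double underscore.
    """
    parts = config_path.replace(":", ".").split(".")
    return "__".join(part.upper() for part in parts)


env_name = to_env_var("Scylla.ConnectionString")
print(env_name)
```

Here `Scylla.ConnectionString` maps to `SCYLLA__CONNECTIONSTRING`, the variable named in the reply above, so the credentials can come from a secret-backed environment variable instead of plain Helm values.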

Thank you Joakim! I’ll try this out today and report back.

[Attachment Removed]

Setting `BuildStoreImplementation=Memory` allowed DDC to start, however a new error appeared :(

{"Timestamp":"2026-01-15T04:34:42.1950713+00:00","Level":"Error","MessageTemplate":"An unhandled exception has occurred while executing the request.","TraceId":"ad25e074372a398e3f19a378e9e2ae7e","SpanId":"a6ab48ba5232d5ca","Exception":"Cassandra.InvalidQueryException: TTL is not yet supported.\n at Cassandra.Mapping.Mapper.ExecuteAsync(Cql cql)\n at Jupiter.Implementation.ScyllaReplicationLog.InsertAddEventAsync(NamespaceId ns, BucketId bucket, RefId key, BlobId objectBlob, Nullable1 timestamp) in /app/Programs/UnrealCloudDDC/Jupiter/Implementation/TransactionLog/ScyllaReplicationLog.cs:line 110\n at Jupiter.Implementation.ObjectService.DoFinalizeAsync(NamespaceId ns, BucketId bucket, RefId key, BlobId blobHash, CbObject payload, Action1 onBlobFound, CancellationToken cancellationToken) in /app/Programs/UnrealCloudDDC/Jupiter/Implementation/RefService.cs:line 278\n at Jupiter.Implementation.ObjectService.PutAsync(NamespaceId ns, BucketId bucket, RefId key, BlobId blobHash, CbObject payload, Action1 onBlobFound, Boolean allowOverwrite, CancellationToken cancellationToken) in /app/Programs/UnrealCloudDDC/Jupiter/Implementation/RefService.cs:line 168\n at Jupiter.Controllers.ReferencesController.PutObjectAsync(NamespaceId ns, BucketId bucket, RefId key, Boolean allowOverwrite) in /app/Programs/UnrealCloudDDC/Jupiter/Controllers/ReferencesController.cs:line 931\n at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.TaskOfIActionResultExecutor.Execute(ActionContext actionContext, IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object arguments)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.g__Logged|12_1(ControllerActionInvoker invoker)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\n at 
Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.g__Awaited|13_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.g__Awaited|25_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Rethrow(ResourceExecutedContextSealed context)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.g__Awaited|20_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.g__Logged|17_1(ResourceInvoker invoker)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.g__Logged|17_1(ResourceInvoker invoker)\n at Microsoft.AspNetCore.Routing.EndpointMiddleware.g__AwaitRequestTask|7_0(Endpoint endpoint, Task requestTask, ILogger logger)\n at EpicGames.AspNet.ServerTimingMiddleware.InvokeAsync(HttpContext context) in /app/Programs/Shared/EpicGames.AspNet/ServerTimings.cs:line 122\n at Jupiter.Common.SuppressExceptionMiddleware.InvokeAsync(HttpContext context) in /app/Programs/UnrealCloudDDC/Jupiter.Common/SuppressExceptionMiddleware.cs:line 26\n at Microsoft.AspNetCore.Authorization.AuthorizationMiddleware.Invoke(HttpContext context)\n at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)\n at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddlewareImpl.g__Awaited|10_0(ExceptionHandlerMiddlewareImpl middleware, 
HttpContext context, Task task)","Properties":{"EventId":{"Id":1,"Name":"UnhandledException"},"SourceContext":"Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware","RequestId":"0HNIJTSBC20CT:00000016","RequestPath":"/api/v1/refs/test/asset/da16ae9fbfdeb3caa0fc7af4b458d76ce722ba26","ConnectionId":"0HNIJTSBC20CT","ue-session":null}}

We also had to set `DefaultConsistencyLevel: LocalQuorum` in the Scylla config; Keyspaces rejected writes that default to LocalOne.

[Attachment Removed]

Oh yeah, this seems like a big issue. We rely quite a lot on TTL in a few tables, especially for the replication log, which is where you saw the errors. There is currently no workaround I can see for that, so I can’t really see any way forward with Keyspaces unless AWS support has some ideas on how we could enable some of these required features.

[Attachment Removed]

I was able to get past this issue by altering two replication tables. It is unclear why AWS doesn’t support this out-of-the-box, but they have some documentation on enabling custom TTL on tables

ref:

These two queries allowed cloud ddc to write data to the replication tables.

ALTER TABLE cloudddc_demo_local_usw2.replication_log 
WITH CUSTOM_PROPERTIES={'ttl':{'status':'enabled'}};
 
ALTER TABLE cloudddc_demo_local_usw2.blob_replication_log 
WITH CUSTOM_PROPERTIES={'ttl':{'status':'enabled'}};
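
If more keyspaces or regions need the same change, the statements can be generated rather than typed by hand. The following is only a sketch in Python: `enable_ttl_statements` is a hypothetical helper, and the table list is limited to the two tables named in this thread. Whether other tables need the property is not known from this thread.

```python
from typing import List, Sequence


def enable_ttl_statements(keyspace: str, tables: Sequence[str]) -> List[str]:
    """Build one ALTER TABLE statement per table that enables custom TTL."""
    return [
        f"ALTER TABLE {keyspace}.{table} "
        "WITH CUSTOM_PROPERTIES={'ttl':{'status':'enabled'}};"
        for table in tables
    ]


statements = enable_ttl_statements(
    "cloudddc_demo_local_usw2",
    ["replication_log", "blob_replication_log"],
)
for statement in statements:
    print(statement)
```

The output can be pasted into `cqlsh` or run through any Cassandra driver session.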

I’m going to explore connecting another region to see if there are any issues. Do you foresee any potential problems here? I’m mildly concerned about diverging from Epic’s battle tested Cloud DDC implementation for the sake of Scylla simplicity.

Thanks for the assistance Joakim!

[Attachment Removed]

I’m having trouble getting the worker service to replicate data, wondering if it could be related to the BuildStoreImplementation setting?

# worker config
worker:
  enabled: true
  replicaCount: 2
  image:
    repository: "myregistry.example.com/unrealcloudddc"
    pullPolicy: IfNotPresent
  config:
    BuildStoreImplementation: Memory
    S3:
      BucketName: cloudddc-demo-apse2-staging
      Region: ap-southeast-2
      CreateBucketIfMissing: false
      SetBucketPolicies: false
      UseBlobIndexForExistsCheck: true
      ForceAWSPathStyle: false
      UseArnRegion: true
      UseMultiPartUpload: true
      UseChunkEncoding: true
 
    AWSCredentials:
      AWSCredentialsType: AssumeRoleWebIdentity
 
    GC:
      CleanOldRefRecords: false
      CleanOldBlobs: true
      BlobCleanupServiceInterval: 3600
 
    Replication:
      Enabled: true
      ReplicationPollFrequencySeconds: 60
      Replicators:
        - ReplicatorName: usw2-to-apse2-test-game
          Namespace: test-game
          ConnectionString: https://cloudddc-usw2.example.com
          Version: Refs
          MaxParallelReplications: 64
          SkipSnapshot: false
 
    Scylla:
      LocalDatacenterName: "ap-southeast-2"
      LocalKeyspaceSuffix: "apse2"
      UseAWSKeyspace: true
      DefaultConsistencyLevel: LocalQuorum
      LogLevel: Debug
      EnableTracing: true
      KeyspaceReplicationStrategy:
        class: SimpleStrategy
        replication_factor: "3"
 
    UnrealCloudDDC:
      BlobIndexImplementation: Scylla
      ContentIdStoreImplementation: Scylla
      EnableLastAccessTracking: true
      LeaderElectionImplementation: Static
      ReferencesDbImplementation: Scylla
      ReplicationLogWriterImplementation: Scylla
      StorageImplementations:
        - "S3"

error log:

{"Timestamp":"2026-01-15T22:35:52.7727486+00:00","Level":"Error","MessageTemplate":"Failed to create replicator {Name}","Exception":"System.InvalidOperationException: A suitable constructor for type 'Jupiter.Implementation.RefsReplicator' could not be located. Ensure the type is concrete and all parameters of a public constructor are either registered as services or passed as arguments. Also ensure no extraneous arguments are provided.\n at Microsoft.Extensions.DependencyInjection.ActivatorUtilities.FindApplicableConstructor(Type instanceType, Type[] argumentTypes, ConstructorInfoEx[] constructors, ConstructorInfo& matchingConstructor, Nullable1& matchingParameterMap)\n at Microsoft.Extensions.DependencyInjection.ActivatorUtilities.CreateInstance(IServiceProvider provider, Type instanceType, Object parameters)\n at Microsoft.Extensions.DependencyInjection.ActivatorUtilities.CreateInstance[T](IServiceProvider provider, Object parameters)\n at Jupiter.Implementation.ReplicationService.CreateReplicator(ReplicatorSettings replicatorSettings, ClusterSettings clusterSettings, IServiceProvider provider) in /app/Programs/UnrealCloudDDC/Jupiter/Implementation/Replication/ReplicationService.cs:line 73\n at Jupiter.Implementation.ReplicationService..ctor(IOptionsMonitor1 settings, IOptionsMonitor1 clusterSettings, IServiceProvider provider, ILeaderElection leaderElection, ILogger1 logger) in /app/Programs/UnrealCloudDDC/Jupiter/Implementation/Replication/ReplicationService.cs:line 45","Properties":{"Name":"usw2-to-apse2-test-game","SourceContext":"Jupiter.Implementation.ReplicationService"}}

[Attachment Removed]

“Do you foresee any potential problems here? I’m mildly concerned about diverging from Epic’s battle tested Cloud DDC implementation for the sake of Scylla simplicity.”

The change from LocalOne to LocalQuorum will imply a slower response time, which can be an issue for DDC, which is notoriously sensitive to latency. But we have been considering a similar switch (though maybe not for DDC), and from our experiments the difference doesn’t seem very large in most practical cases, so I think it will be okay. We will need to test and see.
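
To make the latency trade-off concrete: in Cassandra-style databases a quorum is a strict majority of the replicas, so with the replication factor of 3 used in the configs in this thread, LocalQuorum waits for two replicas where LocalOne waits for one. A small Python sketch of that arithmetic follows. The function names are illustrative only, and how Keyspaces handles this internally is not covered here.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a quorum: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def replicas_required(consistency_level: str, local_replication_factor: int) -> int:
    """Replica acknowledgements needed in the local datacenter for a request."""
    level = consistency_level.lower()
    if level == "localone":
        return 1
    if level == "localquorum":
        return quorum(local_replication_factor)
    raise ValueError(f"unsupported consistency level: {consistency_level}")


for level in ("LocalOne", "LocalQuorum"):
    print(level, replicas_required(level, 3))
```

Each request therefore completes only when the second-fastest of the three replicas has answered, instead of the fastest one.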

For the changes to enable TTL, that seems fine to me. It is likely something I should port back into Cloud DDC for when Keyspaces is enabled, if this works out well.

For the refs replicator issue, I would recommend using the blob replicator instead of the refs one. It’s faster and more reliable. The actual issue is likely a problem in the code; we haven’t used the refs replication for a while. I don’t see how switching the builds implementation can cause that. I will need to investigate further, but again I would recommend just changing the version from “refs” to “blobs”.

[Attachment Removed]

Thanks Joakim! Switching to `blobs` seems to have worked, the replicators are starting successfully now. However… there is a new error when fetching blob data from the DDC API

{"Timestamp":"2026-01-20T17:14:27.2464443+00:00","Level":"Error","MessageTemplate":"Unhandled exception in replicator {Name}","Exception":"System.AggregateException: One or more errors occurred. (Expected content-length on blob response)\n ---> System.Exception: Expected content-length on blob response\n at Jupiter.Implementation.BlobsReplicator.ReplicateBlobHttpAsync(NamespaceId ns, BlobId blob, Nullable1 bucketHint, CancellationToken cancellationToken) in /app/Programs/UnrealCloudDDC/Jupiter/Implementation/Replication/BlobsReplicator.cs:line 825\n at Jupiter.Implementation.BlobsReplicator.ReplicateBlobHttpAsync(NamespaceId ns, BlobId blob, Nullable1 bucketHint, CancellationToken cancellationToken) in /app/Programs/UnrealCloudDDC/Jupiter/Implementation/Replication/BlobsReplicator.cs:line 831\n at Jupiter.Implementation.BlobsReplicator.ReplicateBlobAsync(NamespaceId ns, BlobId blob, Nullable1 bucketHint, CancellationToken cancellationToken) in /app/Programs/UnrealCloudDDC/Jupiter/Implementation/Replication/BlobsReplicator.cs:line 516\n at Jupiter.Implementation.BlobsReplicator.<>c__DisplayClass27_1.<b__0>d.MoveNext() in /app/Programs/UnrealCloudDDC/Jupiter/Implementation/Replication/BlobsReplicator.cs:line 319\n--- End of stack trace from previous location ---\n at System.Threading.Tasks.Parallel.<>c__571.<<ForEachAsync>b__57_0>d.MoveNext()\n--- End of stack trace from previous location ---\n at Jupiter.Implementation.BlobsReplicator.ReplicateIncrementallyAsync(NamespaceId ns, String lastBucket, CancellationToken replicationToken) in /app/Programs/UnrealCloudDDC/Jupiter/Implementation/Replication/BlobsReplicator.cs:line 294\n at Jupiter.Implementation.BlobsReplicator.TriggerNewReplicationsAsync() in /app/Programs/UnrealCloudDDC/Jupiter/Implementation/Replication/BlobsReplicator.cs:line 229\n --- End of inner exception stack trace ---","Properties":{"Name":"usw2-to-apse2-test-game","SourceContext":"Jupiter.Implementation.ReplicationService"}}

It looks like the `content-length` response header is missing, causing the replication request to fail.

[Attachment Removed]

Hey

That error usually happens when there is a setup issue with the endpoint it’s trying to reach, resulting in a transport-level error. I would try manually calling the endpoint it’s attempting to use from within the pod that is running the replication to see what the error is. Simply doing a curl GET on any blob is likely enough to trigger it.
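
Where curl is not available in the pod, the same check can be scripted. This is a rough Python sketch using only the standard library. The function names are hypothetical, and the blob URL and any auth headers have to be supplied by whoever runs it, since they depend on the deployment.

```python
import urllib.request
from typing import Mapping, Optional


def body_framing(headers: Mapping[str, str]) -> str:
    """Classify how a response body is delimited, based on its headers."""
    normalized = {name.lower(): value for name, value in headers.items()}
    if "content-length" in normalized:
        return "content-length=" + normalized["content-length"]
    if "chunked" in normalized.get("transfer-encoding", "").lower():
        return "chunked (no content-length)"
    return "neither header present"


def check_blob_endpoint(url: str, extra_headers: Optional[Mapping[str, str]] = None) -> str:
    """GET the given blob URL and report how the response body is framed."""
    request = urllib.request.Request(url, headers=dict(extra_headers or {}))
    with urllib.request.urlopen(request, timeout=10) as response:
        return body_framing(response.headers)


# Offline example of the header classification only; no request is made here.
print(body_framing({"Transfer-Encoding": "chunked"}))
```

Calling `check_blob_endpoint` at each layer (localhost in the pod, the service, the load balancer) shows where the `content-length` header disappears, if it is present anywhere.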

[Attachment Removed]

It’s odd, I wasn’t finding errors in any of the pods. I couldn’t find a way to get the content-length header to appear. I tested at each layer (curl from within pod over localhost, curl pod-to-service, curl directly to ALB, etc…). However, I was able to get replication working after modifying BlobController.cs directly and explicitly setting the header.

Response.ContentLength = blobContents.Length;

Not sure if this is related to AWS Keyspaces or to how we have our load balancers set up (application LBs), but this worked. We didn’t encounter this in a non-Keyspaces environment, but that environment is set up with `Refs` replication, not `Blobs`.

[Attachment Removed]

Huh, that is odd. I don’t see how it is impacted by the Keyspaces setup. Using an ALB seems fine; we do as well. The change itself seems reasonable, so no concern there.

My expectation is that the content length would be set by ASP.NET in the FileResponse below (line 78), which handles all the byte offset parts.

One difference might be if you have the nginx sidecar enabled in one case but not the other (as that would have redirected at line 73 and nginx would have output the content length header).

[Attachment Removed]

We’re still evaluating the Keyspaces changes - so far so good. I’m curious how Epic is handling the DDC connection string for Unreal clients. Currently we have two deployments with unique API endpoints (replicating and using Keyspaces). Clients can connect to both, but ideally they connect to the lowest-latency endpoint. All of our traffic is internal, so geographically distributed DNS queries can get a little weird if we try a single endpoint. Is Epic using a single connection string + geo-location DNS? We’re trying to avoid developer churn on modifying DDC settings locally.

[Attachment Removed]

Sorry for the delayed response - our testing is complete and we’re happy with the results! Thank you for the connection string info, that really simplifies things for us. We’d love to have the Keyspaces changes promoted to an official release. Not sure what that process looks like, if you need any additional info I’m happy to help.

Thank you for all the support Joakim!

Nate

[Attachment Removed]

That is great news!

I think I should make sure to add those custom queries to enable TTL. Was it only the two replication log tables that needed it, “replication_log” and “blob_replication_log”?

Ideally we would also figure out a way to run Keyspaces locally in our test Docker container so that we can actually verify that any changes we make in the future are compatible, but I cannot see any way to do that, so we may have to just take that as it comes.

We can make sure to cut an official release with this fairly soon once I have those changes in. It’s usually fairly ad hoc for us, but we try to make sure we have used the released version for a few weeks within Epic beforehand so that it has plenty of coverage.

[Attachment Removed]

That sounds great! Going back over my notes… These are the modifications I made locally:

  • Explicitly setting the `content-length` response header in BlobController.cs (needed for blob replication)

    Response.ContentLength = blobContents.Length;

  • Disabling tablets in the `CREATE KEYSPACE` statement. Please note this isn’t returned in the `describe keyspace` output below.

    AND tablets = {'enabled': false}

[Attachment Removed]