Horde ElastiCache Configuration

We’ve set up a Horde server that currently uses a Redis server running within a Docker container. We’ve created an ElastiCache resource that we’re trying to migrate to, but when we attempt to use it, Horde outputs the error message below. We’re using a serverless Valkey cache running engine version 8.

Is there some additional configuration required, or is this an unsupported setup? The docs mention that Epic uses ElastiCache; can you provide some information on the recommended setup?

```
[18:03:45 err] Exception in call to /Horde.HordeRpc/CreateSession
StackExchange.Redis.RedisCommandException: Multi-key operations must involve a single slot; keys can use 'hash tags' to help this, i.e. '{/users/12345}/account' and '{/users/12345}/contacts' will always be in the same slot
   at StackExchange.Redis.ServerSelectionStrategy.Select(Message message, Boolean allowDisconnected) in /_/src/StackExchange.Redis/ServerSelectionStrategy.cs:line 144
   at StackExchange.Redis.ConnectionMultiplexer.SelectServer(Message message) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 1934
   at StackExchange.Redis.ConnectionMultiplexer.PrepareToPushMessageToBridge[T](Message message, ResultProcessor`1 processor, IResultBox`1 resultBox, ServerEndPoint& server) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 1949
   at StackExchange.Redis.ConnectionMultiplexer.TryPushMessageToBridgeAsync[T](Message message, ResultProcessor`1 processor, IResultBox`1 resultBox, ServerEndPoint& server) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2007
   at StackExchange.Redis.ConnectionMultiplexer.ExecuteAsyncImpl[T](Message message, ResultProcessor`1 processor, Object state, ServerEndPoint server) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 2188
   at StackExchange.Redis.RedisBase.ExecuteAsync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server) in /_/src/StackExchange.Redis/RedisBase.cs:line 54
   at StackExchange.Redis.RedisTransaction.ExecuteAsync(CommandFlags flags) in /_/src/StackExchange.Redis/RedisTransaction.cs:line 56
   at HordeServer.Agents.AgentScheduler.TryCreateSessionAsync(AgentId agentId, SessionId sessionId, RpcAgentCapabilities capabilities, CancellationToken cancellationToken) in /app/Source/Programs/Horde/Plugins/Compute/HordeServer.Compute/Agents/AgentScheduler.cs:line 272
   at HordeServer.Agents.AgentCollection.TryCreateSessionAsync(Agent agent, CreateSessionOptions options, CancellationToken cancellationToken) in /app/Source/Programs/Horde/Plugins/Compute/HordeServer.Compute/Agents/AgentCollection.cs:line 1091
   at HordeServer.Agents.AgentCollection.Agent.TryCreateSessionAsync(CreateSessionOptions options, CancellationToken cancellationToken) in /app/Source/Programs/Horde/Plugins/Compute/HordeServer.Compute/Agents/AgentCollection.cs:line 117
   at HordeServer.Agents.AgentService.CreateSessionAsync(IAgent agent, RpcAgentCapabilities capabilities, String version, CancellationToken cancellationToken) in /app/Source/Programs/Horde/Plugins/Compute/HordeServer.Compute/Agents/AgentService.cs:line 340
   at HordeServer.Server.RpcService.CreateSession(RpcCreateSessionRequest request, ServerCallContext context) in /app/Source/Programs/Horde/Plugins/Compute/HordeServer.Compute/Server/RpcService.cs:line 227
   at Grpc.Shared.Server.UnaryServerMethodInvoker`3.ResolvedInterceptorInvoker(TRequest resolvedRequest, ServerCallContext resolvedContext)
   at Grpc.Shared.Server.UnaryServerMethodInvoker`3.ResolvedInterceptorInvoker(TRequest resolvedRequest, ServerCallContext resolvedContext)
   at HordeServer.Startup.GrpcExceptionInterceptor.<>c__DisplayClass6_0`1.<b__0>d.MoveNext() in /app/Source/Programs/Horde/HordeServer/Startup.cs:line 130
--- End of stack trace from previous location ---
   at HordeServer.Startup.GrpcExceptionInterceptor.GuardInnerAsync(ServerCallContext context, Func`1 callFunc) in /app/Source/Programs/Horde/HordeServer/Startup.cs:line 155
[18:03:45 inf] Executed endpoint 'gRPC - /Horde.HordeRpc/CreateSession'
```

I did a little digging into the code, and I suspect this is a bug in the current `AgentScheduler` code: the `SessionKeys` should use the SessionId as a hash tag to ensure related session data goes to the same slot (see https://redis.io/blog/redis-clustering-best-practices-with-keys/ for more info).
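Roughly the kind of change I tried, with illustrative key names (the real keys and data shapes in `AgentScheduler` are different):

```csharp
using System.Threading.Tasks;
using StackExchange.Redis;

// Illustrative only: the key names below are made up. The point is the "{...}"
// hash tag, which pins every key for one session to the same cluster slot so a
// MULTI/EXEC transaction can touch all of them.
public static class SessionKeySketch
{
    public static RedisKey SessionKey(string sessionId, string suffix)
        => $"agents/sessions/{{{sessionId}}}/{suffix}";

    public static async Task CreateSessionSketchAsync(IDatabase redis, string sessionId)
    {
        ITransaction txn = redis.CreateTransaction();

        // Both keys carry the same "{<sessionId>}" hash tag, so the cluster
        // assigns them to one slot and the transaction should no longer throw
        // "Multi-key operations must involve a single slot".
        _ = txn.StringSetAsync(SessionKey(sessionId, "state"), "active");
        _ = txn.SetAddAsync(SessionKey(sessionId, "leases"), "lease-1");

        await txn.ExecuteAsync();
    }
}
```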

EDIT: It initially looked like this fixed the error. Closer inspection of the logs shows that it’s still failing; I’ll continue to look for workarounds.

Hey Andrew,

Thanks for the question, and I appreciate you following up with some attempts at resolving the issue.

Could you please provide the connection string / Redis configuration you use for your Horde server? For reference, we are currently on engine version Redis OSS 6.2.6, so I do wonder if this is a versioning issue. Our connection string is quite simple at the moment (just URL:PORT), so it isn’t doing anything particularly complex with respect to Redis configuration or instrumentation.

Kind regards,

Julian

I created a new STANDALONE Valkey ElastiCache resource running engine version 8.0.1, which works. I’m confident that the current implementation of `AgentScheduler` is incompatible with a Redis cluster: you MUST use a standalone server. That would be useful information to include in the docs, although I understand the challenge of keeping docs current while the code is constantly in flux.

Hey there Andrew,

I’ll get this updated in the documentation, as well as create a JIRA internally for cluster support. I appreciate you giving it a whirl on your end and providing us with the context.

Kind regards,

Julian

Sure, here’s the redacted line from our docker-compose.yml file.

```yaml
Horde__RedisConnectionConfig: OUR_SERVER_ID.serverless.use2.cache.amazonaws.com:6379,ssl=true,user=USERNAME,password=PASSWORD
```

Hey Andrew,

Yeah, this does appear to be quite similar in complexity to ours (we don’t seem to have `ssl=true` set, but that seems irrelevant to the error). Would it be possible for you to try engine version 6.2.x? That would give us a quicker way to verify whether this is a versioning issue as opposed to a configuration one.

Kind regards,

Julian

I can’t find a way to downgrade ElastiCache to engine version 6.2, but I can test with the `valkey/valkey:8` image. That should help determine whether the problem is AWS-specific or affects the newer engine version. I’ll update with results later.
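For reference, this is roughly the test setup I have in mind (the Horde image tag and ports are placeholders, not our production compose file):

```yaml
# Rough sketch for testing only: swap the horde-server image for whatever you
# actually deploy from; valkey/valkey:8 is the image mentioned above.
services:
  valkey:
    image: valkey/valkey:8
    ports:
      - "6379:6379"

  horde-server:
    image: my-registry/horde-server:latest   # placeholder
    environment:
      Horde__RedisConnectionConfig: "valkey:6379"
    depends_on:
      - valkey
```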

Connecting to a single Valkey 8 (or Redis 8) node works fine. Connecting to a cluster (which the serverless option uses) fails. That makes sense: for a single server, all transactions go to the same destination, so there’s no ambiguity.

For a cluster, each key hashes to a target “slot”, which in turn determines which server handles the operation. If my attempted code change were working properly, all operations within the transaction would be assigned to the same slot, which should fix the problem; I probably missed something. I’ll take another look.
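In case it helps anyone else following along, here’s my own sketch of how the slot is chosen (based on the Redis Cluster docs, not taken from the Horde code), which is why the hash-tag approach should work in principle:

```csharp
using System.Text;

// Sketch of the Redis Cluster slot rule: slot = CRC16(hashed portion of the key) % 16384.
// When a key contains "{...}", only the text between the first "{" and the next "}" is
// hashed, so keys sharing a hash tag always land in the same slot.
public static class SlotSketch
{
    public static string HashedPortion(string key)
    {
        int open = key.IndexOf('{');
        if (open >= 0)
        {
            int close = key.IndexOf('}', open + 1);
            if (close > open + 1)
            {
                // e.g. "agents/sessions/{abc}/state" -> "abc"
                return key.Substring(open + 1, close - open - 1);
            }
        }
        return key; // no hash tag: the whole key is hashed
    }

    // CRC16-CCITT (XMODEM), the variant Redis Cluster uses for key slots.
    public static ushort Crc16(string s)
    {
        ushort crc = 0;
        foreach (byte b in Encoding.UTF8.GetBytes(s))
        {
            crc ^= (ushort)(b << 8);
            for (int i = 0; i < 8; i++)
            {
                crc = (crc & 0x8000) != 0 ? (ushort)((crc << 1) ^ 0x1021) : (ushort)(crc << 1);
            }
        }
        return crc;
    }

    public static int Slot(string key) => Crc16(HashedPortion(key)) % 16384;
}
```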