Failed to start Horde: Unable to find any healthy Perforce server in cluster XXXXX

Horde failed to start.

Starting the Horde container: the container takes 30 minutes to produce its first log output, and Horde is unresponsive during that time.

All authentication attempts fail, even when we tried disabling OIDC and switching to Horde local passwords.

Once it has started, this message repeats in a loop:

[11:50:30 wrn] Unable to find any healthy Perforce server in cluster XXXX
[11:50:30 err] Failed resolving stream infos for Perforce cluster XXXX
HordeServer.VersionControl.Perforce.PerforceServiceException: Unable to select server from 'XXXX'
   at HordeServer.VersionControl.Perforce.PerforceService.GetServerAsync(PerforceCluster cluster, CancellationToken cancellationToken) in /app/Source/Programs/Horde/Plugins/Build/HordeServer.Build/VersionControl/Perforce/PerforceService.cs:line 656
   at HordeServer.VersionControl.Perforce.PerforceService.ConnectAsync(PerforceCluster cluster, String userName, CancellationToken cancellationToken) in /app/Source/Programs/Horde/Plugins/Build/HordeServer.Build/VersionControl/Perforce/PerforceService.cs:line 458
   at HordeServer.VersionControl.Perforce.PerforceService.ConnectAsync(String clusterName, String userName, CancellationToken cancellationToken) in /app/Source/Programs/Horde/Plugins/Build/HordeServer.Build/VersionControl/Perforce/PerforceService.cs:line 450
   at HordeServer.VersionControl.Perforce.PerforceServiceCache.CreateStreamInfoForClusterAsync(String clusterName, IEnumerable`1 streams, CancellationToken cancellationToken) in /app/Source/Programs/Horde/Plugins/Build/HordeServer.Build/VersionControl/Perforce/PerforceServiceCache.cs:line 352
   at HordeServer.VersionControl.Perforce.PerforceServiceCache.CreateStreamInfoAsync(CancellationToken cancellationToken) in /app/Source/Programs/Horde/Plugins/Build/HordeServer.Build/VersionControl/Perforce/PerforceServiceCache.cs:line 338
[11:51:00 wrn] Unable to find any healthy Perforce server in cluster XXXX

[Attachment Removed]

The issue is related to the contents of the MongoDB database: the same configuration with an empty database starts and runs fine.

Regarding the container startup delay: if I remove the remote storage mounted at /App/Data, it starts quickly.

If I keep the remote storage, I have to wait about 30 minutes, even when we attach an empty file system.

It is an Azure Container Instance.

Answers to your questions:

  1. It is a Horde server that has been running since July 2024.
  2. Nothing! The files in /App/Logs do not change after the container starts. However, I can open a Bash shell in the running container and observe light CPU activity, as well as activity on the remote MongoDB.
  3. Our MongoDB was a 7.1 Atlas cluster; I moved to an empty 8.0 instance. Redis is a 7.2 standalone container.
  4. Our Azure Container Instance runs both the Horde server and the Redis cache, with a limit of 3 CPUs and 6 GB of RAM. We have never seen RAM or CPU overloaded.

My question

In /App/Data/Storage/logs we had 400,000 blob folders/files!

It looks as if housekeeping is not working.

Here is our configuration in global.json:

		"build": {
			"artifactTypes": [
				{
					"name": "step-output",
					"keepDays": 7
				},
				{
					"name": "step-trace",
					"keepDays": 30
				},
				{
					"name": "step-logs",
					"keepDays": 30
				},
				{
					"name": "step-saved",
					"keepDays": 14
				},
				{
					"type": "packaged-build",
					"keepDays": 14
				},
				{
					"type": "staged-build",
					"keepDays": 14
				},
				{
					"type": "ugs-pcb",
					"keepCount": 5
				}
			],
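To check whether retention is actually being applied, one quick sketch, assuming a Bash shell inside the container and standard `find`, `cut`, `sort`, `uniq` and `awk`, is to count `.blob` files older than a given number of days, grouped by the first two path components under the storage root (for example `artifacts/step-output`). The function name `count_stale_blobs` and the default root `/app/Data/Storage` are illustrative assumptions; the idea is to compare each folder's count against the `keepDays` configured for that artifact type.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count *.blob files older than N days under a storage root,
# grouped by their first two path components (e.g. "artifacts/step-output").
# Output lines look like "<count> <folder>".
count_stale_blobs() {
  local dir="${1:-/app/Data/Storage}"   # assumed storage root
  local days="${2:-30}"                 # age threshold in days
  (
    cd "$dir" || exit 1
    # -mtime +N matches files last modified more than N whole days ago
    find . -type f -name '*.blob' -mtime +"$days" \
      | cut -d/ -f2-3 \
      | sort \
      | uniq -c \
      | awk '{ print $1, $2 }'
  )
}
```

For example, `count_stale_blobs /app/Data/Storage 30` lists how many blobs per folder are older than 30 days; a large count in a folder whose artifact type has `keepDays` well below that suggests expiry or GC is not running.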

[Attachment Removed]

I found the answers to my problems:

1 - The container was slow to start because of a network mount on /App/Storage, instead of using Azure Blob Storage as the backend.

I started testing that feature without success, but that is another story.

After removing this mounted drive (CIFS/NFS), the container starts instantly!

Drawback: local container storage is limited, so garbage collection (GC) must work.

2 - Garbage collection was not working because I had missed a setting ("enableGc": true). With the following configuration, it works:

"storage": {
			"enableGc": true,
			"backends": [
				{
					"id": "default-backend",
					"type": "FileSystem",
					"baseDir": "Storage" 
				},
....

Sometimes GC runs into problems when it tries to delete some files on local storage:

Host-639010423962025714:/app/Data/Logs# grep -i ' horde-artifacts (storage:horde-artifacts:check) has' Log20251217.txt
[08:35:33 inf] Garbage collection queue for namespace horde-artifacts (storage:horde-artifacts:check) has 14375 entries
[08:40:33 inf] Garbage collection queue for namespace horde-artifacts (storage:horde-artifacts:check) has 14378 entries
[08:45:33 inf] Garbage collection queue for namespace horde-artifacts (storage:horde-artifacts:check) has 14370 entries
[08:50:33 inf] Garbage collection queue for namespace horde-artifacts (storage:horde-artifacts:check) has 14365 entries
[08:55:33 inf] Garbage collection queue for namespace horde-artifacts (storage:horde-artifacts:check) has 14362 entries
[09:00:33 inf] Garbage collection queue for namespace horde-artifacts (storage:horde-artifacts:check) has 14360 entries

When that happens, the run fails and the remaining work is skipped until the next GC run! So GC is slow to remove unneeded blobs.
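The queue counts in the grep output above go from 14375 to 14360 between 08:35 and 09:00, a net drop of about 15 entries in 25 minutes. A small sketch for tracking that trend for any namespace, assuming the log line format shown above; the function name `gc_queue_trend` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how a GC queue evolved in a Horde log file.
# Assumes lines like:
#   [08:35:33 inf] Garbage collection queue for namespace NS (...) has 14375 entries
# Prints "<first> -> <last> (delta <last - first>)" for the given namespace.
gc_queue_trend() {
  local log="$1" ns="$2"
  # The trailing space keeps "horde-artifacts" from matching longer namespace names.
  grep -F "Garbage collection queue for namespace $ns " "$log" \
    | awk '{ n = $(NF-1) }          # the count is the second-to-last field
           NR == 1 { first = n }
           END { if (NR > 0) printf "%s -> %s (delta %d)\n", first, n, n - first }'
}
```

Running it as `gc_queue_trend /app/Data/Logs/<log file> horde-artifacts` on a day's log gives a single line that is easy to compare before and after a code or configuration change.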

[08:40:33 inf] Garbage collection queue for namespace horde-artifacts (storage:horde-artifacts:check) has 14378 entries
[08:40:33 err] Exception while running garbage collection: Could not find a part of the path '/app/Data/Storage/artifacts/step-output/main/45515/compile-tools-win64/69369bf4c768b03109318ff0/595da3c9d9924a8981838536d256cfa8_1.blob'.
[08:45:33 inf] Running garbage collection for namespace horde-artifacts...
[08:45:33 inf] Garbage collection queue for namespace horde-artifacts (storage:horde-artifacts:check) has 14370 entries
[08:45:33 err] Exception while running garbage collection: Could not find a part of the path '/app/Data/Storage/artifacts/step-output/main/45515/copy-editor-files-to-staging-directory/69369d1fc768b03109319278/a59e9ae2c6e041f4b74f0d6eff0640e8_1.blob'.
[08:45:33 inf] Running garbage collection for namespace default...
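"Could not find a part of the path" is the message .NET uses for `DirectoryNotFoundException`, which usually means a directory in the path is missing, not only the file itself. One way to test that hypothesis against the log is to extract the failing paths and check whether each parent directory still exists. This is a sketch under that assumption; the function name `check_missing_paths` is illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the paths from "Could not find a part of the path '...'"
# errors in a Horde log and report whether each path's parent directory still exists.
check_missing_paths() {
  local log="$1"
  sed -n "s/.*Could not find a part of the path '\([^']*\)'.*/\1/p" "$log" \
    | sort -u \
    | while IFS= read -r path; do
        if [ -d "$(dirname "$path")" ]; then
          echo "parent-present $path"
        else
          echo "parent-missing $path"
        fi
      done
}
```

If most lines report `parent-missing`, the directories were probably removed by something else (or by an earlier pass) before GC reached those blobs, which is consistent with skipping missing files during deletion.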

I am testing a code change for this in FileObjectStore.cs:

		/// <summary>
		/// Delete a file from the store
		/// </summary>
		/// <param name="key">Key of the object to delete</param>
		public void Delete(ObjectKey key)
		{
			FileReference location = GetBlobFile(key);
			// BEGIN CHANGE Carpool studio : Willy MARIO - Only delete if the file exists
			if (FileReference.Exists(location))
			{
				_mappedFileCache.Delete(location);
			}
			else
			{
				_logger.LogWarning("Tried to delete non-existing file {FilePath}", location);
			}
			// END CHANGE
		}

Best regards.

[Attachment Removed]