Introduction:
I started to notice this issue after our update from 4.11 to 4.13, but I’m not sure whether it was already present before.
In our project, we manually serialize some data and send it to clients through a replicated TArray; the clients grab it (if and when it arrives), deserialize it and use the data. So far so good: locally it worked like a charm.
But when we tested it online, we got seemingly random issues and crashes. After a lot of tests, we saw that the deserialization output was sometimes just garbage, and we narrowed the problem down to the deserialization input data.
We created a new project to test this issue, and confirmed it. Below you can find the issue description, how to reproduce it, source files etc.
Description:
A replicated TArray of bytes (uint8) can get altered/corrupted over an internet connection, probably due to UDP packet alteration/corruption. I’m not sure whether this can happen with every replicated variable (i.e. whether it’s a global problem or one specific to TArray).
How we tested it
To test the issue, we created the following setup:
server: it spawns 4 actors, each containing a replicated TArray, and randomly generates 4 byte sequences to use as test cases (random length, random data). It saves them in the GameState (these are replicated too, since the client uses them for the validity check).
Every tick, the server swaps the sequences between the replicated actors, forcing them to be sent to the client again (in particular, we chose to cycle them, i.e. the first actor holds sequence 1 on the first tick, sequence 2 on the second tick, sequence 3 on the third, and so on).
client: it receives the replicated data on each actor and compares it with the 4 known sequences, to see whether it matches one of them or got corrupted somehow.
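The cycling and the client-side check can be sketched roughly as follows. This is plain standard C++ rather than the code from the attached project: `ByteSequence`, `SequenceIndexForActor` and `MatchesKnownSequence` are hypothetical names used only for illustration, and indices are 0-based (so "sequence 1" in the description is index 0 here).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a replicated TArray<uint8>.
using ByteSequence = std::vector<std::uint8_t>;

// Server side: which known sequence a given actor holds on a given tick.
// Actor 0 holds sequence 0 on tick 0, sequence 1 on tick 1, and so on,
// wrapping around after the last sequence.
std::size_t SequenceIndexForActor(std::size_t actorIndex,
                                  std::size_t tick,
                                  std::size_t sequenceCount) {
    return (actorIndex + tick) % sequenceCount;
}

// Client side: a received payload counts as correct only if it is
// byte-for-byte identical to one of the known test sequences.
bool MatchesKnownSequence(const ByteSequence& received,
                          const std::vector<ByteSequence>& known) {
    return std::any_of(known.begin(), known.end(),
                       [&received](const ByteSequence& candidate) {
                           return candidate == received;
                       });
}
```

Because the actor's value changes every tick, the client can only check that it received one of the four sequences, not which one it should have been at that moment.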
Results
Testing it locally, it got 0 errors in an hour of continued testing, as expected.
However, testing it over the internet, we got 2 errors after half an hour (we ran just one test with this setup, but as we saw in our project, the first error can show up after as little as 3 minutes; it’s totally random).
How to reproduce it
- Create a new empty C++ project called “NetTest”
- Extract the attached .rar archive into the project folder
- Build & package it (in development mode)
- Prepare to run it on 2 different machines, with an internet connection in between (or with something that can simulate packet corruption)
- Launch the game
- On server, open the console with the “end” key, and type: “travel TestMap?listen”
- On client, open the console with the “end” key, and type: "travel "
- The server will print on screen when the test starts (a couple of seconds after the client connects)
- The client will continuously show the number of correctly replicated sequences received, as well as the number of corrupted ones.
- After one or more corrupted sequences are detected, you can open the console and type exit on both client and server, and check the logs: on the server you will see the 4 valid sequences, on the client you will see the corrupted ones received. Most of the time, the difference is in just a single byte.
Attachments
I will attach a .rar with the configs, content and source files. (source attachment)
I can provide you with the packaged version too if needed.
[edit 1]:
Expected behavior
If a packet gets corrupted, the data should be dropped/ignored, and the variable should not be updated.
Final considerations
At this point, I am curious and have a couple of questions:
- how does Unreal handle corrupted packets?
- is this an issue with TArrays only, with replicated variables in general, or can even RPCs be affected?
- as a workaround, do you think adding a CRC or something similar to the byte sequence would be enough to keep garbage input out of my system?
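For reference, the CRC workaround from the last question could look roughly like the sketch below. It is plain standard C++, not engine code: `ByteSequence`, `Crc32`, `AppendCrc32` and `VerifyAndStripCrc32` are hypothetical names, and the checksum is the common CRC-32 (reflected polynomial 0xEDB88320) computed bit by bit. The sender appends the checksum to the serialized bytes before putting them in the replicated array, and the receiver discards any payload whose checksum does not match. This only detects a bad payload; it does not explain or fix the cause.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in for a replicated TArray<uint8>.
using ByteSequence = std::vector<std::uint8_t>;

// Standard CRC-32 (reflected polynomial 0xEDB88320), bitwise variant.
std::uint32_t Crc32(const ByteSequence& data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::uint8_t byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            // XOR in the polynomial only when the lowest bit is set.
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

// Sender side: return the payload followed by its CRC (4 bytes, little-endian).
ByteSequence AppendCrc32(ByteSequence payload) {
    const std::uint32_t crc = Crc32(payload);
    for (int shift = 0; shift < 32; shift += 8) {
        payload.push_back(static_cast<std::uint8_t>((crc >> shift) & 0xFFu));
    }
    return payload;
}

// Receiver side: check the trailing CRC. On success, write the payload
// (without the CRC) to outPayload and return true; otherwise return false
// and leave outPayload untouched.
bool VerifyAndStripCrc32(const ByteSequence& framed, ByteSequence& outPayload) {
    if (framed.size() < 4) {
        return false;
    }
    const std::size_t payloadSize = framed.size() - 4;
    ByteSequence payload(framed.begin(),
                         framed.begin() + static_cast<std::ptrdiff_t>(payloadSize));
    std::uint32_t stored = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        stored |= static_cast<std::uint32_t>(framed[payloadSize + i]) << (8 * i);
    }
    if (stored != Crc32(payload)) {
        return false;
    }
    outPayload = std::move(payload);
    return true;
}
```

On the client, a failed check would mean skipping deserialization for that update and waiting for the next one.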
[edit 2]:
Test result example
Here is an example log result:
Server side (test sequences)
[2016.10.28-19.18.37:109][851]LogTemp:Warning: Sequence 1:
[2016.10.28-19.18.37:109][851]LogTemp:Warning: Sequence data: 107 33 63 183 152 88 178 28 38 143 37 63 235 216 39 188 189 52 175 41 212 129 8 132 37 147 20 123 47 243 196 44 64 24 249 157 206 148 166 120 212 20 114 67 201 151 167 184 53 214 175 107 109 42 239 86 127 240 69 136 208 198 235 141 125 141 22 196 1 153 6 64 38 242 220 103 214 21 0 236 191 23 78 89 100 153 234 191 156 106 237 201 7 70 191
[2016.10.28-19.18.37:109][851]LogTemp:Warning: Sequence 2:
[2016.10.28-19.18.37:109][851]LogTemp:Warning: Sequence data: 129 165 106 243 28 227 155 92 165 209 178 4 73 240 148 113 78 96 148 29 120 208 53 74 25 105 13 122 94 23 150 32 211 10 46 185 179 248 21 252 1 222 167 59 0 208 173 170 254 160 240 170 17 214 244 204 186 226 15 161 5 227 174 21 32 243 7 109 1 100 12 151 19 209 219 244 127 101 126 62 15 79 57 188 62 190 51 97 109 78 197 10 215 94 46 118 64 26 255 45 235 190 255 238
[2016.10.28-19.18.37:109][851]LogTemp:Warning: Sequence 3:
[2016.10.28-19.18.37:109][851]LogTemp:Warning: Sequence data: 231 208 225 244 102 32 34 237 111 173 212 183 160 230 107 69 176 102 73 85 68 35 90 49 197 138 224 251 162 89 8 130 231 117 123 53 145 149 142 43 136 245 62 219 159 233 152 127 145 148 188 168 87 131 31 26 191 130 54 102 181 103 53 89 30 30 1 191 175 200 4 130 169 30 13 179 227 0 219 128 5 88 136 163 100 45 105 73 94 43 164 6 166 98 13 216 100 137 125 184 143 138 78 214 108 161 26 237 158
[2016.10.28-19.18.37:109][851]LogTemp:Warning: Sequence 4:
[2016.10.28-19.18.37:110][851]LogTemp:Warning: Sequence data: 112 213 165 35 152 190 60 230 39 52 124 47 106 96 41 107 111 159 23 219 191 238 8 150 126 49 240 91 10 225 9 4 218 49 148 206 163 28 189 93 143 123 128 162 143 159 252 137 132 120 29 160 98 214 248 141 237 197 222 149 71 143 101 0 229 114 124 155 37 134 200 39 31 227 84 254 225 150 228 172 187 82 228 206 196 142 73 250 62 238 102 253 4 159 252 49 166 156 143 49 98 109 74 195 156 93 110 40
Client side (wrong sequences received):
[2016.10.28-19.43.28:315][365]LogTemp:Warning: Error found, received sequence:
[2016.10.28-19.43.28:315][365]LogTemp:Warning: Sequence data: 112 213 165 35 102 190 60 230 39 52 124 47 106 96 41 107 111 159 23 219 191 238 90 150 126 49 240 91 10 225 9 4 218 49 148 206 163 28 189 93 143 123 128 162 143 159 252 137 132 120 29 160 98 214 248 141 237 197 222 149 71 143 101 0 229 114 124 155 37 134 200 39 31 227 84 254 225 150 228 172 187 82 228 206 196 142 73 250 62 238 102 253 4 159 252 49 166 156 143 49 98 109 74 195 156 93 110 40
[2016.10.28-19.43.28:605][383]LogTemp:Warning: Error found, received sequence:
[2016.10.28-19.43.28:605][383]LogTemp:Warning: Sequence data: 112 213 165 35 28 190 60 230 39 52 124 47 106 96 41 107 111 159 23 219 191 238 53 150 126 49 240 91 10 225 9 4 218 49 148 206 163 28 189 93 143 123 128 162 143 159 252 137 132 120 29 160 98 214 248 141 237 197 222 149 71 143 101 0 229 114 124 155 37 134 200 39 31 227 84 254 225 150 228 172 187 82 228 206 196 142 73 250 62 238 102 253 4 159 252 49 166 156 143 49 98 109 74 195 156 93 110 40
As you can see, in this case the same sequence (sequence 4) was received twice, each time with different wrong values.
This happened 25 minutes after the test began.
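To pinpoint which bytes differ without comparing the logs by eye, a small helper along these lines can be used. It is a plain standard C++ sketch, and `ByteSequence` and `DifferingOffsets` are hypothetical names, not part of the attached project.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for a replicated TArray<uint8>.
using ByteSequence = std::vector<std::uint8_t>;

// Returns the 0-based offsets at which the two sequences differ.
// If the lengths differ, every offset past the shorter one is reported too.
std::vector<std::size_t> DifferingOffsets(const ByteSequence& expected,
                                          const ByteSequence& received) {
    std::vector<std::size_t> offsets;
    const std::size_t common = std::min(expected.size(), received.size());
    const std::size_t longest = std::max(expected.size(), received.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (expected[i] != received[i]) {
            offsets.push_back(i);
        }
    }
    for (std::size_t i = common; i < longest; ++i) {
        offsets.push_back(i);
    }
    return offsets;
}
```

Applied to the first 24 bytes of sequence 4 and of the first received sequence in the log above, it reports offsets 4 and 22 (152 became 102, and 8 became 90).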