The Risk of Data Loss

anonymous_user_ad6d3acb · July 13, 2016, 6:22am

I have just started working in the video game industry, and I have noticed that the company is having issues with its data. I think the biggest problem we are facing is the volume and variety of data we now have. How can we manage all those risks? Could we run the risk of data loss? If so, what defined set of procedures should we follow to avoid such issues? I have heard about data governance. What is it all about?

anonymous_user_64d64697 · July 13, 2016, 2:23pm

Data governance is the specification of decision rights and an accountability framework to encourage desirable behavior in the valuation, creation, storage, use, archival and deletion of data and information.

It includes the processes, roles, standards and metrics that ensure the effective and efficient use of data and information in enabling an organization to achieve its goals.

junfanbl · July 18, 2016, 3:32pm

I work for a defense contractor and data loss is a huge issue. Actually, it is the issue. One of the best things you can do in general is to create specific data-handling procedures that reduce the risk of data loss, and then educate the people in the company on how to follow those procedures and why they matter (emphasis on education, lots of it). These procedures should be enforced from the top down: supervisors should administer them and make sure people are educated and periodically reminded to follow them. There are also technical measures that can further reduce the chance of accidental data loss, such as restricting the handling of certain data to restricted environments/networks that have limited access and are password protected. However, those are very generic answers. Can you be more specific about the issue at hand, so we can maybe provide better advice? Or are you just looking for a general answer?

anonymous_user_2b10dd13 · July 21, 2016, 11:00am

From an armchair IT support perspective, here are some things to consider regarding data loss.

1. Data integrity. Use RAID to protect against data loss from drive failure. SSDs may help, as they have no moving parts.

2a. Data redundancy. Timely backups are important. Hard drives fail, usually at the worst times.

2b. Local storage should always be backed up offsite. NEVER keep all your data in one physical location.

3. Data transfer. This is a big issue given the size of the files being transferred over the internal network and to offsite backups, other offices, etc. The network infrastructure must keep up, or you can end up with mismatched or corrupted files, slow backups, etc.

4. Data security. As mentioned above, enforce access levels and have procedures in place to reduce the risk of security breaches.

5. Data synchronisation. With very large files, keeping track of changes becomes difficult. Data can be lost through wrongly synchronised files, overwritten files, etc. A lot of people are looking at Git LFS.

6. User education. People come from different backgrounds, so they may have different practices and habits.
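For the data transfer concern, one cheap safeguard against silently mismatched or corrupted copies is to hash both ends after copying. Below is a minimal Python sketch, assuming a plain file copy to a mounted backup location; `sha256_of` and `copy_and_verify` are illustrative names, not part of any particular backup tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-gigabyte assets never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src: Path, dst: Path) -> str:
    """Copy src to dst, re-read both, and raise if their contents differ."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # copies data plus timestamps/metadata where supported
    src_hash = sha256_of(src)
    dst_hash = sha256_of(dst)
    if src_hash != dst_hash:
        raise OSError(f"checksum mismatch copying {src} -> {dst}")
    return src_hash  # worth storing alongside the backup for later audits
```

Re-reading the destination right after the copy may be served from the operating system's cache, so for real offsite copies you would compare against a hash computed on the remote side.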

Good luck!

BrUnO_XaVIeR · July 24, 2016, 8:41pm

Yes, a UPS is needed even at home. It prevents so many, soooo many issues.
I’ve seen workplaces without any and I always ask why.

jwatte · July 24, 2016, 9:09pm

You really need some kind of revision control that people commit to and update from with some frequency.
Game developers often like Perforce, because it’s one of the better tools for large binary data.
Git works fine, too, as long as you have enough hard disk space for it.
Even Subversion is better than nothing.
If you aren’t already using this (even if you are a single developer) then you are DOING IT WRONG.
Source control can also let you back out of changes that end up being bad (or even malicious, if you have poor luck with some particular contributor.)

This way, if you lose a machine, you don’t lose much work.
If you lose your source control server, you can, worst case, make a new one from one of the development machines (whoever synced most recently.)
This also makes it easy for new people and contractors to get what they need to do work.

Once you have source control, you should make sure to take frequent backups of the server you push all your changes to (Perforce server, Git master repository, etc.).
Once a day is a fine frequency.
Store the last three days, the last three weeks, and a quarterly checkpoint on offline media of some sort (loose hard disks in a fire safe; Amazon Glacier storage files; rsync.net hosting; whatever.)
Also, take your server offline and restore your backup to a new server, and let people work on that, once a quarter or so. If you don’t continually audit your backups, you don’t know that you have working backups (and, most likely, you don’t.)
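The retention scheme above (the last three days, the last three weeks, and quarterly checkpoints) can be expressed as a small selection function. This is a sketch under one assumed reading of that policy: keep the three newest backups, the earliest backup in each of the three most recent ISO weeks, and the earliest backup of every calendar quarter. `backups_to_keep` is a hypothetical helper; actually deleting backups and moving checkpoints to offline media are left out.

```python
from __future__ import annotations

from datetime import date


def backups_to_keep(backup_dates: list[date]) -> set[date]:
    """Select the dated backups to retain; everything else may be pruned.

    Assumed policy:
      * the 3 most recent backups (the "last three days"),
      * the earliest backup in each of the 3 most recent ISO weeks,
      * the earliest backup in every calendar quarter (the offline checkpoints).
    """
    ordered = sorted(set(backup_dates))
    keep = set(ordered[-3:])

    first_in_week: dict[tuple[int, int], date] = {}
    first_in_quarter: dict[tuple[int, int], date] = {}
    for d in ordered:  # ascending order, so setdefault records the earliest date per period
        iso_year, iso_week, _ = d.isocalendar()
        first_in_week.setdefault((iso_year, iso_week), d)
        first_in_quarter.setdefault((d.year, (d.month - 1) // 3 + 1), d)

    for week in sorted(first_in_week)[-3:]:
        keep.add(first_in_week[week])
    keep.update(first_in_quarter.values())
    return keep
```

Picking the earliest backup of each week and quarter, rather than the newest, means those checkpoints stay fixed as new daily backups arrive.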

This way, even if you lose an entire office, you can set up shop on new hardware, restore from backup, and let everyone check out the code again.
Backups also let you go back in time for forensic purposes, for example if it turns out someone malicious inserted a time bomb somewhere.

Once you have source control, and backups of your source control systems, you need to automate the setup of any kind of servers you may use (lightmap baking, distributed build, QA/testing/validation, staging servers, even production).
The automated deploy should come from your source control. That is, you clone some repository/directory onto the new server (based on what role it plays,) and run a setup script in that directory, and the server becomes able to do what it’s supposed to do.

That way, it’s easy to make sure that you’re testing the same code you’re running in production, and that there are no “snowflakes” that have hand-crafted, then forgotten, special cases.
If you find that you need to “sudo” or double-click anything other than the “setup this machine as X” script on any of your servers, YOU’RE DOING IT WRONG.
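A minimal sketch of the check-then-apply structure that makes a "set up this machine as X" script safe to run repeatedly. The `Step` type, the `setup_machine` function, and any role names are hypothetical; a real script would call package managers or the installers checked into source control inside each `apply`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    """One idempotent unit of machine setup (hypothetical structure)."""
    name: str
    is_done: Callable[[], bool]   # True if the machine is already in the desired state
    apply: Callable[[], None]     # brings the machine into that state


def setup_machine(role: str, roles: dict[str, list[Step]]) -> list[str]:
    """Run the steps for `role`, skipping any already satisfied; return the names of steps that ran."""
    if role not in roles:
        raise ValueError(f"unknown role {role!r}; expected one of {sorted(roles)}")
    ran: list[str] = []
    for step in roles[role]:
        if step.is_done():
            continue  # already configured, so re-running the script changes nothing
        step.apply()
        if not step.is_done():
            raise RuntimeError(f"step {step.name!r} did not take effect")
        ran.append(step.name)
    return ran
```

Running `setup_machine` a second time for the same role should return an empty list, since every step reports itself as already done. That property is what lets you re-run the script on any server instead of fixing it by hand.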

You should take this all the way to developer machines – check in the installers for Visual Studio, 3ds Max, clang, Xcode, or whatever other tools you need, if you can.
The setup script can then make sure that the necessary tools are installed.
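For the "make sure the necessary tools are installed" part, a setup script might begin by listing which required command-line tools are missing from PATH. The role names and tool lists below are hypothetical examples; GUI applications such as 3ds Max or Visual Studio usually aren't on PATH and would need a different check.

```python
from __future__ import annotations

import shutil
import sys

# Hypothetical per-role lists of command-line tools expected on PATH.
REQUIRED_TOOLS = {
    "artist": ["git", "git-lfs"],
    "programmer": ["git", "git-lfs", "cmake", "clang"],
}


def missing_tools(required: list[str]) -> list[str]:
    """Return the entries of `required` that shutil.which cannot find."""
    return [tool for tool in required if shutil.which(tool) is None]


def main(argv: list[str]) -> int:
    role = argv[1] if len(argv) > 1 else "programmer"
    missing = missing_tools(REQUIRED_TOOLS[role])
    for tool in missing:
        # A real setup script would run the matching installer from the repository here.
        print(f"missing for {role}: {tool}")
    return 1 if missing else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```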

This makes it very cheap to restore if a machine is lost, and also makes it very easy to spin up a new employee/contractor on a new machine.

Finally, if you believe that you are big enough to be a target for various malicious actors, you need to take opsec steps to preserve the integrity of your data.
Make everyone have to use two-factor authentication for all access (computers, web services, etc.) Smart cards, phone apps, passwords, signed keys, or what have you – pick any two!
Use 802.1x for your networking, such that nobody without the right authentication can get on your network.
Use key signing for your source repository commits to know that nobody is impersonating someone else.
Make sure you require all keys/certificates to be encrypted and require at least a strong passphrase to unlock.

This protects against actively malicious actors that may otherwise snoop around your office, war-dial your WiFi, plug into your Ethernet while using the bathroom, etc.
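Phone-app second factors are commonly implemented as time-based one-time passwords (TOTP, RFC 6238), which build on HOTP (RFC 4226). The sketch below only illustrates how such codes are derived and checked, using HMAC-SHA1 and 30-second steps as in the RFCs' defaults; the function names are illustrative, and a real deployment would rely on an established authentication product rather than hand-rolled code.

```python
from __future__ import annotations

import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) with HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # "dynamic truncation": low 4 bits pick a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: float | None = None, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over the count of elapsed time steps."""
    if at is None:
        at = time.time()
    return hotp(secret, int(at // step), digits)


def verify(secret: bytes, submitted: str, at: float | None = None,
           step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept a code from the current step or `window` steps either side (clock drift)."""
    if at is None:
        at = time.time()
    return any(
        hmac.compare_digest(totp(secret, at + k * step, step, digits), submitted)
        for k in range(-window, window + 1)
    )
```

With the RFC test secret `b"12345678901234567890"`, `hotp(secret, 0)` gives `"755224"` and `totp(secret, at=59, digits=8)` gives `"94287082"`, matching the published test vectors.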

First, if you use source control, a corrupted file is easily restored, and you shouldn’t lose too much work from that one file.
Second, if you use best practices for each host OS (journaling file systems, aggressive cache flush, etc) then the risk of corrupting a file should be very small.
Some tools may still be editing files in place in an unsafe way; if you have your own tools, make them not do that. If you use third-party tools, those are usually safe, but if they aren’t, complain to the tool vendor until they fix it.
Unsafe saving should not be allowed in this day and age…
Separately, a laptop with a docking station has its own built-in UPS; plus you can take your work wherever you go.
The MacBook Pros have terrible GPUs, but for example the Razer line has OK GPUs while still being lightweight.
Slightly thicker laptops from all the vendors are fully capable workstation replacements, and with Thunderbolt GPU enclosures, even the heaviest VR scenes should be OK.
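On the point about tools that save in place: the usual safe-save pattern is to write a temporary file next to the target, flush it to disk, and then rename it over the original, so a power loss leaves either the old file or the new one rather than a half-written mix. A minimal Python sketch, where `safe_save` is a hypothetical helper name:

```python
import contextlib
import os
import tempfile


def safe_save(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves the old file or the new one, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # move Python's buffer into the OS
            os.fsync(tmp.fileno())   # ask the OS to put the bytes on disk
        os.replace(tmp_path, path)   # swap in the new file; atomic on POSIX filesystems
    except BaseException:
        # Remove the partial temp file; the original at `path` is untouched.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
    # On POSIX, also fsync the directory so the rename itself survives a crash.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

One side effect: `tempfile.mkstemp` creates the file with owner-only permissions, so a real tool might copy the original file's mode before the rename. The directory `fsync` is skipped where `os.O_DIRECTORY` is unavailable (for example on Windows).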

jwatte · July 25, 2016, 4:40pm

Yeah, doing leading edge work on less-modern infrastructure is always a challenge! I’ve done this in the past, too.
You could use a local git repo, and then do a git push when you finish work, so it can upload in the background. Even if you don’t push to a remote, a local git repo lets you check out any previously checked-in version of the file, locally. (This is why git takes lots of disk space!)
Another option is to have a small server in each work office, everyone in that office (might just be you) pushes to that server, and the server syncs to some remote location in the middle of the night. Needs a little bit of IT skillz to set up, but can work great!

Windows 10 defaults are actually pretty good: the system can’t be installed on a FAT partition, because it requires NTFS, which is much more reliable.

Older Windows versions had a setting called “Turn off Windows write-cache buffer flushing on the device” (or “Enable advanced performance”), which reduces robustness in exchange for higher performance.
(Btw: This should NEVER be enabled unless the drive has its own UPS – UPS for the computer isn’t enough!)

However, if you use a second data storage disk, make sure that, too, is NTFS – FAT is known for corrupting data on power loss. If you open the drive’s properties and it says the file system is FAT or exFAT, you’re in for trouble.

If you still get corrupt files on power loss, it’s more likely due to a bug in Substance than in the OS. (Or it could be a hard disk with a firmware bug or marginal media – that happens occasionally.)

DomusLudus · July 25, 2016, 6:29pm

I have a 64 MB pen drive.