Backups, Our Most Sacred Commitment to Users & Customers

ChinaNetCloud's Internet Operations & Server Management includes many types of services such as setup, software installation, monitoring, troubleshooting, performance, scale, security, and more, for systems serving up to 250 million users each (after all, we are in China).

We find, fix, and automate everything, 24x7, so our customers can sleep at night.  We also help them build and run reliable, fast, and secure sites so their businesses can succeed, in China and around the world.  We have many responsibilities to our customers.

But none is more important than backups.  And we take nothing more seriously.

While we work hard to avoid it, customers can survive a little downtime; they can even survive a lot of downtime.  They can have a slow site, or scaling issues, even a security problem.  

But the one thing they cannot survive is the loss of their data, especially if they discover they don't have good backups.  This is fatal to almost any company and something we cannot allow to happen, ever.

Backups are hard to do correctly, in part because you never need them, until the day you need them.  They are easy to do incorrectly, missing a key file or database, or with incorrect options.  And they often fail due to space, permissions, options, or other complex reasons.  But you'll never know this until you need them, when you discover they didn't work.

No wonder company after company runs into serious trouble with bad or missing backups, or with backups that are stolen by hackers or leaked through incompetence.  This happens every day, even in state-of-the-art places like Silicon Valley.

This gets more difficult as the backup environment and variables get more complex.  For ChinaNetCloud, this includes many types of data and customers, with different technology and tools, spread all over the world in hundreds of locations, backing up to dozens more.  It's very difficult to keep it all working, and to guarantee that it's all perfect, every day, for every customer.

And yet, we must.  We must make sure every backup not only works, but is valid, secure, and usable at the moment it's needed most.  How do we do this?

First, we use best-practice tools and processes to help ensure that things are done right, every time.  This includes carefully selecting backup options that ensure key data like databases are backed up correctly, in a coherent state, with all data and objects synchronized and included.

Second, we use a multi-level backup strategy which keeps local copies for fast, recent restores (e.g. if a table is deleted or data is corrupted), and remote copies for full restores if a server or data center is lost to a disaster such as a fire or a full disk crash.

Third, we keep the system as secure as possible.  For example, backups sent to our servers are written in a write-only fashion, so no one, hackers included, can delete them, even after breaking into the server being backed up.  This ensures there are always valid backups to restore.  AWS has some good features for this, too.

Fourth, we conduct periodic backup audits to validate the whole backup chain, including that everything that should be backed up is, that backups are completing correctly, that the keys are correct, and that the records are up to date.  Our customers' systems are always changing and it's easy for a new file system or subsystem to get missed, so these audits help ensure we get the right things, every time.
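One piece of such an audit, spotting data paths that no backup job covers, can be sketched in a few lines of Python.  This is only an illustration: the function name `find_uncovered` and the example paths are hypothetical, and a real audit would pull both lists from the live system and the backup configuration.

```python
from pathlib import PurePosixPath


def find_uncovered(paths_in_use, backup_roots):
    """Return the in-use paths that no configured backup root covers.

    Both arguments are lists of absolute POSIX paths (hypothetical inputs
    for this sketch). A path counts as covered if it equals a backup root
    or sits somewhere underneath one.
    """
    roots = [PurePosixPath(r) for r in backup_roots]
    uncovered = []
    for p in paths_in_use:
        path = PurePosixPath(p)
        covered = any(path == r or r in path.parents for r in roots)
        if not covered:
            uncovered.append(p)
    return sorted(uncovered)


if __name__ == "__main__":
    in_use = ["/var/www/site", "/data/mysql", "/srv/uploads"]
    backed_up = ["/var/www", "/data/mysql"]
    # /srv/uploads was added after the backup job was set up and got missed.
    print(find_uncovered(in_use, backed_up))
```

Anything the function returns is a candidate for a ticket: either the path should be added to a backup job, or someone should record why it is intentionally excluded.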

Fifth, we carefully monitor the backup processes to ensure they complete correctly, without error, with non-zero length files, and with reasonable times, sizes, and file lists.  This helps make sure the backups are valid and useful, and any problems are ticketed for immediate follow-up.
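Checks like these are simple to express in code.  The Python sketch below is for illustration only: the function name `check_backup_file`, the 26-hour freshness window, and the 50% shrink threshold are assumptions made for the example, not a description of any particular monitoring tool.

```python
import os
import time


def check_backup_file(path, previous_size=None, max_age_hours=26.0,
                      max_shrink=0.5, now=None):
    """Return a list of problems found with one backup file (empty = healthy).

    previous_size: size in bytes of the last known-good backup, if available.
    max_age_hours: assumed freshness window for a daily backup.
    max_shrink:    flag the file if it is smaller than this fraction of
                   previous_size (a sudden drop often means missing data).
    """
    if not os.path.isfile(path):
        return ["missing file"]

    problems = []
    info = os.stat(path)

    if info.st_size == 0:
        problems.append("zero-length file")

    now = time.time() if now is None else now
    age_hours = (now - info.st_mtime) / 3600
    if age_hours > max_age_hours:
        problems.append(f"stale: last written {age_hours:.1f} hours ago")

    if previous_size and info.st_size < previous_size * max_shrink:
        problems.append("size dropped sharply versus previous backup")

    return problems
```

A scheduler could run this after every backup job and open a ticket whenever the returned list is not empty.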

Sixth, we have new tools to test the backups as best we can.  This is very difficult to do correctly, but we are looking at how best to restore and test web, image, and DB content of all sizes.  For DBs we will restore and then test them, but for web and images it's not so obvious how to really test, so we'll likely read the tar headers and check that some expected files are there as basic validation.
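For the tar case, that basic validation might look something like the Python sketch below, which uses the standard `tarfile` module.  The function name `validate_tar` and the idea of passing in a short list of expected member names are assumptions for the example; it confirms the archive's headers can be read, not that every file's contents are intact.

```python
import tarfile
import zlib


def validate_tar(path, required_members=()):
    """Basic sanity check of a tar backup; returns a list of problems.

    Walks every member header in the archive (plain or compressed) and
    confirms that a few expected files are present. It does not extract
    or checksum file contents.
    """
    names = set()
    try:
        # "r:*" lets tarfile detect gzip/bz2/xz compression automatically.
        with tarfile.open(path, "r:*") as archive:
            for member in archive:  # iterating reads each header in turn
                names.add(member.name)
    except (tarfile.TarError, OSError, EOFError, zlib.error) as exc:
        return [f"unreadable archive: {exc}"]

    problems = []
    if not names:
        problems.append("archive is empty")
    for expected in required_members:
        if expected not in names:
            problems.append(f"missing expected file: {expected}")
    return problems
```

For a web backup, the expected list could be as small as the site's index page and one known image; the point is to catch archives that are truncated, empty, or taken from the wrong directory.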

In the end, correct and useful backups are the most important and sacred thing our customers entrust to us.  We work hard to live up to that and to protect their systems, data, and businesses in case anything goes wrong.

We promise systems are Up, Fast, and Cheap.  Backed-up, too.
