Are you a Mac person or a PC person?
That question has been asked for years, often as a veiled way to discern whether someone prefers a polished, beautiful and easy experience or the ability to control every little detail, at the cost of usability and with a potentially steep learning curve.
Over the years, we’ve seen IT tools move toward the “PC” end of the “Mac-PC” dichotomy, offering more granular control and flexibility, but requiring time and dedication to learn the complexities and nuances of each piece of software. However, as IT teams are stretched thin and time becomes increasingly scarce, there is a need to re-balance. While remaining configurable and customizable, systems must filter out noise and only demand human attention when absolutely necessary.
For disaster recovery and data backup, this principle can be thought of as “manage by exception”. In the context of systems and data that underpin the operations of a firm, the answer to the question “Am I protected?” isn’t as simple as whether the latest backup was successful. Instead, IT personnel need to take a holistic approach to understanding when protection is adequate and when they must spend time and effort taking corrective actions to ensure the safety and security of critical information.
Making the Exception Fit Into the Rule
Adequate and acceptable protection is something that every agency needs to define for itself. What constitutes acceptable will vary according to many factors. For a small firm, adequate protection may be defined as one successful backup per day. In a more demanding scenario, such as one involving sensitive data, that tolerance threshold may narrow to one successful backup every hour. The requirements are influenced by the importance of each system being protected, as well as the rate and volume of data being created or updated.
Once an appropriate threshold is determined, human action should only be required when the threshold isn’t met. For instance, if backups run every hour but a single successful backup per day is enough to consider the system adequately protected, then there is no need to spend time or effort investigating one failed backup during the day. Perhaps the server was offline for maintenance at the time the backup failed. The “manage by exception” concept relies on answering the question “Am I protected?” as opposed to “Did my backup succeed?”
In other words, don’t let the exception define the rule, but rather define the rule to allow for exceptions.
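The difference between the two questions can be made concrete with a minimal sketch. It assumes protection is judged solely by the age of the most recent successful backup; the function name `is_protected` and the sample timestamps are hypothetical and not taken from any particular backup product.

```python
from datetime import datetime, timedelta

def is_protected(successful_backups, threshold, now):
    """Answer "Am I protected?": True if the newest successful backup
    is no older than the tolerance threshold."""
    if not successful_backups:
        return False  # no successful backup at all, never protected
    return now - max(successful_backups) <= threshold

# Hypothetical history: backups run hourly, but the 11:00 run failed,
# so only the 09:00 and 10:00 runs count as successes.
now = datetime(2024, 1, 1, 12, 0)
successes = [datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 10, 0)]

print(is_protected(successes, timedelta(days=1), now))   # True
print(is_protected(successes, timedelta(hours=1), now))  # False
```

With a daily threshold the failed 11:00 run needs no attention, because the 10:00 backup still keeps the system inside its window. With an hourly threshold the same history is an exception that does call for action.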
The Tools of the “Manage by Exception” Trade
As with the Mac versus PC debate, different disaster recovery and backup tools also offer a choice. If the efficiency of your IT team matters, then consider the following when evaluating tools:
● Does the solution let you specify different protection thresholds across multiple systems to be protected?
● How are users informed about the health of protection? Is a dashboard available to give a quick overview? Or do you need to log in and spend time digging through multiple screens about backups to arrive at the same information?
● Does the system provide alerts when the protection health falls outside the threshold window? Or are alerts limited to activity status, such as “backup failed”?
● Does the solution integrate with systems already in use for overall health tracking of the IT environment, such as remote monitoring and management tools?
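To illustrate the first and third questions, the following sketch contrasts activity-based alerts with threshold-based alerts across systems that carry different tolerances. Everything here is an assumption made for illustration: the `SystemStatus` record, its fields, and the system names do not correspond to any specific vendor's data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

@dataclass
class SystemStatus:
    name: str
    threshold: timedelta               # hypothetical per-system tolerance
    last_success: Optional[datetime]   # newest successful backup, if any
    last_run_failed: bool              # outcome of the most recent attempt

def activity_alerts(systems):
    """Alert on every failed run, whether or not protection is at risk."""
    return [s.name for s in systems if s.last_run_failed]

def protection_alerts(systems, now):
    """Alert only when a system has fallen outside its own threshold window."""
    return [
        s.name
        for s in systems
        if s.last_success is None or now - s.last_success > s.threshold
    ]

now = datetime(2024, 1, 1, 12, 0)
systems = [
    SystemStatus("file-server", timedelta(days=1), datetime(2024, 1, 1, 10, 0), True),
    SystemStatus("database", timedelta(hours=1), datetime(2024, 1, 1, 10, 0), True),
    SystemStatus("mail", timedelta(hours=4), datetime(2024, 1, 1, 11, 0), False),
]

print(activity_alerts(systems))         # ['file-server', 'database']
print(protection_alerts(systems, now))  # ['database']
```

Both the file server and the database had a failed run, but only the database, with its one-hour tolerance, is actually unprotected. A tool that alerts on protection health rather than activity status would raise one alert here instead of two.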
Time is a precious commodity for any IT professional, and most rarely have enough of it to do what already needs to be done. Put simply, “manage by exception” frees up time. For example, if a backup fails because someone is running maintenance, and a subsequent backup succeeds within your protection threshold, the failure can be ignored. The answer to “Am I protected?” is still yes. Without this philosophy, someone would have to investigate why the backup failed, spending valuable time and resources on a failure that never put the data at risk.
To employ this approach, IT teams need to look at their solutions and determine whether they can specify protection rules with distinct tolerance thresholds. Being able to set thresholds and receive alerts only when those thresholds are breached yields greater efficiency and streamlined operations, freeing up one of the most valuable resources: time.