Disaster recovery testing is a vital part of any business continuity plan, ensuring that your organization can recover from a disaster effectively and minimize any potential downtime, data loss, or damage. 

To achieve this, it’s crucial to have an effective disaster recovery plan that considers timing, changes, impact, and people. 

In this article, we’ll discuss the purpose of a DR test, the different types of tests, and the best practices to follow.

 

 

What Is the Purpose of a DR Test & Why Is It Important?

A DR test’s purpose is to evaluate the steps outlined in the plan to ensure that the organization is prepared to handle operational disasters. 

Conducting regular disaster recovery tests is essential to avoid potential issues and ensure that the backup/restore processes remain unaffected by any changes. 

Failing to invest time and resources into testing a disaster recovery plan can result in the plan’s failure to execute as expected when it’s most needed. 

Therefore, experts recommend conducting disaster recovery tests regularly throughout the year, incorporating them into planned maintenance and staff training.

Once a test is completed, the data should be analyzed to identify what worked, what didn’t, and what changes need to be made to the plan’s design. The goal of a disaster recovery test is to meet the organization’s predetermined RPO/RTO requirements.
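
To make that evaluation concrete, here is a minimal sketch of how measured test results could be compared against predetermined RPO/RTO targets. The class, field names, and threshold values are illustrative assumptions, not part of any particular DR tool.

```python
from dataclasses import dataclass

@dataclass
class DrTestResult:
    """Measured outcome of one disaster recovery test run."""
    name: str
    recovery_minutes: float      # how long it took to bring systems back
    data_loss_minutes: float     # age of the most recent recoverable data

# Illustrative targets -- replace with your organization's agreed values.
RTO_MINUTES = 180   # recovery time objective: 3 hours
RPO_MINUTES = 15    # recovery point objective: 15 minutes

def evaluate(result: DrTestResult) -> list[str]:
    """Return a list of findings where the test missed its objectives."""
    findings = []
    if result.recovery_minutes > RTO_MINUTES:
        findings.append(
            f"{result.name}: RTO missed by "
            f"{result.recovery_minutes - RTO_MINUTES:.0f} min"
        )
    if result.data_loss_minutes > RPO_MINUTES:
        findings.append(
            f"{result.name}: RPO missed by "
            f"{result.data_loss_minutes - RPO_MINUTES:.0f} min"
        )
    return findings

if __name__ == "__main__":
    test = DrTestResult("Q3 simulation test", recovery_minutes=300, data_loss_minutes=45)
    for finding in evaluate(test) or ["All objectives met"]:
        print(finding)
```

Feeding every test run through a check like this keeps the "what worked, what didn't" analysis tied directly to the RPO/RTO targets rather than to impressions.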

 

 

Types of Disaster Recovery Tests (+ Examples of Possible Scenarios)

There are three types of disaster recovery testing: a plan review, a tabletop exercise, and a simulation test.

A plan review involves reviewing the DRP to find any inconsistencies and missing elements. 

A tabletop exercise involves stakeholders walking through all the components of a DRP step by step to uncover any inconsistencies, missing information, or errors. 

A simulation test involves simulating disaster scenarios to see if the procedures and resources allocated for disaster recovery and business continuity work in a situation as close to the real world as possible.

There are two types of simulation tests: a parallel test and a live, or “full interruption,” test. A parallel test restores a copy of a system that has not failed to an alternate location, whereas a full interruption test takes the main system down and attempts to recover it.

Disasters can be categorized into several major groups, including equipment failures, user errors, natural disasters, and cyber-attacks. 

Equipment failures range from server meltdowns to storage failures, while user errors involve accidental deletion of data or crashing the database server. 

Natural disasters include hurricanes, tornadoes, and earthquakes, and cyber-attacks can range from malware infections to hacking. 

All of these potential disasters should be considered when developing a DRP.

That being said…

 

 

Checklist of Best Practices for Creating a Disaster Recovery Plan and Disaster Recovery Testing

Based on our experience and all that we’ve mentioned before, here is a checklist of best practices for disaster recovery testing:

  1. Backup data regularly: It is essential to back up data files regularly and store them in a secure location, ideally an offsite cloud backup service that encrypts backup data both in transit and at rest (a minimal backup-freshness check is sketched after this checklist).
  2. Develop a disaster recovery plan (DRP): Create a clear document outlining the steps to be taken in case of cyber security incidents. Ensure all technical staff or contractors know the plan and its procedures.
  3. Test your DRP regularly: Conduct regular tests of your DRP to ensure it is effective in a real-life crisis. Make updates based on the results of these tests.
  4. Identify critical business functions: Determine which business functions are most critical and ensure they receive priority in recovery efforts.
  5. Identify dependencies and ensure redundancy: Identify critical dependencies essential for normal operations, such as power and internet connectivity. Ensure that redundancy is in place to provide a backup in case of an outage.
  6. Allocate recovery resources: Allocate resources required to recover from cyber incidents, such as manpower, hardware, and software.
  7. Create an incident response team: Establish a team of individuals trained to respond quickly and effectively to cyber incidents.
  8. Review insurance coverage: Review insurance coverage with experts and ensure it covers all potential cyber-related incidents.
  9. Educate employees: Educate employees on cyber security best practices to reduce the risk of security breaches.
  10. Restrict access to systems and data: Limiting employee access to systems and data minimizes a malicious insider threat. Ensure that privileged access and password controls are enforced, and use two-factor authentication wherever feasible.
  11. Secure the network: Implement security measures, such as firewalls and anti-virus software, to prevent cyber attacks.
  12. Keep software and systems up to date: Regularly updating software and systems can prevent security breaches associated with outdated versions. Ensure that any security patches or updates are promptly installed.
  13. Keep documentation current: Ensure all policies and procedures are documented accurately and trained personnel are familiar with the latest information.
  14. Conduct regular training: Train all employees on the DRP, roles and responsibilities, and best practices, including the importance of cyber security hygiene.
  15. Establish communication channels: Establish clear communication channels to inform all stakeholders during cyber security incidents.
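
As a small illustration of items 1 and 3, here is a hedged sketch that checks whether the newest file in a backup location is recent enough; the backup path and age limit are assumptions to replace with your own values.

```python
import sys
import time
from pathlib import Path

# Hypothetical values -- adjust to your own backup location and policy.
BACKUP_DIR = Path("/mnt/offsite-backups")
MAX_AGE_HOURS = 24   # alert if the newest backup is older than this

def newest_backup_age_hours(backup_dir: Path) -> float:
    """Return the age, in hours, of the most recently modified backup file."""
    files = [p for p in backup_dir.rglob("*") if p.is_file()]
    if not files:
        raise FileNotFoundError(f"No backup files found under {backup_dir}")
    newest = max(p.stat().st_mtime for p in files)
    return (time.time() - newest) / 3600

if __name__ == "__main__":
    age = newest_backup_age_hours(BACKUP_DIR)
    if age > MAX_AGE_HOURS:
        print(f"WARNING: newest backup is {age:.1f} h old (limit {MAX_AGE_HOURS} h)")
        sys.exit(1)
    print(f"OK: newest backup is {age:.1f} h old")
```

A check like this only proves a backup exists and is recent; it is a complement to, not a substitute for, periodically restoring data as part of DRP testing.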

By following a comprehensive disaster recovery checklist such as this, businesses can proactively prepare for a cyber security incident and minimize disruption to their operations and financial loss.

 


In disaster recovery planning, two critical terms that often come up are RTO (Recovery Time Objective) and RPO (Recovery Point Objective). Both are essential metrics: they define how long a business can tolerate downtime and how much data it can afford to lose. 

Understanding the differences between RTO and RPO is vital for creating an effective disaster recovery strategy that can help minimize the impact of a disruptive event.


What is RTO (Recovery Time Objective) in Disaster Recovery?

RTO (Recovery Time Objective) is a metric that defines the maximum amount of time a business can tolerate for bringing all critical systems back online after a disaster. In other words, RTO measures the window between the occurrence of a disaster and the recovery of the system. 

It is important to define RTO since it allows a company to determine how quickly it needs to recover its activity. RTO can be as short as a few hours or as long as a couple of weeks. 

Some factors that can influence a user’s RTO include the amount of revenue a company will lose per hour of downtime, the amount of financial loss that can be absorbed during an emergency, the availability of resources necessary to restore operations, and a customer’s tolerance for downtime.

The RTO is calculated based on the costs and risks associated with downtime, and the time it takes for losses to become significant. If a client needs its systems to function within three hours, then that is its RTO. 

If its average measured time for effective recovery is five hours, it has exceeded its RTO by two hours. This preliminary calculation indicates that more investment in backup and disaster recovery (BDR) is needed to reduce the actual recovery time.
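
That arithmetic can be sketched in a few lines; the cost-per-hour figure below is a hypothetical placeholder, not a number from the text.

```python
# Worked example from the text: a 3-hour RTO versus a measured 5-hour recovery.
rto_hours = 3           # agreed recovery time objective
measured_hours = 5      # average recovery time observed in testing
cost_per_hour = 10_000  # hypothetical revenue lost per hour of downtime

gap_hours = measured_hours - rto_hours
print(f"RTO exceeded by {gap_hours} h")                                  # -> 2 h
print(f"Extra exposure beyond the RTO: ${gap_hours * cost_per_hour:,}")  # -> $20,000
```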

RTO is not just about determining the duration between the disaster’s start and recovery; it also includes defining the recovery steps that IT teams must perform to restore their applications and data. 


What is RPO (Recovery Point Objective) in Disaster Recovery?

Recovery Point Objective (RPO) is a metric used in disaster recovery planning to determine the maximum acceptable amount of data loss that a company can tolerate without causing significant damage to its business operations.

It defines how frequently a company’s systems need to be backed up: the maximum acceptable interval between the last backup and the occurrence of a disaster. 

The frequency of backups will determine the volume of data at risk of loss, and the company will need to assess the amount of data it considers tolerable to lose in case of a disaster.

RPO is determined by the company’s owner/director and IT management, and it helps to configure the appropriate backup job. For critical systems, an RPO of 15 minutes is recommended as a good compromise between system load and processing time. 

RPO is closely related to the frequency of data backup, and it depends on the complexity and number of fundamental systems, volume of data and access requirements, frequency of data changes, and the backup method used.
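
Here is a minimal sketch of that relationship, assuming the worst case where a disaster strikes just before the next scheduled backup; the interval values are illustrative.

```python
# Worst case, a disaster strikes just before the next backup would have run,
# so the data at risk is roughly the full backup interval (plus backup runtime).

def worst_case_data_loss_minutes(backup_interval_min: float,
                                 backup_runtime_min: float = 0.0) -> float:
    """Upper bound on the age of lost data if a disaster hits at the worst moment."""
    return backup_interval_min + backup_runtime_min

RPO_MINUTES = 15  # target cited in the text for critical systems

for interval in (5, 15, 60):
    loss = worst_case_data_loss_minutes(interval)
    status = "meets" if loss <= RPO_MINUTES else "misses"
    print(f"Backup every {interval:>2} min -> up to {loss:.0f} min of data lost "
          f"({status} a {RPO_MINUTES}-min RPO)")
```

The takeaway is simply that the backup interval must be no longer than the RPO, with some margin for the time the backup itself takes.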

RPO is critical in determining the company’s continuity during downtime. The longer the RPO, the greater the possibility of data loss due to prolonged downtime. 

RPO aims to answer the question, “How much data can the company afford to lose?” 

In other words, RPO determines the age of the data that must be recovered to resume business operations. 

The RPO prepares the scenario for determining the disaster recovery plan, evaluating the importance of the data, and deciding which applications, processes, or information should be recovered. 

In practice, the backup system’s configuration — the time of the last backup and the type of backup — determines the RPO that can actually be achieved. 

Therefore, RPO is important in guiding an MSP’s recommendations for data backup solutions, especially regarding storage space and backup mode.


4 Main Differences Between RTO and RPO

| Recovery Point Objective (RPO) | Recovery Time Objective (RTO) |
| --- | --- |
| The amount of data loss a company can tolerate in the event of a disaster | The maximum amount of downtime a company can tolerate |
| Determines the frequency of data backups and replication | Determines the time needed to recover a system after a disaster |
| Helps establish the maximum acceptable time gap between backups | Helps establish the acceptable time frame for system recovery |
| Helps ensure that the most recent version of data is always available | Helps ensure that the system is back up and running as quickly as possible |

In conclusion, RTO and RPO are two fundamental concepts that must be considered when designing a disaster recovery plan. 

Both metrics play a crucial role in ensuring business continuity and minimizing data loss. 

By understanding the differences between RTO and RPO, organizations can make informed decisions about how to allocate their resources and prioritize their recovery efforts to minimize downtime and keep critical business operations running smoothly.

From taking inventory of your devices and applications to choosing the right pricing plan and deciding who will manage your tenant after the move, this guide provides the following Office 365 migration tips.

After all, migrating to Office 365 can be a daunting task for any small or midsize company. Whether it’s to upgrade business tools or as part of a merger, the migration process can present challenges that can negatively impact the business if not done correctly. 

However, with the right planning and guidance, companies can make a safe and accurate transition. Follow these Office 365 migration tips and you will be on the right path.


8 Office 365 Migration Tips for Small and Midsize Companies

To make the migration process smoother, companies should not skimp on preparation and plan for coexistence to minimize the impact on business. They should also implement the ABCs of security and not forget about post-migration management. So keep reading.

  1. Analyze what will be affected by migration: Before starting a Microsoft 365 migration, take inventory of all networked devices and applications that will be affected. This will help you identify which devices may lose functionality during the migration, and give you time to research and implement additional configurations to maintain their functionality in the new environment.

  2. Meet System Requirements: Make sure your versions of Office and Windows meet the Microsoft 365 system requirements.

    It’s important to ensure that your versions of Office and Windows are compatible with Microsoft 365 before starting the migration. While it’s best to use the most recent version of Microsoft 365, if your organization is currently using older versions, it’s still possible to upgrade. However, it’s important to note that older versions may have reduced functionality, which could impact your users.




  3. Verify DNS Compatibility: Make sure to check whether your DNS provider supports SRV records, as this can impact your organization’s ability to email, instant message, and more.

    If your organization is a nonprofit, it’s important to confirm that you qualify for nonprofit pricing before beginning the migration process. Unexpected expenses can be a headache, especially if you thought your licensing was free. Check your eligibility requirements listed on Microsoft’s nonprofit page, or speak with your IT partner.



  4. Consider Business Needs: Before choosing a Microsoft 365 plan, consider your organization’s business needs.

    For instance, if your organization has industry-specific compliance requirements for data security, regulatory reporting, or data recovery, make sure that the plan you choose meets those needs.



  5. Test on-prem Exchange server: Use Microsoft’s Remote Connectivity Analyzer to test whether your on-prem Exchange server will encounter any connectivity issues during the migration process.

    If your server doesn’t pass the test, the Connectivity Analyzer will highlight any problems that need to be fixed before the migration can begin.




  6. Inspect Files Before Migrating: Before migrating files, inspect them to ensure that they’re supported by Microsoft 365 and that their filenames don’t contain unsupported characters (a minimal filename check is sketched after this list).

    Failing to account for file permissions can also cost you time and effort when you have to rebuild security policies from scratch after the migration.



  7. Decommission On-Prem Servers: Make sure to verify that any on-prem servers, such as Lync servers, have been properly decommissioned before starting the migration.

    Failure to do so could result in users being unable to connect to new Microsoft 365 features.


  8. Decide Who Will Manage the Tenant: Decide ahead of time who will be responsible for administering your Microsoft 365 tenant after the migration is complete.

    If you’re working with an IT partner, they can manage it for you, or you can choose to manage it in-house by taking courses at Microsoft’s Virtual Academy.
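
As a small illustration of tip 6, here is a hedged sketch that scans a hypothetical source share for file names containing characters commonly rejected by SharePoint Online and OneDrive; treat the character list and the share path as assumptions and confirm them against Microsoft’s current documentation before relying on them.

```python
import re
from pathlib import Path

# Characters commonly rejected in SharePoint Online / OneDrive file names.
# Assumption: verify against Microsoft's current documentation before use.
INVALID_CHARS = re.compile(r'["*:<>?/\\|]')

def find_problem_files(root: Path) -> list[Path]:
    """Return files whose names contain characters the target platform may reject."""
    problems = []
    for path in root.rglob("*"):
        if path.is_file() and INVALID_CHARS.search(path.name):
            problems.append(path)
    return problems

if __name__ == "__main__":
    # Hypothetical source share to be migrated.
    for path in find_problem_files(Path(r"\\fileserver\departments")):
        print(f"Rename before migration: {path}")
```

Running a report like this before the migration lets you rename or exclude problem files in bulk instead of chasing individual upload failures afterwards.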


Do’s & Don’ts for a Successful Office 365 Migration

As you may or may not know, migrating to Office 365 and Azure AD can bring a range of benefits to organizations, from improved collaboration and productivity to enhanced security and compliance. 

With feature sets now on par with on-premises counterparts, it’s hard to justify investing in expensive on-prem email, collaboration, and communication capabilities when everything can be obtained through a monthly subscription to Office 365. 

Azure AD also offers compelling features, such as the ability to provide single sign-on (SSO) to thousands of end-user applications, including non-Microsoft ones like Salesforce, and valuable security features like conditional access policies.

However, migrating to Office 365 is not without its challenges. Proper assessment, inventory, and cleanup of the source environment are necessary, along with efficient migration tracking, ensuring normal user operations throughout the process, and proper management of the target environment after migration. 

Specific challenges include mapping permissions from the source platform to Office 365, dealing with feature restrictions and size limitations, and migrating highly customized SharePoint applications. 

Additionally, native tools have important limitations during each phase of the migration process, with no capability to merge tenants or to migrate from one tenant to another. But with proper planning and execution, organizations can overcome these challenges and experience a successful migration. Simple Office 365 migration tips can go a long way.

That being said, here’s what we recommend you do (and avoid) if you’re planning to migrate to Office 365:

  1. Prepare – Planning and preparation are essential for a successful migration to Office 365. An accurate inventory of your source environment is critical, including user accounts, SharePoint content, and email archives. Application inventory should start early and should consider the target platform’s requirements.

    Define your goals and what data and applications you want to host in Office 365, and consider governance, compliance, and technical limitations. Also, clean up your environment and review SharePoint metrics before migration and plan for possible issues and ensure a quick recovery.

    Lastly, estimate how long the migration will take and plan for coexistence requirements and verify that the new environment is working properly before decommissioning the source environment.

  2. Minimize the Business’ Impact – A successful migration should be complete and accurate, ensuring that all required data is moved and users can work effectively in the new environment. It’s important to update user profiles and ensure that SharePoint sites and OneDrive data are migrated completely and accurately.

    To keep users happy, the migration should be completed quickly, with the ability to perform migration jobs in parallel and schedule jobs to run at convenient times.

    Efficient administration and reporting reduce costs and help complete the migration on time, with third-party solutions offering a dashboard for tracking and reporting on migration status.

    Consider getting help from migration experts, either for the complete project or specific pieces, as most IT pros don’t perform migrations frequently.

    Finally, 24/7 support is crucial to address issues that may arise as quickly as possible, minimizing the impact on the business.

  3. Co-Existence Strategy – When migrating from on-premises Exchange to Exchange Online, it’s essential to maintain a seamless user experience. This can be achieved through a co-existence strategy that synchronizes the source and target mailboxes, calendars, address lists, and public folders.

    By flipping a switch, you can easily migrate a particular group of users without affecting others. You’ll also need to synchronize your Active Directory users and groups and migrate your back-end resources, such as file servers, databases, and SharePoint sites.

    Native tools are not sufficient for this task, as they require extensive scripting and offer limited troubleshooting capabilities. Third-party tools that offer strong co-existence capabilities can make the migration process easier and less disruptive for users, reducing the risk of business impact.



  4. Post-migration Management – Post-migration management is a crucial aspect of any migration to the cloud, including Office 365.

    While moving to the cloud eliminates some administrative responsibilities, such as hardware management and platform availability, you and your team are still accountable for day-to-day administration, IT governance, and compliance with internal and external regulations.

    These responsibilities include permissions reporting, privileged account management, compliance auditing, provisioning, backup and recovery, and license management.

    Having the right tools in place before the migration starts is essential to ensure a secure and effective environment from day one.

    Microsoft will be responsible for performance and availability, but you’ll still need to manage and secure your Office 365 environment.

When it comes to finding the right IT solution for your business, you have several options to choose from. Managed IT services and in-house IT departments each have their pros and cons. 

This article will compare these two IT solutions to help you determine which is best suited for your company.

Managed IT Services vs. In-House IT Availability Comparison 

Availability is one of the most important factors to consider when choosing an IT solution. 

Here is a comparison of how managed, in-house, and co-managed IT services handle availability.

| Availability | Advantages | Disadvantages |
| --- | --- | --- |
| Managed IT | MSPs provide redundancy, ensuring that you always have access to IT support. MSPs have on-call engineers to address IT problems outside of typical business hours. | They cannot provide as much on-site support as in-house IT can. An MSP engineer may visit your site only once a week. |
| In-House IT | Hiring an in-house engineer gives you the option to have your engineer on-site during all business hours. Your in-house IT engineer can address problems as they arise. | In-house IT resources can have lapses when the engineer takes time off. |

Managed IT Services vs. In-House IT Service Level Comparison

All IT solutions are designed to support your IT environment. Here is a comparison of what service looks like for managed, in-house, and co-managed IT services.

| Service Level | Advantages | Disadvantages |
| --- | --- | --- |
| Managed IT | MSPs provide constant support from engineers with expertise in specific IT disciplines. MSPs have the knowledge and skills to solve complex IT problems. | MSPs might not know your business or industry. |
| In-House IT | In-house IT engineers know your business and industry. In-house IT engineers are always available on-site. | In-house IT engineers may not have expertise in all IT disciplines. In-house IT can be expensive to maintain. |


Managed IT Services vs. In-House IT Cost Comparison

Cost is always an important factor when it comes to choosing an IT solution. Here is a comparison of the cost of managed, in-house, and co-managed IT services.

| Cost | Advantages | Disadvantages |
| --- | --- | --- |
| Managed IT | MSPs are typically less expensive than hiring a full in-house IT department. | MSPs may charge extra for some services or require you to sign a long-term contract. |
| In-House IT | In-house IT departments provide complete control over your IT environment. | In-house IT departments are expensive to maintain, requiring salaries, benefits, and infrastructure. |

Conclusion

Managed IT services, in-house IT departments, and co-managed IT services each have their pros and cons. The right choice for your business depends on your specific needs and goals. 

Managed IT services are becoming more popular as they are less expensive, easier to set up and maintain, and have teams segmented into tiers, ensuring that any issue is addressed by the right person.

They are also efficient, have experienced professionals, and offer remote problem resolution.

Managed IT service providers are experienced in managing network security and keeping data safe, ensuring your network is protected from cyber threats. However, working with MSPs can be a hands-off experience, and some companies may prefer more control over their cybersecurity.

On the other hand, building an in-house IT department allows for more customization, hiring employees with the exact qualifications and experience needed, and customizing the hardware and software.

However, it can be expensive, and the costs can quickly add up, paying for salaries, benefits, workstations, and cyber security and management software.

The decision between in-house or managed IT services depends on your company’s specific needs and capabilities, such as the size of the company, the level of control required, and the complexity of the IT infrastructure.

Ultimately, it’s essential to weigh the benefits and drawbacks of both options and review feedback before selecting an IT company.



To address misconceptions about the frequency and cost of data center downtime, we’ve studied the common causes, potential costs, and solutions, and we explain them below.

After all, the reliance on IT systems to support business-critical applications has increased significantly over the past decade, with data center availability now becoming essential to many companies whose customers pay a premium for access to a variety of IT applications. 

This connection between data center availability and total cost of ownership has made a single downtime event capable of significantly impacting the profitability (and, in extreme cases, the viability) of an enterprise. 

Costs of Data Center Downtime

A study found that the average cost of data center downtime was approximately $5,600 per minute, and the average cost of a single downtime event was approximately $505,500.

Indirect and opportunity costs accounted for more than 62 percent of all costs resulting from data center downtime.

This study, conducted in 2011, involved data center professionals from 41 independent facilities across industry segments such as financial services, telecommunications, retail, healthcare, government, and third-party IT services. 

The participating data centers were required to have a minimum of 2,500 ft² to ensure that the costs were representative of an average enterprise data center. 

Respondents provided cost estimates for a single recent outage, and follow-up interviews were conducted to obtain additional information. 

Business disruption and lost revenue were the most significant cost consequences, and losses in end-user and IT productivity also had a significant impact. Surprisingly, equipment costs were among the lowest costs reported for a downtime event.
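
Taken at face value, the study’s averages support some quick back-of-the-envelope arithmetic; the four-hour outage below is a hypothetical scenario, not a figure from the study.

```python
# Back-of-the-envelope use of the study's averages quoted above.
COST_PER_MINUTE = 5_600    # average cost of data center downtime, USD per minute
AVG_EVENT_COST = 505_500   # average cost of a single downtime event, USD

# Implied average outage length under these two averages (~90 minutes).
print(f"Implied average outage length: {AVG_EVENT_COST / COST_PER_MINUTE:.0f} min")

# Estimated cost of a hypothetical 4-hour outage at the average per-minute rate.
outage_minutes = 4 * 60
print(f"Estimated cost of a 4-hour outage: ${outage_minutes * COST_PER_MINUTE:,}")
```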

Common Causes of Data Center Downtime

The common causes of downtime are UPS system failure, human error, and cyber attacks.

But let’s take a look at the two that cause the most damage and, therefore, result in the most expensive outages.

a) Power-Related Outages – Among the root causes of power-related outages, UPS and generator failures are the most costly. Tier I and II data centers are particularly vulnerable to power failures because they lack redundancy and other preventative measures.

Redundancy in power systems is recommended to minimize the impact of equipment failure. Additionally, regular maintenance and monitoring of critical power systems can help to minimize the risk of power equipment failure.

Comprehensive monitoring solutions can aid in quickly identifying and addressing power equipment issues.



b) Environmental-Related Outages – Environmental vulnerabilities, such as thermal issues and water incursion, are cited in this study as root causes of data center failures, accounting for 15% of all root causes.

IT equipment failures caused by environmental issues are the most expensive, with a cost of more than $750,000 per incident. The study also emphasizes that an optimized cooling infrastructure is critical to preventing catastrophic equipment failures and minimizing downtime.

Best practices for cooling infrastructure are explored, including using refrigerant-based cooling instead of water-based solutions, eliminating hot spots and high heat densities, installing robust monitoring and management solutions, and implementing regular preventive maintenance and service visits.

However, you can mitigate these risks and improve availability by adopting a proactive approach built around the following six key strategies.

Solutions for Data Center Downtime

Regular assessments and performance optimization services can help identify vulnerabilities and create a plan tailored to your infrastructure and budget. By implementing these strategies, you can improve availability, reduce downtime risks, and gain a competitive edge.

Firstly, monitor batteries and implement a battery maintenance program that identifies system anomalies and tracks end-of-life trends (a minimal sketch of such a threshold check appears after this list of strategies). 

Secondly, consider monitoring software like Vertiv’s Data Center Planner to help identify battery problems before they impact operations. 

Thirdly, consider lithium-ion batteries as they are smaller, lighter, and last longer while providing the power needed for critical loads. 

Fourthly, use an integrated approach to optimize your infrastructure with Vertiv’s Liebert iCOM-S Thermal System Supervisory Control to match load demand. 

Fifthly, keep the data center clean, perform preventative maintenance, and assess environmental threats to protect your infrastructure. 

And lastly, implement and update policies and procedures regularly to ensure everyone is aware of common threats and how to respond to system failures.
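
To make the first two strategies concrete, here is a minimal, vendor-neutral sketch of the kind of threshold check a battery maintenance program automates; the fields and thresholds are illustrative assumptions, not values from Vertiv’s tooling.

```python
from dataclasses import dataclass

@dataclass
class BatteryReading:
    """One UPS battery string reading; fields and thresholds are illustrative."""
    string_id: str
    voltage: float        # volts
    temperature_c: float  # degrees Celsius
    age_years: float

# Hypothetical alert thresholds -- real values come from the battery vendor.
MIN_VOLTAGE = 12.4
MAX_TEMP_C = 30.0
MAX_AGE_YEARS = 4.0

def check(reading: BatteryReading) -> list[str]:
    """Flag readings that suggest a battery string is approaching end of life."""
    alerts = []
    if reading.voltage < MIN_VOLTAGE:
        alerts.append(f"{reading.string_id}: low voltage ({reading.voltage} V)")
    if reading.temperature_c > MAX_TEMP_C:
        alerts.append(f"{reading.string_id}: high temperature ({reading.temperature_c} C)")
    if reading.age_years > MAX_AGE_YEARS:
        alerts.append(f"{reading.string_id}: past recommended service life")
    return alerts

if __name__ == "__main__":
    for alert in check(BatteryReading("UPS-A string 3", 12.1, 32.5, 4.5)):
        print(alert)
```

Dedicated monitoring software does this continuously and at scale, but the principle is the same: trend each battery string against known thresholds and act before a failure takes the UPS down with it.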