Business continuity planning

From HORSE - Holistic Operational Readiness Security Evaluation.

Business continuity planning life cycle

Business continuity planning (BCP) "identifies an organization's exposure to internal and external threats and synthesizes hard and soft assets to provide effective prevention and recovery for the organization, while maintaining competitive advantage and value system integrity". It is also called business continuity and resiliency planning (BCRP). A business continuity plan is a roadmap for continuing operations under adverse conditions such as a storm or a crime. In the US, governmental entities refer to the process as continuity of operations planning (COOP).

Any event that could impact operations is included, such as supply chain interruption or the loss of or damage to critical infrastructure (major machinery or computing/network resources). As such, risk management must be incorporated as part of BCP.

In December 2006, the British Standards Institution (BSI) released an independent standard for BCP: BS 25999-1. Prior to the introduction of BS 25999, BCP professionals relied on the information security standard BS 7799, which addressed BCP only peripherally, as a means of improving an organization's information security procedures. BS 25999's applicability extends to all organizations. In 2007, the BSI published BS 25999-2 "Specification for Business Continuity Management", which specifies requirements for implementing, operating and improving a documented business continuity management system (BCMS).

In 2004, the United Kingdom enacted the Civil Contingencies Act 2004, instructing emergency services and local authorities to actively prepare for emergencies. Local authorities were given the legal obligation to actively lead promotion of business continuity practices in their respective jurisdictions.

Analysis

The analysis phase consists of impact analysis, threat analysis and impact scenarios.

Business Impact Analysis (BIA)

A Business Impact Analysis (BIA) differentiates critical (urgent) and non-critical (non-urgent) organization functions/activities. Critical functions are those whose disruption is regarded as unacceptable. Perceptions of acceptability are affected by the cost of recovery solutions. A function may also be considered critical if dictated by law. For each critical (in scope) function, two values are then assigned:

  • Recovery Point Objective (RPO): the acceptable amount of data, measured in time, that may be lost and not recovered
  • Recovery Time Objective (RTO): the acceptable amount of time to restore the function

The Recovery Point Objective must ensure that the maximum tolerable data loss for each activity is not exceeded. The Recovery Time Objective must ensure that the Maximum Tolerable Period of Disruption (MTPD) for each activity is not exceeded.
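The two constraints above can be checked mechanically once the values have been recorded. The following is a minimal sketch, not a prescribed method: the class name, field names and the sample figures for a "payroll" function are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CriticalFunction:
    """Hypothetical record of the values a BIA assigns to one critical function."""
    name: str
    rpo_hours: float            # recovery point objective
    rto_hours: float            # recovery time objective
    max_data_loss_hours: float  # maximum tolerable data loss
    mtpd_hours: float           # maximum tolerable period of disruption


def objective_violations(fn: CriticalFunction) -> list[str]:
    """Return a description of each objective that breaks its tolerance limit."""
    problems = []
    if fn.rpo_hours > fn.max_data_loss_hours:
        problems.append("RPO exceeds maximum tolerable data loss")
    if fn.rto_hours > fn.mtpd_hours:
        problems.append("RTO exceeds MTPD")
    return problems


# Illustrative figures only: the RTO of 48 hours is longer than the 36-hour MTPD.
payroll = CriticalFunction("payroll", rpo_hours=4, rto_hours=48,
                           max_data_loss_hours=24, mtpd_hours=36)
print(objective_violations(payroll))  # prints ['RTO exceeds MTPD']
```

An empty list means both objectives sit within their tolerances; any entry indicates that the objective should be tightened before recovery requirements are derived from it.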

Next, the impact analysis results in the recovery requirements for each critical function. Recovery requirements consist of the following information:

  • The business requirements for recovery of the critical function, and/or
  • The technical requirements for recovery of the critical function

Threat and Risk Analysis (TRA)

After defining recovery requirements, each potential threat may require unique recovery steps. Common threats include:

  • Epidemic
  • Earthquake
  • Fire
  • Flood
  • Cyber attack
  • Sabotage (insider or external threat)
  • Hurricane or other major storm
  • Utility outage
  • Terrorism
  • Piracy
  • War/civil disorder
  • Theft (insider or external threat, vital information or material)
  • Random failure of mission-critical systems

The impact of an epidemic can be regarded as purely human and may be alleviated with technical and business solutions. However, if the people responsible for these plans are themselves affected by the disease, the process can stumble.

During the 2002–2003 SARS outbreak, some organizations grouped staff into separate teams, and rotated the teams between primary and secondary work sites, with a rotation frequency equal to the incubation period of the disease. The organizations also banned face-to-face inter-group contact during business and non-business hours. The split increased resiliency against the threat of quarantine measures if one person in a team was exposed to the disease.
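The split-team rotation described above amounts to a simple periodic schedule. The sketch below is one possible way to express it, under stated assumptions: the function name, the team and site labels, and the 10-day rotation period are hypothetical illustration values, not figures taken from any organization's actual plan.

```python
def site_assignments(teams, sites, day, rotation_days):
    """Return {team: site} for a given day number (day 0 is the first day).

    Every `rotation_days` days each team moves on to the next site in the
    list, so no two teams share a site as long as there are at least as
    many sites as teams.
    """
    period = day // rotation_days
    return {team: sites[(i + period) % len(sites)]
            for i, team in enumerate(teams)}


teams = ["Team A", "Team B"]
sites = ["primary", "secondary"]

# Assumed rotation period of 10 days, standing in for the incubation period.
for day in (0, 9, 10, 20):
    print(day, site_assignments(teams, sites, day, rotation_days=10))
```

With these inputs the teams swap sites on day 10 and return to their starting sites on day 20.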

Impact scenarios

After defining threats, impact scenarios form the basis of the business recovery plan. In general, planning for the most wide-reaching impact is preferable. A typical impact scenario such as "building loss" encompasses most critical business functions. A BCP may document scenarios for each building. More localized impact scenarios – for example loss of a specific floor in a building – may also be documented.

Recovery requirement

After the analysis phase, business and technical recovery requirements precede the solutions phase. Asset inventories allow for quick identification of deployable resources. For an office-based, IT-intensive business, the plan requirements may cover desks, human resources, applications, data, manual workarounds, computers and peripherals.

Other business environments, such as production, distribution and warehousing, need to cover these elements but are likely to have additional issues.

Solution design

The solution design phase identifies the most cost-effective disaster recovery solution that meets two main requirements from the impact analysis stage. For IT purposes, this is commonly expressed as the minimum application and data requirements and the time in which the minimum application and application data must be available.
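As a rough illustration of the selection described above, the sketch below picks the cheapest candidate that satisfies both a time requirement and a data requirement. It assumes the two requirements are expressed as an RTO and an RPO in hours. The option names and all cost and capability figures are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveryOption:
    """Hypothetical candidate solution with its cost and recovery capability."""
    name: str
    annual_cost: float
    achievable_rto_hours: float
    achievable_rpo_hours: float


def cheapest_compliant(options, rto_hours, rpo_hours) -> Optional[RecoveryOption]:
    """Return the lowest-cost option meeting both requirements, or None."""
    compliant = [o for o in options
                 if o.achievable_rto_hours <= rto_hours
                 and o.achievable_rpo_hours <= rpo_hours]
    return min(compliant, key=lambda o: o.annual_cost, default=None)


# Invented figures for illustration only.
options = [
    RecoveryOption("hot site", 500_000, achievable_rto_hours=1, achievable_rpo_hours=0.25),
    RecoveryOption("warm site", 150_000, achievable_rto_hours=24, achievable_rpo_hours=4),
    RecoveryOption("cold site", 40_000, achievable_rto_hours=168, achievable_rpo_hours=24),
]

choice = cheapest_compliant(options, rto_hours=48, rpo_hours=8)
print(choice.name if choice else "no option meets the requirements")  # prints warm site
```

A result of None signals that no candidate meets the requirements, in which case either the budget or the objectives would need to be revisited.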

Outside the IT domain, the preservation of hard copy information (such as contracts), skilled staff and the restoration of embedded technology in a process plant must be considered. This phase overlaps with disaster recovery planning methodology. The solution phase determines:

  • crisis management command structure
  • secondary work sites
  • telecommunication architecture between primary and secondary work sites
  • data replication methodology between primary and secondary work sites
  • applications and data required at the secondary work site, and
  • physical data requirements at the secondary work site.

Implementation

The implementation phase involves policy changes, material acquisitions, staffing and testing.

Testing and organizational acceptance

The purpose of testing is to achieve organizational acceptance that the solution satisfies the recovery requirements. Plans may fail to meet expectations due to insufficient or inaccurate recovery requirements, solution design flaws or solution implementation errors. Testing may include:

  • Crisis command team call-out testing
  • Technical swing test from primary to secondary work locations
  • Technical swing test from secondary to primary work locations
  • Application test
  • Business process test

At minimum, testing is conducted on a biannual schedule.

The 2008 book Exercising for Excellence, published by The British Standards Institution, identified three types of exercises that can be employed when testing business continuity plans.

Tabletop exercises

Tabletop exercises typically involve a small number of people and concentrate on a specific aspect of a BCP. They can easily accommodate complete teams from a specific area of a business.

Another form involves a single representative from each of several teams. Typically, participants work through a simple scenario and then discuss specific aspects of the plan. For example, the scenario might be a fire discovered outside working hours.

The exercise consumes only a few hours and is often split into two or three sessions, each concentrating on a different theme.

Medium exercises

A medium exercise is conducted within a "Virtual World" and brings together several departments, teams or disciplines. It typically concentrates on multiple BCP aspects, prompting interaction between teams. The scope of a medium exercise can range from a few teams from one organization co-located in one building to multiple teams operating across dispersed locations. The environment needs to be as realistic as practicable and team sizes should reflect a realistic situation. Realism may extend to simulated news broadcasts and websites.

A medium exercise typically lasts a few hours, though it can extend over several days. It typically involves a "Scenario Cell" that adds pre-scripted "surprises" throughout the exercise.

Complex exercises

A complex exercise aims to have as few boundaries as possible. It incorporates all the aspects of a medium exercise. The exercise remains within a virtual world, but maximum realism is essential. This might include no-notice activation, actual evacuation and actual invocation of a disaster recovery site.

While start and stop times are pre-agreed, the actual duration might be unknown if events are allowed to run their course.

Maintenance

Biannual or annual maintenance of a BCP manual is broken down into three periodic activities:

  • Confirmation of information in the manual, roll out to staff for awareness and specific training for critical individuals.
  • Testing and verification of technical solutions established for recovery operations.
  • Testing and verification of organization recovery procedures.

Issues found during the testing phase often must be reintroduced to the analysis phase.

Information/targets

The BCP manual must evolve with the organization. Activating the call tree verifies the notification plan's efficiency as well as contact data accuracy. Types of changes that should be identified and updated in the manual include:

  • Staffing
  • Important clients
  • Vendors/suppliers
  • Organization structure changes
  • Company investment portfolio and mission statement
  • Communication and transportation infrastructure such as roads and bridges
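A call tree activation, mentioned above as a check on both the notification plan and the contact data, can be simulated before a live test. The sketch below is only an illustration: the role names, the tree layout and the set of reachable people are hypothetical, and a real activation would record actual call outcomes instead.

```python
from collections import deque


def activate_call_tree(tree, root, reachable):
    """Walk the call tree breadth-first, starting from `root`.

    `tree` maps each person to the people they are responsible for calling.
    A person who cannot be reached is recorded as failed and does not pass
    the message on. Returns (notified, failed, never_called).
    """
    notified, failed = [], []
    queue = deque([root])
    seen = {root}
    while queue:
        person = queue.popleft()
        if person not in reachable:
            failed.append(person)
            continue
        notified.append(person)
        for callee in tree.get(person, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    everyone = set(tree) | {c for callees in tree.values() for c in callees}
    never_called = sorted(everyone - seen)
    return notified, failed, never_called


# Hypothetical tree: the HR manager's contact details are out of date.
tree = {
    "crisis_lead": ["it_manager", "hr_manager"],
    "it_manager": ["dba", "network_eng"],
    "hr_manager": ["payroll_clerk"],
}
reachable = {"crisis_lead", "it_manager", "dba", "network_eng", "payroll_clerk"}

notified, failed, never_called = activate_call_tree(tree, "crisis_lead", reachable)
print(notified)      # ['crisis_lead', 'it_manager', 'dba', 'network_eng']
print(failed)        # ['hr_manager']
print(never_called)  # ['payroll_clerk']
```

The third list shows why stale contact data matters: one unreachable person can cut off everyone further down that branch, even when their own details are correct.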

Technical

Specialized technical resources must be maintained. Checks include:

  • Computer virus definition distribution
  • Application security and service patch distribution
  • Hardware operability
  • Application operability
  • Data verification
  • Data application

Testing and verification of recovery procedures

As work processes change, previous recovery procedures may no longer be suitable. Checks include:

  • Are all work processes for critical functions documented?
  • Have the systems used for critical functions changed?
  • Are the documented work checklists meaningful and accurate?
  • Do the documented work process recovery tasks and supporting disaster recovery infrastructure allow staff to recover within the predetermined recovery time objective?
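The last check in the list above can be approximated by adding up the estimated duration of each documented recovery task and comparing the total with the recovery time objective. This is a deliberately simple sketch: it assumes the tasks run strictly one after another, and the task names and durations are invented for illustration.

```python
def fits_within_rto(task_minutes, rto_minutes):
    """Return (fits, total_minutes) for tasks assumed to run sequentially."""
    total = sum(task_minutes.values())
    return total <= rto_minutes, total


# Invented task estimates, in minutes.
recovery_tasks = {
    "declare incident": 30,
    "relocate staff to secondary site": 120,
    "restore database": 180,
    "verify application": 45,
}

fits, total = fits_within_rto(recovery_tasks, rto_minutes=240)
print(fits, total)  # prints False 375
```

If some tasks can run in parallel, the sequential total overstates the recovery time, so a failing result here is a prompt for closer review and not a definitive verdict.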

Further reading

International Organization for Standardization

  • ISO/IEC 27001:2005 (formerly BS 7799-2:2002) Information Security Management System
  • ISO/IEC 27002:2005 (renumbered from ISO/IEC 17799:2005) Information Security Management – Code of Practice
  • ISO/IEC 27031:2011 Information technology - Security techniques - Guidelines for information and communication technology readiness for business continuity
  • ISO/PAS 22399:2007 Guideline for incident preparedness and operational continuity management
  • ISO/IEC 24762:2008 Guidelines for information and communications technology disaster recovery services
  • IWA 5:2006 Emergency Preparedness

Others

  • "A Guide to Business Continuity Planning" by James C. Barnes
  • "Business Continuity Planning", A Step-by-Step Guide with Planning Forms on CDROM by Kenneth L Fulmer
  • "Business Continuity Plan Design, 8 Steps for Getting Started Designing a Plan" by Richard Kepenach
  • "Disaster Survival Planning: A Practical Guide for Businesses" by Judy Bell
  • ICE Data Management (In Case of Emergency) made simple – by MyriadOptima.com
  • Harney, J. (2004). Business continuity and disaster recovery: Back up or shut down. AIIM E-Doc Magazine, 18(4), 42–48.
  • Dimattia, S. (November 15, 2001). Planning for Continuity. Library Journal, 32–34.
  • Exercising for Excellence (Delivering successful business continuity management exercises) by Crisis Solutions
