High Availability and Disaster Recovery 101

As a practical example, many OSS/BSS services rely on centralized lease solutions to perform lease lookups. These lookups can serve legal, troubleshooting, and ultimately even billing purposes. Being able to perform lease lookups is a critical component in the overall provisioning of services to a subscriber. But what happens if those services are unreachable?

Critical Considerations for Critical Services

It’s every provider’s nightmare: mission-critical systems go down because of technical issues, power outages, or natural disasters. Subscribers are left without service, customer call centres are overloaded with complaints and service requests, and the result quickly shows up as increased subscriber churn.

What measures can be taken to keep services available during an outage? How does an operator make its service network bulletproof and scalable?

There are a few ways that operators can enable high availability and disaster recovery strategies to ensure that subscriber service downtime is minimized during technical issues, power outages, or natural disasters.

Replica sets of services

In this scenario, a complete stack of replicated services runs in standby mode so that if the primary service fails, the replicated set takes over. There are many solutions that can help operators achieve this. Some systems ship with their own native failover mechanisms, while others use open-source solutions such as Pacemaker and Corosync.
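
The standby takeover described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Pacemaker, Corosync, or any vendor product implements it. The `FailoverMonitor` class, the `max_missed` threshold, and the stack names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Promotes the standby stack after the primary misses too many heartbeats."""

    max_missed: int = 3      # hypothetical threshold, chosen for illustration
    missed: int = 0          # consecutive failed health checks so far
    active: str = "primary"  # which stack currently serves traffic

    def record_heartbeat(self, primary_alive: bool) -> str:
        """Feed one health-check result; return the stack that should serve traffic."""
        if self.active == "standby":
            # Already failed over. Failing back is assumed to be a manual step.
            return self.active
        if primary_alive:
            self.missed = 0
        else:
            self.missed += 1
        if self.missed >= self.max_missed:
            self.active = "standby"
        return self.active


monitor = FailoverMonitor(max_missed=3)
results = [monitor.record_heartbeat(alive) for alive in (True, False, False, False, True)]
print(results)
```

Requiring several consecutive missed heartbeats, rather than one, is a common way to avoid failing over on a brief network hiccup. The right threshold depends on how much downtime the service can tolerate.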

Clustered database sets

This solution requires replicated databases with an arbiter that decides which database is the primary (the one to be queried) and which act as backup servers. If the primary database goes down, the arbiter promotes the secondary server (when the cluster has only two servers) or chooses one of the secondary servers (when the cluster has more than two) to assume the primary role.
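
The arbiter's decision can be illustrated with a small Python sketch. The selection rule used here (promote the healthy node with the lowest replication lag) is an assumption made for the example; real clustered databases apply their own election rules. `DbNode`, `replication_lag`, and `elect_primary` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DbNode:
    name: str
    healthy: bool
    replication_lag: int  # hypothetical metric: how far the node trails the primary


def elect_primary(nodes: List[DbNode], current: str) -> Optional[str]:
    """Keep the current primary if it is healthy; otherwise promote a secondary."""
    by_name = {node.name: node for node in nodes}
    if current in by_name and by_name[current].healthy:
        return current
    candidates = [node for node in nodes if node.healthy]
    if not candidates:
        return None  # no healthy node is left to promote
    # Assumed rule: prefer the most up-to-date secondary, break ties by name.
    return min(candidates, key=lambda node: (node.replication_lag, node.name)).name


cluster = [
    DbNode("db1", healthy=False, replication_lag=0),  # failed primary
    DbNode("db2", healthy=True, replication_lag=5),
    DbNode("db3", healthy=True, replication_lag=2),
]
new_primary = elect_primary(cluster, current="db1")
print(new_primary)
```

With only two servers in the cluster, the candidate list has a single entry, which matches the two-server case described above.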

Georedundancy

There are many reasons why it is important to have a georedundant solution. In tropical areas, flooding is an all-too-common occurrence and a datacenter may need to be shut down. In northern climates, winter storms can knock out power. Regardless of the reason, a georedundant architecture allows an operator to continue delivering services when the primary site is offline.
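
From a client's point of view, georedundancy can be as simple as trying sites in priority order. The sketch below applies that to the lease lookup example from the introduction. The site functions, the `SiteUnavailable` exception, and the lease data are all made up for illustration, and the sketch assumes lease data is already replicated to the secondary site.

```python
from typing import Callable, List, Tuple


class SiteUnavailable(Exception):
    """Raised by a site lookup when that site cannot be reached."""


def lookup_with_failover(
    ip: str, sites: List[Tuple[str, Callable[[str], str]]]
) -> Tuple[str, str]:
    """Try each site in priority order; return (site name, subscriber) from the first that answers."""
    for name, lookup in sites:
        try:
            return name, lookup(ip)
        except SiteUnavailable:
            continue  # this site is offline, so try the next one
    raise SiteUnavailable("no site could answer the lease lookup")


LEASES = {"10.0.0.7": "subscriber-42"}  # made-up lease data, assumed replicated


def primary_site(ip: str) -> str:
    # Simulates a flooded or powerless primary datacenter.
    raise SiteUnavailable("primary datacenter is offline")


def secondary_site(ip: str) -> str:
    return LEASES[ip]


answer = lookup_with_failover(
    "10.0.0.7", [("primary", primary_site), ("secondary", secondary_site)]
)
print(answer)
```

In practice the redirection is often handled by DNS or a load balancer rather than by each client, but the ordering logic is the same.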

Operators can combine all three of these methods to secure service availability for their subscribers. This not only keeps services available when outages occur, it also helps avoid a surge in calls to the customer help desk and protects customer loyalty.

Contact us to find out more about ensuring high availability and creating strategies for disaster recovery on your network.
