Slides about High Availability and Disaster Recovery. The PDF explores HA and DR mechanisms, including full, differential, and incremental backup types. This university-level Computer Science material provides a clear overview of operational continuity and data recovery, useful for self-study.


High Availability and Disaster Recovery
DOMAIN 3.0 MODULE 12

High Availability and Disaster Recovery Topics
◦ HA and DR Concepts
◦ High Availability Mechanisms
◦ Disaster Recovery Mechanisms
◦ Facility and Infrastructure Support

HA and DR Concepts

High Availability
A system, network, or service that is continuously operational for a desirable length of time
Availability is measured in "9s"
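As a rough illustration of what measuring availability in "9s" means, the sketch below converts a number of nines into a yearly downtime budget. It assumes a 365-day year, and the function name is made up for this example.

```python
# Allowed downtime per year for a given number of "9s" of availability.
# Assumption: a 365-day year (8,760 hours).

HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(nines: int) -> float:
    """Return the yearly downtime budget, in hours, for N nines of availability."""
    unavailability = 10 ** -nines  # e.g. 3 nines -> 0.001 (99.9% available)
    return HOURS_PER_YEAR * unavailability


for n in range(1, 6):
    availability_pct = 100 * (1 - 10 ** -n)
    minutes = downtime_hours_per_year(n) * 60
    print(f"{n} nine(s): {availability_pct:.3f}% available, {minutes:.1f} minutes down per year")
```

Each additional nine cuts the downtime budget by a factor of ten, which is why every extra nine tends to cost far more to achieve than the previous one.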
High Availability Mechanisms include:
Fault Tolerance
Capability of a system or network to provide uninterrupted service if one or more of its components fail
No single point of failure
One of many metrics used to evaluate the reliability of a manufactured product
Examples of FRUs that should be replaced, not repaired:
Mean Time Between Failures (MTBF)
Refers to repairable devices
◦ How long the device/system is expected to function until its first failure
◦ How long after the first repair before the device is expected to fail again
Can (hopefully) be extended by proper maintenance
Estimates only, but important in planning, implementation, maintenance, and future plans
Mean Time to Repair (MTTR)
How long it will take to repair a device, system, or component that is down and bring it back online
Assumption is that the device/system can be repaired
Critical metric in planning data center/cloud/system configuration and future configurations
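[Figure: timeline running from System Failure, through Time to Repair, to Resume Normal Operations, then through Time to Failure to the next System Failure; Time Between Failures spans the interval between the two failures]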
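The three intervals (time to failure, time to repair, time between failures) can be combined into an availability estimate. The sketch below assumes the convention in which time between failures is measured from one failure to the next, so it equals time to repair plus time to failure; some texts define MTBF differently. The device figures are hypothetical.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up over one failure/repair cycle."""
    # Assumed convention: MTBF runs failure to failure, so MTBF = MTTF + MTTR.
    mtbf_hours = mttf_hours + mttr_hours
    return mttf_hours / mtbf_hours


# Hypothetical device: runs 2,000 hours between repairs and takes 4 hours to repair.
a = availability(mttf_hours=2000, mttr_hours=4)
print(f"Estimated availability: {a:.4%}")
```

Under this model, availability improves either by making failures rarer (larger MTTF) or by making repairs faster (smaller MTTR).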
Recovery Time Objective (RTO)
When restoring a system, the maximum allowable time that can elapse before the system is available again
[Figure: timeline labeled "Repair Time" and "Try to be back online at this time"]
Recovery Point Objective (RPO)
When recovering a system, the level of original functionality to be restored before bringing the system back online
Sometimes data is lost so you cannot fully recover the system to its original state
◦ Sometimes you sacrifice level of recovery in the interest of quickly making the system available again
RPO is often used in database recovery
◦ Identifies the last saved transaction to be restored before the database is made available again
◦ Any transactions after the RPO will have to be manually re-entered
[Figure: Functionality / Data Level scale labeled "System had this much functionality / data before going down", "Acceptable loss", and "Restore this much before bringing system back online"]
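To make the two objectives concrete, the sketch below checks a single outage against both. It uses the common time-based formulation, which is an assumption here: RPO is the largest tolerable gap between the last restorable point and the failure, and RTO is the largest tolerable downtime. The timestamps, thresholds, and function name are hypothetical.

```python
from datetime import datetime, timedelta


def check_recovery(last_backup, failure, back_online, rpo, rto):
    """Compare one outage against RPO (data loss window) and RTO (downtime)."""
    data_loss_window = failure - last_backup  # work done since the last restorable point
    downtime = back_online - failure          # how long the system was unavailable
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


result = check_recovery(
    last_backup=datetime(2024, 1, 10, 2, 0),
    failure=datetime(2024, 1, 10, 9, 30),
    back_online=datetime(2024, 1, 10, 11, 0),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=2),
)
for key, value in result.items():
    print(f"{key}: {value}")
```

In this example the system is back within the RTO, but 7.5 hours of transactions fall after the last backup, so the RPO is missed and that work would have to be re-entered.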
Load Balancing
Two or more systems simultaneously provide the same service
If one node fails, the other node(s) continue to provide service
Especially good resilience against denial-of-service attacks
All systems have their own IP address, but share a common virtual IP address
Clients connect to the virtual IP address
Systems do NOT share a common database/data files
Systems are typically "front end" web sites that "point" to a common back end database server
Can be a hardware or software solution
[Figure: Load Balancing Cluster. The Client connects to Virtual IP 192.168.1.10; all front end webserver nodes (192.168.1.20, 192.168.1.30, 192.168.1.40) are active and point to a Back End Database Server; the steps are numbered 1 to 4]
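A minimal sketch of the idea, assuming simple round-robin distribution (the slides do not say which algorithm a given load balancer uses): clients address the virtual IP, and the balancer hands each request to the next healthy front-end node. The class and method names are illustrative.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out front-end nodes in turn, skipping nodes marked as failed."""

    def __init__(self, virtual_ip, nodes):
        self.virtual_ip = virtual_ip
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._ring = cycle(self.nodes)

    def mark_down(self, node):
        self.healthy.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self):
        """Return the next healthy node, or raise if none is left."""
        for _ in range(len(self.nodes)):
            node = next(self._ring)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes behind " + self.virtual_ip)


lb = RoundRobinBalancer("192.168.1.10", ["192.168.1.20", "192.168.1.30", "192.168.1.40"])
print([lb.pick() for _ in range(3)])  # every node takes a turn
lb.mark_down("192.168.1.30")          # simulate a node failure
print([lb.pick() for _ in range(4)])  # remaining nodes keep serving
```

Because the nodes do not share data files, any node can serve any request, which is what lets the balancer skip a failed node without a failover step.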
A generic term for any redundant network path
Can refer to:
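[Figure: a Server (10.10.10.21, 10.10.20.21) reaches SAN A (10.10.10.51, 10.10.20.51) and SAN B (10.10.10.52, 10.10.20.52) through a Switch over two networks, VLAN1 (10.10.10.0/24) and VLAN2 (10.10.20.0/24)]
[Figure: Redundant Path to Core, with paths A and B across the Access, Distribution, and Core layers]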
NIC Teaming
Network Interface Card teaming combines multiple NICs/connections to create a single "link"
◦ Aggregates bandwidth
◦ Increases performance
◦ Provides fault tolerance
Also known as aggregation, balancing, and bonding
Clustering
Redundant hardware acting together as a unit
Two or more systems provide a single service
◦ Systems typically share a common database/files
◦ All systems have their own IP address, but also share a common virtual IP address
◦ Clients connect to the virtual address
Active/Passive: One system is active
◦ The other system is in standby (passive) mode
◦ Passive system listens to the "heartbeat" of the active system
◦ If it stops hearing the heartbeat, the passive system takes control of the data/database/service
Active/Active: Both systems are active
◦ Each system is the primary provider for a different service (e.g. SQL and Exchange)
◦ Each system acts as backup for the other
◦ Either system can take over both services
[Figure: Active/Passive cluster. The Client connects to Cluster IP 192.168.1.10; the Active Node (Node IP 192.168.1.20) and the Passive Node (Node IP 192.168.1.30) both attach to Shared Storage]
[Figure: Active/Active cluster. The Client connects to Cluster IP 192.168.1.10; the node at 192.168.1.20 is the Active Node for App 1 and the Passive Node for App 2, the node at 192.168.1.30 is the Active Node for App 2 and the Passive Node for App 1, and both attach to Shared Storage]
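The active/passive heartbeat behavior can be sketched as follows. This is a simplified model, not how any particular cluster product works: the timeout value and all names are assumptions, and time is passed in explicitly so the example is deterministic.

```python
class ActivePassiveCluster:
    """Passive node takes over when the active node's heartbeat goes quiet."""

    def __init__(self, active, passive, timeout_s=3.0):
        self.active = active
        self.passive = passive
        self.timeout_s = timeout_s  # assumed heartbeat timeout, in seconds
        self.last_heartbeat = 0.0

    def heartbeat(self, now):
        """Record a heartbeat received from the active node at time `now`."""
        self.last_heartbeat = now

    def check(self, now):
        """Fail over if no heartbeat arrived within the timeout; return the active node."""
        if now - self.last_heartbeat > self.timeout_s:
            self.active, self.passive = self.passive, self.active
            self.last_heartbeat = now  # the new active node starts a fresh window
        return self.active


cluster = ActivePassiveCluster(active="192.168.1.20", passive="192.168.1.30")
cluster.heartbeat(now=1.0)
print(cluster.check(now=2.0))  # heartbeat is recent, so no failover
print(cluster.check(now=6.0))  # heartbeat missed, so the passive node takes over
```

Clients never notice which node is active because they keep connecting to the shared cluster IP; only the node behind that address changes.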
[Figure: Potential Single Points of Failure across the Access, Distribution, and Core layers]
[Figure: Too Much Redundancy!]
Port Aggregation
Logical aggregation of Ethernet switch ports
Used to increase the bandwidth of a "single" link
Commonly used in uplinks / trunk links
Also referred to as EtherChannel
Two common methods:
◦ Cisco proprietary PAgP
◦ Vendor-neutral LACP (IEEE 802.3ad / 802.1AX)
[Figure: Physical View, multiple ports defined as part of an EtherChannel Group; Logical View, different subsystems running on the switch see only one large link]
You can cluster routers
First Hop Redundancy Protocols (FHRP)
◦ A class of mechanisms that allow default gateway redundancy
Virtual Router Redundancy Protocol (VRRP)
◦ Standards-based FHRP
◦ Active and Standby routers are organized into a Standby group
◦ They share a virtual IP and virtual MAC
◦ Active router is configured with a higher priority so it is preferred
◦ The standby router has a lower priority, but can take over for the active at any time
Cisco has proprietary FHRPs as well:
◦ HSRP - very similar to VRRP
◦ GLBP - like HSRP but also allows Active - Active load balancing
[Figure: VRRP. The Active and Standby routers share a Virtual IP and Virtual MAC on the path to the Internet; the Client's default gateway is the Virtual IP / Virtual MAC]
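The priority rule can be sketched as a simple election: the live router with the highest priority answers for the virtual IP. This is a simplification and not the VRRP protocol itself, which also involves advertisements, timers, preemption settings, and tie-breaking. The router names, priorities, and gateway address are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Router:
    name: str
    priority: int
    alive: bool = True


def elect_active(routers):
    """Pick the live router with the highest priority, or None if all are down."""
    candidates = [r for r in routers if r.alive]
    return max(candidates, key=lambda r: r.priority, default=None)


VIRTUAL_IP = "192.168.1.1"  # hypothetical gateway address shared by the group
group = [Router("R1", priority=110), Router("R2", priority=100)]

print(VIRTUAL_IP, "is served by", elect_active(group).name)
group[0].alive = False  # R1 fails
print(VIRTUAL_IP, "is served by", elect_active(group).name)
```

Clients keep the same default gateway throughout, since the virtual IP and virtual MAC move with whichever router is currently active.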
The routing protocol must decide the best path
More redundancy = more fault tolerant = more expensive
Configure ISP2 as a standby link
Or load balance between the two
[Figure: connections to ISP 1 and ISP 2]
Protect against hardware failures, data loss, corruption, disasters (manmade or natural), etc.
Backup Types
[Figure: wbadmin - Windows Server Backup (Local) console, showing the Backup Schedule, Backup Once, and Recover actions and a last backup with status Successful at 3/8/2018 10:55 PM; no scheduled backup is configured]
Full Backup
The most basic and complete type of backup
Backs up all selected data to another set of media
◦ cloud, network share, local disk, tape
Provides a foundation for the other backup types
Clears (resets) the file's archive bit
Longest backup time
Shortest restore time
[Figure: Full backup on Sunday]
Differential Backup
Copies all data changed since the last full backup
Does not change the archive bit
Can be thought of as a "running backup"
Typically gets larger over time until the next full backup
Backup takes longer each day as the week goes by
Restore takes less time than with incrementals, as you only need the full plus the latest differential
[Figure: weekly schedule, Mon through Fri, following the full backup on Sunday]
Incremental Backup
Copies only the data that has changed since the last full or incremental backup
Restore requires the full backup plus all subsequent incremental backups
Clears (resets) the archive bit
Fastest type of backup
Longest restore
◦ You will have to restore the full
◦ Plus every incremental in order
Takes up less storage space
[Figure: weekly schedule, Mon through Fri, following the full backup on Sunday]
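The restore trade-off between the backup types can be sketched by listing which backup sets a restore needs. The example assumes the weekly schedule used in these slides (full backup on Sunday, then one backup per weekday); the function name and labels are illustrative.

```python
DAYS = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri"]


def restore_set(strategy, restore_day):
    """List the backups needed to restore to the end of `restore_day`."""
    idx = DAYS.index(restore_day)
    if strategy == "full" or idx == 0:
        # Daily fulls (or restoring Sunday itself): one backup is enough.
        return [f"{restore_day} full"]
    if strategy == "differential":
        # Sunday's full plus only the latest differential.
        return ["Sun full", f"{restore_day} differential"]
    if strategy == "incremental":
        # Sunday's full plus every incremental since then, in order.
        return ["Sun full"] + [f"{d} incremental" for d in DAYS[1:idx + 1]]
    raise ValueError(f"unknown strategy: {strategy}")


for strategy in ("differential", "incremental"):
    print(strategy, "->", restore_set(strategy, "Thu"))
```

A Thursday restore needs two sets with differentials but five with incrementals, which is the price paid for the incremental's faster nightly backup and smaller storage footprint.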
Snapshots
Used to restore a virtual machine to a previous state
An image of the state of a VM at a point in time
Typically requires the base image plus the snapshot(s)
As with backups, you can revert to the snapshot of your choice
Should not be your only backup solution
[Figure: snapshot tree starting from a Windows operating system baseline, with IE base and Firefox base images and snapshots SP1 and SP2 (SP = Snapshot); "You Are Here" marks the current state]