Understanding Amazon Aurora's Multi-AZ Deployment
- Identifying an Availability Zone Code
- Storage Layers vs Server Instances
- What Does Multi-AZ Deployment Provide?
To fully understand what a Multi-AZ Deployment means for your infrastructure, it's critical to recognize how Amazon Web Services is organized across the globe, and thus how it provides redundancy no matter where you are located.
As discussed in the official documentation, the AWS Cloud is made up of a number of Regions, which are physical locations around the world, such as Oregon, United States; Northern Virginia, United States; Ireland; and Tokyo. Within each Region exists a number of separate physical data centers, known as Availability Zones. Each Availability Zone is a self-contained facility with its own power, connectivity, and networking capabilities. Most Regions are home to 2-3 different Availability Zones each, providing adequate redundancy when necessary within a given Region.
While Amazon is always expanding its Availability Zone coverage, you may view a current map of the AWS Cloud infrastructure in the image below:
Image courtesy of Amazon Web Services
Availability Zones within a single Region are connected to one another through private fiber-optic networking, allowing the Availability Zones to communicate with one another and transfer data quickly and efficiently as required.
Identifying an Availability Zone Code
When creating a new instance through the AWS dashboard, you may be presented with the option to select a specific Availability Zone. In many cases, you simply select a Region and the system selects the Availability Zone for you.
Regions are labeled by a simple string representing the country and, if necessary, the sub-region. For example, us-west-2 is the designation for the Oregon, United States Region, while us-west-1 is for California, United States.
Availability Zones are designated by following the Region tag with a letter designation, such as a, b, or c.
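To make the naming scheme concrete, here is a minimal Python sketch that splits an Availability Zone code into its Region tag and its letter. The function name `split_az_code` and the pattern are illustrative assumptions rather than anything provided by AWS, and the pattern only covers standard codes of the form Region tag plus one letter.

```python
import re
from typing import Tuple

# Assumed shape of a standard code: a Region tag such as "us-west-2"
# followed by exactly one lowercase letter.
_AZ_PATTERN = re.compile(r"([a-z]{2}(?:-[a-z]+)+-\d+)([a-z])")


def split_az_code(az_code: str) -> Tuple[str, str]:
    """Split an Availability Zone code into (region_tag, zone_letter)."""
    match = _AZ_PATTERN.fullmatch(az_code)
    if match is None:
        raise ValueError(f"not a standard Availability Zone code: {az_code!r}")
    return match.group(1), match.group(2)


region, letter = split_az_code("us-west-2a")
print(region, letter)
```

Passing a bare Region tag such as us-west-2 raises a `ValueError`, since it has no trailing letter.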
Storage Layers vs Server Instances
Another important concept to understand in order to grasp what Multi-AZ Deployments entail is the difference between the storage layer and the server instance. The server instance for your database is best thought of as the physical machine that controls the structure of your database and routes all the data contained within the storage layer.
The storage layer is an SSD-backed virtualized representation of all the actual data within your database. The keyword to focus on here is virtualized, which is Amazon's way of saying that the storage layer is not attached to any one physical location or machine, but is instead propagated to numerous locations (six in total across three Availability Zones in most cases).
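The effect of spreading six copies over three Availability Zones can be illustrated with a small Python model. This is only a sketch: the even split of two copies per Availability Zone, the zone codes, and the helper names are assumptions made for illustration, and nothing here calls a real AWS API.

```python
from typing import List

# Assumption for illustration: six copies spread evenly, two per zone.
COPIES_PER_AZ = 2


def place_copies(azs: List[str]) -> List[str]:
    """Return one entry per stored copy, labeled with the zone holding it."""
    return [az for az in azs for _ in range(COPIES_PER_AZ)]


def surviving_copies(placement: List[str], failed_az: str) -> List[str]:
    """Return the copies that remain reachable after one zone goes offline."""
    return [az for az in placement if az != failed_az]


placement = place_copies(["us-west-2a", "us-west-2b", "us-west-2c"])
remaining = surviving_copies(placement, "us-west-2a")
print(len(placement), "copies placed,", len(remaining), "left after one zone fails")
```

Under these assumptions, losing any single Availability Zone still leaves copies of the data in the two remaining zones.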
What Does Multi-AZ Deployment Provide?
In nearly all cases using Amazon Web Services, it is standard practice for the storage layer (where all the data resides) to be redundantly stored across all the Availability Zones within the given Region at no extra cost. In the event that one Availability Zone goes offline for some reason (as unlikely as that might be), the system is already in place to automatically continue the services of your database through an identical copy of the storage layer from one of the other connected Availability Zones.
However, unless otherwise specified, this redundancy is only applied to the storage layer, and does not exist for the physical machine of your actual server instance. If something were to cause the Availability Zone where your server instance resides to shut down, your database would cease to function, because the physical server instance would be offline.
This is where Multi-AZ Deployment comes in for services like Amazon Aurora. Just like the automatic redundancy of the data in your storage layer, a Multi-AZ Deployment means that your server instance is also redundantly copied across multiple Availability Zones. For this reason, any Amazon Aurora Multi-AZ Deployment is assured that, should the Availability Zone where the physical server instance machine resides go offline, an automatic failover is initiated onto an up-to-date standby replica in another connected Availability Zone.
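The difference between a single server instance and a Multi-AZ Deployment can be sketched as a toy Python model. The `Instance` class, the `fail_over` function, and the instance names below are hypothetical and exist only to illustrate the idea. The real failover is carried out by AWS, not by code you write.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    name: str
    az: str
    role: str  # "primary" or "standby"


def fail_over(instances: List[Instance], failed_az: str) -> Optional[Instance]:
    """Return the instance acting as primary once failed_az is offline, or None."""
    survivors = [inst for inst in instances if inst.az != failed_az]
    for inst in survivors:
        if inst.role == "primary":
            return inst  # the primary was not in the failed zone
    for inst in survivors:
        if inst.role == "standby":
            inst.role = "primary"  # promote the standby in another zone
            return inst
    return None  # no server instance left: the database is unavailable


single_az = [Instance("db-1", "us-west-2a", "primary")]
multi_az = [
    Instance("db-1", "us-west-2a", "primary"),
    Instance("db-2", "us-west-2b", "standby"),
]

print(fail_over(single_az, "us-west-2a"))
promoted = fail_over(multi_az, "us-west-2a")
print(promoted.name if promoted else None)
```

In this model the single-instance setup has nothing left to serve requests once its zone fails, while the Multi-AZ setup promotes the standby in the other zone.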
As discussed in the official documentation, in order to maximize your system's uptime, the failover procedure (which typically only takes 1-2 minutes) will be automatically performed in the case of any of the following events:
- Loss of availability in primary Availability Zone
- Loss of network connectivity to primary
- Compute unit failure on primary
- Storage failure on primary
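The list of triggering events above can be written down as a small lookup in Python. This is a sketch for reasoning about the conditions, not a monitoring tool: the enum, its member names, and the `HEALTHY` state are assumptions added for illustration.

```python
from enum import Enum


class PrimaryEvent(Enum):
    AZ_UNAVAILABLE = "loss of availability in primary Availability Zone"
    NETWORK_LOSS = "loss of network connectivity to primary"
    COMPUTE_FAILURE = "compute unit failure on primary"
    STORAGE_FAILURE = "storage failure on primary"
    HEALTHY = "no fault detected"  # hypothetical state, added for contrast


# The four events listed above; anything else does not trigger a failover here.
FAILOVER_TRIGGERS = frozenset({
    PrimaryEvent.AZ_UNAVAILABLE,
    PrimaryEvent.NETWORK_LOSS,
    PrimaryEvent.COMPUTE_FAILURE,
    PrimaryEvent.STORAGE_FAILURE,
})


def should_fail_over(event: PrimaryEvent) -> bool:
    """Return True if the event is one of the listed failover triggers."""
    return event in FAILOVER_TRIGGERS


for event in PrimaryEvent:
    print(f"{event.value}: failover={should_fail_over(event)}")
```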