Hardware Failover allows a second unit to function in an idle role and act as a backup device. The idle device will assume the active role in the event of failure or loss of connectivity on the active unit.
The two units are connected by a cable for heartbeat communication and should be given the same physical network connectivity so that failover is automatic. To achieve this, a hub or switch should be placed on each segment. The diagrams below show two network configurations, one with a single firewall behind an Ecessa pair and the other with redundant firewalls behind an Ecessa pair, and illustrate the need for hubs/switches. Please note that these diagrams are for demonstration purposes only; they reflect the most common configurations, but other solutions are possible as long as both units have equal access to the WANs and LANs.
Configuration changes may be replicated from the active to the idle device to ensure both units share the most recent configuration. The idle device monitors the active device’s status, including network connectivity. If the idle device detects that it has better access to network resources, or if it fails to communicate with the active device entirely, it will force a hardware failover. During the failover, the idle device will load the configuration, assume the active role, and place the previously active device into idle mode.
Both the active and idle units will test the heartbeat (the connection between the devices) as well as any configured testing IP addresses. If the active device does not respond to a keep-alive query after a specified number of timeouts and the active IP addresses do not appear to be in service, the idle device will trigger the failover and become active.
For versions prior to release 8.4.x, if either unit determines that a LAN or Gateway test IP address is dead (it has reached the configured number of failed responses), that unit’s total count of accessible test IP addresses will decrease. If the devices determine that the idle unit has more accessible test IP addresses than the active unit, a failover will occur.
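The comparison described above can be sketched in Python as follows. This is only an illustration of the counting rule, not the device's actual implementation; the function names and the sample addresses (taken from documentation ranges) are hypothetical.

```python
from typing import Dict


def accessible_count(results: Dict[str, bool]) -> int:
    """Count the test IP addresses a unit can still reach.

    `results` maps each test IP address to True (responding) or
    False (declared dead after the configured number of failed responses).
    """
    return sum(1 for reachable in results.values() if reachable)


def idle_should_take_over(active: Dict[str, bool], idle: Dict[str, bool]) -> bool:
    """Failover occurs only when the idle unit reaches MORE test IPs than the active unit."""
    return accessible_count(idle) > accessible_count(active)


# Hypothetical test results: the active unit has lost its gateway test IP.
active_results = {"192.0.2.1": False, "198.51.100.10": True}
idle_results = {"192.0.2.1": True, "198.51.100.10": True}

print(idle_should_take_over(active_results, idle_results))
```

Note that under this rule a tie does not trigger a failover: if both units lose the same test IP address, the active unit stays active.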
The values of the Detection Interval and Failover After X Timeouts settings are multiplied together to determine the failover latency after a failure occurs. Setting these values too low may cause false failovers, while setting them too high may result in unnecessary delay.
If Gateway and LAN testing are not enabled, or if the units are running 8.4.x or later firmware, failover will only be triggered by a hardware failure, which is detected by the heartbeat.
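As a rough illustration of the multiplication described above, the following Python sketch computes the expected latency. The function name, the assumption that the Detection Interval is expressed in seconds, and the sample values are all hypothetical; they are not taken from the product or its defaults.

```python
def failover_latency(detection_interval_s: float, failover_after_timeouts: int) -> float:
    """Approximate delay between a failure and the failover being triggered.

    Assumes (hypothetically) that the Detection Interval is given in seconds.
    """
    if detection_interval_s <= 0 or failover_after_timeouts < 1:
        raise ValueError("both settings must be positive")
    return detection_interval_s * failover_after_timeouts


# Illustrative values only, not product defaults.
print(failover_latency(2, 3))   # short latency, more prone to false failovers
print(failover_latency(10, 6))  # long latency, slower to react to a real failure
```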
Definition of terms
The Primary and Secondary labels are assigned to the Ecessa devices in a hardware failover pair. These labels do not change and are used only to distinguish between the devices. They are not related to the current state of the device or the ability of the device to handle traffic as both are equally capable.
The Active and Idle roles are dynamic and define the current state of the given device. The Active role is used by the device that is currently handling network traffic (regardless of whether it is the Primary or Secondary). The Idle role refers to the device operating in a hot standby state. The idle device monitors the status of both the active and idle units and will assume the active role if it determines that it can provide better performance than the currently active device.
In the diagram, it is assumed the “Primary” device is currently in the “Active” state and the “Secondary” device is currently in the “Idle” state.
Configuration
Two Ecessa devices are connected over a failover link (also known as the Keep-Alive Port or “heartbeat”), which allows the pair to communicate device status information as well as replication session statistics. Typically the Keep-Alive Port is the highest-numbered port, but it can be any available port.
The following screenshot shows the Hardware Failover page from a PowerLink running on version 8.4, which does not include the LAN or Gateway testing options:
Select the “Enable Hardware Failover” check box to enable the feature. Each pair will have a Primary and a Secondary unit, and this designation can be changed with the drop-down located at the top-right corner:
The section beneath these settings reflects the current state and status of each device. When Hardware Failover is enabled and the pair successfully communicate over the failover link, the Hardware Failover status will look similar to the image below:
The next section defines the testing parameters between the units. While it is typically not necessary to alter these settings, overly sensitive or overly lenient testing may cause issues such as failovers being triggered too quickly or failures not being detected soon enough. Both situations cause downtime, so it is important to keep these settings within acceptable thresholds.
By default, only the active device is accessible for remote management; however, Idle LAN or Idle WAN IP addresses can be entered to assign the idle device its own IP address. Additionally, LAN Testing can be enabled to trigger a failover if the idle unit can successfully ping the LAN Test IP address while the active unit cannot.
The access ports and policies are the same on both devices after a successful replication; however, the user account information (username/password) is not replicated between units, so each device will need to be configured individually with the desired login credentials.
Finally, the Keep-Alive Port Settings on the Secondary device will mirror the settings on the Primary device:
In this example, the Primary is using Ethernet port 4 with VLAN 3999 enabled. The IP address assigned to the Primary for keep-alive communication is 100.10.10.1 and it is expecting its peer to use the address 100.10.10.2.
The Secondary unit will need to use Ethernet port 4 with VLAN 3999 enabled. The Local Address on the Secondary unit will have to be 100.10.10.2 with a Remote Address of 100.10.10.1.
Please note: The Keep Alive addresses must be in a subnet that is not already in use for the LAN or WAN.
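Before entering the keep-alive addresses, both requirements (mirrored Local/Remote addresses and a dedicated subnet) can be checked with Python's standard `ipaddress` module. This is a minimal sketch: the function names, the /30 prefix length, and the LAN and WAN subnets are assumptions made for illustration, since the example above does not state a netmask.

```python
import ipaddress
from typing import List


def keepalive_conflicts(keepalive_cidr: str, in_use_cidrs: List[str]) -> List[str]:
    """Return the LAN/WAN subnets that overlap the keep-alive subnet."""
    keepalive = ipaddress.ip_network(keepalive_cidr, strict=False)
    return [
        cidr
        for cidr in in_use_cidrs
        if keepalive.overlaps(ipaddress.ip_network(cidr, strict=False))
    ]


def settings_are_mirrored(primary: dict, secondary: dict) -> bool:
    """The Secondary must swap the Primary's Local and Remote addresses."""
    return (
        primary["local"] == secondary["remote"]
        and primary["remote"] == secondary["local"]
    )


# Addresses from the example above; the /30 prefix is an assumption.
primary = {"local": "100.10.10.1", "remote": "100.10.10.2"}
secondary = {"local": "100.10.10.2", "remote": "100.10.10.1"}

# Hypothetical LAN and WAN subnets already in use on the pair.
in_use = ["192.168.1.0/24", "203.0.113.0/29"]

print(settings_are_mirrored(primary, secondary))
print(keepalive_conflicts("100.10.10.1/30", in_use))
```

An empty list from `keepalive_conflicts` means the keep-alive subnet does not collide with any of the listed subnets.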
Testing Failover
Failover can be tested manually using the “Force Failover” button on the Hardware Failover page in the web interface or through the text user interface. Failover will succeed only if the heartbeat connection between the units shows an “UP” status for both the active and idle units.
Failover can also be tested by removing power from the active unit to simulate an outage, which should trigger a failover.
FAQ
What behavior does my bridge have on the idle device?
All bridges are disabled on the idle device. This is to avoid a Layer 2 loop while the device is idle or during failover.
How does Fail-To-Wire (FTW) affect Hardware Failover?
FTW can also cause a Layer 2 loop, so it should be disabled when using Hardware Failover.
How is Ethernet Bonding affected by Hardware Failover?
Ethernet Bonding remains enabled on the idle device. This should not cause any problems, since logically a bond is like any other physical port.