
NetApp MetroCluster


It's been some time since I contributed to my blog, so I thought I would write about a topic that has been on my mind for a long time and has finally kicked off: MetroCluster.

MetroCluster is one of the features available with NetApp for non-disruptive operations and continuous availability.

MetroCluster uses both HA and SyncMirror.

In the case of a standalone HA pair, we have redundancy at the controller level: if one controller fails, the other takes over.

However, we do not have redundancy for a shelf failure; if a shelf fails, we lose the data on it.

To mitigate this problem we have a feature called SyncMirror, which provides resiliency at the shelf level.

Let me explain that in detail.

Let's assume we have two sites, Site 1 and Site 2.

For the purpose of clarity, I am considering only a two-node cluster here.

 

[Figure: two-node configuration with aggr1 and aggr2 mirrored across Site 1 and Site 2 (plex0/plex1)]

We can see that aggregate 1 is owned by controller 1 and aggregate 2 is owned by controller 2.

When we create aggr1 on controller 1, we immediately see a plex0 attribute added to it; in a normal scenario (an HA pair without MetroCluster), that is the only plex. When we apply the SyncMirror license and mirror aggregate 1, a plex1 is created for the same aggregate on the partner controller's shelves, shown in the figure above as aggr1 plex1 on controller 2. Similarly, aggr2 plex0 is created on controller 2 and aggr2 plex1 on controller 1 (an exact mirror). This provides resiliency for the disk shelves and is called aggregate-level mirroring. In short, aggregate-level mirroring is SyncMirror.
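Just to give a feel for how this looks from the clustered ONTAP CLI (a rough sketch; the aggregate and node names below are only placeholders I made up for illustration), an aggregate can be created mirrored from the start, an existing aggregate can have a second plex added, and the plexes can then be listed:

    storage aggregate create -aggregate aggr1 -node site1_node1 -diskcount 10 -mirror true
    storage aggregate mirror -aggregate aggr1
    storage aggregate plex show -aggregate aggr1

The plex output should list plex0 and plex1 for the mirrored aggregate, one plex built from each pool of disks.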

In case of a disaster, if either controller goes down, the other controller can take over and serve data, since they are in an HA pair. Similarly, in case of a disk shelf failure, because the aggregates are mirrored with SyncMirror, data continues to be served even if an entire shelf, or all of Site 1 or Site 2, goes down.

For a MetroCluster, we need exactly the same hardware configuration at Site 1 and Site 2; the two sites cannot have different configurations.

Now let's get into more details of MetroCluster.

NetApp MetroCluster is a licensed feature that has been available since 7-Mode, but since 7-Mode is obsolete now, I will focus only on clustered ONTAP.

NetApp supports 2-node and 4-node MetroCluster configurations from ONTAP 8.3.1. The earlier explanation was about a 2-node configuration; we will now look at the 4-node MetroCluster in detail.

In a 4-node MetroCluster, each site is an independent 2-node cluster, and the two nodes at each site form an HA pair. If a single controller fails, a site-level failover does not happen, as the other node in the HA pair takes over. Site-level failover (switchover) happens only when an entire site goes down and both of its controllers are unable to serve data.
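As a quick sanity check from either cluster (again just a sketch; the exact output depends on your setup), the MetroCluster configuration state and the local HA relationships can be verified with:

    metrocluster show
    metrocluster check run
    metrocluster check show
    storage failover show

metrocluster show reports the configuration state and mode of the local and remote clusters, while storage failover show confirms that the two nodes at each site can take over for each other.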

Normally, all the disk shelves are connected to the controllers through SAS cables in a daisy-chain fashion. But since we are extending this across a MetroCluster, we cannot use SAS cables for the inter-site stretch, as they have distance limitations. To overcome this, we use the ATTO fibre gateway.

The ATTO FibreBridge is a Fibre Channel-to-SAS gateway, as shown below.

 

[Figure: ATTO FibreBridge, a Fibre Channel-to-SAS gateway]
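Once the bridges are cabled and discovered, they can also be monitored from ONTAP itself; a minimal check (bridge names and counts will obviously differ per environment) is:

    storage bridge show

This lists the FibreBridges known to the cluster along with their status.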

Now let's see the diagram of a 4-node cluster with the ATTO bridges connected.

[Figure: 4-node fabric MetroCluster with ATTO bridges and FC switches, Site 1 on the left and Site 2 on the right]

In the diagram I have highlighted the ATTO bridges in blue and the FC switches in red; there are four ATTO bridges and four FC switches in a 4-node MetroCluster.

The connections from the disk shelves to the ATTO bridges use SAS cables, while the ATTO bridges connect to the controllers over fibre cables.

The left side of the diagram denotes Site 1 and the right side denotes Site 2. There is also connectivity for NVRAM mirroring, which takes place through the FC-VI adapter; the FC-VI adapter has dedicated 16Gbps ports for NVRAM mirroring.

From the above diagram we can see there is no single point of failure for any component. If a site fails completely, the other site takes over.

The connectivity between the two sites would be through dark fibre using DWDM (dense wavelength division multiplexing) or MPLS, and we employ long-distance transceivers on the switches to accomplish this.

Types of MetroCluster:

There are three types of MetroCluster:

  1. Stretch MetroCluster
  2. Stretch MetroCluster with ATTO bridges
  3. Fabric MetroCluster

In a stretch MetroCluster, we use NetApp proprietary SAS cables for connectivity. No ATTO bridges are used, and the maximum distance is 500 m.

A stretch MetroCluster with ATTO bridges is the same, except that we additionally use ATTO bridges for the connectivity.

The fabric MetroCluster is the one we discussed earlier in the 4-node diagram, with a distance limitation of 200 km. We can also enable the other NetApp efficiency features on the nodes, such as deduplication, compression and thin provisioning, and these are automatically available on the secondary site. We can also run SnapMirror from the MetroCluster sites to other locations.
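For example, storage efficiency is enabled on a volume exactly as on a normal cluster, and an asynchronous SnapMirror relationship can be created from a MetroCluster site to a third cluster; the SVM, volume and schedule names below are placeholders for illustration only:

    volume efficiency on -vserver svm1 -volume vol1
    snapmirror create -source-path svm1:vol1 -destination-path dr_svm:vol1_dr -type XDP -schedule daily
    snapmirror initialize -destination-path dr_svm:vol1_dr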

Below is the list of recovery methods available in case of component failures in a 2-node and 4-node MetroCluster.

[Table: recovery methods for component failures in 2-node and 4-node MetroCluster configurations]

For automatic failover to the surviving site, NetApp provides a free tool called the MetroCluster Tiebreaker software, which can be installed on any Red Hat Linux machine. It constantly monitors the cluster nodes for SSH timeouts; if both cluster nodes of a site are not reachable, it triggers a failover to the secondary site, with an RTO of 120 seconds. In an ideal scenario, though, failover would be done manually.
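For reference, the manual site failover and giveback on a fabric MetroCluster follows roughly this command sequence, run from the surviving site (a sketch of the standard commands; always follow the official recovery procedure for your ONTAP release):

    metrocluster switchover -forced-on-disaster true
    metrocluster heal -phase aggregates
    metrocluster heal -phase root-aggregates
    metrocluster switchback
    metrocluster operation show

The switchover brings the failed site's workloads up on the surviving site; the two heal phases and the switchback are run later, once the failed site's hardware is repaired and powered back on, and metrocluster operation show tracks the progress of each step.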

ONTAP 9.0 now supports up to eight nodes in a MetroCluster configuration. I do not want to draw an eight-node diagram here, as that would be clumsy; I have already excluded a lot of paths from the diagrams for clarity. Please let me know your comments and thoughts about MetroCluster in the comments section below, and do click the follow button to get instant updates as I publish my blogs.


Author: kumaraysun

Loves SAN; learns and works mainly on NetApp.
