Clustering the vRO Cluster Database using SQL Server (PoC) (SKKB1027)

It's only fair to share...Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+Digg thisPin on PinterestShare on TumblrPrint this pageEmail this to someone

In this post we will demonstrate in a PoC how to configure and a Microsoft SQL Server Cluster to provide a highly available database instance and use it in a VMware vRealize Orchestrator (vRO) cluster server.

 

Lab Environment

The logical design of this lab can be seen HERE.

 

Introduction

In this post we will demonstrate in a PoC how to configure and a Microsoft SQL Server Cluster to provide a highly available database instance and use it in a VMware vRealize Orchestrator (vRO) cluster server. I would like to note that at this point clustered instance of SQL Server database for use with vRO is not officially supported by VMware or at least I could not find information supporting this. This is just a PoC and testing done here might turn out to be insufficient for a production deployments.

The results are free for interpretation for anyone to determine if SQL Clustering is a viable option.

The Setup

For the purpose of this PoC we will be using the following MS SQL and vRO versions:

  • 2x Node Microsoft Failover Cluster with Windows Server 2012 R2 SP1 x64
  • 2x Node Microsoft SQL Server 2014 x64 Cluster
  • 2x Node vRO 7.0.1 Appliances in a Cluster (Active/Active & Active/Passive)

Setting up the Windows Cluster

The Microsoft Windows Failover Cluster (WFC) setup is straight out of the box setup with nothing special.
In this PoC I’ve used an iSCSI Luns which are presented to all nodes to hold resources. I’m using 3 LUN’s : one for the Disk Witness, one for the clustered MSDTC service and one for the SQL Server instance files.

In addition to the default setup I’ve:

  • Added the MSDTC Server as a Role and thus clustering it in the WFC. This is not 100% necessary in this use case , but it’s good to have it clustered. I haven’t mapped the MSDTC Clustered service to the clustered SQL Instnace , but that can also be done.
  • Configured the MSDTC service for Network DTC Access on all nodes.

For more information on when to cluster the MSDTC service and when to map it to a clustered SQL server instance, visit:  MSDTC Recommendations on SQL Failover Cluster

For additional reading, visit Not-Supported: AGs With DTC/Cross-Database Transactions

I’m not going to go step-by-step through the setup there are a lot of references from Microsoft on how to do this.
Following screenshots summarize the installation and configuration.

Setting up the SQL Cluster

The Microsoft SQL Cluster setup is again straight out of the box setup with nothing special. In this PoC I’ve clustered one instance of SQL Server.
For more information on the setup prerequisites and process, visit SQL Server Failover Cluster Installation & Before Installing Failover Clustering,

In addition to the default cluster setup I’ve:

  • Created a new empty vRO databse
  • Enabled ALLOW_SNAPSHOT_ISOLATION and READ_COMMITTED_SNAPSHOT 

Following screenshots summarize the installation and configuration.

Setting up the vRO Cluster

The VMware Orchestrator cluster setup is again straight out of the box setup with nothing special.

For more information visit the official product documentation at VMware vRealize Orchestrator Documentation

In addition to the default vRO cluster setup I’ve changed the number of active nodes of the cluster b/n 1 & 2 to test an Active/Passive and Active/Active configurations.

Following screenshots summarize the installation and configuration.

 

The Tests & The Results

These are the main tests I’ve performed:

  • Test planned failover of a SQL Server Cluster Node.
  • Test failure of a SQL Server Cluster Node.
  • Test failure of a vRO Cluster node in Active/Passive setup
  • Test failure of a vRO Cluster node in Active/Active setup
  • Monitor connection from the Orchestrator client during all test.
  • Test using the Locking test WF that is part of the OOB vRO library.

For the purpose of these tests I’ve buid a very simple Cluster DB Test WF that increments a number from 0 to 300. Each time it does this the WF creates a couple of System Logs (stored on vRO Appliance and seen on Logs tab) and a couple of Server Logs (stored on DB and seen on the Events tab). I’ve split the action in multiple scriptable tasks. I’ve done this because each time a scriptable task executes and finishes it reads and writes to the DB. I wanted to have a constant communication to the DB.

Following screenshots summarize the WF and configuration.

I’ve performed multiple tests like

  • vRO Active/Passive: Start the workflow on Node 1 and failover the SQL DB.
  • vRO Active/Passive: Start the workflow on Node 1 and simulate failure of the SQL DB.
  • vRO Active/Passive: Start the workflow on Node 1 and simulate failure of the active vRO Node
  • vRO Active/Active: Start the workflow on Node 1 and failover the SQL DB.
  • vRO Active/Active: Start the workflow on Node 1 and simulate failure of the SQL DB.
  • vRO Active/Active: Start the workflow on Node 1 and simulate failure of the active vRO Node
  • vRO Active/Active: Start the workflow on Node 1 & 2 and failover the SQL DB.
  • vRO Active/Active: Start the workflow on Node 1 & 2 and simulate failure of the SQL DB.
  • vRO Active/Active: Start the workflow on Node 1 & 2 and simulate failure of the active vRO Node

All tests delivered pretty much same results:

  • SQL Cluster node failure/failover: WF resumes in ~20 seconds and continues.
  • vRO Node failure: WF resumes in ~120 seconds.
  • vRO Client does not loose connection to the vRO Server. It needs only refresh (F5) after SQL node failure/failover.

Note: This is a 2 layer nested virtual environment. These results might be a bit faster in a real life deployment.
Couple of settings affect the time which take for the SQL/WFC to failover the database and for the vRO Cluster to resume workflow execution.

From SQL/ WFC Cluster side this is the cluster Heartbeat Delay and Threshold setting values.
For more information about these values , visit Tuning Failover Cluster Network Thresholds

From vRO Cluster side this is the cluster Heartbeat Interval (in seconds) and Number of failover hearbeats setting values. These can be set via the Control Center interface, under the Orchestrator Node Settings tab (as shown on one of the screenshot)

After vCO 5.5.3 , Orchestrator has more resiliency and tolerance against disruptions of the communication to the database.

Following are screenshots form some of the tests performed.
(Failure/Failover of SQL Node – WF resume after ~20 sec. )

(Failure of vRO Node – WF resume after ~120 sec. After the 30th increment , where the failure happens, the WF execution continues on the other node)

 

 

Final Step

If all went well, go grab a beer.

It's only fair to share...Share on FacebookTweet about this on TwitterShare on LinkedInShare on Google+Digg thisPin on PinterestShare on TumblrPrint this pageEmail this to someone

Leave a Reply

Your email address will not be published. Required fields are marked *