How-to: Deploy a Secure Enterprise Data Hub on Microsoft Azure – Part 2

In Part 1 of this series, we covered all the prerequisites needed to deploy a CDH cluster on the Microsoft Azure cloud platform. In Part 2, we will cover the resources required on the Azure platform and deploy a cluster with Cloudera Director.

Cloudera Director Use Case

Cloudera Director simplifies cluster creation and lessens the time to an operational cluster in the cloud. It’s a great tool for running POCs in your organization. It’s also ideal for transient workloads in the cloud, where the exact compute resource requirements are unknown.

Microsoft Azure Portal

I recommend reading the Cloudera Enterprise Reference Architecture for Azure Deployments and the Cloudera Director Getting Started on Microsoft Azure for recommendations and best practices.

The “Getting Started” document includes instructions for all the configurations and information required for Cloudera Director to deploy a cluster on the Azure Portal.

First, you will need to gather these four pieces of Azure credential information for Cloudera Director: Subscription ID, Tenant ID, Client ID, and Client Secret. Second, the following resources are also needed to configure Cloudera Director:

    • Resource Group: morantus-rg
    • Network Security Group (NSG): morantus-nsg
    • Virtual Network (Vnet): morantus-vnet
    • Availability Sets:
      • availmgmt: Cloudera Director
      • availedge: Cloudera Manager
      • availmgmt2: Cluster Management nodes
      • availworker: Cluster DataNodes

The names shown above (morantus-rg, morantus-nsg, availmgmt, and so on) are the names of the Azure resources I have configured in my Azure Portal. These resources are explained in full detail in the RA documentation.
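
If you prefer the command line to the portal, the same resources could be created roughly as follows. This is only a sketch and not the procedure I used for this post: it assumes the Azure CLI (`az`) is installed and logged in, and the `eastus` location is a placeholder. The function only prints the commands so you can review them first.

```shell
#!/usr/bin/env bash
# Sketch: print the Azure CLI commands that would create the resources listed above.
# Assumptions: the `az` CLI is installed and logged in; LOCATION is a placeholder.
RG="morantus-rg"
NSG="morantus-nsg"
VNET="morantus-vnet"
LOCATION="${LOCATION:-eastus}"   # hypothetical region, change to your own

plan_azure_resources() {
  local avset
  echo "az group create --name $RG --location $LOCATION"
  echo "az network nsg create --resource-group $RG --name $NSG"
  echo "az network vnet create --resource-group $RG --name $VNET"
  # One availability set per node role
  for avset in availmgmt availedge availmgmt2 availworker; do
    echo "az vm availability-set create --resource-group $RG --name $avset"
  done
}

# Review the plan with:   plan_azure_resources
# Run it with:            plan_azure_resources | bash
```

Printing the commands instead of running them keeps the sketch safe to try; subnets, address ranges, and NSG rules still need to follow the RA documentation.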

Active Directory

As discussed in Part 1, in order to join the nodes in the cluster to Active Directory Domain Service and create/update DNS entries in DNS Manager, we need some configurations in place to support our deployment.

The following are required on the AD side:

  • Properly configured DNS Server
  • Privileged user account to join nodes to AD DS and create/update DNS entries
  • Cloudera Manager user account to create Kerberos Service Principal Names (SPNs)

DNS Server

As you may recall, the Azure Cloud service does not provide reverse DNS natively. You must configure your own DNS server such as BIND or AD. Active Directory is what I am using for my DNS.
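
Once the DNS server is in place, a quick way to confirm that forward and reverse lookups agree for a node is sketched below. It assumes `dig` (from the bind-utils package) is installed; the function name is my own and not part of any Cloudera tooling.

```shell
#!/usr/bin/env bash
# Sketch: verify that forward and reverse DNS agree for one host.
# Assumptions: `dig` (bind-utils) is installed; the function name is illustrative.

check_dns() {
  local fqdn="$1" ip ptr
  # Forward lookup: first A record for the FQDN
  ip="$(dig +short "$fqdn" A | head -n 1)"
  if [ -z "$ip" ]; then
    echo "no A record for $fqdn" >&2
    return 1
  fi
  # Reverse lookup: dig -x builds the in-addr.arpa query for us
  ptr="$(dig +short -x "$ip" | head -n 1)"
  # dig prints the PTR target with a trailing dot; strip it before comparing
  if [ "${ptr%.}" != "$fqdn" ]; then
    echo "PTR for $ip is '${ptr:-none}', expected $fqdn" >&2
    return 1
  fi
  echo "$fqdn <-> $ip OK"
}

# Example: check_dns "$(hostname -f)"
```

The comparison is case-sensitive, so pass the FQDN exactly as it is registered in DNS.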

In addition to the DNS configurations, there are user accounts and OU setups that are also required to be configured in AD. If you need a refresher, you can jump right into this blog, which covers similar operations when deploying to AWS.


Similar to Cloudera Manager, Cloudera Director also relies on an RDBMS back-end to store all the metadata for the cluster. In this deployment, I am using a preconfigured MySQL Database for this purpose. For more on MySQL configuration, go to Cloudera Documentation.

Cloudera Director Virtual Machine

We are now ready to provision a VM to install Cloudera Director. Step-by-step instructions are available here. Pay close attention to the section where you specify a username for your VM. This account will be used to SSH to all nodes, and will also be the owner of resources provisioned by Cloudera Director to create the cluster.

In my deployment, I used “azuredirectoradmin” as the Azure portal privileged account for my VM and the cluster. Here are some key specs for my VM:

  • VM size: D3
  • OS Image: Cloudera CentOS 7.2
  • Storage: Standard
  • Public IP

In this deployment I elected to assign a Public IP for this VM. In an environment where a VPN or ExpressRoute connection is available, it’s not recommended to assign a Public IP to any nodes in your cluster. We recommend that all master and worker nodes are placed in the same Network Security Group (NSG).

Configure VM to Install Director

In this section, we will configure the Cloudera Director VM to join the AD domain and create DNS entries for both forward and reverse DNS with FQDN resolution. These steps were explained in detail in Part 1 of this series. Download, review, and run these scripts from GitHub.

Log in to your VM with the appropriate account and copy the downloaded files to the tmp directory. Switch to the “root” user to execute the first script, which also calls the second script. This separation is done on purpose to show which script accomplishes which function. The first script does the AD join, SSSD, and Samba configurations. The second maintains the DNS records. There are two very important modifications to the second script to allow secure DNS updates to the AD DNS Manager:

  • Added kinit to acquire a Kerberos ticket to update DNS entries in AD

kinit -kt /etc/krb5.keytab "$princ"


  • Added -g to initiate a secure update to AD DNS

nsupdate -g "$nsupdatecmds"
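
To make the secure update more concrete, here is a minimal sketch of how the commands fed to nsupdate could be built for one host. It is not the script from the repository: the function name, the default TTL, and the example host and IP are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: build the nsupdate commands for one host's A and PTR records.
# Assumptions: function name, default TTL, and example values are illustrative only.

build_nsupdate_cmds() {
  local fqdn="$1" ip="$2" ttl="${3:-3600}" ptr
  # PTR name is the IPv4 octets reversed under in-addr.arpa
  # (the command substitution is a subshell, so IFS and "set --" do not leak)
  ptr="$( IFS=.; set -- $ip; echo "$4.$3.$2.$1.in-addr.arpa" )"
  # Forward record: replace any existing A record
  printf 'update delete %s A\n' "$fqdn"
  printf 'update add %s %s A %s\n' "$fqdn" "$ttl" "$ip"
  printf 'send\n'
  # Reverse record lives in a different zone, so it gets its own "send"
  printf 'update delete %s PTR\n' "$ptr"
  printf 'update add %s %s PTR %s.\n' "$ptr" "$ttl" "$fqdn"
  printf 'send\n'
}

# Example, after acquiring a ticket with kinit as shown above:
#   build_nsupdate_cmds node1.cloudera.morantus.com 10.0.0.5 | nsupdate -g
```

The forward and reverse updates are sent separately because nsupdate expects all updates in one message to belong to the same zone.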


A successful run should look like the following:

[Screenshot: output of a successful script run]

Java 8 and Cloudera Director Installation

Cloudera Director installs Java 7 by default. We will deploy this cluster using Java 8 instead. The following steps must be completed as the root user or a user with sudo privileges.

Install Java JDK 8 and JCE Policy file for encryption. Oracle requires that you acknowledge you have read and accepted the Oracle license terms:

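# Download and install the Oracle JDK 8 RPM
wget --no-cookies --no-check-certificate --header "Cookie: oraclelicense=accept-securebackup-cookie" "" -O /tmp/jdk-8u60-linux-x64.rpm
yum -y install /tmp/jdk-8u60-linux-x64.rpm
# Download and unpack the JCE unlimited strength policy files
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" -O /tmp/
unzip /tmp/ -d /tmp/
# Drop the interactive cp alias so the copy does not prompt
unalias cp
cp -f /tmp/UnlimitedJCEPolicyJDK8/*.jar /usr/java/*/jre/lib/security/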
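
Before moving on, it can be worth confirming that Java 8 is what actually runs by default. The check below is a small sketch; it assumes `java` is on the PATH and reports its version as 1.8.x, which is how Oracle JDK 8 identifies itself.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the default `java` on the PATH is a Java 8 runtime.
# Assumption: the JDK reports its version as 1.8.x, as Oracle JDK 8 does.

java_is_8() {
  # `java -version` writes to stderr, so fold it into stdout before matching
  java -version 2>&1 | head -n 1 | grep -q '"1\.8\.'
}

# Example:
#   java_is_8 && echo "Java 8 is active" || echo "default java is not Java 8" >&2
```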


Download and install Cloudera Director

wget "" -O /etc/yum.repos.d/cloudera-director.repo
yum clean all
yum -y install cloudera-director-server cloudera-director-client


Configure Director to use your existing MySQL database. Modify these values to your own environment:

vi /etc/cloudera-director-server/
lp.database.type: mysql
lp.database.username: director
lp.database.password: director
lp.database.port: 3306
lp.database.name: director
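
If you would rather script this change than edit the file by hand, something like the following could work. It is a sketch under assumptions: the helper name is my own, the file is passed in as an argument, GNU sed is assumed (as on CentOS), and values containing `|` or `&` would need extra escaping.

```shell
#!/usr/bin/env bash
# Sketch: set "key: value" lines in a properties file, replacing an existing
# uncommented entry or appending a new one.
# Assumptions: helper name is illustrative; GNU sed; values contain no "|" or "&".

set_prop() {
  local file="$1" key="$2" value="$3" pat
  # Escape the dots in the key so they match literally in the regex
  pat="$(printf '%s' "$key" | sed 's/\./\\./g')"
  if grep -q "^${pat}:" "$file"; then
    sed -i "s|^${pat}:.*|${key}: ${value}|" "$file"
  else
    printf '%s: %s\n' "$key" "$value" >> "$file"
  fi
}

# Example, with PROPS pointing at the file edited above:
#   set_prop "$PROPS" lp.database.type mysql
#   set_prop "$PROPS" lp.database.username director
```

Only uncommented entries are replaced; commented-out examples in the file are left alone and the new line is appended instead.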


Start Cloudera Director.

service cloudera-director-server start


If you see any errors, view the log file located at /var/log/cloudera-director-server/application.log
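
The server may take a little while to start answering after the service starts. A small polling loop like the sketch below can confirm it is reachable on port 7189 before you open the UI. It assumes `curl` is installed; the function name and retry defaults are my own.

```shell
#!/usr/bin/env bash
# Sketch: wait until an HTTP endpoint answers, retrying a fixed number of times.
# Assumptions: `curl` is installed; function name and defaults are illustrative.

wait_for_director() {
  local url="$1" tries="${2:-30}" delay="${3:-5}" i=1
  while [ "$i" -le "$tries" ]; do
    # curl exits 0 once it gets any HTTP response from the server
    if curl -s -o /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no response from $url after $tries attempts" >&2
  return 1
}

# Example: wait_for_director http://localhost:7189 && echo "Director is up"
```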

Log in to the Director UI at http://director_publicIP:7189 and accept the license agreement. Use admin/admin as the credentials.

Note: after login, it’s a good idea to change the default password by clicking the dropdown next to the ‘admin’ user name in the upper right corner of the screen.





Deploy Cluster with Cloudera Director

The remaining steps must be executed as the Cloudera Director admin user you created earlier. In my case, that’s the “azuredirectoradmin” account. All resources created by Cloudera Director in the Azure Portal will be owned by this account. The “root” user is not allowed to create resources on the Azure Portal.

First, we’ll need to create an SSH key as the “azuredirectoradmin” user on the VM where Cloudera Director is installed. This key will be added to our deployment configuration file, and Cloudera Director will place it on all the VMs it provisions. This will allow us to use passwordless SSH to the cluster nodes with this key.

Create the SSH key (do not enter a passphrase; keep all defaults)

ssh-keygen -f ~/.ssh/director_azure_vm_key -t rsa
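
If you want to avoid the interactive prompts, the key can also be generated non-interactively. The sketch below assumes OpenSSH's `ssh-keygen` is installed; the wrapper function is my own and simply skips generation when the key already exists.

```shell
#!/usr/bin/env bash
# Sketch: create the passphrase-less RSA key non-interactively, only if missing.
# Assumptions: OpenSSH ssh-keygen is installed; the function name is illustrative.

make_director_key() {
  local key="${1:-$HOME/.ssh/director_azure_vm_key}"
  if [ -f "$key" ]; then
    echo "key already exists: $key"
    return 0
  fi
  # Create the parent directory with private permissions if it is missing
  mkdir -p -m 700 "$(dirname "$key")"
  # -N "" sets an empty passphrase, -q suppresses the banner
  ssh-keygen -q -t rsa -N "" -f "$key"
}

# Example: make_director_key    # writes ~/.ssh/director_azure_vm_key and .pub
```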


Configure Cloudera Director Configuration file

Download and inspect the configuration file from GitHub, which we will use to create the cluster with Cloudera Director. There are a few sections I want to point out.

  1. Cloud Provider

Here you specify your Azure credentials information such as subscriptionID, tenantID, clientID, and Client Secret. See this Azure resource and/or ask your Azure Portal administrator for this information.

  2. SSH login key

Add the SSH key created earlier for passwordless login to the cluster.



  3. Instance Templates

Define the VM profiles to use for each node type such as management, worker, and edge nodes. This is also the section where you specify all the resources created earlier in the Azure Portal like the ResourceGroup, Network Security Group, etc.

  4. Bootstrap-script

In this section, we combine the same scripts used to prepare the Cloudera Director VM earlier. We also join each VM created by Cloudera Director to the AD domain and create forward and reverse DNS entries for FQDN name resolution. Finally, we manually install Java 8 on all the nodes.

Note: I am installing the MySQL JDBC Driver on all the nodes as well. The driver is required for all services that are backed by my preconfigured MySQL database. There’s an option to have Cloudera Director automatically create databases for you. As part of that process, it will install a version of the MySQL JDBC Driver for you. I like to control this process, so I do this installation manually.

  5. databaseServers

Specify the database instance where Cloudera Director will automatically create the databases for you on the fly.

  6. cloudera-manager

In this section, you specify all the configurations for Cloudera Manager. Since this is a secured deployment with authentication enabled, we will define our username and password to connect to Active Directory to create the Kerberos Service Principals to secure our cluster. This step will take care of the integration with the Active Directory Kerberos service.

Instruct Cloudera Manager to NOT install Java 7

javaInstallationStrategy: NONE


Connect to AD

krbAdminUsername: "cloudera-scm@CLOUDERA.MORANTUS.COM"
krbAdminPassword: "PASSWORD"

Instruct Cloudera Manager to install the JCE policy files on the cluster.

unlimitedJce: true


Note: This should only be enabled if allowed in your country or jurisdiction.

Active Directory Details

KDC_TYPE: "Active Directory"
KRB_ENC_TYPES: "aes256-cts aes128-cts rc4-hmac"
AD_KDC_DOMAIN: "ou=serviceaccounts,ou=prod,ou=clusters,ou=hadoop,dc=CLOUDERA,dc=MORANTUS,dc=COM"
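
The AD_KDC_DOMAIN value is just the OU path where the service accounts are created, followed by the realm expressed as domain components. If it helps to see how the string is put together, here is a small sketch; the helper name is my own, and the OUs are listed from innermost to outermost.

```shell
#!/usr/bin/env bash
# Sketch: assemble an LDAP distinguished name from a Kerberos realm and OU list.
# Assumption: the helper name is illustrative; OUs are given innermost first.

build_kdc_domain() {
  local realm="$1" dn="" ou
  shift
  # Each OU becomes an "ou=" component, in the order given
  for ou in "$@"; do
    dn="${dn}ou=${ou},"
  done
  # Each dot-separated part of the realm becomes a "dc=" component
  printf '%sdc=%s\n' "$dn" "$(printf '%s' "$realm" | sed 's/\./,dc=/g')"
}

# Example:
#   build_kdc_domain CLOUDERA.MORANTUS.COM serviceaccounts prod clusters hadoop
```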


  7. Cluster

Here we specify all the configurations for the cluster: services to install, HDFS High Availability, database details for dependent services, and so on. There are more example configuration files on the Cloudera Director GitHub page.

You are now ready to create your cluster with Cloudera Director using the configuration file. Use the default username and password “admin/admin”.

Create the cluster

cloudera-director bootstrap-remote /home/azuredirectoradmin/config/ --lp.remote.username=admin --lp.remote.password=admin --lp.remote.hostAndPort=localhost:7189




If you run into an error and need to restart the deployment, or if you want to delete your cluster, you can terminate it by running:

cloudera-director terminate-remote /home/azuredirectoradmin/config/ --lp.remote.username=admin --lp.remote.password=admin --lp.remote.hostAndPort=localhost:7189
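
Since the bootstrap and terminate commands differ only in the subcommand, a tiny wrapper can keep the shared options in one place. This is a sketch: the function and the DIRECTOR_* variable names are my own, and it only prints the command so you can check it before running it.

```shell
#!/usr/bin/env bash
# Sketch: build the cloudera-director command line for bootstrap or terminate.
# Assumptions: function and DIRECTOR_* variable names are illustrative;
# defaults mirror the admin/admin and localhost:7189 values used above.

director_cmd() {
  local action="$1" conf="$2"
  case "$action" in
    bootstrap|terminate) ;;
    *)
      echo "usage: director_cmd bootstrap|terminate <conf-file>" >&2
      return 2
      ;;
  esac
  # Print rather than execute, so the command can be reviewed first
  echo cloudera-director "${action}-remote" "$conf" \
    "--lp.remote.username=${DIRECTOR_USER:-admin}" \
    "--lp.remote.password=${DIRECTOR_PASS:-admin}" \
    "--lp.remote.hostAndPort=${DIRECTOR_ADDR:-localhost:7189}"
}

# Example: director_cmd terminate "$CONF"          # review the command
#          eval "$(director_cmd terminate "$CONF")" # run it
```

Once you have changed the default admin password, set DIRECTOR_PASS accordingly instead of relying on the default.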


A successful deployment will look like the following:

Azure Portal with all nodes

Cloudera Director Dashboard

Cloudera Manager Dashboard

Verify Kerberos is Enabled


You should now be able to configure the Azure Portal, provision a VM for Cloudera Director, join your VMs to an AD domain, create DNS entries in the AD DNS server, and provision a Kerberized cluster with Cloudera Director using configuration files.

James Morantus is a Senior Solution Consultant at Cloudera

