What’s New in Cloudera Director 2.4?

Cloudera Director 2.4 improves support for long-running clusters by staying in sync with upgrades and topology changes made through Cloudera Manager, and adds support for Spark 2 and Kudu. Together with Cloudera Manager and CDH 5.11, Cloudera Director also supports Microsoft Azure Data Lake Store (ADLS) and pausing clusters that use Amazon EBS volumes.

Cloudera Director helps you deploy, scale, and manage Apache Hadoop clusters in the cloud of your choice. Its enterprise-grade features provide a simple, reliable, and automated way to establish production-ready clusters in the cloud for big-data workloads and applications.

Cloudera Director Overview

In this post, you will learn about new functionality in release 2.4. But first, if you’re new to Cloudera Director, let’s review what it does.

  • On-demand creation and termination of clusters: Using Cloudera Director, you can allocate and configure Cloudera Manager instances and highly available CDH clusters on the cloud provider of your choice. A single Cloudera Director instance can manage multiple cloud provider environments and the separate lifecycles of multiple Cloudera Manager instances and clusters. Cloudera Director lets you configure Cloudera Manager and cluster services such as Hive to use databases hosted on external database servers, either servers you maintain yourself or servers that Cloudera Director provisions for you through Amazon Relational Database Service (RDS).
  • Multi-cloud support: Cloudera Director supports creating clusters in Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform through its cloud provider plugin architecture. A single Cloudera Director instance can work with multiple cloud providers at once. Because the plugin specification is open source, you can write a plugin to support other providers, whether in-house or public.
  • On-demand growing and shrinking of clusters: One of the main benefits of running Hadoop clusters in the cloud is the ability to provision additional instances when demand increases and to terminate instances when demand decreases. Cloudera Director, in concert with Cloudera Manager, does the work required to add new instances to your Hadoop clusters and to remove existing ones from them.
  • Programmatic and repeatable instantiation of clusters: Cloudera Director can consume cluster definitions specified in HOCON configuration files submitted through the Cloudera Director CLI, or in JSON input sent to the Cloudera Director API. The flexibility and rich feature set of these input formats let you tailor Hadoop clusters to your needs. A cluster definition can include custom scripts to run after instance provisioning and cluster setup, or before cluster termination, to perform tasks such as installing additional packages, configuring system settings, or backing up important data. Java and Python clients make it easy to work with the Cloudera Director API.
  • Usage-based billing for Cloudera services: Usage-based billing can help you optimize spending on transient clusters. With a pay-as-you-go billing ID from Cloudera, you use your Cloudera Enterprise license as usual, but you are charged for CDH services only while they are running.
  • Security: Cloudera Director, like other Cloudera offerings, is committed to enabling secure deployments and applications. Cloudera Director’s own database is automatically encrypted, and Cloudera Director helps you configure Cloudera Manager and CDH clusters with Kerberos authentication, as well as deploy Cloudera Navigator for auditing, data lineage, and data discovery.
  • Powerful web user interface: Cloudera Director’s user interface provides a single dashboard to assess the health of all your clusters across all cloud providers and all Cloudera Manager deployments. It can also be used to bootstrap new clusters, grow and shrink existing clusters, and terminate clusters that are no longer needed. Exploring the web user interface is a great stepping stone to using the configuration file or API to deploy production-ready clusters.
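To illustrate what “programmatic and repeatable” instantiation buys you, here is a minimal Python sketch that derives complete cluster definitions from one shared template and serializes them deterministically. It is not the Cloudera Director schema: the template keys (`services`, `groups`, `count`, `name`) and the helpers `make_definition` and `to_json` are hypothetical placeholders, and submitting the result through the CLI or API is left out.

```python
import copy
import json
from typing import Any, Dict

# Hypothetical base template. The keys are illustrative placeholders, NOT the
# actual Cloudera Director HOCON or API schema.
BASE_TEMPLATE: Dict[str, Any] = {
    "services": ["HDFS", "YARN", "ZOOKEEPER", "HIVE", "IMPALA"],
    "groups": {
        "masters": {"count": 1},
        "workers": {"count": 3},
    },
}


def make_definition(name: str, workers: int,
                    template: Dict[str, Any] = BASE_TEMPLATE) -> Dict[str, Any]:
    """Derive a complete cluster definition from a shared template."""
    if workers < 1:
        raise ValueError("a cluster needs at least one worker")
    definition = copy.deepcopy(template)  # leave the shared template untouched
    definition["name"] = name
    definition["groups"]["workers"]["count"] = workers
    return definition


def to_json(definition: Dict[str, Any]) -> str:
    """Serialize with sorted keys so identical inputs yield identical text."""
    return json.dumps(definition, sort_keys=True, indent=2)
```

With this pattern, growing a cluster means regenerating its definition with a larger `workers` value, and cloning it for another tenant is a second `make_definition` call with a different name; everything else stays identical because it comes from the same template.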


New Features and Improvements in Cloudera Director 2.4

Cloudera Director aims to let you deploy clusters on the cloud provider of your choice, for a variety of workloads and cluster lifecycles.

One of the key features of Cloudera Director 2.4 is improved support for long-running clusters. User workflows with long-running clusters typically include upgrading CDH, upgrading Cloudera Manager, changing a cluster’s topology by adding or removing services, and reconfiguring clusters through Cloudera Manager. Cloudera Director 2.4 stays in sync with such modifications when they are made through Cloudera Manager 5.11 and higher. For example, the following sequence of actions is now supported:

  1. Set up a 20-node Impala cluster via Cloudera Director with Kerberos, Sentry, and Active Directory (AD) to handle multi-user analytic database workloads
  2. Change Impala resource management configuration via Cloudera Manager
  3. Grow the cluster to 40 nodes via Cloudera Director to handle additional workload (or shrink it as needed to match demand)
  4. Upgrade the cluster via Cloudera Manager to a future version of CDH
  5. Clone the cluster via Cloudera Director for a different tenant

Cloudera Director 2.4 adds support for Spark 2 and Kudu. Cloudera Director now supports specifying custom service descriptors (CSDs), which are required to configure Spark 2 in Cloudera Manager. Both Spark 2 and Kudu are shipped as parcels separate from CDH. You can refer to this example config file to see how to deploy Spark 2 and Kudu.

Two notable new features in the CDH platform influence the choice of cluster lifecycles and workloads when you use Cloudera Director in the cloud:

  • Support for Microsoft Azure Data Lake Store (ADLS): ADLS, Azure’s storage service for big-data analytics, is now supported with Cloudera Manager and CDH 5.11. Because data in ADLS persists independently of any cluster, ADLS provides cost-effective storage for transient clusters that are created for a workload and terminated afterward.
  • Support for pausing clusters deployed with AWS EBS volumes: Cloudera Director 2.3 added support for deploying clusters in AWS with EBS volumes rather than local instance storage. This enabled the use of increasingly popular and less expensive instance types such as the m4 and c4 series. You can now pause these clusters when there are no incoming workloads by stopping the underlying EC2 instances, and start them again later, as described in Pausing a Cluster in AWS.
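The EC2 side of pausing can be sketched in Python with the AWS SDK (boto3). This is a minimal sketch, not the documented procedure: it assumes the cluster’s services have already been stopped cleanly and that all data lives on EBS volumes, because instance-store data does not survive a stop. `pause_instances` and `resume_instances` are hypothetical helper names; `stop_instances`, `start_instances`, and the `instance_stopped` and `instance_running` waiters are standard boto3 EC2 client operations. Which instances to include, and in what order, is covered by Pausing a Cluster in AWS.

```python
from typing import Any, Iterable, List


def _ids(instance_ids: Iterable[str]) -> List[str]:
    """Materialize and validate the instance ID list."""
    ids = list(instance_ids)
    if not ids:
        raise ValueError("no instance IDs given")
    return ids


def pause_instances(ec2: Any, instance_ids: Iterable[str],
                    wait: bool = True) -> List[str]:
    """Stop (not terminate) instances so the cluster can be resumed later.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    ids = _ids(instance_ids)
    ec2.stop_instances(InstanceIds=ids)  # stopping preserves attached EBS volumes
    if wait:
        # Poll until every instance reports the "stopped" state.
        ec2.get_waiter("instance_stopped").wait(InstanceIds=ids)
    return ids


def resume_instances(ec2: Any, instance_ids: Iterable[str],
                     wait: bool = True) -> List[str]:
    """Start previously stopped instances again."""
    ids = _ids(instance_ids)
    ec2.start_instances(InstanceIds=ids)
    if wait:
        # Poll until every instance reports the "running" state.
        ec2.get_waiter("instance_running").wait(InstanceIds=ids)
    return ids
```

A typical call would be `pause_instances(boto3.client("ec2"), cluster_ids)`, where `cluster_ids` is your own list of the cluster’s instance IDs. Passing the client in as a parameter keeps the helpers easy to test with a stand-in object.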

Using Cloudera Director

If you’re ready to give the latest version of Cloudera Director a try, here are the ways you can get started.

Vinithra Varadharajan is the Engineering Manager for Cloudera Director

