Cloudera Director helps you deploy, scale, and manage Apache Hadoop clusters in the cloud of your choice. Its enterprise-grade features give you a simple, reliable, automated way to establish production-ready clusters in the cloud for big-data workloads and applications.
Cloudera Director Overview
In this post, you will learn about new functionality in release 2.3, but first, if you’re new to Cloudera Director, let’s revisit what it does.
- On-demand creation and termination of clusters: Using Cloudera Director, you can allocate and configure Cloudera Manager instances and highly available CDH clusters in the cloud provider of your choice. A single Cloudera Director instance can manage multiple cloud provider environments and the separate lifecycles of multiple Cloudera Managers and clusters. Cloudera Director lets you configure Cloudera Manager and cluster services like Hive to use databases hosted on external database servers, either ones that you maintain yourself or ones that Cloudera Director provisions for you through Amazon Relational Database Service (RDS).
- Multi-cloud support: Cloudera Director supports creating clusters in Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform through its cloud provider plugin architecture. A single Cloudera Director instance can work with multiple cloud providers at once. Because the plugin specification is open source, anyone can create a plugin to support other providers, either in-house or public.
- On-demand grow and shrink of clusters: One of the main benefits of running Hadoop clusters in the cloud is being able to provision additional instances when demand increases, and to terminate instances when demand decreases. Cloudera Director, in concert with Cloudera Manager, does the work required to add new instances to and remove existing ones from your Hadoop clusters.
- Programmatic and repeatable instantiation of clusters: Cloudera Director can consume cluster definitions specified in HOCON configuration files submitted through the Cloudera Director CLI or in JSON input sent to the Cloudera Director API. The flexibility and rich feature set of these input formats let you tailor Hadoop clusters to your needs. A cluster definition can include custom scripts to run after instance provisioning and cluster setup, or before cluster termination, to perform tasks like installing additional packages, configuring system settings, or saving off important data. Java and Python clients make it even easier to work with the Cloudera Director API.
- Usage-based billing for Cloudera services: Usage-based billing can help you optimize your expenditures for transient clusters. With a pay-as-you-go billing ID from Cloudera, you can use your Cloudera Enterprise license as usual, but you are only charged for CDH services when they are running.
- Security: Cloudera Director, like other Cloudera offerings, is committed to enabling secure deployments and applications. Cloudera Director’s own database is automatically encrypted, and Cloudera Director helps you configure Cloudera Manager and CDH clusters with Kerberos authentication, as well as deploy Cloudera Navigator for auditing, data lineage, and data discovery.
- Powerful web user interface: Cloudera Director’s user interface provides a single dashboard to assess the health of all your clusters across all cloud providers and all Cloudera Manager deployments. It can also be used to bootstrap new clusters, grow and shrink existing clusters, and terminate clusters that are no longer needed. Exploring the web user interface is a great stepping stone to using the configuration file or API to deploy production-ready clusters.
Get a quick overview of all clusters running on the cloud provider
Easily drill down to each cluster to see Hadoop- and cloud-specific details
New Features and Improvements in Cloudera Director 2.3
The first thing you may notice when using Cloudera Director 2.3 is the refreshed look and feel of the user interface. This is the initial part of Cloudera’s ongoing effort to improve the appearance and workings of the user interfaces for its enterprise and cloud products. A common look and feel makes it easier for users to work across all of the products, and helps Cloudera deliver better-tested, better-working interfaces.
When working with clusters in a cloud provider, you can choose between long-running clusters, which are always running and available, and transient clusters, which are created and destroyed in accordance with demand. Adopting transient clusters opens up opportunities to save money: instances no longer sit idle when there’s no work to do, and they can be drawn from cheaper instance classes that your cloud provider designates as temporary.
AWS Spot instances are one kind of temporary instance, available for a much lower cost than standard instances. Normally, the lifetime of a Spot instance is somewhat unpredictable because it is subject to price changes in the Spot market. Fixed-duration Spot instances, also known as Spot blocks, let you allocate Spot instances that are guaranteed to survive for the duration you choose. The newest version of the AWS plugin for Cloudera Director enables you to specify a Spot duration in instance templates. Having a known lifetime for Spot instances makes it easier to take advantage of them in your clusters.
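To make the fixed-duration idea concrete, here is a minimal Python sketch of checking a requested Spot block duration before it goes into an instance template. It assumes the durations AWS documented for Spot blocks (whole hours from 1 to 6, expressed in minutes); check the current AWS documentation before relying on those values. The function and constant names are hypothetical and are not part of Cloudera Director or its AWS plugin.

```python
# Assumption: Spot block durations are whole hours from 1 to 6, given in minutes.
ALLOWED_SPOT_BLOCK_MINUTES = (60, 120, 180, 240, 300, 360)


def validate_spot_block_minutes(minutes: int) -> int:
    """Return the duration unchanged if it is an allowed Spot block length."""
    if minutes not in ALLOWED_SPOT_BLOCK_MINUTES:
        raise ValueError(
            f"Spot block duration must be one of {ALLOWED_SPOT_BLOCK_MINUTES} "
            f"minutes, got {minutes}"
        )
    return minutes


def hours_to_spot_block_minutes(hours: int) -> int:
    """Convert a duration in whole hours to validated minutes."""
    return validate_spot_block_minutes(hours * 60)
```

A check like this could run before a cluster definition is submitted, so that an unsupported duration fails fast on your side rather than during instance allocation.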
Working effectively with transient clusters requires fast cluster creation, so that users with workloads ready to run don’t have to wait long. Cloudera Manager 5.10, released simultaneously and installed by default with Cloudera Director 2.3, includes speed improvements in the processes it uses to bring up a cluster, and Cloudera Director 2.3 automatically takes advantage of them for clusters deployed on cloud providers.
Cloudera Director 2.3 has an expanded ability to run custom scripts as clusters are bootstrapped. As before, each instance provisioned by Cloudera Director can be configured with a script to be run on it after it starts up. Now, additional scripts can be specified to run under the following conditions:
- On a Cloudera Manager instance after Cloudera Manager is deployed
- On each cluster instance after cluster bootstrap is complete
- On an arbitrary cluster instance after cluster bootstrap is complete
These new custom scripts can be configured using configuration files or the Cloudera Director API, and they can be provided either directly in the configuration or as filesystem paths.
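The script kinds described above can be pictured with a small Python model. This is only an illustrative sketch, not Cloudera Director code: the class, function, phase, and script names are all hypothetical, the first cluster host stands in for the "arbitrary" instance, and the relative order of the two post-bootstrap script kinds is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScriptRun:
    phase: str   # when the script runs (hypothetical phase label)
    host: str    # which instance runs it
    script: str  # hypothetical label for the script


def plan_script_runs(manager_host: str, cluster_hosts: List[str]) -> List[ScriptRun]:
    """Return the script runs for one deployment in this toy model."""
    if not cluster_hosts:
        raise ValueError("a cluster needs at least one instance")
    runs = []
    # Existing behavior: each provisioned instance runs its script after startup.
    for host in [manager_host, *cluster_hosts]:
        runs.append(ScriptRun("instance-start", host, "instance-script"))
    # New: a script on the Cloudera Manager instance after Cloudera Manager is deployed.
    runs.append(ScriptRun("manager-deployed", manager_host, "manager-script"))
    # New: a script on each cluster instance after cluster bootstrap is complete.
    for host in cluster_hosts:
        runs.append(ScriptRun("cluster-bootstrapped", host, "per-instance-script"))
    # New: a script on one arbitrary cluster instance after cluster bootstrap.
    # Assumption: the first host is used here as the arbitrary choice.
    runs.append(ScriptRun("cluster-bootstrapped", cluster_hosts[0], "single-instance-script"))
    return runs
```

For a manager host and two workers, the model yields three startup runs, one manager run, two per-instance runs, and one single-instance run. The single-instance kind suits work that should happen once per cluster, while the per-instance kind suits host-level setup.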
A popular reason to run custom scripts is to secure instances, and it always helps to be able to run on the latest operating system versions. Cloudera Director now supports running clusters under Red Hat Enterprise Linux and CentOS versions 6.8 and 7.3, so you can stay up to date on OS security and features.
Cloud providers continue to expand their scope, both in the geographic regions where they run and in the instance types they offer. The cloud provider plugins for AWS, Google Cloud Platform, and Microsoft Azure packaged with Cloudera Director 2.3 are updated to include the latest regions and instance types to make it easier to take advantage of the latest provider features.
Using Cloudera Director
If you’re ready to give the latest version of Cloudera Director a try, here are the ways you can get started.
- Download Cloudera Director from our website, where you can also find its user guide, to start fresh or upgrade from an existing installation.
- Use these sample configuration files and scripts as starting points for setting up your clusters.
- Send questions or feedback to the Cloudera Director community forum.
Bill Havanki is a Software Engineer at Cloudera.