Cloudera announces support for Azure’s next-generation Data Lake Store

Today we are proud to announce our support for ADLS Gen2 as it enters general availability on Microsoft Azure. CDH 6.1 already includes support for MapReduce and Spark jobs, Hive and Impala queries, and Oozie workflows on ADLS Gen2.

The Cloudera platform delivers a one-stop shop that allows you to store any kind of data, process and analyze it in many different ways in a single environment, and integrate with the rest of your data infrastructure. But working with cloud storage has often been a compromise. Enterprises started moving to the cloud expecting infinite scalability and simultaneous cost savings, but the reality has often turned out to be more nuanced. Before they can fully realize the benefits of the cloud, they have had to adjust to new data models and new processes. Eventual consistency and other pitfalls can be a nightmare for engineers trying to migrate complex big data infrastructure to the cloud.

The introduction of ADLS Gen1 was exciting because it was cloud storage that behaved like HDFS. As a Hadoop developer, I loved that! Directory renames were fast and atomic. Fine-grained security was easy, and you could configure authentication at the machine level with managed service identities (MSI), or per job with service principals.

Now ADLS Gen2 takes it to another level, offering the same file-system semantics but as a first-class citizen in the Azure Storage stack, with lower prices and dramatically wider availability across Azure regions. We even saw improved performance, with a variety of workloads consistently running 10 to 15% faster than on ADLS Gen1. With the anticipated compatibility with the blob storage API, ADLS Gen2 really does become an ideal data store for a cloud “Data Hub”.

Microsoft’s Hadoop driver for ADLS Gen2 (known as ABFS, or Azure Blob FileSystem) was refined and adopted into Apache Hadoop 3.2 as a result of collaboration with engineers from both Cloudera and Hortonworks, even before our merger. This work is a prime example of the synergy and compatibility of the two companies and a precursor to even more exciting things that will come from our continued partnership with Microsoft.
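
To give a feel for what pointing a job at ADLS Gen2 through the ABFS driver involves, here is a minimal sketch in Python that builds an `abfs://` URI and the Hadoop properties for service-principal (OAuth) authentication. The helper names `abfs_uri` and `abfs_oauth_conf` and all sample values are hypothetical. The property names and the token-provider class are the ones I know from the Apache Hadoop ABFS documentation, so please verify them against the Hadoop or CDH version you actually run.

```python
# Sketch only: helper names and sample values are illustrative assumptions,
# not part of any Cloudera or Microsoft API.

def abfs_uri(container: str, account: str, path: str = "") -> str:
    """Build an ABFS URI: abfs://<container>@<account>.dfs.core.windows.net/<path>."""
    host = f"{account}.dfs.core.windows.net"
    return f"abfs://{container}@{host}/{path.lstrip('/')}"


def abfs_oauth_conf(account: str, client_id: str,
                    client_secret: str, tenant_id: str) -> dict:
    """Return account-scoped Hadoop properties for service-principal auth."""
    host = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        # Azure AD token endpoint for the tenant that owns the service principal.
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


if __name__ == "__main__":
    print(abfs_uri("data", "myaccount", "/raw/events"))
    for key, value in abfs_oauth_conf(
            "myaccount", "app-id", "app-secret", "tenant-id").items():
        print(f"{key}={value}")
```

The resulting dictionary could be written into `core-site.xml` or passed to a Spark job as `spark.hadoop.`-prefixed settings. In practice you would keep the client secret in a credential provider rather than in plain configuration.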

Now that it is generally available, I would suggest that any Cloudera developer deploying to Azure take a look at ADLS Gen2 to see whether it meets their needs. Read the announcement here.