The Dell | Hadoop solution, offered in conjunction with Cloudera and called Dell | Cloudera Solution for Apache Hadoop, lowers the barrier to adoption for businesses looking to use Hadoop in production. Dell’s customer-centered approach is to create rapidly deployable and highly optimized end-to-end Hadoop solutions running on commodity hardware. Dell provides all the hardware and software components and resources to meet the customer’s requirements, so no other supplier needs to be involved.
The hardware platform for the Dell | Hadoop solution is the Dell™ PowerEdge™ C Series. Dell PowerEdge C Series servers are focused on hyperscale and cloud capabilities. Rather than emphasizing gigahertz and gigabytes, these servers deliver maximum density, memory, and serviceability while minimizing total cost of ownership. It’s all about getting the processing customers need in the least amount of space and in an energy-efficient package that slashes operational costs.
The operating system of choice for the Dell | Hadoop solution is Linux (e.g. Red Hat Enterprise Linux or CentOS). The recommended Java Virtual Machine (JVM) is the Oracle (formerly Sun) JVM.
The hardware platforms, the operating system, and the Java Virtual Machine make up the foundation on which the Hadoop software stack runs.
The bottom layer of the Hadoop stack comprises two frameworks:
- The Data Storage Framework (HDFS) is the filesystem that Hadoop uses to store data on the cluster nodes. Hadoop Distributed File System (HDFS) is a distributed, scalable, and portable filesystem.
- The Data Processing Framework (MapReduce) is a massively parallel compute framework inspired by Google’s MapReduce papers.
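To make the MapReduce model concrete, the following is a minimal sketch of a word count written in plain Python. It is not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and everything runs in a single process, whereas Hadoop distributes the same three phases across the cluster nodes and reads its input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key before they are reduced."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values collected for one key into a total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each call to `map_phase` depends only on its own line, and each call to `reduce_phase` only on its own key, both phases can run in parallel on many nodes, which is what makes the model suitable for large clusters.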
The next layer of the stack in the Dell | Hadoop solution design is the network layer. Dell recommends implementing the Hadoop cluster on a dedicated network for two reasons:
- Tested designs: Dell provides network design blueprints that have been tested and qualified.
- Predictable performance: sharing the network with other applications may have a detrimental impact on the performance of Hadoop jobs.
The next two frameworks—the Data Access Framework and the Data Orchestration Framework—comprise utilities that are part of the Hadoop ecosystem.
Dell listened to its customers and designed a Hadoop solution that is distinctive in the marketplace. Dell’s end-to-end solution approach means that the customer can be in production with Hadoop in the shortest time possible. The Dell | Hadoop solution embodies all the software functions and services needed to run Hadoop in a production environment. The customer is not left wondering, “What else is missing?” One of Dell’s chief contributions to Hadoop is a method to rapidly deploy and integrate Hadoop in production. Other major contributions include integrated backup, management, and security functions. These complementary functions are designed and implemented side-by-side with the core Hadoop technology.
Installing and configuring Hadoop is non-trivial. There are different roles and configurations that need to be deployed on various nodes. Designing, deploying, and optimizing the network layer to match Hadoop’s scalability requires careful planning and consideration of the types of workloads that will run on the Hadoop cluster. The deployment mechanism that Dell designed for Hadoop automates the deployment of the cluster from “bare-metal” (no operating system installed) all the way to installing and configuring the Hadoop software components to specific customer requirements. Intermediary steps include system BIOS update and configuration, RAID/SAS configuration, operating system deployment, Hadoop software deployment, Hadoop software configuration, and integration with the customer’s data center applications (e.g. monitoring and alerting).
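The ordering of the intermediary steps listed above can be sketched as a simple pipeline. This is only an illustration and does not represent Dell's deployment tooling: the step identifiers, the `deploy_node` function, and the `run_step` callback are all hypothetical names, and the sketch assumes that each step must succeed before the next one starts.

```python
from typing import Callable, List, Tuple

# Hypothetical step identifiers, in the order the text lists them.
DEPLOYMENT_STEPS: List[str] = [
    "bios_update_and_configuration",
    "raid_sas_configuration",
    "operating_system_deployment",
    "hadoop_software_deployment",
    "hadoop_software_configuration",
    "data_center_integration",
]


def deploy_node(
    node: str, run_step: Callable[[str, str], bool]
) -> Tuple[bool, List[str]]:
    """Run every step on one node in order, stopping at the first failure.

    run_step(node, step) is a caller-supplied callback (an assumption of
    this sketch) that returns True when the step succeeds.
    Returns (success, steps completed so far).
    """
    completed: List[str] = []
    for step in DEPLOYMENT_STEPS:
        if not run_step(node, step):
            return False, completed
        completed.append(step)
    return True, completed
```

A caller could pass a `run_step` that shells out to whatever provisioning tools are in use. Returning the list of completed steps makes it clear where a bare-metal node stopped if a step fails.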
Data backup and recovery is another topic that was brought up during customer roundtables. As Hadoop becomes the de facto platform for business-critical applications, the data that is stored in Hadoop is crucial for ensuring business continuity. Dell’s approach is to offer several enterprise-grade backup solutions and let the customer choose.
Customers also commented on the current security model of Hadoop. It is a real concern because, as a larger number of business users share access to exponentially increasing volumes of data, the security designs and practices need to evolve to accommodate the scale and the risks involved. In addition, regulations and standards such as HIPAA, Sarbanes-Oxley, SAS 70, and those of the PCI Security Standards Council may apply to data stored in Hadoop. Particularly in industries like healthcare and financial services, access to the data has to be enforced and monitored across the entire stack. Unfortunately, there is no clear answer on how the security architecture of Hadoop is going to evolve. Dell’s approach is to educate the customer and also work directly with leading vendors to deliver a model that suits the enterprise.
Lastly, Dell’s open, integrated approach to enterprise-wide systems management enables customers to build comprehensive system management solutions based on open standards and integrated with industry-leading partners. Instead of building a patchwork of solutions leading to systems management sprawl, Dell integrates the management of the Dell hardware running the Hadoop cluster with the “traditional” Hadoop management consoles (Ganglia, Nagios).
To summarize, Dell is adding Hadoop to its data analytics solutions portfolio. Dell’s end-to-end solution approach means that Dell will provide readily available software connectors for integration between the solutions in the portfolio.
For additional information, please check out http://www.dell.com/hadoop.