Designing for scalability with the Dell | Hadoop solution

Over the last few months I’ve been having conversations with a lot of Hadoop users and developers. I’m glad to see that everyone wants to run Hadoop in production. Most practitioners also realize that, although Hadoop can scale, there are no clear guidelines describing how to scale Hadoop up/out from very small to very large.

My advice: designing for scalability should be a fundamental design principle, not an afterthought. You need to plan for scaling up / scaling out from the very beginning, not when you run out of disk space and jobs become painfully slow.

The Hadoop solution that we designed at Dell has scalability at its core. You can start small (only six machines) and grow the cluster very large.

The starter cluster configuration consists of six servers and the appropriate network gear:

  • One Primary NameNode
  • One Secondary NameNode
  • One EdgeNode
  • Three DataNodes

You can find more details about the functional role and the hardware configuration of each machine in the Dell | Hadoop Reference Architecture.

The recommendation is to distribute these machines evenly across a number of network switches to avoid single points of failure. Ideally, the machines should be installed in different racks with independent power circuits.
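
To make HDFS and MapReduce aware of that rack layout, Hadoop relies on a rack-awareness topology script. Below is a minimal sketch of such a script in Python; the mapping-file path, hostnames, and rack names are illustrative assumptions, not part of the Dell reference architecture.

```python
#!/usr/bin/env python
# Minimal rack-topology script sketch for Hadoop's rack awareness.
# Hadoop invokes the script with one or more node IPs/hostnames as
# arguments and expects one rack path per argument on stdout.
# The mapping file path and host/rack names below are illustrative.
import sys

MAPPING_FILE = "/etc/hadoop/topology.data"   # hypothetical "host rack" pairs
DEFAULT_RACK = "/default-rack"

def load_mapping(path):
    mapping = {}
    try:
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) == 2:          # e.g. "datanode1 /rack1"
                    mapping[parts[0]] = parts[1]
    except IOError:
        pass                                 # fall back to the default rack
    return mapping

if __name__ == "__main__":
    mapping = load_mapping(MAPPING_FILE)
    for node in sys.argv[1:]:
        print(mapping.get(node, DEFAULT_RACK))
```

The script is wired in via the topology.script.file.name property in core-site.xml (net.topology.script.file.name on newer Hadoop releases), and the mapping file would list each of the six starter machines with its rack, for example "datanode1 /rack1".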

Next, you can scale up the cluster by adding DataNodes to each rack. We strongly recommend adding DataNodes in increments of three nodes, one per rack.
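
As a small illustration of that one-node-per-rack rule, the sketch below round-robins a batch of new DataNode hostnames across the three racks of a POD; the hostnames and rack labels are made up for the example.

```python
# Sketch: spread a batch of new DataNodes across the three racks of a POD,
# one node per rack per increment. Host and rack names are made up.
RACKS = ["/rack1", "/rack2", "/rack3"]

def assign_to_racks(new_nodes, racks=RACKS):
    """Return (hostname, rack) pairs, round-robining nodes across the racks."""
    return [(node, racks[i % len(racks)]) for i, node in enumerate(new_nodes)]

# An increment of three DataNodes lands exactly one node in each rack:
for host, rack in assign_to_racks(["datanode4", "datanode5", "datanode6"]):
    print("%s %s" % (host, rack))    # same "host rack" format as topology.data
```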

The reasons to start with three racks (instead of one or two) are:

  1. Most Hadoop deployments replicate the data three times, so the copies will be evenly distributed across the cluster.
  2. If one of the racks goes down, you lose only 33% of the capacity with no loss of data. Starting with two racks means 50% capacity loss when one of the two racks goes down.

If you were to start with only two racks, all the copies would be spread across just those two racks. Losing one rack can leave the cluster running with no redundant copies of some of the data, and losing one more machine in the remaining rack then results in data loss.
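
The capacity math behind those numbers is simple, but the sketch below makes the 33% vs. 50% figures explicit, assuming DataNodes are spread evenly across the racks.

```python
# Fraction of DataNode capacity lost when a single rack goes down,
# assuming DataNodes are spread evenly across the racks.
def capacity_lost_on_rack_failure(num_racks):
    return 1.0 / num_racks

print("3 racks: %.0f%% capacity lost" % (100 * capacity_lost_on_rack_failure(3)))  # ~33%
print("2 racks: %.0f%% capacity lost" % (100 * capacity_lost_on_rack_failure(2)))  # 50%
```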

The 3-rack network design supports up to 60 DataNodes with no need for expensive 10 GigE solutions. To grow the cluster beyond 60 nodes, you need to start a new 3-rack POD. Adding a second POD (and beyond) requires additional 10 GigE routing infrastructure.
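
A quick way to estimate how many PODs (and therefore how much 10 GigE routing) a target cluster size implies is sketched below; the 20-DataNodes-per-rack figure is simply 60 divided by 3 and should be validated against the reference architecture.

```python
import math

DATANODES_PER_POD = 60   # 3 racks x ~20 DataNodes, per the POD design above

def pods_needed(target_datanodes, per_pod=DATANODES_PER_POD):
    """Number of 3-rack PODs required for a target DataNode count."""
    return int(math.ceil(float(target_datanodes) / per_pod))

print(pods_needed(45))    # 1 -> single POD, no 10 GigE routing needed
print(pods_needed(150))   # 3 -> multiple PODs, 10 GigE routing between PODs
```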

You can see now that the Dell Hadoop solution is designed for scalability, both scale-up and scale-out. You can start small, grow the cluster up to 60 nodes with no need for 10 GigE gear, and then expand the cluster by adding additional PODs. Practitioners can use the next figure to map out their own Hadoop scaling journey.

Last but not least, scaling up/out a Hadoop implementation is not only an exercise in infrastructure design. There are other aspects that need to be considered as well:

  • Distribution of the core functions – e.g., at small scale the NameNode and the JobTracker services both run on the NameNode machine. That machine will become a bottleneck once the cluster grows beyond a certain size. The recommendation is to run the NameNode and JobTracker services on different machines (see the config sketch after this list).
  • Distributed operations – operating large-scale systems is no small task. For example, the deployment methodology (and solution) needs to be able to scale, which means scalable bare-metal deployment, a software image repository that can manage multiple versions of the software, etc.
  • Heterogeneity – large-scale systems need to be able to support multiple releases of the hardware and the software. Hadoop clusters are commonly heterogeneous. It is both impractical and unrealistic to assume that system admins can take the *entire* cluster down for software updates and maintenance. Instead, rolling updates will be pushed through the system at different levels of the stack. The applications running on the cluster should not assume a particular version of the Hadoop software stack.
  • Integration endpoints – in most cases, Hadoop is closely integrated with other systems (e.g., social networks, data warehouses, application frameworks, etc.). The connection points with these systems need to be able to sustain the data throughput that a larger system may demand. There will also typically be more than one integration point, depending on the type of interface, data throughput requirements, security requirements, etc.
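
To make the first bullet concrete: on Hadoop releases of the JobTracker era, splitting the HDFS and MapReduce masters comes down to two properties pointing at different hosts, fs.default.name in core-site.xml and mapred.job.tracker in mapred-site.xml. The sketch below simply renders those two snippets; the hostnames and ports are illustrative assumptions.

```python
# Sketch: the two properties that split the HDFS and MapReduce masters onto
# different machines (Hadoop releases of the JobTracker era). Hostnames and
# ports are illustrative, not taken from the Dell reference architecture.
PROPERTY = """  <property>
    <name>%s</name>
    <value>%s</value>
  </property>"""

# core-site.xml -- HDFS clients and DataNodes talk to the NameNode machine
print(PROPERTY % ("fs.default.name", "hdfs://namenode.example.com:8020"))

# mapred-site.xml -- MapReduce clients talk to a separate JobTracker machine
print(PROPERTY % ("mapred.job.tracker", "jobtracker.example.com:8021"))
```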

Scale Up vs. Scale Out

Scaling up (or vertically) means adding more resources to an existing component of a system. Adding more DataNodes to a POD is an example of scaling up the Hadoop cluster. In Dell’s vision, the POD is the component (or building block) of the Hadoop solution.

Scaling out (or horizontally) means adding new components (or building blocks) to a system. Adding a new POD is an example of scaling out the Hadoop cluster.

The main trade-off between scale-up and scale-out is:

  • Scale-up usually means just incremental costs, with no major impact on solution design, performance, complexity, etc. For example, adding more DataNodes inside a POD does not require adding more network switches.
  • Scale-out, aside from incremental costs, adds complexity and in some cases requires a redesign of the solution. For example, adding a new POD to an existing Hadoop cluster adds complexity to managing the overall cluster.

Because scaling up is less expensive, our advice is to take full advantage of the scale-up capabilities of the solution before pursuing a scale-out model.

Comments are always welcome! Also please share with us your experience with Hadoop.

Thanks for your time!