Isilon HDFS and Rack Awareness for DataNode Connections

Wanted to clear up how racks are implemented with Isilon HDFS and how they provide rack awareness. Using Isilon racks we can give clients node-location awareness and keep DataNode connections within a switch. This gives you switch-aware, or top-of-rack, functionality that emulates a local-storage HDFS deployment, where compute nodes attempt to read data blocks from the closest DataNode, staying within the rack and behind a single top-of-rack switch so that HDFS traffic does not cross interconnects or traverse the core network.

This is implemented on the Isilon cluster using SmartConnect Advanced and HDFS racks. The enclosed diagram illustrates how a number of network pools can be used to give specific clients DataNode access to specific Isilon nodes.

(For the purposes of this blog, we assume one top-of-rack switch per rack, with the Isilon nodes and compute nodes collocated in the same rack. It does not have to be physically implemented this way; devices can be located anywhere. What is critical is that the network connections for a ‘rack’ are within the same switch/blade, so traffic does not need to cross interconnects or the core.)

This location-aware (specific-node) access is implemented using a network pool with a SmartConnect name for NameNode access, then adding additional pools (SmartConnect name optional) for each rack-aware set of clients you wish to delegate specific node access to.

NameNode Pool:

-Has a SmartConnect zone name, used by all clients for all NN and SN connections via access zone assignment; the SmartConnect name must be valid in DNS

-Dynamic pool, round-robin connection policy

-Contains all relevant node interfaces, even across racks/switches; provides maximum resiliency and scale (see the sketch below)
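
As a minimal sketch of that pool (the groupnet, subnet, pool, interface, zone, and DNS names here are all hypothetical, and exact flags vary by OneFS release, so check isi network pools create --help), the NameNode pool might look like:

    # Dynamic, round-robin pool spanning all node interfaces (1-8 here),
    # with the SmartConnect zone name clients use for NN connections.
    # Interface names (e.g. 10gige-1) vary by node model.
    isi network pools create groupnet0.subnet0.pool-nn \
        --ranges=10.1.1.10-10.1.1.29 \
        --ifaces=1-8:10gige-1 \
        --alloc-method=dynamic \
        --sc-connect-policy=round_robin \
        --sc-dns-zone=hdfs.example.com \
        --access-zone=zone-hdfs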

Rack Pools (one per rack):

-No SmartConnect name required

-Dynamic pool, round-robin connection policy

-Should contain only the node interfaces to which you wish to delegate DataNode access for a subset of clients (by IP), e.g. all the interfaces of nodes in the same rack/switch as the compute clients accessing them (see the sketch below)
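
A rack pool for, say, rack 0 could look like the following sketch (again, all names and ranges are hypothetical). Note there is no --sc-dns-zone, since rack pools do not need a SmartConnect name:

    # Dynamic, round-robin pool containing only the interfaces of the
    # Isilon nodes (1-4 here) that share a switch with rack 0's clients.
    isi network pools create groupnet0.subnet0.pool-rack0 \
        --ranges=10.1.1.30-10.1.1.49 \
        --ifaces=1-4:10gige-1 \
        --alloc-method=dynamic \
        --sc-connect-policy=round_robin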

Then, within isi hdfs, you assign the client source IPs/IP ranges to the allocated Isilon rack/pool. This ensures that when a client makes a NameNode request to obtain a DataNode connection, we return only the nodes assigned to the rack/pool associated with that client's IP. Since all the client IPs within a single switch are associated with all the Isilon node interfaces within that same switch, DataNode traffic is kept within that rack/switch, minimizing the network hops needed to retrieve data.
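
On OneFS 8.x that mapping might look like the sketch below (the rack name, client range, and zone are hypothetical; verify the flags with isi hdfs racks create --help on your release):

    # Map rack 0's client IP range to the rack 0 pool; NameNode responses
    # to these clients will only return DataNode IPs from subnet0:pool-rack0.
    isi hdfs racks create /rack0 \
        --client-ip-ranges=10.1.2.100-10.1.2.199 \
        --ip-pools=subnet0:pool-rack0 \
        --zone=zone-hdfs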

This can be scaled out as needed, with any number of DataNode pools created and assigned to client IP racks to partition access to the required Isilon nodes and interfaces. It can be implemented on a single HDFS root, or to provide multi-tenancy for multiple HDFS roots on Isilon. The key is that these rack pools do not need SmartConnect names, just IPs in a pool that can be delegated to a rack.
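
Scaling out is just more of the same: a second rack, under the same hypothetical naming, adds another pool and another rack mapping, and isi hdfs racks list shows the resulting client-range-to-pool assignments:

    # Second rack/pool pair: nodes 5-8 serve a second client IP range
    isi network pools create groupnet0.subnet0.pool-rack1 \
        --ranges=10.1.1.50-10.1.1.69 \
        --ifaces=5-8:10gige-1 \
        --alloc-method=dynamic \
        --sc-connect-policy=round_robin
    isi hdfs racks create /rack1 \
        --client-ip-ranges=10.1.3.100-10.1.3.199 \
        --ip-pools=subnet0:pool-rack1 \
        --zone=zone-hdfs

    # Confirm the rack definitions in the access zone
    isi hdfs racks list --zone=zone-hdfs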




You likely don’t need to use racks with Isilon if:

-you don’t have nodes and clients separated into racks

-you have all your clients accessing all the interfaces on all nodes

-you have only one HDFS root on your cluster


Racks complicate configuration, and all they do is give clients DataNode access to a specific subset of Isilon node interfaces. Determine whether this is what you need; otherwise, use the default no-rack configuration, where DataNode access is based on the same SmartConnect dynamic pool in use for the NameNode.
