Clusters and Storage Architecture

To meet demanding business-continuity requirements such as optimal performance, disaster recovery, load balancing, scalability and high availability, organizations increasingly deploy their databases in cluster configurations. A cluster, to recall, is a group of resources linked together to work on a single task as if they were one. In the database context, a cluster can be understood as a set of dedicated servers that together operate on a single database; the components here are the servers' computing resources, namely processors and memory. Once a cluster is in place, the storage architecture of the underlying DBMS plays a crucial role. There are basically two storage architectures for a clustered DBMS: shared nothing and shared everything. As the names suggest, they describe how data is distributed across, and accessed by, the nodes of the cluster. Each has its own pros and cons.

Let us look at both architectures one by one.

1. Shared nothing: this is the simpler architecture, in which each server (node) in the cluster owns its own data and shares none of it with the other nodes; hence the name. When you implement a clustering solution with a shared nothing architecture, you must split the data across the nodes. The split can be logical or physical, and the process of splitting is called partitioning. In logical partitioning, the data is split so that each server's responsibility is easy to understand. Ex: node 1 owns the data of the “Sales” department, node 2 owns the data of the “Purchase” department, and so on. In physical partitioning, the data is simply distributed over the servers in fixed portions. When a request for data is made, it is processed with the help of a routing table that forwards the request to the node that owns the data. This is the basis of distributed transactions.
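The routing idea behind shared nothing can be sketched in a few lines of Python. This is a minimal illustration, not any DBMS's actual mechanism: the node names, the extra “Finance” department and the choice of CRC32 hashing for physical partitioning are all hypothetical assumptions. Logical partitioning is modeled as an explicit department-to-node routing table, physical partitioning as a hash of the row key modulo the number of nodes.

```python
import zlib

# Hypothetical node names; a real cluster would use its own identifiers.
NODES = ["node1", "node2", "node3"]

# Logical partitioning: each department is owned by exactly one node.
DEPARTMENT_ROUTING_TABLE = {
    "Sales": "node1",
    "Purchase": "node2",
    "Finance": "node3",  # hypothetical third department
}


def route_logical(department: str) -> str:
    """Return the node that owns all rows for the given department."""
    try:
        return DEPARTMENT_ROUTING_TABLE[department]
    except KeyError:
        raise LookupError(f"no node owns department {department!r}") from None


def route_physical(row_key: str) -> str:
    """Physical (hash) partitioning: spread keys over the nodes.

    zlib.crc32 is used instead of the built-in hash() because it is stable
    across processes, so every router computes the same owner for a key.
    """
    return NODES[zlib.crc32(row_key.encode("utf-8")) % len(NODES)]


if __name__ == "__main__":
    print(route_logical("Sales"))        # node1
    print(route_physical("order-1001"))  # always the same one of node1..node3
```

Either way, a request can only be served by the one node that owns the data, which is why the routing table is central to this architecture.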

2. Shared everything: shared everything, or simply shared disk, uses an array of disks that holds all of the data in the database, and every node in the cluster acts on this single collection of data. The disk array is typically a SAN or NAS. All nodes in the cluster have access to all the data at any point in time, so any node can serve a request for any data and there is no need to split the data. Instead of forwarding a request to the particular server that owns the data, a shared everything cluster can simply route it to the next available node. This is the basis of load balancing.
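By contrast, a shared everything router does not need to know where data lives. The sketch below assumes a simple round-robin policy that skips nodes marked as down; the class and method names are hypothetical, and real cluster load balancers may use other policies (least connections, weighted, and so on).

```python
import itertools
from typing import Dict, Iterable


class SharedDiskRouter:
    """Round-robin router for a shared everything cluster (a sketch).

    Because every node can read any row from the shared SAN/NAS storage,
    the router never inspects which data a request touches; it only picks
    the next node that is currently available.
    """

    def __init__(self, nodes: Iterable[str]) -> None:
        self._nodes = list(nodes)
        self._available: Dict[str, bool] = {n: True for n in self._nodes}
        self._cycle = itertools.cycle(self._nodes)

    def mark_down(self, node: str) -> None:
        self._available[node] = False

    def mark_up(self, node: str) -> None:
        self._available[node] = True

    def route(self, request: str) -> str:
        # The request content is deliberately ignored: any node can serve it.
        # Try each node at most once before giving up.
        for _ in range(len(self._nodes)):
            node = next(self._cycle)
            if self._available[node]:
                return node
        raise RuntimeError("no node available")


if __name__ == "__main__":
    router = SharedDiskRouter(["node1", "node2", "node3"])
    print([router.route("SELECT ...") for _ in range(4)])
    # ['node1', 'node2', 'node3', 'node1']
```

If a node fails, it is simply skipped, and the remaining nodes keep serving every request from the same shared storage.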

The choice of architecture depends on the needs of the organization, which can be evaluated against the parameters below. DBAs have a lot to compare when deciding which architecture will meet those needs better.

1. Setup: the choice begins at setup itself. With shared nothing, you have to partition the data across the servers using a suitable partitioning scheme, weigh the cost of ownership, and build the routing tables that direct data requests to the right node. With shared everything, none of this is needed, because all servers access the data centrally.

2. Data maintenance: once the business is live, maintaining the data becomes a crucial consideration, because data keeps growing and changing. A partitioning scheme may become sub-optimal over time, which can lead to poor database performance. With shared nothing, the only way to overcome this is to re-partition the data from time to time, whereas shared everything is free from this expense.
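To get a feel for why re-partitioning is expensive, the sketch below counts how many rows change owner when a cluster grows. It assumes plain modulo partitioning on integer keys (key k lives on node k mod N); this is an illustrative assumption, and schemes such as consistent hashing are designed to move far fewer rows.

```python
def owner(key: int, num_nodes: int) -> int:
    """Simple modulo partitioning: key k lives on node k % num_nodes."""
    return key % num_nodes


def fraction_moved(num_keys: int, old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose owning node changes after re-partitioning."""
    moved = sum(
        1 for k in range(num_keys) if owner(k, old_nodes) != owner(k, new_nodes)
    )
    return moved / num_keys


if __name__ == "__main__":
    # Growing from 3 to 4 nodes moves 3 out of every 4 keys.
    print(fraction_moved(12_000, 3, 4))  # 0.75
```

Under this scheme, adding a single node forces most of the data to be physically moved between servers, which is the kind of maintenance cost a shared everything cluster does not pay.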

3. Performance overhead: most of us are familiar with inter-nodal messaging, the node-level and cluster-status information exchanged among all nodes in a cluster. These messages carry information on data locking, buffering, node heartbeats and other coordination details. The cost of this messaging grows linearly with the number of nodes in the cluster. This trade-off concerns shared everything; shared nothing nodes own their data outright, so they do not need to coordinate locks and buffers with one another.
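The growth of inter-nodal messaging can be illustrated with heartbeats alone. The sketch assumes, hypothetically, that every node sends one heartbeat to every other node per interval; real cluster managers may use different topologies. Under that assumption each node's messaging grows linearly with cluster size, while the total traffic across the cluster grows roughly with the square of the node count.

```python
from typing import Tuple


def heartbeats_per_interval(num_nodes: int) -> Tuple[int, int]:
    """Count heartbeat messages in one interval, assuming all-to-all heartbeats.

    Returns (messages sent by each node, total messages across the cluster).
    """
    if num_nodes < 1:
        raise ValueError("a cluster needs at least one node")
    per_node = num_nodes - 1  # one message to every other node
    return per_node, num_nodes * per_node


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(n, heartbeats_per_interval(n))
    # 2 (1, 2)
    # 4 (3, 12)
    # 8 (7, 56)
    # 16 (15, 240)
```

Lock and buffer messages in a shared everything cluster add to this baseline, since they depend on how often nodes touch the same data.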

4. Data retrieval speed: with shared everything, the system can experience some additional latency when retrieving data from the NAS or SAN, while shared nothing databases read data from local disks at faster bus speeds.

There are other parameters to consider when selecting a storage architecture, such as failover and load-balancing capabilities, data consistency and scalability, but the ones above are the basics. DBAs must understand the needs of the organization and both architectures well.

Compare the benchmark performance of both architectures, including the overheads involved, and only then implement.
