The following table summarizes the main differences between HBase and Cassandra:
HBase | Cassandra |
HBase is based on Google's Bigtable. | Cassandra is based on Amazon's Dynamo design. It was initially developed at Facebook by former Amazon engineers, which is one reason why Cassandra supports multiple data centers. |
HBase uses the Hadoop infrastructure (ZooKeeper, NameNode, HDFS). Organizations that deploy HBase therefore need operational knowledge of both Hadoop and HBase. | Cassandra started and evolved separately from Hadoop, so its infrastructure and operational knowledge requirements differ from Hadoop's. For analytics, however, many Cassandra deployments pair Cassandra with Storm (which uses ZooKeeper), with Hadoop, or with both. |
The HBase-Hadoop infrastructure has several "moving parts": ZooKeeper, the NameNode, the HBase Master, and the DataNodes. ZooKeeper is clustered and naturally fault tolerant. The NameNode needs to be clustered to be fault tolerant. | Cassandra uses a single node type. All nodes are equal and perform all functions. Any node can act as a coordinator, so there is no single point of failure (SPOF). Adding Storm or Hadoop, of course, adds complexity to the infrastructure. |
HBase is well suited to range-based scans. | With random partitioning, Cassandra does not support range-based row scans, which may be limiting in certain use cases. |
HBase provides asynchronous replication of an entire HBase cluster across a WAN. | Cassandra's random partitioning provides replication of individual rows across a WAN. |
HBase supports only ordered partitioning. | Cassandra officially supports ordered partitioning, but production users of Cassandra generally avoid it because of the "hot spots" it creates and the operational difficulties those hot spots cause. |
Because of ordered partitioning, HBase scales horizontally with ease while still supporting row-key range scans. | If data is stored in the columns of a single Cassandra row to support range scans, the practical limit on row size in Cassandra is tens of megabytes. |
HBase supports atomic compare-and-set and transactions within a row. | Cassandra releases before 2.0 do not support atomic compare-and-set. Version 2.0 added it in the form of lightweight transactions. |
HBase does not support read load balancing against a single row. A single row is served by exactly one region server at a time. | Cassandra supports read load balancing against a single row. |
Bloom filters can be used in HBase as another form of indexing. | Cassandra uses Bloom filters for key lookup. |
HBase supports triggers through its coprocessor capability. | Cassandra does not support coprocessor-like functionality. |
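
The partitioning rows in the table can be illustrated with a small sketch. It is not code from either system: the helper names, the split points, and the modulo-based node choice are all assumptions made for illustration (a real cluster uses region boundaries or a token ring). It shows why a contiguous key range stays on one node under ordered partitioning, which makes range scans cheap but can create hot spots, while hashing tends to scatter the same keys across nodes.

```python
import bisect
import hashlib

def ordered_range_scan(sorted_keys, start, stop):
    """Return the keys in [start, stop) from a list kept in sorted order."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]

def ordered_node(key, split_points):
    """Ordered partitioning: the node is the key range the key falls into."""
    return bisect.bisect_right(split_points, key)

def hashed_node(key, num_nodes):
    """Hash partitioning (simplified): the node is derived from a hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

# Hypothetical data set and region boundaries, for illustration only.
keys = sorted(f"user{i:04d}" for i in range(1000))
split_points = ["user0250", "user0500", "user0750"]

scan = ordered_range_scan(keys, "user0100", "user0110")
ordered_nodes = {ordered_node(k, split_points) for k in scan}
hashed_nodes = {hashed_node(k, 4) for k in scan}

print(len(scan), "keys scanned")
print("nodes touched with ordered partitioning:", sorted(ordered_nodes))
print("nodes touched with hash partitioning:", sorted(hashed_nodes))
```

With ordered placement the ten scanned keys all live on node 0, so the scan reads one node sequentially, but a burst of writes to that key range would also land on that single node.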
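
To make the atomic compare-and-set row more concrete, here is a minimal in-memory sketch of the semantics. The `Row` class and its `check_and_put` method are hypothetical names chosen for this example and are not the HBase or Cassandra client API. The point is only that the comparison and the write happen as one indivisible step.

```python
import threading

class Row:
    """Hypothetical single row whose cells can be updated atomically."""

    def __init__(self):
        self._cells = {}
        self._lock = threading.Lock()

    def check_and_put(self, column, expected, new_value):
        """Write new_value only if the current value equals expected.

        The lock makes the compare and the write one atomic step,
        so two concurrent callers cannot both succeed.
        """
        with self._lock:
            if self._cells.get(column) != expected:
                return False
            self._cells[column] = new_value
            return True

    def get(self, column):
        with self._lock:
            return self._cells.get(column)

row = Row()
first = row.check_and_put("owner", None, "client-a")   # column is empty, so this wins
second = row.check_and_put("owner", None, "client-b")  # column is no longer empty
print(first, second, row.get("owner"))
```

A typical use is claiming a resource: only the first client that finds the column empty gets to write its own identifier.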
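
Both columns of the table mention Bloom filters. The sketch below is a generic, simplified Bloom filter, not the implementation used by either database. The class name, the sizes, and the SHA-256-based hashing are assumptions made for illustration. It shows the property that makes the structure useful for key lookup: a negative answer is definitive, so a store can skip reading a file that cannot contain the key.

```python
import hashlib

class BloomFilter:
    """Simplified Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was definitely never added.
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter()
for row_key in ("row-1", "row-2", "row-3"):
    bf.add(row_key)

print(bf.might_contain("row-2"))
```

A lookup first asks the filter. Only when `might_contain` returns `True` does the store need to do the more expensive read to confirm the key is really present.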