In the previous post we learnt about the Cassandra data model and replication concepts; in this post we will look at the Cassandra architecture and read/write internals.
Architecture | Highlights
- Cassandra was designed with the system and hardware failures that occur in the real world in mind.
- It is a peer-to-peer distributed system in which all nodes are alike, which results in a read/write-anywhere design.
- Data is transparently partitioned among all nodes in the cluster.
- Custom data replication is provided out of the box to ensure fault tolerance.
- In a Cassandra cluster, each node communicates with the others through the gossip protocol, which exchanges information across the cluster every second.
- A commit log is used on each node to capture write activity, which ensures data durability.
- At the same time, data is also written to an in-memory structure (a memtable) and then flushed to disk (as an SSTable) once the memory structure is full.
- A row in a column family is indexed by its key. Other columns may be indexed as well; indexes are needed to search Cassandra quickly. Note that in Cassandra an index is effectively another table.
- Consistency can be chosen anywhere between strong and eventual (from all nodes to any one node responding), depending on the need. It can be set on a per-request basis, for both reads and writes.
- Data compression is provided out of the box. It uses Google's Snappy compression algorithm and compresses data at the column family level. There is no known performance penalty from compression.
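To make the write path from the highlights above more concrete, here is a minimal Python sketch of a single node: every write is appended to a commit log and applied to a memtable, and the memtable is flushed to an immutable, sorted SSTable once it is full. This is a toy model and not Cassandra's actual implementation. The class `ToyNode`, the `memtable_limit` threshold (counted in keys rather than bytes) and the simple `read` lookup are all assumptions made for illustration.

```python
class ToyNode:
    """Toy model of one node's write path; NOT Cassandra's real code."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # append-only record of every write (durability)
        self.memtable = {}     # in-memory structure holding recent writes
        self.sstables = []     # immutable, sorted snapshots standing in for disk files
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (number of keys)

    def write(self, key, value):
        # 1. Record the write in the commit log first.
        self.commit_log.append((key, value))
        # 2. Apply it to the in-memory memtable.
        self.memtable[key] = value
        # 3. Flush to an SSTable once the memtable is full.
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # An SSTable is sorted by key and never modified after it is written.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        # Simplified read (an assumption): check the memtable first,
        # then the SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None


node = ToyNode(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # memtable is now full, so it is flushed to an SSTable
node.write("a", 3)   # the newer value for "a" lives in the fresh memtable
print(node.read("a"), node.read("b"))  # prints "3 2"
```

Note that the old value of "a" is still sitting in the first SSTable; the read simply finds the newer one first. The sketch leaves out replication, compaction and recovery from the commit log after a crash.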