What is Cassandra?
Apache Cassandra is a highly scalable, distributed NoSQL database that lets you store and manage large volumes of structured data across many commodity servers with no single point of failure.
A NoSQL database is a database that does not store data in the tabular relations used by relational databases. These databases are typically schema-free, support easy replication, have simple APIs, and can handle large amounts of data.
The main objectives of a NoSQL database are:
- Simplicity of design
- Finer control over availability
- Horizontal scaling
NoSQL databases use different data structures than relational databases, which makes some operations faster in NoSQL. These databases are generally fast and easy to access.
Features of Cassandra
Cassandra has become very popular because of its technical features. Let's look at some of them:
Flexible data storage - Cassandra accommodates structured, semi-structured, and unstructured data, and it can adapt dynamically to changes in your data structures as your needs evolve.
Elastic scalability - Cassandra is highly scalable because it allows you to add more hardware to serve more customers and more data on demand.
Always on architecture - Cassandra has no single point of failure, so it remains continuously available for business-critical applications that cannot afford downtime.
Linear-scale performance - Cassandra scales linearly: throughput increases as you add nodes to the cluster, which helps it maintain quick response times.
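Transaction support - Cassandra does not provide full ACID transactions in the relational sense. It offers atomic, isolated, and durable writes at the partition level, tunable consistency per operation, and lightweight transactions for compare-and-set operations.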
Data distribution - Cassandra provides the flexibility to distribute data anywhere by replicating data across multiple data centers.
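To make the data distribution and scalability features more concrete, here is a minimal sketch of how a key can be mapped to several replica nodes on a hash ring. It is an illustration only: the `Ring` class, the `token` helper, and the node names are hypothetical, the MD5 hash stands in for Cassandra's real partitioner, and the sketch ignores data centers, racks, and virtual nodes.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (illustrative hash, not Cassandra's partitioner)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical, simplified token ring with one token per node."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Sort nodes by their ring position so we can walk the ring clockwise.
        self._tokens = sorted((token(node), node) for node in nodes)

    def replicas(self, partition_key: str):
        """Return the nodes that would hold a copy of this key."""
        positions = [position for position, _ in self._tokens]
        # First node clockwise from the key's position (wraps around via modulo).
        start = bisect.bisect_right(positions, token(partition_key))
        count = min(self.replication_factor, len(self._tokens))
        return [
            self._tokens[(start + i) % len(self._tokens)][1]
            for i in range(count)
        ]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")  # three distinct node names
```

Because each key is owned by a small, predictable set of nodes, adding a node to the ring only changes ownership for the keys near its position, which is the intuition behind scaling out by adding hardware.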
Architecture of Cassandra
Some of the important components of the Cassandra architecture are as follows:
Cluster – the complete set of one or more data centers across which all of the database's data is stored
Data center – a group of related nodes within a cluster
Node – a single server where data is stored; it is the basic unit of a cluster
Mem-table – an in-memory data structure. After a write is recorded in the commit log, the data is written to the mem-table. A single column family can have multiple mem-tables
SSTable – when the mem-table reaches its threshold size, its data is flushed to a disk file called an SSTable
Bloom filter – a fast, probabilistic data structure for testing whether an element is a member of a set. It can return false positives but never false negatives. Cassandra consults Bloom filters on read queries to skip SSTables that cannot contain the requested data.
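The Bloom filter idea from the list above can be sketched in a few lines. This is a teaching sketch, not Cassandra's implementation: the `BloomFilter` class, its default sizes, and the use of SHA-256 with a seed prefix are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Hypothetical minimal Bloom filter for string items."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the code simple; a real filter would pack bits.
        self.bits = bytearray(size_bits)

    def _positions(self, item: str):
        # Derive several bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[position] for position in self._positions(item))


sstable_filter = BloomFilter()
sstable_filter.add("row-1")
```

A read path can call `might_contain` before touching a disk file: a `False` answer means the file can be skipped safely, while a `True` answer still requires checking the file because it may be a false positive.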
Cassandra's column-oriented approach to data storage makes it easy to store data in which each row of a column family can contain a different number of columns, and the column names do not need to match from row to row. Because Cassandra uses a log-structured storage engine, it can sustain high-speed write operations, which suits workloads that store and analyze large volumes of data.
Because it persists data durably while serving recent writes from memory, Cassandra can be easily deployed for storing key-value data that needs high availability.
Since much of the Big Data available nowadays is in an unstructured format, it makes sense to integrate the NoSQL database Cassandra with Hadoop applications. This integration with the Hadoop framework is another reason Cassandra has seen wide deployment. It is possible to run MapReduce jobs that read from and write to a Cassandra database.
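The write path described here can be illustrated with a toy model. The sketch below is a simplification under stated assumptions: `TinyStore` and `memtable_limit` are hypothetical names, the flush threshold counts rows rather than bytes, and real Cassandra reconciles values by timestamp and compacts SSTables, which this sketch omits.

```python
class TinyStore:
    """Hypothetical toy model of a log-structured write path."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.commit_log = []  # append-only record of every write
        self.memtable = {}    # row key -> {column name: value}
        self.sstables = []    # flushed tables, oldest first, never modified

    def write(self, row_key, columns):
        # 1. Record the write in the commit log, 2. apply it in memory.
        self.commit_log.append((row_key, dict(columns)))
        self.memtable.setdefault(row_key, {}).update(columns)
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as a table sorted by row key, then start fresh.
        if not self.memtable:
            return
        rows = sorted(self.memtable.items(), key=lambda item: item[0])
        self.sstables.append({key: dict(cols) for key, cols in rows})
        self.memtable = {}

    def read(self, row_key):
        # Merge oldest to newest so that more recent column values win.
        merged = {}
        for sstable in self.sstables:
            merged.update(sstable.get(row_key, {}))
        merged.update(self.memtable.get(row_key, {}))
        return merged


store = TinyStore(memtable_limit=2)
store.write("u1", {"name": "Ann"})
store.write("u2", {"name": "Bob", "city": "Oslo"})  # triggers a flush
store.write("u1", {"email": "ann@example.com"})
```

Note that the two rows carry different sets of columns, and that writes only ever append to the log and update memory, which is why this style of engine favors fast writes.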
Top Interview questions for Apache Cassandra:
1) What is Apache Cassandra?
2) What is NoSQL?
3) Which language is Cassandra written in?
4) How many types of NoSQL databases are there?
5) What is the relationship between Apache Hadoop, HBase, Hive and Cassandra?
6) What is SSTable?
7) How does Cassandra store data?
8) Define Mem-table in Cassandra.
9) What are “Seed Nodes” in Cassandra?
10) Explain the Cassandra data model.
11) What are CQL collections in Cassandra?
12) What do you understand by Bloom filter in Cassandra?
13) What do you understand by Column Family?
14) What do you understand by CQL?
15) What is the use of “cqlsh --version” command?
16) Do you have any experience in virtual machine automation?
17) What are the collection data types provided by CQL?
18) Which command is used to start the cqlsh prompt?
19) What is the use of “void close()” method?
20) What is cqlsh?
21) What does JMX stand for?
22) What do you understand by Thrift?
23) What is Zero Consistency?
24) What are secondary indexes?
25) What ports does Cassandra use?
26) What do you understand by High availability?
27) When to use secondary indexes?
28) When to avoid secondary indexes?
29) When should you not use Cassandra?
30) Define Memtable?
Conclusion: Learning Apache Cassandra can help take your career to the next level. Cassandra is a powerful tool with features that make it one of the best NoSQL databases to integrate with Hadoop.
The Cassandra interview questions listed above will give you an idea of the questions that get asked in software engineering and tech jobs that involve Cassandra. Use them to prepare for your next Cassandra interview.