Cassandra

Where Did Cassandra Come From
• Cassandra originated at Facebook in 2007 to
solve that company’s inbox search problem
– large volumes of data
– many random reads
– many simultaneous random writes
• was released as an open source Google Code
project in July 2008
• March 2009 it was moved to an Apache Incubator
project
• February 17, 2010 it was voted into a top-level
project

Cassandra in 50 Words or Less
• Apache Cassandra is an
– open source
– distributed
– Decentralized
– elastically scalable
– highly available
– fault-tolerant
– tuneably consistent
– column-oriented
• Database that
• bases its distribution design on Amazon’s Dynamo
• its data model on Google’s Bigtable
• Created at Facebook
• it is now used at some of the most popular sites on the Web

Who Is Using Cassandra
• Twitter is using Cassandra for analytics.
• Mahalo uses it for its primary near-time data store.
• Facebook still uses it for inbox search, though they are using a
proprietary fork.
• Digg uses it for its primary near-time data store.
• Rackspace uses it for its cloud service, monitoring, and logging.
• Reddit uses it as a persistent cache.
• Cloudkick uses it for monitoring statistics and analytics.
• Ooyala uses it to store and serve near real-time video analytics
data.
• SimpleGeo uses it as the main data store for its real-time location
infrastructure.
• Onespot uses it for a subset of its main data store

Decentralized

• Master/slave:
Decentralized Master/slave
all nodes are the same, If the master node fails, the
failures of a whole database is in jeopardy
node won’t disrupt service

Elastic Scalability
• add another machine—Cassandra will find it
and start sending it work

High Availability and Fault Tolerance

SCID
• Atomic
– All or nothing
• Consistent

• Isolated
– Two transaction modify same data
• Durable

Brewer’s CAP Theorem
• you can strongly support only two of the Three:
– Consistency
• All database client will read the same value for same query,
even given concurrent updates
– Availability
• All database clients will always be able to read and write
data
– Partition Tolerance
• The database can be split into multiple machines
• It can continue functioning in fact of network segmentation
breaks

usage
• Connect localhost/9160 ;
• Show cluster name
• Show keyspaces
• Create keyspace XXXXX
• Use XXXXX
• Create column family YYYYY
• Describe keyspace XXXXX

• Set YYYYY[“XiaoMing”][“name”] = “小明”
• Get YYYYY[“XiaoMing”]

• List
• Map
• MapList<row_id, Map>

• Column Family 列簇
• create column family User
with key_validation_class=UTF8Type

Clusters (Ring)
• If the first node goes down, a replica can
respond to queries. The peer-to-peer protocol
allows the data to replicate across nodes in a
manner transparent to the user

• Replaction factor

Keyspaces
• Don’t add too much Keyspaces

• (database)

Gossip protocols
• intra-ring communication so that each node
can have state information about other nodes
• Runs every second
• Gossip Message:
– Send: GossipDigestSynMessage
– Ack: GossipDigestAckMessage
– send: GossipDigestAck2Message
• algorithm :
– Phi Accrual Failure Detection

Anti-entropy
• Anti-entropy is the replica synchronization
mechanism in Cassandra for ensuring that
data on different nodes is updated to the
newest version
• Merkle tree

Memtable&SSTable&CommitLog
• Memtable
– Value is written to a memory-resident data structure
• SSTable
– Include: Data, Index, and Filter
– concept borrowed from Google’s Bigtable
– Memtable reaches a threshold, flushed to disk
• Commit log
– Flush status: 0 / 1
• 1:start to flush
• 0: flush success

hinted handoff & Compaction
• hinted handoff
– When a write no available
– Create a hint to node Cassandra

• Compaction:
– In order to merge SSTable
– merged data is sorted
– new index is created over the sorted data

major compaction
• stored in memory
• used to improve performance by reducing disk
access on key lookups

Tombstones 墓碑
• Knows as “soft delete”
• Not immediately deleted after execute a
delete operation
• Garbage Collection Grace Seconds:
– GCGraceSeconds
• Default: 10 days (864000 sec)

Staged Event-Driven Architecture
(SEDA)
• originally proposed in a 2001 paper called “SEDA: An
Architecture for Well-Conditioned, Scalable Internet
Services”
• A stage consists of an incoming event queue
– Read
– Mutation
– Gossip
– Response
– Anti-Entropy
– Load Balance
– Migration
– Streaming
– …

Custom FactoryUtil
• Prevent version uncompatible

Configuring Cassandra
• system_add_keyspace
– Creates a keyspace.
• system_rename_keyspace
– Changes the name of a keyspace after taking a snapshot of it. Note that this
method
– blocks until its work is done.
• system_drop_keyspace
– Deletes an entire keyspace after taking a snapshot of it.
• system_add_column_family
– Creates a column family.
• system_drop_column_family
– Deletes a column family after taking a snapshot of it.
• system_rename_column_family
– Changes the name of a column family after taking a snapshot of it. Note that
this
– method blocks until its work is done.

Creating a Column Family
• column_type
– Either Super or Standard.
• clock_type
– The only valid value is Timestamp.
• comparator
– Valid options include AsciiType, BytesType, LexicalUUIDType, LongType, TimeUUID Type, and UTF8Type.
• subcomparator
– Name of comparator used for subcolumns when the column_type is Super. Valid options are the same as comparator.
• reconciler
– Name of the class that will reconcile conflicting column versions. The only valid value at this time is Timestamp.
• comment
– Any human-readable comment in the form of a string.
• rows_cached
– The number of rows to cache.
• preload_row_cache
– Set this to true to automatically load the row cache.
• key_cache_size
– The number of keys to pull into the cache.
• read_repair_chance
– Valid values are a number between 0.0 and 1.0.

Replicas
• Simple Strategy
– RackUnawareStrategy
• Old Network Topology Strategy
– RackAwareStrategy
• Network Topology Strategy
– DataCenterShardStrategy
– datacenter.properties

Replication Factor
• specifies how many copies of each piece of
data will be stored and distributed throughout
the Cassandra cluster
• Factor = 1 : your data will exist only in a single
node in the cluster. Losing that node means
that data becomes unavailable

Increasing the Replication Factor
• Nodes grows and should increasing factor
• How to do:
– ensure that all the data is flushed to the SSTables
• flush -h 192.168.1.1 -p 9160
– stop that node
– copy the datafiles from your keyspaces
– Paste those datafiles to the new node

Replica Placement Strategies
• Simple Strategy
• Old Network Topology Strategy
• Network Topology Strategy

Adding Nodes to a Cluster
• If you want to add a new seed node, then you should
autobootstrap it first, and then change it to a seed
afterward

• Node1:
– listen_address: 192.168.1.1
– rpc_address: 0.0.0.0
• Node2:
– auto_bootstrap: true
– listen_address: 192.168.2.34
– rpc_address: 0.0.0.0

Hector
• Cluster myCluster =
HFactory.getOrCreateCluster("Test Cluster",
"192.168.2.3:9160");

• ThriftCfDef columnFamilyDefinition = new
ThriftCfDef("s3","nb",ComparatorType.UTF8TYPE
);
•
columnFamilyDefinition.setReplicateOnWrite(tru
e);

Hector
• ThriftCfDef columnFamilyDefinition = new
ThriftCfDef("s3","bb",ComparatorType.UTF8TYPE);
•
columnFamilyDefinition.setKeyValidationClass("org.apache.
cassandra.db.marshal.UTF8Type");
•
columnFamilyDefinition.setDefaultValidationClass("org.apa
che.cassandra.db.marshal.UTF8Type");
•
//myCluster.addColumnFamily(columnFamilyDefinition) ;
• columnFamilyDefinition.setId(1013);
•
myCluster.updateColumnFamily(columnFamilyDefinition);

Hector
• Keyspace myKeyspace =
HFactory.createKeyspace("s3", myCluster);
• Mutator<String> mutator =
HFactory.createMutator(myKeyspace,
StringSerializer.get());

• mutator.insert("b", "bb",
HFactory.createStringColumn("column1", "你好
在"));

Hector
• ColumnQuery q = HFactory.createColumnQuery(myKeyspace,
StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
• // set key, name, cf and execute
• QueryResult<HColumn> r = q
• .setColumnFamily("bb")
• .setKey("b")
• .setName("column1")
• .execute();
• // read value from the result
• HColumn<String,String> c = r.get();
• String value = c.getValue();
• System.out.println(value);

Cassandra

Related slideshows

Recommended for you

Recommended for you

Recommended for you

Recommended for you

Recommended for you

Recommended for you

Recommended for you

Recommended for you

Recommended for you

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (14)

Similar to Cassandra

Similar to Cassandra (20)

More from exsuns

More from exsuns (6)

Recently uploaded

Recently uploaded (20)

Cassandra