©2013 DataStax Confidential. Do not distribute without consent.
@spyced
Jonathan Ellis
CTO, DataStax / Project Chair, Apache Cassandra
Cassandra 2.0
Five Years of Cassandra
[Timeline chart: Cassandra releases 0.1, 0.3, 0.6, 0.7, 1.0, 1.2, ..., 2.0, and DSE, plotted from Jul-08 to Jul-13]
Core values
[Chart: Cassandra, HBase, Redis, and MySQL compared; y-axis 0 to 80,000, x-axis 0 to 12]
• Massive scalability
• High performance
• Reliability/Availability
CREATE TABLE users (
id uuid PRIMARY KEY,
name text,
state text,
birth_date int
);
CREATE INDEX ON users(state);
SELECT * FROM users
WHERE state = 'Texas'
AND birth_date > 1950;
New Core Value
• Massive scalability
• High performance
• Reliability/Availability
• Ease of use
CQL delivers
"Coming from a relational database background we found
the transition to Cassandra to be very straightforward. There are a
few simple key concepts one must grasp at first but ever since it's
been smooth sailing for us."
Boris Wolf, Comcast
*Key concepts?
*Data Modeling section of documentation: http://www.datastax.com/documentation/cassandra/1.2/index.html#cassandra/ddl/ddl_anatomy_table_c.html
1.2 for Developers
• CQL3
• Thrift compatibility
• Collections
• Data dictionary
• Auth support
• Hadoop support
• Native drivers
• Tracing
• Atomic batches
Authentication
[cassandra.yaml]
authenticator: PasswordAuthenticator
# DSE offers KerberosAuthenticator as well
CREATE USER robin WITH PASSWORD 'manager' SUPERUSER;
ALTER USER cassandra WITH PASSWORD 'newpassword';
LIST USERS;
DROP USER cassandra;
Authorization
[cassandra.yaml]
authorizer: CassandraAuthorizer
GRANT select ON audit TO jonathan;
GRANT modify ON users TO robin;
GRANT all ON ALL KEYSPACES TO lara;
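To make the effect of these GRANT statements concrete, here is a toy, in-memory model of the permission check. It is a hypothetical sketch, not CassandraAuthorizer's code: the class and method names are ours, and the resource hierarchy is simplified to "a table" or "ALL KEYSPACES" (the real authorizer also has keyspace-level resources). The permission names follow CQL's SELECT, MODIFY, CREATE, ALTER, DROP, and AUTHORIZE.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical permission set mirroring CQL permission names.
enum Permission { SELECT, MODIFY, CREATE, ALTER, DROP, AUTHORIZE }

// Toy model of GRANT semantics; not Cassandra's implementation.
class ToyAuthorizer {
    static final String ALL_KEYSPACES = "ALL KEYSPACES";
    // user -> resource (table name or ALL KEYSPACES) -> granted permissions
    private final Map<String, Map<String, Set<Permission>>> grants = new HashMap<>();

    // Equivalent of: GRANT <perms> ON <resource> TO <user>
    void grant(String user, String resource, Set<Permission> perms) {
        Map<String, Set<Permission>> byResource = grants.get(user);
        if (byResource == null) {
            byResource = new HashMap<>();
            grants.put(user, byResource);
        }
        Set<Permission> existing = byResource.get(resource);
        if (existing == null) {
            existing = EnumSet.noneOf(Permission.class);
            byResource.put(resource, existing);
        }
        existing.addAll(perms);
    }

    // A permission granted on ALL KEYSPACES also applies to every table.
    boolean isAllowed(String user, String table, Permission p) {
        Map<String, Set<Permission>> byResource = grants.get(user);
        if (byResource == null) return false;
        return has(byResource.get(table), p) || has(byResource.get(ALL_KEYSPACES), p);
    }

    private static boolean has(Set<Permission> perms, Permission p) {
        return perms != null && perms.contains(p);
    }
}
```

Replaying the slide's three statements, jonathan may only read `audit`, robin may only write `users`, and lara may do anything anywhere.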
Native drivers
• CQL native protocol: efficient, lightweight, asynchronous
• Java (GA): https://github.com/datastax/java-driver
• .NET (Beta): https://github.com/datastax/csharp-driver
• Python (Beta): https://github.com/datastax/python-driver
• Coming soon: PHP, Ruby, others
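The native protocol is asynchronous because each request frame is tagged with a stream id, so one connection can carry many requests at once and responses may return in any order. The sketch below is a hypothetical, in-memory model of that client-side bookkeeping; the class and method names are ours, not the Java driver's, and the limit of 128 concurrent streams per connection is our understanding of protocol version 1 (an assumption).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: multiplexing in-flight requests over one connection by stream id.
class StreamMultiplexer {
    static final int MAX_STREAMS = 128; // assumption: v1 client stream ids are 0..127
    private final Deque<Integer> freeIds = new ArrayDeque<>();
    private final Map<Integer, CompletableFuture<String>> inFlight = new HashMap<>();

    StreamMultiplexer() {
        for (int i = 0; i < MAX_STREAMS; i++) freeIds.add(i);
    }

    // Reserves a stream id for a new request; a real client would write a frame tagged with it.
    synchronized int send(CompletableFuture<String> result) {
        Integer id = freeIds.poll();
        if (id == null) throw new IllegalStateException("all streams busy");
        inFlight.put(id, result);
        return id;
    }

    // Called when a response frame arrives, in any order; completes the matching request.
    void onResponse(int streamId, String body) {
        CompletableFuture<String> f;
        synchronized (this) {
            f = inFlight.remove(streamId);
            if (f == null) throw new IllegalArgumentException("unknown stream " + streamId);
            freeIds.add(streamId); // the id can be reused by the next request
        }
        f.complete(body); // completed outside the lock so callbacks cannot stall other streams
    }

    synchronized int inFlightCount() {
        return inFlight.size();
    }
}
```

Because responses are matched by id rather than by arrival order, a slow query on one stream does not block faster ones behind it on the same connection.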
1.2 for Operators
• Virtual nodes
• “Dense node” support (5-10TB/machine)
• JBOD improvements
• Off-heap bloom filters, compression metadata
• Parallel leveled compaction
1.2.5+
• ~1/2 memory usage in partition summary
• Improved compaction throttle
• Parallel leveled compaction
• Removed cell-name bloom filters
• Thread-local allocation
• LZ4 compression (default in 2.0)
• (1.2.7) CQL Input/Output for Hadoop
• (1.2.7) Range tombstone performance
• (1.2.9) Larger default LCS sstable size (160MB, up from 5MB)
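Some rough arithmetic shows why the larger LCS default matters. Assuming 1 TB of data per node (counted as 1,048,576 MB) and ignoring compression and partially filled sstables, a 5MB target needs 209,716 sstables while 160MB needs 6,554, a 32x reduction in files to open and track. The helper name is ours.

```java
// Rough arithmetic only: how many fixed-size sstables hold a given amount of data.
class LcsFileCount {
    // Ceiling division: number of sstableSizeMb-sized files needed for dataMb megabytes.
    static long sstablesNeeded(long dataMb, long sstableSizeMb) {
        return (dataMb + sstableSizeMb - 1) / sstableSizeMb;
    }
}
```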
Cassandra 2.0
2.0
• Lightweight transactions
• Triggers (experimental)
• Improved compaction
• CQL cursors
Lightweight transactions: the problem
Session 1
SELECT * FROM users
WHERE username = 'jbellis'
[empty resultset]
INSERT INTO users (...)
VALUES ('jbellis', ...)
Session 2
SELECT * FROM users
WHERE username = 'jbellis'
[empty resultset]
INSERT INTO users (...)
VALUES ('jbellis', ...)
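The race is easiest to see by replaying the interleaving deterministically. In this hypothetical sketch a `HashMap` stands in for the `users` table: both sessions read before either writes, both see an empty result, and the second INSERT silently overwrites the first. The class and method names are ours.

```java
import java.util.HashMap;
import java.util.Map;

// Toy replay of the two sessions above; a HashMap stands in for the users table.
class SignupRace {
    // Returns the final owner of the username after the check-then-act interleaving.
    static String interleave(Map<String, String> users, String username) {
        boolean session1SeesFree = !users.containsKey(username); // Session 1: SELECT -> empty
        boolean session2SeesFree = !users.containsKey(username); // Session 2: SELECT -> empty
        if (session1SeesFree) users.put(username, "session 1");  // Session 1: INSERT
        if (session2SeesFree) users.put(username, "session 2");  // Session 2: INSERT overwrites
        return users.get(username);
    }
}
```

Both sessions believe they created the account, yet only one row survives. Closing this gap requires the check and the write to happen as a single atomic step across replicas.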
Paxos
• All operations are quorum-based
• Each replica sends information about unfinished operations to the leader
during prepare
• Paxos Made Simple
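A minimal single-decree Paxos over in-memory acceptors shows the two ideas on this slide: every phase needs a quorum, and a prepare response carries any value a replica already accepted, so a new proposer finishes the unfinished operation instead of overwriting it. This is a sketch of the textbook protocol, not Cassandra's code: it has no networking or failures, uses plain `long` ballots (Cassandra uses timeuuids), and omits the extra read and commit rounds that bring LWT to four round trips. All names are ours.

```java
import java.util.ArrayList;
import java.util.List;

// Response to a prepare: the highest previously accepted (ballot, value), if any.
class Promise {
    final long acceptedBallot;
    final String acceptedValue;
    Promise(long acceptedBallot, String acceptedValue) {
        this.acceptedBallot = acceptedBallot;
        this.acceptedValue = acceptedValue;
    }
}

class Acceptor {
    private long promised = -1;       // highest ballot promised so far
    private long acceptedBallot = -1; // ballot of the accepted value
    private String acceptedValue;     // null until something is accepted

    // Phase 1: promise to ignore lower ballots and report any accepted value.
    synchronized Promise prepare(long ballot) {
        if (ballot <= promised) return null; // reject stale or duplicate ballots
        promised = ballot;
        return new Promise(acceptedBallot, acceptedValue);
    }

    // Phase 2: accept unless a higher ballot has been promised in the meantime.
    synchronized boolean accept(long ballot, String value) {
        if (ballot < promised) return false;
        promised = ballot;
        acceptedBallot = ballot;
        acceptedValue = value;
        return true;
    }
}

class Proposer {
    // Returns the value chosen under this ballot, or null if no quorum was reached.
    static String propose(List<Acceptor> acceptors, long ballot, String value) {
        int quorum = acceptors.size() / 2 + 1;
        List<Promise> promises = new ArrayList<>();
        for (Acceptor a : acceptors) {
            Promise p = a.prepare(ballot);
            if (p != null) promises.add(p);
        }
        if (promises.size() < quorum) return null;
        // If a replica reported an accepted value, re-propose the highest-ballot one.
        String toPropose = value;
        long highest = -1;
        for (Promise p : promises) {
            if (p.acceptedValue != null && p.acceptedBallot > highest) {
                highest = p.acceptedBallot;
                toPropose = p.acceptedValue;
            }
        }
        int accepts = 0;
        for (Acceptor a : acceptors) {
            if (a.accept(ballot, toPropose)) accepts++;
        }
        return accepts >= quorum ? toPropose : null;
    }
}
```

With three replicas, a second proposer using a higher ballot learns and re-proposes the value already chosen, and a proposer reusing a stale ballot is rejected outright.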
LWT: details
• 4 round trips vs 1 for normal updates
• Paxos state is durable
• Immediate consistency with no leader election or failover
• ConsistencyLevel.SERIAL
• http://www.datastax.com/dev/blog/lightweight-transactions-in-cassandra-2-0
LWT: Use with caution
• Great for 1% of your application
• Eventual consistency is your friend
• http://www.slideshare.net/planetcassandra/c-summit-2013-eventual-consistency-hopeful-consistency-by-christos-kalantzis
Using LWT
UPDATE USERS
SET email = 'jonathan@datastax.com', ...
WHERE username = 'jbellis'
IF email = 'jbellis@datastax.com';
INSERT INTO USERS (username, email, ...)
VALUES ('jbellis', 'jbellis@datastax.com', ...)
IF NOT EXISTS;
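As we understand it, a conditional statement reports its outcome in an `[applied]` column, and when the condition fails it also returns the current values so the client can decide what to do next. The hypothetical in-memory model below mirrors that contract for the two statements above; `UserEmails` and `CasResult` are our names, and `synchronized` stands in for the Paxos round that makes the check-and-write atomic across replicas.

```java
import java.util.HashMap;
import java.util.Map;

// Outcome of a conditional write: applied flag plus the value that blocked it, if any.
class CasResult {
    final boolean applied;
    final String currentEmail; // populated only when the condition failed
    CasResult(boolean applied, String currentEmail) {
        this.applied = applied;
        this.currentEmail = currentEmail;
    }
}

// Hypothetical model of conditional INSERT/UPDATE on a username -> email table.
class UserEmails {
    private final Map<String, String> emailByUser = new HashMap<>();

    // INSERT ... IF NOT EXISTS
    synchronized CasResult insertIfNotExists(String username, String email) {
        String current = emailByUser.get(username);
        if (current != null) return new CasResult(false, current);
        emailByUser.put(username, email);
        return new CasResult(true, null);
    }

    // UPDATE ... SET email = newEmail WHERE username = ? IF email = expected
    synchronized CasResult updateEmailIf(String username, String expected, String newEmail) {
        String current = emailByUser.get(username);
        if (current == null || !current.equals(expected)) return new CasResult(false, current);
        emailByUser.put(username, newEmail);
        return new CasResult(true, null);
    }
}
```

A second signup for `jbellis` is refused and sees the existing email, and an update whose expected value is stale is refused and sees the newer one.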
Triggers
CREATE TRIGGER <name> ON <table> USING '<classname>';
Trigger implementation
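import java.nio.ByteBuffer;
import java.util.Collection;
import java.util.Collections;
import org.apache.cassandra.db.ColumnFamily;
import org.apache.cassandra.db.RowMutation;
import org.apache.cassandra.triggers.ITrigger;
public class MyTrigger implements ITrigger
{
    // Called with the partition key and the incoming update; returns extra mutations to apply
    public Collection<RowMutation> augment(ByteBuffer key, ColumnFamily update)
    {
        return Collections.emptyList(); // ... build and return additional RowMutations here
    }
}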
Experimental!
• Relies on internal RowMutation, ColumnFamily classes
• [partition] key is a ByteBuffer
• Expect changes in 2.1
Compaction
• Single-pass, always
• LCS performs STCS in L0
Healthy leveled compaction
Sad leveled compaction
STCS in L0
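When flushes outpace leveling, L0 fills with many small sstables, and running size-tiered compaction among them cuts the count quickly. The sketch below shows the size-tiered idea in simplified form: sstables are grouped into buckets of similar size, and buckets with enough members become compaction candidates. It is not Cassandra's code; the thresholds (0.5x and 1.5x of the bucket average, at least 4 sstables) are our understanding of the STCS defaults and are labeled as assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified size-tiered bucketing; not Cassandra's implementation.
class SizeTiered {
    // Groups sstable sizes so each lands in a bucket whose running average is within [low, high]x.
    static List<List<Long>> buckets(List<Long> sizesMb, double low, double high) {
        List<Long> sorted = new ArrayList<>(sizesMb);
        Collections.sort(sorted);
        List<List<Long>> buckets = new ArrayList<>();
        List<Double> averages = new ArrayList<>();
        for (long size : sorted) {
            boolean placed = false;
            for (int i = 0; i < buckets.size(); i++) {
                double avg = averages.get(i);
                if (size >= avg * low && size <= avg * high) {
                    List<Long> b = buckets.get(i);
                    b.add(size);
                    averages.set(i, (avg * (b.size() - 1) + size) / b.size()); // update running mean
                    placed = true;
                    break;
                }
            }
            if (!placed) { // no similar-sized bucket: start a new tier
                List<Long> b = new ArrayList<>();
                b.add(size);
                buckets.add(b);
                averages.add((double) size);
            }
        }
        return buckets;
    }

    // Buckets holding at least minThreshold similar-sized sstables are worth compacting together.
    static List<List<Long>> candidates(List<List<Long>> buckets, int minThreshold) {
        List<List<Long>> out = new ArrayList<>();
        for (List<Long> b : buckets) {
            if (b.size() >= minThreshold) out.add(b);
        }
        return out;
    }
}
```

With sizes of 5, 5, 6, 5, 160, 170, and 5000 MB, the four small files form one bucket that qualifies for compaction, while the larger files wait for peers of similar size.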
Cursors (before)
SELECT *
FROM timeline
WHERE (user_id = :last_key
AND tweet_id > :last_tweet)
OR token(user_id) > token(:last_key)
LIMIT 100
CREATE TABLE timeline (
  user_id uuid,
  tweet_id timeuuid,
  tweet_author uuid,
  tweet_body text,
  PRIMARY KEY (user_id, tweet_id)
);
Cursors (after)
SELECT *
FROM timeline
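Before 2.0, the client had to remember the last (user_id, tweet_id) it saw and ask for rows strictly after it: either later tweets in the same partition, or any row in a partition with a larger token. The hypothetical sketch below models that resume logic over rows already sorted by (token, clustering key); `Row`, `ManualPager`, and the use of `long` in place of the uuid/timeuuid columns are our simplifications.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for one timeline row, ordered by (token(user_id), tweet_id).
class Row {
    final long token;   // stands in for token(user_id)
    final long tweetId; // stands in for the timeuuid clustering column
    Row(long token, long tweetId) {
        this.token = token;
        this.tweetId = tweetId;
    }
}

class ManualPager {
    // Up to limit rows strictly after last (null means start from the beginning).
    static List<Row> nextPage(List<Row> sortedRows, Row last, int limit) {
        List<Row> page = new ArrayList<>();
        for (Row r : sortedRows) {
            boolean after = last == null
                || (r.token == last.token && r.tweetId > last.tweetId) // same partition, later tweet
                || r.token > last.token;                               // a later partition
            if (after) {
                page.add(r);
                if (page.size() == limit) break;
            }
        }
        return page;
    }

    // The client-side loop that automatic paging makes unnecessary.
    static int countAll(List<Row> sortedRows, int pageSize) {
        int total = 0;
        Row last = null;
        while (true) {
            List<Row> page = nextPage(sortedRows, last, pageSize);
            total += page.size();
            if (page.size() < pageSize) return total;
            last = page.get(page.size() - 1); // resume point for the next query
        }
    }
}
```

With automatic paging, the plain `SELECT * FROM timeline` hands this bookkeeping to the driver and server.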
Misc. performance improvements
• Tracking statistics on clustered columns allows eliminating unnecessary
sstables from the read path
• New half-synchronous, half-asynchronous Thrift server based on LMAX
Disruptor
• Faster partition index lookups and cache reads by improving performance
of off-heap memory
• Faster reads of compressed data by switching from CRC32 to Adler
checksums
• JEMalloc support for off-heap allocation
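The first bullet can be illustrated with a small model: if every sstable records the minimum and maximum clustering value it contains, a slice query can skip any sstable whose range cannot overlap the slice. This hypothetical sketch uses `long` clustering values and our own class names; it is not Cassandra's read path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-sstable metadata: the clustering range it covers.
class SSTableStats {
    final String name;
    final long minClustering;
    final long maxClustering;
    SSTableStats(String name, long minClustering, long maxClustering) {
        this.name = name;
        this.minClustering = minClustering;
        this.maxClustering = maxClustering;
    }
}

class SliceFilter {
    // Keeps only sstables whose [min, max] overlaps the requested slice [from, to].
    static List<String> sstablesToRead(List<SSTableStats> sstables, long from, long to) {
        List<String> out = new ArrayList<>();
        for (SSTableStats s : sstables) {
            if (s.maxClustering >= from && s.minClustering <= to) out.add(s.name);
        }
        return out;
    }
}
```

For time-ordered data such as tweets, where older sstables hold older clustering values, a query for recent entries touches only the newest few files.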
Spring cleaning
• Removed compatibility with pre-1.2.5 sstables and pre-1.2.9 schema
• The potentially dangerous countPendingHints JMX call has been replaced
by a Hints Created metric
• The on-heap partition cache (“row cache”) has been removed
• Vnodes are on by default
• The old token range bisection code for non-vnode clusters is gone
• Removed emergency memory pressure valve logic
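With vnodes on by default, each node claims many small, randomly placed token ranges instead of one large one (the shipped cassandra.yaml sets `num_tokens` to 256, as we understand it). This toy ring is a hypothetical sketch: `Random.nextLong()` stands in for both token allocation and the partitioner's hash, and a key belongs to the node holding the first token at or after the key's token, wrapping around.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Toy token ring with virtual nodes; not Cassandra's token allocation or partitioner.
class VnodeRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Each node picks numTokens random positions on the ring.
    VnodeRing(String[] nodes, int numTokens, long seed) {
        Random random = new Random(seed);
        for (String node : nodes) {
            for (int i = 0; i < numTokens; i++) ring.put(random.nextLong(), node);
        }
    }

    // Owner = node with the first token >= keyToken, wrapping to the lowest token.
    String ownerOf(long keyToken) {
        Map.Entry<Long, String> e = ring.ceilingEntry(keyToken);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    int size() {
        return ring.size();
    }

    // Counts how many of sampleKeys random key tokens each node owns.
    Map<String, Integer> ownership(int sampleKeys) {
        Map<String, Integer> counts = new HashMap<>();
        Random keys = new Random(42);
        for (int i = 0; i < sampleKeys; i++) {
            String owner = ownerOf(keys.nextLong());
            Integer c = counts.get(owner);
            counts.put(owner, c == null ? 1 : c + 1);
        }
        return counts;
    }
}
```

Because every node owns hundreds of small ranges, ownership comes out close to even without hand-computed tokens, which is what made the old bisection code for choosing a new node's single token unnecessary.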
Operational concerns
• Java 7 is now required!
• Leveled compaction level information has been moved into sstable
metadata
• Kernel page cache skipping has been removed in favor of optional row
preheating (preheat_kernel_page_cache)
• Streaming has been rewritten to be more transparent and robust.
• Streaming support for old-version sstables