Skip to content

zznate/cassandra-stress

Folders and files

NameName
Last commit message
Last commit date

Latest commit

author
zznate
Jan 24, 2012
f43d300 · Jan 24, 2012

History

69 Commits
Jan 24, 2012
Sep 2, 2011
Sep 29, 2010
Nov 4, 2011
Jan 23, 2012
Jan 24, 2012

Repository files navigation

Overview

cassandra-stress is modeled after the stress.py script in Apache Cassandra's source distribution.

cassandra-stress is built on top of Hector, a well-tested and widely deployed Java client for Apache Cassandra. The benefits of using Hector for a tool like this are many:

  • JMX hooks into Hector for visibility into runtime statistics
  • Extensive configurability of which node(s) to target
  • Verbose logging output available
  • Hooks into performance counters to correllate client and server performance

cassandra-stress is a currently just a command line app which supports the following commands (syntactically identical to stress.py where possible):

usage: stress [options]... url1,[[url2],[url3],...]
 -b,--batch-size <arg>   The number of rows in the batch_mutate call
 -c,--columns <arg>      The number of columsn to create per key
 -o,--operation <arg>    One of insert, read, rangeslice, multiget
 -t,--threads <arg>      The number of client threads to create
 -h,--help               Print this help message and exit
 -D,--discovery-delay <arg>      The amount of time to wait between runs
                                 of Auto host discovery. Providing a value
                                 enables this service
 -h,--help                       Print this help message and exit
 -L,--consistency-levels <arg>   Defaults to QUORUM for R+W, specified in
                                 the form of [read]:[write] eg. '-L
                                 ONE:ONE'
 -M,--max-wait <arg>             The Maximum time to wait on aquiring a
                                 connection from the pool
                                 (maxWaitTimeWhenExhausted). Default is
                                 forever.
 -m,--unframed                   Disable use of TFramedTransport
 -n,--num-keys <arg>             The number of keys to create
 -o,--operation <arg>            The type of operation: insert or select
 -R,--retry-delay <arg>          The amount of time to wait between runs
                                 of Downed host retry delay execution. 30
                                 seconds by default.
 -S,--skip-retry-delay           Disable downed host retry service
                                 execution.
 -T,--thrift-timeout <arg>       The ThriftSocketTimeout value.
 -t,--threads <arg>              The number of client threads we will
                                 create
 -w,--colwidth <arg>             The widht of the column in bytes. Default
                                 is 16

This project now runs as a stand-alone binary via the Maven app-assempbler plugin. You must first clone the source of the project, then using maven, execute the install goal:

mvn install

Move to the directory where the artiffacts are assembled:

cd taget/appassembler

Now you can execute the script directly as needed:

sh bin/stress -o insert -b 50 -n 12000 localhost:9160

This will perform the operation then drop you into a shell in which you can perform additional operations. The command syntax is similar to the initial call, except that the operation is the only argument:

[cassandra-stress] 
usage: operation [options]
operation can be one of: insert, read, rangeslice, multiget, replay [N]
 -o,--operation <arg>    One of insert, read, rangeslice, multiget, verify_last_insert (1)
 -b,--batch-size <arg>   The number of rows in the batch_mutate call
 -t,--threads <arg>      The number of client threads we will create
 -c,--columns <arg>              The number of columsn to create per key
 -C,--clients <arg>              The number of pooled clients to use

To re-run the operation again, just type the operation name. So from the initial example above, re-running the read operation with the same parameters would be:

(1) Must run single thread (-t 1) and QUORUM:QUORUM. Intended to test in a cluster of at least 3 nodes RF = 3.

[cassandra-stress] read

The main benefit of going into a shell is that it keeps the JVM and connection pool machinery warm, so additional test runs will not have the overhead of spinning everything up. This has the added benefit of providing conitnuous access to Hector's JMX stats as well.

Todo (in rough priority)

  • Create ColumnFamily specifically for test
  • Add JMX innards for MBean driven command/control
  • Add replay N times functionality from shell
  • Add (much) better key distribution (gaussian, stdev argument)
  • SuperColumn support

Misc.

Offered under an MIT license (see LICENSE in the top level directory). As with any halfway decent load testing tool, you can generate a lot of load and put a system under duress or even cause it to fail. You are solely responsible for any issues experienced the execution of this tool may cause. Use at your own risk.

Cheers,
zznate