About this guide

This guide covers most CQL operations, such as

  • Inserting data
  • Querying
  • Ordering
  • Collection Types
  • Counter Columns
  • Timestamps and TTL
  • Prepared Statements
  • Range queries
  • Pagination
  • Filtering
  • Tuning consistency/availability per-request

This guide relies on certain features that are covered in the Advanced Client Options guide.

What version of Cassaforte does this guide cover?

This guide covers Cassaforte 3.0.0.

Inserting Rows (INSERT)

Consider the following table:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (create-table session :users
                (column-definitions {:name :varchar
                                     :age  :int
                                     :city :varchar
                                     :primary-key [:name]})))
CREATE TABLE users
  (age int,
   name varchar,
   city varchar,
   PRIMARY KEY (name));

To insert a row in a table, use clojurewerkz.cassaforte.insert:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (insert session "users" {:name "Alex" :age (int 19)}))

The example above will use the following CQL:

INSERT INTO "users" (name, age) VALUES ('Alex', 19);

Querying (SELECT)

The real power of CQL comes in querying. You can perform standard equality queries, IN queries, and range queries.

The examples used in this section need some data to be seeded to the "users" table first:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session  (client/connect ["127.0.0.1"])
      table "users"]
  (insert session table {:name "Alex" :city "Munich" :age (int 19)})
  (insert session table {:name "Robert" :city "Berlin" :age (int 25)})
  (insert session table {:name "Sam" :city "San Francisco" :age (int 21)}))

The above code will execute the following CQL:

INSERT INTO "users" (name, city, age) VALUES ('Alex', 'Munich', 19);
INSERT INTO "users" (name, city, age) VALUES ('Robert', 'Berlin', 25);
INSERT INTO "users" (name, city, age) VALUES ('Sam', 'San Francisco', 21);

Querying All Rows

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session  (client/connect ["127.0.0.1"])
      table "users"]
  (select session table))
;; => [{:name "Robert", :age 25, :city "Berlin"}
;;     {:name "Alex", :age 19, :city "Munich"}
;;     {:name "Sam", :age 21, :city "San Francisco"}]

In CQL, the query above will look like this:

SELECT * FROM "users";

Querying With Equality Conditions (Equality Operator)

Next, query a user by name:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session  (client/connect ["127.0.0.1"])
      table "users"]
  (select session table (where [[= :name "Alex"]])))
;; => [{:name "Alex", :age 19, :city "Munich"}]

The CQL executed this time will be

SELECT * FROM "users" WHERE name = 'Alex';

IN Queries

Next, query for rows that match any of the values given in a vector (so so-called IN query):

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session  (client/connect ["127.0.0.1"])
      table "users"]
  (select session table
          (where [[:in :name ["Alex" "Robert"]]])))
;; => [{:name "Alex", :age 19, :city "Munich"}
;;     {:name "Robert", :age 25, :city "Berlin"}]

The IN query is named after the CQL operator it uses:

SELECT * FROM "users" WHERE name IN ('Alex', 'Robert');

Sorting (ORDER BY)

Sorting and range queries in Cassandra have limitations compared to relational databases. Sorting is only possible when partition key is restricted by either exact match or IN. For example, having these user_posts:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session  (client/connect ["127.0.0.1"])
      table    "users_posts"]
  (insert session "user_posts" {:username "Alex" :post_id "post1" :body "first post body"})
  (insert session "user_posts" {:username "Alex" :post_id "post2" :body "second post body"})
  (insert session "user_posts" {:username "Alex" :post_id "post3" :body "third post body"}))

You can't sort all the posts by post_id. But if you say that you want to get all the posts from user Alex and sort them by post_id, it's possible:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session  (client/connect ["127.0.0.1"])
      table "users_posts"]
  ;; For brevity, we select :post_id column only
  (select session table
          (columns :post_id)
          (where [[= :username "Alex"]])
          (order-by [:post_id :desc])))

;; => [{:post_id "post3"}
;;     {:post_id "post2"}
;;     {:post_id "post1"}]

CQL used by the code above is quite straightforward:

SELECT post_id FROM "user_posts"
  WHERE username = 'Alex'
  ORDER BY post_id desc;

Range Queries

It is possible to use range queries to get a slice of data:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session  (client/connect ["127.0.0.1"])
      table "users_posts"]
  ;; For brevity, we select :post_id column only
  (select session table
          (columns :post_id)
          (where [[= :username "Alex"]
                  [> :post_id "post1"]
                  [< :post_id "post3"]])))
;; => [{:post_id "post2"}]

will use

SELECT post_id FROM "user_posts"
  WHERE username = 'Alex'
    AND post_id > 'post1'
    AND post_id < 'post3';

LIMIT

In order to limit results of a query, use the limit clause:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session  (client/connect ["127.0.0.1"])
      table "users_posts"]
  (select session table (limit 1)))
;; => [{:username "Alex", :post_id "post1", :body "first post body"}]

limit does what one would expect:

SELECT * FROM "user_posts" LIMIT 1;

Tuning Consistency

With Cassandra, it is possible to tune consistency level on a per-query basis. To do that, wrap a database call into clojurewerkz.cassaforte.policies/with-consistency-level:

Available consistency levels are:

  • :any: write must be written to at least one node. :any will suclienteed even if all replica nodes are down and a hinted handoff write was made. Although in that case write will not become readable until replica nodex for the given key recover
  • :one: write must be written to commit log and memory table of at least one replica node
  • :two: write must be written to commit log and memory table of at least two replica nodes
  • :three: write must be written to commit log and memory table of at least three replica nodes
  • :quorum: write must be written to commit log and memory table to quorum of replica nodes
  • :local-quorum: write must be written to commit log and memory table to quorum of replica nodes located in the same data center as the coordinator node
  • :each-quorum: write must be written to commit log and memory table to quorum of replica nodes in all data centers
  • :serial: achieves linearizable consistency for lightweight transactions by preventing unconditional updates
  • :local_serial: like :serial but confined to the local data center
  • :all: write must be written to commit log and memory table of all replica nodes for given key

Please refer to Cassandra documentation on consistency levels for more info.

The following operation will be performed with consistency level of :quorum:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client   :as client]
            [clojurewerkz.cassaforte.policies :as cp]
            [clojurewerkz.cassaforte.cql      :refer :all]))

(let [r {:name "Alex" :city "Munich" :age (int 19)}]
    (client/execute *session*
                    "INSERT INTO users (name, city, age) VALUES ('Alex', 'Munich', 19);"
                    :consistency-level (cp/consistency-level :quorum))
    (is (= r (first (select *session* :users (limit 1))))))

Timestamp and TTL

Column values in Cassandra have timestamps associated with them. Even if a timestamp is not explicitly provided a timestamp, it is set by Cassandra internally. It is possible to see this with cqlsh or cassandra-cli, both of which ship with Cassandra:

> cassandra-cli
> list users;
RowKey: Alex
=> (column=, value=, timestamp=1369947837808000)
=> (column=age, value=00000013, timestamp=1369947837808000)
=> (column=city, value=4d756e696368, timestamp=1369947837808000)

You can see timestamp value set by Cassandra for every column in a row. In order to make a write with manually set timestamp, you should use (using :timestamp) clause in your query:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (insert session :users {:name "Alex" :city "Munich" :age (int 19)}
          (using :timestamp (.getTime (java.util.Date.)))))
INSERT INTO users (name, city, age) VALUES ('Alex', 'Munich', 19) USING TIMESTAMP 1369948317602;

Note that when developing applications that rely on timestamp values, clock synchronization across the machines that run Cassandra clienst and nodes is mandatory.

Cassandra itself uses timestamps for conflict resolution. Column value with has higher timestamp will win over the record with lower timestamp in case of conflict. You can use arbitrary numbers for timestamps, but microseconds since Unix Epoch (1970) are used as a convention.

You can also specify optional TTL (Time To Live) for column values. If you do so, column values will expire after specified amount of time.

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (insert session :users {:name "Alex" :city "Munich" :age (int 19)}
    (using :ttl 60)))
INSERT INTO users (name, city, age) VALUES ('Alex', 'Munich', 19) USING TTL 60;

In this example, the inserted row (technically, all of its column values) will be deleted in 60 seconds.

When using TTL, you should remember that if you update record with (using :ttl) clause, column livespan will be reset, and counted from the moment of insert. Picking TTL values upfront is a good data modelling practice.

UUID Functions

clojurewerkz.cassaforte.uuids is a namespace that provides various functions for working with UUIDs, including :timeuuid columns:

(require '[clojurewerkz.cassaforte.uuids :as uuids])

(uuids/random)
;= #uuid "d43fdc16-a9c3-4d0f-8809-512115289537"

(uuids/time-based)
;= #uuid "90cf6f40-4584-11e3-90c2-65c7571b1a52"

(uuids/unix-timestamp (uuids/time-based))
;= 1383592179743

(u/start-of (u/unix-timestamp (u/time-based)))
;= #uuid "ad1fd130-4584-11e3-8080-808080808080"

(u/end-of (u/unix-timestamp (u/time-based)))
;= #uuid "b31abb3f-4584-11e3-7f7f-7f7f7f7f7f7f"

Prepared Statements

Prepared statements have same meaning as in relational databases. Server pases query once, and assigns a unique identifier, which is cached by clients for future references. Each time query is executed, only values are passed between client and server. This reduces an overhead of parsing query each time and amount of data sent over the network.

For example, a simple query to insert values to the table would be:

INSERT INTO users (name, city, age) VALUES ('Alex', 'Munich', 19);

Prepared query would keep ? placeholders instead of values:

INSERT INTO users (name, city, age) VALUES (?, ?, ?);

In order to execute a prepared query, you can use client/execute function:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
      prepared (client/prepare session
                               (query/insert :users
                                             {:name ?
                                              :city ?
                                              :age  ?}))]
  (client/execute session
                  (client/bind prepared
                               {:name "Alex" :city "Munich" :age (int 19)})))

Paginating Through Results

Pagination with Cassandra can at times be less convenient than with relational databases. Fortunately, Cassaforte provides a convenience function clojurewerkz.cassaforte.iterate-table that is sufficient for many cases. This section first introduces the strategy used by Cassaforte and then provides and example of clojurewerkz.cassaforte.iterate-table at the end, so you may want to skip to that.

To paginate through contents of an entire table, it is common to use the so called "token strategy". Token-based pagination is based on every row having a special "token" field that sorting can be performed on. Then fetch a batch of rows, take the last one and to load the next page, use the token.

To demonstrate, consider the following table:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (create-table session :users
                (column-definitions {:name :varchar
                                     :age  :int
                                     :city :varchar
                                     :primary-key [:name]})))
CREATE TABLE users (age int, name varchar, city varchar, PRIMARY KEY (name));

Add 100 entries to it:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session  (client/connect ["127.0.0.1"])]
  (dotimes [i 100]
    (insert session :users {:name (str "name_" i) :city (str "city" i) :age (int i)})))

Get the first page:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (select session :users (limit 10)))
SELECT * FROM users LIMIT 10;

This will return us first 10 rows but in random order. This happens because ordering is only possible when partition key is restricted by one of the equality operators.

To load the next page ordered, get the name (which is a partition key value in that case) of the last user in the resulting collection. Say the value was name_53. In order to get the next page, you should use token function:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (select session :users
    (where (token :name) [> (token "name_53")])
    (limit 10)))
SELECT * FROM users WHERE token(name) > token('name_53') LIMIT 10;

This will return next page in the desired order.

Cassaforte provides a convenience function clojurewerkz.cassaforte.iterate-table, which uses lazy sequences to implement the algorithm described above.

In the example below, we iterate over users collection, using name as a partition key, and get 10 results per page:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (iterate-table session :users [[= :name 10]]))

Range Queries

In case you use compound keys, you have can perform range queries efficiently. Here, you can "lock" your partition key using IN or equality operator = and perform range queries on the results. It is possible, because Cassandra stores all entries with same partition key on same node, which guarantees good performance when retrieving records.

Consider a tv_series table, which will use a compound key. Partition key will be series_title, episode_id will also be part of the key:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (create-table session :tv_series
                (column-definitions {:series_title  :varchar
                                     :episode_id    :int
                                     :episode_title :text
                                     :primary-key [:series_title :episode_id]})))
CREATE TABLE tv_series (episode_title text,
                        series_title varchar,
                        episode_id int,
                        PRIMARY KEY (series_title, episode_id));

Populate the table:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (dotimes [i 20]
    (insert session :tv_series {:series_title "Futurama" :episode_id i :episode_title (str "Futurama Title " i)})
    (insert session :tv_series {:series_title "Simpsons" :episode_id i :episode_title (str "Simpsons Title " i)})))

If you lock partition key by using equality WHERE series_title = 'Futurama' or IN operator: WHERE series_title IN ('Futurama', 'Simpsons'), you can perform range queries on episode_id (which is a second part of compound key):

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (select session :tv_series
          (where [[:in :series_title ["Futurama" "Simpsons"]]
                  [>   :episode_id 10]])))
SELECT * FROM tv_series WHERE series_title IN ('Futurama', 'Simpsons') AND episode_id > 10;

In the same manner, you can use >=, >, < and <= operators for performing range queries. In addition, you can query for a closed range (from .. to):

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (select session :tv_series
          (where [[=  :series_title "Futurama"]
                  [>  :episode_id 10]
                  [<= :episode_id 15]])))
SELECT * FROM tv_series WHERE series_title = 'Futurama' AND episode_id > 10 AND episode_id <= 15;

Sorting Results

When partition key is used in query condition, you can also run queries with ORDER BY clause, which will order results by any part of the key except for the partition key:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (select session :tv_series
          (where {:series_title "Futurama"})
          (order-by :episode_id)))
SELECT * FROM tv_series
  WHERE series_title = 'Futurama'
  ORDER BY episode_id;

Filtering

By default, Cassandra disallows potentially expensive queries, that involve data filtering on the server side. That is done to run queries with predictable performance, which is proportional to the amount of data returned from the server.

It's required to say that, depending on a dataset size, allowing filtering may hurt performance.

For this example, let's use the users table described aboe, and add index on age and city to it:

CREATE TABLE users
  (age int,
   name varchar,
   city varchar,
   PRIMARY KEY (name));
(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (create-index session :users :age)
  (create-index session :users :city))
CREATE INDEX ON users (age);
CREATE INDEX ON users (city);

Now, it is possible to query for all users of certain age living in a certain city using ALLOW FILTERING clause:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (select session :users
          (where [[= :city "Munich"]
                  [> :age  (int 5)]])
          (allow-filtering)))
SELECT * FROM users WHERE city = 'Munich' AND age > 5 ALLOW FILTERING;

Collection Columns

Cassandra tables can have collection columns, that is, columns of types list, map, and set. To define them with Cassaforte, use qbits.hayt.utils/list-type, qbits.hayt.utils/map-type, and qbits.hayt.utils/set-type, respectively:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (create-table session :thingies
                (column-definitions {:name :varchar
                                     :test_map  (map-type :varchar :varchar)
                                     :test_set  (set-type :int)
                                     :test_list (list-type :varchar)
                                     :primary-key [:name]})))

When data is loaded from Cassandra, Cassaforte will convert the types to their respective immutable Clojure counterparts.

To add an entry to a map, use the + operator and a Clojure map:

(let [session (client/connect ["127.0.0.1"])]
  (update :foo
          {:test_map (put-values
                      (array-map "key1" value1
                                 "key2" value2))}
          (where :name "thingie1")))

You can also use a singular (put-value k v) and (remove-all-tail k1 k2 k3 ...) in a similar fashion.

Similarly, to append a value to a list column:

(let [session (client/connect ["127.0.0.1"])]
  (update session :thingies
          {:test_list (append "value1")}
          (where :name "thingie1")))

Similarly, you can use (prepend v), (discard idx), ()discard-all), (append-all [v1 v2 v3]) and (set-idx idx value) in a similar fashion.

Counters

Cassandra supports counter columns (also known as distributed counters). A Counter column provides an efficient way to count or sum integer values. It is achieved by using atomic increment/decrement operations on column values.

Counter is a special column type, whose value is a 64-bit (signed) integer. On write, new value is added (or subtracted) to previous counter value. It should be noted that usual consistency/availability tradeoffs apply to counter operations. In order to perform a counter update, Cassandra has to perform a read before write in a background, therefore updates on counters are slightly slower than regular updates.

Consider a table (user_counters) with counter column, and name key for counting user-specific operations, such as amount of operations performed by the user:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (create-table session :user_counters
                (column-definitions {:name :varchar
                                     :user_count  :counter
                                     :primary-key [:name]})))
CREATE TABLE user_counters
  (name varchar,
   user_count counter,
   PRIMARY KEY (name));

In order to modify (increment or decrement) counter, you can use the following DSL syntax:

(ns cassaforte.docs
  (:require [clojurewerkz.cassaforte.client :as client]
            [clojurewerkz.cassaforte.cql    :refer :all]))

(let [session (client/connect ["127.0.0.1"])]
  (update session :user_counters
          {:user_count (increment-by 5)}
          (where :name "user1"))

  (update session :user_counters
          {:user_count (decrement-by 5)}
          (where :name "user1")))

Which will execute following CQL queries, correspondingly:

UPDATE user_counters SET user_count = user_count + 5 WHERE name = 'asd';
UPDATE user_counters SET user_count = user_count - 5 WHERE name = 'asd';

Wrapping Up

Cassaforte provides a nice way to use CQL with Cassandra. You can manipulate insert rows, perform queries, update data, delete data, use distributed counters.

The rest of this documentation covers more features Cassaforte and Cassandra provide.

Tell Us What You Think!

Please take a moment to tell us what you think about this guide on Twitter or the Cassaforte mailing list

Let us know what was unclear or what has not been covered. Maybe you do not like the guide style or grammar or discover spelling mistakes. Reader feedback is key to making the documentation better.