Palash Chauhan
← Phoenix Features

TTL

· Part 3 of 4 in Phoenix Features phoenix ttl phoenix-features

Most data does not live forever. Sessions go stale, logs age out, soft-deleted records should eventually vanish. Phoenix TTL lets you declare that expiry as a table property instead of running your own cleanup jobs.

A few things hold across every flavor of it:

  • Expiry is per row. All of a row’s columns expire together, so you never see a half-expired row.
  • Any write resets the clock. An UPSERT to a row re-arms its TTL; there is no separate touch operation.
  • Expired rows leave in two stages. They stop showing up in queries right away, and are physically removed later, during major compaction.

The TTL property takes a number of seconds, FOREVER, NONE, or a boolean SQL expression. Those last two map onto the three granularities below.

Table-level TTL

The simplest case: one TTL for every row in the table.

CREATE TABLE events (...) TTL = 604800;  -- one week

For a plain numeric TTL like this, Phoenix also copies the value onto the underlying HBase table, so older clients and non-Phoenix HBase tools see the same expiry.

View-level TTL

Recall from the views post that many views can share one physical table. Each of those views can set its own TTL, so a single table holds data with completely different retention policies:

CREATE VIEW orders   AS SELECT * FROM shared WHERE kind = 'O' TTL = 2592000; -- 30 days
CREATE VIEW audit    AS SELECT * FROM shared WHERE kind = 'A' TTL = 'FOREVER';
CREATE VIEW sessions AS SELECT * FROM shared WHERE kind = 'S' TTL = 3600;    -- 1 hour
flowchart TB
  subgraph views ["Views over one shared table"]
    o["orders view: TTL 30 days"]
    a["audit view: TTL forever"]
    s["sessions view: TTL 1 hour"]
  end
  base["one shared HBase table"]
  o --> base
  a --> base
  s --> base

At read and compaction time, Phoenix works out which view each row belongs to and applies that view’s TTL. You only set the property; nothing else changes. TTL on views is allowed only on updatable views, and is gated by a cluster flag.

Conditional TTL

Sometimes expiry is not a fixed age at all, but a condition on the row itself. Conditional TTL lets you write that rule as a boolean SQL expression: a row is expired exactly when the expression is true.

-- expire a row once it is cancelled
CREATE TABLE orders (...) TTL = 'status = ''CANCELLED''';

-- or give each row its own expiry time
CREATE TABLE orders (...) TTL = 'expires_at < CURRENT_TIME()';

With the first rule, expiry simply follows the data:

order_idstatusTTL says
4001openkept
4002cancelledexpired
4003shippedkept

This is closer to DynamoDB-style TTL than to a fixed clock: you decide what expired means. A few rules come with it. The expression must be boolean and cannot use aggregates, it is re-evaluated on every read (so CURRENT_TIME() is the query’s time, not the time you wrote the DDL), and you cannot drop a column the expression references.

What stands out

A few of these choices are unusual across databases:

  • Conditional TTL. Most engines expire purely by age. Expressing expiry as an arbitrary boolean over the row’s own data is rare.
  • Per-view TTL. Giving different logical views over one shared table their own retention is something most engines can only fake by splitting into separate tables.
  • Row-grained expiry. A row expires whole, so you avoid the partial-row problem that cell-level expiry can cause, where some columns vanish while the rest of the row lingers.

Further reading