Change data capture (CDC) detects row-level data changes in CockroachDB and emits those changes as messages for downstream processing. While CockroachDB is an excellent system of record, CDC allows it to integrate with other systems in your data ecosystem.
For example, you might want to:
- Stream messages to Kafka to trigger notifications in an application.
- Mirror your data in full-text indexes, analytics engines, or big data pipelines.
- Export a snapshot of tables to backfill new applications.
- Feed updates to data stores powering machine learning models.
The main feature of CockroachDB CDC is the changefeed, which targets an allowlist of tables, known as watched tables.
Stream row-level changes with changefeeds
Changefeeds are customizable jobs that monitor row-level changes in a table and emit updates in real time. These updates are delivered in your preferred format to a specified destination, known as a sink.
In production, changefeeds are typically configured with an external sink such as Kafka or cloud storage. However, for development and testing purposes, sinkless changefeeds allow you to stream change data directly to your SQL client.
Each emitted row change is delivered at least once, and the first emit of every event for the same key is ordered by timestamp.
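As a minimal sketch of the two forms described above, the following statements create a sinkless changefeed and then a changefeed that emits to a sink. The table name orders and the Kafka address are hypothetical placeholders, and the first statement assumes a self-hosted cluster on which rangefeeds have not yet been enabled.

```sql
-- Assumption: self-hosted cluster; changefeeds require rangefeeds to be enabled.
SET CLUSTER SETTING kv.rangefeed.enabled = true;

-- Sinkless changefeed: streams change messages to the current SQL session
-- until the connection is closed. "orders" is a hypothetical table.
CREATE CHANGEFEED FOR TABLE orders;

-- Changefeed with a sink: runs as a job and emits to the given sink URI.
-- The Kafka address is a placeholder for your own broker.
-- updated  : include the commit timestamp in each message
-- resolved : periodically emit resolved timestamp messages
CREATE CHANGEFEED FOR TABLE orders
  INTO 'kafka://localhost:9092'
  WITH updated, resolved = '10s';
```

The second statement returns a job ID, which is what the management statements operate on.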
| | Sinkless changefeeds | Changefeeds |
---|---|---|
Use case | Useful for prototyping or quick testing. | Recommended for production use. |
Product availability | All products | All products |
Message delivery | Streams indefinitely until underlying SQL connection is closed. | Maintains connection to configured sink. |
SQL statement | Create with CREATE CHANGEFEED FOR TABLE table_name; | Create with CREATE CHANGEFEED FOR TABLE table_name INTO 'sink'; |
Targets | Watches one or multiple tables in a comma-separated list. | Watches one or multiple tables in a comma-separated list. |
Filter change data | Use CDC queries to define the emitted change data. | Use CDC queries to define the emitted change data. |
Schedule changefeeds | Not supported | Create a scheduled changefeed with CREATE SCHEDULE FOR CHANGEFEED. |
Job execution locality | Not supported | Use execution_locality to determine the node locality for changefeed job execution. |
Message format | Emits every change to a "watched" row as a record to the current SQL session. | Emits every change to a "watched" row as a record in a configurable format. |
Management | Create the changefeed and cancel by closing the SQL connection. | Manage changefeed with CREATE, PAUSE, RESUME, ALTER, and CANCEL. |
Monitoring | Not supported | Metrics available to monitor in the DB Console and Prometheus. Job observability with SHOW CHANGEFEED JOBS. |
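The management statements listed in the table could be used along these lines. This is a sketch only: the job ID 12345 and the table name customers are hypothetical, and the real job ID comes from the output of CREATE CHANGEFEED or SHOW CHANGEFEED JOBS.

```sql
-- 12345 is a hypothetical changefeed job ID.
PAUSE JOB 12345;

-- A changefeed must be paused before it can be altered.
-- Here a hypothetical table "customers" is added to the watched tables.
ALTER CHANGEFEED 12345 ADD customers;

-- Resume emitting changes with the updated target list.
RESUME JOB 12345;

-- Stop the changefeed permanently.
CANCEL JOB 12345;
```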
Get started with changefeeds
To get started with changefeeds in CockroachDB, refer to:
- Create and Configure Changefeeds: Learn about the fundamentals of using SQL statements to create and manage changefeeds.
- Changefeed Sinks: The downstream system to which the changefeed emits changes. Learn about the supported sinks and configuration capabilities.
- Changefeed Messages: The change events that emit from the changefeed. Learn about how messages are ordered and the options to configure and format messages.
- Changefeed Examples: Step-by-step examples for connecting to changefeed sinks or running sinkless changefeeds.
Authenticate to your changefeed sink
To send changefeed messages to a sink, you must provide the CREATE CHANGEFEED statement with authentication credentials.
The following pages detail the supported authentication:
Sink | Authentication page |
---|---|
Cloud Storage | Refer to Cloud Storage Authentication for detail on setting up: |
Kafka | Refer to: |
Webhook | Refer to: |
Google Cloud Pub/Sub | Refer to: |
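One possible way to supply credentials is shown below, assuming a Kafka sink that uses TLS with SASL authentication. The broker address, connection name, table name, and the {user} and {password} placeholders are all hypothetical. Storing the URI in an external connection is one option for keeping credentials out of the CREATE CHANGEFEED statement itself; the authentication pages remain the reference for the parameters each sink accepts.

```sql
-- Store the sink URI, including credentials, under a name.
-- Replace {user} and {password} with real values; the broker address is a placeholder.
CREATE EXTERNAL CONNECTION kafka_sink
  AS 'kafka://broker.example.com:9093?tls_enabled=true&sasl_enabled=true&sasl_mechanism=SCRAM-SHA-256&sasl_user={user}&sasl_password={password}';

-- Reference the stored connection instead of repeating the credentials.
-- "orders" is a hypothetical table.
CREATE CHANGEFEED FOR TABLE orders
  INTO 'external://kafka_sink';
```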
Monitor your changefeed job
It is a best practice to monitor your changefeed jobs for behavior such as failures and retries.
You can use the following tools for monitoring:
- The Changefeed Dashboard on the DB Console
- The SHOW CHANGEFEED JOBS statement
- Changefeed metrics labels
Refer to the Monitor and Debug Changefeeds page for recommendations on metrics to track.
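As an illustration of the SHOW CHANGEFEED JOBS tool listed above, the following sketch lists all changefeed jobs and then narrows the output to jobs that may need attention. The selected columns are assumed to be present in your version's output; check the statement's output on your cluster before relying on them.

```sql
-- List every changefeed job with its status, sink, and watched tables.
SHOW CHANGEFEED JOBS;

-- Narrow the output to changefeeds that have failed or are paused.
-- Column names are assumed from typical SHOW CHANGEFEED JOBS output.
SELECT job_id, status, high_water_timestamp, error
FROM [SHOW CHANGEFEED JOBS]
WHERE status IN ('failed', 'paused');
```

A high_water_timestamp that stops advancing on a running job is a common sign that the changefeed is falling behind.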
For detail on how protected timestamps and garbage collection interact with changefeeds, refer to Protect Changefeed Data from Garbage Collection.
Optimize a changefeed for your workload
Filter your change data with CDC queries
Change data capture queries allow you to define and filter the change data emitted to your sink when you create a changefeed.
For example, you can use CDC queries to:
- Filter out rows and columns from changefeed messages to decrease the load on your downstream sink.
- Modify data before it emits to reduce the time and operational burden of filtering or transforming data downstream.
- Stabilize or customize the schema of your changefeed messages for increased compatibility with external systems.
Refer to the Change Data Capture Queries page for more example use cases.
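A CDC query might look like the following sketch, which emits only selected columns for rows that match a predicate. The table orders, its columns, and the Kafka address are hypothetical.

```sql
-- Emit only three columns, and only for rows whose status is 'shipped'.
-- Rows that do not match the WHERE clause are not sent to the sink.
CREATE CHANGEFEED INTO 'kafka://localhost:9092'
  WITH updated
  AS SELECT id, status, total
  FROM orders
  WHERE status = 'shipped';
```

Filtering at the source in this way reduces message volume before it reaches the sink, rather than discarding data downstream.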
Use changefeeds to export a table
Changefeeds can export a single table scan to your sink. The benefits of using changefeeds for exports include job management, observability, and sink configurability. You can also schedule changefeeds to export tables, which may be useful to avoid table scans during peak periods.
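The two approaches could be sketched as follows, assuming a cloud storage sink with implicit authentication. The bucket path, table name, schedule label, and crontab value are hypothetical, and the option set is a plausible starting point, not a recommended configuration.

```sql
-- One-off export: scan the table once, write CSV files to the sink,
-- and finish without watching for later changes.
CREATE CHANGEFEED FOR TABLE orders
  INTO 's3://example-bucket/exports?AUTH=implicit'
  WITH initial_scan = 'only', format = csv;

-- Scheduled export: run the same table scan every night at 02:00,
-- outside of an assumed peak period.
CREATE SCHEDULE orders_nightly_export
  FOR CHANGEFEED orders
  INTO 's3://example-bucket/exports?AUTH=implicit'
  WITH initial_scan = 'only', format = csv
  RECURRING '0 2 * * *'
  WITH SCHEDULE OPTIONS first_run = now;
```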
For examples and more detail, refer to:
Determine the nodes running a changefeed by locality
CockroachDB supports an option to set locality filter requirements that nodes must meet in order to take part in a changefeed job. This is helpful in multi-region clusters to ensure the nodes that are physically closest to the sink emit changefeed messages. For syntax and further technical detail, refer to Run a changefeed job by locality.
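For illustration, a locality filter could be set as in the sketch below. The locality value region=us-east-1, the table name, and the Kafka address are hypothetical; the filter should match the locality tiers your nodes were actually started with.

```sql
-- Restrict the changefeed job to nodes whose locality includes region=us-east-1,
-- for example because the sink is hosted in that region.
CREATE CHANGEFEED FOR TABLE orders
  INTO 'kafka://localhost:9092'
  WITH execution_locality = 'region=us-east-1';
```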