◾️はじめに
https://dk521123.hatenablog.com/entry/2025/05/26/132317
の続き。 dbtのsnapshot機能の座学はわかったので 今回は、dbt snapshotコマンドを実行して Hello world的なことをやってみる
目次
【1】前提条件 1)開発環境 【2】Hello world 0)snapshot対象のテーブルを用意 1)対象テーブルをSourceとして定義 2)snapshots配下にYAMLファイルを追加 3)dbt snapshotコマンド(1回目)を実行 4)対象テーブルを更新する 5)dbt snapshotコマンド(2回目)を実行 【3】使用上の注意 1)DELETEがある場合「hard_deletes」の指定を検討する 【4】snapshotの設定値 1)strategy 2)updated_at 3)check_cols 4)dbt_valid_to_current 5)hard_deletes (旧:invalidate_hard_deletes) 6)snapshot_meta_column_names 【5】dbt snapshotコマンド
【1】前提条件
* DB/dbtの環境設定ができている * dbt project作成済みであること
dbt ~ 環境設定編 ~
https://dk521123.hatenablog.com/entry/2023/12/16/152147
1)開発環境
Items | Values |
---|---|
DB | PostgreSQL |
DBT | v1.9.4 |
【2】Hello world
* 公式ドキュメントの「Add a snapshot to your project」をベースに作成
https://docs.getdbt.com/docs/build/snapshots#configuring-snapshots
0)snapshot対象のテーブルを用意
CREATE TABLE public.users ( user_id INT PRIMARY KEY, user_name TEXT NOT NULL, address TEXT, created_at TIMESTAMP NOT NULL, updated_at TIMESTAMP NOT NULL ); INSERT INTO public.users (user_id, user_name, address, created_at, updated_at) VALUES (1, 'Mike', 'USA', now(), now()), (2, 'Tom', 'UK', now(), now());
1)対象テーブルをSourceとして定義
/models/schema.yml
sources: - name: demo_source database: dbt_db schema: public tables: - name: users
2)snapshots配下にYAMLファイルを追加
/snapshots/users_snapshot.yaml
snapshots: - name: users_snapshot relation: source('demo_source', 'users') config: schema: public database: dbt_db unique_key: user_id strategy: timestamp updated_at: updated_at # Specifies that current records should have `dbt_valid_to` set to `'9999-12-31'` instead of `NULL`. dbt_valid_to_current: "TO_TIMESTAMP('9999-12-31 23:59:59', 'YYYY-MM-DD HH24:MI:SS')" # !重要! hard_deletes: 'invalidate'
3)dbt snapshotコマンド(1回目)を実行
$ dbt snapshot 15:05:02 Running with dbt=1.9.4 15:05:02 Registered adapter: postgres=1.9.0 15:05:03 Found 4 models, 4 data tests, 1 snapshot, 1 source, 434 macros 15:05:03 15:05:03 Concurrency: 1 threads (target='dev') 15:05:03 15:05:03 1 of 1 START snapshot hello_public.users_snapshot .............................. [RUN] 15:05:03 1 of 1 OK snapshotted hello_public.users_snapshot .............................. [SELECT 2 in 0.12s] 15:05:03 15:05:03 Finished running 1 snapshot in 0 hours 0 minutes and 0.43 seconds (0.43s). 15:05:03 15:05:03 Completed successfully 15:05:03 15:05:03 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
dbt snapshotコマンド後
select * from hello_public.users_snapshot;
user_id | user_name | address | created_at | updated_at | dbt_scd_id | dbt_updated_at | dbt_valid_from | dbt_valid_to |
---|---|---|---|---|---|---|---|---|
1 | Mike | USA | 2025-05-26 14:09:54.444133 | 2025-05-26 14:09:54.444133 | ebfae4... | 2025-05-26 14:09:54.444133 | 2025-05-26 14:09:54.444133 | 9999-12-31 23:59:59+00 |
2 | Tom | UK | 2025-05-26 14:09:54.444133 | 2025-05-26 14:09:54.444133 | f38a5... | 2025-05-26 14:09:54.444133 | 2025-05-26 14:09:54.444133 | 9999-12-31 23:59:59+00 |
4)対象テーブルを更新する
INSERT INTO public.users (user_id, user_name, address, created_at, updated_at) VALUES (3, 'Kevin', 'Australia', now(), now()); UPDATE public.users SET address = 'New Zealand', updated_at = now() WHERE user_id = 2; DELETE FROM public.users WHERE user_id = 1;
5)dbt snapshotコマンド(2回目)を実行
dbt snapshot 15:33:57 Running with dbt=1.9.4 15:33:58 Registered adapter: postgres=1.9.0 15:33:58 Found 3 models, 4 data tests, 1 snapshot, 1 source, 434 macros 15:33:58 15:33:58 Concurrency: 1 threads (target='dev') 15:33:58 15:33:58 1 of 1 START snapshot hello_public.users_snapshot .............................. [RUN] 15:33:58 1 of 1 OK snapshotted hello_public.users_snapshot .............................. [INSERT 0 2 in 0.20s] 15:33:58 15:33:58 Finished running 1 snapshot in 0 hours 0 minutes and 0.50 seconds (0.50s). 15:33:58 15:33:58 Completed successfully 15:33:58 15:33:58 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
dbt snapshotコマンド後
select * from hello_public.users_snapshot;
user_id | user_name | address | created_at | updated_at | dbt_scd_id | dbt_updated_at | dbt_valid_from | dbt_valid_to |
---|---|---|---|---|---|---|---|---|
1 | Mike | USA | 2025-05-26 14:09:54.444133 | 2025-05-26 14:09:54.444133 | ebfae4... | 2025-05-26 14:09:54.444133 | 2025-05-26 14:09:54.444133 | 2025-05-26 15:33:45.152848+00 |
2 | Tom | UK | 2025-05-26 14:09:54.444133 | 2025-05-26 14:09:54.444133 | f38a5f0... | 2025-05-26 14:09:54.444133 | 2025-05-26 14:09:54.444133 | 2025-05-26 15:33:45.152848+00 |
3 | Kevin | Australia | 2025-05-26 15:33:45.152848 | 2025-05-26 15:33:45.152848 | 76765f... | 2025-05-26 15:33:45.152848 | 2025-05-26 15:33:45.152848 | 9999-12-31 23:59:59+00 |
2 | Tom | New Zealand | 2025-05-26 14:09:54.444133 | 2025-05-26 15:33:45.152848 | 1a29d... | 2025-05-26 15:33:45.152848 | 2025-05-26 15:33:45.152848 | 9999-12-31 23:59:59+00 |
【3】使用上の注意
1)DELETEがある場合「hard_deletes」の指定を検討する
(Hello worldをやって気づけたが) hard_deletesがない状態で、DELETE文が走っても 何も変化しない(hard_deletes: 'ignore'がデフォルトのため) => 詳細は、以下のサイトを参照
https://zenn.dev/tenajima/articles/209694b892ef6d
【4】snapshotの設定値
https://docs.getdbt.com/docs/build/snapshots#configuring-snapshots
snapshots: - name: string relation: relation # source('my_source', 'my_table') or ref('my_model') description: markdown_string config: database: string schema: string alias: string strategy: timestamp | check unique_key: column_name_or_expression check_cols: [column_name] | all updated_at: column_name snapshot_meta_column_names: dictionary dbt_valid_to_current: string hard_deletes: ignore | invalidate | new_record
1)strategy
* timestamp or check を選択 => 詳細は、以下の関連記事の「【3】スナップショットする際の種類」を参照
https://dk521123.hatenablog.com/entry/2025/05/26/132317
https://docs.getdbt.com/reference/resource-configs/unique_key
snapshots: <resource-path>: +strategy: timestamp | check
2)updated_at
* 「strategy: timestamp」時に使用 * 対象テーブルの更新日時のカラム名を指定
https://docs.getdbt.com/reference/resource-configs/updated_at
3)check_cols
* 「strategy: check」時に使用 * check戦略時で変更対象カラム名(複数可)を指定 * 全カラムだったら「check_cols: all」を指定
https://docs.getdbt.com/reference/resource-configs/check_cols
snapshots: - name: orders_snapshot_check relation: source('jaffle_shop', 'orders') config: schema: snapshots unique_key: id strategy: check check_cols: - status - is_cancelled
4)dbt_valid_to_current
* dbt_valid_to のデフォルト値を指定(例: 未来の日付) * デフォルトは NULL
https://docs.getdbt.com/reference/resource-configs/dbt_valid_to_current
snapshots: - name: my_snapshot config: strategy: timestamp updated_at: updated_at dbt_valid_to_current: "to_date('9999-12-31')"
5)hard_deletes (旧:invalidate_hard_deletes)
* From v1.9 * 削除時の振る舞いを指定
https://docs.getdbt.com/reference/resource-configs/hard-deletes
snapshots: - name: <snapshot_name> config: hard_deletes: 'ignore' | 'invalidate' | 'new_record'
Methods | Description |
---|---|
ignore (Default) | Deleteされても何もしない |
invalidate | Deleteされた場合、dbt_valid_toに値を反映 |
new_record | Deleteされた場合、履歴に新しいレコードとして追加される |
invalidate_hard_deletes
* 以下は、既に非推奨(~v1.8) * invalidate_hard_deletes=trueは、「hard_deletes: 'invalidate'」と同じ
https://docs.getdbt.com/reference/resource-configs/invalidate_hard_deletes
6)snapshot_meta_column_names
* From v1.9 * 動追加されるメタフィールド(例: dbt_valid_from)の名前を変更
https://docs.getdbt.com/reference/resource-configs/snapshot_meta_column_names
snapshots: - name: <snapshot_name> config: snapshot_meta_column_names: dbt_valid_from: <string> dbt_valid_to: <string> dbt_scd_id: <string> dbt_updated_at: <string> dbt_is_deleted: <string>
【5】dbt snapshotコマンド
* 基本、オプションなしで使えそうだが、、、
https://docs.getdbt.com/reference/commands/snapshot
dbt snapshot [-h] [--profiles-dir PROFILES_DIR] [--profile PROFILE] [--target TARGET] [--vars VARS] [--bypass-cache] [--threads THREADS] [--select SELECTOR [SELECTOR ...]] [--exclude EXCLUDE [EXCLUDE ...]] optional arguments: --select SELECTOR [SELECTOR ...] Specify the snapshots to include in the run. --exclude EXCLUDE [EXCLUDE ...] Specify the snapshots to exclude in the run.
参考文献
https://qiita.com/imaik_/items/d470b3b98e15a1e36f63
https://qiita.com/hanon52_/items/6b1ed1a5247d0cac759f
関連記事
dbt ~ 基礎知識編 ~
https://dk521123.hatenablog.com/entry/2023/06/30/000000
dbt ~ 環境設定編 ~
https://dk521123.hatenablog.com/entry/2023/12/16/152147
dbt ~ 入門編 ~
https://dk521123.hatenablog.com/entry/2023/05/30/151003
dbt CLI ~ 入門編 ~
https://dk521123.hatenablog.com/entry/2024/07/21/234811