【dbt】dbt CLI ~ dbt snapshot ~

◾️はじめに

https://dk521123.hatenablog.com/entry/2025/05/26/132317

の続き。

dbtのsnapshot機能の座学はわかったので
今回は、dbt snapshotコマンドを実行して
Hello world的なことをやってみる

目次

【1】前提条件
 1)開発環境
【2】Hello world
 0)snapshot対象のテーブルを用意
 1)対象テーブルをSourceとして定義
 2)snapshots配下にYAMLファイルを追加
 3)dbt snapshotコマンド(1回目)を実行
 4)対象テーブルを更新する
 5)dbt snapshotコマンド(2回目)を実行
【3】使用上の注意
 1)DELETEがある場合「hard_deletes」の指定を検討する
【4】snapshotの設定値
 1)strategy
 2)updated_at
 3)check_cols
 4)dbt_valid_to_current
 5)hard_deletes (旧:invalidate_hard_deletes)
 6)snapshot_meta_column_names
【5】dbt snapshotコマンド

【1】前提条件

* DB/dbtの環境設定ができている
* dbt project作成済みであること

dbt ~ 環境設定編 ~
https://dk521123.hatenablog.com/entry/2023/12/16/152147

1)開発環境

Items Values
DB PostgreSQL
DBT v1.9.4

【2】Hello world

* 公式ドキュメントの「Add a snapshot to your project」をベースに作成

https://docs.getdbt.com/docs/build/snapshots#configuring-snapshots

0)snapshot対象のテーブルを用意

CREATE TABLE public.users (
    user_id INT PRIMARY KEY,
    user_name TEXT NOT NULL,
    address TEXT,
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL
);

INSERT INTO public.users (user_id, user_name, address, created_at, updated_at)
VALUES 
    (1, 'Mike', 'USA', now(), now()),
    (2, 'Tom', 'UK', now(), now());

1)対象テーブルをSourceとして定義

/models/schema.yml

sources:
  - name: demo_source
    database: dbt_db
    schema: public
    tables:
      - name: users

2)snapshots配下にYAMLファイルを追加

/snapshots/users_snapshot.yaml

snapshots:
  - name: users_snapshot
    relation: source('demo_source', 'users')
    config:
      schema: public
      database: dbt_db
      unique_key: user_id
      strategy: timestamp
      updated_at: updated_at
      # Specifies that current records should have `dbt_valid_to` set to `'9999-12-31'` instead of `NULL`.
      dbt_valid_to_current: "TO_TIMESTAMP('9999-12-31 23:59:59', 'YYYY-MM-DD HH24:MI:SS')"
      # !重要!
      hard_deletes: 'invalidate'

3)dbt snapshotコマンド(1回目)を実行

$ dbt snapshot
15:05:02  Running with dbt=1.9.4
15:05:02  Registered adapter: postgres=1.9.0
15:05:03  Found 4 models, 4 data tests, 1 snapshot, 1 source, 434 macros
15:05:03  
15:05:03  Concurrency: 1 threads (target='dev')
15:05:03  
15:05:03  1 of 1 START snapshot hello_public.users_snapshot .............................. [RUN]
15:05:03  1 of 1 OK snapshotted hello_public.users_snapshot .............................. [SELECT 2 in 0.12s]
15:05:03  
15:05:03  Finished running 1 snapshot in 0 hours 0 minutes and 0.43 seconds (0.43s).
15:05:03  
15:05:03  Completed successfully
15:05:03  
15:05:03  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

dbt snapshotコマンド後

select * from hello_public.users_snapshot;
user_id user_name address created_at updated_at dbt_scd_id dbt_updated_at dbt_valid_from dbt_valid_to
1 Mike USA 2025-05-26 14:09:54.444133 2025-05-26 14:09:54.444133 ebfae4... 2025-05-26 14:09:54.444133 2025-05-26 14:09:54.444133 9999-12-31 23:59:59+00
2 Tom UK 2025-05-26 14:09:54.444133 2025-05-26 14:09:54.444133 f38a5... 2025-05-26 14:09:54.444133 2025-05-26 14:09:54.444133 9999-12-31 23:59:59+00

4)対象テーブルを更新する

INSERT INTO public.users (user_id, user_name, address, created_at, updated_at)
VALUES 
    (3, 'Kevin', 'Australia', now(), now());

UPDATE public.users
SET address = 'New Zealand', updated_at = now()
WHERE user_id = 2;

DELETE FROM public.users
WHERE user_id = 1;

5)dbt snapshotコマンド(2回目)を実行

dbt snapshot
15:33:57  Running with dbt=1.9.4
15:33:58  Registered adapter: postgres=1.9.0
15:33:58  Found 3 models, 4 data tests, 1 snapshot, 1 source, 434 macros
15:33:58  
15:33:58  Concurrency: 1 threads (target='dev')
15:33:58  
15:33:58  1 of 1 START snapshot hello_public.users_snapshot .............................. [RUN]
15:33:58  1 of 1 OK snapshotted hello_public.users_snapshot .............................. [INSERT 0 2 in 0.20s]
15:33:58  
15:33:58  Finished running 1 snapshot in 0 hours 0 minutes and 0.50 seconds (0.50s).
15:33:58  
15:33:58  Completed successfully
15:33:58  
15:33:58  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

dbt snapshotコマンド後

select * from hello_public.users_snapshot;
user_id user_name address created_at updated_at dbt_scd_id dbt_updated_at dbt_valid_from dbt_valid_to
1 Mike USA 2025-05-26 14:09:54.444133 2025-05-26 14:09:54.444133 ebfae4... 2025-05-26 14:09:54.444133 2025-05-26 14:09:54.444133 2025-05-26 15:33:45.152848+00
2 Tom UK 2025-05-26 14:09:54.444133 2025-05-26 14:09:54.444133 f38a5f0... 2025-05-26 14:09:54.444133 2025-05-26 14:09:54.444133 2025-05-26 15:33:45.152848+00
3 Kevin Australia 2025-05-26 15:33:45.152848 2025-05-26 15:33:45.152848 76765f... 2025-05-26 15:33:45.152848 2025-05-26 15:33:45.152848 9999-12-31 23:59:59+00
2 Tom New Zealand 2025-05-26 14:09:54.444133 2025-05-26 15:33:45.152848 1a29d... 2025-05-26 15:33:45.152848 2025-05-26 15:33:45.152848 9999-12-31 23:59:59+00

【3】使用上の注意

1)DELETEがある場合「hard_deletes」の指定を検討する

(Hello worldをやって気づけたが)
hard_deletesがない状態で、DELETE文が走っても
何も変化しない(hard_deletes: 'ignore'がデフォルトのため)
 => 詳細は、以下のサイトを参照

https://zenn.dev/tenajima/articles/209694b892ef6d

【4】snapshotの設定値

https://docs.getdbt.com/docs/build/snapshots#configuring-snapshots

snapshots:
  - name: string
    relation: relation # source('my_source', 'my_table') or ref('my_model')
    description:  markdown_string
    config:
      database: string
      schema: string
      alias: string
      strategy: timestamp | check
      unique_key: column_name_or_expression
      check_cols: [column_name] | all
      updated_at: column_name
      snapshot_meta_column_names: dictionary
      dbt_valid_to_current: string
      hard_deletes: ignore | invalidate | new_record 

1)strategy

* timestamp or check を選択
 => 詳細は、以下の関連記事の「【3】スナップショットする際の種類」を参照

https://dk521123.hatenablog.com/entry/2025/05/26/132317

https://docs.getdbt.com/reference/resource-configs/unique_key

snapshots:
  <resource-path>:
    +strategy: timestamp | check

2)updated_at

* 「strategy: timestamp」時に使用
* 対象テーブルの更新日時のカラム名を指定

https://docs.getdbt.com/reference/resource-configs/updated_at

3)check_cols

* 「strategy: check」時に使用
* check戦略時で変更対象カラム名(複数可)を指定
* 全カラムだったら「check_cols: all」を指定

https://docs.getdbt.com/reference/resource-configs/check_cols

snapshots:
  - name: orders_snapshot_check
    relation: source('jaffle_shop', 'orders')
    config:
      schema: snapshots
      unique_key: id
      strategy: check
      check_cols:
        - status
        - is_cancelled

4)dbt_valid_to_current

* dbt_valid_to のデフォルト値を指定(例: 未来の日付)
* デフォルトは NULL

https://docs.getdbt.com/reference/resource-configs/dbt_valid_to_current

snapshots:
  - name: my_snapshot
    config:
      strategy: timestamp
      updated_at: updated_at
      dbt_valid_to_current: "to_date('9999-12-31')"

5)hard_deletes (旧:invalidate_hard_deletes)

* From v1.9
* 削除時の振る舞いを指定

https://docs.getdbt.com/reference/resource-configs/hard-deletes

snapshots:
  - name: <snapshot_name>
    config:
      hard_deletes: 'ignore' | 'invalidate' | 'new_record'
Methods Description
ignore (Default) Deleteされても何もしない
invalidate Deleteされた場合、dbt_valid_toに値を反映
new_record Deleteされた場合、履歴に新しいレコードとして追加される

invalidate_hard_deletes

* 以下は、既に非推奨(~v1.8)
* invalidate_hard_deletes=trueは、「hard_deletes: 'invalidate'」と同じ

https://docs.getdbt.com/reference/resource-configs/invalidate_hard_deletes

6)snapshot_meta_column_names

* From v1.9
* 動追加されるメタフィールド(例: dbt_valid_from)の名前を変更

https://docs.getdbt.com/reference/resource-configs/snapshot_meta_column_names

snapshots:
  - name: <snapshot_name>
    config:
      snapshot_meta_column_names:
        dbt_valid_from: <string>
        dbt_valid_to: <string>
        dbt_scd_id: <string>
        dbt_updated_at: <string>
        dbt_is_deleted: <string>

【5】dbt snapshotコマンド

* 基本、オプションなしで使えそうだが、、、

https://docs.getdbt.com/reference/commands/snapshot

dbt snapshot [-h] [--profiles-dir PROFILES_DIR]
                                     [--profile PROFILE] [--target TARGET]
                                     [--vars VARS] [--bypass-cache]
                                     [--threads THREADS]
                                     [--select SELECTOR [SELECTOR ...]]
                                     [--exclude EXCLUDE [EXCLUDE ...]]

optional arguments:
  --select SELECTOR [SELECTOR ...]
                        Specify the snapshots to include in the run.
  --exclude EXCLUDE [EXCLUDE ...]
                        Specify the snapshots to exclude in the run.

参考文献

https://qiita.com/imaik_/items/d470b3b98e15a1e36f63
https://qiita.com/hanon52_/items/6b1ed1a5247d0cac759f

関連記事

dbt ~ 基礎知識編 ~
https://dk521123.hatenablog.com/entry/2023/06/30/000000
dbt ~ 環境設定編 ~
https://dk521123.hatenablog.com/entry/2023/12/16/152147
dbt ~ 入門編 ~
https://dk521123.hatenablog.com/entry/2023/05/30/151003
dbt CLI ~ 入門編 ~
https://dk521123.hatenablog.com/entry/2024/07/21/234811