【dbt】dbt ~ 更新 / Update or Insert / Insert ~

■ はじめに

https://dk521123.hatenablog.com/entry/2023/12/07/060129

の続き。

今回は、Update or Insert (Upsert)について、深堀っていく
ついでに、普通の Insert も載せておく。

目次

【1】Update or Insert に関する属性
  1)merge_update_columns / merge_exclude_columns
【2】構文
【3】サンプル
【4】Insert に関して

【1】Update or Insert に関する属性

1)merge_update_columns / merge_exclude_columns

merge_update_columns

* 更新対象のテーブル項目

merge_exclude_columns

* 更新対象外のテーブル項目(この項目以外は更新対象になる)

【2】構文

{{
    config(
        materialized='incremental',
        unique_key=['user_id' , ...],
        incremental_strategy='merge',
        merge_update_columns=['user_name', ...]
    )
}}

https://docs.getdbt.com/docs/build/incremental-models#strategy-specific-configs

【3】サンプル

[0] サンプルテーブル定義

DROP TABLE IF EXISTS daily_sample_user;
CREATE TABLE daily_sample_user (
  user_id VARCHAR NOT NULL,
  user_name VARCHAR NOT NULL,
  remarks VARCHAR NULL,
  updated_at DATE NOT NULL,
  created_at DATE NOT NULL
);

[1] 初期データ (1回目)

INSERT INTO daily_sample_user(user_id, user_name, remarks, updated_at, created_at) VALUES ('X001', 'Mike', 'Hello', '2023-12-01', '2023-12-01');
INSERT INTO daily_sample_user(user_id, user_name, remarks, updated_at, created_at) VALUES ('X002', 'Tom', 'World', '2023-12-01', '2023-12-01');

SELECT * FROM daily_sample_user;
~~~~~~
"user_id","user_name","remarks","updated_at","created_at"
"X001","Mike","Hello","2023-12-01","2023-12-01"
"X002","Tom","World","2023-12-01","2023-12-01"
~~~~~~

[3] 「models/schema.yml」追加

version: 2

# Add
sources:
  - name: demo_source
    # database: sample_db
    schema: hello
    tables:
      - name: daily_sample_user

[4] モデル「models/full_sample_user.sql」定義 (1回目)

{{
  config(
    materialized='incremental',
    unique_key='user_id',
    incremental_strategy='merge',
    merge_update_columns=['user_name', 'remarks', 'updated_at']
  )
}}

WITH
updated_user AS (
  SELECT
    user_id AS user_id,
    user_name AS user_name,
    remarks AS remarks,
    '2023-12-01' AS updated_at,
    '2023-12-01' AS created_at
  FROM
    {{ source('demo_source', 'daily_sample_user') }}
)

SELECT * FROM updated_user

[5] dbt run (1回目)

dbt run

[6] 確認

SELECT * FROM full_sample_user;
~~~~~~
"user_id","user_name","remarks","updated_at","created_at"
"X001","Mike","Hello","2023-12-01","2023-12-01"
"X002","Tom","World","2023-12-01","2023-12-01"
~~~~~~

[7] 追加データ投入 (2回目)

-- 1回目のデータをクリア
Truncate table daily_sample_user;

-- 更新データ
INSERT INTO daily_sample_user(user_id, user_name, remarks, updated_at, created_at) VALUES ('X001', 'Mike', 'After updated', '2023-12-02', '2023-12-02');
INSERT INTO daily_sample_user(user_id, user_name, remarks, updated_at, created_at) VALUES ('X003', 'Kevin', 'This is a new data', '2023-12-02', '2023-12-02');

SELECT * FROM daily_sample_user;
~~~~~~
"user_id","user_name","remarks","updated_at","created_at"
"X001","Mike","After updated","2023-12-02","2023-12-02"
"X003","Kevin","This is a new data","2023-12-02","2023-12-02"
~~~~~~

[8] モデル「models/full_sample_user.sql」修正 (2回目)

-- ...
    remarks AS remarks,
    -- '2023-12-01' AS updated_at,
    -- '2023-12-01' AS created_at
    '2023-12-02' AS updated_at, -- ★ここを変えた
    '2023-12-02' AS created_at -- ★ここを変えた
  FROM
-- ...

[9] dbt run (2回目)

dbt run

[10] 確認

SELECT * FROM full_sample_user  ORDER BY user_id;
~~~~~~
"user_id","user_name","remarks","updated_at","created_at"
"X001","Mike","After updated","2023-12-02","2023-12-01" <= 更新されたが「created_at」はそのまま
"X002","Tom","World","2023-12-01","2023-12-01" <= 1回目のまま
"X003","Kevin","This is a new data","2023-12-02","2023-12-02" <= 新規行は、そのまま追加
~~~~~~

【4】Insert に関して

* Insert に関しては、
 「incremental_strategy='append'」にするといい

モデル「models/insert_sample_user.sql」定義

{{
  config(
    materialized='incremental',
    unique_key='user_id',
    incremental_strategy='append'
  )
}}

WITH
insert_user AS (
  SELECT
    user_id AS user_id,
    user_name AS user_name,
    remarks AS remarks,
    updated_at AS updated_at,
    created_at AS created_at
  FROM
    {{ source('demo_source', 'daily_sample_user') }}
)

SELECT * FROM insert_user

結果例

* 同じようにやっていくと、以下のようになる。

"user_id","user_name","remarks","updated_at","created_at"
"X001","Mike","Hello","2023-12-01","2023-12-01"
"X002","Tom","World","2023-12-01","2023-12-01"
"X001","Mike","After updated","2023-12-02","2023-12-02"
"X003","Kevin","This is a new data","2023-12-02","2023-12-02"

関連記事

dbt ~ 基礎知識編 ~
https://dk521123.hatenablog.com/entry/2023/06/30/000000
dbt ~ 入門編 ~
https://dk521123.hatenablog.com/entry/2023/05/30/151003
dbt ~ 基本編 / Source ~
https://dk521123.hatenablog.com/entry/2023/12/08/111012
dbt ~ 更新 / 基礎知識編 ~
https://dk521123.hatenablog.com/entry/2023/12/07/060129