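◾️Introduction

While investigating an existing system, I came across a process that
reads AWS Lake Formation tables through Athena.
I thought replacing it would be much cleaner if dbt could be used,
so I looked into it.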

【１】dbt-athena
　１）Supplement: dbt-athena-community
　２）Supported dbt versions
【２】Installation
　１）Creating a dbt project
【３】Configuration
　１）~/.dbt/profiles.yml
　２）Model configuration
【４】Usage notes
　１）With dbt-athena, you cannot save only the files without creating a table

【１】dbt-athena

* The dbt adapter for Amazon Athena

https://docs.getdbt.com/docs/core/connect-data-platform/athena-setup

１）Supplement: dbt-athena-community

dbt-athena vs dbt-athena-community
https://docs.getdbt.com/docs/core/connect-data-platform/athena-setup

Excerpt (paraphrased)
~~~~
dbt-athena-community was the community-maintained adapter
until dbt Labs took over its maintenance in late 2024.
Both dbt-athena and dbt-athena-community are now
maintained by dbt Labs,
but dbt-athena-community is now just a wrapper around dbt-athena,
published for backward compatibility.
~~~~

２）Supported dbt versions

* Supported dbt Core version: v1.3.0 and newer

# This should not be a problem as long as you use a reasonably recent dbt

https://docs.getdbt.com/docs/dbt-versions/core

【２】Installation

https://docs.getdbt.com/docs/core/connect-data-platform/athena-setup

# pip install --upgrade pip

# Install
pip install dbt-core dbt-athena

# Verify
dbt --version
Core:
  - installed: 1.10.15
  - latest:    1.10.15 - Up to date!

Plugins:
  - athena: 1.9.5 - Up to date!
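
The version requirement can also be checked in code. This is a minimal sketch, assuming version strings are plain dot-separated integers such as `1.10.15`; `parse_version` and `meets_minimum` are names made up for illustration and are not part of dbt. Versions are compared as integer tuples because a plain string comparison would rank `1.10.15` below `1.3.0`.

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.10.15' (or 'v1.10.15') into (1, 10, 15).

    Assumes plain dot-separated integers; pre-release suffixes are not handled.
    """
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def meets_minimum(installed: str, minimum: str = "1.3.0") -> bool:
    """Return True if the installed dbt Core version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    print(meets_minimum("1.10.15"))  # prints True
    print(meets_minimum("1.2.9"))    # prints False
```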

１）Creating a dbt project

# dbt init <DBT_PROJECT_NAME>
dbt init demo_althena

XX:XX:XX  Running with dbt=1.10.15
XX:XX:XX  
Your new dbt project "demo_althena" was created!

For more information on how to configure the profiles.yml file,
please consult the dbt documentation here:

  https://docs.getdbt.com/docs/configure-your-profile

One more thing:

Need help? Don't hesitate to reach out to us via GitHub issues or on Slack:

  https://community.getdbt.com/

Happy modeling!

09:42:29  Setting up your profile.
Which database would you like to use?
[1] athena

(Don't see the one you want? https://docs.getdbt.com/docs/available-adapters)

↓★Enter the values at each prompt★
Enter a number: 1       
s3_staging_dir (S3 location to store Athena query results and metadata, e.g. s3://athena_query_result/prefix/): s3://your-s3-bucket1/prefix/
s3_data_dir (S3 location where to store data/tables, e.g. s3://bucket_name/prefix/): s3://your-s3-bucket2/prefix/                        
region_name (AWS region of your Athena instance): us-west-2
schema (Specify the schema (Athena database) to build models into (lowercase only)): demo_db  
database (Specify the database (Data catalog) to build models into (lowercase only)) [awsdatacatalog]: 
threads (1 or more) [1]: 1
XX:XX:XX  Profile demo_althena written to /Users/user1/.dbt/profiles.yml using target's profile_template.yml and your supplied values. Run 'dbt debug' to validate the connection.

# [Optional] Copy profiles.yml into the dbt project directory
$ cp ~/.dbt/profiles.yml ./demo_althena/profiles.yml

【３】Configuration

１）~/.dbt/profiles.yml

https://github.com/dbt-labs/dbt-adapters/tree/main/dbt-athena#configuring-your-profile

default:
  outputs:
    dev:
      type: athena
      # S3 path where query results and metadata are stored
      s3_staging_dir: s3://your-s3-bucket/athena_query_result
      # S3 path where Athena table data is stored
      s3_data_dir: s3://your-s3-bucket/tables
      # How paths are generated under s3_data_dir (see s3_data_naming further down)
      s3_data_naming: schema_table_unique
      # AWS Region
      region_name: us-west-2
      # Data source (data catalog)
      database: awsdatacatalog
      # Default schema name (Athena database)
      schema: dev_demo_schema
      # Number of threads (1 or more)
      threads: 1
      # Number of retries (0 or more; defaults to 5)
      num_retries: 3
      # AWS profile name
      aws_profile_name: default
      # Athena workgroup
      workgroup: primary
  target: dev
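
A missing setting can be caught before running `dbt debug`. This is a minimal sketch, assuming the target has already been loaded into a plain dict (for example with a YAML parser) and assuming that `s3_staging_dir`, `region_name`, `database` and `schema` are the mandatory keys, which is how the README appears to mark them; `REQUIRED_KEYS` and `missing_keys` are hypothetical names, not part of dbt.

```python
# Keys assumed to be mandatory for an Athena target (an assumption; check the README).
REQUIRED_KEYS = ("s3_staging_dir", "region_name", "database", "schema")


def missing_keys(target: dict) -> list:
    """Return the required keys that are absent or empty in the target dict."""
    return [key for key in REQUIRED_KEYS if not target.get(key)]


if __name__ == "__main__":
    dev = {
        "type": "athena",
        "s3_staging_dir": "s3://your-s3-bucket/athena_query_result",
        "region_name": "us-west-2",
        "database": "awsdatacatalog",
        "schema": "dev_demo_schema",
    }
    print(missing_keys(dev))  # prints []
```

An empty list means every key the sketch knows about is present; it says nothing about whether the bucket or workgroup actually exists.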

s3_data_naming

* The README in the official GitHub repository explains this clearly

https://github.com/dbt-labs/dbt-adapters/tree/main/dbt-athena#table-location

Value                Path
unique               {s3_data_dir}/{uuid4()}/
table                {s3_data_dir}/{table}/
table_unique         {s3_data_dir}/{table}/{uuid4()}/
schema_table         {s3_data_dir}/{schema}/{table}/
schema_table_unique  {s3_data_dir}/{schema}/{table}/{uuid4()}/
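
The table above can be read as a simple lookup. This is a minimal sketch that mirrors it in Python, only to illustrate the path patterns; it is not dbt-athena's actual implementation, and `table_location` is a hypothetical helper.

```python
import uuid


def table_location(s3_data_dir: str, naming: str, schema: str, table: str) -> str:
    """Build the S3 prefix for a table according to the s3_data_naming value."""
    base = s3_data_dir.rstrip("/")
    unique = str(uuid.uuid4())  # a fresh random ID on every call
    patterns = {
        "unique": f"{base}/{unique}/",
        "table": f"{base}/{table}/",
        "table_unique": f"{base}/{table}/{unique}/",
        "schema_table": f"{base}/{schema}/{table}/",
        "schema_table_unique": f"{base}/{schema}/{table}/{unique}/",
    }
    if naming not in patterns:
        raise ValueError(f"unknown s3_data_naming: {naming}")
    return patterns[naming]


if __name__ == "__main__":
    print(table_location("s3://your-s3-bucket/tables", "schema_table",
                         "dev_demo_schema", "demo"))
    # prints s3://your-s3-bucket/tables/dev_demo_schema/demo/
```

With one of the `*unique` values, every call yields a new prefix, so repeated builds of the same table do not land in the same S3 location.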

２）Model configuration

* The following page is a good reference

https://docs.getdbt.com/reference/resource-configs/athena-configs

Parameter          Default  Description
external_location  None     Full S3 path where the table is saved. Only for materialized='incremental'. Not available for Hive tables with ha set to true
ha                 False    Builds the table using the high-availability method. Only available for Hive tables
format             Parquet  Supported formats: ORC, PARQUET, AVRO, JSON, and TEXTFILE
partitioned_by     None     Partition settings. Limited to 100 partitions
table_type         Hive     hive or iceberg

models/demo.sql

{{ config(
    materialized='table',
    table_type='iceberg',
    format='parquet',
    partitioned_by=['bucket(user_id, 5)'],
    table_properties={
     'optimize_rewrite_delete_file_threshold': '2'
     }
) }}

select 'A'          as user_id,
       'pi'         as name,
       'active'     as status,
       17.89        as cost,
       1            as quantity,
       100000000    as quantity_big,
       current_date as my_date
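
Some of the constraints in the parameter table can be expressed as a small check. This is a minimal sketch, assuming the model config is available as a plain dict; `check_model_config` is a hypothetical helper, not a dbt API, and it covers only the `format`, `table_type` and `ha` rules listed in that table.

```python
ALLOWED_FORMATS = {"orc", "parquet", "avro", "json", "textfile"}
ALLOWED_TABLE_TYPES = {"hive", "iceberg"}


def check_model_config(config: dict) -> list:
    """Return a list of problems found in a dbt-athena model config dict."""
    problems = []
    # Fall back to the defaults from the parameter table when a key is absent.
    table_type = str(config.get("table_type", "hive")).lower()
    file_format = str(config.get("format", "parquet")).lower()
    ha = bool(config.get("ha", False))

    if table_type not in ALLOWED_TABLE_TYPES:
        problems.append(f"table_type must be hive or iceberg, got {table_type}")
    if file_format not in ALLOWED_FORMATS:
        problems.append(f"unsupported format: {file_format}")
    if ha and table_type != "hive":
        problems.append("ha is only available for Hive tables")
    if ha and config.get("external_location"):
        problems.append("external_location cannot be combined with ha=True")
    return problems


if __name__ == "__main__":
    demo = {"materialized": "table", "table_type": "iceberg", "format": "parquet"}
    print(check_model_config(demo))  # prints []
```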

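【４】Usage notes

* These are strictly notes for my own reference

１）With dbt-athena, you cannot save only the files without creating a table

* This follows naturally from how dbt works, but because a model is configured as shown below,
　an external table is always created
 => dbt-athena is therefore a poor fit if all you want is to store the data of Athena-managed tables on S3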

https://docs.getdbt.com/reference/resource-configs/athena-configs

{{
    config(
        materialized='table',
        partitioned_by=['date'],
        external_location='s3://your-s3-bucket/data/xxxx/'
    )
}}

