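◾️Introduction
This post is a continuation of
https://dk521123.hatenablog.com/entry/2025/12/19/233012
I settled on the service configuration below, so here I build it out with Terraform. (The real Glue Job logic will be written separately.) It has been quite a while since I last touched Terraform, so I am easing back into it as I go.
Service configuration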
| Item | Choice |
|---|---|
| IaC | Terraform |
| Cloud | AWS |
| Platform | AWS Glue |
| Language | Python |
| Tool Library | Python SDV (Synthetic Data Vault) |
Table of contents
【1】Folder structure
【2】Samples
 0)Per-environment configuration files
 1)variables.tf
 2)provider.tf
 3)main.tf
 4)modules/glue/main.tf
 5)modules/glue/variables.tf
 6)glue_jobs/synthetic_files_generator.py
【3】Deployment
 1)Example commands for the DEV environment
【1】Folder structure
.
├── modules
│   └── glue
│       ├── main.tf
│       └── variables.tf
├── glue_jobs
│   └── synthetic_files_generator.py << just a placeholder for now
├── main.tf
├── provider.tf
├── dev.tfbackend
├── dev.tfvars
└── variables.tf
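1)Prerequisites
* This time, the following resources already exist and are reused (their values are passed in through variables.tf)
  + S3 bucket
  + IAM role
【2】Samples
0)Per-environment configuration files
* Typing the values in on every run is tedious, so they are kept in configuration files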
dev.tfbackend
bucket = "your-s3-bucket"
key    = "tf-for-synthtic/dev/terraform.tfstate"
region = "us-west-2"
dev.tfvars
env = "dev"

# For Glue
# An IAM ARN has an empty region field and a 12-digit account ID
glue_role_arn                   = "arn:aws:iam::000000000000:role/your-glue-job-role"
glue_job_script_source          = "glue_jobs/synthetic_files_generator.py"
glue_job_script_location_bucket = "your-bucket-name"
glue_job_script_location_key    = "demo/glue_jobs/synthetic_files_generator.py"
glue_job_connections            = ["your-glue-connection"]
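A malformed `glue_role_arn` typically only surfaces once `terraform apply` reaches the Glue API, so a quick format check before deploying can save a round trip. The following is only a rough sketch: the helper name `is_iam_role_arn` and the regular expression are my own assumptions, not something provided by Terraform or boto3.

```python
import re

# Assumed shape of an IAM role ARN:
#   arn:<partition>:iam::<12-digit account id>:role/<optional path/><role name>
# The region field is empty because IAM is a global service.
_IAM_ROLE_ARN = re.compile(
    r"arn:(aws|aws-cn|aws-us-gov):iam::\d{12}:role/[\w+=,.@/-]+"
)


def is_iam_role_arn(value: str) -> bool:
    """Return True when value looks like an IAM role ARN (format check only)."""
    return _IAM_ROLE_ARN.fullmatch(value) is not None
```

This checks the format only; it does not confirm that the role exists or that Glue is allowed to assume it.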
1)variables.tf
variable "aws_region" {
  description = "AWS region to deploy resources"
  type        = string
  default     = "us-west-2"
}

variable "env" {
  description = "Environment"
  type        = string
  default     = "dev"

  validation {
    condition     = contains(["dev", "stage", "prod"], var.env)
    error_message = "Only dev / stage / prod"
  }
}

variable "project_name" {
  description = "Project name"
  type        = string
  default     = "synthetic_files"
}

variable "glue_role_arn" {
  description = "AWS Glue job role ARN"
  type        = string
}

variable "glue_version" {
  description = "AWS Glue job version"
  type        = string
  default     = "5.0"
}

variable "glue_job_script_source" {
  description = "Python script path of AWS Glue job script"
  type        = string
  default     = "glue_jobs/synthetic_files_generator.py"
}

variable "glue_job_script_location_bucket" {
  description = "s3 bucket of AWS Glue job script"
  type        = string
}

variable "glue_job_script_location_key" {
  description = "s3 bucket key of AWS Glue job script"
  type        = string
}

variable "glue_job_connections" {
  description = "AWS Glue job connections"
  type        = list(string)
}
2)provider.tf
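# Partial backend configuration: bucket, key, and region are supplied by
# `terraform init -backend-config=dev.tfbackend`
terraform {
  backend "s3" {}
}

# Configure the AWS Provider
provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      env = var.env
    }
  }
}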
3)main.tf
module "demo_generate_synthetic_data_glue_module" {
  source = "./modules/glue"

  env                             = var.env
  glue_role_arn                   = var.glue_role_arn
  glue_job_script_source          = var.glue_job_script_source
  glue_job_script_location_bucket = var.glue_job_script_location_bucket
  glue_job_script_location_key    = var.glue_job_script_location_key
  glue_job_connections            = var.glue_job_connections
}
4)modules/glue/main.tf
# [1] To create Glue workflow
resource "aws_glue_workflow" "demo_generate_synthetic_data_workflow" {
  name        = "demo-${var.env}-generate-synthetic-data-workflow"
  description = "This is the glue workflow to generate synthetic data."

  tags = {
    name = "demo-${var.env}-generate-synthetic-data-workflow"
  }
}

# [2-1] To upload Glue script to s3
resource "aws_s3_object" "clanyan_source" {
  source = var.glue_job_script_source
  bucket = var.glue_job_script_location_bucket
  key    = var.glue_job_script_location_key
  etag   = filemd5(var.glue_job_script_source)
}

# [2-2] To create Glue Job
resource "aws_glue_job" "demo_generate_synthetic_data_job" {
  name              = "demo-${var.env}-generate-synthetic-data-job"
  description       = "This is the glue job to generate synthetic data."
  role_arn          = var.glue_role_arn
  glue_version      = var.glue_version
  max_retries       = var.env == "dev" ? 0 : 3
  timeout           = var.env == "dev" ? 60 : 2880
  worker_type       = var.env == "dev" ? "G.2X" : "G.4X"
  number_of_workers = var.env == "dev" ? 2 : 10

  command {
    name = "glueetl"
    # The "/" between bucket and key is required to form a valid S3 URI
    script_location = "s3://${var.glue_job_script_location_bucket}/${var.glue_job_script_location_key}"
  }

  connections = var.glue_job_connections

  # Glue Job parameters
  default_arguments = {
    "--job-language" = "python"
    # Not needed for now, but will be required later
    # "--additional-python-modules" = "sdv==1.32.0"
    "--TempDir"                          = "s3://${var.glue_job_script_location_bucket}/temp/"
    "--continuous-log-logGroup"          = "/aws-glue/jobs"
    "--enable-continuous-cloudwatch-log" = "true"
    "--enable-continuous-log-filter"     = "true"
    "--enable-metrics"                   = ""
    "--enable-auto-scaling"              = "true"
  }

  tags = {
    name = "demo-${var.env}-generate-synthetic-data-job"
  }
}

# [3] To create Glue start trigger
resource "aws_glue_trigger" "demo_generate_synthetic_data_start_trigger" {
  name          = "demo-${var.env}-generate-synthetic-data-start-trigger"
  description   = "This is the glue start trigger to generate synthetic data."
  type          = "ON_DEMAND"
  workflow_name = aws_glue_workflow.demo_generate_synthetic_data_workflow.name

  actions {
    job_name = aws_glue_job.demo_generate_synthetic_data_job.name
  }

  tags = {
    name = "demo-${var.env}-generate-synthetic-data-start-trigger"
  }
}
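Because the start trigger is `ON_DEMAND`, nothing runs until the workflow itself is started. One way to do that after deployment is boto3's `start_workflow_run`. The sketch below assumes the workflow name follows the pattern used in `aws_glue_workflow`; the helper names `workflow_name_for` and `start_workflow` are hypothetical.

```python
def workflow_name_for(env: str) -> str:
    """Mirror the Terraform name: demo-${var.env}-generate-synthetic-data-workflow."""
    return f"demo-{env}-generate-synthetic-data-workflow"


def start_workflow(glue_client, env: str) -> str:
    """Start the workflow for the given env and return the run ID reported by Glue."""
    response = glue_client.start_workflow_run(Name=workflow_name_for(env))
    return response["RunId"]


# Example usage (needs boto3 and AWS credentials, so it is left commented out):
# import boto3
# run_id = start_workflow(boto3.client("glue", region_name="us-west-2"), "dev")
# print(run_id)
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before pointing it at a real account.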
5)modules/glue/variables.tf
variable "env" {
  description = "Environment"
  type        = string
}

variable "glue_role_arn" {
  description = "AWS Glue job role ARN"
  type        = string
}

variable "glue_version" {
  description = "AWS Glue job version"
  type        = string
  default     = "5.0"
}

variable "glue_job_script_source" {
  description = "Python script path of AWS Glue job script"
  type        = string
  default     = "glue_jobs/synthetic_files_generator.py"
}

variable "glue_job_script_location_bucket" {
  description = "s3 bucket of AWS Glue job script"
  type        = string
}

variable "glue_job_script_location_key" {
  description = "s3 bucket key of AWS Glue job script"
  type        = string
}

variable "glue_job_connections" {
  description = "AWS Glue job connections"
  type        = list(string)
}
6)glue_jobs/synthetic_files_generator.py
# For now, this job only runs a "Hello world"
import sys

import boto3
from awsglue.utils import getResolvedOptions

glue_client = boto3.client("glue")

# WORKFLOW_NAME and WORKFLOW_RUN_ID are passed in when the job is started from a Glue workflow
args = getResolvedOptions(sys.argv, ['WORKFLOW_NAME', 'WORKFLOW_RUN_ID'])
workflow_name = args['WORKFLOW_NAME']
workflow_run_id = args['WORKFLOW_RUN_ID']

# Read the current run properties of this workflow run
workflow_params = glue_client.get_workflow_run_properties(
    Name=workflow_name, RunId=workflow_run_id)["RunProperties"]

# Set a value (update the fetched properties rather than discarding them)
workflow_params['key_1'] = 'hello_world'
glue_client.put_workflow_run_properties(
    Name=workflow_name, RunId=workflow_run_id, RunProperties=workflow_params)

print("Done")
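The `awsglue` package is only available inside the Glue runtime, so the script above cannot be run as-is on a local machine. For a local smoke test, one option is a small `argparse` fallback that accepts the same `--WORKFLOW_NAME` / `--WORKFLOW_RUN_ID` style arguments. The `resolve_options` helper below is hypothetical and only a simplified stand-in for `getResolvedOptions`, not its real implementation.

```python
import argparse


def resolve_options(argv, option_names):
    """Parse '--NAME value' pairs from argv into a dict, ignoring unknown arguments.

    argv should exclude the program name (for example, sys.argv[1:]).
    """
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    for name in option_names:
        parser.add_argument(f"--{name}", required=True)
    # parse_known_args tolerates extra arguments such as --JOB_NAME
    known, _unknown = parser.parse_known_args(argv)
    return {name: getattr(known, name) for name in option_names}
```

With this in place, the job logic can be exercised locally by passing a stub Glue client and arguments such as `["--WORKFLOW_NAME", "wf", "--WORKFLOW_RUN_ID", "wr_1"]`.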
【3】Deployment
1)Example commands for the DEV environment
# Initialize
terraform init -backend-config=dev.tfbackend

# Check
terraform plan -var-file=dev.tfvars

# Deploy
terraform apply -var-file=dev.tfvars -auto-approve
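The same three commands repeat for every environment, with only the file names changing. As a sketch, assuming each environment keeps a `<env>.tfbackend` and `<env>.tfvars` pair next to `main.tf` (only the dev files appear in this post, and the `terraform_commands` helper is hypothetical), the command lines could be generated like this:

```python
from typing import List

# Same values as the validation block in variables.tf
ALLOWED_ENVS = ("dev", "stage", "prod")


def terraform_commands(env: str, auto_approve: bool = False) -> List[List[str]]:
    """Build the init / plan / apply command lines for one environment."""
    if env not in ALLOWED_ENVS:
        raise ValueError(f"env must be one of {ALLOWED_ENVS}, got {env!r}")
    apply_cmd = ["terraform", "apply", f"-var-file={env}.tfvars"]
    if auto_approve:
        apply_cmd.append("-auto-approve")
    return [
        ["terraform", "init", f"-backend-config={env}.tfbackend"],
        ["terraform", "plan", f"-var-file={env}.tfvars"],
        apply_cmd,
    ]
```

Each inner list can be handed to `subprocess.run(cmd, check=True)`. Keeping `auto_approve` off by default leaves Terraform's interactive confirmation in place for stage and prod.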
Related articles
Building a synthetic data generation system on AWS ~ Design ~
https://dk521123.hatenablog.com/entry/2025/12/19/233012
Terraform ~ Environment setup ~
https://dk521123.hatenablog.com/entry/2023/04/05/000224
Terraform ~ Introduction ~
https://dk521123.hatenablog.com/entry/2019/12/09/222057
Terraform ~ Basics ~
https://dk521123.hatenablog.com/entry/2023/05/03/000000
Terraform ~ Module ~
https://dk521123.hatenablog.com/entry/2023/05/19/113544
Terraform ~ local ~
https://dk521123.hatenablog.com/entry/2023/12/24/173633
Terraform ~ terraform init command ~
https://dk521123.hatenablog.com/entry/2025/09/24/221918
Terraform ~ tfstate / Backend ~
https://dk521123.hatenablog.com/entry/2023/05/05/004939
Terraform ~ variable / Introduction ~
https://dk521123.hatenablog.com/entry/2025/09/19/093537
Terraform ~ variable / Basics ~
https://dk521123.hatenablog.com/entry/2025/09/20/002058
Terraform ~ Provider ~
https://dk521123.hatenablog.com/entry/2024/06/03/001929
Terraform ~ Considering deployment to multiple environments ~
https://dk521123.hatenablog.com/entry/2023/05/06/003645
Terraform ~ AWS Glue ~
https://dk521123.hatenablog.com/entry/2023/04/08/220411
AWS Glue ~ Introduction ~
https://dk521123.hatenablog.com/entry/2019/10/01/221926
Considering how to update partitions from a Glue Job
https://dk521123.hatenablog.com/entry/2021/05/15/130604
AWS Glue ~ Creating a wheel file ~
https://dk521123.hatenablog.com/entry/2026/01/13/191945
AWS Data Wrangler ~ Introduction ~
https://dk521123.hatenablog.com/entry/2021/04/28/174316
Snowflake ~ GENERATE_SYNTHETIC_DATA ~
https://dk521123.hatenablog.com/entry/2025/12/24/001900
Python SDV ~ Introduction ~
https://dk521123.hatenablog.com/entry/2025/12/21/000330
Python SDV ~ HMASynthesizer ~
https://dk521123.hatenablog.com/entry/2025/12/26/001812