【Terraform】Building a synthetic data generation system on AWS ~ Infrastructure ~

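◾️Introduction

https://dk521123.hatenablog.com/entry/2025/12/19/233012

This article continues from the post above.

I settled on the service configuration listed below,
and this article builds it with Terraform.
(The real Glue Job logic will be written separately.)

I have not touched Terraform in quite a while, so I am treating this as a refresher.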

Service configuration

Item            Choice
IaC             Terraform
Cloud           AWS
Platform        AWS Glue
Language        Python
Tool / Library  Python SDV (Synthetic Data Vault)

Table of contents

【1】Folder structure
 1)Prerequisites
【2】Sample
 0)Per-environment configuration files
 1)variables.tf
 2)provider.tf
 3)main.tf
 4)modules/glue/main.tf
 5)modules/glue/variables.tf
 6)glue_jobs/synthetic_files_generator.py
【3】Deploy
 1)Example commands for the DEV environment

【1】Folder structure

.
├── modules
│   └── glue
│       ├── main.tf
│       └── variables.tf
├── glue_jobs
│   └── synthetic_files_generator.py << placeholder content for now
├── main.tf
├── provider.tf
├── dev.tfbackend
├── dev.tfvars
└── variables.tf

1)Prerequisites

* The following resources already exist and are reused (their values are passed in through variables.tf)
 + S3 bucket
 + IAM role

【2】Sample

0)Per-environment configuration files

* Typing these values on every run is tedious, so keep them in configuration files

dev.tfbackend

bucket = "your-s3-bucket"
key = "tf-for-synthtic/dev/terraform.tfstate"
region = "us-west-2"

dev.tfvars

env = "dev"

# For Glue
glue_role_arn = "arn:aws:iam::000000000000:role/your-glue-job-role-arn"
glue_job_script_source = "glue_jobs/synthetic_files_generator.py"
glue_job_script_location_bucket = "your-bucket-name"
glue_job_script_location_key = "demo/glue_jobs/synthetic_files_generator.py"
glue_job_connections = ["your-glue-connection"]
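
An IAM role ARN has the form arn:aws:iam::<12-digit account ID>:role/<role name>; the region field is empty, so two colons follow "iam". As a quick sanity check before running terraform plan, the value in dev.tfvars could be checked with something like the sketch below. The helper name is_iam_role_arn is hypothetical, and the pattern only checks the format; it does not confirm that the role exists.

```python
import re

# IAM ARNs have an empty region field, so "iam" is followed by "::".
# The account ID is 12 digits. The role name may include a path.
_ROLE_ARN = re.compile(r"arn:aws(-cn|-us-gov)?:iam::\d{12}:role/[\w+=,.@/-]+")


def is_iam_role_arn(value: str) -> bool:
    """Return True if value looks like an IAM role ARN (format check only)."""
    return _ROLE_ARN.fullmatch(value) is not None


print(is_iam_role_arn("arn:aws:iam::000000000000:role/your-glue-job-role"))  # True
print(is_iam_role_arn("arn:aws:iam:00000000:role/your-glue-job-role"))       # False
```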

1)variables.tf

variable "aws_region" {
  description = "AWS region to deploy resources"
  type = string
  default = "us-west-2"
}

variable "env" {
  description = "Environment"
  type = string
  default = "dev"

  validation {
    condition     = contains(["dev", "stage", "prod"], var.env)
    error_message = "Only dev / stage / prod"
  }
}

variable "project_name" {
  description = "Project name"
  type = string
  default = "synthetic_files"
}

variable "glue_role_arn" {
  description = "AWS Glue job role ARN"
  type = string
}

variable "glue_version" {
  description = "AWS Glue job version"
  type = string
  default = "5.0"
}

variable "glue_job_script_source" {
  description = "Python script path of AWS Glue job script"
  type = string
  default = "glue_jobs/synthetic_files_generator.py"
}

variable "glue_job_script_location_bucket" {
  description = "s3 bucket of AWS Glue job script"
  type = string
}

variable "glue_job_script_location_key" {
  description = "s3 bucket key of AWS Glue job script"
  type = string
}

variable "glue_job_connections" {
  description = "AWS Glue job connections"
  type = list(string)
}

2)provider.tf

# Configure the AWS Provider
provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      env = var.env
    }
  }
}

3)main.tf

module "demo_generate_synthetic_data_glue_module" {
  source = "./modules/glue"
  env = var.env
  glue_role_arn = var.glue_role_arn
  # Pass the root-level value through; otherwise the module default is always used
  glue_version = var.glue_version
  glue_job_script_source = var.glue_job_script_source
  glue_job_script_location_bucket = var.glue_job_script_location_bucket
  glue_job_script_location_key = var.glue_job_script_location_key
  glue_job_connections = var.glue_job_connections
}

4)modules/glue/main.tf

# [1] To create Glue workflow
resource "aws_glue_workflow" "demo_generate_synthetic_data_workflow" {
  name = "demo-${var.env}-generate-synthetic-data-workflow"
  description = "This is the glue workflow to generate synthetic data."
  tags = {
    name = "demo-${var.env}-generate-synthetic-data-workflow"
  }
}

# [2-1] To upload Glue script to s3
resource "aws_s3_object" "clanyan_source" {
  source = var.glue_job_script_source
  bucket = var.glue_job_script_location_bucket
  key    = var.glue_job_script_location_key
  etag   = filemd5(var.glue_job_script_source)
}

# [2-2] To create Glue Job
resource "aws_glue_job" "demo_generate_synthetic_data_job" {
  name = "demo-${var.env}-generate-synthetic-data-job"
  description = "This is the glue job to generate synthetic data."
  role_arn = var.glue_role_arn
  glue_version = var.glue_version
  max_retries = var.env == "dev" ? 0 : 3
  timeout = var.env == "dev" ? 60 : 2880
  worker_type = var.env == "dev" ? "G.2X" : "G.4X"
  number_of_workers = var.env == "dev" ? 2 : 10

  command {
    name = "glueetl"
    script_location = "s3://${var.glue_job_script_location_bucket}/${var.glue_job_script_location_key}"
  }
  connections = var.glue_job_connections

  # Glue Job parameters
  default_arguments = {
    "--job-language" = "python"

    # Not needed for now, but will be required later
    # "--additional-python-modules" = "sdv==1.32.0"

    "--TempDir"                          = "s3://${var.glue_job_script_location_bucket}/temp/"
    "--continuous-log-logGroup"          = "/aws-glue/jobs"
    "--enable-continuous-cloudwatch-log" = "true"
    "--enable-continuous-log-filter"     = "true"
    "--enable-metrics"                   = ""
    "--enable-auto-scaling"              = "true"
  }

  tags = {
    name = "demo-${var.env}-generate-synthetic-data-job"
  }
}

# [3] To create Glue start trigger
resource "aws_glue_trigger" "demo_generate_synthetic_data_start_trigger" {
  name = "demo-${var.env}-generate-synthetic-data-start-trigger"
  description = "This is the glue start trigger to generate synthetic data."
  type = "ON_DEMAND"
  workflow_name = aws_glue_workflow.demo_generate_synthetic_data_workflow.name

  actions {
    job_name = aws_glue_job.demo_generate_synthetic_data_job.name
  }
  tags = {
    name = "demo-${var.env}-generate-synthetic-data-start-trigger"
  }
}
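
The etag = filemd5(...) argument on the aws_s3_object resource is what lets Terraform notice that the local script has changed: the hex MD5 of the file is part of the resource's configuration, so a different digest shows up as a diff in terraform plan. The sketch below computes the same kind of digest in Python, which can be handy for comparing a local file with the ETag shown in S3. It assumes a single-part upload without KMS encryption, where the ETag equals the MD5 of the content. file_md5 is a hypothetical helper name.

```python
import hashlib
import tempfile
from pathlib import Path


def file_md5(path: str) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo with a throwaway file: changing the content changes the digest.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "synthetic_files_generator.py"
    script.write_text('print("v1")\n')
    before = file_md5(str(script))
    script.write_text('print("v2")\n')
    after = file_md5(str(script))
    print(before != after)  # True
```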

5)modules/glue/variables.tf

variable "env" {
  description = "Environment"
  type = string
}

variable "glue_role_arn" {
  description = "AWS Glue job role ARN"
  type = string
}

variable "glue_version" {
  description = "AWS Glue job version"
  type = string
  default = "5.0"
}

variable "glue_job_script_source" {
  description = "Python script path of AWS Glue job script"
  type = string
  default = "glue_jobs/synthetic_files_generator.py"
}

variable "glue_job_script_location_bucket" {
  description = "s3 bucket of AWS Glue job script"
  type = string
}

variable "glue_job_script_location_key" {
  description = "s3 bucket key of AWS Glue job script"
  type = string
}

variable "glue_job_connections" {
  description = "AWS Glue job connections"
  type = list(string)
}
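
The conditional expressions in modules/glue/main.tf (var.env == "dev" ? ... : ...) give dev a small job with a short timeout and every other environment the larger settings. To make the resolved values per environment explicit, the sketch below mirrors that logic in Python. resolve_job_settings is a hypothetical helper for illustration only and is not part of the Terraform code.

```python
def resolve_job_settings(env: str) -> dict:
    """Mirror the env-dependent conditionals of the aws_glue_job resource."""
    is_dev = env == "dev"
    return {
        "max_retries": 0 if is_dev else 3,
        "timeout": 60 if is_dev else 2880,  # minutes
        "worker_type": "G.2X" if is_dev else "G.4X",
        "number_of_workers": 2 if is_dev else 10,
    }


# stage and prod resolve to the same values; only dev differs.
for env in ("dev", "stage", "prod"):
    print(env, resolve_job_settings(env))
```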

6)glue_jobs/synthetic_files_generator.py

# For now, this job only does a "Hello world" style check
import sys

import boto3
from awsglue.utils import getResolvedOptions

glue_client = boto3.client("glue")

# Glue passes WORKFLOW_NAME / WORKFLOW_RUN_ID when the job is started from a workflow
args = getResolvedOptions(sys.argv, ['WORKFLOW_NAME', 'WORKFLOW_RUN_ID'])
workflow_name = args['WORKFLOW_NAME']
workflow_run_id = args['WORKFLOW_RUN_ID']

# Read the current run properties of this workflow run
workflow_params = glue_client.get_workflow_run_properties(
  Name=workflow_name, RunId=workflow_run_id)["RunProperties"]

# Set a property (added to the fetched properties instead of discarding them)
workflow_params['key_1'] = 'hello_world'
glue_client.put_workflow_run_properties(
  Name=workflow_name, RunId=workflow_run_id, RunProperties=workflow_params)
print("Done")
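
The script only runs inside Glue, but its read, modify, write flow for run properties can be tried locally by injecting the client. The sketch below is one way to do that. update_run_properties and FakeGlueClient are hypothetical names; get_workflow_run_properties and put_workflow_run_properties are the real boto3 Glue client methods, and the fake imitates only the parts of their behavior used here.

```python
def update_run_properties(glue_client, workflow_name, run_id, updates):
    """Merge `updates` into the existing run properties and write them back."""
    props = glue_client.get_workflow_run_properties(
        Name=workflow_name, RunId=run_id)["RunProperties"]
    props.update(updates)
    glue_client.put_workflow_run_properties(
        Name=workflow_name, RunId=run_id, RunProperties=props)
    return props


class FakeGlueClient:
    """In-memory stand-in for boto3.client("glue"), for local testing only."""

    def __init__(self, initial):
        self.store = dict(initial)

    def get_workflow_run_properties(self, Name, RunId):
        return {"RunProperties": dict(self.store)}

    def put_workflow_run_properties(self, Name, RunId, RunProperties):
        # Existing keys are overwritten, new keys are added.
        self.store.update(RunProperties)
        return {}


fake = FakeGlueClient({"existing": "kept"})
result = update_run_properties(fake, "wf", "run-1", {"key_1": "hello_world"})
print(result)  # {'existing': 'kept', 'key_1': 'hello_world'}
```

Inside the Glue job, the same function could be called with the real client and the values from getResolvedOptions.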

【3】Deploy

1)Example commands for the DEV environment

# Initialize
terraform init -backend-config=dev.tfbackend

# Check the plan
terraform plan -var-file=dev.tfvars

# Deploy
terraform apply -var-file=dev.tfvars -auto-approve
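
Because the start trigger is of type ON_DEMAND, nothing runs after terraform apply until the workflow is started explicitly. A minimal sketch of starting it with boto3's start_workflow_run follows. It assumes AWS credentials are configured and that the workflow name follows the demo-${var.env}-generate-synthetic-data-workflow pattern from modules/glue/main.tf. workflow_name_for and start_workflow are hypothetical helper names.

```python
def workflow_name_for(env: str) -> str:
    """Build the workflow name used in modules/glue/main.tf."""
    return f"demo-{env}-generate-synthetic-data-workflow"


def start_workflow(glue_client, env: str) -> str:
    """Start the workflow for the given env and return its run ID."""
    response = glue_client.start_workflow_run(Name=workflow_name_for(env))
    return response["RunId"]


print(workflow_name_for("dev"))  # demo-dev-generate-synthetic-data-workflow

# With real credentials (not executed here):
#   import boto3
#   run_id = start_workflow(boto3.client("glue", region_name="us-west-2"), "dev")
```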

Related articles

Building a synthetic data generation system on AWS ~ Concept ~
https://dk521123.hatenablog.com/entry/2025/12/19/233012
Terraform ~ Environment setup ~
https://dk521123.hatenablog.com/entry/2023/04/05/000224
Terraform ~ Getting started ~
https://dk521123.hatenablog.com/entry/2019/12/09/222057
Terraform ~ Basics ~
https://dk521123.hatenablog.com/entry/2023/05/03/000000
Terraform ~ Module ~
https://dk521123.hatenablog.com/entry/2023/05/19/113544
Terraform ~ local ~
https://dk521123.hatenablog.com/entry/2023/12/24/173633
Terraform ~ terraform init command ~
https://dk521123.hatenablog.com/entry/2025/09/24/221918
Terraform ~ tfstate / Backend ~
https://dk521123.hatenablog.com/entry/2023/05/05/004939
Terraform ~ variable / Getting started ~
https://dk521123.hatenablog.com/entry/2025/09/19/093537
Terraform ~ variable / Basics ~
https://dk521123.hatenablog.com/entry/2025/09/20/002058
Terraform ~ Provider ~
https://dk521123.hatenablog.com/entry/2024/06/03/001929
Terraform ~ Thinking about deploying to multiple environments ~
https://dk521123.hatenablog.com/entry/2023/05/06/003645
Terraform ~ AWS Glue ~
https://dk521123.hatenablog.com/entry/2023/04/08/220411
AWS Glue ~ Getting started ~
https://dk521123.hatenablog.com/entry/2019/10/01/221926
Thinking about updating partitions from a Glue Job
https://dk521123.hatenablog.com/entry/2021/05/15/130604
AWS Glue ~ Creating a Wheel file ~
https://dk521123.hatenablog.com/entry/2026/01/13/191945
AWS Data Wrangler ~ Getting started ~
https://dk521123.hatenablog.com/entry/2021/04/28/174316
Snowflake ~ GENERATE_SYNTHETIC_DATA ~
https://dk521123.hatenablog.com/entry/2025/12/24/001900
Python SDV ~ Getting started ~
https://dk521123.hatenablog.com/entry/2025/12/21/000330
Python SDV ~ HMASynthesizer ~
https://dk521123.hatenablog.com/entry/2025/12/26/001812