[AWS] Amazon EMR ~ boto3 Edition ~

■ Introduction

https://dk521123.hatenablog.com/entry/2020/02/20/230519
https://dk521123.hatenablog.com/entry/2020/05/27/175610

This post continues from the two articles above.
It uses boto3 to operate Amazon EMR.

■ boto3 API Reference

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html
run_job_flow: launches an EMR cluster
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/emr.html#EMR.Client.run_job_flow
https://docs.aws.amazon.com/emr/latest/APIReference/API_RunJobFlow.html
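
Two run_job_flow arguments have shapes that are easy to get wrong: Instances['InstanceGroups'] takes one dict per group, and Tags takes a list of Key/Value dicts rather than a plain dict. The following is a minimal sketch of building both. The helper names build_instance_groups and to_emr_tags are hypothetical (they are not part of boto3), and the instance type "m5.xlarge" is only an illustrative assumption.

```python
from typing import Any, Dict, List


def build_instance_groups(
    master_type: str,
    core_type: str,
    core_count: int,
    market: str = "ON_DEMAND",
) -> List[Dict[str, Any]]:
    """Build the Instances['InstanceGroups'] list for run_job_flow.

    Hypothetical helper for illustration; it is not part of boto3.
    """
    return [
        {
            "Name": "emr_master",
            "Market": market,  # ON_DEMAND | SPOT
            "InstanceRole": "MASTER",
            "InstanceType": master_type,
            # A single-master cluster has exactly one MASTER instance.
            "InstanceCount": 1,
        },
        {
            "Name": "emr_core",
            "Market": market,
            "InstanceRole": "CORE",
            "InstanceType": core_type,
            "InstanceCount": core_count,
        },
    ]


def to_emr_tags(tags: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a plain dict into the Key/Value list that Tags expects."""
    return [{"Key": key, "Value": value} for key, value in tags.items()]


# "m5.xlarge" is an assumed example value, not taken from the article.
groups = build_instance_groups("m5.xlarge", "m5.xlarge", core_count=2)
tags = to_emr_tags({"Name": "hello_world_emr"})
print(groups)
print(tags)
```

The results could then be passed as Instances={'InstanceGroups': groups, ...} and Tags=tags in a run_job_flow call.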

■ Sample

Example 1: Launching an EMR cluster

import boto3

emr_endpoint_url = "https://xxxxx/"

emr_client = boto3.client(
  'emr',
  region_name='ap-northeast-1',
  endpoint_url=emr_endpoint_url)

# ★ The cluster is launched here ★
response = emr_client.run_job_flow(
  Name="Hello_world",
  LogUri="s3://your-bucket-name/logs/",
  # https://docs.aws.amazon.com/ja_jp/emr/latest/ReleaseGuide/emr-release-components.html
  ReleaseLabel="emr-5.29.0",
  Applications=[
    {
      "Name": "Spark"
    },
    {
      "Name": "Hive"
    },
  ],
  Instances={
    'InstanceGroups': [
      {
        "Name": "emr_master",
        # ON_DEMAND | SPOT
        "Market": "ON_DEMAND",
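        # The MASTER group takes 1 instance (3 only for a multi-master cluster)
        "InstanceCount": 1,
        # MASTER | CORE | TASK
        "InstanceRole": "MASTER",
        # Placeholder: replace with a real EC2 instance type
        "InstanceType": "string",
        # ... (remaining settings omitted)
      },
      # { ... }  (further instance groups, such as CORE, omitted)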
    ],
    'EC2KeyName': "sample_ec2_key_name",
    'KeepJobFlowAliveWhenNoSteps': False,
    'TerminationProtected': False,
    'Ec2SubnetId': "xxxxxxx",
    'EmrManagedMasterSecurityGroup': "xxxxx",
    'EmrManagedSlaveSecurityGroup': "xxxxxx",
    'ServiceAccessSecurityGroup': "xxxxx"
  },
  Steps=[
    {
      'Name': 'hello_hive',
      # TERMINATE_JOB_FLOW | TERMINATE_CLUSTER | CANCEL_AND_WAIT | CONTINUE
      'ActionOnFailure': "TERMINATE_CLUSTER",
      'HadoopJarStep': {
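        # Note: command-runner.jar runs Args as a command on the cluster.
        # For a script stored in S3, script-runner.jar is the usual choice.
        'Jar': 'command-runner.jar',
        'Args': [
          's3://your-bucket-name/script-path/my_script.sh',
        ]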
      }
    },
  ],
  # https://dev.classmethod.jp/articles/emr-bootstrap-action/
  BootstrapActions=[
    {
      'Name': 'hello_bootstrap_action',
      'ScriptBootstrapAction': {
        'Path': 's3://your-bucket-name/script-path/install_script.sh',
        'Args': [
          'param1',
        ]
      }
    }
  ],
  # https://qiita.com/kazz_ogawa/items/eea9c378193d84139b5d
  VisibleToAllUsers=True,
  JobFlowRole="xxxxxx",
  ServiceRole="xxxxxx",
  AutoScalingRole="xxxxxx",
  # Tags must be a list of Key/Value dicts, not a plain dict
  Tags=[
    {
      "Key": "Name",
      "Value": "hello_world_emr",
    },
  ]
)

# [Response]
# {
#    'JobFlowId': 'string',
#    'ClusterArn': 'string'
# }
print(response)
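
The JobFlowId in the response is the cluster ID, so it can be passed to describe_cluster to see how far the launch has progressed. The following is a minimal sketch of that lookup. The function name get_cluster_state and the FakeEmrClient stand-in are hypothetical and exist only so the example runs without AWS access; with a real client you would pass boto3.client('emr') instead.

```python
from typing import Any, Dict


def get_cluster_state(emr_client: Any, run_job_flow_response: Dict[str, Any]) -> str:
    """Return the current state of the cluster started by run_job_flow."""
    cluster_id = run_job_flow_response["JobFlowId"]
    description = emr_client.describe_cluster(ClusterId=cluster_id)
    # State is one of STARTING, BOOTSTRAPPING, RUNNING, WAITING,
    # TERMINATING, TERMINATED, or TERMINATED_WITH_ERRORS.
    return description["Cluster"]["Status"]["State"]


class FakeEmrClient:
    """Hypothetical stand-in for boto3.client('emr'), for local runs only."""

    def describe_cluster(self, ClusterId: str) -> Dict[str, Any]:
        # Mimics only the part of the real response that is read above.
        return {"Cluster": {"Id": ClusterId, "Status": {"State": "STARTING"}}}


state = get_cluster_state(FakeEmrClient(), {"JobFlowId": "j-XXXXXXXXXXXXX"})
print(state)
```

Because KeepJobFlowAliveWhenNoSteps is False in the example above, the cluster is expected to move on to TERMINATING and TERMINATED once its steps finish.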

References

http://laughingman7743.hatenablog.com/entry/2016/02/11/185319
https://gist.github.com/laughingman7743/5c675c9b1d9ed02539e6
https://dev.classmethod.jp/articles/boto3-emr-step-wait/

Related articles

Amazon EMR ~ Getting Started ~
https://dk521123.hatenablog.com/entry/2020/02/20/230519
Amazon EMR ~ Basics ~
https://dk521123.hatenablog.com/entry/2020/05/27/175610
Amazon EMR ~ Integration with AWS Glue ~
https://dk521123.hatenablog.com/entry/2020/11/12/113312
Amazon EMR ~ EMRFS ~
https://dk521123.hatenablog.com/entry/2020/11/13/145545
Troubleshooting Amazon EMR
https://dk521123.hatenablog.com/entry/2020/08/05/144724
Amazon S3 ~ Operating S3 with Python boto3 ~
https://dk521123.hatenablog.com/entry/2019/10/21/230004