【AWS】Amazon S3 ~ Operating S3 with Python boto3 ~

■ Introduction

Notes on how to operate S3 with boto3.

■ What is boto3?

A library for operating AWS from Python (the AWS SDK for Python).

API specification

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html
Main APIs

【1】list_objects_v2
【2】get_object
【3】copy / copy_object
【4】delete_object / delete_objects

【1】list_objects_v2

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.list_objects_v2
https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html

Sample: listing files

import boto3

client = boto3.client('s3')

response = client.list_objects_v2(
    Bucket='bucket-name',
    Prefix='xxxx/yyy/zzzz/',
)
# 'Contents' is absent when no object matches the prefix
s3_contents = response.get('Contents', [])
for s3_content in s3_contents:
    key = s3_content.get('Key')
    print(key)

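Usage notes

list_objects_v2 returns at most 1,000 objects per call. If more objects match the prefix, the response has IsTruncated=True and you must call it again with ContinuationToken. See the related article below for details.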

https://dk521123.hatenablog.com/entry/2019/12/06/232617
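A minimal sketch of handling the 1,000-object limit with boto3's built-in paginator (client.get_paginator("list_objects_v2")), which issues the ContinuationToken follow-up calls for you. The function name iter_keys is hypothetical; the client is passed in so the function itself makes no AWS call until it is iterated.

```python
from typing import Iterator


def iter_keys(s3_client, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every key under `prefix`, across all result pages.

    iter_keys is a hypothetical helper name. The paginator keeps calling
    list_objects_v2 with ContinuationToken until no pages remain.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page with no matching objects has no 'Contents' entry at all
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

Usage (assuming credentials are configured): `for key in iter_keys(boto3.client("s3"), "bucket-name", "xxxx/yyy/zzzz/"): print(key)`.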

【2】get_object

Can be used to read the contents of a file.

Samples

Example 1) Read a file

import boto3

s3_client = boto3.client('s3')
response = s3_client.get_object(Bucket='bucket-name', Key="sample.sql")
# Body is a streaming object; read() returns the whole content as bytes
body = response["Body"].read()
print(body.decode('utf-8'))
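read() loads the whole object into memory. For a large text file, one option is botocore's StreamingBody.iter_lines(), which yields the body line by line as bytes. A minimal sketch, assuming UTF-8 content; the function name iter_text_lines is hypothetical.

```python
from typing import Iterator


def iter_text_lines(s3_client, bucket: str, key: str,
                    encoding: str = "utf-8") -> Iterator[str]:
    """Yield the decoded lines of an S3 object without reading it all at once.

    iter_text_lines is a hypothetical helper name; UTF-8 is an assumption.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    body = response["Body"]  # botocore StreamingBody
    try:
        # iter_lines() yields each line as bytes, without the line ending
        for line in body.iter_lines():
            yield line.decode(encoding)
    finally:
        # Release the HTTP connection even if the caller stops early
        body.close()
```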

Example 2) Read a YAML file

import boto3
import yaml

s3_client = boto3.client('s3')
response = s3_client.get_object(Bucket='bucket-name', Key="filename.yaml")

try:
    # safe_load accepts the streaming Body directly (it has a read() method)
    config = yaml.safe_load(response["Body"])
except yaml.YAMLError as exc:
    print(exc)
    raise

References
https://qiita.com/NoriakiOshita/items/2f9e3a16110679e0efac
https://gist.github.com/coingraham/c6153809e4d179396421

【3】copy / copy_object

Can be used to copy files.

copy
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.copy
copy_object
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.copy_object

Usage note: the size limit of copy_object

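copy_object cannot be used to copy large files (over 5 GB).
 => If you try, an exception like the one shown under "Error" below is raised.
 => Use copy() instead (it is a managed transfer that performs a multipart copy for large objects).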

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.copy_object

Excerpt from the page above:
~~~~
Note

You can store individual objects of up to 5 TB in Amazon S3.
You create a copy of your object up to 5 GB in size in a single atomic operation using this API.
However, to copy an object greater than 5 GB,
 you must use the multipart upload Upload Part - Copy API.
For more information, see Copy Object Using the REST Multipart Upload API .
~~~~

Error

ClientError: An error occurred (invalidRequest) when calling the CopyObject operation:
The specified copy source is larger than the maximum allowable size for a copy source: 5368709120

References
https://dev.classmethod.jp/articles/python-boto3-s3-meta-client-copy/
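To make the 5 GB threshold concrete, here is a minimal sketch that checks the source size with head_object (ContentLength) and uses copy_object only when the object fits, falling back to the managed copy() otherwise. The names copy_any_size and COPY_OBJECT_LIMIT are hypothetical. Since copy() handles small objects too, simply always calling copy() is also a valid choice; this sketch only shows where the limit applies.

```python
# Hypothetical constant: 5 GiB, the largest copy source a single CopyObject
# call accepts (the 5368709120 in the error message above).
COPY_OBJECT_LIMIT = 5 * 1024 ** 3


def copy_any_size(s3_client, src_bucket: str, src_key: str,
                  dest_bucket: str, dest_key: str) -> str:
    """Copy one object, picking the API by the source object's size.

    Returns the name of the API that was used (for logging/testing).
    """
    copy_source = {"Bucket": src_bucket, "Key": src_key}
    size = s3_client.head_object(Bucket=src_bucket, Key=src_key)["ContentLength"]
    if size <= COPY_OBJECT_LIMIT:
        # Single atomic server-side copy
        s3_client.copy_object(CopySource=copy_source,
                              Bucket=dest_bucket, Key=dest_key)
        return "copy_object"
    # Managed transfer: splits the copy into parts (UploadPartCopy)
    s3_client.copy(copy_source, dest_bucket, dest_key)
    return "copy"
```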

Sample

Example: copy or move (cut) all files under a specified S3 prefix

import boto3

def copy_all_files(
  s3_client,
  s3_resource,
  src_s3_bucket, src_prefix,
  dest_s3_bucket, dest_prefix,
  is_moving=False,
  is_dry_run=False):

  try:
    next_token = ''
    mark_of_dry_run = "[Dry run] " if is_dry_run else ""
    while True:
      if next_token == '':
        response = s3_client.list_objects_v2(
          Bucket=src_s3_bucket,
          Prefix=src_prefix)
      else:
        response = s3_client.list_objects_v2(
          Bucket=src_s3_bucket,
          Prefix=src_prefix,
          ContinuationToken=next_token)

      if 'Contents' in response:
        contents = response['Contents']
        for content in contents:
          content_key = content['Key']
          source_prefix = content_key
          # The full source key (including src_prefix) is appended under dest_prefix
          destination_prefix = "{}/{}".format(
            dest_prefix, content_key)

          source_dict = {
            'Bucket': src_s3_bucket,
            'Key': source_prefix
          }
          print("{}Copying s3://{}/{} to s3://{}/{}".format(
            mark_of_dry_run,
            src_s3_bucket,
            source_prefix,
            dest_s3_bucket,
            destination_prefix))

          if not is_dry_run:
            # Managed copy: switches to a multipart copy for objects over 5 GB.
            # It returns None, so there is no response to print.
            s3_resource.meta.client.copy(
              source_dict,
              dest_s3_bucket,
              destination_prefix)

          if is_moving:
            print("{}Deleting s3://{}/{}".format(
              mark_of_dry_run, src_s3_bucket, source_prefix))
            if not is_dry_run:
              response_delete = s3_client.delete_object(
                Bucket=src_s3_bucket, Key=source_prefix)
              print(response_delete)

      if 'NextContinuationToken' in response:
        next_token = response['NextContinuationToken']
      else:
        print("Done")
        break
  except Exception as ex:
    print(str(ex))
    # Re-raise with the original traceback
    raise

if __name__ == "__main__":
  s3_client = boto3.client('s3')
  s3_resource = boto3.resource('s3')
  copy_all_files(
    s3_client,
    s3_resource,
    src_s3_bucket="your-s3-bucket1",
    src_prefix="target/directory/src",
    dest_s3_bucket="your-s3-bucket2",
    dest_prefix="target/directory/dest",
    is_moving=True,
    is_dry_run=True)
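In the sample above, the destination key is dest_prefix + "/" + the full source key, so the source prefix is repeated under the destination (e.g. target/directory/dest/target/directory/src/a.csv). If you instead want to keep only the part below src_prefix, a key mapping like the following hypothetical to_dest_key helper could replace that format() call. This is a sketch of an alternative, not the behavior of the sample; ending src_prefix with "/" avoids also matching sibling prefixes such as ".../src2/".

```python
def to_dest_key(src_key: str, src_prefix: str, dest_prefix: str) -> str:
    """Map a source key to a destination key, keeping only the part below src_prefix.

    to_dest_key is a hypothetical helper, not part of boto3.
    """
    if not src_key.startswith(src_prefix):
        raise ValueError("{!r} is not under {!r}".format(src_key, src_prefix))
    # Drop the source prefix and any leading "/" left over from it
    relative = src_key[len(src_prefix):].lstrip("/")
    return "{}/{}".format(dest_prefix.rstrip("/"), relative)
```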

【4】delete_object / delete_objects

delete_object
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.delete_object

response = client.delete_object(
    Bucket='string',
    Key='string',
    ...
)

delete_objects
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html#S3.Client.delete_objects

response = client.delete_objects(
    Bucket='string',
    Delete={
        # Up to 1,000 keys can be specified per request
        'Objects': [
            {
                'Key': 'string',
                'VersionId': 'string'
            },
        ],
        'Quiet': True|False
    },
    ...
)
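Because delete_objects accepts at most 1,000 keys per request, deleting more keys means splitting them into batches. A minimal sketch, with the hypothetical name delete_keys: it sends Quiet=True so the response lists only failures, and collects the per-key "Errors" entries instead of raising.

```python
from typing import Dict, Iterable, List

MAX_KEYS_PER_DELETE = 1000  # delete_objects limit per request


def delete_keys(s3_client, bucket: str, keys: Iterable[str]) -> List[Dict]:
    """Delete keys in batches of up to 1,000 and return the errors S3 reported.

    delete_keys is a hypothetical helper name.
    """
    errors: List[Dict] = []
    batch: List[Dict] = []

    def flush() -> None:
        if batch:
            response = s3_client.delete_objects(
                Bucket=bucket,
                # Quiet=True: only failed keys come back in the response
                Delete={"Objects": list(batch), "Quiet": True},
            )
            errors.extend(response.get("Errors", []))
            batch.clear()

    for key in keys:
        batch.append({"Key": key})
        if len(batch) == MAX_KEYS_PER_DELETE:
            flush()
    flush()  # send the last partial batch
    return errors
```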

Usage notes

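* Deleting a "directory" is rather tedious
 => S3 has no real directories; a "directory" is only a common key prefix. If files exist under it, you cannot simply delete the whole directory in one call: you have to list the objects under the prefix and delete each of them.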

https://dev.classmethod.jp/articles/20180625-how-to-delete-s3folder/
Sample: deleting a directory

import boto3

def delete_directory(s3_client, s3_bucket, prefix):
  if not prefix.endswith("/"):
    prefix = prefix + "/"

  try:
    next_token = ''
    while True:
      if next_token == '':
        response = s3_client.list_objects_v2(
          Bucket=s3_bucket,
          Prefix=prefix)
      else:
        response = s3_client.list_objects_v2(
          Bucket=s3_bucket,
          Prefix=prefix,
          ContinuationToken=next_token)

      if 'Contents' in response:
        contents = response['Contents']
        for content in contents:
          content_key = content['Key']
          # content_key already includes the prefix
          print("Deleting s3://{}/{}".format(
            s3_bucket,
            content_key))
          s3_client.delete_object(
            Bucket=s3_bucket, Key=content_key)
      if 'NextContinuationToken' in response:
        next_token = response['NextContinuationToken']
      else:
        print("Done")
        break
  except Exception as ex:
    print(str(ex))
    # Re-raise with the original traceback
    raise

if __name__ == "__main__":
  s3_client = boto3.client('s3')
  delete_directory(s3_client, "your-s3-bucket", "target/directory")

Related articles

Amazon S3 ~ Introduction ~
https://dk521123.hatenablog.com/entry/2017/03/06/212734
Amazon S3 ~ Operating S3 with the AWS CLI ~
https://dk521123.hatenablog.com/entry/2017/04/01/235355
boto3 API / list_objects_v2: usage notes and workarounds
https://dk521123.hatenablog.com/entry/2019/12/06/232617
Thinking about AWS cost savings ~ S3 ~
https://dk521123.hatenablog.com/entry/2020/07/22/195336
Python ~ Basics / YAML
https://dk521123.hatenablog.com/entry/2019/10/16/225401
Amazon EMR ~ boto3 ~
https://dk521123.hatenablog.com/entry/2020/06/24/173334
Renaming output files in PySpark
https://dk521123.hatenablog.com/entry/2021/05/12/003047