【AWS】Amazon S3 ~ Checking file existence with Boto3 ~

■ Introduction

https://dk521123.hatenablog.com/entry/2019/10/21/230004

This is a continuation of the article above.

This time, I look at how to check with Boto3
whether a file exists on S3.

Table of contents

Option 1: Using list_objects_v2
Option 2: Using the exception "ClientError: Error Code=404"
Supplement: About performance

Option 1: Using list_objects_v2

import boto3


s3_client = boto3.client("s3")

def has_file(
  bucket_name: str,
  target_key: str,
  is_allowed_zero_byte: bool=True
) -> bool:
  """Check to exist a file in S3.
  Args:
      bucket_name (str): S3 bucket name
      target_key (str): target of S3 key
      is_allowed_zero_byte (bool, optional): True: Allow an empty file, False: Not allow. Defaults to True.
  Returns:
      bool: True: Exists file. False: Not exists.
  """
  response = s3_client.list_objects_v2(
    Bucket=bucket_name,
    Prefix=target_key,
  )
  s3_contents = response["Contents"]
  for s3_content in s3_contents:
    if s3_content.get("Key") == target_key:
      if not is_allowed_zero_byte and s3_content.get("Size") == 0:
        # is_allowed_zero_byte=Trueを指定し、かつ、空ファイル(0byte)だった場合
        return False
      return True
  return False

References
https://qiita.com/ikai/items/22281b3cfc9636d8587d
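
The exact-key comparison in Option 1 matters because Prefix also matches longer keys that merely start with the target key. The following is a minimal sketch of that point using hand-written dictionaries shaped like list_objects_v2 responses; the helper name key_in_listing and the sample keys are hypothetical, and no AWS call is made.

```python
from typing import Any, Dict


def key_in_listing(response: Dict[str, Any], target_key: str) -> bool:
    """Return True only if target_key itself appears in the listing."""
    # "Contents" is absent when nothing matched the prefix, so default to [].
    contents = response.get("Contents", [])
    return any(item.get("Key") == target_key for item in contents)


# Hypothetical responses for Prefix="data/a.csv" (not real AWS output).
no_match = {"KeyCount": 0}
prefix_only = {
    "KeyCount": 1,
    "Contents": [{"Key": "data/a.csv.bak", "Size": 10}],
}
exact = {
    "KeyCount": 2,
    "Contents": [
        {"Key": "data/a.csv", "Size": 0},
        {"Key": "data/a.csv.bak", "Size": 10},
    ],
}

print(key_in_listing(no_match, "data/a.csv"))     # False
print(key_in_listing(prefix_only, "data/a.csv"))  # False
print(key_in_listing(exact, "data/a.csv"))        # True
```

Checking only whether "Contents" is non-empty would wrongly report True for the second case.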

Option 2: Using the exception "ClientError: Error Code=404"

import boto3
from typing import Optional, Tuple

from botocore.exceptions import ClientError


s3_client = boto3.client('s3')

def get_file_content(
  bucket_name: str,
  target_key: str,
  file_code: str="utf-8"
) -> Tuple[Optional[str], int]:
  """Get file content in S3.
  Args:
    bucket_name (str): S3 bucket name
    target_key (str): target of S3 key
    file_code (str, optional): File code. Defaults to "utf-8".
  Raises:
    ex: Exception from boto3/get_object etc
  Returns:
    str: file content in S3 (None: file not found)
    int: Size of the body in bytes
  """
  try:
    response = s3_client.get_object(
      Bucket=bucket_name, Key=target_key)
    body = response["Body"].read()
    return body.decode(file_code), int(response["ContentLength"])
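  except ClientError as ex:
    # get_object reports a missing key as "NoSuchKey";
    # "404" is what head_object reports, so accept both.
    if ex.response['Error']['Code'] in ('NoSuchKey', '404'):
      # File not found
      return None, -1
    else:
      raise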

def has_file(
  bucket_name: str,
  target_key: str,
  is_allowed_zero_byte: bool=True
) -> bool:
  """Check to exist a file in S3.
  Args:
      bucket_name (str): S3 bucket name
      target_key (str): target of S3 key
      is_allowed_zero_byte (bool, optional): True: Allow an empty file, False: Not allow. Defaults to True.
  Returns:
      bool: True: Exists file. False: Not exists.
  """
  content, file_size = get_file_content(bucket_name, target_key)
  if content is None or \
    (is_allowed_zero_byte and file_size <= 0):
    return False
  else:
    return True
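
Option 2 downloads the whole object body just to learn whether it exists. As a possible variation (not part of the original code), the same check can be sketched with head_object, which returns only metadata such as ContentLength and reports a missing key with Error Code "404". The function name has_file_by_head is hypothetical, and the client is passed in as an argument so the exception class can be taken from s3_client.exceptions.ClientError.

```python
def has_file_by_head(
    s3_client,
    bucket_name: str,
    target_key: str,
    is_allowed_zero_byte: bool = True,
) -> bool:
    """Check whether a key exists by requesting its metadata only."""
    try:
        response = s3_client.head_object(Bucket=bucket_name, Key=target_key)
    except s3_client.exceptions.ClientError as ex:
        # head_object has no response body, so a missing key shows up as "404".
        # "NoSuchKey" is accepted too, to stay aligned with get_object.
        if ex.response["Error"]["Code"] in ("404", "NoSuchKey"):
            return False
        # Anything else (for example a permission error) is not "missing".
        raise
    if not is_allowed_zero_byte and response["ContentLength"] == 0:
        # Empty file (0 bytes) while empty files are not allowed
        return False
    return True


# Assumed usage (bucket and key names are placeholders):
#   import boto3
#   has_file_by_head(boto3.client("s3"), "your-bucket", "path/to/file.txt")
```

Note that when the caller lacks the s3:ListBucket permission, S3 answers a missing key with 403 rather than 404, so this sketch re-raises in that case instead of returning False.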

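Supplement: About performance

The implementations differ somewhat from the ones above (the site uses head_object rather than get_object), but the following site compares the two approaches.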

https://www.peterbe.com/plog/fastest-way-to-find-out-if-a-file-exists-in-s3

Excerpt:
~~~~~~
FUNCTION: _key_existing_size__list Used 511 times
    SUM    148.2740752696991
    MEAN   0.2901645308604679
    MEDIAN 0.2569708824157715
    STDEV  0.17742598775696436

FUNCTION: _key_existing_size__head Used 489 times
    SUM    249.79622673988342
    MEAN   0.510830729529414
    MEDIAN 0.4780092239379883
    STDEV  0.14352671121877011

Because it's network bound, it's really important to avoid the 'MEAN'
 and instead look at the 'MEDIAN'.
My home broadband can cause temporary spikes.

Clearly, using client.list_objects_v2 is faster. It's 90% faster than client.head_object.
~~~~~~
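
To repeat a comparison like this in your own environment, a small timing harness along the following lines may be enough. This is only a sketch under my own assumptions, not the code used on the site above: time_calls and summarize are hypothetical helper names, and the callable passed in is expected to wrap one existence check.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(func: Callable[[], object], repeat: int = 20) -> List[float]:
    """Call func repeatedly and return the elapsed seconds of each call."""
    durations = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Return the same kinds of figures as in the excerpt above."""
    return {
        "sum": sum(durations),
        "mean": statistics.mean(durations),
        # The median is less sensitive to temporary network spikes.
        "median": statistics.median(durations),
        # stdev needs at least two samples.
        "stdev": statistics.stdev(durations) if len(durations) > 1 else 0.0,
    }


# Assumed usage with the has_file function from this article
# (bucket and key names are placeholders):
#   result = summarize(time_calls(lambda: has_file("your-bucket", "path/to/file.txt")))
#   print(result["median"])
```

As the excerpt says, these calls are network bound, so it is safer to compare the median values than the means.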

Related articles

Amazon S3 ~ Introduction ~
https://dk521123.hatenablog.com/entry/2022/02/26/182526
Amazon S3 ~ Boto3 ~
https://dk521123.hatenablog.com/entry/2019/10/21/230004
boto3 API / list_objects_v2: usage caveats and workarounds
https://dk521123.hatenablog.com/entry/2019/12/06/232617
Python ~ Basics / Extracting path information ~
https://dk521123.hatenablog.com/entry/2022/02/23/000000