Very personal programming notes

Memo for Programming.

【Shell】A shell script that extracts part of the files stored on AWS

Shell AWS

■ Introduction

This post writes a shell script that downloads files stored on AWS S3
and extracts only the first few lines of each file.

Table of contents

【１】Implementation approach
【２】Required tools
【３】Sample
【４】Tips
　１）Listing the file names under a directory

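【１】Implementation approach

[1] Initialization (prepare the following)
 + Create the output directory
 + Prepare an array (here, an associative array) that holds the S3 directories to fetch
[2] Loop over the array and download each entry with "aws s3 cp s3://xxx/xx1/ ./xx1 --recursive"
[3] Loop over the files in the downloaded directory
[4] Extract the first lines with "head -n 5 xx1.csv > ./output/xx1.csv"
[5] Post-processing (once everything has finished)
 + Delete all downloaded files
 + Zip the extracted files together with their directory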
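
Steps [3] and [4] can be tried locally before touching S3. The following is a minimal sketch, assuming bash; the helper name extract_heads and the generated sample data are hypothetical and only stand in for a real download.

```shell
#!/bin/bash
# Hypothetical helper: copy the first N lines of every file under src_dir
# into dest_dir (flat layout, keyed by file name).
extract_heads() {
  local src_dir="$1" dest_dir="$2" line_count="$3"
  mkdir -p "${dest_dir}"
  # -print0 / read -d '' keeps file names containing spaces intact
  find "${src_dir}" -type f -print0 | while IFS= read -r -d '' file_path; do
    head -n "${line_count}" "${file_path}" > "${dest_dir}/$(basename "${file_path}")"
  done
}

# Demo with generated data instead of an S3 download
work_dir=$(mktemp -d)
mkdir -p "${work_dir}/temp/xx1"
seq 1 100 > "${work_dir}/temp/xx1/xx1.csv"
extract_heads "${work_dir}/temp/xx1" "${work_dir}/output/xx1" 5
cat "${work_dir}/output/xx1/xx1.csv"
rm -rf "${work_dir}"
```

Because the output is flattened by file name, two files with the same name in different sub-directories would overwrite each other; that matches the approach above but may need adjusting for nested data.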

【２】Required tools

* AWS CLI
* zip command
* Bash 4.0 or later (needed for associative arrays)
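
Whether these commands are installed can be checked up front. This is a small sketch, assuming bash; the helper name missing_commands is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print each of the given commands that is not found.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "${cmd}" > /dev/null 2>&1 || echo "${cmd}"
  done
}

missing=$(missing_commands aws zip)
if [ -n "${missing}" ]; then
  # Left unquoted on purpose so the names are printed on one line
  echo "Missing commands:" ${missing} >&2
else
  echo "All required commands are available."
fi
```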

【３】Sample

#!/bin/bash
# Note: bash 4.0 or later is required for associative arrays (declare -A).

echo "********** Initialize... **********"
LINE_NUMBER=5
OUTPUT_ZIP_PASSWORD="your-password"

OUTPUT_DIR_NAME="output"
rm -rf "${OUTPUT_DIR_NAME}"
mkdir "${OUTPUT_DIR_NAME}"

OUTPUT_FILE="./${OUTPUT_DIR_NAME}.zip"
rm -f "${OUTPUT_FILE}"

TEMP_DIR_NAME="temp"
rm -rf "${TEMP_DIR_NAME}"
mkdir "${TEMP_DIR_NAME}"

# Key: output sub-directory name, value: S3 path to download
declare -A TARGET_PATH
TARGET_PATH=(
  ["case1_xxx1"]="s3://your-s3-ex1-bucket/xxx1/"
  ["case1_xxx2"]="s3://your-s3-ex1-bucket/xxx2/"
  ["case2_xxx1"]="s3://your-s3-ex2-bucket/xxx1/"
  ["case2_xxx2"]="s3://your-s3-ex2-bucket/xxx2/"
)

# Loop
for target_key in "${!TARGET_PATH[@]}";
do
  target_path="${TARGET_PATH[$target_key]}"
  echo "********** Start. ${target_key} - ${target_path} **********"

  output_path="${OUTPUT_DIR_NAME}/${target_key}/"
  mkdir -p "${output_path}"
  temp_output_path="${TEMP_DIR_NAME}/${target_key}/"
  mkdir -p "${temp_output_path}"

  # aws s3 cp returns only after the copy has finished, so no sleep is needed
  if ! aws s3 cp "${target_path}" "${temp_output_path}" --recursive; then
    echo "Failed to download ${target_path}" >&2
    continue
  fi

  # -print0 / read -d '' keeps file names containing spaces intact
  find "${temp_output_path}" -type f -print0 | while IFS= read -r -d '' target_file_path;
  do
    echo "============= ${target_file_path} ============="
    target_file_name=$(basename "${target_file_path}")
    head -n "${LINE_NUMBER}" "${target_file_path}" > "${output_path}${target_file_name}"
  done
done

echo "********** Finalize... **********"

# ZIP with password (-P passes the password non-interactively;
# it is visible in the process list, so use it only where that is acceptable)
zip -r -P "${OUTPUT_ZIP_PASSWORD}" "${OUTPUT_FILE}" "${OUTPUT_DIR_NAME}"

# Remove temp directory
rm -rf "${TEMP_DIR_NAME}"
# Remove output directory
rm -rf "${OUTPUT_DIR_NAME}"

echo "+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*"
echo "Done... See ${OUTPUT_FILE}"
echo "+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*+*"
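
When the files are large, downloading each one in full just to keep five lines can be wasteful. A possible alternative, shown here only as a sketch, is to stream a single object to stdout by passing "-" as the destination of aws s3 cp and cut it with head. The helper name s3_head is hypothetical, it works per object rather than recursively, and the AWS CLI may report a broken-pipe error once head stops reading.

```shell
#!/bin/bash
# Hypothetical helper: print the first N lines (default 5) of one S3 object.
# "-" as the destination makes `aws s3 cp` write the object to stdout.
s3_head() {
  local s3_uri="$1" line_count="${2:-5}"
  aws s3 cp "${s3_uri}" - | head -n "${line_count}"
}

# Usage (the bucket and key are placeholders):
#   s3_head "s3://your-s3-ex1-bucket/xxx1/xx1.csv" 5 > ./output/xx1.csv
```

Whether this actually saves transfer depends on how quickly the CLI stops after the pipe closes, so it is worth measuring on real data before replacing the download-based script.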

【４】Tips

１）Listing the file names under a directory

find /home/dir1/ -type f

https://gist.github.com/Buravo46/e0028f890757fe38e6cd10ffd9dc9fae

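target_dir="/home/dir1/"
# -print0 / read -d '' keeps file names containing spaces intact (bash)
find "${target_dir}" -type f -print0 | while IFS= read -r -d '' item;
do
  echo "FILEPATH: ${item}"
  echo "FILENAME: $(basename "${item}")"
done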

find command
https://tech-blog.rakus.co.jp/entry/20220831/find

* -type => -type d: directory, -type f: file, -type l: symbolic link
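
The effect of each -type value can be checked on a throwaway directory. This is a minimal sketch, assuming bash; the helper name count_by_type and the demo file names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count entries of a given find type under a directory.
# -mindepth 1 excludes the starting directory itself from the count.
count_by_type() {
  local dir="$1" type="$2"
  find "${dir}" -mindepth 1 -type "${type}" | wc -l
}

demo_dir=$(mktemp -d)
mkdir "${demo_dir}/sub"
touch "${demo_dir}/a.csv" "${demo_dir}/sub/b.csv"
ln -s "${demo_dir}/a.csv" "${demo_dir}/link.csv"

echo "files:    $(count_by_type "${demo_dir}" f)"
echo "dirs:     $(count_by_type "${demo_dir}" d)"
echo "symlinks: $(count_by_type "${demo_dir}" l)"
rm -rf "${demo_dir}"
```

Without the -L option, find reports a symbolic link as type l rather than as the type of its target, which is why link.csv is not counted as a file here.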

Related articles

Commands for handling large files
https://dk521123.hatenablog.com/entry/2020/06/12/000000
Shell ~ Basics / Arrays ~
https://dk521123.hatenablog.com/entry/2021/08/11/000000
Shell ~ Basics / Associative arrays - dictionaries ~
https://dk521123.hatenablog.com/entry/2021/09/11/000000