■ Introduction

https://dk521123.hatenablog.com/entry/2019/10/25/232155

This is a continuation of the article above.
I am recording AWS Glue troubles little by little;
this time, I summarize troubles with crawlers.

Troubles with the boto3 AWS Glue API ~ Triggers (general) ~
https://dk521123.hatenablog.com/entry/2020/10/23/110821
Troubles with the boto3 AWS Glue API ~ Scheduled triggers ~
https://dk521123.hatenablog.com/entry/2020/01/16/205331
Troubles with the boto3 AWS Glue API ~ Jobs/Crawlers ~
https://dk521123.hatenablog.com/entry/2020/02/05/223307
AWS Glue troubles ~ Jobs - [1] ~
https://dk521123.hatenablog.com/entry/2019/10/25/232155
AWS Glue troubles ~ Jobs - [2] ~
https://dk521123.hatenablog.com/entry/2020/10/12/152659
AWS Glue troubles ~ Jobs - [3] ~
https://dk521123.hatenablog.com/entry/2021/02/16/145848

【１】Crawler fails with the error "ERROR: Internal Service Exception"

After running the crawler,
the error "ERROR: Internal Service Exception" occurs.
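To see which error the last crawl ended with without opening the console, the crawler's last run can be read through boto3. This is a minimal sketch, assuming the crawler name is known: `glue.get_crawler(Name=...)` returns a `Crawler` dict whose `LastCrawl` entry holds `Status` and `ErrorMessage`. The helper names `last_crawl_error` and `fetch_last_crawl_error` are hypothetical.

```python
from typing import Any, Dict, Optional


def last_crawl_error(get_crawler_response: Dict[str, Any]) -> Optional[str]:
    """Return the error message of the crawler's last run, or None if it did not fail."""
    last_crawl = get_crawler_response.get("Crawler", {}).get("LastCrawl", {})
    if last_crawl.get("Status") != "FAILED":
        return None
    return last_crawl.get("ErrorMessage", "")


def fetch_last_crawl_error(crawler_name: str) -> Optional[str]:
    # boto3 is imported here so last_crawl_error() can be used without AWS access
    import boto3

    glue = boto3.client("glue")
    return last_crawl_error(glue.get_crawler(Name=crawler_name))
```

`LastCrawl` also carries `LogGroup` and `LogStream`, which point to the CloudWatch Logs entries with more detail than the one-line message.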

Cause

* The typical causes are listed in the following AWS Knowledge Center article

https://aws.amazon.com/jp/premiumsupport/knowledge-center/glue-crawler-internal-service-exception/

AWS Glue Data Catalog

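* Make sure that column names are no longer than 255 characters and don't contain special characters.
  For more information about column requirements, see "Column" in the AWS Glue documentation.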

* Check for malformed data.
  For example, if a column name doesn't conform to the regular expression pattern
  "[\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\t]", the crawler doesn't work.

* Check for columns that have a length of 0.
  This happens when the columns in the data don't match the data format of the table.

* If the data contains DECIMAL columns in "(precision, scale)" format,
  make sure that the scale value is less than or equal to the precision value.
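The Data Catalog checks above can be run locally against a schema before crawling. This is a minimal sketch under stated assumptions: columns are given as hypothetical (name, type) string pairs such as those taken from DDL, "length 0" is interpreted as an empty column name (an assumption), and the "no special characters" rule is not checked because the list does not define which characters count as special. Python's `re` cannot express the Java-style surrogate pair range `\uD800\uDC00-\uDBFF\uDFFF`, so it is written as `\U00010000-\U0010FFFF`, which covers the same code points. `check_columns` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Python equivalent of the character pattern quoted above; every character
# of the column name must match it.
COLUMN_NAME_PATTERN = re.compile(r"[\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF\t]+")
DECIMAL_PATTERN = re.compile(r"decimal\(\s*(\d+)\s*,\s*(\d+)\s*\)", re.IGNORECASE)
MAX_COLUMN_NAME_LENGTH = 255


def check_columns(columns: List[Tuple[str, str]]) -> List[str]:
    """Return human-readable problems for (name, type) pairs; an empty list means OK."""
    problems = []
    for name, col_type in columns:
        if len(name) == 0:
            problems.append("empty column name")
            continue
        if len(name) > MAX_COLUMN_NAME_LENGTH:
            problems.append(f"{name[:20]}...: name longer than {MAX_COLUMN_NAME_LENGTH} characters")
        if not COLUMN_NAME_PATTERN.fullmatch(name):
            problems.append(f"{name!r}: name contains characters outside the allowed pattern")
        # DECIMAL(precision, scale): scale must not exceed precision
        match = DECIMAL_PATTERN.fullmatch(col_type.strip())
        if match:
            precision, scale = int(match.group(1)), int(match.group(2))
            if scale > precision:
                problems.append(f"{name}: DECIMAL scale {scale} exceeds precision {precision}")
    return problems
```

For example, `check_columns([("amount", "decimal(2,5)")])` reports the scale problem, while `[("id", "int"), ("price", "decimal(10,2)")]` yields an empty list.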

AWS KMS

* If you use AWS KMS, the AWS Glue crawler must
  have access to AWS KMS.

https://docs.aws.amazon.com/ja_jp/glue/latest/dg/encryption-security-configuration.html

Excerpt from the page above:
~~~~~~~~~~
You can create an AWS KMS VPC endpoint in your VPC.
If you skip this step, your jobs or crawlers might fail
with a kms timeout on jobs or an internal service exception
on crawlers.

... (omitted) ...

In the VPC console, you must do the following:

* Select [Enable Private DNS name].
* Choose the [Security group] (with a self-referencing rule) that you use
  for the job or crawler that accesses Java Database Connectivity (JDBC).
~~~~~~~~~~
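The console steps in the excerpt can also be scripted with boto3's EC2 `create_vpc_endpoint`. This is a sketch under stated assumptions: the region, VPC ID, subnet IDs and security group ID are placeholders you supply, and the security group is assumed to already have the self-referencing rule. The helper names `kms_endpoint_params` and `create_kms_endpoint` are hypothetical.

```python
from typing import Any, Dict, List


def kms_endpoint_params(region: str, vpc_id: str, subnet_ids: List[str],
                        security_group_ids: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for EC2 create_vpc_endpoint (interface endpoint for AWS KMS)."""
    return {
        "VpcEndpointType": "Interface",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.kms",
        "SubnetIds": subnet_ids,
        # Security group (with the self-referencing rule) used by the Glue job/crawler
        "SecurityGroupIds": security_group_ids,
        # Corresponds to [Enable Private DNS name] in the VPC console
        "PrivateDnsEnabled": True,
    }


def create_kms_endpoint(region: str, vpc_id: str, subnet_ids: List[str],
                        security_group_ids: List[str]) -> str:
    import boto3  # imported here so the parameter builder can be used offline

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.create_vpc_endpoint(
        **kms_endpoint_params(region, vpc_id, subnet_ids, security_group_ids))
    return response["VpcEndpoint"]["VpcEndpointId"]
```

Keeping the parameters in a separate function makes it easy to review (or log) exactly what will be created before calling AWS.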

【２】Crawler fails with the error "Error Access Denied (Service: Amazon S3 Status Code 403...)"

When I ran an AWS Glue crawler against an encrypted S3 bucket,
the error shown below was displayed.

Error details

ERROR: Error Access Denied
(Service: Amazon S3 Status Code 403; Error Code: AccessDenied; Request ID: xxxxxx; S3 Extended Request ID: xxxxx)
retrieving file at s3://your-bucket/xxxx/xxxx/xxxxx.
Tables created did not infer schemas from this file.

Cause

* There are two possible causes.

1) The role used by the crawler has not been granted the encryption key (it is not allowed to use the key that encrypts the data)

https://stackoverflow.com/questions/51899441/aws-glue-access-denied-for-crawler-with-administrator-policy-attached
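For cause 1), one way to grant the key is an inline IAM policy on the crawler's role that allows `kms:Decrypt` on the key. This is a sketch assuming the bucket uses SSE-KMS and the key policy delegates permissions to IAM (the default key policy does); if it does not, the role has to be added to the key policy itself instead. The role name, policy name and key ARN are placeholders, and the helper names are hypothetical.

```python
import json
from typing import Any, Dict, List


def kms_decrypt_policy(key_arns: List[str]) -> Dict[str, Any]:
    """IAM policy document allowing decryption with the given KMS keys."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["kms:Decrypt"],
                "Resource": key_arns,
            }
        ],
    }


def attach_kms_policy(role_name: str, policy_name: str, key_arns: List[str]) -> None:
    import boto3  # imported here so the policy builder can be used offline

    iam = boto3.client("iam")
    # put_role_policy creates or replaces an inline policy on the role
    iam.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(kms_decrypt_policy(key_arns)),
    )
```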

2) The target S3 bucket's settings (e.g. its bucket policy) explicitly Deny the role used by the crawler
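For cause 2), the bucket policy can be fetched and scanned for Deny statements that name the crawler's role. This is a deliberately simplified sketch: it only matches the role ARN literally (or `*`) in `Principal` and ignores `Condition`, `NotPrincipal`, wildcard ARNs and other Deny sources such as SCPs, so an empty result does not prove nothing denies the role. The function names are hypothetical.

```python
import json
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    return value if isinstance(value, list) else [value]


def deny_statements_for(policy_json: str, role_arn: str) -> List[Dict[str, Any]]:
    """Return Deny statements whose Principal is '*' or lists role_arn (simplified check)."""
    policy = json.loads(policy_json)
    hits = []
    for stmt in _as_list(policy.get("Statement", [])):
        if stmt.get("Effect") != "Deny":
            continue
        principal = stmt.get("Principal")
        if principal == "*":
            hits.append(stmt)
        elif isinstance(principal, dict):
            aws_principals = _as_list(principal.get("AWS", []))
            if role_arn in aws_principals or "*" in aws_principals:
                hits.append(stmt)
    return hits


def fetch_bucket_denies(bucket: str, role_arn: str) -> List[Dict[str, Any]]:
    import boto3  # imported here; the caller needs s3:GetBucketPolicy permission

    s3 = boto3.client("s3")
    policy_json = s3.get_bucket_policy(Bucket=bucket)["Policy"]
    return deny_statements_for(policy_json, role_arn)
```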

■ Related articles

AWS Glue ~ Getting Started ~
https://dk521123.hatenablog.com/entry/2019/10/01/221926
Troubles with the boto3 AWS Glue API ~ Triggers (general) ~
https://dk521123.hatenablog.com/entry/2020/10/23/110821
Troubles with the boto3 AWS Glue API ~ Scheduled triggers ~
https://dk521123.hatenablog.com/entry/2020/01/16/205331
Troubles with the boto3 AWS Glue API ~ Jobs/Crawlers ~
https://dk521123.hatenablog.com/entry/2020/02/05/223307
AWS Glue troubles ~ Jobs - [1] ~
https://dk521123.hatenablog.com/entry/2019/10/25/232155
AWS Glue troubles ~ Jobs - [2] ~
https://dk521123.hatenablog.com/entry/2020/10/12/152659
AWS Glue troubles ~ Jobs - [3] ~
https://dk521123.hatenablog.com/entry/2021/02/16/145848

Very personal programming notes

Memo for Programming.

【Trouble】【AWS】AWS Glue troubles ~ Crawlers ~