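■ Introduction

https://dk521123.hatenablog.com/entry/2019/10/19/104805

I covered JSON handling in the post above, but it now looks like
I may need to create ndjson files as test data,
so these are my preparation notes.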

【０】ndjson

* ndjson = Newline Delimited JSON
 => Data made of JSON values separated by newline characters

* The newline used as the delimiter is "\n"
  (the official ndjson site below says "\r\n" is also accepted)

http://ndjson.org/

ndjson example

{"id":"xx01","name":"Mike"}
{"id":"xx02","name":"Tom"}
{"id":"xx03","name":"Sam"}
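Since each line is just one JSON value, the format can be read with the standard json module alone. The following is a minimal sketch of that idea, not an official parser: parse_ndjson is a hypothetical helper name, and it relies on json.loads treating a trailing "\r" as whitespace, so both "\n" and "\r\n" work.

```python
import json


def parse_ndjson(text):
    """Parse ndjson text into a list of Python objects (hypothetical helper)."""
    records = []
    for line in text.split("\n"):
        # A trailing "\r" left over from "\r\n" is plain whitespace to json.loads.
        # Blank lines (such as the one after the final newline) are skipped.
        if line.strip():
            records.append(json.loads(line))
    return records


lf_text = '{"id":"xx01","name":"Mike"}\n{"id":"xx02","name":"Tom"}\n'
crlf_text = lf_text.replace("\n", "\r\n")

print(parse_ndjson(lf_text))
print(parse_ndjson(lf_text) == parse_ndjson(crlf_text))
```

Splitting on "\n" instead of using str.splitlines() is deliberate: splitlines() also breaks on characters such as U+2028, which may legally appear inside a JSON string.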

【１】Using the ndjson module

* Use the ndjson module
 => It can be handled much like the standard json module

https://pypi.org/project/ndjson/

１）Installation

pip install ndjson

２）Sample

import ndjson

# load from file-like objects
with open('input.ndjson') as input_file:
  input_data = ndjson.load(input_file)

# convert to and from objects
text = ndjson.dumps(input_data)
converted_data = ndjson.loads(text)

# dump to file-like objects
with open('output.ndjson', 'w') as output_file:
  ndjson.dump(converted_data, output_file)

input.ndjson

{"date":"2021-04-12","gender":"F","age":"17"}
{"date":"2022-09-09","gender":"F","age":"45"}
{"date":"2001-12-15","gender":"M","age":"32"}

output.ndjson

{"date": "2021-04-12", "gender": "F", "age": "17"}
{"date": "2022-09-09", "gender": "F", "age": "45"}
{"date": "2001-12-15", "gender": "M", "age": "32"}
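Note that output.ndjson has a space after each ":" and ",", unlike input.ndjson. If the test data needs the compact form, one option is to build the lines with the standard json module. This is only a sketch under my own assumptions: to_ndjson and the file name testdata.ndjson are hypothetical, and it uses json.dumps rather than the ndjson module.

```python
import json
import os
import tempfile


def to_ndjson(records):
    """Serialize an iterable of dicts to ndjson text (hypothetical helper)."""
    # separators=(",", ":") drops the spaces json.dumps inserts by default.
    # ensure_ascii=False keeps non-ASCII characters readable in the file.
    lines = [
        json.dumps(record, separators=(",", ":"), ensure_ascii=False)
        for record in records
    ]
    return "\n".join(lines) + "\n"


test_records = [
    {"date": "2021-04-12", "gender": "F", "age": "17"},
    {"date": "2022-09-09", "gender": "F", "age": "45"},
]

# Hypothetical output location; newline="\n" avoids "\r\n" translation on Windows.
path = os.path.join(tempfile.gettempdir(), "testdata.ndjson")
with open(path, "w", encoding="utf-8", newline="\n") as output_file:
    output_file.write(to_ndjson(test_records))

print(to_ndjson(test_records), end="")
```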

【２】Using pandas

* pandas can also read ndjson
 => Pass lines=True (Read the file as a json object per line.)

https://pandas.pydata.org/docs/reference/api/pandas.read_json.html
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html

１）Installation

pip install pandas

２）Sample

import pandas as pd

df = pd.read_json('./input.ndjson', lines=True)
print(df)

df.to_json(
  './output2.ndjson',
  orient="records",
  date_format="iso",
  lines=True
)

print('Done')

Output

        date gender  age
0 2021-04-12      F   17
1 2022-09-09      F   45
2 2001-12-15      M   32
Done

output2.ndjson (the date values got changed, and age turned from a string into a number...)

{"date":"2021-04-12T00:00:00.000Z","gender":"F","age":17}
{"date":"2022-09-09T00:00:00.000Z","gender":"F","age":45}
{"date":"2001-12-15T00:00:00.000Z","gender":"M","age":32}
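The values change because read_json infers dtypes and converts date-like columns by default. If the goal is to keep the strings exactly as written, turning both off might help. This is a sketch based on my reading of the read_json docs (dtype=False and convert_dates=False), using io.StringIO instead of a file so it is self-contained; check it against your pandas version.

```python
import io

import pandas as pd

ndjson_text = (
    '{"date":"2021-04-12","gender":"F","age":"17"}\n'
    '{"date":"2022-09-09","gender":"F","age":"45"}\n'
)

# dtype=False: do not infer column dtypes, so "17" stays a string.
# convert_dates=False: do not convert date-like columns such as "date".
df = pd.read_json(
    io.StringIO(ndjson_text),
    lines=True,
    dtype=False,
    convert_dates=False,
)

# With no path given, to_json returns the serialized text instead of writing a file.
round_trip = df.to_json(orient="records", lines=True)
print(round_trip)
```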

Supplement: Notes on writing files with "to_json"

[1] When lines=True, orient="records" must be specified
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html

Excerpt from the page above:
~~~~~~~~
lines bool, default False
If ‘orient’ is ‘records’ write out line-delimited json format.
Will throw ValueError if incorrect ‘orient’ since others are not list-like.
~~~~~~~~

If it is not specified, the following exception is raised
~~~~~~~~
ValueError: 'lines' keyword only valid when 'orient' is records
~~~~~~~~

[2] Unless date_format is specified explicitly, dates are written as numbers (Unix epoch milliseconds)

df.to_json(
  './output3.ndjson',
  orient="records",
  #date_format="iso", # commented out
  lines=True
)

output3.ndjson (note the date field)

{"date":1618185600000,"gender":"F","age":17}
{"date":1662681600000,"gender":"F","age":45}
{"date":1008374400000,"gender":"M","age":32}
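These numbers can be turned back into dates with the standard library. The sketch below assumes they are epoch milliseconds in UTC, which matches the pandas default (date_format="epoch", date_unit="ms"); epoch_ms_to_date is a hypothetical helper name.

```python
from datetime import datetime, timezone


def epoch_ms_to_date(ms):
    """Convert epoch milliseconds (assumed UTC) to a YYYY-MM-DD string."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date().isoformat()


for ms in (1618185600000, 1662681600000, 1008374400000):
    print(ms, "->", epoch_ms_to_date(ms))
```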

References

https://qiita.com/suin/items/246691382ea2a2b22031
https://zenn.dev/katoaki/articles/aef091a3580f5e
Reading
https://www.hamlet-engineer.com/posts/EEggs_0112.html

Related articles

Python: Basics / JSON
https://dk521123.hatenablog.com/entry/2019/10/19/104805
Pandas: Basics / JSON
https://dk521123.hatenablog.com/entry/2022/02/16/000000
Python: Basics / File compression and decompression
https://dk521123.hatenablog.com/entry/2019/09/03/000000
Notes on using Snowflake unload
https://dk521123.hatenablog.com/entry/2022/07/06/145724

【Python】Python: Working with ndjson