■ Introduction

https://dk521123.hatenablog.com/entry/2019/10/22/014957
https://dk521123.hatenablog.com/entry/2020/10/14/000000
https://dk521123.hatenablog.com/entry/2021/04/07/105858

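This post continues from the posts above.

I have already used to_dict / to_json and similar methods,
but pandas has many other to_xxxx methods, so I looked into them.

There seem to be more formats than the ones listed below,
but I can't cover them all, so I only list the ones that caught my attention.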

【１】Output files

１）to_csv

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html

* Writes a CSV file
* See the related post below

Pandas ~ Basics / CSV ~
https://dk521123.hatenablog.com/entry/2020/11/17/000000
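
As a quick illustration of to_csv, here is a minimal sketch. The sample data is made up for this example. When no path is passed, to_csv returns the CSV text as a string, which makes it easy to check the result.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame([
    {'item1': 1, 'item2': 2, 'item3': 3},
    {'item1': 4, 'item2': 5, 'item3': 6},
])

# With no path argument, to_csv returns the CSV text instead of writing a file.
# index=False leaves out the row index column.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Passing a file path as the first argument (for example df.to_csv('out.csv', index=False), where 'out.csv' is just a placeholder name) writes the same content to a file.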

２）to_excel

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html

* Writes an Excel file
* See the related posts below

Pandas ~ Basics / Excel ~
https://dk521123.hatenablog.com/entry/2020/11/18/000000
Pandas ~ Basics / Converting Excel => CSV ~
https://dk521123.hatenablog.com/entry/2021/01/25/000000

３）to_parquet

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_parquet.html

* Writes the DataFrame in Parquet format
 => For details on Parquet files, see the related post below

https://dk521123.hatenablog.com/entry/2020/06/03/000000

Compression / compression

compression = 'snappy', 'gzip', 'brotli', None
 => The default is 'snappy'
 => None means no compression

Sample

* See the related post below.

https://dk521123.hatenablog.com/entry/2021/11/13/095519

４）to_pickle

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_pickle.html

* Writes a pickle file
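
A minimal round-trip sketch, assuming made-up sample data and a temporary file named 'sample.pkl' (both are placeholders for this example). pd.read_pickle is the matching read function.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame([
    {'item1': 1, 'item2': 2, 'item3': 3},
    {'item1': 4, 'item2': 5, 'item3': 6},
])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.pkl')
    # Serialize the DataFrame to a pickle file
    df.to_pickle(path)
    # Load it back to confirm the round trip
    restored = pd.read_pickle(path)

print(restored.equals(df))
```

Since pickle keeps dtypes and the index as they are, it is handy for temporarily saving intermediate results, but pickle files should only be loaded from sources you trust.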

５）to_latex

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_latex.html

* Writes a LaTeX file

Reference
https://www.haya-programming.com/entry/2018/05/31/020009

６）to_feather

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_feather.html

* Writes a binary Feather file (column-oriented)

７）to_hdf

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_hdf.html

* Writes an HDF5 file
 (Hierarchical Data Format, version 5)

８）to_stata

https://pandas.pydata.org/pandas-docs/version/1.2.0/reference/api/pandas.DataFrame.to_stata.html

* Writes a Stata dta file (***.dta)

９）to_html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_html.html

* Outputs the DataFrame as an HTML table

Example output

<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>item1</th>
      <th>item2</th>
      <th>item3</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>1</td>
      <td>2</td>
      <td>3</td>
    </tr>
    <tr>
      <th>1</th>
      <td>4</td>
      <td>5</td>
      <td>6</td>
    </tr>
    <tr>
      <th>2</th>
      <td>7</td>
      <td>8</td>
      <td>9</td>
    </tr>
  </tbody>
</table>
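
HTML like the above can be produced with a sketch along these lines. The sample data is assumed to match the table shown. With no argument, to_html returns the markup as a string.

```python
import pandas as pd

# Sample data assumed to match the table above
df = pd.DataFrame([
    {'item1': 1, 'item2': 2, 'item3': 3},
    {'item1': 4, 'item2': 5, 'item3': 6},
    {'item1': 7, 'item2': 8, 'item3': 9},
])

# With no buffer/path argument, to_html returns the HTML as a string
html = df.to_html()
print(html)
```

Passing a path (for example df.to_html('out.html'), where 'out.html' is a placeholder name) writes the table to a file instead.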

【２】Others

１）to_dict

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_dict.html

* Converts to a dictionary

Output format / orient

* orient = 'dict', 'list', 'series', 'split', 'records', 'index'
 => For now, knowing orient='records' seems to be enough

Sample

import pandas as pd

# Named "records" to avoid shadowing the built-in list
records = [
  {'item1': 1, 'item2': 2, 'item3': 3},
  {'item1': 4, 'item2': 5, 'item3': 6},
  {'item1': 7, 'item2': 8, 'item3': 9}
]
df = pd.DataFrame(records)

print('*********')
# {'item1': {0: 1, 1: 4, 2: 7}, 'item2': {0: 2, 1: 5, 2: 8}, 'item3': {0: 3, 1: 6, 2: 9}}
print(df.to_dict(orient='dict'))

print('*********')
# {'item1': [1, 4, 7], 'item2': [2, 5, 8], 'item3': [3, 6, 9]}
print(df.to_dict(orient='list'))

print('*********')
# {'item1': 0    1
# 1    4
# 2    7
# Name: item1, dtype: int64, 'item2': 0    2
# 1    5
# 2    8
# Name: item2, dtype: int64, 'item3': 0    3
# 1    6
# 2    9
# Name: item3, dtype: int64}
print(df.to_dict(orient='series'))

print('*********')
# {'index': [0, 1, 2], 'columns': ['item1', 'item2', 'item3'], 'data': [[1, 2, 3], [4, 5, 6], [7, 8, 9]]}
print(df.to_dict(orient='split'))

print('*********')
# [{'item1': 1, 'item2': 2, 'item3': 3}, {'item1': 4, 'item2': 5, 'item3': 6}, {'item1': 7, 'item2': 8, 'item3': 9}]
print(df.to_dict(orient='records'))

print('*********')
# {0: {'item1': 1, 'item2': 2, 'item3': 3}, 1: {'item1': 4, 'item2': 5, 'item3': 6}, 2: {'item1': 7, 'item2': 8, 'item3': 9}}
print(df.to_dict(orient='index'))

２）to_json

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_json.html

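* Converts to JSON format
 => For samples, see the related post below.

Pandas ~ Basics / JSON ~
https://dk521123.hatenablog.com/entry/2022/02/16/000000

Usage note

When no output path is given, df.to_json returns a string,
so to iterate over the data with a for loop,
you need to parse it first as follows

import json

json_str = df.to_json()
json_data = json.loads(json_str)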
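
To make the note above concrete, here is a minimal sketch. The sample data and the choice of orient='records' are my assumptions for this example. With orient='records', the parsed result is a list of dicts, one per row, which is easy to loop over.

```python
import json

import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame([
    {'item1': 1, 'item2': 2, 'item3': 3},
    {'item1': 4, 'item2': 5, 'item3': 6},
])

# to_json returns a JSON string when no path is given
json_str = df.to_json(orient='records')

# Parse the string into Python objects before looping
rows = json.loads(json_str)
for row in rows:
    print(row['item1'], row['item2'], row['item3'])
```

If you only need Python objects and not the JSON text itself, df.to_dict(orient='records') gives a similar list of dicts without the extra parsing step.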

３）to_numpy

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html

* Converts to a NumPy array

Reference
https://note.nkmk.me/python-pandas-numpy-conversion/
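
A minimal to_numpy sketch, using made-up sample data. Each row of the DataFrame becomes one row of a 2-D array, and the column names and index are dropped.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame([
    {'item1': 1, 'item2': 2, 'item3': 3},
    {'item1': 4, 'item2': 5, 'item3': 6},
    {'item1': 7, 'item2': 8, 'item3': 9},
])

# Convert to a 2-D NumPy array (column labels and index are not included)
arr = df.to_numpy()
print(arr.shape)
print(arr.tolist())
```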

４）to_sql

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_sql.html

* Writes to a database

Reference
https://www.haya-programming.com/entry/2019/05/03/043334
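
To try to_sql without setting up a database server, here is a minimal sketch that assumes an in-memory SQLite database from the standard library. The table name 'items' and the sample data are placeholders for this example.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame([
    {'item1': 1, 'item2': 2, 'item3': 3},
    {'item1': 4, 'item2': 5, 'item3': 6},
    {'item1': 7, 'item2': 8, 'item3': 9},
])

# In-memory SQLite database, discarded when the connection is closed
conn = sqlite3.connect(':memory:')
try:
    # Create the table "items" and insert the rows (index=False skips the index column)
    df.to_sql('items', conn, index=False)
    # Read the rows back with plain SQL to confirm the write
    rows = conn.execute(
        'SELECT item1, item2, item3 FROM items ORDER BY item1'
    ).fetchall()
finally:
    conn.close()

print(rows)
```

For databases other than SQLite, to_sql expects an SQLAlchemy connection or engine rather than a raw DBAPI connection.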

５）to_gbq

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_gbq.html

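* Writes to BigQuery
 => Note: DataFrame.to_gbq has been deprecated since pandas 2.2.0 in favor of pandas_gbq.to_gbq

Reference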
https://qiita.com/i_am_miko/items/68cb516ad2be61d59554

References

https://blog.imind.jp/entry/2019/04/12/224942
https://blog.amedama.jp/entry/2018/07/11/081050

【Python】 Pandas ~ to_xxxx / Output ~
