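■ Introduction
https://dk521123.hatenablog.com/entry/2020/03/02/233902
A continuation of the article above. I found a good series of YouTube videos for studying machine learning (ML), so here are my notes. (The videos are in English, so they double as English practice.)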
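Course material (videos)
There appear to be 10 episodes in total.
Hello World - Machine Learning Recipes #1
https://www.youtube.com/watch?v=cKxRvEZd3Mw
Visualizing a Decision Tree - Machine Learning Recipes #2
https://www.youtube.com/watch?v=tNa99PG8hR8
What Makes a Good Feature? - Machine Learning Recipes #3
https://www.youtube.com/watch?v=N9fDIAflCMY
Let's Write a Pipeline - Machine Learning Recipes #4
https://www.youtube.com/watch?v=84gqSbLcBFE
Writing Our First Classifier - Machine Learning Recipes #5
https://www.youtube.com/watch?v=AoeEHqVSNOw
Train an Image Classifier with TensorFlow for Poets - Machine Learning Recipes #6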
https://www.youtube.com/watch?v=cSKfRcEDGUs
Classifying Handwritten Digits with TF.Learn - Machine Learning Recipes #7
https://www.youtube.com/watch?v=Gj0iyo265bc
Let’s Write a Decision Tree Classifier from Scratch - Machine Learning Recipes #8
https://www.youtube.com/watch?v=LDRbO9a6XPU
Intro to Feature Engineering with TensorFlow - Machine Learning Recipes #9
https://www.youtube.com/watch?v=d12ra3b_M-0
Getting Started with Weka - Machine Learning Recipes #10
https://www.youtube.com/watch?v=TF1yh5PKaqI
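Notes
This article covers Machine Learning Recipes #1 through #5 (Writing Our First Classifier). However, the following video barely uses scikit-learn, so it is covered separately.
What Makes a Good Feature? - Machine Learning Recipes #3
https://www.youtube.com/watch?v=N9fDIAflCMY
See the related article below.
Matplotlib ~ Graph plotting library ~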
https://dk521123.hatenablog.com/entry/2020/03/01/000000
【1】Hello world (decision tree)
* A Hello world using a decision tree.
* For details on decision trees, see the related article below.
https://dk521123.hatenablog.com/entry/2020/04/04/021413
Video
https://www.youtube.com/watch?v=cKxRvEZd3Mw
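Sample

from sklearn import tree

# Each row is one sample with two numeric features
features = [[140, 0], [130, 0], [150, 1], [170, 1]]
# One class label per sample
labels = [0, 0, 1, 1]

# Decision tree
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)

# Predict the class of a new, unseen sample
print(clf.predict([[153, 1]]))

Output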
[1]
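As a supplement, here is a minimal sketch that prints the rules the tree learned, using `sklearn.tree.export_text`. The feature names `feature_a` and `feature_b` are hypothetical labels added only for readability, since the sample above does not name its columns. `random_state` is fixed here as an assumption, so that the printed tree is the same on every run.

```python
from sklearn import tree

features = [[140, 0], [130, 0], [150, 1], [170, 1]]
labels = [0, 0, 1, 1]

# random_state is fixed only to make the printed tree reproducible
clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit(features, labels)

# feature_a / feature_b are hypothetical names, not from the original sample
rules = tree.export_text(clf, feature_names=["feature_a", "feature_b"])
print(rules)

# Same query as the sample above
prediction = clf.predict([[153, 1]])
print(prediction)
```

Either feature separates the four training samples perfectly, so which one appears in the printed rule depends on the tie-break. The prediction for `[153, 1]` is the same in both cases.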
【2】Visualizing a decision tree
Classifiers (Japanese: 分類器)
Video
https://www.youtube.com/watch?v=tNa99PG8hR8
About the dataset
https://scikit-learn.org/stable/datasets/index.html
https://en.wikipedia.org/wiki/Iris_flower_data_set
https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html
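To get a feel for the dataset before training, a small sketch like the following can help. It only uses `load_iris` and NumPy, and shows the size of the data and how many samples each class has.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# data is a 2-D array: one row per flower, one column per feature
n_samples, n_features = iris.data.shape
print(n_samples, n_features)

# target holds the class index (0, 1, 2) of each row
per_class = np.bincount(iris.target)
for name, count in zip(iris.target_names, per_class):
    print(name, count)
```

The rows are ordered by class (50 setosa, then 50 versicolor, then 50 virginica), which is why rows 0, 50 and 100 are the first sample of each class.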
Sample

import numpy as np
import pydotplus
from sklearn.datasets import load_iris
from sklearn import tree

# [1] Import dataset
iris = load_iris()

print("[Iris data]")
print(iris.feature_names)
print(iris.target_names)
print(iris.data[0])
print(iris.target[0])

# Output for all data
#for i in range(len(iris.target)):
#    print("Example {}: label {}, features {}".format(
#        i, iris.target[i], iris.data[i]
#    ))

# [2] Train a classifier
# Hold out one sample per class (rows 0, 50, 100) for testing
test_index = [0, 50, 100]

# training data
train_target = np.delete(iris.target, test_index)
train_data = np.delete(iris.data, test_index, axis=0)

# testing data
test_target = iris.target[test_index]
test_data = iris.data[test_index]

classifier = tree.DecisionTreeClassifier()
classifier.fit(train_data, train_target)

# [3] Predict label for new flower
print("Predict")
print(classifier.predict(test_data))
print("Target Answer")
print(test_target)

# [4] Visualize the tree
dot_data = tree.export_graphviz(
    classifier,
    feature_names=iris.feature_names,
    filled=True, rounded=True,
    impurity=False)

# Writing the PDF requires Graphviz to be installed
graph = pydotplus.graph_from_dot_data(dot_data)
graph.write_pdf("viz_tree.pdf")

Output

[Iris data]
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
['setosa' 'versicolor' 'virginica']
[5.1 3.5 1.4 0.2]
0
Predict
[0 1 2]
Target Answer
[0 1 2]

"viz_tree.pdf" is also written out as a PDF.
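The hold-out step in [2] is easy to get wrong, so here is a small sketch of just that part. It checks what `np.delete` leaves in the training set and what ends up in the test set.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
test_index = [0, 50, 100]

# axis=0 removes whole rows; without axis, np.delete flattens the 2-D array
train_data = np.delete(iris.data, test_index, axis=0)
train_target = np.delete(iris.target, test_index)

# Fancy indexing picks out exactly the held-out rows
test_data = iris.data[test_index]
test_target = iris.target[test_index]

print(train_data.shape, train_target.shape)  # (147, 4) (147,)
print(test_data.shape, test_target)          # (3, 4) [0 1 2]
```

The three held-out rows never reach `fit`, so predicting them tests the tree on data it has not seen.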
【3】Let's write a pipeline
Video
https://www.youtube.com/watch?v=84gqSbLcBFE
Sample

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn import tree

iris = load_iris()

x = iris.data
y = iris.target

# Split the data in half: one half for training, one half for testing
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, train_size=0.5)

# Classifier 1: decision tree
classifier = tree.DecisionTreeClassifier()
classifier.fit(x_train, y_train)
predictions = classifier.predict(x_test)
print("Tree")
print(accuracy_score(y_test, predictions))

# Classifier 2: k-nearest neighbors (same fit / predict interface)
kn_classifier = KNeighborsClassifier()
kn_classifier.fit(x_train, y_train)
kn_predictions = kn_classifier.predict(x_test)
print("KNeighbors")
print(accuracy_score(y_test, kn_predictions))

Output (the scores differ between runs because train_test_split shuffles the data randomly)

# Run 1
Tree
0.9333333333333333
KNeighbors
0.9866666666666667

# Run 2
Tree
0.96
KNeighbors
0.9866666666666667
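If a run needs to be repeatable, one option is to fix the random seed. The sketch below assumes a seed of 0 (any integer works) and wraps the tree part of the sample in a hypothetical helper, `run_once`, which is not part of the original sample.

```python
from sklearn import tree
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_once(seed):
    """Hypothetical helper: split, train a tree, and return test accuracy."""
    iris = load_iris()
    # random_state fixes how the data is shuffled before splitting
    x_train, x_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.5, random_state=seed)
    # The tree also has internal randomness (tie-breaking), so seed it too
    classifier = tree.DecisionTreeClassifier(random_state=seed)
    classifier.fit(x_train, y_train)
    return accuracy_score(y_test, classifier.predict(x_test))


first = run_once(0)
second = run_once(0)
print(first, second)  # the two values are identical
```

Leaving the seed out, as the sample above does, gives a rough feel for how much the score varies with the split.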
【4】Writing our first classifier
Video
https://www.youtube.com/watch?v=AoeEHqVSNOw
Sample

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from scipy.spatial import distance

# Custom classifier class (a bare-bones 1-nearest-neighbor)
class ScrappyKNN():
    def fit(self, x_train, y_train):
        # "Training" just memorizes the data
        self.x_train = x_train
        self.y_train = y_train

    def predict(self, x_test):
        predictions = []
        for row in x_test:
            label = self.closest(row)
            predictions.append(label)
        return predictions

    def closest(self, row):
        # Linear scan for the training point nearest to row
        best_distance = distance.euclidean(row, self.x_train[0])
        best_index = 0
        for i in range(1, len(self.x_train)):
            target_distance = distance.euclidean(row, self.x_train[i])
            if target_distance < best_distance:
                best_distance = target_distance
                best_index = i
        return self.y_train[best_index]

iris = load_iris()

x = iris.data
y = iris.target

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, train_size=0.5)

custom_classifier = ScrappyKNN()
custom_classifier.fit(x_train, y_train)
predictions = custom_classifier.predict(x_test)
print("Custom")
print(accuracy_score(y_test, predictions))

Output (the scores differ between runs because the split is random)

# Run 1
Custom
0.9466666666666667

# Run 2
Custom
0.9733333333333334
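The `closest` loop above can also be written with NumPy in a few lines. This is only a sketch of the same nearest-neighbor idea, not the code from the video. The function name `closest_label` and the tiny dataset are made up for illustration.

```python
import numpy as np


def closest_label(row, x_train, y_train):
    """Hypothetical helper: label of the training point nearest to row."""
    # Euclidean distance from row to every training point at once
    distances = np.linalg.norm(np.asarray(x_train) - np.asarray(row), axis=1)
    # argmin returns the first minimum, like the strict "<" in the loop version
    return y_train[int(np.argmin(distances))]


# Made-up 2-D training data: three points, three labels
x_train = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
y_train = np.array([0, 1, 2])

label_a = closest_label([1.0, 1.0], x_train, y_train)
label_b = closest_label([9.0, 8.0], x_train, y_train)
print(label_a, label_b)  # 0 1
```

The loop version is easier to follow while learning. The vectorized version avoids the Python-level loop over the training data.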
Related articles
scikit-learn ~ Introduction ~
https://dk521123.hatenablog.com/entry/2020/03/02/233902
scikit-learn ~ Linear regression ~
https://dk521123.hatenablog.com/entry/2020/07/04/000000
scikit-learn ~ Ridge regression ~
https://dk521123.hatenablog.com/entry/2020/04/25/174503
scikit-learn ~ Multiple regression / Lasso regression, Elastic Net ~
https://dk521123.hatenablog.com/entry/2020/11/29/193247
scikit-learn ~ Decision tree / Random forest ~
https://dk521123.hatenablog.com/entry/2020/04/04/021413
TensorFlow ~ Introduction ~
https://dk521123.hatenablog.com/entry/2018/02/16/103500
TensorFlow ~ Environment setup / Windows ~
https://dk521123.hatenablog.com/entry/2018/02/17/102927
Keras ~ Deep learning library ~
https://dk521123.hatenablog.com/entry/2020/03/03/235302
Matplotlib ~ Graph plotting library ~
https://dk521123.hatenablog.com/entry/2020/03/01/000000
NumPy ~ Numerical computing library ~
https://dk521123.hatenablog.com/entry/2018/03/28/224532
Notes on machine learning
https://dk521123.hatenablog.com/entry/2018/10/23/230800