Machine Learning
=================

- `Machine Learning Glossary | Google Developers <https://developers.google.com/machine-learning/glossary>`_

Tutorial
--------------

- `Introduction to Machine Learning | Machine Learning Crash Course <https://developers.google.com/machine-learning/crash-course/ml-intro>`_
- `CS229: Machine Learning, Stanford | Autumn 2018 <https://youtube.com/playlist?list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU>`_


GitHub
--------------

- `Géron, A. (2019). Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media, Incorporated. <https://github.com/ageron/handson-ml2>`_
- `Bishop, C. (2006). Pattern Recognition and Machine Learning. Springer. <https://github.com/ctgk/PRML>`_
- Hastie, T., Tibshirani, R., & Friedman, J. (2001). The Elements of Statistical Learning. Springer New York Inc..
    - https://github.com/empathy87/The-Elements-of-Statistical-Learning-Python-Notebooks 
    - https://github.com/maitbayev/the-elements-of-statistical-learning

- `須山敦志. (2017). 機械学習スタートアップシリーズ ベイズ推論による機械学習入門. 講談社. <https://github.com/sammy-suyama/BayesBook>`_
- https://github.com/josephmisiti/awesome-machine-learning

Book
--------------

- `Hastie, T., Tibshirani, R., & Friedman, J. (2001). The Elements of Statistical Learning. Springer New York Inc.. <https://web.stanford.edu/~hastie/ElemStatLearn/>`_
- `Christoph, M., (2020). Interpretable Machine Learning. Lulu.com. <https://hacarus.github.io/interpretable-ml-book-ja/>`_
- `Machine Learning Systems Design - Chip Huyen <https://huyenchip.com/machine-learning-systems-design/toc.html>`_


Web Page
--------------

- `Choosing the Right Machine Learning Algorithm - Rajat Harlalka <https://hackernoon.com/choosing-the-right-machine-learning-algorithm-68126944ce1f>`_
- `10. Common pitfalls and recommended practices - scikit-learn <https://scikit-learn.org/stable/common_pitfalls.html>`_

Metrics
^^^^^^^^

- `機械学習の評価指標 分類編：適合率や再現率、AUC（ROC曲線、PR曲線）を解説 - codexa <https://www.codexa.net/ml-evaluation-cls/>`_

Cross Validation
^^^^^^^^

- `Python: パラメータ選択を伴う機械学習モデルの交差検証について - もみじあめ <https://blog.amedama.jp/entry/2018/07/23/084500>`_
- `一流の「ものさし」職人になろう　Cross Validation (交差検証)を深堀り - Hatomugi <https://qiita.com/Hatomugi/items/620c1bc757266b00e87f>`_
- `Mean(scores) vs Score(concatenation) in cross validation - Stack Exchange <https://stats.stackexchange.com/questions/34611/meanscores-vs-scoreconcatenation-in-cross-validation>`_
- `PCA and the train/test split - Cross Validated <https://stats.stackexchange.com/questions/55718/pca-and-the-train-test-split>`_

Hyperparameters
^^^^^^^^

- `機械学習におけるハイパーパラメータ最適化の理論と実践 - 野村将寛 <https://logmi.jp/tech/articles/322678>`_

Regularization
^^^^^^^^

- `鈴木 大慈, 過学習と正則化, 応用数理, 2018, 28 巻, 2 号, p. 28-33, 公開日 2018/09/30 <https://doi.org/10.11540/bjsiam.28.2_28>`_
- `Differences between L1 and L2 as Loss Function and Regularization - log0 <http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/>`_

Class Imbalance
^^^^^^^^

- `8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset - Jason Brownlee <https://machinelearningmastery.com/tactics-to-combat-imbalanced-classes-in-your-machine-learning-dataset/>`_


Multicollinearity
^^^^^^^^

- `regression - Multicollinearity and predictive performance - Cross Validated <https://stats.stackexchange.com/questions/361247/multicollinearity-and-predictive-performance>`_


機械学習モデル作成の流れ
--------------

課題設定
^^^^^^^^

- 入力による課題の分類
    - ラベルがあるなら教師あり学習
    - ラベルがなく構造を見つけたいなら教師なし学習
    - 環境との相互作用で目的関数を最適化したいなら強化学習
- 出力による課題の分類
    - 数値なら回帰
    - クラスなら分類
    - グループならクラスタリング
    - 異常の検出なら異常検出
- ストレージの空き容量
- 予測/学習にどのぐらいの時間を掛けられるか
- 人間のベースラインを確認する

データの確認
^^^^^^^^

- 要約統計量(パーセンタイル，平均値と中央値，相関関係)
- 箱ヒゲ図で外れ値を確認，ヒストグラム，散布図
- 欠損値, 外れ値の除去

初期設定
^^^^^^^^

- シード値の固定
- 評価の有効数字
- 初期化が正しくできているか確認する
- Train Lossが減少するか確認する
- 入力に依存しないベースライン（すべての入力を0にしたパフォーマンス）を確認する
- 少数サンプル（2つ？）に過学習するか確認する

単純な方法から試す (i.e. reducing bias)
^^^^^^^^

- Data Augmentationをしない
- 正則化をしない
- モデルのサイズを大きくする
- 入力データの次元を大きくする

過学習する方法を試してから正則化を加える (i.e. reducing variance)
^^^^^^^^

- Train Lossが過学習してから正則化を加える
    - Train Lossが十分に小さくならなければバグがあるかもしれない -> トラブルシューティング&デバッグ
- 1つずつ正則化を加えてフォーマンスを確認する
- Data Augmentation
- バッチサイズを小さくする（正則化が強くなる）
- ドロップアウト
- Weight Decay
- Early Stopping
- ハイパラチューニング
- 入力データの次元を小さくする
- モデルのサイズを小さくする
- モデルアンサンブル

References
^^^^^^^^

- `A Recipe for Training Neural Networks - Andrej Karpathy <http://karpathy.github.io/2019/04/25/recipe/>`_
- `Troubleshooting Deep Neural Networks - Josh Tobin <http://josh-tobin.com/troubleshooting-deep-neural-networks.html>`_
- `Choosing the Right Machine Learning Algorithm - Rajat Harlalka <https://hackernoon.com/choosing-the-right-machine-learning-algorithm-68126944ce1f>`_