Jon Rumsey

An online markdown blog and knowledge repository.


Project maintained by nojronatron Hosted on GitHub Pages — Theme by mattgraham

Machine Learning

Introduction to Machine Learning

About AI and ML

Artificial Intelligence (AI): The science of getting machines to accomplish tasks that typically require human-level experience and skills.

Machine Learning (ML): A subset of AI in which a system learns patterns from data rather than following explicitly programmed rules.

Deep Learning: A further subset of ML, beyond the scope of this write-up.

Classical Machine Learning

Why use it?

ML Core Concepts

ML History

ML Statistical Techniques

Includes: Regression, Classification, Clustering, etc.

Core concepts:

Additional tidbits:

Tools To Build and Use ML

Note: On Windows, the Microsoft Store can be used to install Python 3.

Regression Models

Investigate the relationship between variables.

Linear Regression: Straight-line trend between data points. Useful for numerical data analysis.
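A minimal sketch of a straight-line fit, assuming made-up sample points and using NumPy's `polyfit` (a swapped-in shortcut to keep the example small, not necessarily the method used in the source course). The names `x`, `y`, and `predict` are illustrative only.

```python
import numpy as np

# Made-up sample points that lie roughly along y = 2x + 1 (an assumption).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Fit a degree-1 polynomial (a straight line) by least squares.
# polyfit returns coefficients from the highest power down: slope, intercept.
slope, intercept = np.polyfit(x, y, deg=1)


def predict(x_new):
    """Estimate y for a new x using the fitted line."""
    return slope * x_new + intercept


print(f"y = {slope:.2f}x + {intercept:.2f}")
```

The fitted slope and intercept describe the trend; `predict` then estimates a numerical value for an x that was not in the data.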

Polynomial Regression: Fits a curve rather than a straight line to the data, e.g. y = x^2 + 2x + 3.
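To make the curved case concrete, the sketch below generates points from the example expression x^2 + 2x + 3 and fits a degree-2 polynomial to them. It assumes noise-free, synthetic data and uses NumPy's `polyfit`; real data would not recover the coefficients this exactly.

```python
import numpy as np

# Synthetic, noise-free points generated from x^2 + 2x + 3 (an assumption).
x = np.linspace(-3.0, 3.0, 13)
y = x**2 + 2 * x + 3

# Fit a degree-2 polynomial; coefficients come back highest power first.
coeffs = np.polyfit(x, y, deg=2)

# poly1d wraps the coefficients in a callable curve.
curve = np.poly1d(coeffs)

print(np.round(coeffs, 3))
print(round(float(curve(2.0)), 3))
```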

Logistic Regression: Categorical prediction. The model outputs a probability between 0 and 1, and the predicted category depends on whether the result is nearer 0 or 1.
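A small sketch of a two-category prediction with scikit-learn's `LogisticRegression`. The data (hours studied versus a pass/fail label) is invented for illustration, and the default model settings are an assumption rather than a recommendation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented data: hours studied (feature) and pass/fail outcome (label).
hours = np.array([[0.5], [1.0], [1.5], [2.0], [3.5], [4.0], [4.5], [5.0]])
passed = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(hours, passed)

new_hours = np.array([[1.0], [4.5]])

# predict_proba returns one column per class; column 1 is P(passed == 1).
probabilities = model.predict_proba(new_hours)[:, 1]

# predict picks the category whose probability is highest.
labels = model.predict(new_hours)

print(probabilities, labels)
```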

Build Regression Models

  1. Install Python 3 and Jupyter Notebooks (see Jupyter Notebooks for more).
  2. Clone ML-for-beginners.git from GitHub.
  3. Create and activate a venv, and add .venv to .gitignore.
  4. pip install pandas - Data analysis and manipulation tool.
  5. pip install matplotlib - Create data visualizations.
  6. pip install numpy - Library used in scientific computing.
  7. pip install scikit-learn - Predictive data analysis.
  8. pip install ipykernel - Kernel that Jupyter Notebooks uses to run Python code.

Note: Use venv to virtualize environments (.venv directory within the project).

Note: Use pip list or browse the 'lib' directory to verify installations succeeded.
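The same check can be done from Python itself. This sketch uses the standard library's `importlib.metadata` to look up each package from the install steps; the `installed_versions` helper and the `PACKAGES` list are illustrative names, not part of any tool.

```python
from importlib import metadata

# Distribution names as published on PyPI for the packages in the steps above.
PACKAGES = ["pandas", "matplotlib", "numpy", "scikit-learn", "ipykernel"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it with the venv activated; otherwise it reports on the system-wide Python instead of the project environment.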

Analyze and Clean Data

Note: Empty cells within data are not helpful, so handle them before extracting information from the data.

A DataFrame is the Pandas two-dimensional table structure (labeled rows and columns). Here it holds the subset of data that has been normalized and prepared for use in ML.

  1. Determine which columns of data are actually necessary for the current analysis.
  2. Filter out rows with empty cells.
  3. Drop columns that are not necessary for this analysis.
  4. Normalize the data by writing code that converts values to equivalent units so they can be compared. E.g. bushel sizes, or quarts vs gallons.

Key Takeaway: Python code will need to be written to clean the data, so become familiar with traversing data, finding non-normalized data cells, and applying a function to normalize them while creating a DataFrame.
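The cleaning steps above might look like the following in Pandas. This is a sketch under stated assumptions: the column names (`City`, `Package`, `Price`, `Notes`), the sample values, and the `price_per_bushel` helper are all hypothetical, chosen only to echo the bushel-size example.

```python
import pandas as pd

# Hypothetical raw data: one row has an empty Price cell, and prices are
# quoted for different package sizes.
raw = pd.DataFrame({
    "City": ["A", "B", "C", "D"],
    "Package": ["1 bushel", "1/2 bushel", "1 bushel", "1/2 bushel"],
    "Price": [20.0, 12.0, None, 9.0],
    "Notes": ["", "organic", "", ""],
})

# Steps 1 and 3: keep only the columns this analysis needs.
needed = raw[["Package", "Price"]]

# Step 2: filter out rows with empty cells.
complete = needed.dropna()


# Step 4: normalize so every price is expressed per full bushel.
def price_per_bushel(row):
    if row["Package"].startswith("1/2"):
        return row["Price"] * 2
    return row["Price"]


clean = complete.assign(
    PricePerBushel=complete.apply(price_per_bushel, axis=1)
)

print(clean)
```

Applying a function row by row with `apply(..., axis=1)` is easy to read but slow on large data; vectorized column operations are usually preferred once the logic is settled.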

Using MatPlotLib

MatPlotLib:

Data visualization types: Plot, Scatter, Bar, Vector Fields, Statistics, Contours, and more.

Graph visualizations can be adjusted using many built-in options.

Note: Pandas uses MatPlotLib as its default plotting backend, so calling DataFrame.plot() draws a MatPlotLib chart.
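As a rough illustration of the plot types and built-in options, the sketch below draws a scatter plot and a dashed line on the same axes. The data values, labels, and output filename are made up, and the `Agg` backend is chosen only so the script runs without a display; in a Jupyter Notebook that line can be dropped.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; writes files, opens no window
import matplotlib.pyplot as plt

# Made-up data for illustration.
months = [1, 2, 3, 4, 5, 6]
prices = [12.0, 13.5, 13.0, 15.2, 16.8, 16.1]

fig, ax = plt.subplots()

# Scatter shows the individual data points.
ax.scatter(months, prices, color="tab:orange", label="observed")

# Plot connects the points; linestyle and linewidth are built-in options.
ax.plot(months, prices, linestyle="--", linewidth=1, label="trend")

ax.set_xlabel("Month")
ax.set_ylabel("Price")
ax.set_title("Price by month")
ax.legend()

fig.savefig("price_by_month.png")
```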

Resources and Attributions

The primary source of this material is Microsoft Reactor: Introduction to Machine Learning for Beginners.

Host: Beatriz Stollnitz @beastollnitz - Principal Cloud Advocate AI/ML.
