Machine learning (an overview)
glossing over the many ways a thing can learn
2025-12-18 17:53
// updated 2025-12-19 11:00
Machine learning refers to the gathering, processing and analyzing of data by some human-made device, based on statistics and patterns.
Machine learning types
Machine learning could fall under:
Supervised learning
- training a model on previously labelled datasets, so that it can classify new, unseen data or predict an outcome
- further divided into:
- classification
- assigning a label based on inputs
- regression
- predicting a numeric quantity based on inputs
- the quantity is estimated from other records with similar characteristics
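To make the classification/regression split concrete, here is a minimal sketch using a hand-written k-nearest-neighbours helper. The function names (`nearest`, `classify`, `regress`) and the toy floor-area data are assumptions made up for illustration, not part of any library.

```python
from collections import Counter

def nearest(train, x, k):
    """Return the targets of the k training records whose input is closest to x."""
    ranked = sorted(train, key=lambda rec: abs(rec[0] - x))
    return [target for _, target in ranked[:k]]

def classify(train, x, k=3):
    """Classification: assign the most common label among the k nearest records."""
    return Counter(nearest(train, x, k)).most_common(1)[0][0]

def regress(train, x, k=3):
    """Regression: predict the mean numeric target of the k nearest records."""
    values = nearest(train, x, k)
    return sum(values) / len(values)

# Toy labelled data (hypothetical): (floor area, label) and (floor area, price).
labelled_sizes = [(30, "small"), (35, "small"), (40, "small"),
                  (90, "large"), (100, "large"), (110, "large")]
labelled_prices = [(30, 150.0), (35, 170.0), (40, 190.0),
                   (90, 400.0), (100, 450.0), (110, 500.0)]

size_label = classify(labelled_sizes, 95)       # a label comes out
price_estimate = regress(labelled_prices, 95)   # a number comes out
print(size_label, price_estimate)
```

Both functions look at the same neighbours; the only difference is whether the answer is a label (a vote) or a quantity (an average).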
Unsupervised learning
- training a model to classify or group data without the benefit of previously-labelled data
- further divided into:
- clustering
- finding groups of similar or similarly-positioned inputs
- dimensionality reduction
- reducing columns (but not rows) of data
- e.g. for real estate prices, remove duplicate features such as "area in square metres" if another column has "area in square feet"
- association rule learning
- finding tendencies of one variable as a good predictor of another
- e.g. "customers who bought X also bought Y"
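The "customers who bought X also bought Y" idea is usually measured with support and confidence. A minimal sketch, assuming shopping baskets are stored as Python sets; the basket contents and function names are hypothetical:

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, x, y):
    """Of the baskets containing x, the fraction that also contain y."""
    return support(baskets, {x, y}) / support(baskets, {x})

# Toy purchase history (made up for illustration).
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "jam"},
]

bread_to_butter = confidence(baskets, "bread", "butter")
print(bread_to_butter)
```

Here two of the three baskets with bread also contain butter, so the rule "bread, therefore butter" has a confidence of about two thirds.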
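Clustering can be sketched with a stripped-down k-means on one-dimensional points. This is only an illustration under simple assumptions: the starting centroids are passed in by hand (real implementations usually pick them randomly), and `k_means_1d` is a hypothetical helper name.

```python
def k_means_1d(points, centroids, rounds=10):
    """Group 1-D points around the given starting centroids."""
    for _ in range(rounds):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            closest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[closest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# No labels anywhere: the grouping comes from the positions alone.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means_1d(points, centroids=[0.0, 5.0])
print(centroids, clusters)
```

The points settle into a low group and a high group without the model ever being told which point belongs where.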
Reinforcement learning
- a device will interact with an environment and adjust based on feedback (either via "rewards" [positive] or "punishment" [negative])
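The reward/punishment loop can be sketched as a tiny "multi-armed bandit". The assumptions here are loud ones: the environment is a hypothetical list of fixed feedback values (one per action), the learner tries every action once before following an epsilon-greedy rule, and `run_bandit` is a made-up name.

```python
import random

def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Learn which action pays best from feedback alone (epsilon-greedy sketch)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)   # current belief about each action's value
    counts = [0] * len(rewards)        # how often each action has been taken
    for step in range(steps):
        if step < len(rewards):
            action = step                              # try every action once
        elif rng.random() < epsilon:
            action = rng.randrange(len(rewards))       # explore at random
        else:
            action = max(range(len(rewards)), key=lambda a: estimates[a])  # exploit
        feedback = rewards[action]     # the environment's response
        counts[action] += 1
        # Adjust the estimate toward the feedback (running average).
        estimates[action] += (feedback - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical environment: action 0 is punished, action 2 is rewarded the most.
estimates, counts = run_bandit(rewards=[-1.0, 0.5, 1.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action, counts)
```

Nobody tells the learner which action is correct; it only sees the feedback that follows each choice and shifts its behaviour accordingly.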
Semi-supervised learning
- a mix of supervised and unsupervised learning (in which some of the data has labels, while the rest does not)
Deep learning
- convolutional neural networks
- recognize spatial patterns
- great for image-based ("spatially-oriented") data
- recurrent neural networks
- recognize patterns in sequential data
- great for text and speech ("temporally-oriented") data
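The "spatial pattern" idea behind convolutional networks can be sketched with a single hand-written convolution. Two assumptions: the kernel below is hand-picked to respond to a vertical edge, whereas a real network learns its kernel values during training, and `convolve2d` is a hypothetical helper written with plain lists.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

# A 4x4 "image": dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]

# Responds where a dark column sits directly left of a bright column.
edge_kernel = [[-1, 1],
               [-1, 1]]

feature_map = convolve2d(image, edge_kernel)
print(feature_map)
```

The output is non-zero only in the middle column, which is exactly where the dark-to-bright edge sits in the image.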
Machine learning process
Machine learning happens in a series of steps known as a machine learning pipeline. Many versions of this pipeline exist, and the steps and their order may vary, but most versions follow some form of the "input + processing + output" shape:
Data collection
- gather data from reliable sources
Data processing
- preparing and transforming the data before it reaches the model
- check for missing or incorrect values
- check for correct data types
- feature scaling
- transform (or "normalize") the values to a more interpretable range
- e.g. transform from a range of "-337.28 to 5828.91", to "0 to 1"
- this helps both humans and computers more easily distinguish between low and high values
- an optional step if the data values already make sense to everyone using the data
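Feature scaling to a "0 to 1" range is commonly done with min-max scaling. A minimal sketch, reusing the range from the example above; the two middle values and the name `min_max_scale` are made up for illustration:

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]   # constant column: nothing to spread out
    return [(v - low) / (high - low) for v in values]

raw = [-337.28, 0.0, 2745.815, 5828.91]
scaled = min_max_scale(raw)
print(scaled)
```

The order of the values and the relative spacing between them stay the same; only the range changes.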
Model training
- split data into "training data" and "testing data"
- ~80% for training and ~20% for testing
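The 80/20 split can be sketched in a few lines. This assumes the records can be shuffled freely (so not time-ordered data), and `split_records` is a hypothetical stand-in for whatever splitting helper a library would provide:

```python
import random

def split_records(records, test_fraction=0.2, seed=42):
    """Shuffle a copy of the records, then cut it into training and testing parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split repeatable
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

records = list(range(50))   # stand-in for 50 rows of data
train, test = split_records(records)
print(len(train), len(test))
```

Shuffling before cutting matters: if the rows are sorted in some way, taking the last 20% as-is would give the model a test set that looks nothing like its training set.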
Model building
- formula derivation with data visualization
- use the training data to create a model (such as a formula)
- e.g. an equation in the form of
y = mx + b or y = m_1 x_1 + m_2 x_2 + ... + m_n x_n + b (or whatever linear or non-linear equation)
- graph the model if possible and/or desired
- model types include:
- k-nearest neighbours
- decision trees
- random forest
- boosting
- support vector machines
- neural networks
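For the simplest case, y = mx + b with a single input, the "formula derivation" can be sketched with ordinary least squares. The training values below are made up, and `fit_line` is a hypothetical helper; a library would normally do this step.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b with a single input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x   # the line passes through the point of means
    return m, b

x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [3.0, 5.0, 7.0, 9.0]
m, b = fit_line(x_train, y_train)
print(m, b)
```

The returned m and b are the "model": once they are known, any new x can be turned into a predicted y.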
Model testing
- formula validation
- use the testing data to test the model against data it has not seen during training
- plug the independent variables (x) of the testing data into the model to compare its prediction (y_pred) with the actual dependent (y_test) variable of the data
- use Python libraries and metrics such as:
- mean absolute error (MAE)
- mean square error (MSE)
- root mean square error (RMSE)
- see which x (or combination of x's) has the error metrics closest to 0
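The three error metrics are simple enough to write by hand, which makes it easier to see what each one measures. A minimal sketch with made-up `y_test` and `y_pred` values; in practice a library function would usually be used instead.

```python
import math

def mae(y_test, y_pred):
    """Mean absolute error: the average size of the misses."""
    return sum(abs(a - p) for a, p in zip(y_test, y_pred)) / len(y_test)

def mse(y_test, y_pred):
    """Mean squared error: like MAE, but large misses count for much more."""
    return sum((a - p) ** 2 for a, p in zip(y_test, y_pred)) / len(y_test)

def rmse(y_test, y_pred):
    """Root mean squared error: MSE brought back to the units of y."""
    return math.sqrt(mse(y_test, y_pred))

y_test = [3.0, 5.0, 7.0, 9.0]    # actual values from the testing data
y_pred = [2.0, 5.0, 8.0, 11.0]   # what the model predicted for the same rows

print(mae(y_test, y_pred), mse(y_test, y_pred), rmse(y_test, y_pred))
```

All three are 0 for a perfect model, so lower is better; RMSE is at least as large as MAE, and the gap grows when a few predictions miss badly.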
Backward elimination
- noisy data removal
- for multiple regression models, start with every variable and remove, one at a time, any variable that causes the model to classify or predict poorly
Model deployment
- taking that model
- to the internet
- to an intranet
- to some private users for their own use cases