Introduction to Machine Learning: Core Concepts
Machine Learning & Deep Learning
What is Machine Learning?
Machine learning (ML) enables computers to learn from data and make predictions or decisions without being explicitly programmed for each specific task. It involves training algorithms by feeding them data, allowing them to identify underlying patterns and subsequently make predictions on new, unseen data.
Types of Machine Learning
Machine learning is divided into categories based on the kind of data used to train the model.
Supervised Learning - Used when the training data comes with labels giving the correct answer for each example; the model learns to map inputs to those labels.
Unsupervised Learning - Used when the dataset has no labels; the objective is to discover patterns or groups in the data itself.
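The contrast between the two settings can be sketched in a few lines of plain Python. This is an illustrative toy (the fruit weights, the 1-nearest-neighbour rule, and the mean-based split are all invented for the example), not a library API:

```python
# Supervised: labeled data (fruit weight in grams -> label).
labeled = [(150, "apple"), (160, "apple"), (8, "cherry"), (10, "cherry")]

def predict(weight):
    """1-nearest-neighbour: copy the label of the closest training example."""
    return min(labeled, key=lambda pair: abs(pair[0] - weight))[1]

print(predict(155))  # -> apple

# Unsupervised: the same weights with no labels -- group them by similarity.
weights = [150, 160, 8, 10]
threshold = sum(weights) / len(weights)   # crude split at the mean
groups = {True: [], False: []}
for w in weights:
    groups[w > threshold].append(w)
print(groups[True], groups[False])  # -> [150, 160] [8, 10]
```

The supervised half needs the correct answers up front; the unsupervised half recovers the two clusters from the numbers alone.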
What is Deep Learning?
Deep Learning (DL) is a specialized subfield of machine learning (ML) that utilizes artificial neural networks with multiple layers to automatically learn intricate patterns and representations from data. Unlike traditional ML methods, DL excels at processing large-scale, high-dimensional data and is loosely inspired by the structure and function of the human brain.
Why Do We Need Machine Learning?
The Limitations of Traditional Programming
Advantages
- Rule-Based Clarity: For tasks with well-defined rules and easily identifiable characteristics (like specific shapes, colors, or smells), you can directly program the recognition logic. This is straightforward.
- Computational Efficiency: When objects can be reliably distinguished using a small set of pre-defined properties, traditional programming is typically very fast and efficient.
Disadvantages
Handling Complex Distinctions: Traditional approaches struggle when the defining characteristics overlap significantly or are complex. For example, distinguishing apples from cherries based only on being round, red, and sweet is difficult. While adding another dimension (like size) can help in this specific case, designing such distinguishing rules becomes impractical for highly complex or nuanced problems.
Lack of Adaptability: Programs built with traditional logic rely solely on the specific features and rules explicitly defined by the programmer. They lack the ability to generalize these features to recognize new objects or operate effectively in significantly different settings without manual reprogramming.
Neural Networks
Neural networks can autonomously learn and identify patterns directly from data without relying on pre-defined rules. These networks consist of several fundamental components:
- Neurons: The basic computational units that receive inputs. Each neuron applies an activation function to the weighted sum of its inputs (plus a bias) to determine its output.
- Connections: Pathways between neurons that transmit information. Each connection has an associated weight.
- Weights and Biases: Adjustable parameters that govern the strength of connections (weights) and shift the activation threshold of neurons (biases).
- Propagation: The process by which data flows through the network:
- Forward Propagation: Input data is processed layer-by-layer to generate an output.
- Backward Propagation (Backpropagation): The primary algorithm for calculating gradients used to update weights and biases during training.
- Learning Rule: The optimization algorithm (e.g., Gradient Descent) that iteratively adjusts weights and biases to minimize prediction error.
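The neuron described above (weighted sum of inputs, plus a bias, passed through an activation function) can be written directly. A minimal sketch, using the sigmoid as the activation function and made-up input values and weights:

```python
import math

def neuron(inputs, weights, bias):
    """One neuron: weighted sum of inputs plus bias, through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid squashes z into (0, 1)

out = neuron([1.0, 2.0], [0.5, -0.25], 0.1)
print(round(out, 3))  # z = 0.5 - 0.5 + 0.1 = 0.1 -> sigmoid(0.1) = 0.525
```

Training consists of adjusting `weights` and `bias` so that outputs like `out` move closer to the desired targets.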
Layers in Neural Network Architecture
- Input Layer: This is where the network receives its input data. Each input neuron in the layer corresponds to a feature in the input data.
- Hidden Layers: These layers perform most of the computational heavy lifting. A neural network can have one or multiple hidden layers. Each layer consists of units (neurons) that transform the inputs into something that the output layer can use.
- Output Layer: The final layer produces the output of the model. The format of the output depends on the task: for example, class probabilities for classification or a continuous value for regression.
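Forward propagation through these layers is just the neuron computation repeated per layer. A minimal sketch with invented weights, wiring 2 input features through one hidden layer of 2 neurons to a single output neuron:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One dense layer: each output neuron is a weighted sum + bias, activated."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

x = [0.5, -1.0]                                            # input layer (2 features)
hidden = layer(x, [[0.4, 0.3], [-0.6, 0.9]], [0.0, 0.1])   # hidden layer (2 neurons)
output = layer(hidden, [[1.2, -0.8]], [0.05])              # output layer (1 neuron)
print(output)  # a single value in (0, 1)
```

Each layer's output becomes the next layer's input; that chain is exactly the forward propagation described above.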

Gradient Descent Optimization
Objective of Gradient Descent
To minimize the loss function (which quantifies the difference between model predictions and true values) by systematically adjusting weights (w) and biases (b).
- Weights (w) determine the influence of each input feature on the neuron’s output.
- Biases (b) adjust the neuron’s activation threshold independently of its inputs.
Algorithm Steps
- Step 1: Initialize weights and biases randomly.
- Step 2: For a batch of data, calculate the loss using a chosen loss function (e.g., Mean Squared Error - MSE).
- Step 3: Compute the gradient of the loss function with respect to each weight and bias. The gradient is a vector pointing in the direction of steepest increase in loss.
- Step 4: Update each weight and bias by moving a small step in the opposite direction of its gradient (scaled by a learning rate, η):
w_new = w_old - η * (∂Loss/∂w)
b_new = b_old - η * (∂Loss/∂b)
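The four steps can be traced on a toy one-parameter model, `y = w * x`, with MSE loss over a tiny made-up dataset whose true relation is `y = 2x` (for a single parameter, the MSE gradient is the mean of `2 * (w*x - y) * x`):

```python
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # true relation: y = 2x
w = 0.0       # Step 1: initialize (zero here, for reproducibility)
eta = 0.05    # learning rate

for _ in range(100):
    # Steps 2-3: gradient of MSE loss w.r.t. w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    # Step 4: move a small step against the gradient
    w -= eta * grad

print(round(w, 3))  # converges to 2.0, the true slope
```

Each iteration shrinks the remaining error by a constant factor, so after 100 steps `w` has effectively reached the minimum.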
Understanding Gradients (Partial Derivatives)
- The partial derivative ∂Loss/∂w measures how much the loss changes for an infinitesimal change in weight w (holding all other parameters constant).
- Interpretation:
  - ∂Loss/∂w > 0: Increasing w increases the loss. To minimize loss, decrease w.
  - ∂Loss/∂w < 0: Increasing w decreases the loss. To minimize loss, increase w.
- The magnitude of ∂Loss/∂w indicates how sensitive the loss is to changes in w.
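The sign rule can be checked numerically with a central-difference estimate of the derivative. A small sketch on an invented one-parameter loss, L(w) = (w - 3)², which is minimized at w = 3:

```python
def loss(w):
    return (w - 3) ** 2   # toy loss, minimum at w = 3

def numeric_grad(w, h=1e-6):
    """Central-difference estimate of dLoss/dw."""
    return (loss(w + h) - loss(w - h)) / (2 * h)

print(numeric_grad(5.0))  # positive: w is past the minimum, so decrease w
print(numeric_grad(1.0))  # negative: w is before the minimum, so increase w
```

On either side of the minimum the gradient points "uphill", so stepping against it always moves toward w = 3.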
Intuitive Analogy
Imagine navigating down a mountain in thick fog; the terrain is the loss landscape. Your goal is to reach the valley floor (minimum loss), and you can only feel the steepness of the ground beneath your feet (the gradient).
- At each step, you assess the slope (calculate the gradient).
- You then take a step downhill (opposite the gradient direction).
- The size of your step is determined by how steep the slope is and your caution (learning rate).
- Repeating this process (iterations) guides you towards the valley bottom.