Module 3: Concepts and Principles of ML in Healthcare #
1 Introduction to Deep Learning and Neural Networks #
Q1: Why are neural networks considered a turning point in machine learning? #
Neural networks mark a major departure from traditional ML models because they enable much deeper interactions between features and parameters. Unlike models like logistic regression or decision trees, neural networks—especially deep ones—organize parameters in layers, allowing complex feature transformations.
- Traditional ML: Parameters interact directly with input features.
- Neural networks: Parameters are arranged in layers; outputs of one layer become inputs to the next.
- Result: Increased expressive power and ability to model complex patterns.
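To make the contrast concrete, here is a minimal NumPy sketch; all weights are made up for illustration, not learned:

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0])          # one sample with three input features

# Traditional model (logistic regression): parameters touch the features once.
w, b = np.array([0.4, 0.1, -0.3]), 0.2  # illustrative weights, not trained
p = 1 / (1 + np.exp(-(w @ x + b)))      # single weighted sum -> probability

# Neural network: the output of one layer becomes the input of the next.
W1, b1 = np.ones((4, 3)) * 0.1, np.zeros(4)   # layer 1: 3 features -> 4 units
W2, b2 = np.ones((1, 4)) * 0.1, np.zeros(1)   # layer 2: 4 units -> 1 output
h = np.maximum(0, W1 @ x + b1)          # non-linear hidden representation (ReLU)
p_nn = 1 / (1 + np.exp(-(W2 @ h + b2))) # parameters interact through layers
```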
➡️ To understand what truly sets deep learning apart, we need to explore the unique features of deep neural networks themselves.
Q2: What makes deep learning different from traditional models? #
Deep learning models, also called deep neural networks, often involve millions or even billions of parameters across multiple layers. This structure allows:
- Hierarchical representation of data (low-level to high-level features).
- Repeated multiplication and addition of weighted inputs across layers.
- A final output shaped by a sequence of transformations rather than a direct mapping.
➡️ These layered transformations contribute to the overall power of deep models—so how exactly do these layers work to increase complexity?
Q3: How do layers in a neural network contribute to model complexity? #
Each layer in a neural network captures increasingly abstract representations of the data:
- Early layers might detect edges or basic patterns in an image.
- Middle layers combine these into shapes or motifs.
- Deeper layers can represent complex concepts like organs or specific pathologies in medical images.
This layering increases the non-linearity and representational power of the model.
➡️ To better grasp the idea behind these computational units, it helps to look at their biological inspiration.
Q4: What are some biological inspirations behind neural networks? #
The structure of neural networks was inspired by the human brain:
- Each neuron in a network is a mathematical function mimicking a brain cell.
- Neurons process inputs and produce outputs that can be fed into other neurons.
- This setup enables distributed computation, similar to how the brain processes information.
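A single artificial neuron can be written in a few lines; this is an illustrative sketch, not a production implementation:

```python
import math

def neuron(inputs, weights, bias):
    # weighted sum of inputs followed by a non-linear "firing" function,
    # loosely analogous to a biological neuron integrating incoming signals
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 / (1 + math.exp(-z))  # sigmoid activation

# The output of one neuron can feed into the next, enabling distributed computation.
out1 = neuron([0.5, -1.2], [0.8, 0.3], bias=0.1)
out2 = neuron([out1], [1.5], bias=-0.2)
```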
➡️ With such a powerful computational design, what makes neural networks particularly valuable in healthcare?
Q5: Why are neural networks especially promising for healthcare applications? #
Healthcare data is complex and high-dimensional—from imaging to text (EHRs), genomics, and time-series data. Deep learning shines in this space because:
- It doesn’t need hand-engineered features.
- It can learn representations directly from raw data.
- It’s particularly powerful in image analysis, speech recognition, and clinical text processing.
2 Deep Learning and Neural Networks #
Q1: What does the typical training loop of a neural network look like? #
The training loop consists of iterative steps to improve model performance:
- Step 1: Pass each training sample through the model to generate predictions.
- Step 2: Compute a loss value to quantify prediction error.
- Step 3: Update model parameters using optimization to reduce the loss.
This process is repeated many times; each complete pass through the training data is called an epoch.
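A minimal PyTorch sketch of this loop, using a small synthetic dataset and an arbitrary two-layer model purely for illustration:

```python
import torch
import torch.nn as nn

# Toy setup so the loop is runnable: 100 samples, 10 features, 2 classes.
X, y = torch.randn(100, 10), torch.randint(0, 2, (100,))
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(5):                       # each full pass over the data is an epoch
    for i in range(0, len(X), 20):           # iterate in mini-batches of 20
        xb, yb = X[i:i+20], y[i:i+20]
        preds = model(xb)                    # Step 1: forward pass -> predictions
        loss = loss_fn(preds, yb)            # Step 2: quantify prediction error
        optimizer.zero_grad()
        loss.backward()                      # Step 3: compute gradients...
        optimizer.step()                     # ...and update the parameters
```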
➡️ Once training is underway, how do we ensure the model isn’t just memorizing the data?
Q2: How do we validate whether the model is generalizing well? #
Model evaluation is typically done on a validation dataset, which the model hasn’t seen during training.
- After each epoch (or a few), we evaluate the model’s performance on this set.
- This helps detect overfitting, where a model performs well on training data but poorly on unseen data.
- Generalization is key in healthcare to ensure predictions work on real-world, diverse patient populations.
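Continuing the toy PyTorch sketch above, evaluation on held-out data might look like this:

```python
# Held-out data the model never sees during training.
X_val, y_val = torch.randn(30, 10), torch.randint(0, 2, (30,))

model.eval()                                  # switch off training-only behavior
with torch.no_grad():                         # no gradients needed for evaluation
    val_preds = model(X_val)
    val_loss = loss_fn(val_preds, y_val)
    val_acc = (val_preds.argmax(dim=1) == y_val).float().mean()
# A training loss that keeps falling while val_loss rises signals overfitting.
```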
➡️ Speaking of overfitting, one factor that contributes to this is the sheer number of parameters in neural networks.
Q3: Why are deep learning models prone to overfitting? #
Deep neural networks often contain millions of parameters, which gives them immense capacity to:
- Memorize training data rather than learning general patterns.
- Fit even random noise if not properly regularized.

This is why data quantity and quality, as well as regularization strategies, are critical.
➡️ But what’s the actual structure of these models, and how do they transform data layer by layer?
Q4: What happens within each layer of a neural network during computation? #
Each layer performs a mathematical transformation:
- Takes input (either original features or previous layer output),
- Applies weighted summation and non-linear activation functions,
- Passes output to the next layer.
This sequence of operations allows the model to build up increasingly abstract representations of the data.
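In compact form, a single layer's computation can be written as:
\[ \mathbf{h} = \sigma(W\mathbf{x} + \mathbf{b}) \]
where \( \mathbf{x} \) is the layer's input, \( W \) and \( \mathbf{b} \) are its learned weights and biases, and \( \sigma \) is a non-linear activation function such as ReLU or sigmoid.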
➡️ To build a solid foundation in understanding training, we need to connect this to how data and loss flow through the model.
Q5: What is backpropagation and how does it optimize the model? #
Backpropagation is a core algorithm for training neural networks:
- It calculates how the loss changes with respect to each model parameter.
- These gradients are then used to update the parameters using an optimizer like Stochastic Gradient Descent (SGD).
- This iterative process enables the model to gradually learn better predictions.
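A minimal sketch of one such step using PyTorch's automatic differentiation; the data and learning rate are arbitrary:

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)  # parameters to learn
x, y_true = torch.tensor([0.5, 3.0]), torch.tensor(4.0)

y_pred = (w * x).sum()            # forward pass: prediction
loss = (y_pred - y_true) ** 2     # squared-error loss
loss.backward()                   # backpropagation: d(loss)/dw for each parameter

with torch.no_grad():
    w -= 0.01 * w.grad            # one SGD step: move against the gradient
    w.grad.zero_()                # clear gradients before the next iteration
```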
3 Cross Entropy Loss #
Q1: What is the purpose of a loss function in machine learning? #
A loss function quantifies how far off a model’s predictions are from the actual labels. It’s a numerical signal used to update model parameters during training:
- Lower loss = better prediction performance.
- The loss guides the optimization process.
- Without it, the model has no sense of how to improve.
➡️ For classification tasks, what specific loss function is widely used and why?
Q2: What is cross-entropy loss and why is it used in classification? #
Cross-entropy loss measures the dissimilarity between the predicted probability distribution and the true distribution (the one-hot encoded label).
- Especially useful in multi-class classification.
- It penalizes wrong, confident predictions more heavily than uncertain ones.
- Helps push the model to make confident and correct predictions.
➡️ What does this loss look like mathematically and how is it interpreted?
Q3: How is cross-entropy loss computed mathematically? #
For a single sample, if \( p \) is the predicted probability assigned to the true class, the loss is:
\[ L = -\log(p) \]
In general, over \( C \) classes:
\[ L = -\sum_{i=1}^{C} y_i \log(p_i) \]
Where:
- \( y_i \) is 1 for the correct class, 0 otherwise.
- \( p_i \) is the predicted probability for class \( i \).
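Plugging in numbers shows how confidence is rewarded or punished: if the model assigns \( p = 0.9 \) to the true class, \( L = -\log(0.9) \approx 0.105 \); if it assigns only \( p = 0.1 \), \( L = -\log(0.1) \approx 2.303 \). A confident wrong prediction therefore incurs a far larger loss than an uncertain one.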
➡️ Since the model outputs raw scores, how are these converted into probabilities?
Q4: How does the softmax function turn logits into probabilities? #
The softmax function transforms the model’s output scores (logits) into a probability distribution across classes:
\[ \text{softmax}(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}} \]
- Ensures the outputs are positive and sum to 1.
- Prepares predictions for comparison with actual labels using cross-entropy.
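A common NumPy implementation of this formula; subtracting the maximum logit is a standard guard against numerical overflow and does not change the result:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # stable: shift logits before exponentiating
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw model scores for three classes
probs = softmax(logits)              # ~[0.66, 0.24, 0.10], sums to 1
```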
➡️ How does cross-entropy loss impact model training?
Q5: How does cross-entropy guide parameter updates in training? #
- During backpropagation, gradients of the cross-entropy loss with respect to parameters are computed.
- These gradients are used to update weights so predictions align better with true labels.
- As training progresses, cross-entropy loss typically decreases, signaling improved classification.
4 Gradient Descent #
Q1: Why is optimization necessary in training neural networks? #
Optimization is the process that adjusts model parameters to minimize prediction error:
- Neural networks are trained by minimizing a loss function.
- The model learns by iteratively updating parameters to reduce this loss.
- Effective optimization is crucial for learning accurate patterns from data.
➡️ What specific algorithm is most commonly used to perform this optimization?
Q2: What is gradient descent and how does it work? #
Gradient descent is an algorithm used to minimize a function by moving in the direction of the steepest descent:
- It calculates the gradient (slope) of the loss function with respect to each parameter.
- Parameters are updated by subtracting a portion (the learning rate) of the gradient.
- This process continues until the loss converges, typically to a local minimum.
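A self-contained toy example: gradient descent on the one-parameter loss \( f(w) = (w - 3)^2 \), whose minimum is at \( w = 3 \):

```python
# Minimize f(w) = (w - 3)^2 by repeatedly stepping against the gradient.
w = 0.0                 # initial parameter value
lr = 0.1                # learning rate: fraction of the gradient to step
for step in range(50):
    grad = 2 * (w - 3)  # derivative of the loss with respect to w
    w -= lr * grad      # move in the direction of steepest descent
# w is now very close to 3.0, the minimizer of the loss
```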
➡️ How do we determine how big of a step to take in each update?
Q3: What role does the learning rate play in gradient descent? #
The learning rate controls how much the model updates its parameters in response to the calculated gradient:
- Too high: can overshoot and destabilize training.
- Too low: training can be very slow or get stuck in a poor local minimum.
- It’s often treated as a hyperparameter that must be tuned carefully.
➡️ Are there variations of gradient descent that address real-world training challenges?
Q4: What are the common variants of gradient descent? #
There are three main types:
- Batch Gradient Descent: Uses the entire training set for each update—slow but stable.
- Stochastic Gradient Descent (SGD): Updates using a single data point—faster but noisier.
- Mini-batch Gradient Descent: A compromise that uses a small batch of data—efficient and commonly used in practice.
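The sketch below implements mini-batch gradient descent for a toy linear model; setting `batch_size` to 1 recovers SGD, and to the full dataset size recovers batch gradient descent (all values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)
w = np.zeros(5)
lr, batch_size = 0.01, 32                    # 1 -> SGD; len(X) -> batch GD

for epoch in range(10):
    order = rng.permutation(len(X))          # shuffle samples each epoch
    for i in range(0, len(X), batch_size):
        idx = order[i:i + batch_size]
        xb, yb = X[idx], y[idx]
        grad = 2 * xb.T @ (xb @ w - yb) / len(idx)  # gradient of mean squared error
        w -= lr * grad                       # one update per mini-batch
```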
➡️ Beyond variants, can the algorithm adapt during training for better performance?
Q5: What are some adaptive optimization methods beyond basic gradient descent? #
Advanced optimizers improve training by adjusting learning rates automatically:
- Momentum: Adds a fraction of the previous update to the current one to smooth progress.
- RMSProp: Scales updates by a moving average of recent squared gradients.
- Adam: Combines momentum and RMSProp for adaptive, robust performance—popular in practice.
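In PyTorch these optimizers are drop-in replacements for one another; the learning rates below are common defaults, not tuned values:

```python
import torch

params = [torch.randn(10, requires_grad=True)]  # stand-in for a model's parameters

opt_momentum = torch.optim.SGD(params, lr=0.01, momentum=0.9)
opt_rmsprop = torch.optim.RMSprop(params, lr=0.001)
opt_adam = torch.optim.Adam(params, lr=0.001)   # common default choice in practice
```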
9 Commonly Used and Advanced Neural Network Architectures #
Q1: Why explore multiple neural network architectures in healthcare? #
Different architectures are designed to solve specific problems:
- Standard models may not perform well on complex or domain-specific tasks.
- Specialized architectures improve performance, interpretability, and training speed.
Understanding these models is key to designing solutions for clinical settings.
➡️ What are some commonly used image-based architectures?
Q2: What are ResNet and DenseNet, and what problems do they solve? #
- ResNet: Introduces residual connections to skip layers and prevent vanishing gradients.
- DenseNet: Connects each layer to all subsequent layers within a dense block to encourage feature reuse.
Both architectures make very deep networks easier to train and are commonly used in radiology and pathology.
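A simplified residual block (batch normalization omitted for brevity) showing the skip connection that lets gradients flow through the identity path:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # The convolutions learn a residual F(x); the skip connection adds x back,
    # so very deep stacks of blocks remain trainable.
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)   # skip connection: output = F(x) + x
```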
➡️ Are there architectures tailored for medical image segmentation?
Q3: What is U-Net and why is it useful in healthcare? #
U-Net is designed for semantic segmentation, particularly in biomedical imaging:
- Encoder-decoder structure with skip connections.
- Captures both local and global context.
- Used for tasks like tumor segmentation, organ delineation, and cell counting.
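A heavily simplified U-Net-style sketch with a single downsampling level, just to show the encoder-decoder shape and the skip connection (real U-Nets have several levels and many more channels):

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(8, 16, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(16, 8, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(16, 8, 3, padding=1), nn.ReLU())
        self.out = nn.Conv2d(8, 1, 1)        # per-pixel segmentation logits

    def forward(self, x):
        e = self.enc(x)                      # encoder: high-resolution features
        m = self.mid(self.down(e))           # bottleneck: coarse, global context
        u = self.up(m)                       # decoder: upsample to input resolution
        d = self.dec(torch.cat([u, e], 1))   # skip connection restores local detail
        return self.out(d)
```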
➡️ Beyond visual data, how do we model structured or complex input spaces?
Q4: What are Autoencoders and what role do they play? #
Autoencoders are unsupervised neural networks that learn to reconstruct their input:
- Useful for dimensionality reduction, denoising, and anomaly detection.
- In healthcare: Identify rare diseases or compress high-dimensional patient data.
- Latent representations can be used as features in other models.
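A minimal autoencoder sketch; the layer sizes are arbitrary placeholders:

```python
import torch.nn as nn

class Autoencoder(nn.Module):
    # Compresses the input to a small latent code, then reconstructs it;
    # trained with a reconstruction loss such as mean squared error.
    def __init__(self, n_features=100, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                     nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(),
                                     nn.Linear(32, n_features))

    def forward(self, x):
        z = self.encoder(x)        # latent representation (usable as features)
        return self.decoder(z)     # reconstruction; large error -> possible anomaly
```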
➡️ What about generating new data or simulating clinical scenarios?
Q5: What are GANs and how are they applied in clinical ML? #
Generative Adversarial Networks (GANs) consist of a generator and discriminator:
- Used to generate synthetic but realistic data (e.g., images, waveforms).
- Helps with data augmentation, especially in rare disease settings.
- Also used in privacy-preserving machine learning and image translation (e.g., CT to MRI).
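A bare-bones sketch of the two networks, with sizes chosen arbitrarily for a flattened 28×28 image; the full adversarial training loop is omitted:

```python
import torch.nn as nn

# Generator: maps random noise to a synthetic sample (e.g., a flattened image).
generator = nn.Sequential(nn.Linear(64, 128), nn.ReLU(),
                          nn.Linear(128, 784), nn.Tanh())

# Discriminator: scores whether a sample looks real or generated.
discriminator = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2),
                              nn.Linear(128, 1))

# Training alternates: the discriminator learns to tell real from fake,
# while the generator learns to produce samples the discriminator accepts.
```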