Learning Outcomes

Machine Learning

AI generated with ChatGPT 4o
OpenAI (2024)
  1. Articulate the legal, social, ethical and professional issues faced by machine learning professionals.
  2. Understand the applicability and challenges associated with different datasets for the use of machine learning algorithms.
  3. Apply and critically appraise machine learning techniques to real-world problems, particularly where technical risk and uncertainty are involved.
  4. Systematically develop and implement the skills required to be an effective member of a development team in a virtual professional environment, adopting real-life perspectives on team roles and organisation.

Collaborative

The Fourth Industrial Revolution

Topic

Schwab's (2016) article on Industry 4.0's impact.

Outcomes

What (1)

Explained Industry 5.0's augmentation, Channel 4's regulatory failure from Red Bee Media's service outage, and mitigation (Ofcom, 2022; Ziatdinov et al., 2024).

So What (4)

My peer responses identified areas missed in my analysis, like personal costs.

Now What / Feedback (3)

Summary contrasted peer feedback on interconnectedness and cloud computing but concurred on mitigation via disaster recovery and communication (Adams, 2024; Zapka, 2024).

Skills

Disaster recovery, Industry 4.0, Industry 5.0

EDA Tutorial

AutoMPG dataset

EDA Correlation

FIGURE 1 | Correlation

Topic

Exploratory Data Analysis (EDA) on AutoMPG dataset using Google Colab.

Outcomes

What (2)

EDA found the non-numeric placeholder “?” in the horsepower column. The strong negative correlations (Figure 1) indicate lower miles per gallon (MPG) for higher weight, displacement, and cylinder count.

So What (2)

Text-based EDA complements visualisations (Oluleye, 2023).

Now What (2)

Use a range of tools in future EDA.

Feedback (4)

Positive feedback on my "Using Google Colab with GitHub".

Skills

Google Colab, GitHub, Matplotlib, missingno, NumPy, pandas, Python, seaborn, SciPy

Regression

FIGURE 2 | Linear versus polynomial regression

Topic

Correlation and regression.

Outcomes

What (2)

Explored bivariate (two variables), covariance (directional relationship), Pearson's Correlation (linear relationship), linear and polynomial regression, and multiple regression.

So What (2)

Covariance depends on the scale (standard deviation, sd) of the variables. Pearson's Correlation is not affected by shifts to the mean or changes in sd (Oluleye, 2023). Regression predicts y from x; polynomial regression fits a non-linear relationship (Figure 2).
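
The scale dependence described above can be checked with a minimal sketch. The values are hypothetical illustrative numbers, not the tutorial data, and the helper names are assumptions made for this example.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pearson(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical values for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Shift the mean by 100 and multiply the sd by 10.
x_scaled = [10.0 * v + 100.0 for v in x]

# Covariance grows with the scale of x; Pearson stays the same.
print(covariance(x, y), covariance(x_scaled, y))
print(pearson(x, y), pearson(x_scaled, y))
```

The shift has no effect on either measure, while the rescaling multiplies the covariance by ten and leaves the correlation unchanged.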

Now What (1)

Linear relationships use Pearson's Correlation and linear regression. Non-linear relationships use polynomial regression.

Skills

Google Colab, GitHub, Python, NumPy, Matplotlib, seaborn, SciPy, pandas, scikit-learn

Linear Regression

With scikit-learn

Linear and Log-Linear Regression

FIGURE 3 | Linear versus Log-Linear Regression

Topic

Correlation and linear regression with scikit-learn.

Outcomes

What (2)

Pre-processed population and gross domestic product (GDP) data per country (2001-2021), investigated Pearson Correlation, and performed linear regression.

So What (2)

Discovered missing (NaN) values, a missing 2021 global_GDP column, and global_population values stored as non-numeric objects. Pre-processing enabled analysis (Oluleye, 2023).

Now What (2)

Linear regression may need logarithmic transformation to be meaningful (Figure 3).
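
A minimal sketch of why a logarithmic transformation can help, assuming both variables are log-transformed before fitting. The numbers are hypothetical skewed values, not the real population and GDP data, and plain Python stands in for scikit-learn to keep the example self-contained.

```python
from math import log

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical values spanning several orders of magnitude (assumption).
population = [1e5, 1e6, 1e7, 1e8, 1e9]
gdp = [2.0 * p ** 1.5 for p in population]

# Fit on the raw scale, then on the log scale.
raw = fit_line(population, gdp)
log_x = [log(p) for p in population]
log_y = [log(g) for g in gdp]
logged = fit_line(log_x, log_y)

raw_r2 = r_squared(population, gdp, *raw)
log_r2 = r_squared(log_x, log_y, *logged)
print(raw_r2, log_r2)
```

On the raw scale the largest value dominates the fit; on the log scale every point contributes comparably and the relationship becomes a straight line.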

Skills

Google Colab, GitHub, Matplotlib, pandas, Python, scikit-learn

Jaccard
FIGURE 4 | Jaccard

Topic

Jaccard coefficient.

Outcomes

What (1, 2)

Assignment asked for Jaccard coefficient (similarity) but used Jaccard distance (dissimilarity).

So What (2)

Jaccard coefficient is J = f11 / (f01 + f10 + f11), or intersection divided by union. Jaccard distance is dJ = (f01 + f10) / (f01 + f10 + f11), or 1 - J (Chung et al., 2019).
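
The two measures can be sketched in a few lines. The binary vectors are hypothetical, and returning 0.0 when both vectors are all zeros is a convention chosen for this example only.

```python
def jaccard_counts(a, b):
    """Count f11, f10, and f01 for two equal-length binary vectors."""
    f11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    f10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    f01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    return f11, f10, f01

def jaccard_coefficient(a, b):
    """Similarity: shared 1s divided by positions where either has a 1."""
    f11, f10, f01 = jaccard_counts(a, b)
    union = f01 + f10 + f11
    # Assumption: treat two all-zero vectors as having similarity 0.0.
    return f11 / union if union else 0.0

def jaccard_distance(a, b):
    """Dissimilarity: 1 - J."""
    return 1.0 - jaccard_coefficient(a, b)

# Hypothetical asymmetric binary attributes (1 = attribute present).
record_a = [1, 0, 1, 0, 0, 0]
record_b = [1, 0, 1, 0, 1, 0]

similarity = jaccard_coefficient(record_a, record_b)  # 2 / 3
distance = jaccard_distance(record_a, record_b)       # 1 / 3
print(similarity, distance)
```

Positions where both records are 0 (f00) are ignored, which is why the measure suits asymmetric attributes.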

Now What (2)

Figure 4 excludes symmetric features like gender, uses asymmetry, and combines terms to get 0 or 1, where only 1 means attribute present.

Skills

Jaccard coefficient and distance.

Cluster Mean Analysis

FIGURE 6 | Airbnb NYC cluster mean analysis

Topic

Team assignment on Airbnb New York City (NYC) 2019 dataset (Kaggle, 2021).

Outcomes

What (2, 4)

Collaboratively delivered EDA, pre-processing, statistical analysis, data visualisation, and k-means clustering (Oluleye, 2023).

So What (2, 4)

Figure 6 shows how cluster-specific strategies optimise revenue. Followed EDA within Cross Industry Standard Process for Data Mining (CRISP-DM) (Niakšu, 2015; Mukhiya & Ahmed, 2020; Oluleye, 2023).

Now What (2, 4)

The first practical machine learning (ML) coding task challenged the team.

Feedback

Distinction. Tutor called it an "impressive submission": clearly articulated, with thorough EDA and practical, actionable recommendations. The team felt the group would work well professionally.

Skills

EDA, k-means clustering, Matplotlib, NumPy, pandas, Python, scikit-learn, seaborn.

k-means Tutorial

FIGURE 7 | k-means Tutorial

Topic

k-means clustering on: Iris, Wine, and WeatherAUS (Figure 7).

Outcomes

What (2, 3)

Explore, preprocess, cluster, compare, and visualise data (Oluleye, 2023).

So What (1, 2, 3)

Iris clusters achieved 64.1% accuracy, with Figure 7 showing cluster 1 matched but 0 and 2 overlapped. Wine matched 89.7% with perfect matches for clusters 2 and 3. WeatherAUS reduced features using Principal Component Analysis (PCA), showing clearest relationships for k=3 within temperature, wind, and humidity.
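
k-means cluster IDs are arbitrary, so comparing them with known classes needs a mapping first. A minimal sketch, assuming each cluster is mapped to its most common true label (the tutorial may have matched clusters differently); the labels below are hypothetical, not the Iris results.

```python
from collections import Counter

def cluster_accuracy(true_labels, cluster_ids):
    """Map each cluster to its majority true label, then score matches."""
    by_cluster = {}
    for label, cluster in zip(true_labels, cluster_ids):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    # Assumption: majority vote decides which class a cluster represents.
    mapping = {c: counts.most_common(1)[0][0] for c, counts in by_cluster.items()}
    correct = sum(
        1 for label, cluster in zip(true_labels, cluster_ids)
        if mapping[cluster] == label
    )
    return correct / len(true_labels), mapping

# Hypothetical labels: one cluster matches cleanly, two overlap.
true_labels = ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3
cluster_ids = [1, 1, 1, 0, 0, 2, 2, 2, 0]

accuracy, mapping = cluster_accuracy(true_labels, cluster_ids)
print(accuracy, mapping)
```

Here cluster 1 matches one class fully while clusters 0 and 2 each absorb one point from the other class, giving 7 of 9 correct.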

Now What (3)

Understand class labels, PCA, and visualisation to assess cluster effectiveness.

Skills

EDA, k-means clustering, PCA, Python, scikit-learn, seaborn.

Perceptron Tutorial

FIGURE 8 | Multi-layer perceptron error by epochs

Topic

Perceptron and weights in artificial neural networks (ANNs) (Kubat, 2021).

Outcomes

What (2)

Simple perceptron returned 0 or 1. Perceptron AND operator trained a single-layer perceptron (no hidden layer) to output binary classification (0 or 1). Multi-layer perceptron solved XOR using hidden layer and sigmoid activation.

So What (2)

Perceptrons depend on inputs, weights, and error correction through training. The learning rate affects how quickly the weights converge. Figure 8 shows reduced prediction error over training epochs, converging on XOR values.
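
The error-correction loop for the AND operator can be sketched as follows. This is an illustrative single-layer perceptron with a step activation, not the tutorial's code; the zero initial weights, learning rate, and epoch count are assumptions.

```python
def step(z):
    """Step activation: output 1 when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0

def predict(weights, bias, inputs):
    return step(weights[0] * inputs[0] + weights[1] * inputs[1] + bias)

def train_perceptron(samples, learning_rate=1.0, epochs=20):
    """Adjust weights by learning_rate * error * input after each sample."""
    weights = [0.0, 0.0]  # assumption: start from zero weights
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights[0] += learning_rate * error * inputs[0]
            weights[1] += learning_rate * error * inputs[1]
            bias += learning_rate * error
    return weights, bias

# Truth table for AND: only (1, 1) maps to 1.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_SAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in AND_SAMPLES]
print(weights, bias, predictions)
```

AND is linearly separable, so a single layer is enough; XOR is not, which is why the multi-layer perceptron needs a hidden layer.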

Now What (2)

Foundation for how perceptrons in ANNs work.

Skills

NumPy, perceptron, weights, Matplotlib, Python

Gradient Descent Tutorial

FIGURE 9 | Gradient descent cost function

Topic

Observe iteration and learning rate effect on cost function (Mayo, 2017).

Outcomes

What (2)

Use a mean squared error cost function to find the linear equation (y = mx + b) that best represents input x and output y, reducing the error between actual and predicted values over iterations at a given learning rate (Mayo, 2017).

So What (2)

Iteration (number of steps) balances overfitting (too many) with missing minimum error (too few). Learning rate controls step size: too big overshoots; too small is inefficient (Kubat, 2021).

Now What (2)

Doubling the learning rate risks overshooting, while halving it may not converge within the same number of iterations. Increasing iterations can help converge (Figure 9).
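
A minimal sketch of gradient descent on the mean squared error for y = mx + b. The data, learning rates, and iteration counts are hypothetical values chosen to show the effect and are not taken from the tutorial.

```python
def gradient_descent(xs, ys, learning_rate, iterations):
    """Return m, b, and the last cost evaluated for y = m * x + b."""
    m, b = 0.0, 0.0
    n = len(xs)
    cost = None
    for _ in range(iterations):
        errors = [y - (m * x + b) for x, y in zip(xs, ys)]
        cost = sum(e ** 2 for e in errors) / n  # mean squared error
        # Partial derivatives of the cost with respect to m and b.
        grad_m = -(2 / n) * sum(x * e for x, e in zip(xs, errors))
        grad_b = -(2 / n) * sum(errors)
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b, cost

# Hypothetical data generated from y = 2x + 3.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [5.0, 7.0, 9.0, 11.0, 13.0]

# A workable learning rate with enough iterations recovers m and b.
m, b, cost = gradient_descent(xs, ys, learning_rate=0.05, iterations=5000)

# Doubling the learning rate overshoots on this data: the cost grows.
_, _, overshoot_cost = gradient_descent(xs, ys, learning_rate=0.1, iterations=20)

print(m, b, cost, overshoot_cost)
```

The overshooting run is kept to 20 iterations because the cost keeps growing with each step instead of settling.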

Skills

NumPy, cost function, gradient descent, Python

Topics

Pruciak's (2021) ANNs in personalisation, and Centre for Data Ethics and Innovation (2019) Artificial Intelligence (AI) in insurance.

Outcomes

What (3)

ANNs enable Netflix's recommender systems (Steck et al., 2021). AI in insurance concerns include privacy, uninsurability, and intrusive advertising.

So What (3)

While ANNs personalise content, deep learning can amplify recommender bias (Steck et al., 2021; Gonzalez et al., 2022). The Grenfell fire favoured property developers over individuals.

Now What (3)

ANN recommendations risk filter bubbles. AI insurance issues partly addressed by European Union (EU) AI Act, but regulation is not global (Flamind & Sonner, 2024).

Skills

ANNs, ethics, EU AI Act

Collaborative

Legal and Ethical Views on ANN Applications

Topic

Benefits and risks of AI writing (Hutson, 2021).

Outcomes

What (3)

Writing by large language models (LLMs) from generative pre-trained transformer 3 (GPT-3) onwards.

So What (3)

LLMs can inspire science fiction creativity, but hallucinations undermine factual accuracy, and uncompensated training harms creatives (Timsit, 2023; Wafa et al., 2024; Wiggers, 2024).

Now What / Feedback (3, 4)

Address legislative, ethical, and critical thinking gaps.

Skills

ANN, GPT, ethics, LLM, regulation

CNN Model Tutorial

FIGURE 10 | CNN prediction

Topic

Convolutional Neural Networks (CNNs).

Outcomes

What

Explore CNN impact and predict images (Figure 10) (Wall, 2019).

So What (1, 2)

Police facial recognition risks inaccuracy from bias. Trained, evaluated, and predicted with CNN.

Now What (1, 2)

EU AI Act makes some facial recognition an unacceptable risk (European Parliament, 2023).

Skills

CNN, Keras, Matplotlib, NumPy, scikit-learn, TensorFlow

Flat White as Espresso

FIGURE 11 | CNN predicting flat white as espresso

Topic

Wang et al.'s (N.D.) CNN tutorial.

Outcomes

What

Interactive CNN example and documentation.

So What (2)

Explained input, convolutional layer with rectified linear activation (ReLU), pooling, flattening, and Softmax classification.
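
The later stages can be sketched on a tiny feature map. This is a simplified illustration with hypothetical values: the convolution itself is skipped, and softmax is applied directly to the flattened values, whereas a real CNN would normally pass them through a dense layer first.

```python
from math import exp

def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]

def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ))
        pooled.append(row)
    return pooled

def flatten(feature_map):
    """Turn the 2D map into a 1D list."""
    return [v for row in feature_map for v in row]

def softmax(scores):
    """Convert scores to probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4x4 output of a convolutional layer.
feature_map = [
    [-1.0, 2.0, 0.5, -3.0],
    [0.0, 1.0, -2.0, 4.0],
    [3.0, -1.0, 0.0, 0.0],
    [-2.0, -2.0, 1.0, -1.0],
]

activated = relu(feature_map)
pooled = max_pool_2x2(activated)  # [[2.0, 4.0], [3.0, 1.0]]
flat = flatten(pooled)            # [2.0, 4.0, 3.0, 1.0]
probabilities = softmax(flat)
print(pooled, flat, probabilities)
```

Softmax always spreads some probability across every class, so a confident prediction can still carry small scores for unrelated classes.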

Now What (2)

Critically considered why a flat white was identified as espresso, with some probability assigned to red panda (Figure 11).

Skills

CNN, pooling, ReLU, Softmax

ROC AUC and R-squared

FIGURE 12 | ROC AUC and R-squared

Topic

Receiver Operating Characteristic Area Under the Curve (ROC AUC) and R-squared (Bruce et al., 2020).

Outcomes

What

Changed ROC AUC and R-squared parameters and observed impact.

So What (1)

ROC AUC micro (0.73), macro (0.77), and weighted (0.75) similar, while Iris classes vary (Figure 12). R-squared normally ranges from 0 (baseline) to 1 (perfect prediction) (Bruce et al., 2020).

Now What (1)

Know worse-than-baseline negative R-squared is possible (Figure 12).
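
A short worked example of how R-squared drops below zero, using the standard definition 1 - SS_res / SS_tot. The values are hypothetical and chosen so the arithmetic is easy to follow.

```python
def r_squared(actual, predicted):
    """1 minus residual sum of squares over total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical values for illustration.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]

perfect = list(actual)                # predicts every value exactly
baseline = [3.0] * 5                  # always predicts the mean
reversed_trend = [5.0, 4.0, 3.0, 2.0, 1.0]  # predicts the opposite trend

print(r_squared(actual, perfect))         # 1.0
print(r_squared(actual, baseline))        # 0.0
print(r_squared(actual, reversed_trend))  # -3.0
```

The reversed predictions have four times the squared error of simply predicting the mean (40 versus 10), so R-squared is 1 - 4 = -3.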

Skills

Accuracy, confusion matrix, F1-score, precision, recall, ROC AUC, R-squared

CNN presentation

FIGURE 13 | CNN presentation

Topic

CNN models for object recognition using CIFAR-10.

Outcomes

What

Created five CNN models adding dropout, data augmentation, L2, early stopping, and additional convolution and pooling.

So What (1)

Data augmentation was the most effective improvement. L2 failed.

Now What (1)

Gained understanding by sharing mistakes and resolutions.

Skills

CNN, confusion matrix, pooling, ReLU, scikit-learn, Softmax, TensorFlow

Topic

Prognostic ML models in Industry 4.0 (Diez-Olivan et al., 2019).

Outcomes

What (2)

Explored descriptive, predictive, and prescriptive prognostic models.

So What (3)

Red Bee Media incident flaws: descriptive misidentified fire, predictive missed fire sensor failure, and prescriptive disaster recovery failed.

Now What (1)

Prescriptive disaster recovery is essential to broadcast and compliance.

Skills

Prognosis: descriptive, predictive, prescriptive