lecture 7
from fully connected layer
to
convolutional neural networks
+
from pattern recognition to pattern generation:
GAN 🤩
SeAts APp SEAtS ApP SEaTS APP
gentle reminder of your presentation
starting early by thinking about these questions:
- what are the problems/design questions that can be potentially solved by ML models?
- if so, what ML models to use and how? (i'll help with this part)
lecture plan
Part 1 🧠
- few words on fully connected layer from last lecture
- hands-on tweaking with an MLP NN
- convolutional neural network: a practical guide
Part 2 🧑‍🎤
- FUN GAN applications
- the GAN itself
a follow-up on our most ambitious lecture of this unit (last one)
with some extra tools (concepts) introduced today ...
(hopefully) after this lecture, the whole world of AI opens its portal for ya
GAME TIME!
the 12
Mini Recap{
overly simplified simulation of biological neurons
neuron: single number holder
layer: neurons being grouped together
connections: neurons in consecutive layers are connected
connection strength: each neuron-neuron pair has its own connection strength, aka weight
layer types: input, output (shapes matter, and they are task- and data-dependent)
layer type: hidden layer, in-between input and output
layer type: hidden layer, shape to be defined by us as NN architect
layer function: F(x) = ReLU(Wx + b)
x: input (vector, or matrix) for THIS layer function
where does it come from: the output (vector, or matrix) from PREVIOUS layer function
W: weights matrix,
multiplying x by W accounts for "accumulating incoming signals according to their connection strengths"
b: bias vector
together with ReLU, it accounts for the "set a threshold for when the neuron is sufficiently charged" part
write down the BIG function for the entire MLP: chain these layer functions together (see the small numpy sketch after this recap)
from the function perspective: scaffold (predefined and fixed), muscle (parameters, to be learned by training)
the form (the multiplication, addition, wrapping with ReLU, chaining) is the scaffold
numbers in the weights matrices and biases are the "muscle", aka parameters to be learned during training (more on this next semester)
🎉🥂🎊
mini index: 3 months
it is a legit part of an MSc-level MLP walk-through
/*end of recap*/
}
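a tiny numpy sketch of the recap above, with a made-up 4→5→3 MLP; the random numbers only stand in for trained weights:

```python
# a minimal sketch of "chaining layer functions"; shapes are made up for illustration
import numpy as np

def layer(x, W, b):
    return np.maximum(0, W @ x + b)   # ReLU(Wx + b)

x = np.random.rand(4)                        # input vector (4 input neurons)
W1, b1 = np.random.rand(5, 4), np.zeros(5)   # hidden layer: 5 neurons
W2, b2 = np.random.rand(3, 5), np.zeros(3)   # output layer: 3 neurons

y = layer(layer(x, W1, b1), W2, b2)          # the BIG function: F2(F1(x))
print(y.shape)                               # (3,)
```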
🤓
high-level stuff on layers and hierarchical structure
welcome to deep learning era: any NN that has more than one hidden layer is a legit deep neural network 🤪
why more hidden layers? the key is hierarchical structure and abstraction
(not so) scary deep MLP
another dot being connected: remember representation?
each layer's neuron vector can be seen as a feature vector, or a representation vector
in a way, each layer function can be seen as a "feature extractor" that outputs some level of representation
the "weights matrix multiplication & bias vector addition & ReLU wrapping" together work as an extractor,
that extracts a feature vector (the output of this layer) from the input
even though 99.9% of the time these feature vectors do not make semantic sense to us at all
(number friendly yet human unfriendly 🥰)
"blackbox": unlike human reasoning process, we can barely interpret the numbers (activations or weights or biases) between input and output
instead of trying to interpret these extracted feature vectors in the hidden layers, let's only look at the high-level abstraction
representations from different layers form a sense of hierarchical structure,
they are extracted from the original input at different levels
but what is this hierarchical structure???
i use the terms "high-level and low-level" a lot nowadays, but only after i started studying ML 😅
high-level: more abstract, more general, and a smaller piece of information
the representations of different layers can be seen as at different levels,
for example,
from low-level (input, pixels),
through middle-level (some local features of the input, like edges and simple shapes),
to high-level, or semantic level, or the most abstract level (output, global and general information)
the success of DL (having more hidden layers, or going "deep") is largely attributed to its capacity for extracting representations hierarchically
⚠️ note: there is always a "dimension reduction" process in SL (supervised learning)
here comes one single high-level idea on NNs that is a revelation to me:
there is always a "dimension reduction" process in a classification task
why?
Classifying is a task of extracting high-level features from low-level inputs
NN does this "elegantly" (lazily actually)
by setting up layers that have fewer neurons
it is forced to squeeze information into a smaller container so that only general information can stay
fully connected layer 🥹
confession: there is more than one way of connecting consecutive layers
what we have seen in the MLP hidden layer is the "fully connected layer"
simply because every neuron is connected to every neuron in the previous layer
this full connectivity brings a convenience for NN architects (you)
when defining a fully connected layer (function), the only quantity to be specified, i.e. left to our choice, is the number of neurons
once that choice is made,
the shapes of the weights matrix and bias vector are automatically calculated using the number of neurons in this layer and the number of neurons in the previous layer.
WHY???
the matrix addition and multiplication shape rule: once the input/output shapes are known, there is only one matrix shape that can comply with the shape rule and complete the multiplication. Examples on whiteboard, and a small sketch below.
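a small numpy illustration of that shape rule, with made-up sizes (4 neurons in the previous layer, 3 in this one):

```python
# the weights matrix shape is forced by the layer sizes; sizes here are made up
import numpy as np

n_prev, n_this = 4, 3              # neurons in previous layer / this layer
x = np.ones(n_prev)                # output of the previous layer, shape (4,)
W = np.ones((n_this, n_prev))      # the only shape that maps a 4-vector to a 3-vector
b = np.zeros(n_this)               # one bias per neuron in this layer
print(np.maximum(0, W @ x + b).shape)   # (3,)
print(W.size + b.size)                  # 15 parameters: 3*4 weights + 3 biases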
let's confirm we have learned something today by revisiting this model training process, especially "Set up the layers" part
our first time hands-on tweaking with a neural network:
add another fully connected layer comprising [your choice here] neurons
help me with basic maths: how many parameters are there for this MLP that handles this more practical input data? hint: use the shapes of the weights matrix and bias vector
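a hedged Keras sketch of the "Set up the layers" step, assuming 28x28 grayscale inputs and 10 classes (as in the walkthrough); the 64-neuron layer is the extra FC layer, and all the layer sizes are just illustrative choices:

```python
# a minimal sketch, assuming TensorFlow/Keras is installed
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 28*28 = 784 input neurons
    tf.keras.layers.Dense(128, activation='relu'),    # first hidden FC layer
    tf.keras.layers.Dense(64, activation='relu'),     # <- the extra FC layer you add
    tf.keras.layers.Dense(10, activation='softmax'),  # output layer, one neuron per class
])

model.summary()   # prints each layer's output shape and parameter count
# e.g. the first Dense layer holds 784*128 weights + 128 biases = 100,480 parameters
```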
apparently, we don't encounter 28x28 images that often; say we have more realistic images of 1024 x 1024
with the task being a dog-or-not classification problem
the size of input layer?
1,048,576 ≈ 1M
with first hidden layer having 1000 neurons, what is the size of this weights matrix?
≈ 1B
probably more than the number of dogs in the world (look this up)
recall that we only have 100 billion neurons
100 different 1024x1024 image classification models and our brain juice is drained
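the back-of-the-envelope arithmetic behind those numbers, as plain Python:

```python
# parameter count for just the FIRST weights matrix of that hypothetical MLP
input_neurons = 1024 * 1024                # 1,048,576 pixels ~= 1M input neurons
hidden_neurons = 1000                      # first fully connected hidden layer
weights = input_neurons * hidden_neurons   # one weight per connection
biases = hidden_neurons
print(weights + biases)                    # 1,048,577,000 -> roughly 1 billion parameters
```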
when the task and data get slightly more difficult... the constraint of the FC layer becomes prominent
curse of dimensionality
sorry, fully connected layer 🥹
reflection on why the FC layer suffers from CoD:
1. everything connects to everything
2. every connection has its own weight
hence the big weight matrix
🫥 bye for now
Convolutional neural networks CNN{
CNN is the stepping stone for working with vision or any 2D data
it is a variant on top of the MLP; most things remain the same (it has input and output layers, and fully connected layers 🤨)
one good thing: no need to flatten the 2D input; recall the cost of losing neighboring information when flattening
and many other good things it brings...
but how?
new ideas in CNN are essentially introducing two new layer types on top of MLP: convolutional layer and pooling layer
2. convolution and conv layer 🫣
-- convolution is a math operation that is a special case of "multiplication" applied between two matrices or functions
-- just another meaningful computation rule
-- check this video out for a very smooth introduction
it is actually quite easy: (with the filter as the weights matrix) element-wise multiplication, then sum up the results, similar to a dot product
and it just moves to the next patch like a sliding window, all the way till the end
recall multiplication has a flavour of "computing similarity"
view the weights matrix here as a filter that has a graphical pattern on it,
and by convolving it with the input
it detects whether that graphical pattern is present in the image or not
(how similar the filter pattern is to the pattern of each input patch)
lingo: the output of a conv layer is called "feature map" (2D feel huh)
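a minimal numpy sketch of that sliding-window operation; the 6x6 image and the vertical-edge filter are made up for illustration (and strictly speaking this computes cross-correlation, which is what CNN libraries actually do):

```python
# a conv layer's core operation: slide the filter, multiply element-wise, sum
import numpy as np

def conv2d(image, filt):
    H, W = image.shape
    h, w = filt.shape
    feature_map = np.zeros((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            patch = image[i:i + h, j:j + w]
            # element-wise multiply and sum, like a dot product with this patch
            feature_map[i, j] = np.sum(patch * filt)
    return feature_map

image = np.random.rand(6, 6)                # a made-up tiny "image"
vertical_edge = np.array([[1., 0., -1.],
                          [1., 0., -1.],
                          [1., 0., -1.]])   # a filter whose "pattern" is a vertical edge
print(conv2d(image, vertical_edge).shape)   # (4, 4) feature map
```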
recall the hierarchical feature extraction in NNs?
here is a visualisation of "hierarchical 2D feature extraction done properly by CNN"
https://cs.colby.edu/courses/F19/cs343/lectures/lecture11/Lecture11Slides.pdf
the filters doing pattern detection with an overall hierarchical structure are, guess what, biologically inspired!
conv layer is profound
-- this single math operation is a game changer in AI: it powered LeNet-5 by Yann LeCun in 1998, one of the first working CNNs
-- reduced connectivity (see next slides)
-- shared(copied/tied) weights (see next slides)
-- its design coincides with images' intrinsic properties, the right tool for the right problem (hierarchical structure of patterns, self-similarity)
it also helped me get a job lol, this is the exact CNN that helped a looot in my previous job
Compare Conv layer with FC layer
- to build an FC layer
1. we only need to specify how many neurons there are in the FC layer, which is equivalent to specifying the FC layer's output shape.
2. in other words, it can output any shape regardless of the input shape
3. the weights matrix shape is then calculated according to the mat. multi. shape rule (by computer)
Compare conv layer with FC layer (continued)
- to build a conv layer
0. numbers in the weights matrix (now called a filter) stand for a pattern instead of connection strengths
1. we first define the weights matrix shape, aka the filter size (conventionally small, like 3x3)
2. the output shape is then calculated (by computer, and it cannot do massive shape reduction)
3. we lose the fine control of defining the output shape freely
as a remedy, this is why in a CNN there are always FC layers at the end to take care of the shapes
see da VGG16
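a hedged Keras sketch of that difference: we pick only the filter size and the number of filters, and the output shape is then computed for us (and it barely shrinks); input size and filter count are made up here:

```python
# a minimal sketch, assuming TensorFlow/Keras and a 28x28x1 input
import tensorflow as tf

conv_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(filters=8, kernel_size=(3, 3), activation='relu'),
])
conv_model.summary()   # output shape (26, 26, 8): barely smaller than the input,
                       # with only 3*3*8 + 8 = 80 parameters
```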
3.5. why the conv layer is cool:
-- reduced connectivity: a neuron only connects to some of the neurons in the previous layer
-- not one link one weight anymore: the connection weights are shared (copied)
-- images have a self-similarity structure (patches at different locations look similar)
https://youtu.be/XTZwB_jicMI
4. (max) pooling layer: no parameters, just an operation of taking the max pixel value in a given neighborhood, i.e. "downsampling" (like blurring the image)
max pooling is just another easy math operation, it reduces each dimension by half in this example
why max pooling layer ?
- to compensate for the fact that the conv layer cannot largely reduce the dimension
why do we need dimension reduction?
- to squeeze information into something more general
and max pooling does dimension reduction brutally, with no parameters to be learned
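a minimal numpy sketch of 2x2 max pooling; the 4x4 feature map is made up for illustration:

```python
# max pooling: no parameters to learn, just "take the max" in each 2x2 neighborhood
import numpy as np

def max_pool_2x2(feature_map):
    H, W = feature_map.shape
    pooled = np.zeros((H // 2, W // 2))
    for i in range(0, H - H % 2, 2):
        for j in range(0, W - W % 2, 2):
            pooled[i // 2, j // 2] = feature_map[i:i + 2, j:j + 2].max()
    return pooled

fm = np.arange(16.0).reshape(4, 4)   # a made-up 4x4 feature map
print(max_pool_2x2(fm))              # 2x2 output: each dimension cut in half
```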
takeaway message:
this ordered layering of "conv + pooling" is very conventional, as shown in VGG
sometimes called a "conv block" or "conv module"
the process of designing a CNN architecture (scaffold) is mostly just stacking conv blocks and adding some FC layers at the end
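a hedged Keras sketch of that recipe, a much shallower cousin of VGG16; input size, filter counts, and layer sizes are all illustrative choices for the dog-or-not example:

```python
# "stack conv blocks, then add FC layers at the end" -- a minimal sketch
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 3)),
    # conv block 1: conv + pooling
    layers.Conv2D(16, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    # conv block 2: conv + pooling
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    # FC layers at the end to take care of the output shape
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),   # e.g. dog-or-not
])
model.summary()
```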
modern state-of-the-art AI architecture: PoseNet. maybe intimidating at first look, but it is just deep, with many good old friends
that is something even beyond some master's degree taught modules! mini index: 2 years
the joy of being an architect stacking blocks?
here is an interesting work showing our primitive stacking behaviour and some other ideas
GAN
brilliant idea, it gives machines "imagination"
core idea is quite simple, simple is beautiful :)
this person does not exist
generating infinite landscape images (literally what i am studying this week)
$1,000,000 question: detecting deepfake images, thanks to GAN. the irony: we now have to verify whether a photo comes from reality, where it used to be that we used photos to verify reality
because it is beautifully simple, i'm going to explain it in simple words:
GAN has two networks (🫢), one called generator (G) and the other one called discriminator (D)
G generates images (G-image)
D classifies whether any input image is real or fake (a binary classification task)
D tries not to be fooled by G
G tries to fool the D
once trained, we can throw away D and just use G to generate images
that's it, an ingenious AI model designed with an AI mindset (let the computer do the job...)
in a little bit more detail...
GAN structure
images from real life (the dataset) and images from the generator (fake images) are both fed to D
D tries not to be fooled by G:
D is trained to output "no it is fake" whenever a G-image is the input
G tries to fool D:
G is trained to generate images for which D would output "yes it is reallll!!!"
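a hedged sketch of one training step in code, assuming TensorFlow/Keras; `generator` and `discriminator` are placeholder Keras models you would define yourself (with the discriminator outputting a single logit), and `noise_dim` is an illustrative choice:

```python
# a minimal sketch of one GAN training step, not a full tuned implementation
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

def train_step(real_images, generator, discriminator, noise_dim=100):
    noise = tf.random.normal([tf.shape(real_images)[0], noise_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_out = discriminator(real_images, training=True)
        fake_out = discriminator(fake_images, training=True)
        # D tries not to be fooled: say "real" (1) for dataset images, "fake" (0) for G-images
        d_loss = bce(tf.ones_like(real_out), real_out) + bce(tf.zeros_like(fake_out), fake_out)
        # G tries to fool D: wants D to say "real" (1) for its images
        g_loss = bce(tf.ones_like(fake_out), fake_out)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss
```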
what's cool?
it steps outside SL (supervised learning), where a strict teacher (correct answers or labels) is needed
instead it shapes a teacher of its own, like shadow boxing
not so much human in the loop, saving the labelling time, just let the computer do its job!
what are the plausible architectures?
some example code here
(i'm confident you can understand this code better now, even if you have not written a single line of python yet)
final words
start thinking about the presentation, aka dreaming regardless of feasibility
some sources of inspiration:
- Kaggle for practical ML problems, with datasets and code
- Hugging Face for AI models, datasets, and code in general
- google arts & culture for AI in arts and culture
...more to come (i'll update this slide)
here's my take on "what are the problems that can be potentially solved by machine learning?"
- mundane, repetitive work that i can't be bothered doing (transcribing handwriting into text on a computer, etc.) 🙄
- "it'd be great to have that" but i don't know how to implement it just using programming (face recognition, etc.) 🥺
- just for fun 🤗
final final words
very nice to meet y'all and happy xmas, new year, and every day 🤘🥰
ML applications found by lovely students:
Move Mirror by Google Creative Lab (pose estimation)
AI Voice Generator and Cloner (text to speech)
Beyond Imitation (pose prediction: given a sequence of poses as input, predict the next pose)
chimera-painter (conditional GAN: generative model)
reference
- Human neurons
- A neuron with dendrites
- LeNet-5
- VGG16