lecture 7
from fully connected layer
to
convolutional neural networks
+
from pattern recognition to pattern generation:
GAN 🤩
SeAts APp SEAtS ApP SEaTS APP
gentle reminder of your presentation
starting early by thinking about these questions:
- what are the problems/design questions that can be potentially solved by ML models?
- if so, what ML models to use and how? (i'll help with this part)
lecture plan
Part 1 🧠
- few words on fully connected layer from last lecture
- hands-on tweaking with an MLP NN
- convolutional neural network: a practical guide
Part 2 🧑‍🎤
- FUN GAN applications
- the GAN itself
a follow-up on our most ambitious lecture of this unit (last one)
with some extra tools (concepts) introduced today ...
(hopefully) after this lecture, the whole world of AI opens its portal for ya
GAME TIME!
the 12
Mini Recap{
overly simplified simulation of biological neurons
neuron: single number holder
layer: neurons being grouped together
connections: neurons in consecutive layers are connected
connection strength: each neuron-neuron pair has its own connection strength, aka weight
layer types: input, output (shapes matter, and they are task- and data-dependent)
layer type: hidden layer, in-between input and output
layer type: hidden layer, shape to be defined by us as NN architect
layer function: F(x) = ReLU(Wx + b)
x: input (vector, or matrix) for THIS layer function
where does it come from: the output (vector, or matrix) from PREVIOUS layer function
W: weights matrix,
multiplying x by W accounts for "accumulating incoming signals according to their connection strengths"
b: bias vector
together with ReLU, it accounts for the "set a threshold for when the neuron is sufficiently charged" part
write down the BIG function for the entire MLP: chain these layer functions together (see the small numpy sketch after this recap)
from the function perspective: scaffold (predefined and fixed), muscle (parameters, to be learned by training)
the form (the multiplication, addition, wrapping with ReLU, chaining) is the scaffold
numbers in the weights matrices and biases are the "muscle", aka parameters to be learned during training (more on this next semester)
🎉🥂🎊
mini index: 3 months
it is a legit part of an MSc-level MLP walk-through
/*end of recap*/
}
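a tiny numpy sketch of the recap above, with a made-up 4→5→3 MLP; the random numbers only stand in for trained weights:

```python
# a minimal sketch of "chaining layer functions"; shapes are made up for illustration
import numpy as np

def layer(x, W, b):
    return np.maximum(0, W @ x + b)   # ReLU(Wx + b)

x = np.random.rand(4)                        # input vector (4 input neurons)
W1, b1 = np.random.rand(5, 4), np.zeros(5)   # hidden layer: 5 neurons
W2, b2 = np.random.rand(3, 5), np.zeros(3)   # output layer: 3 neurons

y = layer(layer(x, W1, b1), W2, b2)          # the BIG function: F2(F1(x))
print(y.shape)                               # (3,)
```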
🤓
high-level stuff on layers and hierarchical structure
welcome to deep learning era: any NN that has more than one hidden layer is a legit deep neural network 🤪
why more hidden layers? the key is hierarchical structure and abstraction
(not so) scary deep MLP
another dot being connected: remember representation?
each layer's neuron vector can be seen as a feature vector, or a representation vector
in a way, each layer function can be seen as a "feature extractor" that outputs some level of representation
the "weights matrix multiplication & bias vector addition & ReLU wrapping" together work as an extractor,
that extracts a feature vector (the output of this layer) from the input
even though 99.9% of the time these feature vectors do not make semantic sense to us at all
(number friendly yet human unfriendly 🥰)
"blackbox": unlike human reasoning process, we can barely interpret the numbers (activations or weights or biases) between input and output
instead of trying to interpret these extracted feature vectors in the hidden layers, let's only look at the high-level abstraction
representations from different layers form a sense of hierarchical structure,
they are extracted from the original input at different levels
but what is this hierarchical structure???
i use the terms "high-level and low-level" a lot nowadays, but only after i started studying ML 😅
high-level: more abstract, more general, and a smaller piece of information
the representations of different layers can be seen as at different levels,
for example,
from low-level (input, pixels),
through middle-level (some local features of the input, like edges and simple shapes),
to high-level, or semantic level, or the most abstract level (output, global and general information)
the success of DL (having more hidden layers, or going "deep") is largely attributed to its capacity for extracting representations hierarchically
⚠️ note: there is always a "dimension reduction" process in SL (supervised learning)
here comes one single high-level idea on NNs that is a revelation to me:
there is always a "dimension reduction" process in a classification task
why?
Classifying is a task of extracting high-level features from low-level inputs
NN does this "elegantly" (lazily actually)
by setting up layers that have fewer neurons
it is forced to squeeze information into a smaller container so that only general information can stay
fully connected layer 🥹
confession: there is more than one way of connecting consecutive layers
what we have seen in the MLP hidden layer is the "fully connected layer"
simply because every neuron is connected to every neuron in the previous layer
this full connectivity brings a convenience for NN architects (you)
when defining a fully connected layer (function), the only quantity to be specified, i.e. left to our choice, is the number of neurons
once that choice is made,
the shapes of the weights matrix and bias vector are automatically calculated using the number of neurons in this layer and the number of neurons in the previous layer.
WHY???
the matrix addition and multiplication shape rule: once the input/output shapes are known, there is only one matrix shape that can comply with the shape rule and complete the multiplication. Examples on whiteboard, and a small sketch below.
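a small numpy illustration of that shape rule, with made-up sizes (4 neurons in the previous layer, 3 in this one):

```python
# the weights matrix shape is forced by the layer sizes; sizes here are made up
import numpy as np

n_prev, n_this = 4, 3              # neurons in previous layer / this layer
x = np.ones(n_prev)                # output of the previous layer, shape (4,)
W = np.ones((n_this, n_prev))      # the only shape that maps a 4-vector to a 3-vector
b = np.zeros(n_this)               # one bias per neuron in this layer
print(np.maximum(0, W @ x + b).shape)   # (3,)
print(W.size + b.size)                  # 15 parameters: 3*4 weights + 3 biases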
let's confirm we have learned something today by revisiting this model training process, especially "Set up the layers" part
our first time hands-on tweaking with a neural network:
add another fully connected layer comprising [your choice here] neurons
help me with basic maths: how many parameters are there for this MLP that handles this more practical input data? hint: use the shapes of the weights matrix and bias vector
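a hedged Keras sketch of the "Set up the layers" step, assuming 28x28 grayscale inputs and 10 classes (as in the walkthrough); the 64-neuron layer is the extra FC layer, and all the layer sizes are just illustrative choices:

```python
# a minimal sketch, assuming TensorFlow/Keras is installed
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # 28*28 = 784 input neurons
    tf.keras.layers.Dense(128, activation='relu'),    # first hidden FC layer
    tf.keras.layers.Dense(64, activation='relu'),     # <- the extra FC layer you add
    tf.keras.layers.Dense(10, activation='softmax'),  # output layer, one neuron per class
])

model.summary()   # prints each layer's output shape and parameter count
# e.g. the first Dense layer holds 784*128 weights + 128 biases = 100,480 parameters
```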
apparently, we don't encounter 28x28 images that often; say we have more realistic images of 1024 x 1024
with the task being a dog-or-not classification problem
the size of input layer?
1,048,576 ≈ 1M
with first hidden layer having 1000 neurons, what is the size of this weights matrix?
≈ 1B
probably more than the number of dogs in the world (look this up)
recall that we only have 100 billion neurons
100 different 1024x1024 image classification models and our brain juice is drained
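the back-of-the-envelope arithmetic behind those numbers, as plain Python:

```python
# parameter count for just the FIRST weights matrix of that hypothetical MLP
input_neurons = 1024 * 1024                # 1,048,576 pixels ~= 1M input neurons
hidden_neurons = 1000                      # first fully connected hidden layer
weights = input_neurons * hidden_neurons   # one weight per connection
biases = hidden_neurons
print(weights + biases)                    # 1,048,577,000 -> roughly 1 billion parameters
```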
when the task and data get slightly more difficult... the constraint of the FC layer becomes prominent
curse of dimensionality
sorry, fully connected layer 🥹
reflection on why the FC layer suffers from CoD:
1. everything connects to everything
2. every connection has its own weight
hence the big weight matrix
🫥 bye for now
Convolutional neural networks CNN{
CNN is the stepping stone for working with vision or any 2D data
it is a variant on top of the MLP; most things remain the same (it has input and output layers, and fully connected layers 🤨)
one good thing: no need to flatten the 2D input; recall the cost of losing neighboring information when flattening
and many other good things it brings...
but how?
new ideas in CNN are essentially introducing two new layer types on top of MLP: convolutional layer and pooling layer
2. convolution and conv layer 🫣
-- convolution is a math operation that is a special case of "multiplication" applied between two matrices or functions
-- just another meaningful computation rule
-- check this video out for a very smooth introduction
it is actually quite easy: (with the filter as the weights matrix) element-wise multiplication, then sum up the results, similar to a dot product
and it just moves to the next patch like a sliding window, all the way till the end
recall multiplication has a flavour of "computing similarity"
view the weights matrix here as a filter that has a graphical pattern on it,
and by convolving it with the input
it detects whether that graphical pattern is present in the image or not
(how similar the filter pattern is to the pattern of each input patch)
lingo: the output of a conv layer is called "feature map" (2D feel huh)
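a minimal numpy sketch of that sliding-window operation; the 6x6 image and the vertical-edge filter are made up for illustration (and strictly speaking this computes cross-correlation, which is what CNN libraries actually do):

```python
# a conv layer's core operation: slide the filter, multiply element-wise, sum
import numpy as np

def conv2d(image, filt):
    H, W = image.shape
    h, w = filt.shape
    feature_map = np.zeros((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            patch = image[i:i + h, j:j + w]
            # element-wise multiply and sum, like a dot product with this patch
            feature_map[i, j] = np.sum(patch * filt)
    return feature_map

image = np.random.rand(6, 6)                # a made-up tiny "image"
vertical_edge = np.array([[1., 0., -1.],
                          [1., 0., -1.],
                          [1., 0., -1.]])   # a filter whose "pattern" is a vertical edge
print(conv2d(image, vertical_edge).shape)   # (4, 4) feature map
```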
recall the hierarchical feature extraction in NNs?
here is a visualisation of "hierarchical 2D feature extraction done properly by CNN"
https://cs.colby.edu/courses/F19/cs343/lectures/lecture11/Lecture11Slides.pdf
the filters doing pattern detection with an overall hierarchical structure are, guess what, biologically inspired!
conv layer is profound
-- this single math operation is a game changer in AI: it powered LeNet-5 by Yann LeCun in 1998, one of the first working CNNs
-- reduced connectivity (see next slides)
-- shared(copied/tied) weights (see next slides)
-- its design coincides with images' intrinsic properties, the right tool for the right problem (hierarchical structure of patterns, self-similarity)
it also helped me get a job lol, this is the exact CNN that helped a looot in my previous job
Compare Conv layer with FC layer
- to build an FC layer
1. we only need to specify how many neurons there are in the FC layer, which is equivalent to specifying the FC layer's output shape.
2. in other words, it can output any shape regardless of the input shape
3. the weights matrix shape is then calculated according to the mat. multi. shape rule (by computer)
Compare conv layer with FC layer (continued)
- to build a conv layer
0. numbers in the weights matrix (now called a filter) stand for a pattern instead of connection strengths
1. we first define the weights matrix shape, aka the filter size (conventionally small, like 3x3)
2. the output shape is then calculated (by computer, and it cannot do massive shape reduction)
3. we lose the fine control of defining the output shape freely
as a remedy, this is why in a CNN there are always FC layers at the end to take care of the shapes
see da VGG16
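a hedged Keras sketch of that difference: we pick only the filter size and the number of filters, and the output shape is then computed for us (and it barely shrinks); input size and filter count are made up here:

```python
# a minimal sketch, assuming TensorFlow/Keras and a 28x28x1 input
import tensorflow as tf

conv_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(filters=8, kernel_size=(3, 3), activation='relu'),
])
conv_model.summary()   # output shape (26, 26, 8): barely smaller than the input,
                       # with only 3*3*8 + 8 = 80 parameters
```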
3.5. why the conv layer is cool:
-- reduced connectivity: a neuron only connects to some of the neurons in the previous layer
-- not one link one weight anymore: the connection weights are shared (copied)
-- images have a self-similarity structure (patches at different locations look similar)
https://youtu.be/XTZwB_jicMI
4. (max) pooling layer: no parameters, just an operation of taking the max pixel value in a given neighborhood, i.e. "downsampling" (like blurring the image)
max pooling is just another easy math operation, it reduces each dimension by half in this example
why max pooling layer ?
- to compensate for the fact that the conv layer cannot largely reduce the dimension
why do we need dimension reduction?
- to squeeze information into something more general
and max pooling does dimension reduction brutally, with no parameters to be learned
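a minimal numpy sketch of 2x2 max pooling; the 4x4 feature map is made up for illustration:

```python
# max pooling: no parameters to learn, just "take the max" in each 2x2 neighborhood
import numpy as np

def max_pool_2x2(feature_map):
    H, W = feature_map.shape
    pooled = np.zeros((H // 2, W // 2))
    for i in range(0, H - H % 2, 2):
        for j in range(0, W - W % 2, 2):
            pooled[i // 2, j // 2] = feature_map[i:i + 2, j:j + 2].max()
    return pooled

fm = np.arange(16.0).reshape(4, 4)   # a made-up 4x4 feature map
print(max_pool_2x2(fm))              # 2x2 output: each dimension cut in half
```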
takeaway message:
this ordered layering of "conv + pooling" is very conventional, as shown in VGG
sometimes called a "conv block" or "conv module"
the process of designing a CNN architecture (scaffold) is mostly just stacking conv blocks and adding some FC layers at the end
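a hedged Keras sketch of that recipe, a much shallower cousin of VGG16; input size, filter counts, and layer sizes are all illustrative choices for the dog-or-not example:

```python
# "stack conv blocks, then add FC layers at the end" -- a minimal sketch
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 3)),
    # conv block 1: conv + pooling
    layers.Conv2D(16, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    # conv block 2: conv + pooling
    layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
    layers.MaxPooling2D((2, 2)),
    # FC layers at the end to take care of the output shape
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(1, activation='sigmoid'),   # e.g. dog-or-not
])
model.summary()
```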
modern state-of-the-art AI architecture: PoseNet. maybe intimidating at first look, but it is just deep, with many good old friends
that is something even beyond some master's degree taught modules! mini index: 2 years
the joy of being an architect stacking blocks?
here is an interesting work showing our primitive stacking behaviour and some other ideas
GAN
brilliant idea, it gives machines "imagination"
core idea is quite simple, simple is beautiful :)
this person does not exist
generating infinite landscape images (literally what i am studying this week)
$1,000,000 question: detecting deepfake images, thanks to GAN. the irony: we now have to verify whether a photo comes from reality, where it used to be that we used photos to verify reality
because it is beautifully simple, i'm going to explain it in simple words:
GAN has two networks (🫢), one called generator (G) and the other one called discriminator (D)
G generates images (G-image)
D classifies whether any input image is real or fake (a binary classification task)
D tries not to be fooled by G
G tries to fool the D
once trained, we can throw away D and just use G to generate images
that's it, an ingenious AI model designed with an AI mindset (let the computer do the job...)
in a little bit more detail...
GAN structure
images from real life (the dataset) and images from the generator (fake images) are both fed to D
D tries not to be fooled by G:
D is trained to output "no it is fake" whenever a G-image is the input
G tries to fool D:
G is trained to generate images for which D would output "yes it is reallll!!!"
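a hedged sketch of one training step in code, assuming TensorFlow/Keras; `generator` and `discriminator` are placeholder Keras models you would define yourself (with the discriminator outputting a single logit), and `noise_dim` is an illustrative choice:

```python
# a minimal sketch of one GAN training step, not a full tuned implementation
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

def train_step(real_images, generator, discriminator, noise_dim=100):
    noise = tf.random.normal([tf.shape(real_images)[0], noise_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_out = discriminator(real_images, training=True)
        fake_out = discriminator(fake_images, training=True)
        # D tries not to be fooled: say "real" (1) for dataset images, "fake" (0) for G-images
        d_loss = bce(tf.ones_like(real_out), real_out) + bce(tf.zeros_like(fake_out), fake_out)
        # G tries to fool D: wants D to say "real" (1) for its images
        g_loss = bce(tf.ones_like(fake_out), fake_out)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss
```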
what's cool?
it steps outside SL (supervised learning), where a strict teacher (correct answers or labels) is needed
instead it shapes a teacher of its own, like shadow boxing
not so much human in the loop, saving the labelling time, just let the computer do its job!
what are the plausible architectures?
some example code here
(i'm confident you can understand this code better now, even if you have not written a single line of python yet)
final words
start thinking about the presentation, aka dreaming regardless of feasibility
some sources of inspiration:
- Kaggle for practical ML problems, with datasets and code
- Hugging Face for AI models, datasets, and code in general
- google arts & culture for AI in arts and culture
...more to come (i'll update this slide)
here's my take on "what are the problems that can be potentially solved by machine learning?"
- mundane, repetitive work that i can't be bothered doing (transcribing handwriting into text on a computer, etc.) 🙄
- "it'd be great to have that" but i don't know how to implement it just using programming (face recognition, etc.) 🥺
- just for fun 🤗
final final words
very nice to meet y'all and happy xmas, new year, and every day 🤘🥰
ML applications found by lovely students:
Move Mirror by Google Creative Lab (pose estimation)
AI Voice Generator and Cloner (text to speech)
Beyond Imitation (pose prediction: given a sequence of poses as input, predict the next pose)
chimera-painter (conditional GAN: generative model)
reference
- Human neurons
- A neuron with dendrites
- LeNet-5
- VGG16