In supervised learning, the desired output, often called the target value, is known to the neural network. The network optimizes its performance by reducing the error between its actual output and the target.
Unsupervised learning, on the other hand, has no information about the target value. The network optimizes its performance on its own by identifying hidden patterns and trends in the inputs and grouping them into clusters.
Machine Learning And Artificial Neural Network Models
Let’s take a quick look at the structure of the Artificial Neural Network.
An ANN has 3 types of layers, i.e. the Input layer, Hidden layer, and Output layer. Each ANN has a single input layer and a single output layer, but it may have none, one, or many hidden layers. ANN structures are classified into many types of architecture, such as Single-layer, Multi-layer, Feed-forward, and Recurrent networks.
Each input to a neuron in an Artificial Neural Network has an associated weight, and the bias is treated as one more weight on a constant input. An activation function is applied to the net input (the weighted sum of the inputs plus the bias) to calculate the output. The output is then compared to the target and the weights are adjusted.
The activation functions are of many types such as Binary step function, Bipolar step, Sigmoidal function, etc.
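As a minimal sketch of how the net input and some of these activation functions fit together, the Python snippet below computes the output of a single neuron. The input values, weights, bias, and the threshold of 0 are hypothetical and chosen only for illustration.

```python
import math

def net_input(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias."""
    return bias + sum(x * w for x, w in zip(inputs, weights))

def binary_step(net, threshold=0.0):
    """Output 1 if the net input reaches the threshold, otherwise 0."""
    return 1 if net >= threshold else 0

def bipolar_step(net, threshold=0.0):
    """Output +1 if the net input reaches the threshold, otherwise -1."""
    return 1 if net >= threshold else -1

def binary_sigmoid(net):
    """Smooth activation with outputs between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-net))

# Hypothetical neuron, for illustration only.
inputs = [1.0, 0.0, 1.0]
weights = [0.5, -0.4, 0.2]
bias = -0.3

net = net_input(inputs, weights, bias)
print("net input:", net)
print("binary step:", binary_step(net))
print("bipolar step:", bipolar_step(net))
print("binary sigmoid:", binary_sigmoid(net))
```

The step functions give hard decisions, while the sigmoid gives a graded output that can be differentiated, which matters for the backpropagation training discussed later.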
The above terms are described in the diagram below:
In this tutorial, we will focus on the Artificial Neural Network Models – Multilayer Perceptron, Radial Basis Function and Kohonen Self Organising Maps in detail.
What Is A Multilayer Perceptron?
A Perceptron network with one or more hidden layers is called a Multilayer Perceptron (MLP) network. An MLP network is also a feed-forward network. It consists of a single input layer, one or more hidden layers and a single output layer.
Due to the added layers, MLP networks overcome the limited information-processing capacity of simple Perceptron networks and are highly flexible in their approximation ability. MLP networks are trained, and their weights updated, using the backpropagation learning method, which is explained below in detail.
Some problems that a Single Layer Perceptron cannot solve, such as the XOR problem, can be solved with MLP networks.
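To make the XOR point concrete, here is a small Python sketch of a network with one hidden layer of two step units. The weights and thresholds are hand-picked assumptions for illustration (one hidden unit acts as OR, the other as AND); they are not learned and are not taken from this tutorial.

```python
def step(net):
    """Binary step activation with threshold 0."""
    return 1 if net >= 0 else 0

def xor_mlp(x1, x2):
    """Two-layer network with hand-picked (hypothetical) weights."""
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    # Output fires when OR is on and AND is off, which is XOR.
    return step(h_or - h_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))
```

No single step unit can produce this mapping, because the two classes of XOR are not linearly separable. The hidden layer supplies the intermediate features that make them separable.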
A Backpropagation (BP) Network is an application of a feed-forward multilayer perceptron network with each layer having differentiable activation functions.
For a given training set, the weights of the layers in a Backpropagation network are adjusted so that the network classifies the input patterns correctly. The weight update in a BPN uses the same gradient descent method that is applied to simple perceptron networks.
Minimization Of Error Using BP Algorithm
In this algorithm, the error between the actual output and the target is propagated back to the hidden units. The error is first calculated at the output layer, and the weights are then updated to minimize it.
To reduce the error further, an error term is also calculated for each hidden unit by passing the output-layer error back through the connecting weights. Adjusting the hidden-layer weights with this error term leads to more accurate output.
With a greater number of hidden layers, the network becomes more complex and slower to train, but it can represent more complex mappings. The system can be trained with one hidden layer as well. Once trained, it produces output rapidly.
This learning algorithm is called backpropagation learning and the network is called a Backpropagation network.
Backpropagation Learning is done in 3 stages:
- The input training pattern is fed forward.
- The error between the actual output and the target values is calculated and propagated back.
- The weights are updated.
Architecture Of BP Networks
Let’s see the architecture of Backpropagation networks.
A backpropagation network is a feed-forward multilayer network. It has an input layer, a hidden layer, and an output layer. Biases are added at the hidden layer and the output layer, and each bias acts as a weight on a unit whose activation is always 1. The inputs and outputs of the BPN can be either binary (0, 1) or bipolar (-1, +1).
The activation function is differentiable and monotonically increasing, and is generally chosen to be either the binary sigmoidal or the bipolar sigmoidal function.
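The sketch below shows the two sigmoidal functions and their derivatives in Python. It assumes a steepness parameter of 1, which this tutorial does not state explicitly. Both derivatives can be written in terms of the function value itself, which is why these functions are convenient for backpropagation.

```python
import math

def binary_sigmoid(x):
    """Binary sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binary_sigmoid_deriv(x):
    """Derivative f'(x) = f(x) * (1 - f(x))."""
    fx = binary_sigmoid(x)
    return fx * (1.0 - fx)

def bipolar_sigmoid(x):
    """Bipolar sigmoid, output in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-x)) - 1.0

def bipolar_sigmoid_deriv(x):
    """Derivative f'(x) = 0.5 * (1 + f(x)) * (1 - f(x))."""
    fx = bipolar_sigmoid(x)
    return 0.5 * (1.0 + fx) * (1.0 - fx)

for x in (-2.0, 0.0, 2.0):
    print(x, binary_sigmoid(x), binary_sigmoid_deriv(x),
          bipolar_sigmoid(x), bipolar_sigmoid_deriv(x))
```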
A backpropagation network has a feed-forward phase where the data is fed from the input towards the output and a back-propagation phase where the signals are sent back in a reverse direction to minimize the error.
Training Process Of Back Propagation Algorithm
From the above image,
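Step 1: Initialize the weights to small random values and set the learning rate $\alpha$.
Step 2: Each input unit receives $x_i$ as input and sends it to the hidden units.
Step 3: The net input of each hidden unit $z_j$ is calculated as $z_{inj} = v_{0j} + \sum_{i} x_i v_{ij}$, where $v_{ij}$ is the weight from input unit i to hidden unit j and $v_{0j}$ is the bias.
Step 4: The output of the hidden unit is calculated as $z_j = f(z_{inj})$, where the activation function f is taken as binary or bipolar sigmoidal.
Step 5: The net input of each output unit is calculated as $y_{ink} = w_{0k} + \sum_{j} z_j w_{jk}$.
Step 6: The output of the output unit is calculated as $y_k = f(y_{ink})$, where the activation function is again taken as binary or bipolar sigmoidal.
Step 7: Calculation of the error term: each output unit $y_k$ (k = 1 to m) receives the target $t_k$ corresponding to the input training pattern, and the error term $\delta_k = (t_k - y_k) f'(y_{ink})$ is computed using the derivative of the activation function.
Step 8: Error correction and weight update: the error terms $\delta_k$ are sent backward, and each hidden unit computes its own error term $\delta_j = f'(z_{inj}) \sum_{k} \delta_k w_{jk}$.
Step 9: The weights and biases are updated. Each output unit $y_k$ (k = 1 to m) updates its bias and weights as $w_{jk}(\text{new}) = w_{jk}(\text{old}) + \alpha \delta_k z_j$ and $w_{0k}(\text{new}) = w_{0k}(\text{old}) + \alpha \delta_k$. Each hidden unit is updated in the same way, using $\delta_j$ and the inputs $x_i$.
Step 10: Check the stopping condition, for example whether a given number of epochs has been completed.
Steps 2 to 9 are repeated until the stopping condition is met.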
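The steps above can be sketched in Python for a network with one hidden layer and a single output unit. This is a minimal illustration under stated assumptions, not a definitive implementation: it assumes the binary sigmoid, the standard gradient descent update, a fixed number of epochs as the stopping condition, and a single hypothetical training pattern. The names `train_step`, `v`, `v0`, `w` and `w0` are chosen for this sketch.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(x, t, v, v0, w, w0, alpha):
    """One feed-forward, back-propagation and update cycle for one pattern.

    v[i][j] is the weight from input i to hidden unit j, v0[j] the hidden bias,
    w[j] the weight from hidden unit j to the output, w0 the output bias.
    Returns the squared error before the update and the updated parameters.
    """
    n_in, n_hid = len(x), len(v0)
    # Steps 3-4: net input and output of each hidden unit.
    z = [sigmoid(v0[j] + sum(x[i] * v[i][j] for i in range(n_in)))
         for j in range(n_hid)]
    # Steps 5-6: net input and output of the output unit.
    y = sigmoid(w0 + sum(z[j] * w[j] for j in range(n_hid)))
    # Step 7: output error term, using f'(y_in) = y * (1 - y).
    delta_k = (t - y) * y * (1.0 - y)
    # Step 8: hidden error terms, computed with the old output weights.
    delta_j = [delta_k * w[j] * z[j] * (1.0 - z[j]) for j in range(n_hid)]
    # Step 9: update output-layer and hidden-layer weights and biases.
    w = [w[j] + alpha * delta_k * z[j] for j in range(n_hid)]
    w0 = w0 + alpha * delta_k
    v = [[v[i][j] + alpha * delta_j[j] * x[i] for j in range(n_hid)]
         for i in range(n_in)]
    v0 = [v0[j] + alpha * delta_j[j] for j in range(n_hid)]
    return (t - y) ** 2, v, v0, w, w0

# Step 1: small random initial weights (seeded so the run is repeatable).
rng = random.Random(0)
n_in, n_hid = 2, 2
v = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_in)]
v0 = [rng.uniform(-0.5, 0.5) for _ in range(n_hid)]
w = [rng.uniform(-0.5, 0.5) for _ in range(n_hid)]
w0 = rng.uniform(-0.5, 0.5)

# Step 10: stop after a fixed number of epochs (hypothetical choice of 100).
errors = []
for epoch in range(100):
    err, v, v0, w, w0 = train_step([0.0, 1.0], 1.0, v, v0, w, w0, alpha=0.25)
    errors.append(err)

print("first error:", errors[0], "last error:", errors[-1])
```

With a single pattern the squared error shrinks from epoch to epoch. A real training set would loop over all patterns inside each epoch.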
Factors Affecting The Back-Propagation Network
Some of the factors that affect the training of Backpropagation networks are:
- Initial Weights: The initial random weights are chosen to be very small, because large net inputs to a binary sigmoidal function can saturate it at the very beginning, which can leave training stuck at a local minimum. One way to initialize the weights is Nguyen-Widrow initialization. It analyzes the response of the hidden neurons to a single input and improves the learning ability of the hidden units. This leads to faster convergence of the BPN.
- Learning rate: A large learning rate helps in faster convergence but might lead to overshooting. Values from 10^-3 to 10 have been used in various BPN experiments.
- Number of Training Data: The training data should cover the entire input space, and the training patterns should be chosen randomly.
- Number of Hidden Layer Nodes: The number of hidden layer nodes is chosen for optimum performance of the network. For networks that do not converge to a solution, more hidden nodes can be chosen, while for networks that converge quickly, fewer hidden layer nodes are selected.
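As an illustration of the initial-weights factor, the sketch below follows the commonly described form of Nguyen-Widrow initialization. This tutorial does not spell out the procedure, so the details are assumptions: a scale factor beta = 0.7 * p^(1/n) for p hidden units and n inputs, starting weights drawn from [-0.5, 0.5], and biases drawn from [-beta, beta].

```python
import math
import random

def nguyen_widrow_init(n_inputs, n_hidden, rng):
    """Return (v, v0, beta) for the input-to-hidden layer.

    v[i][j] is the weight from input i to hidden unit j and v0[j] is the
    bias of hidden unit j.
    """
    # Scale factor (assumed form): beta = 0.7 * p ** (1 / n).
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)
    # Start from small random weights in [-0.5, 0.5].
    v = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
         for _ in range(n_inputs)]
    # Rescale each hidden unit's weight vector so that its length is beta.
    for j in range(n_hidden):
        norm = math.sqrt(sum(v[i][j] ** 2 for i in range(n_inputs)))
        for i in range(n_inputs):
            v[i][j] = beta * v[i][j] / norm
    # Biases are drawn from [-beta, beta].
    v0 = [rng.uniform(-beta, beta) for _ in range(n_hidden)]
    return v, v0, beta

v, v0, beta = nguyen_widrow_init(n_inputs=2, n_hidden=4, rng=random.Random(42))
print("beta:", beta)
print("weights:", v)
print("biases:", v0)
```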
Example of a Back-propagation Network
For the following network diagram, let’s calculate the new weights with the given figures:
Input vector = [0,1]
Target output = 1
Learning Rate = 0.25
Activation function= binary sigmoidal activation function
From the above diagram we can see that the weights into Z1 are [v11, v21, v01] = [0.6, -0.1, 0.3]
Weights into Z2: [v12, v22, v02] = [-0.3, 0.4, 0.5]
Weights into Y: [w1, w2, w0] = [0.4, 0.1, -0.2]
The activation function is given by $f(x) = \frac{1}{1 + e^{-x}}$
Input x= [0,1] and target t=1
Step 1: Calculate the net input for the hidden units Z1 and Z2
Zin1 = v01 + x1 * v11 + x2 * v21
= 0.3 + 0 * 0.6 + 1 * (-0.1) = 0.2
Zin2 = v02 + x1 * v12 + x2 * v22
= 0.5 + 0 * (-0.3) + 1 * 0.4 = 0.9
Step 2: Apply the Activation Function
z1 = f(Zin1) = 1 / (1 + e^(-0.2)) = 0.5498
z2 = f(Zin2) = 1 / (1 + e^(-0.9)) = 0.7109
Step 3: Calculate the Net input of Output layer
yin = w0 + z1 * w1 + z2 * w2
= -0.2 + 0.5498 * 0.4 + 0.7109 * 0.1 = 0.09101
Step 4: Calculate the Net output using activation
y = f(yin) = 1 / (1 + e^(-0.09101)) = 0.5227
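Step 5: Calculation of Error
δ = (t - y) * y * (1 - y) = (1 - 0.5227) * 0.5227 * (1 - 0.5227) = 0.1191
Step 6: Weight Updation
Δw1 = α * δ * z1 = 0.25 * 0.1191 * 0.5498 = 0.0164
Δw2 = α * δ * z2 = 0.25 * 0.1191 * 0.7109 = 0.02117
Step 7: New Weights Calculation
W1(new) = W1(old) + Δw1 and W2(new) = W2(old) + Δw2
Thus, the final weights are calculated as W1(new) = 0.4164, W2(new) = 0.12117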
*Assumption: The error between the input and hidden layer vectors is taken as 0.
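The worked example can be checked with a short Python script. It assumes the standard output-layer rule (error term δ = (t - y) * y * (1 - y) for the binary sigmoid, weight change α * δ * z), and it takes v22 as 0.4, the value used in the net-input arithmetic for Zin2. In line with the assumption above, only the hidden-to-output weights are updated.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Figures from the example.
x1, x2 = 0.0, 1.0
t = 1.0
alpha = 0.25
v11, v21, v01 = 0.6, -0.1, 0.3
v12, v22, v02 = -0.3, 0.4, 0.5   # v22 = 0.4 is assumed (see lead-in)
w1, w2, w0 = 0.4, 0.1, -0.2

# Steps 1-2: hidden layer net inputs and outputs.
z_in1 = v01 + x1 * v11 + x2 * v21
z_in2 = v02 + x1 * v12 + x2 * v22
z1 = sigmoid(z_in1)
z2 = sigmoid(z_in2)

# Steps 3-4: output layer net input and output.
y_in = w0 + z1 * w1 + z2 * w2
y = sigmoid(y_in)

# Step 5: error term at the output unit.
delta = (t - y) * y * (1.0 - y)

# Steps 6-7: weight changes and new weights.
w1_new = w1 + alpha * delta * z1
w2_new = w2 + alpha * delta * z2

print("z1 =", z1, "z2 =", z2)
print("y_in =", y_in, "y =", y)
print("delta =", delta)
print("w1(new) =", w1_new, "w2(new) =", w2_new)
```

The printed weights agree with the final values quoted above to about four decimal places.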
Radial Basis Function
The Radial Basis Function was developed by M.J.D. Powell. It is a classification and approximation algorithm. Gaussian functions are the non-linear functions used in Radial Basis Function networks. The Gaussian function is also used in the regularization of networks.
It is defined as:
$f(y) = e^{-y^2}$. f(y) is always positive for all values of y, and f(y) decreases to 0 as |y| approaches infinity.
The derivative of f(y) is $f'(y) = -2 y f(y)$.
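A few lines of Python are enough to see these properties of the Gaussian function. The function names are chosen for this sketch.

```python
import math

def gaussian(y):
    """Gaussian function f(y) = exp(-y^2)."""
    return math.exp(-y * y)

def gaussian_deriv(y):
    """Derivative f'(y) = -2 * y * f(y)."""
    return -2.0 * y * gaussian(y)

# The function peaks at y = 0, is symmetric, and decays as |y| grows.
for y in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(y, gaussian(y), gaussian_deriv(y))
```

The symmetry in y is what later makes the response depend only on the distance from the center, not on the direction.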
The name radial basis comes from the fact that this function gives the same output for all inputs that lie at a fixed radial distance from the center of the kernel. The response is radially symmetric, hence the name radial basis function network.
Architecture of Radial Basis Function
The architecture of the Radial Basis Function network is given below.
The radial basis function network consists of input, hidden and output layers.
The hidden layer nodes are the radial basis function (RBF) nodes. Each hidden node has a non-linear basis function that produces a response to the input stimulus only when the input falls within a localized region of the input space. Thus, this network is also called a Localized receptive field network.
Training Of Radial Basis Function
Step 1: Set the weights to some random initial values.
Step 2: Each input node receives the input signals.
The input unit: xi for all i = 1 to n
Step 3: Calculate the radial basis function using the Gaussian function.
Step 4: Select an adequate number of centers from the input vectors.
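Step 5: The output from the hidden unit is calculated as $v_i(x_i) = \exp\left(-\sum_{j} \frac{(x_{ji} - \hat{x}_{ji})^2}{\sigma_i^2}\right)$
where $\hat{x}_{ji}$ is the center of the radial basis function unit for the input vectors, $\sigma_i$ is the width of the ith RBF unit, and $x_{ji}$ is the jth variable of the input vector pattern.
Step 6: The output is calculated as: $y_{net} = \sum_{i=1}^{k} w_{im} v_i(x_i) + w_0$
where $v_i(x_i)$ is the output of the ith hidden node and $w_{im}$ is the weight from hidden node i to output node m.
k is the number of hidden layer nodes.
ynet is the output value of the mth node in the output layer for the nth incoming pattern.
w0 is the biasing term at the nth output node.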
Step 7: Calculate the error and check for the stopping conditions such as the number of epochs, etc.
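The forward pass in Steps 5 and 6 can be sketched in Python as follows. This is an illustration under assumptions, not the tutorial's exact formulation: the hidden output is taken as a Gaussian of the squared distance to the center divided by the squared width (some texts divide by twice the squared width), and the centers, widths, weights, and bias are hypothetical values.

```python
import math

def rbf_hidden_output(x, center, width):
    """Gaussian response of one RBF node to the input vector x."""
    sq_dist = sum((xj - cj) ** 2 for xj, cj in zip(x, center))
    return math.exp(-sq_dist / (width ** 2))

def rbf_network_output(x, centers, widths, weights, bias):
    """Weighted sum of all hidden node responses plus the bias."""
    hidden = [rbf_hidden_output(x, c, s) for c, s in zip(centers, widths)]
    return bias + sum(w * h for w, h in zip(weights, hidden))

# Hypothetical network with two RBF nodes, for illustration only.
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [1.0, 1.0]
weights = [1.0, -1.0]
bias = 0.5

for x in ([0.0, 0.0], [1.0, 1.0], [0.5, 0.5]):
    print(x, rbf_network_output(x, centers, widths, weights, bias))
```

A node responds most strongly when the input sits on its center and hardly at all when the input is far away, which is the localized behavior described earlier.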
Kohonen Self Organising Feature Maps
A feature map is a method in which multi-dimensional inputs are converted into a one- or two-dimensional array, i.e. it maps a large input space into a feature space while preserving the properties of the input features.
To obtain the feature maps, it is necessary to form a one- or two-dimensional array of neurons. These one-dimensional or two-dimensional neural arrays are called Self-organizing neural arrays. It is an unsupervised learning network.
For example, there are m output cluster units arranged in a 1D or 2D array and n input units. The weight vector of each cluster unit serves as a reference pattern for the inputs. Thus, when self-organization is done, the cluster unit whose weight vector matches the input vector most closely is chosen as the winner.
To find the closest unit, the distance between the input vector and the weight vector of each unit is calculated using the Euclidean distance formula.
So, the unit with the minimum squared Euclidean distance is chosen as the winner. Another way to find the winning neuron is by using the dot product of the input vector and the weight vector. The unit with the maximum dot product is chosen as the winner.
A Rectangular Grid of clusters is shown above. N(k1), N(k2) and N(k3) are neighborhoods of radii k1 > k2 > k3. The winning unit is denoted by “#” and the other output units are denoted by “o”. Each unit has eight nearest neighbors.
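The idea of a neighborhood on a rectangular grid can be sketched in Python. The function below is a hypothetical helper that returns every unit within a given radius of the winner, treating the neighborhood as a square block of grid positions clipped at the grid border. With a radius of 1 an interior winner has the eight nearest neighbors mentioned above.

```python
def neighbourhood(winner, radius, n_rows, n_cols):
    """Grid positions within `radius` steps of `winner`, winner included.

    `winner` is a (row, col) pair on an n_rows x n_cols rectangular grid.
    """
    r0, c0 = winner
    return [(r, c)
            for r in range(max(0, r0 - radius), min(n_rows, r0 + radius + 1))
            for c in range(max(0, c0 - radius), min(n_cols, c0 + radius + 1))]

# Hypothetical 7 x 7 grid with the winner in the middle.
for radius in (2, 1, 0):
    units = neighbourhood((3, 3), radius, 7, 7)
    print("radius", radius, "->", len(units), "units")
```

Shrinking the radius during training means fewer and fewer units are updated along with the winner.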
Architecture Of Kohonen Self Organising Feature Maps
The architecture of Kohonen Self Organising Maps is shown below:
There are 2 layers i.e. the input and the output layer. The input layer consists of n units and the output layer consists of m units.
The weight update takes place on the winning neuron unit, which is found using the Euclidean Distance or Dot Product method. The network is trained until the specified number of epochs is reached or the learning rate reduces to a very small value.
Training Of Feature Maps
Step 1: Initialize the weights wij to random values and set the learning rate $\alpha$. The initial weights can be chosen from the same range as the input values.
Step 2: Calculate the square of the Euclidean Distance for each input vector x: $D(j) = \sum_{i} (x_i - w_{ij})^2$
Step 3: The winning unit will be the one with the minimum value of D(j).
Step 4: Weight Updation and calculation of new weights for the winning unit J: $w_{iJ}(\text{new}) = w_{iJ}(\text{old}) + \alpha \, [x_i - w_{iJ}(\text{old})]$
Step 5: Update the learning rate $\alpha$.
Step 6: Reduce the radius of the topological neighborhood at specific intervals.
Step 7: Repeat steps 2-6 until the stopping condition is met.
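The training steps above can be sketched in Python for cluster units arranged in a 1D array. Several details are assumptions made for this illustration: the winner and its neighbors within the current radius are all moved toward the input, the learning rate is halved after every epoch, the radius shrinks by one every `shrink_every` epochs, and training stops after a fixed number of epochs. The function name and parameters are hypothetical.

```python
import random

def train_som(inputs, n_units, alpha=0.5, radius=1, epochs=10,
              decay=0.5, shrink_every=5, seed=0):
    """Train a 1D self-organizing map; weights[j] belongs to cluster unit j."""
    rng = random.Random(seed)
    n_features = len(inputs[0])
    # Step 1: random initial weights in the range of the inputs (here 0 to 1).
    weights = [[rng.random() for _ in range(n_features)]
               for _ in range(n_units)]
    for epoch in range(1, epochs + 1):
        for x in inputs:
            # Step 2: squared Euclidean distance to every unit.
            d = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights]
            # Step 3: the unit with the smallest distance wins.
            winner = d.index(min(d))
            # Step 4: move the winner and its 1D neighbors toward the input.
            lo = max(0, winner - radius)
            hi = min(n_units - 1, winner + radius)
            for j in range(lo, hi + 1):
                weights[j] = [wi + alpha * (xi - wi)
                              for xi, wi in zip(x, weights[j])]
        # Step 5: reduce the learning rate.
        alpha *= decay
        # Step 6: shrink the neighborhood at fixed intervals.
        if epoch % shrink_every == 0 and radius > 0:
            radius -= 1
    return weights, alpha, radius

inputs = [[0, 0, 1, 1], [1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
weights, alpha, radius = train_som(inputs, n_units=3)
print(weights)
print("final learning rate:", alpha, "final radius:", radius)
```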
Example of Kohonen Self Organising Maps
For the given input vectors, construct a Kohonen Self Organising Map.
There are four given vectors: [0 0 1 1], [1 0 0 0], [0 1 1 0], [ 0 0 0 1].
Form 2 clusters.
Initial Learning Rate: 0.5
Step 1: Initialise the weights between 0 and 1.
Step 2: Calculate the Euclidean Distance:
Since D(1)<D(2) therefore D(1) is minimum. Thus, the winning cluster unit is Y1.
Step 3: Updating the weights on the winning cluster unit.
The updated weight matrix
Wij= [0.1 0.9; 0.2 0.7; 0.8 0.5; 0.9 0.3]
Similarly, calculate the new weight matrix for the other three inputs.
For 2nd input:
Wij=[0.1 0.95;0.2 0.35; 0.8 0.25; 0.9 0.15]
For 3rd input:
Wij= [0.05 0.95; 0.6 0.35;0.9 0.25; 0.45 0.15]
For 4th input:
Wij= [0.025 0.95; 0.3 0.35; 0.45 0.25; 0.725 0.15]
1st iteration or epoch is complete.
Step 4: Updating the learning rate.
Updated Weight Diagram
More iterations can be performed until the learning rate reduces to a very small value or till the radius becomes zero.
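The first epoch of this example can be replayed with a short Python script. It assumes the usual update rule, in which the winning unit's weights move toward the input by the learning rate times the difference, and a neighborhood radius of zero. The initial weights are also an assumption: they are back-computed from the first updated weight matrix and are not stated in the text. Note that `weights[j]` here holds the weight vector of one cluster unit, so it corresponds to a column of the Wij matrices above.

```python
inputs = [[0, 0, 1, 1], [1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]

# Assumed initial weights (back-computed, see lead-in):
# weights[0] belongs to cluster unit Y1, weights[1] to Y2.
weights = [[0.2, 0.4, 0.6, 0.8],
           [0.9, 0.7, 0.5, 0.3]]
alpha = 0.5

winners = []
for x in inputs:
    # Squared Euclidean distance from the input to each cluster unit.
    d = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights]
    j = d.index(min(d))
    winners.append(j + 1)
    # Move only the winning unit's weights toward the input.
    weights[j] = [wi + alpha * (xi - wi) for xi, wi in zip(x, weights[j])]
    print("input", x, "distances", d, "winner Y%d" % (j + 1))

print("weights of Y1 after one epoch:", weights[0])
print("weights of Y2 after one epoch:", weights[1])
```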
Multi-layer perceptron networks are networks with one or more hidden layers. The backpropagation network is a type of MLP that has 2 phases, i.e. the Feed Forward Phase and the Reverse Phase.
In the Feed Forward phase, the input pattern is fed to the network and the output is calculated as the input signals pass through the input, hidden, and output layers.
In the Reverse Phase, the error is backpropagated to the hidden and input layers for weight adjustment. The error is calculated at the output layer when the actual output is compared with the target value.
Some networks also calculate the error at the hidden layer, which is propagated back to the input layer. This improves accuracy and convergence. BPNs are supervised multilayer perceptron networks.
The Radial Basis Function network uses Gaussian or Sigmoidal functions to regularise the network. Each node produces the same output for all inputs that lie at a fixed radial distance from the center of its kernel.
Kohonen Self Organising Maps are unsupervised learning algorithms that convert a multidimensional input space vector into a one dimensional or two-dimensional space vector.