Though the underlying concepts required to build and train a neural network are **difficult**, implementing one in **code** is surprisingly easy. So in this post, let’s

- Code a neural network by **hand**
- Use *keras* to build a neural network

Firstly, let’s see how we can build our own neural network with just raw Python code. For this, let’s assume our task is to build a model that simply computes the **XOR** of its inputs. It might seem very easy, but believe me, it is the first difficult step in training any neural network, as XOR itself is not linear i.e. it is not linearly separable.

So our model should be able to classify things **non-linearly**, which is not possible with a simple perceptron. This is where the Multi-Layer Perceptron comes in handy; it is especially suited for such problems.
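As a quick sanity check (a hypothetical sketch, not part of the original post), we can train a single sigmoid unit on XOR with gradient descent and watch the error plateau instead of vanishing:

```python
import numpy as np

np.random.seed(0)

# third column acts as a bias feature, matching the data used below
X = np.array([[0,0,1],[0,1,1],[1,0,1],[1,1,1]], dtype=float)
y = np.array([[0,1,1,0]], dtype=float).T

def sigmoid(z):
    return 1.0/(1 + np.exp(-z))

# a single linear unit with a sigmoid output -- i.e. a simple perceptron
w = np.random.uniform(size=(3,1))
for _ in range(5000):
    a = sigmoid(X.dot(w))
    grad = X.T.dot((a - y) * a * (1 - a))   # gradient of squared error
    w -= 1.0 * grad

cost = 0.5 * np.mean((sigmoid(X.dot(w)) - y)**2)
print(cost)  # stays high: one linear boundary cannot separate XOR
```

No matter how long we train, the cost hovers around 0.125 (the outputs collapse toward 0.5), whereas the two-layer network built below drives it to nearly zero.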

### Neural network by hand

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
y = np.array([[0,1,1,0]]).T
```

Before starting off, let’s import the necessary libraries. $X$ is our input with `no.of.features=3` and `no.of.samples=4`, and $y$ is the output. We need to build a model that is capable of predicting the output given the input.

```python
# input features --> 3, hidden_units --> 5, output_units --> 1
(D,M,K) = (3,5,1)

# initialize weights
w1 = np.random.uniform(size=(D,M))
w2 = np.random.uniform(size=(M,K))
learning_rate = 1
cost = []
```

We then initialize our weights $w1$ and $w2$, set `learning_rate=1`, and create an empty cost list to collect the cost at each iteration, which helps us visualize the cost function.

Let’s create an activation function to carry out the operations, here the **sigmoid**.

```python
def sigmoid(z,deriv=False):
    if deriv:
        return sigmoid(z)*(1-sigmoid(z))
    return 1.0/(1+np.exp(-z))
```

Note that we don’t use **softmax**, as our task is just binary classification, where the sigmoid is enough to characterize the output. Then we iterate **5000** times to train our model. The training proceeds as shown below.
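A quick way to trust the `deriv=True` branch is to compare it against a finite-difference approximation (a small sketch, not part of the original post):

```python
import numpy as np

def sigmoid(z, deriv=False):
    if deriv:
        return sigmoid(z)*(1-sigmoid(z))
    return 1.0/(1+np.exp(-z))

# central difference: (f(z+eps) - f(z-eps)) / (2*eps) approximates f'(z)
z = 0.7
eps = 1e-6
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2*eps)
print(abs(numeric - sigmoid(z, deriv=True)))  # close to zero
```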

#### Feedforward

$z1 = X \cdot w1$

$a1 = g(z1)$

$z2 = a1 \cdot w2$

$a2 = g(z2)$

$cost = \frac{1}{2N} \sum_{i=1}^{N}(a_2-y)^2$

#### Backpropagation

$\delta_3 = (a_2-y)$

$\delta_2 = \delta_3 \odot g'(z_2)$

$\delta_1 = (\delta_2 \cdot {w_2}^T) \odot g'(z_1)$

$\Delta w_2 = {a_1}^T \cdot \delta_2$

$\Delta w_1 = X^T \cdot \delta_1$

$w_1 = w_1 - \alpha \Delta w_1$

$w_2 = w_2 - \alpha \Delta w_2$

Though the equations are available in the previous post, I mention them here for completeness.
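Before trusting the backpropagation equations, we can verify them numerically: the analytic gradient should match a central-difference estimate of the cost. This is a hypothetical sketch, not part of the original post (note the $1/N$ factor, which the training loop below effectively folds into the learning rate):

```python
import numpy as np

np.random.seed(42)

def sigmoid(z, deriv=False):
    if deriv:
        return sigmoid(z)*(1-sigmoid(z))
    return 1.0/(1+np.exp(-z))

X = np.array([[0,0,1],[0,1,1],[1,0,1],[1,1,1]], dtype=float)
y = np.array([[0,1,1,0]], dtype=float).T
N = len(X)
w1 = np.random.uniform(size=(3,5))
w2 = np.random.uniform(size=(5,1))

def cost(w1, w2):
    a2 = sigmoid(sigmoid(X.dot(w1)).dot(w2))
    return 0.5*np.mean((a2 - y)**2)

# analytic gradients from the backprop equations
z1 = X.dot(w1); a1 = sigmoid(z1)
z2 = a1.dot(w2); a2 = sigmoid(z2)
delta_2 = (a2 - y)*sigmoid(z2, deriv=True)
delta_1 = delta_2.dot(w2.T)*sigmoid(z1, deriv=True)
grad_w2 = a1.T.dot(delta_2)/N
grad_w1 = X.T.dot(delta_1)/N

# numerical gradient of one entry of w1 via central differences
eps = 1e-5
w1p, w1m = w1.copy(), w1.copy()
w1p[0,0] += eps; w1m[0,0] -= eps
numeric = (cost(w1p, w2) - cost(w1m, w2))/(2*eps)
print(abs(numeric - grad_w1[0,0]))  # tiny: the equations match the cost
```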

```python
for i in range(5000):
    # feedforward
    z1 = X.dot(w1)
    a1 = sigmoid(z1)
    z2 = a1.dot(w2)
    a2 = sigmoid(z2)

    # append cost
    cost.append(0.5*np.mean((a2-y)**2))

    # backpropagate
    delta_3 = (a2-y)                                    # shape: (N,K)
    delta_2 = delta_3*sigmoid(z2,deriv=True)            # shape: (N,K)
    w2_delta = a1.T.dot(delta_2)                        # shape: (M,K)
    delta_1 = delta_2.dot(w2.T)*sigmoid(z1,deriv=True)  # shape: (N,M)
    w1_delta = X.T.dot(delta_1)                         # shape: (D,M)
    w1 = w1 - learning_rate*w1_delta
    w2 = w2 - learning_rate*w2_delta
```

The code follows the equations I’ve mentioned before, so there isn’t a great deal to explain about each line.

```python
plt.plot(cost)
plt.show()
print(a2)
```

Finally, let’s plot the cost function and print the predicted output for our inputs to the console.

We can see that our cost is reduced to nearly $0$, which means our model is performing accurately. Let’s peek into the console for the output.

```
$ python mlp.py
[[ 0.01839197]
 [ 0.98414062]
 [ 0.98608678]
 [ 0.01521415]]
```
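These raw sigmoid outputs can be turned into hard class labels by thresholding at 0.5 (a small sketch, not part of the original script):

```python
import numpy as np

# the raw sigmoid outputs printed to the console above
a2 = np.array([[0.01839197], [0.98414062], [0.98608678], [0.01521415]])

# threshold at 0.5 to obtain hard class labels
preds = (a2 > 0.5).astype(int)
print(preds.ravel())  # → [0 1 1 0]
```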

We can see that our model is performing well. Let’s do the same using keras in the next section.

### Code with keras

**Keras** is one of the best deep learning libraries out there; it makes building even the most complex neural networks easy and intuitive. Keras primarily relies on either **Theano** or **TensorFlow** as its computational backend. We will go into much of that in the upcoming posts. For now, let’s see how we can actually build the same neural network as above.

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD
```

Let’s start off by importing the necessary classes. Keras has awesome modularity, which makes imports easy. `models.Sequential` is a kind of rack that our neural network is fitted into. `layers.Dense` allows us to construct fully connected network layers. And let’s choose Stochastic Gradient Descent for our training, which is available through `optimizers.SGD`.

```python
import numpy as np

X = np.array([ [0,0,1],[0,1,1],[1,0,1],[1,1,1] ])
y = np.array([[0,1,1,0]]).T
```

Then we import numpy and initialize our inputs. Let’s proceed to build the neural network using `keras`.

```python
model = Sequential()
model.add(Dense(input_dim=3,output_dim=5,activation="sigmoid"))
model.add(Dense(output_dim=1,activation="sigmoid"))
sgd = SGD(lr=1.0)
```

We create a Sequential model object and then add layers as shown above. **input_dim** needs to be defined only for the input layer, where it takes `no.of.features`; from there on we need to supply only **output_dim** for each *successive layer*, which is how the network is built up. The activation function is provided as a keyword argument and can be further extended with keras.activations. Finally, SGD with `learning_rate=1.0` is initialized.
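As a sanity check on the architecture, we can count the parameters by hand. Assuming Keras’s default bias terms, each Dense layer has `(inputs + 1) * units` parameters (the `+1` is the bias):

```python
# hidden layer: 3 inputs feeding 5 units, plus a bias per unit
hidden_params = (3 + 1) * 5   # 20
# output layer: 5 hidden activations feeding 1 unit, plus its bias
output_params = (5 + 1) * 1   # 6
print(hidden_params + output_params)  # → 26
```

Note that the hand-rolled network above has no bias terms beyond the constant input feature, so its parameter count differs slightly.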

```python
model.compile(optimizer=sgd,loss="mean_squared_error")
model.fit(X,y,nb_epoch=5000,batch_size=32)
print(model.predict_classes(X))
```

The next step is compiling the model: we choose SGD as the optimizer and set `loss="mean_squared_error"`, since we are dealing with plain binary classification, where *cross-entropy* is not so helpful. And finally, let’s print the predicted output to the console.

```
Epoch 4998/5000
4/4 [==============================] - 0s - loss: 5.6662e-04
Epoch 4999/5000
4/4 [==============================] - 0s - loss: 5.6647e-04
Epoch 5000/5000
4/4 [==============================] - 0s - loss: 5.6633e-04
4/4 [==============================] - 0s
[[0]
 [1]
 [1]
 [0]]
```

From the above we can see that the *keras* model is super easy to build, and it is very handy for constructing complex networks with many deep layers, where building a neural network by hand would be very difficult.

In the next post, let’s discuss the various optimization techniques that help us minimize the cost effectively, get our hands dirty with *theano* and *tensorflow*, and then head on to recognizing handwritten digits.