Handwritten Digit Recognition with a Convolutional Neural Network


Today, I want to write about the ‘hello world!’ step of machine learning: handwritten digit recognition with the MNIST data set, the Python language, and the tflearn library. There are other traditional ways to tackle this task, such as support vector machines (SVMs) and plain neural networks (NNs), but this one is done with a convolutional neural network (CNN). Let’s talk a bit about what a CNN is.

A convolutional neural network (CNN or ConvNet) is a feed-forward artificial neural network inspired by the organization of the animal visual cortex. It is mostly used for visual tasks like image and video recognition, but it is also effective in other areas such as recommender systems and natural language processing.

Compared with traditional neural nets, CNNs need less manual pre-processing to extract features. For visual tasks like image recognition, the features could be edges, shapes, colors, and so on. The network is responsible for learning the filters by itself, and this reduced need for human effort is the major advantage of CNNs.
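To make this concrete, here is a minimal sketch in plain NumPy (not tflearn) of what a single convolution filter does: a small kernel slides over the image and responds strongly where the pattern it encodes appears. The toy image and kernel values are made up for illustration.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum element-wise products."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical-edge kernel: it responds where intensity changes from left to right.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)

response = convolve2d_valid(image, kernel)
print(response)  # the middle column fires, marking the edge
```

During training, a CNN learns the values inside many such kernels instead of having a human design them.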

Anyway, I will not cover the whole process and architecture in this article. There are lots of articles and tutorials about it, and you can get a deeper understanding of CNNs from here. I just want to share the code and give some hints below each piece. The code is written in Python 3.5, and the only library you need is tflearn.

import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression
import tflearn.datasets.mnist as mnist

X, Y, test_x, test_y = mnist.load_data(one_hot=True) 
"""this load the test and training Mnist data to X,Y(training) and test_x,test_y(test) martices from tflearn library."""

X = X.reshape([-1, 28, 28, 1])
test_x = test_x.reshape([-1, 28, 28, 1])
"""these pieces of code makes every test and training samples 28x1 vector, represent all of them in an multi dimensional array(X and test_X). It is called tensor"""

convnet = input_data(shape=[None,28,28,1], name='input')
"""this makes an input layer set the size and shape of every single mnist input data with the 'input' name."""

convnet = conv_2d(convnet, 32, 2, activation='relu')
convnet = max_pool_2d(convnet, 2)
""" The convolution layer is defined with 32 pieces 2x2 size kernels(filter), rectified linear activation function and a 2x2 size maximum pooling filter"""

convnet = conv_2d(convnet, 64, 2, activation='relu')
convnet = max_pool_2d(convnet, 2)
"""Here it is define second convolutional layer like the same way above, but with 64 kernels"""

convnet = fully_connected(convnet, 1024, activation='relu')
convnet = dropout(convnet, 0.8)
"""This pieces of code create a fully connected layer has 1024 nodes with rectified linear activation function again and it assign a dropout value to avoid connection between the irrelevant/unnecessary nodes"""

convnet = fully_connected(convnet, 10, activation='softmax')
convnet = regression(convnet, optimizer='adam', learning_rate=0.01, loss='categorical_crossentropy', name='targets')
"""it makes the output layer with 10 classes (0 to 9 digits) and softmax activation function, defines the optimization method, learning rate value and loss/cost function with the 'targets' name"""

model = tflearn.DNN(convnet)
# And the CNN model is created.

model.fit({'input':X},{'targets':Y}, n_epoch=10, validation_set=({'input':test_x},{'targets':test_y}),
 snapshot_step=500, show_metric=True, run_id='mnist')
"""Here we fit the input-targets data and the validation data. The epoch means forward+back propagation processes, n_epoch represents the number of the epoch. With the snapshot_step, we can set the step number to show the cost and accuracy results while the training process in progress."""

"""And lastly, this save the weight parameters of our model. It's done and ready to train."""

This process runs on the CPU, and training takes about 40 minutes on an Intel Core i7 at 2.60 GHz. We can reduce the training time dramatically by configuring CUDA to run the code on a GPU. The number of nodes, the size and number of kernels, the activation functions, the learning rate, and the optimization method all come down to experience and math; choosing them well is the craft of deep learning engineers. What is the most suitable model for good accuracy and training time? There is no single exact rule or solution; it all depends on your problem. I recommend you read this paper, too.

Actually, we use kernels in most of the applications we have. If you had a look at the first link I gave, you already understand the point and what kernels are. For instance, Photoshop uses lots of kernels for blurring, sharpening, and so on. We can think of them all as features that our images have after the feature extraction in the convolution layer.
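As a quick illustration, here is the top Sobel kernel applied with a naive NumPy convolution to a toy image that is bright on top and dark on the bottom; the filter fires along the horizontal edge between the two halves. The image values are made up for illustration.

```python
import numpy as np

# The "top" Sobel kernel: responds to intensity changes from top to bottom.
sobel_top = np.array([[ 1,  2,  1],
                      [ 0,  0,  0],
                      [-1, -2, -1]], dtype=float)

def convolve2d_valid(image, kernel):
    """Naive valid-mode 2D convolution: slide the kernel and sum products."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Bright top half, dark bottom half: one horizontal edge in the middle.
img = np.vstack([np.ones((3, 5)), np.zeros((3, 5))])
edges = convolve2d_valid(img, sobel_top)
print(edges)  # the rows straddling the edge light up; the flat regions stay 0
```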

[Image: my picture, original and with the top Sobel kernel applied on the right]
You can see my suited and handsome picture 😛 and, on the right side, its form with the top Sobel kernel applied. My eyebrows, mustache, and beard are more evident with this filter 🙂 Think about Facebook and its face recognition system for tagging people: how does it do that task by itself, and how can it remember us? The answer is deep learning, most probably with some kind of CNN architecture and, of course, these kinds of kernels. Maybe their system looked at all the pictures I uploaded before and ran a training process to obtain weight parameters for me, and the kernel above could be one of the features of my face. Maybe my eyebrows, mustache, and beard are my identifying factors according to this kernel 🙂 and there are lots of other kernels applied to my pictures, too. So, when one of my friends uploads a new picture with me, Facebook can make a suggestion using the weight parameters it obtained for me: hey, this could be Mehmet, do you want to tag him? This is how CNNs work.

Now, back to our work. We trained our network on the MNIST digits and saved the parameters we obtained. We can use these parameters to recognise handwritten digits. Let’s make a prediction and see the result:

import matplotlib.pyplot as plt

plt.imshow(test_x[1].reshape(28, 28), cmap='gray')
plt.show()

print(model.predict([test_x[1]]))



and the result looks like this:
[Image: the plotted test sample, a handwritten ‘2’]
We can see this test sample is a ‘2’. We know the indices of the output vector represent the digits 0-9, so we can read the prediction from the output vector: the third element (index 2) has a value of exactly 1.0. It says: dude, that must be a ‘2’! 🙂 We are done. Best wishes and take care.
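To turn such an output vector into a digit programmatically, argmax does the job. The vector below is hypothetical, mimicking the prediction described above:

```python
import numpy as np

# A hypothetical output vector like the one above: index 2 has all the probability.
prediction = np.array([0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
digit = int(np.argmax(prediction))  # index of the largest value = predicted digit
print(digit)  # 2
```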


