TensorFlow - How to define a Convolutional Neural Network (CNN)


A CNN learning to recognize the digits 0,...,9 using 50,000 MNIST training data samples of 28x28 pixel images. The script makes use of the
		
        tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
function of TensorFlow for generating the convolution layers.

A good explanation, with examples, of what the function tf.nn.conv2d() does can be found here.
A good explanation, with examples, of what the function tf.reshape() does can be found here.
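
To get a feel for both functions, here is a small self-contained sketch (illustrative toy values only, using the same TF 1.x API as the script below): tf.nn.conv2d() expects a 4D input tensor of shape [batch, height, width, channels] and a filter tensor of shape [filter_height, filter_width, in_channels, out_channels], and tf.reshape() can infer one dimension when -1 is passed for it.

                # conv2d_reshape_demo.py (illustrative sketch, not part of the CNN script below)
                import numpy as np
                import tensorflow as tf

                # one 4x4 "image" with a single channel, batch size 1
                img  = tf.constant(np.arange(16, dtype=np.float32).reshape(1, 4, 4, 1))

                # one 2x2 averaging filter: 1 input channel, 1 output channel
                filt = tf.constant(np.full((2, 2, 1, 1), 0.25, dtype=np.float32))

                # stride 1 and 'SAME' padding keep the spatial size at 4x4
                conv = tf.nn.conv2d(img, filt, strides=[1, 1, 1, 1], padding='SAME')

                # -1 tells tf.reshape to infer that dimension (here: 16 = 4*4*1)
                flat = tf.reshape(conv, [1, -1])

                with tf.Session() as s:
                    print(s.run(conv).shape)   # (1, 4, 4, 1)
                    print(s.run(flat).shape)   # (1, 16)

The CNN script below uses the same -1 trick to reshape each flat 784-element MNIST vector into a 28x28x1 image before the first convolution layer.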

Here is the script:

		
                # tf_cnn_example.py
                #
                # Here we construct a simple Convolutional Neural Network (CNN)
                # using TensorFlow (TF) which will learn using the MNIST training dataset
                # to classify 28x28 pixel images of digits 0,...,9
                #
                # by Prof. Dr. Juergen Brauer, www.juergenbrauer.org

                import tensorflow as tf
                import matplotlib.pyplot as plt
                import matplotlib.cm as cm
                import numpy as np
                from random import randint

                # 1. get the MNIST training + test data
                # Note: this uses the mnist class provided by TF for convenient
                #       access to the data in just a few lines of code
                from tensorflow.examples.tutorials.mnist import input_data
                mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

                # show an example of a train image
                img_nr = 489
                label_vec = mnist.train.labels[img_nr]
                print("According to the training data the following image is a ", np.argmax(label_vec) )
                tmp = mnist.train.images[img_nr]
                tmp = tmp.reshape((28,28))
                plt.imshow(tmp, cmap = cm.Greys)
                plt.show()


                # 2. set up training parameters
                learning_rate = 0.001
                training_iters = 50000   # total number of training samples to process
                batch_size = 128
                display_step = 10        # print loss/accuracy every display_step batches


                # 3. set up CNN network parameters
                n_input   = 784  # MNIST input dimension (each image has 28*28 = 784 pixels)
                n_classes = 10   # MNIST total number of classes (digits 0-9)
                dropout   = 0.75 # dropout: probability to keep a unit


                # 4. define TF graph input nodes x,y,keep_prob
                x = tf.placeholder(tf.float32, [None, n_input])
                y = tf.placeholder(tf.float32, [None, n_classes])
                keep_prob = tf.placeholder(tf.float32)
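                # keep_prob is fed with the dropout value (0.75) during training
                # and with 1.0 (i.e., no dropout) when evaluating loss / accuracy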


                # 5. define a helper function to create a single convolution layer
                #    with a bias added and ReLU activation output
                def conv2d(x, W, b, strides=1):
                    # Conv2D wrapper, with bias and relu activation
                    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
                    x = tf.nn.bias_add(x, b)
                    return tf.nn.relu(x)


                # 6. define a helper function to create a single maxpool operation
                #    for the specified tensor x - with a max pooling region of 2x2 'pixels'
                def maxpool2d(x, k=2):
                    # MaxPool2D wrapper
                    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                                          padding='SAME')


                # 7. helper function to create a CNN model
                def conv_net(x, weights, biases, dropout):

                    # reshape the flat 784-element input vector to a 4D tensor
                    # of shape [batch, height, width, channels]
                    # -1 means: infer the size of the corresponding dimension
                    # here: it will be the batch size
                    x = tf.reshape(x, shape=[-1, 28, 28, 1])

                    # create first convolution layer
                    conv1 = conv2d(x, weights['wc1'], biases['bc1'])

                    # then add a max pooling layer for down-sampling on top of conv1
                    conv1 = maxpool2d(conv1, k=2)

                    # create second convolution layer
                    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])

                    # then add a max pooling layer for down-sampling on top of conv2
                    conv2 = maxpool2d(conv2, k=2)
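                    # after two conv + 2x2 max-pooling stages the spatial size has been
                    # halved twice (28x28 -> 14x14 -> 7x7) and conv2 has 64 feature maps,
                    # which is why 'wd1' below expects 7*7*64 input values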

                    # create a fully connected layer
                    # thereby: reshape conv2 output to fit fully connected layer input
                    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
                    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
                    fc1 = tf.nn.relu(fc1)

                    # apply dropout during training for this fully connected layer fc1
                    fc1 = tf.nn.dropout(fc1, dropout)

                    # add output layer: out=fc1*out_weights+out_biases
                    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])

                    # return tensor operation
                    return out


                # 8. initialize the layer weights & biases with normally distributed
                #    random values and store them in one dictionary each
                weights = {
                    # 5x5 conv, 1 input, 32 outputs
                    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
                    # 5x5 conv, 32 inputs, 64 outputs
                    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
                    # fully connected, 7*7*64 inputs, 1024 outputs
                    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
                    # 1024 inputs, 10 outputs (class prediction)
                    'out': tf.Variable(tf.random_normal([1024, n_classes]))
                }

                biases = {
                    'bc1': tf.Variable(tf.random_normal([32])),
                    'bc2': tf.Variable(tf.random_normal([64])),
                    'bd1': tf.Variable(tf.random_normal([1024])),
                    'out': tf.Variable(tf.random_normal([n_classes]))
                }


                # 9. construct model using helper function
                pred = conv_net(x, weights, biases, keep_prob)


                # 10. define loss and optimizer
                cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
                optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)


                # 11. evaluate model
                correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
                accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))


                # 12. initialize the variables and start a session
                init = tf.global_variables_initializer()
                sess = tf.InteractiveSession()
                sess.run(init)


                # 13. keep training until the maximum number of training samples is reached
                step = 1
                while step * batch_size < training_iters:

                        # get next training batch
                        batch_x, batch_y = mnist.train.next_batch(batch_size)

                        # set inputs & run optimization op (backprop)
                        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y,
                                                       keep_prob: dropout})                        

                        if step % display_step == 0:

                            # calculate batch loss and accuracy
                            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x,
                                                                              y: batch_y,
                                                                              keep_prob: 1.})

                            print("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                                  "{:.5f}".format(acc))

                        step += 1

                print("Optimization finished!")

                # 14. calculate accuracy for test images
                print("Testing Accuracy:", \
                sess.run(accuracy, feed_dict={x: mnist.test.images[:512],
                                              y: mnist.test.labels[:512],
                                              keep_prob: 1.}))

                # 15. show an example of a test image used for computing the accuracy
                img_nr = randint(0, 511)  # only the first 512 test images were used above
                tmp = mnist.test.images[img_nr]
                tmp = tmp.reshape((28,28))
                plt.imshow(tmp, cmap = cm.Greys)
                plt.show()
			
One of the 28x28 pixel example MNIST training images:

And here is the output of the script above:
	
juebrauer@ubuntu:~/my_lectures/deep_learning/python$ python3 tf_cnn_example.py 
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz
According to the training data the following image is a  5
Iter 1280, Minibatch Loss= 20582.667969, Training Accuracy= 0.32812
Iter 2560, Minibatch Loss= 9348.425781, Training Accuracy= 0.42969
Iter 3840, Minibatch Loss= 5416.346680, Training Accuracy= 0.64844
Iter 5120, Minibatch Loss= 2910.334473, Training Accuracy= 0.82031
Iter 6400, Minibatch Loss= 2785.008301, Training Accuracy= 0.82812
Iter 7680, Minibatch Loss= 6179.224609, Training Accuracy= 0.70312
Iter 8960, Minibatch Loss= 2892.498047, Training Accuracy= 0.80469
Iter 10240, Minibatch Loss= 2618.186768, Training Accuracy= 0.83594
Iter 11520, Minibatch Loss= 1050.058105, Training Accuracy= 0.89062
Iter 12800, Minibatch Loss= 3317.343750, Training Accuracy= 0.83594
Iter 14080, Minibatch Loss= 1068.383057, Training Accuracy= 0.92969
Iter 15360, Minibatch Loss= 1084.695679, Training Accuracy= 0.90625
Iter 16640, Minibatch Loss= 1544.316162, Training Accuracy= 0.91406
Iter 17920, Minibatch Loss= 1194.717896, Training Accuracy= 0.87500
Iter 19200, Minibatch Loss= 712.147949, Training Accuracy= 0.92969
Iter 20480, Minibatch Loss= 105.110428, Training Accuracy= 0.97656
Iter 21760, Minibatch Loss= 3072.687500, Training Accuracy= 0.85938
Iter 23040, Minibatch Loss= 798.411316, Training Accuracy= 0.93750
Iter 24320, Minibatch Loss= 754.927490, Training Accuracy= 0.89062
Iter 25600, Minibatch Loss= 1699.995605, Training Accuracy= 0.87500
Iter 26880, Minibatch Loss= 701.918030, Training Accuracy= 0.92969
Iter 28160, Minibatch Loss= 640.353394, Training Accuracy= 0.91406
Iter 29440, Minibatch Loss= 1515.041992, Training Accuracy= 0.90625
Iter 30720, Minibatch Loss= 940.961914, Training Accuracy= 0.91406
Iter 32000, Minibatch Loss= 921.288025, Training Accuracy= 0.93750
Iter 33280, Minibatch Loss= 535.369629, Training Accuracy= 0.93750
Iter 34560, Minibatch Loss= 451.444275, Training Accuracy= 0.94531
Iter 35840, Minibatch Loss= 192.852356, Training Accuracy= 0.96875
Iter 37120, Minibatch Loss= 1065.391846, Training Accuracy= 0.92188
Iter 38400, Minibatch Loss= 104.525597, Training Accuracy= 0.97656
Iter 39680, Minibatch Loss= 213.619064, Training Accuracy= 0.96875
Iter 40960, Minibatch Loss= 1260.187622, Training Accuracy= 0.89844
Iter 42240, Minibatch Loss= 451.840454, Training Accuracy= 0.94531
Iter 43520, Minibatch Loss= 489.602539, Training Accuracy= 0.95312
Iter 44800, Minibatch Loss= 258.125122, Training Accuracy= 0.97656
Iter 46080, Minibatch Loss= 185.385040, Training Accuracy= 0.96875
Iter 47360, Minibatch Loss= 965.169678, Training Accuracy= 0.92188
Iter 48640, Minibatch Loss= 1042.993164, Training Accuracy= 0.92188
Iter 49920, Minibatch Loss= 671.977417, Training Accuracy= 0.94531
Optimization finished!
Testing Accuracy: 0.945312
juebrauer@ubuntu:~/my_lectures/deep_learning/python$
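
If you want to use the trained network to classify a single test image, you can feed that one image through the graph and take the argmax of the returned class scores. Here is a minimal sketch (assuming the session sess, the placeholders x and keep_prob, the pred tensor and the mnist object from the script above are still available; the image index 42 is arbitrary):

                # classify one test image with the trained network
                # (sketch: assumes sess, x, keep_prob, pred and mnist from the script above)
                img_nr = 42                                        # any index into the test set
                single_img = mnist.test.images[img_nr].reshape(1, 784)

                # forward pass only; keep_prob=1.0 switches dropout off
                scores = sess.run(pred, feed_dict={x: single_img, keep_prob: 1.0})
                predicted_digit = np.argmax(scores, axis=1)[0]

                print("Predicted digit:", predicted_digit)
                print("True digit:     ", np.argmax(mnist.test.labels[img_nr]))

Feeding keep_prob: 1.0 is important here, because dropout should only be active during training.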