TensorFlow - How to define a Multi Layer Perceptron (MLP)

Contents:

  MLP for MNIST
  MLP for function approximation

MLP for MNIST

An MLP learning to recognize the digits 0,...,9 using the 55,000 MNIST training samples of 28x28 pixel images:

# mnist_mlp.py
#
# Here we construct a multi-layer perceptron (MLP),
# i.e., a neural network using hidden layers,
# and then train it using the MNIST data
#
# Note: it is not a Convolutional Neural Network (CNN),
#       which shares weights!
#
# by Prof. Dr. Juergen Brauer, www.juergenbrauer.org

import tensorflow as tf

# 1. we use the input_data module to import the MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
print("\nmnist has type", type(mnist))
print("There are ", mnist.train.num_examples, " training examples available.")
print("There are ", mnist.test.num_examples, " test examples available.")


# 2. set learning parameters
learn_rate      = 0.001
nr_train_epochs = 40
batch_size      = 100


# 3. configure network parameters
n_hidden_1 = 1000 # nr of neurons in 1st hidden layer
n_hidden_2 = 1000 # nr of neurons in 2nd hidden layer
n_input    = 784 # MNIST data input size (one input image has dimension 28x28 pixels, thus 784 input pixels)
n_classes  =  10 # MNIST total classes (0-9 digits)


# 4. define TensorFlow input placeholders
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])


# 5. helper function to create a 4-layer MLP:
#      input-layer --> hidden layer #1 --> hidden layer #2 --> output layer
def multilayer_perceptron(x, weights, biases):

    # hidden layer #1 with RELU
    layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])
    layer_1 = tf.nn.relu(layer_1)
	
    # hidden layer #2 with RELU
    layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
    layer_2 = tf.nn.relu(layer_2)
	
    # output layer with linear activation (no RELUs!)
    out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
	
    # return the MLP model
    return out_layer

	
# 6. combine weights & biases of all layers in dictionaries
weights = {
    'h1': tf.Variable(tf.random_normal([n_input, n_hidden_1])),
    'h2': tf.Variable(tf.random_normal([n_hidden_1, n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_hidden_2, n_classes]))
}
biases = {
    'b1': tf.Variable(tf.random_normal([n_hidden_1])),
    'b2': tf.Variable(tf.random_normal([n_hidden_2])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}


# 7. use the helper function defined above to generate an MLP
my_mlp = multilayer_perceptron(x, weights, biases)


# 8. define loss function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=my_mlp, labels=y))


# 9. define optimizer
optimizer = tf.train.AdamOptimizer(learning_rate=learn_rate).minimize(cost)


# 10. initialize all variables defined in the model and launch the graph
init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
sess.run(init)


# 11. the actual training happens here:
for epoch in range(nr_train_epochs):

    # reset epoch costs
    epoch_cost = 0.0

    # compute how many batches we will have to process
    nr_batches_to_process = int(mnist.train.num_examples / batch_size)

    # loop over all batches to process
    for i in range(nr_batches_to_process):

        # get next training batch input matrix and batch label vector
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        #print("type of batch_x is ", type(batch_x))
        #print("shape of batch_x is ", batch_x.shape)
        #print("batch_x=", batch_x)
        #print("batch_y=", batch_y)

        # run the optimization op (backprop) and the cost op (to get the loss value)
        _, c = sess.run([optimizer, cost],
                        feed_dict={x: batch_x, y: batch_y})

        # accumulate the total cost of all batches in this epoch
        epoch_cost += c

    # display epoch nr and the summed cost of all batches in this epoch
    print("Epoch:", '%03d' % epoch, ", epoch cost=", "{:.3f}".format(epoch_cost))

print("Optimization Finished!")

# 12. test the model
correct_prediction = tf.equal(tf.argmax(my_mlp, 1), tf.argmax(y, 1))

# 13. calculate accuracy of the learned model
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))

And here is the output of the script above:
	
juebrauer@JB-DELL-PC:/mnt/v/01_job/00_vorlesungen_meine/17_deep_learning/06_my_tensorflow_code/mnist$ python3 mnist_mlp.py
Extracting /tmp/data/train-images-idx3-ubyte.gz
Extracting /tmp/data/train-labels-idx1-ubyte.gz
Extracting /tmp/data/t10k-images-idx3-ubyte.gz
Extracting /tmp/data/t10k-labels-idx1-ubyte.gz

mnist has type 
There are  55000  training examples available.
There are  10000  test examples available.
Epoch: 000 , epoch cost= 216676.502
Epoch: 001 , epoch cost= 52526.718
Epoch: 002 , epoch cost= 27346.379
Epoch: 003 , epoch cost= 15929.129
Epoch: 004 , epoch cost= 9773.890
Epoch: 005 , epoch cost= 5625.615
Epoch: 006 , epoch cost= 3881.406
Epoch: 007 , epoch cost= 3997.750
Epoch: 008 , epoch cost= 3837.000
Epoch: 009 , epoch cost= 2844.500
Epoch: 010 , epoch cost= 2823.396
Epoch: 011 , epoch cost= 2579.255
Epoch: 012 , epoch cost= 2589.935
Epoch: 013 , epoch cost= 2137.179
Epoch: 014 , epoch cost= 1850.762
Epoch: 015 , epoch cost= 1661.469
Epoch: 016 , epoch cost= 2394.841
Epoch: 017 , epoch cost= 1787.703
Epoch: 018 , epoch cost= 1762.981
Epoch: 019 , epoch cost= 1262.616
Epoch: 020 , epoch cost= 1347.131
Epoch: 021 , epoch cost= 1465.570
Epoch: 022 , epoch cost= 1413.699
Epoch: 023 , epoch cost= 1419.388
Epoch: 024 , epoch cost= 1050.887
Epoch: 025 , epoch cost= 1334.719
Epoch: 026 , epoch cost= 1216.585
Epoch: 027 , epoch cost= 1271.461
Epoch: 028 , epoch cost= 883.701
Epoch: 029 , epoch cost= 1023.294
Epoch: 030 , epoch cost= 902.925
Epoch: 031 , epoch cost= 978.264
Epoch: 032 , epoch cost= 886.833
Epoch: 033 , epoch cost= 809.244
Epoch: 034 , epoch cost= 747.447
Epoch: 035 , epoch cost= 754.404
Epoch: 036 , epoch cost= 964.711
Epoch: 037 , epoch cost= 806.350
Epoch: 038 , epoch cost= 610.642
Epoch: 039 , epoch cost= 738.599
Optimization Finished!
Accuracy: 0.9736
    
Cool! 97.36% correct classifications on a test dataset of 10,000 images with such a "simple" model and just a few lines of code.
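
As an aside: the same two-hidden-layer MLP can be written more compactly with the tf.layers API of TensorFlow 1.x. The following is only a sketch that reuses the placeholders x and y and the parameters n_hidden_1, n_hidden_2, n_classes and learn_rate from the script above; the weight and bias variables are then created internally instead of via the explicit weights/biases dictionaries:

# sketch: the same MLP expressed with tf.layers.dense (TensorFlow 1.x)
# reuses the placeholders x, y and the network/learning parameters from above
layer_1 = tf.layers.dense(x, n_hidden_1, activation=tf.nn.relu)
layer_2 = tf.layers.dense(layer_1, n_hidden_2, activation=tf.nn.relu)
logits  = tf.layers.dense(layer_2, n_classes, activation=None)

cost      = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learn_rate).minimize(cost)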

MLP for function approximation

An MLP in TensorFlow learning to approximate a predefined function f(x), given only some sample pairs (x, f(x)):
        # mlp_func_approx.py
        #
        # Here we construct a multi-layer perceptron (MLP),
        # i.e., a neural network using hidden layers,
        # that learns to approximate a 1D function f(x)
        # by using just some sample pairs (x,f(x))
        #
        # by Prof. Dr. Juergen Brauer, www.juergenbrauer.org

        import tensorflow as tf
        import numpy
        import matplotlib.pyplot as plt


        # 1. define function that we will try to approximate using a MLP
        def f(x):
            return x+10*numpy.sin(x)


        # 2. define learning parameters
        learning_rate   = 0.01
        momentum        = 0.1
        training_epochs = 1000


        # 3. define some training & test data
        n_samples = 100
        train_X = numpy.random.random(n_samples)*10
        print("\ntrain_X=", train_X)
        train_Y = f(train_X)

        test_X = numpy.random.random(n_samples)*10
        print("\ntest_X=", test_X)
        test_Y = f(test_X)


        # 4. define placeholders (input nodes of the graph)
        #    where we will put in our training samples (x,f(x))
        X = tf.placeholder("float")
        Y = tf.placeholder("float")


        # 5. setup our model

        # linear model
        #W = tf.Variable(numpy.random.randn(), name="weight")
        #b = tf.Variable(numpy.random.randn(), name="bias")
        #pred = tf.add(tf.mul(X, W), b)

        # MLP model generation helper function
        def multilayer_perceptron(x, weights, biases):
            
            # hidden layer #1 with RELU: layer_1=relu(W1*x+b1)
            # I searched for a long time for an error here:
            # reshaping x to a [batch_size, 1] matrix solved the problem
            # of a wrong input dimension for tf.matmul
            reshaped_x = tf.reshape(x, [-1, 1])
            layer_1 = tf.add(tf.matmul(reshaped_x, weights['h1']), biases['b1'])
            layer_1 = tf.nn.relu(layer_1)
             
            # hidden layer #2 with RELU: layer_2=relu(W2*layer1+b2)
            layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])
            layer_2 = tf.nn.relu(layer_2)
             
            # output layer with linear activation (no RELUs!): out_layer=W_out*layer2+b_out
            out_layer = tf.matmul(layer_2, weights['out']) + biases['out']
             
            # return the MLP model
            return out_layer
         
        # define nr of neurons per layer
        dim_in = 1
        dim1 = 10
        dim2 = 10
        dim_out = 1
             
        # combine weights & biases of all layers in dictionaries
        weights = {
            'h1': tf.Variable(tf.random_normal([dim_in, dim1])),
            'h2': tf.Variable(tf.random_normal([dim1, dim2])),
            'out': tf.Variable(tf.random_normal([dim2, dim_out]))
        }
        biases = {
            'b1': tf.Variable(tf.random_normal([dim1])),
            'b2': tf.Variable(tf.random_normal([dim2])),
            'out': tf.Variable(tf.random_normal([dim_out]))
        }
          
        # use the helper function defined above to generate an MLP
        pred = multilayer_perceptron(X, weights, biases)



        # 6. define the cost function to be optimized in the following:
        # minimize the sum of squared errors (SSE)
        cost = tf.reduce_sum(tf.pow(pred-Y, 2))/(2*n_samples)


        # 7. generate optimizer node in the graph
        #optimizer = tf.train.MomentumOptimizer(learning_rate, momentum).minimize(cost)
        optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)


        # 8. initialize variables & launch graph
        init = tf.global_variables_initializer()
        sess = tf.InteractiveSession()
        sess.run(init)


        # 9. for all training epochs ...
        print("Starting with training...")
        for epoch in range(training_epochs):

            # ... for all individual training samples we have ...
            for (x, y) in zip(train_X, train_Y):
                # run a backprop step for this single sample
                sess.run(optimizer, feed_dict={X: x, Y: y})
                
            # every 20 epochs: display epoch nr and current cost on the test data
            if epoch % 20 == 0:
                c = sess.run(cost, feed_dict={X: test_X, Y:test_Y})
                print("Epoch:", '%04d' % epoch, "cost=", "{:.9f}".format(c))
        print("Optimization Finished!")

        # 10. show final costs on test data set
        test_cost = sess.run(cost, feed_dict={X: test_X, Y: test_Y})
        print("Test dataset costs=", test_cost, "\n")

        # 11. plot training & test data set
        plt.plot(train_X, train_Y, 'r.', label='Training data')
        plt.plot(test_X, test_Y, 'b+', label='Test data')

        # 12. compute prediction by model for test data points
        predicted_Y = numpy.zeros(n_samples)
        for i in range(len(test_X)):
            y_value = sess.run(pred, feed_dict={X: test_X[i]})
            #print("i=",i, "x=",test_X[i], "f(x)=",y_value)
            predicted_Y[i] = y_value
        print(predicted_Y)

        # 13. plot prediction by model for test data points
        plt.plot(test_X, predicted_Y, 'g+', label='Prediction by learned model for test data')

        # 14. show the plot
        plt.legend()
        plt.show()
    
The resulting plot shows the training data (red dots), the test data (blue crosses), and the model's predictions for the test data points (green crosses).

And the console output:
train_X= [ 6.55263519  2.26168932  0.11372009  0.88079969  3.90524367  4.19027704
  4.51647984  9.17503357  6.05409452  6.0693238   7.37462616  2.37949576
  1.12310983  1.55192457  9.54793132  5.53468104  1.86171689  9.96779896
  1.69884425  3.5979255   8.31940896  2.20902131  0.19651604  2.72954622
  3.1382006   8.0656514   2.90658115  5.7083335   6.38531743  2.31708153
  2.14475552  6.01668851  4.54260383  7.37515847  9.25169677  3.9034424
  0.52978624  3.22197729  4.18427074  3.38735953  5.55135636  8.19369609
  4.13857239  5.33998349  2.70608171  8.44530051  4.5269492   8.04372792
  6.37424512  5.38526779  3.78509409  3.56197345  8.15139351  2.0848429
  6.15208461  0.07911736  7.61692274  3.9864625   5.69838941  1.11504152
  4.76962813  1.73391235  0.19263586  1.58901935  2.07645893  0.15091672
  4.63659779  2.71076452  3.65044835  2.72451005  4.20339691  1.23306242
  1.80130665  1.76138396  2.45767413  3.93672177  0.02278248  9.09963428
  9.15135428  4.43075558  5.01668063  8.15456064  4.21513647  0.68522875
  5.07022263  2.01622247  3.23434799  1.29006761  6.47857998  7.02029138
  0.52898031  1.86328232  2.68731769  2.71265535  3.29847011  2.44686981
  8.300372    4.45796246  1.34314429  6.81362293]

test_X= [ 1.51198014  4.51576052  0.09275119  9.58248169  2.43708796  0.47039588
  1.97001945  4.05608649  1.77805367  1.07130416  4.73821217  5.18177432
  8.43786016  4.3513548   9.88246484  9.50326742  3.10043777  9.45217366
  1.74149502  9.72137966  1.24284701  1.60835921  9.00437876  3.87124423
  0.04450994  4.1679159   5.15171769  1.70550926  6.22313055  6.82840034
  3.64876938  7.26726639  2.1746669   0.14817737  1.77566636  0.79334107
  7.32976636  7.88565429  6.37333961  0.5249644   8.60701982  2.89808034
  4.71079306  4.72175475  5.50017862  2.14293307  0.9327945   0.90023805
  0.96638891  5.30864503  7.02015343  4.47486869  4.60804259  3.72202593
  5.40953638  4.27844304  1.81543992  0.14439206  0.45943251  7.2858847
  2.04125011  9.53662855  8.55759255  8.10585861  1.07392529  3.34112219
  5.69227147  7.48822027  4.7373497   2.03009417  4.11152482  1.70323246
  6.5119469   9.19361851  2.70866184  4.1894567   2.36977292  5.51917285
  8.32158826  2.97704836  6.66083506  7.25593141  5.38617029  2.7126223
  1.87814298  5.97110037  8.54463702  8.58443529  2.70573006  1.25675141
  8.54011144  3.81067884  4.01364034  8.39048483  7.64948167  1.61194228
  7.04270796  2.52632498  8.91705351  7.45047845]
Starting with training...
Epoch: 0000 cost= 2890.780517578
Epoch: 0020 cost= 3034.196289062
Epoch: 0040 cost= 3031.095703125
Epoch: 0060 cost= 3131.261474609
Epoch: 0080 cost= 3274.770507812
Epoch: 0100 cost= 3553.739257812
Epoch: 0120 cost= 3783.474365234
Epoch: 0140 cost= 4103.608886719
Epoch: 0160 cost= 4468.594238281
Epoch: 0180 cost= 4773.411132812
Epoch: 0200 cost= 5231.257324219
Epoch: 0220 cost= 5240.738281250
Epoch: 0240 cost= 5271.752929688
Epoch: 0260 cost= 5253.159179688
Epoch: 0280 cost= 5220.985351562
Epoch: 0300 cost= 5234.279296875
Epoch: 0320 cost= 5228.257324219
Epoch: 0340 cost= 5223.337890625
Epoch: 0360 cost= 5222.245117188
Epoch: 0380 cost= 5217.592285156
Epoch: 0400 cost= 5216.085449219
Epoch: 0420 cost= 5221.200683594
Epoch: 0440 cost= 5233.438964844
Epoch: 0460 cost= 5218.162109375
Epoch: 0480 cost= 5238.752441406
Epoch: 0500 cost= 5241.807617188
Epoch: 0520 cost= 5252.708007812
Epoch: 0540 cost= 5292.647460938
Epoch: 0560 cost= 5318.941894531
Epoch: 0580 cost= 5326.308593750
Epoch: 0600 cost= 5332.031250000
Epoch: 0620 cost= 5333.506347656
Epoch: 0640 cost= 5332.915039062
Epoch: 0660 cost= 5329.939941406
Epoch: 0680 cost= 5327.738281250
Epoch: 0700 cost= 5324.104492188
Epoch: 0720 cost= 5318.053222656
Epoch: 0740 cost= 5310.515136719
Epoch: 0760 cost= 5301.332031250
Epoch: 0780 cost= 5292.268554688
Epoch: 0800 cost= 5284.721679688
Epoch: 0820 cost= 5277.413085938
Epoch: 0840 cost= 5273.008300781
Epoch: 0860 cost= 5278.379394531
Epoch: 0880 cost= 5280.264160156
Epoch: 0900 cost= 5281.753906250
Epoch: 0920 cost= 5281.160156250
Epoch: 0940 cost= 5280.924804688
Epoch: 0960 cost= 5279.659179688
Epoch: 0980 cost= 5278.782714844
Optimization Finished!
Test dataset costs= 5278.42 

[ 12.22212505  -5.73527956   1.25282991   9.96770573   8.77249813
   4.74080992  11.42588711  -2.55185556  11.7595892    9.90124035
  -6.55358362  -2.62293863  16.59562492  -4.59670305   8.23064709
  10.42639446   4.06637144  10.72224712  11.82314587   9.16340542
  10.94184399  12.05458355  13.3152132   -1.27175009   0.82790697
  -3.32631922  -2.88928461  11.88570881   6.60509062  11.96871853
   0.26896799  15.85774517  10.67631054   1.74104631  11.76375103
   8.21507931  16.4115963   17.9793663    7.93617582   5.44150114
  15.61610508   5.46777487  -6.79655695  -6.69942522   0.19862115
  10.8621912    9.06102276   8.86353207   9.26481533  -1.49866354
  13.667943    -5.45207644  -6.37436342  -0.23835623  -0.60461485
  -4.09176111  11.69460583   1.70770586   4.63843107  16.0227356
  11.30206585  10.23320675  15.90231228  18.07948112   9.9171381
   2.39954138   1.90085542  17.79867172  -6.56122255  11.32145786
  -2.93578577  11.88966084   9.16445065  12.21940517   6.77956915
  -3.47549105   9.27728176   0.36693704  17.26889992   4.92089224
  10.48383236  15.75730038  -0.81167662   6.75214338  11.5856123
   4.37171698  15.97733974  15.74687862   6.79987478  11.02618885
  16.00354004  -0.85231268  -2.25789881  16.86995316  17.87198639
  12.04835796  13.86781597   8.10332584  13.82086277  17.48129845]
    
Open question: why do the costs on the test data set (and on the training data set) grow during learning? If we reduce training_epochs from 1000 to 100, the costs stay smaller, but the learned model is worse (one experiment that could be tried is sketched after the output below):
    
Epoch: 0000 cost= 2413.453125000
Epoch: 0020 cost= 2408.393554688
Epoch: 0040 cost= 2549.558837891
Epoch: 0060 cost= 2778.594970703
Epoch: 0080 cost= 3081.668701172
Optimization Finished!
Test dataset costs= 3281.55
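
One thing that could be tried to investigate this (only a sketch of modifications to steps 3, 7 and 9 of the script above, not part of the original code; the helper variables mean_X, std_X, mean_Y and std_Y are new): per-sample gradient descent with a fixed learning rate of 0.01 on unscaled targets in roughly the range [-10, 20] can easily overshoot, so one could standardize the data, replace the plain gradient descent optimizer by Adam with a smaller learning rate, and update on the full training set per epoch instead of sample by sample:

# modified step 3: standardize training & test data
# (mean_X, std_X, mean_Y, std_Y are new helper variables)
mean_X, std_X = train_X.mean(), train_X.std()
mean_Y, std_Y = train_Y.mean(), train_Y.std()
train_X = (train_X - mean_X) / std_X
train_Y = (train_Y - mean_Y) / std_Y
test_X  = (test_X  - mean_X) / std_X
test_Y  = (test_Y  - mean_Y) / std_Y

# modified step 7: Adam with a smaller learning rate instead of plain SGD
optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

# modified step 9: one full-batch update per epoch instead of per-sample updates
for epoch in range(training_epochs):
    sess.run(optimizer, feed_dict={X: train_X, Y: train_Y})
    if epoch % 20 == 0:
        c = sess.run(cost, feed_dict={X: test_X, Y: test_Y})
        print("Epoch:", '%04d' % epoch, "cost=", "{:.9f}".format(c))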