# Speed Test for Neural Net Implementation

If you want to implement a neural net, you are confronted with the question of which design to use for such an implementation.

Since connectionist systems such as neural nets often consist of hundreds, thousands, or even millions of simple computing elements (neurons), the question arises in the context of a code implementation whether to model each neuron as a separate class.

At first glance this seems to be the method of choice, since we could use two classes:

• a neural net class A which holds all the references to the neurons
• a neuron class B which encapsulates all the neuron stuff (activation value, output value, links to other neurons, etc.)

But this could be computationally expensive, because the neural net computation has to access every single B instance to update its activation, recompute its output, and so on.

The alternative is to use just one class (class C) that holds the activation and output values of all neurons in arrays. While this is faster (see the results of the following very simple speed test), it carries the danger of leading to a much more cluttered implementation.

## Speed Test Code

```cpp
#include <time.h>
#include <iostream>

using namespace std;

static int   N  = 1000000;
static float PI = 3.14159f;

class B
{
public:
    B() {}

    void SetValue(float val)
    {
        m_AValue = val;
    }

    float GetValue()
    {
        return m_AValue;
    }

private:
    float m_AValue;
};

class A
{
public:
    A() {}

    void GenerateSomeBs()
    {
        m_ListOfBs = new B*[N];
        for (int i = 0; i < N; i++)
            m_ListOfBs[i] = new B();
    }

    void SetValuesOfAllBs()
    {
        for (int i = 0; i < N; i++)
            m_ListOfBs[i]->SetValue(PI);
    }

    float GetValuesOfAllBs()
    {
        float sum = 0.0f;
        for (int i = 0; i < N; i++)
            sum += m_ListOfBs[i]->GetValue();
        return sum;
    }

private:
    B** m_ListOfBs;
};

class C
{
public:
    C() {}

    void GenerateValuePlaceholdersForDs()
    {
        m_ListOfDValues = new float[N];
    }

    void SetValuesOfAllDs()
    {
        for (int i = 0; i < N; i++)
            m_ListOfDValues[i] = PI;
    }

    float GetValuesOfAllDs()
    {
        float sum = 0.0f;
        for (int i = 0; i < N; i++)
            sum += m_ListOfDValues[i];
        return sum;
    }

private:
    float* m_ListOfDValues;
};

int main()
{
    cout << "SpeedTest: is it really a good idea to spend a separate object for each neuron?" << endl;
    cout << "We will simulate N=" << N << " neurons (smaller classes)" << endl;

    clock_t start;
    long TimeElapsed;   // milliseconds
    float sum;

    // 1. SpeedTest: A with many references to separate class B
    cout << endl << "A+B" << endl;
    A* myA = new A();

    // Generating Bs
    start = clock();
    myA->GenerateSomeBs();
    // clock() returns ticks, not ms; convert via CLOCKS_PER_SEC
    TimeElapsed = (clock() - start) * 1000 / CLOCKS_PER_SEC;
    cout << "Time needed for GenerateSomeBs(): " << TimeElapsed << " ms" << endl;

    // Access single Bs and set their member variable to a defined value
    start = clock();
    myA->SetValuesOfAllBs();
    TimeElapsed = (clock() - start) * 1000 / CLOCKS_PER_SEC;
    cout << "Time needed for SetValueOfAllBs(): " << TimeElapsed << " ms" << endl;

    // Compute sum accessing all B values
    start = clock();
    sum = myA->GetValuesOfAllBs();
    TimeElapsed = (clock() - start) * 1000 / CLOCKS_PER_SEC;
    cout << "Time needed for GetValueOfAllBs(): " << TimeElapsed << " ms" << endl;

    // 2. SpeedTest: C which directly holds the values
    C* myC = new C();
    cout << endl << "C with internal D values" << endl;

    start = clock();
    myC->GenerateValuePlaceholdersForDs();
    TimeElapsed = (clock() - start) * 1000 / CLOCKS_PER_SEC;
    cout << "Time needed for GenerateValuePlaceholdersForDs(): " << TimeElapsed << " ms" << endl;

    // Access single values and set to a defined value
    start = clock();
    myC->SetValuesOfAllDs();
    TimeElapsed = (clock() - start) * 1000 / CLOCKS_PER_SEC;
    cout << "Time needed for SetValueOfAllDs(): " << TimeElapsed << " ms" << endl;

    // Compute sum accessing all D values
    start = clock();
    sum = myC->GetValuesOfAllDs();
    TimeElapsed = (clock() - start) * 1000 / CLOCKS_PER_SEC;
    cout << "Time needed for GetValueOfAllDs(): " << TimeElapsed << " ms" << endl;

    // Wait for some input
    cin.get();

    return 0;
}
```

## Speed Test Results

• A = neural net class holding references to separate neuron objects
• B = neuron class
• C = neural net class holding all neuron values directly in arrays
• D = neuron state values stored inside C

```
SpeedTest: is it really a good idea to spend a separate object for each neuron?
We will simulate N=100000 neurons (smaller classes)

A+B
Time needed for GenerateSomeBs(): 165 ms
Time needed for SetValueOfAllBs(): 4 ms
Time needed for GetValueOfAllBs(): 4 ms

C with internal D values
Time needed for GenerateValuePlaceholdersForDs(): 1 ms
Time needed for SetValueOfAllDs(): 1 ms
Time needed for GetValueOfAllDs(): 0 ms
```

```
SpeedTest: is it really a good idea to spend a separate object for each neuron?
We will simulate N=1000000 neurons (smaller classes)

A+B
Time needed for GenerateSomeBs(): 1670 ms
Time needed for SetValueOfAllBs(): 42 ms
Time needed for GetValueOfAllBs(): 38 ms

C with internal D values
Time needed for GenerateValuePlaceholdersForDs(): 3 ms
Time needed for SetValueOfAllDs(): 4 ms
Time needed for GetValueOfAllDs(): 5 ms
```

## Conclusion

With 1,000,000 neurons we need 38 ms to access each of the B (neuron) values once. But if each of these neurons has 100 connections to other neurons, we have to access output values 1,000,000 * 100 times to compute the inputs of all neurons (weighted sums of inputs), leading to roughly 38 ms * 100 = 3.8 s for the A+B design compared to 5 ms * 100 = 0.5 s for the array-based class C.

So if time is critical (which is mostly the case…), try to avoid modeling each neuron as a separate class.