Adding a new layer

This section describes how to create a new layer for tiny-dnn. As an example, let’s create a simple fully-connected layer.

Note: This document is old and does not match the current tiny-dnn API. It needs to be updated.

Declare class

Let’s define your layer. All layer operations in tiny-dnn are derived from the layer class.

// calculate y = Wx + b 
class fully_connected : public layer {
public:
    //todo 
};

The layer class prepares input/output data for your calculation. To do this, you must tell layer’s constructor what you need.

layer::layer(const std::vector<vector_type>& in_type,
             const std::vector<vector_type>& out_type)

For example, consider the fully-connected operation y = Wx + b. In this calculation, the inputs (the right-hand side of the equation) are the data x, the weight W and the bias b. The output is, of course, y. So its constructor should pass {data,weight,bias} as input and {data} as output.

// calculate y = Wx + b
class fully_connected : public layer {
public:
    fully_connected(size_t x_size, size_t y_size)
    :layer({vector_type::data,vector_type::weight,vector_type::bias}, // x, W and b
           {vector_type::data}),
     x_size_(x_size),
     y_size_(y_size)
    {}

private:
    size_t x_size_; // number of input elements
    size_t y_size_; // number of output elements
};

vector_type::data is input data passed by the previous layer, or output data consumed by the next layer. vector_type::weight and vector_type::bias represent trainable parameters. The only difference between them is the default initialization method: weight is initialized with random values, and bias is initialized as a zero vector (this behaviour can be changed with the network::weight_init method). If you need another vector for your calculation, vector_type::aux can be used.

Implement virtual methods

There are 5 methods to implement. In most cases, 3 of them are one-liners and the remaining 2 are essential:

  • layer_type
  • in_shape
  • out_shape
  • forward_propagation
  • back_propagation

layer_type

Returns the name of your layer.

std::string layer_type() const override {
    return "fully-connected";
}

in_shape/out_shape

Return the input/output shapes corresponding to the inputs/outputs. A shape is defined by [width, height, depth]. For example, a fully-connected layer treats its input data as a 1-dimensional array, so its shape is [N, 1, 1].

std::vector<shape3d> in_shape() const override {
    // return input shapes
    // the order of shapes must match the arguments of the layer constructor
    return { shape3d(x_size_, 1, 1), // x
             shape3d(x_size_, y_size_, 1), // W
             shape3d(y_size_, 1, 1) }; // b
}

std::vector<shape3d> out_shape() const override {
    return { shape3d(y_size_, 1, 1) }; // y
}
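
To make the shape convention concrete, here is a minimal standalone sketch. It does not use tiny-dnn: shape3 and fc_in_shapes are hypothetical names standing in for shape3d and in_shape, and the sketch only assumes that the number of elements in a vector is width * height * depth.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for tiny-dnn's shape3d: [width, height, depth].
struct shape3 {
    std::size_t width, height, depth;
    std::size_t size() const { return width * height * depth; }
};

// Shapes of x, W and b for a fully-connected layer, in constructor order.
std::vector<shape3> fc_in_shapes(std::size_t x_size, std::size_t y_size) {
    assert(x_size > 0 && y_size > 0);
    return { shape3{x_size, 1, 1},       // x
             shape3{x_size, y_size, 1},  // W: y_size rows of x_size values
             shape3{y_size, 1, 1} };     // b
}
```

With x_size = 2 and y_size = 3, the three vectors hold 2, 6 and 3 elements respectively.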

forward_propagation

Perform the forward calculation in this method.

void forward_propagation(size_t worker_index,
                         const std::vector<vec_t*>& in_data,
                         std::vector<vec_t*>& out_data) override {
    const vec_t& x = *in_data[0]; // its size is in_shape()[0] (=[x_size_,1,1])
    const vec_t& W = *in_data[1];
    const vec_t& b = *in_data[2];
    vec_t& y = *out_data[0];

    std::fill(y.begin(), y.end(), 0.0);

    // y = Wx+b
    for (size_t r = 0; r < y_size_; r++) {
        for (size_t c = 0; c < x_size_; c++)
            y[r] += W[r*x_size_+c]*x[c];
        y[r] += b[r];
    }
}

in_data/out_data are arrays of input/output data, ordered as you told layer’s constructor. The implementation is simple and straightforward, isn’t it?
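
To try the indexing outside tiny-dnn, the same loop can be written as a free function over std::vector<float>. This is only a sketch: fc_forward is a hypothetical name, and the vector sizes are taken from x and b instead of from class members.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standalone version of the forward loop: y = Wx + b, with W stored
// row by row (row r, column c lives at W[r * x_size + c]).
std::vector<float> fc_forward(const std::vector<float>& x,
                              const std::vector<float>& W,
                              const std::vector<float>& b) {
    const std::size_t x_size = x.size();
    const std::size_t y_size = b.size();
    assert(W.size() == x_size * y_size);

    std::vector<float> y(y_size, 0.0f);
    for (std::size_t r = 0; r < y_size; r++) {
        for (std::size_t c = 0; c < x_size; c++)
            y[r] += W[r * x_size + c] * x[c];
        y[r] += b[r];
    }
    return y;
}
```

For x = {1, 2}, W = {1, 0, 0, 1, 1, 1} and b = {0.5, 0.5, 0.5}, this returns {1.5, 2.5, 3.5}.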

worker_index is the task id. It is always zero if you run tiny-dnn in a single thread. If some class member variables are updated during the forward/backward pass, these members must be treated carefully to avoid data races. If these variables are task-independent, your class can hold N copies (one per task) and access them by worker_index (you can see an example in max_pooling_layer.h). Input/output data managed by the layer base class is task-local, so in_data/out_data can be treated as if the code were running on a single thread.
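
The per-task storage pattern can be sketched as follows. This is a standalone illustration, not code from max_pooling_layer.h: per_worker_scratch is a hypothetical class, and the number of workers is assumed to be known when the object is constructed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper that keeps one scratch buffer per worker.
class per_worker_scratch {
public:
    per_worker_scratch(std::size_t num_workers, std::size_t buffer_size)
        : buffers_(num_workers, std::vector<float>(buffer_size, 0.0f)) {}

    // Each worker touches only its own buffer, so no lock is needed
    // as long as concurrent tasks use distinct worker_index values.
    std::vector<float>& get(std::size_t worker_index) {
        assert(worker_index < buffers_.size());
        return buffers_[worker_index];
    }

private:
    std::vector<std::vector<float>> buffers_;
};
```

A layer would hold one such object as a member and call get(worker_index) inside forward_propagation and back_propagation instead of writing to a single shared buffer.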

back_propagation

void back_propagation(size_t                index,
                      const std::vector<vec_t*>& in_data,
                      const std::vector<vec_t*>& out_data,
                      std::vector<vec_t*>&       out_grad,
                      std::vector<vec_t*>&       in_grad) override {
    const vec_t& curr_delta = *out_grad[0]; // dE/dy (already calculated in next layer)
    const vec_t& x          = *in_data[0];
    const vec_t& W          = *in_data[1];
    vec_t&       prev_delta = *in_grad[0]; // dE/dx (passed into previous layer)
    vec_t&       dW         = *in_grad[1]; // dE/dW
    vec_t&       db         = *in_grad[2]; // dE/db

    // propagate delta to prev-layer
    for (size_t c = 0; c < x_size_; c++)
        for (size_t r = 0; r < y_size_; r++)
            prev_delta[c] += curr_delta[r] * W[r*x_size_+c];

    // accumulate weight difference
    for (size_t r = 0; r < y_size_; r++)
        for (size_t c = 0; c < x_size_; c++)
            dW[r*x_size_+c] += curr_delta[r] * x[c];

    // accumulate bias difference
    for (size_t r = 0; r < y_size_; r++)
        db[r] += curr_delta[r];
}

in_data/out_data are the same as in forward_propagation, and in_grad/out_grad are their gradients. The order of the gradient values is the same as in in_data/out_data.
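
For reference, the three loops in back_propagation follow from applying the chain rule to y_r = \sum_c W_{rc} x_c + b_r. Writing \delta_r = \partial E / \partial y_r for curr_delta (this symbol is introduced here only for the derivation):

```latex
\begin{aligned}
\frac{\partial E}{\partial x_c}    &= \sum_{r} \delta_r \, W_{rc}, \\
\frac{\partial E}{\partial W_{rc}} &= \delta_r \, x_c, \\
\frac{\partial E}{\partial b_r}    &= \delta_r .
\end{aligned}
```

These correspond to prev_delta, dW and db respectively, with W_{rc} stored at W[r*x_size_+c].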

Note: Gradients of weight/bias are accumulated over the mini-batch and zero-cleared automatically, so you can’t use the assignment operator on these elements (the layer would forget the previous training data in the mini-batch!). As in this example, use operator += instead. The gradient of data (prev_delta in the example) may already hold meaningful values if two or more layers share this data, so you can’t overwrite this value either.
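
The effect of accumulating with += can be seen in a standalone sketch of the backward pass. fc_backward is a hypothetical free function over std::vector<float>, not part of tiny-dnn, and the caller is assumed to zero the gradient buffers before the first call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Standalone backward pass for y = Wx + b. All three gradient buffers
// are accumulated with +=, never assigned.
void fc_backward(const std::vector<float>& x,
                 const std::vector<float>& W,
                 const std::vector<float>& curr_delta,  // dE/dy
                 std::vector<float>& prev_delta,        // dE/dx
                 std::vector<float>& dW,                // dE/dW
                 std::vector<float>& db) {              // dE/db
    const std::size_t x_size = x.size();
    const std::size_t y_size = curr_delta.size();
    assert(W.size() == x_size * y_size && dW.size() == W.size());
    assert(prev_delta.size() == x_size && db.size() == y_size);

    for (std::size_t r = 0; r < y_size; r++) {
        for (std::size_t c = 0; c < x_size; c++) {
            prev_delta[c]      += curr_delta[r] * W[r * x_size + c];
            dW[r * x_size + c] += curr_delta[r] * x[c];
        }
        db[r] += curr_delta[r];
    }
}
```

Calling fc_backward twice on the same buffers leaves the sum of both contributions in dW and db, which is what accumulation over a mini-batch needs. Replacing += with = would keep only the last call.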

Verify backward calculation

It is always a good idea to check that your backward implementation is correct. The network class provides a gradient_check method for this purpose. Let’s add the following lines to test/test_network.h and run the test.

TEST(network, gradient_check_fully_connected) {
    network<sequential> net;
    net << fully_connected(2, 3)
        << fully_connected(3, 2);

    std::vector<tensor_t> in{ tensor_t{ 1, { 0.5, 1.0 } } };
    std::vector<std::vector<label_t>> t = { std::vector<label_t>(1, {1}) };

    EXPECT_TRUE(net.gradient_check<mse>(in, t, 1e-4, GRAD_CHECK_ALL));
}
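
The general idea behind a numerical gradient check can be sketched without tiny-dnn. The code below compares the analytic weight gradient of a single fully-connected layer against a central-difference estimate. It illustrates the technique only and makes no claim about how gradient_check is implemented internally. The names fc_loss, fc_analytic_dW and max_gradient_error are hypothetical, and the loss is assumed to be E = 1/2 * sum (y - t)^2.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using vec = std::vector<double>;

// E = 1/2 * sum_r (y_r - t_r)^2 with y = Wx + b (W stored row by row).
double fc_loss(const vec& x, const vec& W, const vec& b, const vec& t) {
    assert(W.size() == x.size() * b.size() && t.size() == b.size());
    double e = 0.0;
    for (std::size_t r = 0; r < b.size(); r++) {
        double y = b[r];
        for (std::size_t c = 0; c < x.size(); c++)
            y += W[r * x.size() + c] * x[c];
        e += 0.5 * (y - t[r]) * (y - t[r]);
    }
    return e;
}

// Analytic gradient: dE/dW[r][c] = (y_r - t_r) * x_c.
vec fc_analytic_dW(const vec& x, const vec& W, const vec& b, const vec& t) {
    vec dW(W.size(), 0.0);
    for (std::size_t r = 0; r < b.size(); r++) {
        double y = b[r];
        for (std::size_t c = 0; c < x.size(); c++)
            y += W[r * x.size() + c] * x[c];
        for (std::size_t c = 0; c < x.size(); c++)
            dW[r * x.size() + c] += (y - t[r]) * x[c];
    }
    return dW;
}

// Largest absolute gap between the analytic gradient and a central
// difference (E(w + eps) - E(w - eps)) / (2 * eps), over all weights.
double max_gradient_error(const vec& x, vec W, const vec& b, const vec& t,
                          double eps = 1e-4) {
    const vec analytic = fc_analytic_dW(x, W, b, t);
    double worst = 0.0;
    for (std::size_t i = 0; i < W.size(); i++) {
        const double saved = W[i];
        W[i] = saved + eps;
        const double e_plus = fc_loss(x, W, b, t);
        W[i] = saved - eps;
        const double e_minus = fc_loss(x, W, b, t);
        W[i] = saved;  // restore before perturbing the next weight
        const double numeric = (e_plus - e_minus) / (2.0 * eps);
        worst = std::max(worst, std::fabs(numeric - analytic[i]));
    }
    return worst;
}
```

If the backward formula is correct, max_gradient_error stays close to zero. A sign error or a swapped index in the analytic gradient shows up as a large gap.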

Congratulations! Now you can use this new class as a tiny-dnn layer.