2.4. Implementing XOR-gate from Scratch

This section demonstrates how to implement an XOR gate with a neural network from scratch.

The complete Python code is available at: XOR-gate.py

[1] Prepare Inputs and Labels.

import numpy as np

# Inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# The ground-truth labels
Y = np.array([[0], [1], [1], [0]])

# Reshape so that each X[i] is a 2x1 column vector and each Y[i] is a 1x1 column vector.
X = X.reshape(4, 2, 1)
Y = Y.reshape(4, 1, 1)
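
As a quick check, each $X[i]$ is now a 2x1 column vector and each $Y[i]$ a 1x1 column vector, which is the shape the matrix products in the training loop expect:

print(X.shape, Y.shape)  # (4, 2, 1) (4, 1, 1)
print(X[1])              # [[0]
                         #  [1]]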

[2] Create the Model

The network has 2 input nodes, 3 hidden nodes, and 1 output node. We create two weight matrices, $W$ and $U$, and two bias vectors, $b$ and $c$.
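
With an input column vector $x$, the model computes the hidden activations and the output as

$$h = \sigma(Wx + b), \qquad y = \sigma(Uh + c)$$

where $\sigma$ is the sigmoid function, $W$ is a $3 \times 2$ matrix, $b$ a $3 \times 1$ vector, $U$ a $1 \times 3$ matrix, and $c$ a $1 \times 1$ vector.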

input_nodes = 2
hidden_nodes = 3
output_nodes = 1

#
# Initialize random weights and biases
#
W = np.random.uniform(size=(hidden_nodes, input_nodes))
b = np.random.uniform(size=(hidden_nodes, 1))
U = np.random.uniform(size=(output_nodes, hidden_nodes))
c = np.random.uniform(size=(output_nodes, 1))
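
With these shapes, the model has $3 \times 2 + 3 + 1 \times 3 + 1 = 13$ trainable parameters, matching the "Total params: 13" line in the output shown in step [4].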

[3] Training

The training loop involves forward propagation and backpropagation:

  • Forward propagation: computes the output $y$ for each input $X[i]$.
  • Backpropagation: computes the gradients ($dL$, $db$, $dW$, $dc$, $dU$) and updates the weights and biases ($b$, $W$, $c$, $U$) using the gradient descent algorithm, which is explained in Appendix 2.3.
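
The loop below relies on two helper functions, sigmoid and deriv_sigmoid, which are defined in XOR-gate.py. A minimal sketch of plausible definitions is shown here; note that deriv_sigmoid is applied to the pre-activations h_h and y_h, so it evaluates the sigmoid internally:

#
# Helper functions (a sketch; see XOR-gate.py for the actual definitions)
#
def sigmoid(x):
    # Element-wise logistic function
    return 1.0 / (1.0 + np.exp(-x))

def deriv_sigmoid(x):
    # Derivative of the sigmoid, evaluated at the pre-activation x
    s = sigmoid(x)
    return s * (1.0 - s)
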
n_epochs = 15000  # Epochs
lr = 0.1  # Learning rate

#
# Training loop
#
for epoch in range(1, n_epochs + 1):

    loss = 0.0

    for i in range(0, len(Y)):
        #
        # Forward Propagation
        #
        h_h = np.dot(W, X[i]) + b
        h = sigmoid(h_h)
        y_h = np.dot(U, h) + c
        y = sigmoid(y_h)

        #
        # Back Propagation
        #
        loss += np.sum((y - Y[i]) ** 2 / 2) # for measuring the training progress
        dL = (y - Y[i])

        # Computing the gradients
        db = np.dot(U.T, dL * deriv_sigmoid(y_h)) * deriv_sigmoid(h_h)
        dW = np.dot(db, X[i].T)
        dc = dL * deriv_sigmoid(y_h)
        dU = np.dot(dc, h.T)

        # Updating Weights and Biases using gradient descent.
        c -= lr * dc
        U -= lr * dU
        b -= lr * db
        W -= lr * dW
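
For reference, the gradient expressions in the loop follow from applying the chain rule to the squared-error loss $L = \frac{1}{2}\|y - Y\|^2$:

$$\frac{\partial L}{\partial c} = (y - Y) \odot \sigma'(\hat{y}), \qquad \frac{\partial L}{\partial U} = \frac{\partial L}{\partial c}\, h^\top, \qquad \frac{\partial L}{\partial b} = \left(U^\top \frac{\partial L}{\partial c}\right) \odot \sigma'(\hat{h}), \qquad \frac{\partial L}{\partial W} = \frac{\partial L}{\partial b}\, x^\top$$

where $\hat{h} = Wx + b$ and $\hat{y} = Uh + c$ are the pre-activations (h_h and y_h in the code), and $\odot$ denotes element-wise multiplication. These correspond directly to dc, dU, db, and dW above.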

[4] Test

Run the following command to train and test the model:

$ python XOR-gate.py
-----------------------------------------------------------------
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               (None,  3)                    9
 dense_1 (Dense)             (None,  1)                    4
=================================================================
Total params: 13

epoch: 1 / 15000  Loss = 0.608350
epoch: 1000 / 15000  Loss = 0.497935
epoch: 2000 / 15000  Loss = 0.446343
epoch: 3000 / 15000  Loss = 0.338422
epoch: 4000 / 15000  Loss = 0.294963
epoch: 5000 / 15000  Loss = 0.279844
epoch: 6000 / 15000  Loss = 0.272833
epoch: 7000 / 15000  Loss = 0.268903
epoch: 8000 / 15000  Loss = 0.266423
epoch: 9000 / 15000  Loss = 0.264728
epoch: 10000 / 15000  Loss = 0.263502
epoch: 11000 / 15000  Loss = 0.262577
epoch: 12000 / 15000  Loss = 0.261856
epoch: 13000 / 15000  Loss = 0.261279
epoch: 14000 / 15000  Loss = 0.260807
epoch: 15000 / 15000  Loss = 0.260415
------------------------
x0 XOR x1 => result
========================
 0 XOR  0 => 0.0400
 0 XOR  1 => 0.9671
 1 XOR  0 => 0.9671
 1 XOR  1 => 0.0330
========================

The predictions are close to 0 for the inputs (0, 0) and (1, 1) and close to 1 for (0, 1) and (1, 0), which indicates that the model has learned the XOR gate functionality.
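
For completeness, the prediction table above comes from one final forward pass per input after training. A minimal sketch of that test loop (the exact print formatting in XOR-gate.py may differ):

#
# Test: one forward pass per input with the trained weights
#
for i in range(len(Y)):
    h = sigmoid(np.dot(W, X[i]) + b)
    y = sigmoid(np.dot(U, h) + c)
    print(" %d XOR  %d => %.4f" % (X[i][0][0], X[i][1][0], y[0][0]))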