5. TensorFlow, PyTorch and Keras

This chapter describes the implementation of neural networks for the XOR gate and for binary-to-decimal conversion using the TensorFlow, Keras and PyTorch frameworks.

5.1. TensorFlow Version

5.1.1. XOR-gate

Complete Python code is available at: XOR-gate-tf.py

[1] Prepare the inputs and the ground-truth labels.

# Imports used by the snippets below
import numpy as np
import tensorflow as tf
from tensorflow.keras import losses, optimizers

# Inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# The ground-truth labels
Y = np.array([[0], [1], [1], [0]])

[2] Create model.

In this implementation, the XOR-gate model is encapsulated in a custom class built from two dense layers.

Unlike the from-scratch XOR gates we have seen so far, TensorFlow requires only the forward-propagation process to be implemented explicitly; the backpropagation procedure is handled by TensorFlow’s automatic differentiation engine.
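
As a brief standalone illustration (not part of XOR-gate-tf.py), the sketch below shows what the automatic differentiation engine does: tf.GradientTape records the operations applied to a variable and can then compute the gradient of the result with respect to it.

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x                    # y = x^2
dy_dx = tape.gradient(y, x)      # dy/dx = 2x = 6.0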

Within this class, the call() method serves as the central hub for defining the forward-propagation logic.

The loss function and the optimizer are MeanSquaredError() and Adam, respectively, both provided by TensorFlow.

input_dim = 2
hidden_dim = 3
output_dim = 1

class SimpleNN(tf.keras.Model):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(SimpleNN, self).__init__()

        # Hidden layer
        self.f = tf.keras.layers.Dense(
            units=hidden_dim,
            input_dim=input_dim,
            activation="sigmoid",
        )

        # Output layer
        self.g = tf.keras.layers.Dense(
            units=output_dim,
            activation="sigmoid",
        )

    # Forward Propagation
    def call(self, x, training=None):
        h = self.f(x)
        y = self.g(h)
        return y


model = SimpleNN(input_dim, hidden_dim, output_dim)

loss_func = losses.MeanSquaredError()
optimizer = optimizers.Adam(learning_rate=0.1)

[3] Training.

The train() function consists of three procedures:

  1. Forward propagation: This procedure computes the model’s output predictions for a given input by invoking the call() method of the model.

  2. Backpropagation: This procedure computes the gradients of the loss function with respect to the model’s parameters (weights and biases) automatically, using the tf.GradientTape() API for automatic differentiation.

  3. Weights and Bias Update: This procedure adjusts the model’s parameters, using the gradients computed in the previous step, to nudge the predictions toward the desired targets.

#
# Training function
#
@tf.function
def train(X, Y):
    # Forward Propagation (recorded on the tape)
    with tf.GradientTape() as tape:
        y = model(X)
        loss = loss_func(Y, y)

    # Back Propagation (gradients via automatic differentiation)
    grad = tape.gradient(loss, model.trainable_variables)

    # Weights and Bias Update
    optimizer.apply_gradients(zip(grad, model.trainable_variables))

    return loss


n_epochs = 3000

# Training loop
for epoch in range(1, n_epochs + 1):
    loss = train(X, Y)
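
The loop above runs silently. To monitor convergence, the loss value returned by train() can be printed periodically, for example every 100 epochs; the complete script logs progress in a similar way, though its exact format may differ.

# Hypothetical variant of the training loop that reports progress
for epoch in range(1, n_epochs + 1):
    loss = train(X, Y)
    if epoch % 100 == 0 or epoch == 1:
        print("epoch: {} / {}  Loss = {:.6f}".format(epoch, n_epochs, float(loss)))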

[4] Test.

Run the following command to test the model:

$ python XOR-gate-tf.py
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               multiple                  9

 dense_1 (Dense)             multiple                  4

=================================================================
Total params: 13
Trainable params: 13
Non-trainable params: 0
_________________________________________________________________

... snip ...

------------------------
x0 XOR x1 => result
========================
 0 XOR  0 => 0.0029
 0 XOR  1 => 0.9993
 1 XOR  0 => 0.9959
 1 XOR  1 => 0.0032
========================
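
The table above is produced by feeding all four inputs through the trained model; a minimal sketch of such a test loop (the exact formatting used in XOR-gate-tf.py may differ):

# Hypothetical test loop: run every input through the trained model
preds = model(X.astype("float32")).numpy()   # cast in case X is an integer array
for (x0, x1), p in zip(X, preds):
    print("{:2d} XOR {:2d} => {:.4f}".format(x0, x1, p[0]))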

5.1.2. Binary-to-Decimal Conversion

Complete Python code is available at: binary-to-decimal-conversion-tf.py

[1] Prepare the inputs and the ground-truth labels.

# ========================================
# Create datasets
# ========================================

# Inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# The ground-truth labels
Y = np.array([[0], [1], [2], [3]])


# Convert into One-Hot vector
Y = keras.utils.to_categorical(Y, 4)

"""
Y = [[1. 0. 0. 0.]
     [0. 1. 0. 0.]
     [0. 0. 1. 0.]
     [0. 0. 0. 1.]]
"""

[2] Create model.

The activation function of the output layer is the softmax function, and the loss function is CategoricalCrossentropy().
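
For a one-hot target y and a predicted distribution y_hat, the categorical cross-entropy is -sum_i y_i * log(y_hat_i), i.e. the negative log of the probability assigned to the correct class. A small numeric check (illustration only):

y_true = np.array([0.0, 0.0, 1.0, 0.0])   # correct class is 2
y_pred = np.array([0.1, 0.1, 0.7, 0.1])   # predicted distribution
loss = -np.sum(y_true * np.log(y_pred))   # = -log(0.7), approximately 0.357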

input_dim = 2
hidden_dim = 3
output_dim = 4

class SimpleNN(tf.keras.Model):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(SimpleNN, self).__init__()

        # Hidden layer
        self.f = tf.keras.layers.Dense(
            units=hidden_dim,
            input_dim=input_dim,
            activation="sigmoid",
        )

        # Output layer
        self.g = tf.keras.layers.Dense(
            units=output_dim,
            activation="softmax",
        )

    # Forward Propagation
    def call(self, x, training=None):
        h = self.f(x)
        y = self.g(h)
        return y


model = SimpleNN(input_dim, hidden_dim, output_dim)

loss_func = losses.CategoricalCrossentropy()
optimizer = optimizers.Adam(learning_rate=0.1)

[3] Training.

The train() function is identical to the one introduced in the previous subsection.

@tf.function
def train(X, Y):
    # Forward Propagation (recorded on the tape)
    with tf.GradientTape() as tape:
        y = model(X)
        loss = loss_func(Y, y)

    # Back Propagation (gradients via automatic differentiation)
    grad = tape.gradient(loss, model.trainable_variables)

    # Weights and Bias Update
    optimizer.apply_gradients(zip(grad, model.trainable_variables))

    return loss

[4] Test.

Run the following command to test the model:

$ python binary-to-decimal-conversion-tf.py
Model: "simple_nn"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               multiple                  9

 dense_1 (Dense)             multiple                  16

=================================================================
Total params: 25
Trainable params: 25
Non-trainable params: 0
epoch: 1 / 800  Loss = 1.471462
epoch: 100 / 800  Loss = 0.018342
epoch: 200 / 800  Loss = 0.006469
epoch: 300 / 800  Loss = 0.003444
epoch: 400 / 800  Loss = 0.002176
epoch: 500 / 800  Loss = 0.001514
epoch: 600 / 800  Loss = 0.001121
epoch: 700 / 800  Loss = 0.000866
epoch: 800 / 800  Loss = 0.000691
-------------------------------------------
(x0 x1) => prob(0) prob(1) prob(2) prob(3)
===========================================
(0   0) => 0.9995  0.0003  0.0002  0.0000
(0   1) => 0.0003  0.9994  0.0000  0.0004
(1   0) => 0.0003  0.0001  0.9992  0.0004
(1   1) => 0.0000  0.0005  0.0003  0.9992
===========================================
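
Because the output layer produces a probability distribution over the four classes, the decoded decimal value is simply the index of the largest probability. A hedged sketch of this decoding step:

probs = model(X.astype("float32")).numpy()   # predicted distributions, one per input
decimals = np.argmax(probs, axis=1)          # -> array([0, 1, 2, 3])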

5.2. PyTorch Version

The PyTorch versions are very similar to the TensorFlow versions, so in the following only the source code is shown, without further explanation.

5.2.1. XOR-gate

Complete Python code is available at: XOR-gate-pt.py

[1] Prepare the inputs and the ground-truth labels.

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchinfo import summary   # assumed source of the summary() call below

# Compute device (assumed setup): use the GPU when available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# The ground-truth labels
Y = np.array([[0], [1], [1], [0]])

# convert numpy array to tensor
X = torch.from_numpy(X).float()
Y = torch.from_numpy(Y).float()

[2] Create model.

input_nodes = 2
hidden_nodes = 3
output_nodes = 1

class SimpleNN(nn.Module):
    def __init__(self, input_nodes, hidden_nodes, output_nodes):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_nodes, hidden_nodes)
        self.fc2 = nn.Linear(hidden_nodes, output_nodes)

    # Forward Propagation
    def forward(self, x):
        x = self.fc1(x)
        x = F.sigmoid(x)
        x = self.fc2(x)
        x = F.sigmoid(x)
        return x


model = SimpleNN(input_nodes, hidden_nodes, output_nodes).to(device)

summary(model, input_size=X.shape)   # print a summary of the model defined above

[3] Training.

n_epochs = 10000

# set training mode
model.train()

# set training parameters
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

history_loss = []

#
# Training loop
#
for epoch in range(1, n_epochs + 1):
    train_loss = 0.0

    optimizer.zero_grad()

    # forward propagation (X is already a tensor; just move it to the device)
    outputs = model(X.to(device))

    # compute loss
    loss = criterion(outputs, Y.to(device))

    # Weights and Bias Update
    loss.backward()
    optimizer.step()

    # save loss of this epoch
    train_loss += loss.item()
    history_loss.append(train_loss)

    if epoch % 100 == 0 or epoch == 1:
        print("epoch: {} / {}  Loss = {:.4f}".format(epoch, n_epochs, loss))

[4] Test.

Run the following command to test the model:

$ python XOR-gate-pt.py
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
SimpleNN                                 [4, 1]                    --
├─Linear: 1-1                            [4, 3]                    9
├─Linear: 1-2                            [4, 1]                    4
==========================================================================================
Total params: 13
Trainable params: 13
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 0.00
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
==========================================================================================
epoch: 1 / 10000  Loss = 0.2714
epoch: 100 / 10000  Loss = 0.2581
... snip ...

------------------------
 x0 XOR x1 => result
========================
 0  XOR 0  => 0.0091
 0  XOR 1  => 0.9701
 1  XOR 0  => 0.9723
 1  XOR 1  => 0.0397
========================
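
The table above can be reproduced by switching the model to evaluation mode and running it without gradient tracking; a minimal sketch (the exact formatting used in XOR-gate-pt.py may differ):

# Hypothetical test loop for the trained PyTorch model
model.eval()
with torch.no_grad():
    preds = model(X.to(device)).cpu().numpy()

for (x0, x1), p in zip(X.int().tolist(), preds):
    print("{:2d} XOR {:2d} => {:.4f}".format(x0, x1, p[0]))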

5.2.2. Binary-to-Decimal Conversion

Complete Python code is available at: binary-to-decimal-conversion-pt.py

[1] Prepare the inputs and the ground-truth labels.

# ========================================
# Create datasets
# ========================================


# Inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# The ground-truth labels
Y = np.array([[0], [1], [2], [3]])

# Convert into One-Hot vector
Y = keras.utils.to_categorical(Y, 4)

"""
Y = [[1. 0. 0. 0.]
     [0. 1. 0. 0.]
     [0. 0. 1. 0.]
     [0. 0. 0. 1.]]
"""

# convert numpy array to tensor
X = torch.from_numpy(X).float()
Y = torch.from_numpy(Y).float()
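
The script reuses keras.utils.to_categorical for the one-hot labels; in a purely PyTorch-based setting the same labels could instead be built with torch.nn.functional.one_hot, for example:

# Alternative (illustration only): one-hot labels without Keras
Y_alt = F.one_hot(torch.tensor([0, 1, 2, 3]), num_classes=4).float()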

[2] Create model.

input_nodes = 2
hidden_nodes = 3
output_nodes = 4


class SimpleNN(nn.Module):
    def __init__(self, input_nodes, hidden_nodes, output_nodes):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_nodes, hidden_nodes)
        self.fc2 = nn.Linear(hidden_nodes, output_nodes)
        self.softmax = nn.Softmax(dim=1)

    # Forward Propagation
    def forward(self, x):
        x = self.fc1(x)
        x = F.sigmoid(x)
        x = self.fc2(x)
        x = self.softmax(x)
        return x

model = SimpleNN(input_nodes, hidden_nodes, output_nodes).to(device)

summary(model, input_size=X.shape)   # print a summary of the model defined above

[3] Training.

n_epochs = 10000

# set training mode
model.train()

# set training parameters
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
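# NOTE: nn.CrossEntropyLoss expects raw logits and applies softmax internally.
# Because forward() above already applies Softmax, softmax is effectively applied
# twice here. Training still converges (see the output below), but a more
# idiomatic setup would return logits from forward() and apply softmax only when
# reporting probabilities.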
criterion = nn.CrossEntropyLoss()

history_loss = []

#
# Training loop
#
for epoch in range(1, n_epochs + 1):
    train_loss = 0.0

    optimizer.zero_grad()

    # forward propagation (X is already a tensor; just move it to the device)
    outputs = model(X.to(device))

    # compute loss
    loss = criterion(outputs, Y.to(device))

    # Weights and Bias Update
    loss.backward()
    optimizer.step()

    # save loss of this epoch
    train_loss += loss.item()
    history_loss.append(train_loss)

    if epoch % 100 == 0 or epoch == 1:
        print("epoch: {} / {}  Loss = {:.4f}".format(epoch, n_epochs, loss))

[4] Test.

Run the following command to test the model:

$ python binary-to-decimal-conversion-pt.py
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
SimpleNN                                 [4, 4]                    --
├─Linear: 1-1                            [4, 3]                    9
├─Linear: 1-2                            [4, 4]                    16
├─Softmax: 1-3                           [4, 4]                    --
==========================================================================================
Total params: 25
Trainable params: 25
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 0.00
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
==========================================================================================
epoch: 1 / 10000  Loss = 1.3924
epoch: 100 / 10000  Loss = 1.3835
... snip ...
-------------------------------------------
(x0 x1) => prob(0) prob(1) prob(2) prob(3)
===========================================
(0   0) => 0.9979  0.0015  0.0006  0.0000
(0   1) => 0.0014  0.9977  0.0000  0.0008
(1   0) => 0.0007  0.0000  0.9983  0.0010
(1   1) => 0.0000  0.0008  0.0011  0.9982
===========================================

5.3. Keras Version

5.3.1. XOR-gate

Complete Python code is available at: XOR-gate-keras.py

[1] Prepare the inputs and the ground-truth labels.

# Imports used by the snippets in this section (assuming the tf.keras API)
import numpy as np
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import Adam

# Inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# The ground-truth labels
Y = np.array([[0], [1], [1], [0]])

# Convert row vector into column vector.
X = X.reshape(4,2,1)

[2] Create model.

Keras offers various approaches to building models. In this document, we’ll focus on the simplest method: the Sequential API.

With Sequential, building a model involves adding layers and activation functions one after another using the add() method.

Once all the layers are added, we define the loss function and optimizer using the compile() method to prepare our model for training.

input_dim = 2
hidden_dim = 3
output_dim = 1


model = Sequential()
model.add(Dense(hidden_dim, input_dim=input_dim))
model.add(Activation("sigmoid"))
model.add(Dense(output_dim))
model.add(Activation("sigmoid"))

optimizer = Adam(learning_rate=0.1)

model.compile(loss="mean_squared_error", optimizer=optimizer)
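
Note that the same network can also be written more compactly by passing the activation directly to Dense instead of adding separate Activation layers; an equivalent sketch:

# Equivalent definition (illustration only)
model = Sequential()
model.add(Dense(hidden_dim, input_dim=input_dim, activation="sigmoid"))
model.add(Dense(output_dim, activation="sigmoid"))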

[3] Training.

In Keras, the fit() method trains the model.

n_epochs = 800

# Run through the data `epochs` times
history = model.fit(X, Y, epochs=n_epochs, batch_size=1, verbose=1)
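
The History object returned by fit() records the loss of every epoch in history.history["loss"], which is handy for checking convergence or plotting the training curve; for example:

# The last entry is the loss after the final epoch
print("final loss: {:.4f}".format(history.history["loss"][-1]))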

[4] Test.

Run the following command to test the model:

$ python XOR-gate-keras.py

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               (None, 3)                 9

 activation (Activation)     (None, 3)                 0

 dense_1 (Dense)             (None, 1)                 4

 activation_1 (Activation)   (None, 1)                 0

=================================================================
Total params: 13
Trainable params: 13
Non-trainable params: 0
_________________________________________________________________

... snip ...

------------------------
x0 XOR x1 => result
========================
 0 XOR  0 => 0.0080
 0 XOR  1 => 0.9909
 1 XOR  0 => 0.9909
 1 XOR  1 => 0.0114
========================

5.3.2. Binary-to-Decimal Conversion

Complete Python code is available at: binary-to-decimal-conversion-keras.py

[1] Prepare the inputs and the ground-truth labels.

# ========================================
# Create datasets
# ========================================

# Inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# The ground-truth labels
Y = np.array([[0], [1], [2], [3]])

# Convert into One-Hot vector
Y = keras.utils.to_categorical(Y, 4)

"""
Y = [[1. 0. 0. 0.]
     [0. 1. 0. 0.]
     [0. 0. 1. 0.]
     [0. 0. 0. 1.]]
"""

# Convert row vectors into column vectors.
X = X.reshape(4, 2, 1)
Y = Y.reshape(4, 4, 1)

[2] Create model.

The activation function of the output layer is the softmax function, and the loss function is categorical_crossentropy.

input_dim = 2
hidden_dim = 3
output_dim = 4

model = Sequential()
model.add(Dense(hidden_dim, input_dim=input_dim))
model.add(Activation("sigmoid"))
model.add(Dense(output_dim))
model.add(Activation("softmax"))

optimizer = Adam(learning_rate=0.1)

model.compile(loss="categorical_crossentropy", optimizer=optimizer)

[3] Training.

Training again uses the fit() method, exactly as in the previous subsection.

n_epochs = 300

# Run through the data `epochs` times
history = model.fit(X, Y, epochs=n_epochs, batch_size=1, verbose=1)
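
As in the TensorFlow version, the predicted probabilities can be decoded back to decimal values by taking the argmax over the four outputs; a hedged sketch:

probs = model.predict(X)                 # predicted probability distribution per input
decimals = np.argmax(probs, axis=1)      # -> array([0, 1, 2, 3])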

[4] Test.

Run the following command to test the model:

$ python binary-to-decimal-conversion-keras.py

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 dense (Dense)               (None, 3)                 9

 activation (Activation)     (None, 3)                 0

 dense_1 (Dense)             (None, 4)                 16

 activation_1 (Activation)   (None, 4)                 0

=================================================================
Total params: 25
Trainable params: 25
Non-trainable params: 0

... snip ...

-------------------------------------------
(x0 x1) => prob(0) prob(1) prob(2) prob(3)
===========================================
(0   0) => 0.9977  0.0013  0.0009  0.0000
(0   1) => 0.0021  0.9974  0.0000  0.0005
(1   0) => 0.0011  0.0000  0.9976  0.0012
(1   1) => 0.0001  0.0008  0.0017  0.9974
===========================================