5. TensorFlow, PyTorch and Keras
This chapter describes the implementation of neural networks for XOR gates and binary-to-decimal conversion using the TensorFlow, PyTorch and Keras frameworks.
5.1. TensorFlow Version
5.2. PyTorch Version
5.3. Keras Version
5.1. TensorFlow Version
5.1.1. XOR-gate
Complete Python code is available at: XOR-gate-tf.py
[1] Prepare the inputs and the ground-truth labels.
# Inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
# The ground-truth labels
Y = np.array([[0], [1], [1], [0]])
[2] Create model.
In this implementation, the XOR gate model is encapsulated in a custom class built from two dense layers.
Unlike the from-scratch XOR gates we have seen so far, TensorFlow simplifies model development by requiring only an explicit implementation of the forward propagation; the backpropagation is handled by TensorFlow’s automatic differentiation engine.
Within the class, the call() method defines the forward propagation logic.
The loss function and the optimizer are the MeanSquaredError() and Adam classes, respectively, both provided by TensorFlow.
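As a reference for how this automatic differentiation works, the following is a minimal standalone sketch (not part of XOR-gate-tf.py) that uses tf.GradientTape to differentiate a simple function:
import tensorflow as tf

# Differentiate y = x^2 at x = 3.0; the expected gradient dy/dx = 2x is 6.0.
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x
grad = tape.gradient(y, x)
print(grad.numpy())  # 6.0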
input_dim = 2
hidden_dim = 3
output_dim = 1
class SimpleNN(tf.keras.Model):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(SimpleNN, self).__init__()
        # Hidden layer
        self.f = tf.keras.layers.Dense(
            units=hidden_dim,
            input_dim=input_dim,
            activation="sigmoid",
        )
        # Output layer
        self.g = tf.keras.layers.Dense(
            units=output_dim,
            activation="sigmoid",
        )

    # Forward Propagation
    def call(self, x, training=None):
        h = self.f(x)
        y = self.g(h)
        return y
model = SimpleNN(input_dim, hidden_dim, output_dim)
loss_func = losses.MeanSquaredError()
optimizer = optimizers.Adam(learning_rate=0.1)
[3] Training.
The train() function consists of three procedures:
- Forward propagation: computes the model’s output predictions for a given input by invoking the call() method of the model.
- Backpropagation: computes the gradients of the loss function with respect to the model’s parameters (weights and biases) using the tf.GradientTape() API for automatic differentiation.
- Weights and Bias Update: adjusts the model’s parameters to nudge the predictions towards the desired targets, using the gradients computed in the previous procedure.
#
# Training function
#
@tf.function
def train(X, Y):
    with tf.GradientTape() as tape:
        # Forward Propagation
        y = model(X)
        # Back Propagation
        loss = loss_func(Y, y)
    grad = tape.gradient(loss, model.trainable_variables)
    # Weights and Bias Update
    optimizer.apply_gradients(zip(grad, model.trainable_variables))
    return loss
n_epochs = 3000
# Training loop
for epoch in range(1, n_epochs + 1):
    loss = train(X, Y)
[4] Test.
Run the following command to test the model:
$ python XOR-gate-tf.py
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) multiple 9
dense_1 (Dense) multiple 4
=================================================================
Total params: 13
Trainable params: 13
Non-trainable params: 0
_________________________________________________________________
... snip ...
------------------------
x0 XOR x1 => result
========================
0 XOR 0 => 0.0029
0 XOR 1 => 0.9993
1 XOR 0 => 0.9959
1 XOR 1 => 0.0032
========================
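The result table is produced by feeding the four inputs through the trained model; a minimal sketch of such an inference step (the exact printing code in XOR-gate-tf.py may differ):
# Inference: run the trained model on all four inputs.
preds = model(X).numpy()
for x, p in zip(X, preds):
    print("{} XOR {} => {:.4f}".format(x[0], x[1], p[0]))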
5.1.2. Binary-to-Decimal Conversion
Complete Python code is available at: binary-to-decimal-conversion-tf.py
[1] Prepare the inputs and the ground-truth labels.
# ========================================
# Create datasets
# ========================================
# Inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
# The ground-truth labels
Y = np.array([[0], [1], [2], [3]])
# Convert into One-Hot vector
Y = keras.utils.to_categorical(Y, 4)
"""
Y = [[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]]
"""
[2] Create model.
The activation function of the output layer is the softmax function, and the loss function is the CategoricalCrossentropy().
input_dim = 2
hidden_dim = 3
output_dim = 4
class SimpleNN(tf.keras.Model):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(SimpleNN, self).__init__()
        # Hidden layer
        self.f = tf.keras.layers.Dense(
            units=hidden_dim,
            input_dim=input_dim,
            activation="sigmoid",
        )
        # Output layer
        self.g = tf.keras.layers.Dense(
            units=output_dim,
            activation="softmax",
        )

    # Forward Propagation
    def call(self, x, training=None):
        h = self.f(x)
        y = self.g(h)
        return y
model = SimpleNN(input_dim, hidden_dim, output_dim)
loss_func = losses.CategoricalCrossentropy()
optimizer = optimizers.Adam(learning_rate=0.1)
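For reference, the following NumPy sketch shows what the softmax activation and the CategoricalCrossentropy() loss compute for a single sample, using made-up scores rather than values from the trained model:
# Softmax turns raw output-layer scores into probabilities that sum to 1.
z = np.array([2.0, 1.0, 0.1, -1.0])       # hypothetical scores for the 4 classes
p = np.exp(z) / np.sum(np.exp(z))          # softmax
# Categorical cross-entropy compares the probabilities with the one-hot label.
t = np.array([1.0, 0.0, 0.0, 0.0])         # ground-truth one-hot vector
loss = -np.sum(t * np.log(p))              # cross-entropy for this sample
print(p, loss)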
[3] Training.
The train() function is identical to the one introduced in the previous subsection.
@tf.function
def train(X, Y):
    with tf.GradientTape() as tape:
        # Forward Propagation
        y = model(X)
        # Back Propagation
        loss = loss_func(Y, y)
    grad = tape.gradient(loss, model.trainable_variables)
    # Weights and Bias Update
    optimizer.apply_gradients(zip(grad, model.trainable_variables))
    return loss
[4] Test.
Run the following command to test the model:
$ python binary-to-decimal-conversion-tf.py
Model: "simple_nn"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) multiple 9
dense_1 (Dense) multiple 16
=================================================================
Total params: 25
Trainable params: 25
Non-trainable params: 0
epoch: 1 / 800 Loss = 1.471462
epoch: 100 / 800 Loss = 0.018342
epoch: 200 / 800 Loss = 0.006469
epoch: 300 / 800 Loss = 0.003444
epoch: 400 / 800 Loss = 0.002176
epoch: 500 / 800 Loss = 0.001514
epoch: 600 / 800 Loss = 0.001121
epoch: 700 / 800 Loss = 0.000866
epoch: 800 / 800 Loss = 0.000691
-------------------------------------------
(x0 x1) => prob(0) prob(1) prob(2) prob(3)
===========================================
(0 0) => 0.9995 0.0003 0.0002 0.0000
(0 1) => 0.0003 0.9994 0.0000 0.0004
(1 0) => 0.0003 0.0001 0.9992 0.0004
(1 1) => 0.0000 0.0005 0.0003 0.9992
===========================================
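Each row of the table is a probability distribution over the four decimal values; the predicted decimal itself can be recovered by taking the index of the largest probability. A minimal sketch (assuming the trained model from above):
# Convert the probability rows into decimal predictions.
preds = model(X).numpy()
decimals = np.argmax(preds, axis=1)
print(decimals)   # expected: [0 1 2 3]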
5.2. PyTorch Version
The PyTorch versions are very similar to the TensorFlow versions, so in the following only the source code is shown, without further explanation.
5.2.1. XOR-gate
Complete Python code is available at: XOR-gate-pt.py
[1] Prepare the inputs and the ground-truth labels.
# Inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
# The ground-truth labels
Y = np.array([[0], [1], [1], [0]])
# convert numpy array to tensor
X = torch.from_numpy(X).float()
Y = torch.from_numpy(Y).float()
[2] Create model.
input_nodes = 2
hidden_nodes = 3
output_nodes = 1
class SimpleNN(nn.Module):
    def __init__(self, input_nodes, hidden_nodes, output_nodes):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_nodes, hidden_nodes)
        self.fc2 = nn.Linear(hidden_nodes, output_nodes)

    # Forward Propagation
    def forward(self, x):
        x = self.fc1(x)
        x = F.sigmoid(x)
        x = self.fc2(x)
        x = F.sigmoid(x)
        return x
model = SimpleNN(input_nodes, hidden_nodes, output_nodes).to(device)
summary(model=SimpleNN(input_nodes, hidden_nodes, output_nodes), input_size=X.shape)
[3] Training.
n_epochs = 10000
# set training mode
model.train()
# set training parameters
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()
history_loss = []
#
# Training loop
#
for epoch in range(1, n_epochs + 1):
    train_loss = 0.0
    optimizer.zero_grad()
    # forward propagation
    outputs = model(torch.Tensor(X).to(device))
    # compute loss
    loss = criterion(outputs, torch.Tensor(Y).to(device))
    # Weights and Bias Update
    loss.backward()
    optimizer.step()
    # save loss of this epoch
    train_loss += loss.item()
    history_loss.append(train_loss)
    if epoch % 100 == 0 or epoch == 1:
        print("epoch: {} / {} Loss = {:.4f}".format(epoch, n_epochs, loss))
[4] Test.
Run the following command to test the model:
$ python XOR-gate-pt.py
==========================================================================================
Layer (type:depth-idx) Output Shape Param #
==========================================================================================
SimpleNN [4, 1] --
├─Linear: 1-1 [4, 3] 9
├─Linear: 1-2 [4, 1] 4
==========================================================================================
Total params: 13
Trainable params: 13
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 0.00
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
==========================================================================================
epoch: 1 / 10000 Loss = 0.2714
epoch: 100 / 10000 Loss = 0.2581
... snip ...
------------------------
x0 XOR x1 => result
========================
0 XOR 0 => 0.0091
0 XOR 1 => 0.9701
1 XOR 0 => 0.9723
1 XOR 1 => 0.0397
========================
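The result table is obtained by switching the model to evaluation mode and running it without gradient tracking; a minimal sketch of such an inference step (the exact printing code in XOR-gate-pt.py may differ):
# Inference: gradients are not needed for testing.
model.eval()
with torch.no_grad():
    preds = model(X.to(device))
for x, p in zip(X, preds.cpu()):
    print("{:.0f} XOR {:.0f} => {:.4f}".format(x[0].item(), x[1].item(), p[0].item()))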
5.2.2. Binary-to-Decimal Conversion
Complete Python code is available at: binary-to-decimal-conversion-pt.py
[1] Prepare the inputs and the ground-truth labels.
# ========================================
# Create datasets
# ========================================
# Inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
# The ground-truth labels
Y = np.array([[0], [1], [2], [3]])
# Convert into One-Hot vector
Y = keras.utils.to_categorical(Y, 4)
"""
Y = [[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]]
"""
# convert numpy array to tensor
X = torch.from_numpy(X).float()
Y = torch.from_numpy(Y).float()
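The Keras helper is borrowed here only for the one-hot conversion; for reference, the same labels could be built with PyTorch alone (a sketch, not part of binary-to-decimal-conversion-pt.py):
# PyTorch-only alternative to keras.utils.to_categorical.
labels = torch.tensor([0, 1, 2, 3])
Y_equivalent = torch.nn.functional.one_hot(labels, num_classes=4).float()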
[2] Create model.
input_nodes = 2
hidden_nodes = 3
output_nodes = 4
class SimpleNN(nn.Module):
    def __init__(self, input_nodes, hidden_nodes, output_nodes):
        super(SimpleNN, self).__init__()
        self.fc1 = nn.Linear(input_nodes, hidden_nodes)
        self.fc2 = nn.Linear(hidden_nodes, output_nodes)
        self.softmax = nn.Softmax(dim=1)

    # Forward Propagation
    def forward(self, x):
        x = self.fc1(x)
        x = F.sigmoid(x)
        x = self.fc2(x)
        x = self.softmax(x)
        return x
model = SimpleNN(input_nodes, hidden_nodes, output_nodes).to(device)
summary(model=SimpleNN(input_nodes, hidden_nodes, output_nodes), input_size=X.shape)
[3] Training.
n_epochs = 10000
# set training mode
model.train()
# set training parameters
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()
history_loss = []
#
# Training loop
#
for epoch in range(1, n_epochs + 1):
    train_loss = 0.0
    optimizer.zero_grad()
    # forward propagation
    outputs = model(torch.Tensor(X).to(device))
    # compute loss
    loss = criterion(outputs, torch.Tensor(Y).to(device))
    # Weights and Bias Update
    loss.backward()
    optimizer.step()
    # save loss of this epoch
    train_loss += loss.item()
    history_loss.append(train_loss)
    if epoch % 100 == 0 or epoch == 1:
        print("epoch: {} / {} Loss = {:.4f}".format(epoch, n_epochs, loss))
[4] Test.
Run the following command to test the model:
$ python binary-to-decimal-conversion-pt.py
==========================================================================================
Layer (type:depth-idx) Output Shape Param #
==========================================================================================
SimpleNN [4, 4] --
├─Linear: 1-1 [4, 3] 9
├─Linear: 1-2 [4, 4] 16
├─Softmax: 1-3 [4, 4] --
==========================================================================================
Total params: 25
Trainable params: 25
Non-trainable params: 0
Total mult-adds (Units.MEGABYTES): 0.00
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
==========================================================================================
epoch: 1 / 10000 Loss = 1.3924
epoch: 100 / 10000 Loss = 1.3835
... snip ...
-------------------------------------------
(x0 x1) => prob(0) prob(1) prob(2) prob(3)
===========================================
(0 0) => 0.9979 0.0015 0.0006 0.0000
(0 1) => 0.0014 0.9977 0.0000 0.0008
(1 0) => 0.0007 0.0000 0.9983 0.0010
(1 1) => 0.0000 0.0008 0.0011 0.9982
===========================================
5.3. Keras Version
5.3.1. XOR-gate
Complete Python code is available at: XOR-gate-keras.py
[1] Prepare the inputs and the ground-truth labels.
# Inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
# The ground-truth labels
Y = np.array([[0], [1], [1], [0]])
# Convert each input row vector into a column vector.
X = X.reshape(4, 2, 1)
[2] Create model.
Keras offers various approaches to building models. In this document, we’ll focus on the simplest method: the Sequential API.
With Sequential, building a model involves adding layers and activation functions one after another using the add() method.
Once all the layers are added, we define the loss function and optimizer using the compile() method to prepare our model for training.
input_dim = 2
hidden_dim = 3
output_dim = 1
model = Sequential()
model.add(Dense(hidden_dim, input_dim=input_dim))
model.add(Activation("sigmoid"))
model.add(Dense(output_dim))
model.add(Activation("sigmoid"))
optimizer = Adam(learning_rate=0.1)
model.compile(loss="mean_squared_error", optimizer=optimizer)
[3] Training.
In Keras, the fit() method trains the model.
n_epochs = 800
# Run through the data `epochs` times
history = model.fit(X, Y, epochs=n_epochs, batch_size=1, verbose=1)
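The History object returned by fit() records the loss of every epoch, which is convenient for inspecting or plotting the training curve:
# Per-epoch training loss recorded by fit().
loss_per_epoch = history.history["loss"]
print(loss_per_epoch[-1])   # loss of the final epoch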
[4] Test.
Run the following command to test the model:
$ python XOR-gate-keras.py
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 3) 9
activation (Activation) (None, 3) 0
dense_1 (Dense) (None, 1) 4
activation_1 (Activation) (None, 1) 0
=================================================================
Total params: 13
Trainable params: 13
Non-trainable params: 0
_________________________________________________________________
... snip ...
------------------------
x0 XOR x1 => result
========================
0 XOR 0 => 0.0080
0 XOR 1 => 0.9909
1 XOR 0 => 0.9909
1 XOR 1 => 0.0114
========================
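The result table comes from running the trained model on the four inputs; a minimal sketch using model.predict(), assuming plain 2-dimensional input rows (the exact code in XOR-gate-keras.py may differ):
# Inference with the Sequential model.
X_test = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
preds = model.predict(X_test)
for x, p in zip(X_test, preds):
    print("{} XOR {} => {:.4f}".format(x[0], x[1], p[0]))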
5.3.2. Binary-to-Decimal Conversion
Complete Python code is available at: binary-to-decimal-conversion-keras.py
[1] Prepare the inputs and the ground-truth labels.
# ========================================
# Create datasets
# ========================================
# Inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
# The ground-truth labels
Y = np.array([[0], [1], [2], [3]])
# Convert into One-Hot vector
Y = keras.utils.to_categorical(Y, 4)
"""
Y = [[1. 0. 0. 0.]
[0. 1. 0. 0.]
[0. 0. 1. 0.]
[0. 0. 0. 1.]]
"""
# Convert row vectors into column vectors.
X = X.reshape(4, 2, 1)
Y = Y.reshape(4, 4, 1)
[2] Create model.
The activation function of the output layer is the softmax function, and the loss function is the categorical_crossentropy.
input_dim = 2
hidden_dim = 3
output_dim = 4
model = Sequential()
model.add(Dense(hidden_dim, input_dim=input_dim))
model.add(Activation("sigmoid"))
model.add(Dense(output_dim))
model.add(Activation("softmax"))
optimizer = Adam(learning_rate=0.1)
model.compile(loss="categorical_crossentropy", optimizer=optimizer)
[3] Training.
The training procedure is the same as in the previous subsection; only the number of epochs differs.
n_epochs = 300
# Run through the data `epochs` times
history = model.fit(X, Y, epochs=n_epochs, batch_size=1, verbose=1)
[4] Test.
Run the following command to test the model:
$ python binary-to-decimal-conversion-keras.py
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 3) 9
activation (Activation) (None, 3) 0
dense_1 (Dense) (None, 4) 16
activation_1 (Activation) (None, 4) 0
=================================================================
Total params: 25
Trainable params: 25
Non-trainable params: 0
... snip ...
-------------------------------------------
(x0 x1) => prob(0) prob(1) prob(2) prob(3)
===========================================
(0 0) => 0.9977 0.0013 0.0009 0.0000
(0 1) => 0.0021 0.9974 0.0000 0.0005
(1 0) => 0.0011 0.0000 0.9976 0.0012
(1 1) => 0.0001 0.0008 0.0017 0.9974
===========================================