Neural Network from Scratch Part 2: Building an MLP in Python (No Frameworks)

Asher Best • February 21, 2026

In this post, I will apply what I learned in part 1 to build a Multilayer Perceptron (MLP) in Python without frameworks. If you haven’t yet, I highly suggest you check out part 1, which covers the fundamentals of neural networks and how they work. This post assumes you have a base-level understanding of both neural networks and Python. The source code is available on GitHub, and I encourage you to check that out after you read this post.
Applying the Foundations of Neural Networks to Build an MLP
Image Preprocessing and Dataset Loading
Let’s translate theory into application. But first, I need to mention a couple of aspects of my network that are prerequisites to what I aimed to accomplish. The first is image preprocessing. The images I run through the neural network need to share consistent attributes; otherwise, the results could be skewed. I decided that each image should be converted to 64x64 pixels and grayscale for processing. Through preprocessing, we can convert each pixel of the image to a grayscale value between 0 and 1 (0 being black, 1 being white) and feed each value into an input neuron in the network. Since each image is preprocessed to 64x64 pixels, the input layer will have 4,096 (64 x 64) neurons.
./mlp/preprocessor.py

import cv2
import numpy as np

class ImagePreprocessor:
    def __init__(self, size=(64, 64), color=cv2.COLOR_BGR2GRAY):
        self.size = size
        self.color = color

    # resize image, convert color, scale to [0, 1], and reshape
    def preprocess(self, img_path):
        image = cv2.imread(img_path)
        if image is None:
            raise FileNotFoundError(f"Could not read image: {img_path}")
        image = cv2.resize(image, self.size)
        image = cv2.cvtColor(image, self.color)
        image = image / 255.0

        image = image.reshape(self.size[0], self.size[1], 1)
        return image
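Before wiring the preprocessor into a loader, it can help to sanity-check the scaling and reshape steps in isolation. Here is a minimal, cv2-free sketch that uses a random array as a stand-in for a decoded 64x64 grayscale image (the array itself is fabricated for illustration):

```python
import numpy as np

# stand-in for a decoded grayscale image: 64x64 pixel values in [0, 255]
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(64, 64)).astype(np.float64)

# scale to [0, 1] and add a channel dimension, mirroring preprocess()
scaled = fake_image / 255.0
reshaped = scaled.reshape(64, 64, 1)

print(reshaped.shape)  # (64, 64, 1)
print(float(scaled.min()) >= 0.0 and float(scaled.max()) <= 1.0)  # True
```

Flattening this (64, 64, 1) array, as the dataset loader below does, yields the 4,096 values fed to the input layer.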
Now that I was able to preprocess images, I needed a class that would let me read images locally, preprocess each image, flatten the data, and label each sample to indicate whether or not it is a galaxy. You might be asking: why not use pre-compiled image datasets and Python packages that simplify image transformations? I wanted to attempt building this network from the ground up on the first go-around. I will be doing a part 3 blog post about Convolutional Neural Networks (CNNs), where I use PyTorch and pre-compiled datasets to improve my galaxy detector, since MLPs are not ideal for image recognition. Stay tuned for that.
./mlp/ds_loader.py

import os
import numpy as np

class DatasetLoader:
    def __init__(self, galaxy_dir, non_galaxy_dir, preprocessor, flatten=False):
        self.galaxy_dir = galaxy_dir
        self.non_galaxy_dir = non_galaxy_dir
        self.preprocessor = preprocessor
        self.flatten = flatten
        self.data = []

    # loop through the directory,
    # preprocess each image,
    # and append (image, label) tuples
    def append_data(self, directory, label):
        with os.scandir(directory) as entries:
            for entry in entries:
                if entry.is_file():
                    img_data = self.preprocessor.preprocess(entry.path)
                    if self.flatten:
                        img_data = img_data.flatten()
                    self.data.append((img_data, label))

    # append data for each directory,
    # then shuffle and return the data
    def load(self):
        self.append_data(self.galaxy_dir, 1)
        self.append_data(self.non_galaxy_dir, 0)
        np.random.shuffle(self.data)
        return self.data
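To see the labeling-and-shuffle pattern without touching the filesystem, here is a small sketch that substitutes in-memory arrays for images on disk (the tiny arrays are made up for illustration):

```python
import numpy as np

# stand-in "images": small flattened arrays instead of files on disk
galaxies = [np.ones(4) * i for i in range(3)]       # will get label 1
non_galaxies = [np.zeros(4) + i for i in range(2)]  # will get label 0

data = [(img, 1) for img in galaxies] + [(img, 0) for img in non_galaxies]
np.random.shuffle(data)  # in-place shuffle, as in DatasetLoader.load()

# shuffling reorders the samples but preserves the (image, label) pairing
print(sorted(label for _, label in data))  # [0, 0, 1, 1, 1]
```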
Initializing Network Parameters
We now have a means to preprocess images and load a dataset locally, so it is time to define the network class. I import the modules needed for later parts of the code, store the layer parameters, and initialize random weights and biases with NumPy. The weights and biases are randomly generated based on the layers provided. In the case of my network, the layers parameter is [4096, 256, 128, 1], with the first index being the input layer, the last index being the output layer, and everything between being the hidden layers.
./mlp/network.py

import numpy as np
import time
from ds_loader import DatasetLoader
from preprocessor import ImagePreprocessor

class Network:
    def __init__(self, layers):
        self.layers = layers
        self.num_layers = len(self.layers)
        self.weights = [np.random.randn(y, x) for x, y in zip(layers[:-1], layers[1:])]
        self.biases = [np.random.randn(y, 1) for y in layers[1:]]
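A quick way to convince yourself the two list comprehensions produce the right matrices is to print their shapes for the layer sizes used in this post: each weight matrix is (neurons in this layer, neurons in the previous layer), and each bias is a column vector.

```python
import numpy as np

layers = [4096, 256, 128, 1]
weights = [np.random.randn(y, x) for x, y in zip(layers[:-1], layers[1:])]
biases = [np.random.randn(y, 1) for y in layers[1:]]

print([w.shape for w in weights])  # [(256, 4096), (128, 256), (1, 128)]
print([b.shape for b in biases])   # [(256, 1), (128, 1), (1, 1)]
```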
Defining the Forward Pass and Sigmoid Functions
The forward pass and sigmoid functions are next. The methods below enable our network to calculate each neuron’s activation and pass these values through each layer of the network until finally reaching the output. I first reshape the input x into a column vector and store it in a list called activations. For each layer’s weights and bias, I take the dot product of the weights and the previous activations and add the bias to derive the weighted input z, then apply the sigmoid to z to get the next layer’s activations. Both values are stored in their respective lists.
./mlp/network.py

# ...previous code:
# imports and Network class

    def sigmoid(self, z):
        return 1.0 / (1.0 + np.exp(-z))

    # iterate over each layer,
    # calculate the activation for each neuron,
    # and return all activations and the weighted inputs (zs)
    def forward_pass(self, x):
        activation = x.reshape(-1, 1)
        activations = [activation]

        zs = []

        for w, b in zip(self.weights, self.biases):
            z = np.dot(w, activation) + b
            zs.append(z)
            activation = self.sigmoid(z)
            activations.append(activation)

        return activations, zs
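Here is a self-contained sketch of the same logic written as free functions, run on a tiny [4, 3, 1] network instead of the full [4096, 256, 128, 1] so the shapes are easy to follow (the layer sizes and random seed are arbitrary choices for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# same logic as Network.forward_pass, as a free function for illustration
def forward_pass(weights, biases, x):
    activation = x.reshape(-1, 1)
    activations = [activation]
    zs = []
    for w, b in zip(weights, biases):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    return activations, zs

layers = [4, 3, 1]  # tiny network standing in for [4096, 256, 128, 1]
rng = np.random.default_rng(42)
weights = [rng.standard_normal((y, x)) for x, y in zip(layers[:-1], layers[1:])]
biases = [rng.standard_normal((y, 1)) for y in layers[1:]]

activations, zs = forward_pass(weights, biases, rng.standard_normal(4))
print(activations[-1].shape)               # (1, 1)
print(0.0 < activations[-1].item() < 1.0)  # True
```

Because the sigmoid squashes every weighted input into (0, 1), the final activation can be read directly as a probability-like score.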
Backpropagation and Mini-batch Stochastic Gradient Descent (SGD)
Now that I have defined methods for both the forward pass and sigmoid, I can write the code for backpropagation and mini-batch stochastic gradient descent. The method below initializes zero-filled lists matching the shapes of the weights and biases, respectively. The input (x) is passed to the forward pass method to return the activations and the z values. The output error (delta) is the difference between the final activation and the label; because the output layer pairs a sigmoid with binary cross-entropy loss, the sigmoid derivative cancels and the error simplifies to this difference. That delta becomes the last entry of the nabla_b list, and the outer product of delta with the second-to-last set of activations becomes the last entry of the nabla_w list. For each layer starting at 2, we take advantage of negative indices in Python to move backwards through the network, calculating the sigmoid prime and the delta and setting the corresponding entries in both nabla_b and nabla_w. The sigmoid prime method is included below as well and takes the derivative of the sigmoid function shown previously.
./mlp/network.py

# ...previous code:
# imports and Network class
# sigmoid and forward_pass functions

    def sigmoid_prime(self, z):
        return self.sigmoid(z) * (1 - self.sigmoid(z))

    # compute the output error,
    # then backprop through the hidden layers
    def backpropagation(self, x, y):
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        nabla_b = [np.zeros(b.shape) for b in self.biases]

        activations, zs = self.forward_pass(x)

        delta = activations[-1] - y
        nabla_b[-1] = delta
        nabla_w[-1] = np.outer(delta, activations[-2])

        for l in range(2, self.num_layers):
            z = zs[-l]
            sp = self.sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].T, delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.outer(delta, activations[-l-1])

        return nabla_w, nabla_b
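A standard way to verify backpropagation code like this is a numerical gradient check: perturb one weight by a small epsilon, measure how the loss changes, and compare against the analytic gradient. Below is a self-contained sketch that re-implements the same logic as free functions on a tiny [4, 3, 1] network (the layer sizes, seed, and epsilon are arbitrary illustration choices, not from the original post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z) * (1 - sigmoid(z))

def forward(ws, bs, x):
    a = x.reshape(-1, 1)
    activations, zs = [a], []
    for w, b in zip(ws, bs):
        z = np.dot(w, a) + b
        zs.append(z)
        a = sigmoid(z)
        activations.append(a)
    return activations, zs

# same logic as Network.backpropagation
def backprop(ws, bs, x, y):
    nw = [np.zeros_like(w) for w in ws]
    nb = [np.zeros_like(b) for b in bs]
    acts, zs = forward(ws, bs, x)
    delta = acts[-1] - y
    nb[-1], nw[-1] = delta, np.outer(delta, acts[-2])
    for l in range(2, len(ws) + 1):
        delta = np.dot(ws[-l + 1].T, delta) * sigmoid_prime(zs[-l])
        nb[-l], nw[-l] = delta, np.outer(delta, acts[-l - 1])
    return nw, nb

def bce_loss(ws, bs, x, y):
    a = forward(ws, bs, x)[0][-1]
    return float((-(y * np.log(a) + (1 - y) * np.log(1 - a))).item())

rng = np.random.default_rng(0)
layers = [4, 3, 1]
ws = [rng.standard_normal((j, i)) for i, j in zip(layers[:-1], layers[1:])]
bs = [rng.standard_normal((j, 1)) for j in layers[1:]]
x, y = rng.standard_normal(4), 1

analytic = backprop(ws, bs, x, y)[0][0][0, 0]

# central finite difference on a single weight
eps = 1e-5
w_plus = [w.copy() for w in ws]
w_minus = [w.copy() for w in ws]
w_plus[0][0, 0] += eps
w_minus[0][0, 0] -= eps
numeric = (bce_loss(w_plus, bs, x, y) - bce_loss(w_minus, bs, x, y)) / (2 * eps)

print(abs(numeric - analytic) < 1e-6)  # True
```

Agreement between the analytic and numerical gradients is strong evidence that both the chain-rule bookkeeping and the simplified output error (activations[-1] - y) are correct.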
In order to call our backpropagation method, a method needs to be defined for mini-batch stochastic gradient descent. This method also initializes zero-filled NumPy arrays matching the weights and biases. For each example in the mini-batch, the input (x) and label (y) are passed to the backpropagation method to retrieve the gradients of the weights and biases, and those gradients are summed into the nabla_w and nabla_b lists. Finally, our class-level weights and biases are updated with the gradient descent formula, which applies the provided learning rate to the gradients averaged across the current mini-batch. You might see the learn_rate argument named eta in other code examples; I decided to use learn_rate as it felt more readable and intuitive to me.
./mlp/network.py

# ...previous code:
# imports and Network class
# sigmoid and forward_pass functions
# sigmoid_prime and backpropagation functions

    # iterate over the mini-batch,
    # run backprop and accumulate gradients,
    # then update weights and biases
    def update_mini_batch(self, mini_batch, learn_rate):
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        nabla_b = [np.zeros(b.shape) for b in self.biases]

        for x, y in mini_batch:
            delta_nabla_w, delta_nabla_b = self.backpropagation(x, y)
            nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
            nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]

        self.weights = [w - (learn_rate / len(mini_batch)) * nw for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b - (learn_rate / len(mini_batch)) * nb for b, nb in zip(self.biases, nabla_b)]
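The update rule itself can be illustrated with a single scalar weight: the gradients are summed over the mini-batch, averaged by dividing by the batch size, scaled by the learning rate, and subtracted from the weight. All of the numbers below are fabricated for illustration.

```python
import numpy as np

learn_rate = 0.01
w = np.array([[0.5]])

# made-up gradients from a mini-batch of 4 examples (sum = 1.0, mean = 0.25)
grads = [np.array([[0.25]]), np.array([[0.5]]), np.array([[0.125]]), np.array([[0.125]])]

nabla_w = np.zeros_like(w)
for g in grads:
    nabla_w = nabla_w + g  # accumulate, as in update_mini_batch

w = w - (learn_rate / len(grads)) * nabla_w
print(w)  # approximately [[0.4975]], i.e. 0.5 - 0.01 * 0.25
```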
Training Iterations and Evaluation
Time to build the part of our code that will iterate over training batches and evaluate the results! The binary cross-entropy method is our loss function. The evaluate method calculates the total correct predictions and total loss by iterating over the provided dataset, retrieving the output activation, accumulating the loss, and incrementing the correct count when the thresholded prediction matches the label (y). Finally, we calculate and return the average loss and accuracy.
./mlp/network.py

# ...previous code:
# imports and Network class
# sigmoid and forward_pass functions
# sigmoid_prime and backpropagation functions
# update_mini_batch function

    def binary_cross_entropy(self, y, a):
        loss = -(y * np.log(a + 1e-8) + (1 - y) * np.log(1 - a + 1e-8))
        return float(loss.item())  # convert (1, 1) array to scalar

    # evaluate average loss and accuracy on a dataset
    def evaluate(self, data):
        correct = 0
        total_loss = 0

        for x, y in data:
            activations, _ = self.forward_pass(x)
            a = activations[-1]
            total_loss += self.binary_cross_entropy(y, a)

            prediction = 1 if a >= 0.5 else 0
            if prediction == y:
                correct += 1

        avg_loss = total_loss / len(data)
        accuracy = correct / len(data)

        return avg_loss, accuracy
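To build intuition for the loss function, it helps to evaluate it on a couple of hand-picked activations: a confident correct prediction is penalized lightly, while a confident wrong one is penalized heavily. The 0.95 and 0.05 activations below are made-up examples.

```python
import numpy as np

def binary_cross_entropy(y, a):
    loss = -(y * np.log(a + 1e-8) + (1 - y) * np.log(1 - a + 1e-8))
    return float(loss.item())

# true label is 1 (galaxy) in both cases
confident_right = binary_cross_entropy(1, np.array([[0.95]]))
confident_wrong = binary_cross_entropy(1, np.array([[0.05]]))

print(f"{confident_right:.3f}")  # 0.051
print(f"{confident_wrong:.3f}")  # 2.996
```

The 1e-8 term keeps the logarithm finite when an activation lands at exactly 0 or 1.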
The train method iterates over each epoch, shuffles the data, creates the mini-batches, calls the update mini batch method, displays the metrics, and, if validation data is provided, evaluates the model against it as well.
./mlp/network.py

# ...previous code:
# imports and Network class
# sigmoid and forward_pass functions
# sigmoid_prime and backpropagation functions
# update_mini_batch function
# binary_cross_entropy and evaluate functions

    # split dataset into mini-batches,
    # call update_mini_batch for each mini-batch,
    # repeat for multiple epochs,
    # and compute average loss for each epoch
    def train(self, training_data, epochs, mini_batch_size, learn_rate, validation_data=None, plot=True):
        n = len(training_data)
        train_losses = []
        train_accs = []
        val_accs = []

        for epoch in range(epochs):
            epoch_start = time.time()

            np.random.shuffle(training_data)
            mini_batches = [training_data[k:k+mini_batch_size] for k in range(0, n, mini_batch_size)]
            for mini_batch in mini_batches:
                self.update_mini_batch(mini_batch, learn_rate)

            avg_loss, accuracy = self.evaluate(training_data)
            train_losses.append(avg_loss)
            train_accs.append(accuracy)

            metrics = f'Epoch {epoch + 1}: Loss = {avg_loss:.3f}: Train Acc = {(accuracy * 100):.1f}%'

            if validation_data:
                _, val_accuracy = self.evaluate(validation_data)
                val_accs.append(val_accuracy)
                metrics += f': Val Acc = {(val_accuracy * 100):.1f}%'

            epoch_time = time.time() - epoch_start
            metrics += f': Time = {epoch_time:.2f}s'

            print(metrics)
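One detail worth noticing in the slicing comprehension: when the dataset size is not a multiple of the mini-batch size, the last batch is simply smaller, which is exactly why update_mini_batch divides by len(mini_batch) rather than a fixed batch size. A tiny illustration with a made-up ten-element dataset:

```python
data = list(range(10))
mini_batch_size = 4

mini_batches = [data[k:k + mini_batch_size] for k in range(0, len(data), mini_batch_size)]
print(mini_batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print([len(mb) for mb in mini_batches])  # [4, 4, 2]
```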
Time to test out the network! I have two directories: one for galaxy images I downloaded from Galaxy Zoo and one for random non-galaxy images I downloaded via API from Lorem Picsum. Generally, the more images you have, the better accuracy your network can achieve. For testing purposes, I have just over 2,000 training samples. It is a rather small training set, but it will serve our purpose for this exercise, and we can always scale up later on. First, I call the load method from the DatasetLoader class to preprocess and load the images from the provided directories. I then split the data into a training set and a validation set so I can check for overfitting and underfitting. Finally, I set the layers of my network and call the train method with the provided training parameters: 20 epochs, a mini-batch size of 100, and a learning rate of 0.01. Let’s see how it goes.
./mlp/network.py

# ...previous code:
# imports and Network class
# sigmoid and forward_pass functions
# sigmoid_prime and backpropagation functions
# update_mini_batch function
# binary_cross_entropy and evaluate functions
# train function

if __name__ == "__main__":
    loader = DatasetLoader(
        galaxy_dir='../data/gz2/images/587722',
        non_galaxy_dir='../data/images/non_galaxies',
        preprocessor=ImagePreprocessor(),
        flatten=True
    )
    training_data = loader.load()
    print(f"Total training samples: {len(training_data)}")
    split = int(0.8 * len(training_data))
    training_set = training_data[:split]
    validation_set = training_data[split:]
    net = Network(layers=[4096, 256, 128, 1])
    net.train(training_data=training_set, epochs=20, mini_batch_size=100, learn_rate=0.01, validation_data=validation_set)
As you can see from the output below (and the visuals plotted with matplotlib for the graph lovers out there), the loss steadily decreases and the training accuracy increases with each epoch, with the latter half climbing to over 90% accuracy. Not bad! Additionally, the training accuracy and validation accuracy stay very close together through each iteration, which tells us the model is not overfitting. Each epoch runs fairly slowly given that I am only training on a couple thousand samples; I would expect something closer to 2 - 3 seconds per epoch. However, this is to be expected when doing matrix operations with vanilla NumPy on a CPU. There are tricks we could incorporate to improve this speed, but that is beyond the scope of this blog post.
\mlp> python network.py
Total training samples: 2050
Epoch 1: Loss = 0.843: Train Acc = 87.0%: Val Acc = 87.6%: Time = 8.99s
Epoch 2: Loss = 0.687: Train Acc = 86.0%: Val Acc = 86.1%: Time = 9.22s
Epoch 3: Loss = 0.574: Train Acc = 84.5%: Val Acc = 84.9%: Time = 8.99s
Epoch 4: Loss = 0.490: Train Acc = 84.5%: Val Acc = 84.9%: Time = 9.16s
Epoch 5: Loss = 0.421: Train Acc = 84.9%: Val Acc = 84.9%: Time = 9.10s
Epoch 6: Loss = 0.371: Train Acc = 86.3%: Val Acc = 85.9%: Time = 9.07s
Epoch 7: Loss = 0.329: Train Acc = 87.9%: Val Acc = 87.3%: Time = 8.91s
Epoch 8: Loss = 0.296: Train Acc = 89.2%: Val Acc = 88.5%: Time = 8.91s
Epoch 9: Loss = 0.269: Train Acc = 89.9%: Val Acc = 90.0%: Time = 9.01s
Epoch 10: Loss = 0.247: Train Acc = 90.6%: Val Acc = 90.5%: Time = 9.20s
Epoch 11: Loss = 0.229: Train Acc = 91.3%: Val Acc = 90.7%: Time = 9.24s
Epoch 12: Loss = 0.213: Train Acc = 92.2%: Val Acc = 91.5%: Time = 9.26s
Epoch 13: Loss = 0.199: Train Acc = 92.7%: Val Acc = 92.0%: Time = 8.86s
Epoch 14: Loss = 0.187: Train Acc = 93.0%: Val Acc = 92.7%: Time = 8.87s
Epoch 15: Loss = 0.176: Train Acc = 93.5%: Val Acc = 92.7%: Time = 8.73s
Epoch 16: Loss = 0.167: Train Acc = 94.0%: Val Acc = 93.4%: Time = 8.68s
Epoch 17: Loss = 0.158: Train Acc = 94.3%: Val Acc = 93.7%: Time = 8.71s
Epoch 18: Loss = 0.151: Train Acc = 94.7%: Val Acc = 93.7%: Time = 8.80s
Epoch 19: Loss = 0.144: Train Acc = 95.2%: Val Acc = 94.4%: Time = 8.77s
Epoch 20: Loss = 0.138: Train Acc = 95.4%: Val Acc = 94.9%: Time = 9.04s


Final Thoughts and What Comes Next
Thanks for taking the time to read this post. I hope you came away with some ideas and insights into how MLPs work and how to build a neural network yourself from scratch, without frameworks such as PyTorch or TensorFlow. In part 3, I will explain how I enhanced this neural network into a Convolutional Neural Network (CNN) using PyTorch; CNNs are far better equipped to handle image classification tasks like this one. Additionally, I will demonstrate how I converted the CNN into a usable web app where users can upload an image and receive a response as to whether it is a galaxy or not. Stay tuned!