Introduction
There is a lot of hype surrounding Artificial Intelligence (AI). Terms like “Neural Networks”, “Deep Learning,” and “TensorFlow” are often thrown around as if they are arcane magic reserved for PhDs and Silicon Valley engineers. But the truth is far more exciting: building a neural network is more accessible than you think.
At its core, a neural network is not a brain; it is a mathematical system inspired by the way neurons fire in the human brain. It is a function approximator: given enough data, it can learn to map inputs to outputs, whether that is identifying a cat in a photo, predicting house prices, or generating text.
In this tutorial, we are going to strip away the complexity. We will guide you through building your first neural network in Python using the most popular libraries: TensorFlow and Keras. By the end of this guide, you will not only have a working AI model but also a fundamental understanding of how it works.
Prerequisites
Before we write a single line of code, we need to set up our environment. If you are a TuxAcademy learner, you know that a clean workspace is half the battle.
1. Python Environment
Ensure you have Python 3.8 or later installed. We highly recommend using a virtual environment to keep dependencies clean.
```bash
# Create a virtual environment
python -m venv tuxai-env

# Activate it (Linux/Mac)
source tuxai-env/bin/activate

# Activate it (Windows)
tuxai-env\Scripts\activate
```
2. Installing Libraries
We need three main libraries:
- NumPy: For numerical operations.
- Matplotlib: For visualizing data and training progress.
- TensorFlow: Google’s open-source library that provides Keras, our high-level API for building neural networks.
Run the following command:
```bash
pip install numpy matplotlib tensorflow
```
The Problem
To learn effectively, we need a real problem. We will use the Fashion MNIST dataset.
Fashion MNIST contains 70,000 grayscale images of clothing items:
- 10 categories: T-shirt/top, Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, Ankle boot.
- Image size: 28×28 pixels.
- Goal: Train a model to look at an image (pixel data) and correctly identify what type of clothing it is.
Step 1: Loading and Exploring the Data
Let’s import our libraries and load the dataset. Keras has built-in datasets, making this incredibly easy.
```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras

print("TensorFlow version:", tf.__version__)

# Load the dataset
fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
```
Understanding the Data Structure
Let’s check the shape of our data:
```python
print(f"Training images shape: {train_images.shape}")  # (60000, 28, 28)
print(f"Training labels shape: {train_labels.shape}")  # (60000,)
print(f"Test images shape: {test_images.shape}")       # (10000, 28, 28)
```
- 60,000 training images: Each is a 28×28 matrix of pixel values (0 to 255).
- 10,000 test images: Used to evaluate how well our model generalizes to unseen data.
We also need the class names for readable output later:
```python
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
```
Visualizing the Data
Let’s plot the first 25 images to see what we are working with. This is a crucial step in any data science project: visual inspection helps catch anomalies early.
```python
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])
plt.show()
```
Step 2: Preprocessing the Data
Raw data is rarely ready for neural networks. We have two critical preprocessing steps.
1. Normalization
Currently, pixel values range from 0 to 255. Neural networks perform much better when input values are small—usually between 0 and 1 or -1 and 1. This process is called normalization.
We simply divide the pixel values by 255.0.
```python
train_images = train_images / 255.0
test_images = test_images / 255.0
```
2. Reshaping (If Necessary)
For our first network (a Dense network), we need to flatten the 28×28 grid into a single 1D vector of 784 pixels. Keras can handle this via the Flatten layer, but we could also do it manually. We will let Keras handle it.
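For illustration, here is what the manual flattening would look like in NumPy. This is a sketch only (we use a placeholder array standing in for `train_images`); in the model below, Keras’s `Flatten` layer does this step for us:

```python
import numpy as np

# A fake batch of two 28x28 "images" standing in for train_images
images = np.zeros((2, 28, 28))

# Flatten each image into a 784-long vector; -1 lets NumPy infer the batch size
flat = images.reshape(-1, 28 * 28)

print(flat.shape)  # (2, 784)
```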
Step 3: Building the Neural Network Architecture
This is the heart of the tutorial. We are going to build a Sequential model—a linear stack of layers.
The Architecture Plan
We will use a simple three-layer architecture:
- Input Layer (Flatten): Takes the 28×28 image and turns it into a 784-long list.
- Hidden Layer (Dense): A fully connected layer with 128 neurons (units) and the ReLU activation function. This is where the “learning” happens. The neurons learn to detect patterns like edges, shapes, or textures.
- Output Layer (Dense): A layer with 10 neurons (one for each class) and the Softmax activation function. Softmax converts the raw outputs (logits) into probabilities that sum to 1.
```python
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),   # Input layer
    keras.layers.Dense(128, activation='relu'),   # Hidden layer
    keras.layers.Dense(10, activation='softmax')  # Output layer
])
```
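As a sanity check, we can compute by hand how many trainable parameters this architecture has. Each Dense layer has one weight per input-unit pair, plus one bias per unit; `model.summary()` would report the same totals:

```python
# Hidden layer: 784 inputs -> 128 units, plus 128 biases
hidden_params = 784 * 128 + 128   # 100,480

# Output layer: 128 inputs -> 10 units, plus 10 biases
output_params = 128 * 10 + 10     # 1,290

total = hidden_params + output_params
print(total)  # 101770
```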
Why ReLU and Softmax?
- ReLU (Rectified Linear Unit): max(0, x). It introduces non-linearity. Without non-linear activation functions, the entire network would behave like a linear regression model, no matter how many layers you stack.
- Softmax: Ensures the output is a probability distribution. If the model predicts [0.01, 0.05, ..., 0.80], the highest probability (0.80) is the predicted class.
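Both functions are simple enough to sketch in NumPy. This is purely illustrative; Keras applies them for us inside the layers:

```python
import numpy as np

def relu(x):
    # Negative values become 0; positive values pass through unchanged
    return np.maximum(0, x)

def softmax(logits):
    # Subtract the max for numerical stability, then normalize
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
probs = softmax(np.array([1.0, 2.0, 3.0]))
print(probs.sum())                        # 1.0 (a valid probability distribution)
```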
Step 4: Compiling the Model
Before training, we need to configure the learning process. This is done via the compile method. We need three things:
- Optimizer: The algorithm that adjusts the weights to minimize loss. Adam is a great default choice—it adapts the learning rate automatically.
- Loss Function: A measure of how wrong the model is. For classification, we use sparse_categorical_crossentropy. We use “sparse” because our labels are integers (0-9), not one-hot encoded vectors.
- Metrics: What to track during training. We care about accuracy.
```python
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```
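Conceptually, sparse categorical cross-entropy just takes the negative log of the probability the model assigned to the true class. A minimal NumPy sketch (illustrative; Keras computes this, with extra numerical safeguards, from the softmax output):

```python
import numpy as np

def sparse_categorical_crossentropy(y_true, probs):
    # probs: (batch, num_classes) softmax outputs; y_true: integer labels
    # Loss per sample is -log(probability assigned to the correct class)
    correct_class_probs = probs[np.arange(len(y_true)), y_true]
    return -np.log(correct_class_probs)

probs = np.array([[0.1, 0.8, 0.1]])  # model is 80% sure of class 1
print(sparse_categorical_crossentropy(np.array([1]), probs))  # ~[0.223]
```

A confident correct prediction gives a loss near 0; a confident wrong prediction gives a large loss, which is exactly the gradient signal training needs.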
Step 5: Training the Model (The Magic Moment)
Now we feed the data to the model. We use the fit method.
- epochs: The number of times the model iterates over the entire dataset. We’ll start with 10.
- validation_split: We reserve 20% of the training data to validate the model during training. This helps us check for overfitting.
```python
history = model.fit(train_images, train_labels, epochs=10, validation_split=0.2)
```
What happens during training?
- The model makes a random guess initially.
- It compares its guess to the actual label (loss).
- The optimizer calculates the gradient (the direction to adjust the weights).
- The weights are updated slightly to improve the guess.
- Repeat for all 48,000 training images (60,000 minus the 20% validation split), times 10 epochs.
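That loop is just gradient descent. A toy one-weight version makes the mechanics concrete (purely illustrative; a real network applies the same update rule to many thousands of weights at once):

```python
# Learn w so that w * x ≈ y for a single data point (x=2, y=6; the true w is 3)
x, y = 2.0, 6.0
w = 0.0              # initial guess
learning_rate = 0.1

for epoch in range(50):
    prediction = w * x
    loss = (prediction - y) ** 2          # squared error
    gradient = 2 * (prediction - y) * x   # direction to adjust the weight
    w -= learning_rate * gradient         # small step toward lower loss

print(round(w, 3))  # 3.0
```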
You will see output like:
```
Epoch 1/10
1500/1500 [==============================] - 3s 2ms/step - loss: 0.4985 - accuracy: 0.8245 - val_loss: 0.3945 - val_accuracy: 0.8575
...
Epoch 10/10
1500/1500 [==============================] - 3s 2ms/step - loss: 0.2199 - accuracy: 0.9185 - val_loss: 0.3381 - val_accuracy: 0.8805
```
Notice the val_accuracy (validation accuracy) is slightly lower than the training accuracy. This is normal. It means the model is memorizing the training data a bit, but it still generalizes well.
Step 6: Evaluating the Model
Training is done. Now we use the test set—data the model has never seen—to get the true performance metric.
```python
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(f'\nTest accuracy: {test_acc:.4f}')
```
Typically, the test accuracy will be around 88% to 89%. That means our model correctly identifies the type of clothing in a random image about 9 times out of 10. For a simple three-layer network trained in seconds, that’s impressive!
Step 7: Making Predictions
Let’s use the model to predict what a single image is.
```python
# Grab the first image from the test set
img = test_images[0]

# Keras expects a batch, so we add a dimension: (1, 28, 28)
img_batch = np.expand_dims(img, axis=0)

# Predict
predictions = model.predict(img_batch)
predicted_class = np.argmax(predictions[0])
actual_class = test_labels[0]

print(f"Predicted: {class_names[predicted_class]}")
print(f"Actual: {class_names[actual_class]}")
```
Visualizing Predictions
It’s helpful to see where the model gets confused. Let’s plot a few test images with their predictions.
```python
plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(test_images[i], cmap=plt.cm.binary)
    pred = np.argmax(model.predict(test_images[i:i+1])[0])
    true = test_labels[i]
    color = 'green' if pred == true else 'red'
    plt.xlabel(f"{class_names[pred]} ({class_names[true]})", color=color)
plt.show()
```
Red labels indicate misclassifications. Common confusion points are between “Shirt” and “Pullover” or “Coat” because they look similar.
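To quantify these confusions rather than eyeball them, build a confusion matrix: cell (i, j) counts how often true class i was predicted as class j. A minimal NumPy version (a sketch; TensorFlow’s `tf.math.confusion_matrix` and scikit-learn offer built-in equivalents):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes=10):
    # matrix[i, j] counts samples whose true class is i and predicted class is j
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix

# Tiny example with 3 classes: one sample of class 2 misclassified as class 0
cm = confusion_matrix(np.array([0, 1, 2, 2]), np.array([0, 1, 2, 0]), num_classes=3)
print(cm)
# [[1 0 0]
#  [0 1 0]
#  [1 0 1]]
```

On Fashion MNIST, large off-diagonal entries in the Shirt/Pullover/Coat rows would confirm the pattern you see in the red labels.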
Step 8: Analyzing Training History
We stored the training history in the history variable. Plotting the loss and accuracy over epochs gives us insight into whether the model is learning correctly.
```python
plt.figure(figsize=(12, 4))

# Plot training & validation accuracy values
plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')

# Plot training & validation loss values
plt.subplot(1, 2, 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')

plt.show()
```
What to look for:
- If the training accuracy keeps rising but validation accuracy plateaus or drops, the model is overfitting.
- If both are low, the model is underfitting (needs more epochs or a more complex architecture).
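One standard response to overfitting is early stopping: halt training once validation loss stops improving. Keras provides `keras.callbacks.EarlyStopping` for this; the underlying decision logic is simple enough to sketch in plain Python:

```python
def should_stop(val_losses, patience=3):
    """Return True once validation loss has not improved for `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    best_so_far = min(val_losses[:-patience])
    # Stop if none of the last `patience` epochs beat the earlier best
    return min(val_losses[-patience:]) >= best_so_far

# Validation loss improves, then stalls for three epochs -> stop
print(should_stop([0.50, 0.40, 0.35, 0.36, 0.37, 0.38]))  # True
# Still improving -> keep training
print(should_stop([0.50, 0.40, 0.35, 0.30]))              # False
```

In practice you would pass `callbacks=[keras.callbacks.EarlyStopping(patience=3)]` to `model.fit` rather than implementing this yourself.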
Optimization: Improving the Model
Our current model is a solid baseline. But how can we improve accuracy? Here are a few professional techniques:
1. Adding More Layers
Deep learning is “deep” because of multiple hidden layers. Adding another Dense layer can help the model learn more abstract features.
```python
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(64, activation='relu'),   # New hidden layer
    keras.layers.Dense(10, activation='softmax')
])
```
2. Dropout for Regularization
Dropout randomly turns off a percentage of neurons during training. This forces the network to not rely on any single neuron, reducing overfitting.
```python
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dropout(0.2),  # 20% dropout
    keras.layers.Dense(10, activation='softmax')
])
```
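What Dropout does under the hood can be illustrated with NumPy. This sketch uses the common “inverted dropout” formulation (surviving activations are scaled up so their expected value is unchanged); Keras’s layer behaves equivalently during training and is automatically disabled at inference time:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def dropout(activations, rate=0.2):
    # Randomly zero out roughly `rate` of the activations...
    keep_mask = rng.random(activations.shape) >= rate
    # ...and scale the survivors so the expected activation is unchanged
    return activations * keep_mask / (1.0 - rate)

acts = np.ones(10000)
dropped = dropout(acts, rate=0.2)
print((dropped == 0).mean())  # ≈ 0.2 (about 20% of neurons silenced)
print(dropped.mean())         # ≈ 1.0 (expected activation preserved)
```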
3. Changing the Optimizer
While Adam is excellent, sometimes you might want to try RMSprop or SGD with a custom learning rate.
Beyond Classification: Where to Go Next?
Congratulations! You have just built, trained, and evaluated your first neural network. But this is only the beginning. At TuxAcademy, we believe in continuous learning. Here is what you should tackle next:
- Convolutional Neural Networks (CNNs): For image data, CNNs are far superior to Dense networks because they understand spatial hierarchies (edges -> shapes -> objects). Try replacing the Flatten + Dense layers with Conv2D and MaxPooling2D layers.
- Save and Load Models: In production, you don’t retrain every time. Learn to use model.save('my_model.h5') and keras.models.load_model().
- TensorBoard: Use TensorBoard to visualize your model graphs and training metrics in a beautiful UI.
- Real-World Data: Try this on the CIFAR-10 dataset (color images) or a custom dataset from Kaggle.
Conclusion
We have covered a lot of ground. You moved from a blank Python file to a functioning AI that can distinguish between sneakers and boots, bags and shirts. You learned about:
- The structure of a neural network (input, hidden, output layers).
- Activation functions (ReLU, Softmax).
- The training process (optimizers, loss functions, epochs).
- How to evaluate and interpret results.
The barrier to entry for AI is lower than ever, but the need for ethical, knowledgeable engineers is higher than ever. At TuxAcademy, we are committed to providing you with the skills to build AI responsibly.
Key Takeaways
- Start Simple: Always build a baseline model before adding complexity.
- Normalization is Crucial: Neural networks are sensitive to input scales.
- Validate and Test: Never rely on training accuracy alone. Always check against unseen data.
- Iterate: Model building is an iterative process. Analyze your failures and improve.
Frequently Asked Questions (FAQ)
Q: Why does my accuracy vary each time I run the model?
A: Neural networks initialize weights with random values. This randomness leads to slight variations in performance. If the variation is huge, check your learning rate or batch size.
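To make runs reproducible, fix the random seeds before building the model. The NumPy half is shown below; in a full script you would also call `tf.random.set_seed(42)` so TensorFlow’s weight initialization is fixed as well:

```python
import numpy as np

np.random.seed(42)
a = np.random.rand(3)

np.random.seed(42)  # same seed -> same "random" numbers
b = np.random.rand(3)

print(np.array_equal(a, b))  # True
```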
Q: My model is overfitting. What should I do?
A: Try adding dropout, reducing the number of neurons, or using data augmentation (flipping/rotating images) to artificially increase your dataset size.
Q: Can I run this on a Raspberry Pi?
A: Yes! For inference (making predictions), a Pi is capable. For training large models, you typically need a GPU, but this Fashion MNIST example will run fine on a Pi 4 or 5.
Q: What is the difference between Keras and TensorFlow?
A: TensorFlow is the backend engine. Keras is the high-level API (Application Programming Interface) that makes it easy to build models.

