This is a simple demonstration of a deep learning model that classifies handwritten digits from the MNIST dataset using TensorFlow. The MNIST dataset consists of 60,000 28x28 pixel grayscale images for training and 10,000 images for testing, each labeled with a digit from 0 to 9.
We first load the MNIST dataset using tf.keras.datasets.mnist. The dataset is split into training and test sets:
```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Load the MNIST dataset, pre-split into training and test sets
datasets = tf.keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = datasets.load_data()
```
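As a quick sanity check (an optional step, not part of the original walkthrough), we can confirm that the arrays match the sizes described above:

```python
# Verify dataset shapes and peek at a few labels
print(train_images.shape)  # (60000, 28, 28)
print(test_images.shape)   # (10000, 28, 28)
print(train_labels[:5])    # e.g. [5 0 4 1 9]
```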
We can visualize an image from the dataset to understand the data format. Each image is a 28x28 grayscale image:
```python
# Display one training example along with its label
plt.imshow(train_images[700], cmap='gray')
plt.title(f"Label: {train_labels[700]}")
plt.axis('off')
plt.show()
```
We normalize the images by scaling the pixel values from [0, 255] to [0, 1], which helps the model train more effectively; this is done by dividing each image by 255.0. We also define the list of categories (digits 0-9) and their display names:
```python
# Scale pixel values from [0, 255] to [0, 1]
train_images = train_images / 255.0
test_images = test_images / 255.0

# The ten digit classes and their display names
categories = list(range(10))
class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
```
We create a simple feed-forward neural network with one hidden layer: the 28x28 input is flattened into a 784-element vector, passed through a 128-unit ReLU hidden layer, and mapped to 10 output logits, one per digit:
```python
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),   # 28x28 image -> 784-element vector
    tf.keras.layers.Dense(128, activation='relu'),   # hidden layer
    tf.keras.layers.Dense(10)                        # one logit per digit class
])
```
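To double-check the architecture, model.summary() reports each layer's parameter count. As a rough sketch of what to expect (the exact table formatting varies by TensorFlow version):

```python
model.summary()
# Flatten:    0 parameters (reshaping only)
# Dense(128): 784 * 128 + 128 = 100,480 parameters
# Dense(10):  128 * 10 + 10   =   1,290 parameters
# Total:      101,770 trainable parameters
```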
We compile the model with the Adam optimizer and the SparseCategoricalCrossentropy loss, which suits integer labels (as opposed to one-hot vectors). Since the final Dense layer outputs raw logits rather than probabilities, we set from_logits=True. We also track accuracy as a metric.
```python
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
```
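To make the from_logits choice concrete, here is a small illustration (not from the original notebook) of how the loss applies softmax to the raw scores internally:

```python
# With from_logits=True, the loss softmaxes the raw scores before
# computing cross-entropy: loss = -log(softmax(logits)[true_class])
logits = tf.constant([[2.0, 1.0, 0.1]])
label = tf.constant([0])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
print(loss_fn(label, logits).numpy())  # -log(0.659...) ≈ 0.417
```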
We train the model for 10 epochs using the training data:
```python
history = model.fit(train_images, train_labels, epochs=10)
```
Training progress can be tracked via the loss and accuracy reported at each epoch:

```
Epoch 1/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 4s 1ms/step - accuracy: 0.7922 - loss: 7.3062
...
Epoch 10/10
1875/1875 ━━━━━━━━━━━━━━━━━━━━ 3s 2ms/step - accuracy: 0.9532 - loss: 0.1776
```
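The history object returned by model.fit() stores these per-epoch values, so the learning curves can be plotted directly; a minimal sketch:

```python
# Plot per-epoch training loss and accuracy from the History object
plt.plot(history.history['loss'], label='loss')
plt.plot(history.history['accuracy'], label='accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.show()
```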
After training, we evaluate the model on the test set to see how well it performs:
```python
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(f'\nTest accuracy: {test_acc}')
```
The model achieved a test accuracy of 94.98%.
We then use the trained model to make predictions on the test set. Because the network outputs raw logits, we append a softmax layer to convert them into probabilities:
```python
# Wrap the trained model with a softmax layer to turn logits into probabilities
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
predictions = probability_model.predict(test_images)
```
For example, for the test image at index 20, the predicted probabilities for each class are:
```python
predictions[20]
```

```
array([7.8796272e-17, 4.3501952e-11, 6.3360421e-06, 6.6398679e-06,
       4.0793191e-03, 5.1840096e-05, 3.3385075e-09, 9.3602069e-02,
       3.3808790e-06, 9.0225041e-01], dtype=float32)
```
The predicted class is 9 (using np.argmax(predictions[20])), which matches the actual label:
```python
test_labels[20]  # 9
```
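Since predictions contains probabilities for all 10,000 test images, taking the argmax along the class axis yields every predicted label at once; as a quick cross-check (this should agree with model.evaluate up to rounding):

```python
# Predicted class for every test image
predicted_labels = np.argmax(predictions, axis=1)

# Fraction of predictions matching the true labels
manual_accuracy = np.mean(predicted_labels == test_labels)
print(f'Manually computed accuracy: {manual_accuracy:.4f}')
```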
We can visualize the predictions on the test images and compare them with the actual labels:
```python
# Show the first five test images with predicted and actual labels
for i in range(5):
    plt.grid(False)
    plt.imshow(test_images[i], cmap=plt.cm.binary)
    plt.xlabel('Actual: ' + class_names[test_labels[i]])
    plt.title('Predictions: ' + class_names[np.argmax(predictions[i])])
    plt.show()
```
This experiment demonstrates the use of a simple neural network model to classify handwritten digits from the MNIST dataset using TensorFlow. The model achieved an accuracy of approximately 95% on the test set, which is a strong result for such a simple architecture.
For further improvements, one could explore more advanced architectures (e.g., Convolutional Neural Networks), data augmentation, or hyperparameter tuning.
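As one illustration of that direction, here is a minimal CNN sketch (a hypothetical example, not part of the original notebook); the Reshape layer adds the channel dimension that Conv2D expects:

```python
# Hypothetical CNN for MNIST; same data, compile, and fit steps as above
cnn_model = tf.keras.Sequential([
    tf.keras.layers.Reshape((28, 28, 1), input_shape=(28, 28)),
    tf.keras.layers.Conv2D(32, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10)
])
cnn_model.compile(optimizer='adam',
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
```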
The full notebook, including output, is available at:
https://github.com/smiling621/Kusum/blob/main/MINIST.ipynb