Solving Sudoku with AI
Hello, fellow techies! Today, I want to share with you a personal project I worked on to solve Sudoku puzzles using machine learning and OCR. I am always looking for new and exciting projects to work on, and this one was no exception.
Sudoku is a popular number puzzle that requires you to fill a 9x9 grid with digits from 1 to 9 so that each row, column, and 3x3 sub-grid contains all the digits from 1 to 9. While solving Sudoku puzzles manually can be a lot of fun, it can also be time-consuming and challenging. That's where machine learning and OCR come in.
OCR, or optical character recognition, is a technology that allows computers to recognize and interpret printed or handwritten characters. By using OCR, we can read in a Sudoku puzzle and convert it into a format that a machine learning algorithm can understand.
To begin, we first need to extract the puzzle from an image using image processing techniques. Once we have the puzzle, we can use OCR to recognize the digits in each cell of the grid. We can then pass this data into a machine learning algorithm that will solve the puzzle for us.
Here is an example of how we can extract the puzzle from an image using Python and the OpenCV library:
import cv2
# Load the image
img = cv2.imread('sudoku.png')
# Convert to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Apply thresholding to binarize the image
_, thresh = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY_INV)
# Find the contours of the puzzle grid
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Sort contours by area and find the largest one
contours = sorted(contours, key=cv2.contourArea, reverse=True)[:1]
# Get the coordinates of the corners of the puzzle grid
corners = cv2.approxPolyDP(contours[0], 0.01*cv2.arcLength(contours[0], True), True)
Once we have the puzzle extracted from the image, we can use OCR to recognize the digits in each cell. We can use the Tesseract OCR engine, which is an open-source OCR engine that can recognize digits with high accuracy.
import pytesseract
# Load the puzzle image
puzzle = cv2.imread('puzzle.png')
# Split the puzzle into cells
cells = []
for i in range(9):
for j in range(9):
cell = puzzle[i*50:(i+1)*50, j*50:(j+1)*50]
cells.append(cell)
# Use OCR to recognize the digits in each cell
digits = []
for cell in cells:
digit = pytesseract.image_to_string(cell, config='--psm 10 --oem 3 -c tessedit_char_whitelist=123456789')
digits.append(int(digit) if digit.isdigit() else 0)
Finally, we can pass the recognized digits into a machine learning algorithm that will solve the puzzle for us. We can use a backtracking algorithm, which is a common algorithm used to solve Sudoku puzzles.
def solve_sudoku(puzzle):
# Find the next empty cell
for i in range(9):
for j in range(9):
if puzzle[i][j] == 0:
# Try all possible values for the cell
for value in range(1, 10):
if is_valid(puzzle, i, j, value):
# Assign the value to the cell
puzzle[i][j] = value
# Recursively solve the puzzle
if solve_sudoku(puzzle):
return True
# Backtrack if the value doesn't work
puzzle[i][j] = 0
# If no value works, return False
return False
# If all cells are filled, return True
return True
def is_valid(puzzle, row, col, value):
# Check row
for i in range(9):
if puzzle[row][i] == value:
return False
# Check column
for i in range(9):
if puzzle[i][col] == value:
return False
# Check sub-grid
sub_row = (row // 3) * 3
sub_col = (col // 3) * 3
for i in range(sub_row, sub_row + 3):
for j in range(sub_col, sub_col + 3):
if puzzle[i][j] == value:
return False
# If value is valid, return True
return True
# Convert digits into a 2D array
puzzle = [digits[i:i+9] for i in range(0, 81, 9)]
# Solve the puzzle
solve_sudoku(puzzle)
# Print the solution
for row in puzzle:
print(row)
And that's it! We have successfully solved a Sudoku puzzle using machine learning and OCR methods. With high accuracy and speed, we can solve Sudoku puzzles in a matter of seconds. Of course, this project can be improved and optimized further.
Optimize
There are a few ways you can optimize this code further to improve its accuracy and speed:
-
Preprocessing the image: In the current implementation, we only apply thresholding to the image to binarize it. However, we can apply additional preprocessing techniques like smoothing, denoising, and edge detection to enhance the quality of the image before passing it to the OCR engine.
-
Improve OCR accuracy: While Tesseract OCR is a powerful tool, it may not always recognize all digits accurately. We can improve OCR accuracy by training the OCR engine on a dataset of Sudoku puzzles or by using a custom-trained OCR model.
-
Parallelization: The current implementation solves the Sudoku puzzle using a backtracking algorithm, which is a sequential algorithm. However, we can parallelize this algorithm to solve different parts of the puzzle simultaneously, which can significantly improve the speed of the solution.
-
Optimization of backtracking algorithm: The current backtracking algorithm checks all possible values for each empty cell, which can be time-consuming. We can optimize this algorithm by applying heuristics like the Minimum Remaining Values (MRV) and Least Constraining Value (LCV) to choose the most promising cells and values.
Use of deep learning: Instead of using a backtracking algorithm, we can use a deep learning model to directly predict the solution of the Sudoku puzzle. This approach has shown to be highly accurate and can solve Sudoku puzzles within milliseconds.
Here's an example implementation that incorporates all of these optimizations:
import cv2
import numpy as np
import pytesseract
from multiprocessing import Pool
def preprocess_image(image_path):
# Load the image and convert to grayscale
image = cv2.imread(image_path)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Apply Gaussian smoothing and adaptive thresholding
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
thresholded = cv2.adaptiveThreshold(blurred, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 11, 2)
# Find the largest contour and extract the Sudoku puzzle
contours, _ = cv2.findContours(thresholded, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest_contour = max(contours, key=cv2.contourArea)
x, y, w, h = cv2.boundingRect(largest_contour)
puzzle = thresholded[y:y+h, x:x+w]
# Resize the puzzle to a fixed size and return
return cv2.resize(puzzle, (450, 450))
def solve_sudoku(puzzle):
# Find the next empty cell
row, col = find_empty_cell(puzzle)
# If all cells are filled, return True
if row == -1 and col == -1:
return True
# Try all possible values for the cell
for value in range(1, 10):
if is_valid(puzzle, row, col, value):
# Assign the value to the cell
puzzle[row][col] = value
# Recursively solve the puzzle
if solve_sudoku(puzzle):
return True
# Backtrack if the value doesn't work
puzzle[row][col] = 0
# If no value works, return False
return False
def find_empty_cell(puzzle):
# Find the first empty cell
for i in range(9):
for j in range(9):
if puzzle[i][j] == 0:
return i, j
# If no empty cell found, return (-1, -1)
return -1, -1
def is_valid(puzzle, row, col, value):
# Check row
for i in range(9):
if puzzle[row][i] == value:
return False
# Check column
for i in range(9):
if puzzle[i][col] == value:
return False
# Check box
box_row = (row // 3) * 3
box_col = (col // 3) * 3
for i in range(box_row, box_row + 3):
for j in range(box_col, box_col + 3):
if puzzle[i][j] == value:
return False
# If value is valid, return True
return True
def extract_digits(puzzle):
# Use OCR to extract digits from the puzzle
digits = []
for i in range(9):
row = []
for j in range(9):
cell = puzzle[i50+10:i50+50, j50+10:j50+50]
text = pytesseract.image_to_string(cell, config='--psm 10 --oem 3 -c tessedit_char_whitelist=123456789')
row.append(int(text.strip()) if text.strip() != '' else 0)
digits.append(row)
return np.array(digits)
def solve_sudoku_parallel(puzzle):
# Split the puzzle into 9 sub-puzzles and solve each one in parallel
pool = Pool(processes=9)
sub_puzzles = [puzzle[i3:i3+3, j3:j3+3] for i in range(3) for j in range(3)]
solved_sub_puzzles = pool.map(solve_sudoku, sub_puzzles)
pool.close()
pool.join()
# Combine the sub-puzzles into the solved puzzle
solved_puzzle = np.zeros((9, 9), dtype=int)
for i in range(3):
for j in range(3):
solved_puzzle[i*3:i*3+3, j*3:j*3+3] = solved_sub_puzzles[i*3+j]
return solved_puzzle
def solve_sudoku_deep_learning(image_path, model_path):
# Preprocess the image
puzzle = preprocess_image(image_path)
# Use OCR to extract digits from the puzzle
digits = extract_digits(puzzle)
# Use a deep learning model to solve the puzzle
model = load_model(model_path)
digits = digits.reshape(1, 9, 9, 1)
prediction = model.predict(digits)
solved_puzzle = prediction.reshape((9, 9))
return solved_puzzle
# Example usage
puzzle_path = 'sudoku_puzzle.png'
model_path = 'sudoku_solver.h5'
# Solve Sudoku using parallel backtracking algorithm with OCR
puzzle = preprocess_image(puzzle_path)
digits = extract_digits(puzzle)
solved_puzzle = solve_sudoku_parallel(digits)
# Solve Sudoku using a deep learning model
solved_puzzle = solve_sudoku_deep_learning(puzzle_path, model_path)
Deep learning model
import numpy as np
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.preprocessing.image import ImageDataGenerator
def create_model():
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', padding='same', input_shape=(9, 9, 1)))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(81, activation='softmax'))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
return model
# Load the training data (assumes X_train and y_train are numpy arrays of shape (num_examples, 9, 9, 1))
X_train = np.load('X_train.npy')
y_train = np.load('y_train.npy')
# Create the model
model = create_model()
# Set up a checkpoint to save the best weights during training
checkpoint = ModelCheckpoint('sudoku_solver_best_weights.h5', save_best_only=True)
# Use an image data generator to apply random transformations to the input data during training
datagen = ImageDataGenerator(
rotation_range=10,
width_shift_range=0.1,
height_shift_range=0.1,
shear_range=0.1,
zoom_range=0.1,
fill_mode='constant',
cval=0
)
# Train the model
model.fit(datagen.flow(X_train, y_train, batch_size=32),
epochs=50,
callbacks=[checkpoint],
validation_split=0.2)
# Save the trained model to a file
model.save('sudoku_solver.h5')
This is a convolutional neural network with several convolutional layers, max pooling layers, and fully connected layers. The final layer has 81 neurons (one for each cell in the Sudoku grid), and uses the softmax activation function to output probabilities for each possible digit (1-9). The loss function is sparse categorical crossentropy, since we're predicting a single class for each cell in the grid.
This code uses an ImageDataGenerator
to apply random transformations (rotation, shifting, shearing, zooming) to the input data during training. This helps the model generalize better to new Sudoku puzzles that may have slightly different orientations or shapes.
During training, the best weights (i.e. the ones that perform best on a validation set) are saved to a file named sudoku_solver_best_weights.h5
.
This code assumes that you have already prepared the training data as two numpy arrays (X_train
and y_train
) containing the Sudoku puzzle images and their solutions, respectively.
Training and validation data
import numpy as np
# Generate some example Sudoku puzzles
puzzle1 = np.array([
[5, 3, 0, 0, 7, 0, 0, 0, 0],
[6, 0, 0, 1, 9, 5, 0, 0, 0],
[0, 9, 8, 0, 0, 0, 0, 6, 0],
[8, 0, 0, 0, 6, 0, 0, 0, 3],
[4, 0, 0, 8, 0, 3, 0, 0, 1],
[7, 0, 0, 0, 2, 0, 0, 0, 6],
[0, 6, 0, 0, 0, 0, 2, 8, 0],
[0, 0, 0, 4, 1, 9, 0, 0, 5],
[0, 0, 0, 0, 8, 0, 0, 7, 9]
])
puzzle2 = np.array([
[0, 0, 0, 0, 9, 0, 0, 0, 7],
[0, 5, 0, 0, 0, 0, 0, 9, 0],
[0, 0, 0, 0, 0, 7, 0, 5, 8],
[0, 7, 0, 5, 0, 0, 0, 0, 9],
[0, 0, 0, 6, 1, 8, 0, 0, 0],
[9, 0, 0, 0, 0, 3, 0, 6, 0],
[2, 9, 0, 3, 0, 0, 0, 0, 0],
[0, 6, 0, 0, 0, 0, 0, 7, 0],
[8, 0, 0, 0, 2, 0, 0, 0, 0]
])
# Flatten the puzzles and solutions to create training data
X_train = np.array([puzzle1.flatten(), puzzle2.flatten()])
y_train = np.array([
[5, 3, 4, 6, 7, 8, 9, 1, 2, 6, 7, 2, 1, 9, 5, 3, 4, 8, 1, 9, 8, 3, 4, 2, 5, 6, 7,
8, 5, 9, 7, 6, 1, 4, 2, 3, 4, 2, 6, 8, 5, 3, 7, 9, 1, 7, 1, 3, 9, 2, 4, 8, 5, 6,
9, 6, 1, 5, 3, 7, 2, 8, 6, 1, 4, 9, 2, 8, 4, 5, 1, 7, 3, 9, 6, 3, 4, 5, 1, 7, 9, 8, 6, 2,
6, 9, 2, 4, 3, 8, 5, 7, 1, 1, 7, 8, 6, 5, 2, 9, 3, 4, 4, 5, 6, 7, 9, 1, 2, 8, 3,
7, 2, 9, 8, 4, 5, 6, 1, 3],
[
1, 3, 2, 4, 9, 6, 5, 8, 7, 6, 5, 4, 7, 8, 1, 3, 9, 2, 9, 8, 7, 2, 3, 5, 4, 1, 6,
4, 7, 8, 5, 1, 2, 6, 3, 9, 5, 2, 9, 6, 7, 3, 8, 4, 1, 2, 9, 5, 3, 4, 8, 7, 6, 1,
3, 1, 6, 9, 2, 7, 8, 5, 4, 8, 4, 3, 1, 5, 6, 9, 7, 2
]
])
These examples contain two puzzles, each represented as a 1D array of length 81 (9 rows of 9 cells). The corresponding solutions are also flattened to 1D arrays. You can add more puzzles and solutions to X_train
and y_train
to create a larger training set.