Build Your Own Tic-Tac-Toe Robot

An attempt at an English-language improvisation for Medium.

Alas, my literary English is dreadful, I know. It is quite possible that you will read into this text a meaning the author never intended. In advance, as is customary in Russia, I apologize for my clumsy wording.


When I was a child, I learned that there was an amazing book: David L. Heiserman’s “Build Your Own Working Robot”. My God, I raved about that book at the time! I saw it in my dreams. Alas, it was impossible to buy this book in the USSR. America and Europe were incredibly far away, behind the Iron Curtain, in another space-time continuum and another Universe. I think it would have been easier for me to fly to the Moon. I could only dream. Stanisław Lem and Martin Gardner were the only companions on my way.

An episode of “Test pilota Pirxa”. 1978.

Today, as my country plunges once again, as if into the warped space of a black hole, I am making my Robot all over again. An unclosed gestalt? I don’t really know. These computer toys are useless; they are only a way to talk to the author of a book I never got to read as a child. Would you like to try making one such toy with me today? I promise it will be interesting.

For this experiment you need Ruby!

Today’s post is a little sketch describing, step by step, the creation of an Artificial Intelligence for the game of Tic-Tac-Toe. Have you ever created an AI from scratch before? It’s not hard. I’ll help you. We will not use well-known algorithms and strategies; instead, we will build a neural network based on FANN.

Tic-Tac-Toe Artificial Intelligence with Neural Network.
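FANN is available to Ruby through the ruby-fann gem. If you have never touched it, here is a tiny, self-contained sanity check: an XOR network that has nothing to do with the game itself, but it uses the same calls we will rely on later.

require 'ruby-fann'

# Teach a minimal network the XOR function.
train = RubyFann::TrainData.new(
  inputs: [[0, 0], [0, 1], [1, 0], [1, 1]],
  desired_outputs: [[0], [1], [1], [0]]
)
model = RubyFann::Standard.new(num_inputs: 2, hidden_neurons: [4], num_outputs: 1)
# Up to 2000 epochs, a report every 500, stop when the error drops below 0.001.
model.train_on_data(train, 2000, 500, 0.001)
p model.run([1, 0]) # => an array with one value close to 1.0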

To start with, in order to learn how to play, the program will play a lot of games against itself. Would you say this is not the best way to learn anything? I would agree with you. Still, let’s give it a try.

Take a look at the training.rb file. It’s simple: our new AI will have to play 50,000 games of tic-tac-toe. The goal of these games is to collect the data that will let the program play against you successfully. The result of this script is a CSV file containing an extended log of all the games played.

9,"[1, 2, 3, 4, 5, 6, 7, 8, 9]",0.3,0,X,3,7
6,"[1, 2, 3, 4, 5, 6, 7, 8, :X]",0.1,1,O,3,7
1,"[1, 2, 3, 4, 5, :O, 7, 8, :X]",0.3,2,X,3,7
.....

Let’s open the resulting log and look at its first line. Element 1 is the move, and element 2 is the position on the board before the move. Element 4 is the sequence number of the move (it starts at zero, not one). Element 5 is the cross or nought identifying the player, element 6 indicates whether and when a fork appeared as the game progressed, and element 7 is the total number of moves in the game.
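If you want to poke around in the log yourself, here is a minimal sketch of reading it with Ruby’s csv library (the file name is my assumption; check training.rb for the one actually used):

require 'csv'

CSV.foreach('game_log.csv') do |row|
  move        = row[0].to_i # 1: the cell that was played
  board       = row[1]      # 2: the board position before the move
  weight      = row[2].to_f # 3: the initial "weight" of the move (see below)
  move_index  = row[3].to_i # 4: sequence number of the move, starting at zero
  player      = row[4]      # 5: X or O
  fork_move   = row[5]      # 6: number of the move on which a fork appeared, if any
  total_moves = row[6].to_i # 7: total number of moves in the game
  puts "move #{move} by #{player} (#{move_index + 1} of #{total_moves}), weight #{weight}"
end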

Okay, but what about element 3? This one is a bit more complicated. These are the “weights” initially assigned to the moves (they can be redefined later). Here the AI starts preparing the data for the neural network analysis. We take a simple logic as a basis: a final win / draw / loss corresponds to 0.3 / 0.2 / 0.1:

# Assign a base weight to every move of a finished game:
# 0.3 if that player won, 0.2 if the game was a draw, 0.1 if that player lost.
prioritization = game.players_move_order.map do |player|
  if player == game.check
    0.3
  elsif game.check == 'draw'
    0.2
  else
    0.1
  end
end
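This is consistent with the sample log above: X won that 7-move game, so X’s moves carry 0.3 and O’s carry 0.1.

# For the 7-move game from the sample log (moves alternate X, O, X, ... and X wins):
# prioritization == [0.3, 0.1, 0.3, 0.1, 0.3, 0.1, 0.3]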

Which move in Tic-Tac-Toe do you think is optimal? It’s very simple: If it’s the last move, it’s the best move:

# The last move of a game: give it the highest target value.
if row[6].to_i - row[3].to_i == 1
  x_data.push([row[0].to_i])
  y_data.push([1])
end

What is the worst move? The penultimate one: your opponent wins with the next move. Do you agree?
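Symmetrically, such a move can get the lowest target value. A sketch along the lines of the snippet above (the check on row[2] excludes drawn games, whose weight is 0.2):

# The penultimate move of a decisive (non-draw) game: the worst possible move.
if row[6].to_i - row[3].to_i == 2 && row[2].to_f != 0.2
  x_data.push([row[0].to_i])
  y_data.push([0])
end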

Let’s assume that the above holds for all Tic-Tac-Toe situations, including the last move, when it no longer matters. We have created a logical basis for our new AI.

OK. During the game the AI should be wary of forks, avoiding moves that can lead to them. Let’s assume that a fork is always created by the last move made: given the simplicity of our AI, this assumption is acceptable. But how do we detect a fork?

No problem. We define a constant:

# The eight winning lines: three rows, three columns, two diagonals.
TRIADS = [
  [0, 1, 2],
  [3, 4, 5],
  [6, 7, 8],
  [0, 3, 6],
  [1, 4, 7],
  [2, 5, 8],
  [6, 4, 2],
  [0, 4, 8]
].freeze
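The numbers in these triples are cell indices of the board, laid out row by row:

# Cell indices of the 3x3 board:
#
#  0 | 1 | 2
# ---+---+---
#  3 | 4 | 5
# ---+---+---
#  6 | 7 | 8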

And during training, the AI looks for forks:

# Collect every winning line in which one mark already holds two cells and the
# third cell is still free (occupied cells hold symbols, free cells keep their
# numbers, hence the class comparison), subject to the place_x? check:
def fork?
  TRIADS.select do |x|
    @board[x[0]] == @board[x[1]] && @board[x[2]].class != @board[x[0]].class &&
      place_x?(x[0]) ||
      @board[x[1]] == @board[x[2]] && @board[x[0]].class != @board[x[2]].class &&
        place_x?(x[1]) ||
      @board[x[0]] == @board[x[2]] && @board[x[1]].class != @board[x[2]].class &&
        place_x?(x[0])
  end
end

A fork is found when such a combination appears more than once. Cheers!

if @game.fork?.size > 1
An episode of “Test pilota Pirxa”. 1978.

Great! Now let’s define some dangerous situations:

# The two diagonals:
DANGEROUS_SITUATIONS_1 = [
  [6, 4, 2],
  [0, 4, 8]
].freeze

# A corner, the center, and a non-adjacent edge cell:
DANGEROUS_SITUATIONS_2 = [
  [0, 4, 7],
  [0, 4, 5],
  [2, 4, 3],
  [2, 4, 7],
  [3, 4, 8],
  [1, 4, 8],
  [1, 4, 6],
  [5, 4, 6]
].freeze

# Both ends of a diagonal hold the same mark while the center differs:
def fork_danger_1?
  DANGEROUS_SITUATIONS_1.detect do |x|
    @board[x[0]] == @board[x[2]] &&
      @board[x[0]] != @board[x[1]]
  end
end

# The center matches one end of a diagonal while the other end differs:
def fork_danger_3?
  DANGEROUS_SITUATIONS_1.detect do |x|
    @board[x[0]] != @board[x[2]] &&
      @board[x[1]] == @board[x[2]]
  end
end

# A corner and a non-adjacent edge cell match while the center differs:
def fork_danger_2?
  DANGEROUS_SITUATIONS_2.detect do |x|
    @board[x[0]] == @board[x[2]] &&
      @board[x[0]] != @board[x[1]]
  end
end
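These checks are evaluated once for the current position, and their results are then handed to the move-selection code below. A minimal sketch, assuming the predicates live on the same game object as fork? above:

# Evaluate the three fork checks for the position currently on the board
# (a truthy result means the corresponding dangerous pattern is present):
fork_danger_1 = @game.fork_danger_1?
fork_danger_2 = @game.fork_danger_2?
fork_danger_3 = @game.fork_danger_3?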

We will create three arrays into which we will put the moves: 1. Unambiguously unacceptable, 2. Leading to a fork, and 3. Attacking:

# Create a list of unacceptable moves, a list of moves leading to fork, a list of attacking moves:
array_of_games.each do |row|
  row.each do |e|
    next unless e == current_position

    if row[6].to_i - row[3].to_i == 2 && row[2].to_f != 0.2
      unacceptable_moves_array << row[0]
    # Find moves that inevitably lead to a fork:
    elsif fork_danger_1 && row[3].to_i == (@player1 == 'Human' ? 3 : 2) && row[0].to_i.odd?
      unacceptable_moves_array << row[0]
    elsif (fork_danger_2 || fork_danger_3) && row[3].to_i == (@player1 == 'Human' ? 3 : 2) && row[0].to_i.even?
      unacceptable_moves_array << row[0]
    end
    next if row[5].nil?

    # Find moves that may lead to a fork:
    array_of_moves_to_fork << row[0] if row[3].to_i == row[5].to_i
    # Find attacking moves:
    attack_moves_array << row[0] if row[3].to_i == row[5].to_i && row[6].to_i < 7
  end
end

Next, we feed the outcome of our efforts to the neural network:

data = nn_data(board, fork_danger_1, fork_danger_2, fork_danger_3, array_of_games)
fann_results_array = []
train = RubyFann::TrainData.new(inputs: data[0], desired_outputs: data[1])
model = RubyFann::Standard.new(
  num_inputs: 1,
  hidden_neurons: [4],
  num_outputs: 1
)
# Up to 5000 epochs, a report every 500, stop when the error drops below 0.01.
model.train_on_data(train, 5000, 500, 0.01)
data[0].flatten.each do |i|
  fann_results_array << model.run([i])
end
# The candidate whose network output is the highest becomes the chosen move.
result = data[0][fann_results_array.index(fann_results_array.max)]

Result: the AI will play you to a draw if you make no mistakes, and it will win if you make a wrong move. If the AI loses, delete the game log and regenerate it. Remember, this is just a simple model demonstrating that a neural network can be built this way, so be lenient!

How to start playing:

gem install csv ruby-fann progress_bar tty-pie
git clone https://github.com/cmirnow/Tic-Tac-Toe-AI-with-Neural-Network-Resurrections.git
cd Tic-Tac-Toe-AI-with-Neural-Network-Resurrections
ruby start.rb 
