- What is the significance of move 37? (to a non-Go player)
I have seen (and googled) information for Game 2, Move 37 in the AlphaGo vs Lee Sedol match. However, it is difficult to find information concerning this move that doesn't rely on an understanding of Go (which I don't have). I would like to understand the significance of this without it being a Go gameplay answer.
- Did AlphaGo Zero actually beat AlphaGo 100 games to 0?
tl;dr Did AlphaGo Zero and AlphaGo play 100 repetitions of the same sequence of boards, or were there 100 different games? Background: AlphaGo was the first superhuman Go player, but it had human tuning and training. AlphaGo Zero learned to be more superhuman than superhuman. Its supremacy was shown by how it beat AlphaGo perfectly in 100 games.
- deep learning - What is the input to AlphaGo's neural network ...
AlphaGo Zero only uses the black and white stones from the Go board as its input, whereas previous versions of AlphaGo included a small number of hand-engineered features. What exactly is the input to AlphaGo's neural network? What do they mean by "just white and black stones as input"? What kind of information is the neural network using?
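The AlphaGo Zero paper describes the raw input as 17 binary 19x19 feature planes: the current player's stones over the last 8 positions, the opponent's stones over the same 8 positions, and one constant plane for the colour to move. Below is a minimal sketch (not DeepMind's code; the function name and history format are illustrative) of how such an input stack could be built.

```python
# Sketch of the 17-plane input encoding attributed to AlphaGo Zero:
# 8 planes of own stones, 8 planes of opponent stones, 1 colour plane.
import numpy as np

BOARD = 19
HISTORY = 8

def encode_position(history, to_play):
    """history: list of (own_stones, opp_stones) boolean 19x19 arrays,
    most recent last; to_play: 1 if black to move, 0 if white."""
    planes = np.zeros((2 * HISTORY + 1, BOARD, BOARD), dtype=np.float32)
    for t, (own, opp) in enumerate(reversed(history[-HISTORY:])):
        planes[t] = own            # current player's stones, t steps ago
        planes[HISTORY + t] = opp  # opponent's stones, t steps ago
    planes[-1] = to_play           # constant colour-to-play plane
    return planes
```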
- Newest alphago Questions - Artificial Intelligence Stack Exchange
For questions related to DeepMind's AlphaGo, which is the first computer Go program to beat a human professional Go player without handicaps on a full-sized 19x19 board. AlphaGo was introduced in the paper "Mastering the game of Go with deep neural networks and tree search" (2016) by David Silver et al. There have been three more powerful successors of AlphaGo: AlphaGo Master, AlphaGo Zero and AlphaZero.
- How does AlphaGo Zero MCTS work in parallel?
To understand how AlphaGo Zero performs parallel simulations, think of each simulation as a separate agent that interacts with the search tree. Each agent starts from the root node and selects an action according to the statistics in the tree, such as: (1) the mean action value (Q), (2) the visit count (N), and (3) the prior probability (P). The agent then follows the action to the next node and repeats.
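A hedged sketch of the PUCT-style selection rule this answer refers to, plus the "virtual loss" trick commonly used so that many simulations can descend the tree in parallel without all choosing the same branch. The class names and the exploration constant are illustrative assumptions, not DeepMind's actual code.

```python
# Each parallel "agent" picks the child maximising Q + U, where
# U = c_puct * P * sqrt(sum N) / (1 + N); a virtual loss is added
# so concurrent simulations spread out over different branches.
import math

C_PUCT = 1.5  # exploration constant (assumed value)

class Edge:
    def __init__(self, prior):
        self.P = prior        # prior probability from the policy network
        self.N = 0            # visit count
        self.W = 0.0          # total action value
        self.virtual_loss = 0

    @property
    def Q(self):
        n = self.N + self.virtual_loss
        # pretend in-flight simulations lost, to discourage other threads
        return (self.W - self.virtual_loss) / n if n else 0.0

def select_action(edges):
    """edges: dict mapping action -> Edge for one tree node."""
    total_n = sum(e.N + e.virtual_loss for e in edges.values())
    def puct(e):
        u = C_PUCT * e.P * math.sqrt(total_n) / (1 + e.N + e.virtual_loss)
        return e.Q + u
    action = max(edges, key=lambda a: puct(edges[a]))
    edges[action].virtual_loss += 1
    return action
```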
- Initialising DQN with weights from imitation learning rather than ...
In AlphaGo, the authors initialised a policy gradient network with weights trained from imitation learning. I believe this gives it a very good starting policy for the policy gradient network. The imitation network was trained on labelled data of (state, expert action) pairs to output a softmax policy denoting the probability of actions for each state.
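A minimal sketch of the idea in this answer, under assumed names and a toy architecture: first train a network by imitation on (state, expert action) pairs with a cross-entropy loss, then start policy-gradient training from those weights instead of a random initialisation.

```python
# Imitation pretraining followed by weight transfer to the
# policy-gradient network (toy fully-connected model for a 19x19 board).
import torch
import torch.nn as nn

policy_net = nn.Sequential(
    nn.Linear(361, 256), nn.ReLU(),
    nn.Linear(256, 361),   # logits over 19x19 = 361 moves
)

def imitation_step(states, expert_actions, optimiser):
    """One supervised step: push the softmax policy towards expert moves."""
    logits = policy_net(states)
    loss = nn.functional.cross_entropy(logits, expert_actions)
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()

# After imitation training, reuse the learned weights as the starting
# point for the policy-gradient (self-play) phase.
pg_net = nn.Sequential(
    nn.Linear(361, 256), nn.ReLU(),
    nn.Linear(256, 361),
)
pg_net.load_state_dict(policy_net.state_dict())
```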
- Why didn't AlphaGo use Deep Q-Learning?
In previous research, in 2015, Deep Q-Learning showed great performance on single-player Atari games. But why did AlphaGo's researchers use CNN + MCTS instead of Deep Q-Learning? Is that beca…