Joe's Go Database

Joe's Go Database (JGDB) is a collection of more than 500,000 games by professional and top amateur Go (Baduk) players. It is one of the largest Go datasets online. The dataset is in .sgf format with limited metadata. I compiled the dataset for training machine learning models to play Go. Hopefully it will be useful for others who are looking for Go training data.

Download

The Dataset

JGDB was compiled from numerous sources, cleaned, stripped of metadata and comments, and saved as .sgf files. It contains more than 500,000 games of Go (Baduk) and is split into training, validation, and testing sets.

train: 515,749 games
val:     9,982 games
test:    9,990 games

Processing

The data is cleaned and stripped of metadata, comments, and variations. Only player rank, handicap, komi, result, and game moves are kept. Many games did not have final passes from both players, even if the result was a score, not resignation (i.e. B+3.5). For these games I added passing moves from one or both players. Due to processing there may be errors in some games, probably not too many. If you find anything let me know and I will fix it!

Usage

You are free to use the dataset however you wish! The sequence of moves in a game is generally not copyrightable thus I am releasing JGDB into the public domain.