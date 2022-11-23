CICERO continuously reads the game board to model how the other players are likely to act, then uses that model to control a language model that "can generate free-form dialogue, informing other players of its plans and proposing reasonable actions for the other players that coordinate well with them". Meta started with a 2.7-billion-parameter BART-like language model pre-trained on text from the internet and fine-tuned on over 40,000 human games from webDiplomacy.net. It also developed techniques to automatically annotate messages in the training data with the corresponding planned moves in the game. The idea is to keep dialogue generation controllable, tied to specific planned moves, while persuading other players more effectively. In short, CICERO first predicts what every player will do; second, it refines that prediction using planning; third, it generates several candidate messages based on the board state, the dialogue so far, and its intents; and fourth, it filters those messages to reduce gibberish and unrelated comments.
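The four steps above can be sketched as a small pipeline. This is only a toy illustration of the control flow, not Meta's implementation: every function name, the `Board` and `Intents` types, the template messages, and the keyword filter are hypothetical stand-ins for the learned models and planner that the real system uses.

```python
from typing import Dict, List

# Hypothetical toy types, not taken from CICERO itself.
Board = Dict[str, List[str]]   # power -> list of its units, e.g. "A PAR"
Intents = Dict[str, str]       # power -> one planned order, e.g. "A PAR - BUR"


def predict_moves(board: Board) -> Intents:
    """Step 1: initial guess of each power's move (toy rule: first unit holds)."""
    return {power: f"{units[0]} H" for power, units in board.items() if units}


def refine_with_planning(prediction: Intents, me: str, my_plan: str) -> Intents:
    """Step 2: stand-in for planning; replaces our own predicted move with a chosen plan."""
    refined = dict(prediction)
    refined[me] = my_plan
    return refined


def generate_candidates(intents: Intents, me: str, recipient: str) -> List[str]:
    """Step 3: stand-in for the language model; templates conditioned on the intents."""
    return [
        f"I plan to order {intents[me]}. Could you order {intents[recipient]}?",
        f"My move this turn: {intents[me]}.",
        "asdf qwer zxcv",                  # deliberately nonsensical candidate
        "Nice weather today, isn't it?",   # deliberately off-topic candidate
    ]


def filter_messages(candidates: List[str], intents: Intents, me: str) -> List[str]:
    """Step 4: toy filter; keep only messages that mention our intended move."""
    return [msg for msg in candidates if intents[me] in msg]


def compose_messages(board: Board, me: str, recipient: str, my_plan: str) -> List[str]:
    """Run the four steps in order and return the surviving messages."""
    prediction = predict_moves(board)
    intents = refine_with_planning(prediction, me, my_plan)
    candidates = generate_candidates(intents, me, recipient)
    return filter_messages(candidates, intents, me)


if __name__ == "__main__":
    board = {"FRANCE": ["A PAR"], "ENGLAND": ["F LON"]}
    for message in compose_messages(board, "FRANCE", "ENGLAND", "A PAR - BUR"):
        print(message)
```

The sketch keeps the intents as an explicit value passed between steps, mirroring the idea that planned moves, rather than free text alone, steer which messages are generated and which are discarded.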