MULTIAGENT SYSTEMS
Algorithmic, Game-Theoretic,
and Logical Foundations
Yoav Shoham
Stanford University
Kevin Leyton-Brown
University of British Columbia
Revision 1.1
Multiagent Systems is copyright © Shoham and Leyton-Brown, 2009, 2010. This version is
formatted differently than the book—and in particular has different page numbering—and has
not been fully copy edited. Please treat the printed book as the definitive version.
You are invited to use this electronic copy without restriction for on-screen viewing, but are
requested to print it only under one of the following circumstances:
• You live in a place that does not offer you access to the physical book;
• The cost of the book is prohibitive for you;
• You need only one or two chapters.
Finally, we ask you not to link directly to the PDF or to distribute it electronically. Instead, we
invite you to link to http://www.masfoundations.org. This will allow us to gauge the level of
interest in the book and to update the PDF to keep it consistent with reprintings of the book.
To my wife Noa and my daughters Maia, Talia and Ella —YS
To Jude —KLB
Contents
Credits and Acknowledgments
Introduction
1 Distributed Constraint Satisfaction
1.1 Defining distributed constraint satisfaction problems
1.2 Domain-pruning algorithms
1.3 Heuristic search algorithms
1.3.1 The asynchronous backtracking algorithm
1.3.2 A simple example
1.3.3 An extended example: the four queens problem
1.3.4 Beyond the ABT algorithm
1.4 History and references
2 Distributed Optimization
2.1 Distributed dynamic programming for path planning
2.1.1 Asynchronous dynamic programming
2.1.2 Learning real-time A∗
2.2 Action selection in multiagent MDPs
2.3 Negotiation, auctions and optimization
2.3.1 From contract nets to auction-like optimization
2.3.2 The assignment problem and linear programming
2.3.3 The scheduling problem and integer programming
2.4 Social laws and conventions
2.5 History and references
3 Introduction to Noncooperative Game Theory: Games in Normal Form
3.1 Self-interested agents
3.1.1 Example: friends and enemies
3.1.2 Preferences and utility
3.2 Games in normal form
3.2.1 Example: the TCP user’s game
3.2.2 Definition of games in normal form
3.2.3 More examples of normal-form games
3.2.4 Strategies in normal-form games
3.3 Analyzing games: from optimality to equilibrium
3.3.1 Pareto optimality
3.3.2 Defining best response and Nash equilibrium
3.3.3 Finding Nash equilibria
3.3.4 Nash’s theorem: proving the existence of Nash equilibria
3.4 Further solution concepts for normal-form games
3.4.1 Maxmin and minmax strategies
3.4.2 Minimax regret
3.4.3 Removal of dominated strategies
3.4.4 Rationalizability
3.4.5 Correlated equilibrium
3.4.6 Trembling-hand perfect equilibrium
3.4.7 ε-Nash equilibrium
3.5 History and references
4 Computing Solution Concepts of Normal-Form Games
4.1 Computing Nash equilibria of two-player, zero-sum games
4.2 Computing Nash equilibria of two-player, general-sum games
4.2.1 Complexity of computing a sample Nash equilibrium
4.2.2 An LCP formulation and the Lemke–Howson algorithm
4.2.3 Searching the space of supports
4.2.4 Beyond sample equilibrium computation
4.3 Computing Nash equilibria of n-player, general-sum games
4.4 Computing maxmin and minmax strategies for two-player, general-sum games
4.5 Identifying dominated strategies
4.5.1 Domination by a pure strategy
4.5.2 Domination by a mixed strategy
4.5.3 Iterated dominance
4.6 Computing correlated equilibria
4.7 History and references
5 Games with Sequential Actions: Reasoning and Computing with the Extensive Form
5.1 Perfect-information extensive-form games
5.1.1 Definition
5.1.2 Strategies and equilibria
5.1.3 Subgame-perfect equilibrium
5.1.4 Computing equilibria: backward induction
5.2 Imperfect-information extensive-form games
5.2.1 Definition
5.2.2 Strategies and equilibria
5.2.3 Computing equilibria: the sequence form
5.2.4 Sequential equilibrium
5.3 History and references
6 Richer Representations: Beyond the Normal and Extensive Forms
6.1 Repeated games
6.1.1 Finitely repeated games
6.1.2 Infinitely repeated games
6.1.3 “Bounded rationality”: repeated games played by automata
6.2 Stochastic games
6.2.1 Definition
6.2.2 Strategies and equilibria
6.2.3 Computing equilibria
6.3 Bayesian games
6.3.1 Definition
6.3.2 Strategies and equilibria
6.3.3 Computing equilibria
6.3.4 Ex post equilibrium
6.4 Congestion games
6.4.1 Definition
6.4.2 Computing equilibria
6.4.3 Potential games
6.4.4 Nonatomic congestion games
6.4.5 Selfish routing and the price of anarchy
6.5 Computationally motivated compact representations
6.5.1 The expected utility problem
6.5.2 Graphical games
6.5.3 Action-graph games
6.5.4 Multiagent influence diagrams
6.5.5 GALA
6.6 History and references
7 Learning and Teaching
7.1 Why the subject of “learning” is complex
7.1.1 The interaction between learning and teaching
7.1.2 What constitutes learning?
7.1.3 If learning is the answer, what is the question?
7.2 Fictitious play
7.3 Rational learning
7.4 Reinforcement learning
7.4.1 Learning in unknown MDPs
7.4.2 Reinforcement learning in zero-sum stochastic games
7.4.3 Beyond zero-sum stochastic games
7.4.4 Belief-based reinforcement learning
7.5 No-regret learning and universal consistency
7.6 Targeted learning
7.7 Evolutionary learning and other large-population models
7.7.1 The replicator dynamic
7.7.2 Evolutionarily stable strategies
7.7.3 Agent-based simulation and emergent conventions
7.8 History and references
8 Communication
8.1 “Doing by talking” I: cheap talk
8.2 “Talking by doing”: signaling games
8.3 “Doing by talking” II: speech-act theory
8.3.1 Speech acts
8.3.2 Rules of conversation
8.3.3 A game-theoretic view of speech acts
8.3.4 Applications
8.4 History and references
9 Aggregating Preferences: Social Choice
9.1 Introduction
9.1.1 Example: plurality voting
9.2 A formal model
9.3 Voting
9.3.1 Voting methods
9.3.2 Voting paradoxes
9.4 Existence of social functions
9.4.1 Social welfare functions
9.4.2 Social choice functions
9.5 Ranking systems
9.6 History and references
10 Protocols for Strategic Agents: Mechanism Design
10.1 Introduction
10.1.1 Example: strategic voting
10.1.2 Example: buying a shortest path
10.2 Mechanism design with unrestricted preferences
10.2.1 Implementation
10.2.2 The revelation principle
10.2.3 Impossibility of general, dominant-strategy implementation
10.3 Quasilinear preferences
10.3.1 Risk attitudes
10.3.2 Mechanism design in the quasilinear setting
10.4 Efficient mechanisms
10.4.1 Groves mechanisms
10.4.2 The VCG mechanism
10.4.3 VCG and individual rationality
10.4.4 VCG and weak budget balance
10.4.5 Drawbacks of VCG
10.4.6 Budget balance and efficiency
10.4.7 The AGV mechanism
10.5 Beyond efficiency
10.5.1 What else can be implemented in dominant strategies?
10.5.2 Tractable Groves mechanisms
10.6 Computational applications of mechanism design
10.6.1 Task scheduling
10.6.2 Bandwidth allocation in computer networks
10.6.3 Multicast cost sharing
10.6.4 Two-sided matching
10.7 Constrained mechanism design
10.7.1 Contracts
10.7.2 Bribes
10.7.3 Mediators
10.8 History and references
11 Protocols for Multiagent Resource Allocation: Auctions
11.1 Single-good auctions
11.1.1 Canonical auction families
11.1.2 Auctions as Bayesian mechanisms
11.1.3 Second-price, Japanese, and English auctions
11.1.4 First-price and Dutch auctions
11.1.5 Revenue equivalence
11.1.6 Risk attitudes
11.1.7 Auction variations
11.1.8 “Optimal” (revenue-maximizing) auctions
11.1.9 Collusion
11.1.10 Interdependent values
11.2 Multiunit auctions
11.2.1 Canonical auction families
11.2.2 Single-unit demand
11.2.3 Beyond single-unit demand
11.2.4 Unlimited supply: random sampling auctions
11.2.5 Position auctions
11.3 Combinatorial auctions
11.3.1 Simple combinatorial auction mechanisms
11.3.2 The winner determination problem
11.3.3 Expressing a bid: bidding languages
11.3.4 Iterative mechanisms
11.3.5 A tractable mechanism
11.4 Exchanges
11.4.1 Two-sided auctions
11.4.2 Prediction markets
11.5 History and references
12 Teams of Selfish Agents: An Introduction to Coalitional Game Theory
12.1 Coalitional games with transferable utility
12.1.1 Definition
12.1.2 Examples
12.1.3 Classes of coalitional games
12.2 Analyzing coalitional games
12.2.1 The Shapley value
12.2.2 The core
12.2.3 Refining the core: ε-core, least core, and nucleolus
12.3 Compact representations of coalitional games
12.3.1 Weighted majority games and weighted voting games
12.3.2 Weighted graph games
12.3.3 Capturing synergies: a representation for superadditive games
12.3.4 A decomposition approach: multi-issue representation
12.3.5 A logical approach: marginal contribution nets
12.4 Further directions
12.4.1 Alternative coalitional game models
12.4.2 Advanced solution concepts
12.5 History and references
13 Logics of Knowledge and Belief
13.1 The partition model of knowledge
13.1.1 Muddy children and warring generals
13.1.2 Formalizing intuitions about the partition model
13.2 A detour to modal logic
13.2.1 Syntax
13.2.2 Semantics
13.2.3 Axiomatics
13.2.4 Modal logics with multiple modal operators
13.2.5 Remarks about first-order modal logic
13.3 S5: An axiomatic theory of the partition model
13.4 Common knowledge, and an application to distributed systems
13.5 Doing time, and an application to robotics
13.5.1 Termination conditions for motion planning
13.5.2 Coordinating robots
13.6 From knowledge to belief
13.7 Combining knowledge and belief (and revisiting knowledge)
13.8 History and references
14 Beyond Belief: Probability, Dynamics and Intention
14.1 Knowledge and probability
14.2 Dynamics of knowledge and belief
14.2.1 Belief revision
14.2.2 Beyond AGM: update, arbitration, fusion, and friends
14.2.3 Theories of belief change: a summary
14.3 Logic, games, and coalition logic
14.4 Towards a logic of “intention”
14.4.1 Some preformal intuitions
14.4.2 The road to hell: elements of a formal theory of intention
14.4.3 Group intentions
14.5 History and references
Appendices: Technical Background
A Probability Theory
A.1 Probabilistic models
A.2 Axioms of probability theory
A.3 Marginal probabilities
A.4 Conditional probabilities
B Linear and Integer Programming
B.1 Linear programs
B.2 Integer programs
C Markov Decision Problems (MDPs)
C.1 The model
C.2 Solving known MDPs via value iteration
D Classical Logic
D.1 Propositional calculus
D.2 First-order logic
Bibliography
Index
Credits and Acknowledgments
We should start off by explaining the order of authorship. Yoav conceived of the
project, and started it, in late 2001, working on it alone and with several colleagues
(see below). Sometime in 2004 Yoav realized he needed help if the project were
ever to come to conclusion, and he enlisted the help of Kevin. The result was a true
partnership and a complete overhaul of the material. The current book is vastly
different from the draft that existed when the partnership was formed—in depth,
breadth, and form. Yoav and Kevin have made equal contributions to the book; the
order of authorship reflects the history of the book, but nothing else.
In six years of book-writing we accumulated many debts. The following is our
best effort to acknowledge those. If we omit any names it is due solely to our poor
memories and record keeping, and we apologize in advance.
When the book started out, Teg Grenager served as a prolific ghost writer. While
little of the original writing remains (though some does, for example, in Sec-
tion 8.3.1 on speech acts), the project would not have gotten off the ground without
him.
Several past and present graduate students made substantial contributions. Chap-
ter 12 (coalitional games) is based entirely on writing by Sam Ieong, who was also
closely involved in the editing. Section 3.3.4 (the existence of Nash equilibria) and
parts of Section 6.5 (compact game representations) are based entirely on writing
by Albert Xin Jiang, who also worked extensively with us to refine the material. Al-
bert also contributed to the proof of Theorem 3.4.4 (the minmax theorem). Some
of the material in Chapter 4 on computing solution concepts is based on writing by
Ryan Porter, who also contributed much of the material in Section 6.1.3 (bounded
rationality). The material in Chapter 7 (multiagent learning) is based in part on
joint work with Rob Powers, who also contributed text. Section 10.6.4 (mecha-
nisms for matching) is based entirely on text by Baharak Rastegari, and David R.
M. Thompson contributed material to Sections 10.6.3 (mechanisms for multicast
routing) and 6.3.4 (ex post equilibria). Finally, all of the past and present students
listed here offered invaluable comments on drafts. Other students also offered
valuable comments. Samantha Leung deserves special mention; we also received
useful feedback from Michael Cheung, Matthew Chudek, Farhad Ghassemi, Ryan
Golbeck, James Wright, and Erik Zawadzki. We apologize in advance to any others
whose names we have missed.
Several of our colleagues generously contributed material to the book, in addi-
tion to lending their insight. They include Geoff Gordon (Matlab code to generate
Figure 3.13, showing the saddle point for zero-sum games), Carlos Guestrin (ma-
terial on action selection in distributed MDPs in Section 2.2, and Figure 1.1, show-
ing a deployed sensor network), Michael Littman (Section 5.1.4 on computing all
subgame-perfect equilibria), Amnon Meisels (much of the material on heuristic
distributed constraint satisfaction in Chapter 1), Marc Pauly (material on coalition
logic in Section 14.3), Christian Shelton (material on computing Nash equilibria
for n-player games in Section 4.3), and Moshe Tennenholtz (material on restricted
mechanism design in Section 10.7). We thank Éva Tardos and Tim Roughgar-
den for making available notes that we drew on for our proofs of Lemma 3.3.14
(Sperner’s lemma) and Theorem 3.3.21 (Brouwer’s fixed-point theorem for simplo-
topes), respectively.
Many colleagues around the world generously gave us comments on drafts, or
provided counsel otherwise. Felix Brandt and Vince Conitzer deserve special men-
tion for their particularly detailed and insightful comments. Other colleagues to
whom we are indebted include Alon Altman, Krzysztof Apt, Navin A. R. Bhat, Ro-
nen Brafman, Yiling Chen, Konstantinos Daskalakis, Yossi Feinberg, Jeff Fletcher,
Nando de Freitas, Raul Hakli, Joe Halpern, Jason Hartline, Jean-Jacques Herings,
Ramesh Johari, Bobby Kleinberg, Daphne Koller, Fangzhen Lin, David Parkes,
David Poole, Maurice Queyranne, Tim Roughgarden, Tuomas Sandholm, Peter
Stone, Nikos Vlasis, Mike Wellman, Bob Wilson, Mike Wooldridge, and Dongmo
Zhang.
Many others pointed out errors in the first printing of the book through our er-
rata wiki: B. J. Buter, Nicolas Dudebout, Marco Guazzone, Joel Kammet, Nicolas
Lambert, Nimalan Mahendran, Mike Rogers, Ivomar Brito Soares, Michael Styer,
Sean Sutherland, Grigorios Tsoumakas, Steve Wolfman, and James Wright.
Several people provided critical editorial and production assistance of various
kinds. Most notably, David R. M. Thompson overhauled our figures, code format-
ting, bibliography and index. Chris Manning was kind enough to let us use the
LaTeX macros from his own book, and Ben Galin added a few miracles of his own.
Ben also composed several of the examples, found some bugs, drew many figures,
and more generally for two years served as an intelligent jack of all trades on this
project. Erik Zawadzki helped with the bibliography and with some figures. Maia
Shoham helped with some historical notes and bibliography entries, as well as with
some copy-editing.
We thank all these friends and colleagues. Their input has contributed to a better
book, but of course they are not to be held accountable for any remaining short-
comings. We claim sole credit for those.
We also thank Cambridge University Press for publishing the book, and for their
enlightened online-publishing policy which has enabled us to provide the broad-
est possible access to it. Specific thanks to Lauren Cowles, an editor of unusual
intelligence, good judgment, and sense of humor.
Last, and certainly not the least, we thank our families, for supporting us through
this time-consuming project. We dedicate this book to them, with love.
Introduction
Imagine a personal software agent engaging in electronic commerce on your behalf.
Say the task of this agent is to track goods available for sale in various online
venues over time, and to purchase some of them on your behalf for an attractive
price. In order to be successful, your agent will need to embody your preferences
for products, your budget, and in general your knowledge about the environment
in which it will operate. Moreover, the agent will need to embody your knowledge
of other similar agents with which it will interact (e.g., agents who might compete
with it in an auction, or agents representing store owners)—including their own
preferences and knowledge. A collection of such agents forms a multiagent system.
The goal of this book is to bring under one roof a variety of ideas and techniques
that provide foundations for modeling, reasoning about, and building multiagent
systems.
Somewhat strangely for a book that purports to be rigorous, we will not give
a precise definition of a multiagent system. The reason is that many competing,
mutually inconsistent answers have been offered in the past. Indeed, even the
seemingly simpler question—What is a (single) agent?—has resisted a definitive
answer. For our purposes, the following loose definition will suffice: Multiagent
systems are those systems that include multiple autonomous entities with either
diverging information or diverging interests, or both.
Scope of the book
The motivation for studying multiagent systems often stems from interest in ar-
tificial (software or hardware) agents, for example software agents living on the
Internet. Indeed, the Internet can be viewed as the ultimate platform for interac-
tion among self-interested, distributed computational entities. Such agents can be
trading agents of the sort discussed above, “interface agents” that facilitate the in-
teraction between the user and various computational resources (including other
interface agents), game-playing agents that assist (or replace) human players in a
multiplayer game, or autonomous robots in a multi-robot setting. However, while
the material is written by computer scientists with computational sensibilities, it is
quite interdisciplinary and the material is in general fairly abstract. Many of the
ideas apply to—and indeed are often taken from—inquiries about human individu-
als and institutions.
The material spans disciplines as diverse as computer science (including arti-
ficial intelligence, theory, and distributed systems), economics (chiefly microe-
conomic theory), operations research, analytic philosophy, and linguistics. The
technical material includes logic, probability theory, game theory, and optimiza-
tion. Each of the topics covered easily supports multiple independent books and
courses, and this book does not aim to replace them. Rather, the goal has been
to gather the most important elements from each discipline and weave them to-
gether into a balanced and accurate introduction to this broad field. The intended
reader is a graduate student or an advanced undergraduate, prototypically, but not
necessarily, in computer science.
Since the umbrella of multiagent systems is so broad, the questions of what to
include in any book on the topic and how to organize the selected material are
crucial. To begin with, this book concentrates on foundational topics rather than
surface applications. Although we will occasionally make reference to real-world
applications, we will do so primarily to clarify the concepts involved; this is despite
the practical motivations professed earlier. And so this is the wrong text for the
reader interested in a practical guide to building this or that sort of software. The
emphasis is rather on important concepts and the essential mathematics behind
them. The intention is to delve in enough detail into each topic to be able to tackle
some technical material, and then to point the reader in the right directions for
further education on particular topics.
Our decision was thus to include predominantly established, rigorous material
that is likely to withstand the test of time, and to emphasize computational perspec-
tives where appropriate. This still left us with vast material from which to choose.
In understanding the selection made here, it is useful to keep in mind the following
keywords: coordination, competition, algorithms, game theory, and logic. These
terms will help frame the chapter overview that follows.
Overview of the chapters
Starting with issues of coordination, we begin in Chapter 1 and Chapter 2 with
distributed problem solving. In these multiagent settings there is no question of
agents’ individual preferences; there is some global problem to be solved, but
for one reason or another it is either necessary or advantageous to distribute the
task among multiple agents, whose actions may require coordination. These chap-
ters are thus strongly algorithmic. The first one looks at distributed constraint-
satisfaction problems; the second addresses distributed optimization and specifically
examines four algorithmic methods: distributed dynamic programming, action se-
lection in distributed MDPs, auction-like optimization procedures for linear and
integer programming, and social laws.
We then begin to embrace issues of competition as well as coordination. While
the area of multiagent systems is not synonymous with game theory, there is no
question that game theory is a key tool to master within the field, and so we devote
several chapters to it. Chapters 3, 5 and 6 constitute a crash course in noncooper-
ative game theory. They cover, respectively, the normal form, the extensive form,
and a host of other game representations. In these chapters, as in others which draw
on game theory, we culled the material that in our judgment is needed in order to be
a knowledgeable consumer of modern-day game theory. Unlike traditional game
theory texts, we also include discussion of algorithmic considerations. In the con-
text of the normal-form representation that material is sufficiently substantial to
warrant its own chapter, Chapter 4.
We then switch to two specialized topics in multiagent systems. In Chapter 7
we cover multiagent learning. The topic is interesting for several reasons. First,
it is a key facet of multiagent systems. Second, the very problems addressed in
the area are diverse and sometimes ill understood. Finally, the techniques used,
which draw equally on computer science and game theory (as well as some other
disciplines), are not straightforward extensions of learning in the single-agent case.
In Chapter 8 we cover another element unique to multiagent systems, com-
munication. We cover communication in a game-theoretic setting, as well as in
cooperative settings traditionally considered by linguists and philosophers (except
that we see that there too a game-theoretic perspective can creep in).
Next is a three-chapter sequence that might be called “protocols for groups.”
Chapter 9 covers social-choice theory, including voting methods. This is a non-
strategic theory, in that it assumes that the preferences of agents are known, and
the only question is how to aggregate them properly. Chapter 10 covers mecha-
nism design, which looks at how such preferences can be aggregated by a central
designer even when agents are strategic. Finally, Chapter 11 looks at the special
case of auctions.
Chapter 12 covers coalitional game theory, in recent times somewhat neglected
within game theory and certainly underappreciated in computer science.
The material in Chapters 1–12 is mostly Bayesian and/or algorithmic in nature.
And thus the tools used in them include probability theory, utility theory, algo-
rithms, Markov decision problems (MDPs), and linear/integer programming. We
conclude with two chapters on logical theories in multiagent systems. In Chap-
ter 13 we cover modal logic of knowledge and belief. This material hails from
philosophy and computer science, but it turns out to dovetail very nicely with the
discussion of Bayesian games in Chapter 6. Finally, in Chapter 14 we extend the
discussion in several directions—we discuss how beliefs change over time, examine
logical models of games, and consider how one might begin to use logic to model motivational
attitudes (such as “intention”) in addition to the informational ones (knowledge,
belief).
Required background
The book is rigorous and requires mathematical thinking, but only basic back-
ground knowledge. In much of the book we assume knowledge of basic computer
science (algorithms, complexity) and basic probability theory. In more technical
parts we assume familiarity with Markov decision problems (MDPs), mathemati-
cal programming (specifically, linear and integer programming), and classical logic.
All of these (except basic computer science) are covered briefly in appendices, but
those are meant as refreshers and to establish notation, not as a substitute for back-
ground in those subjects. (This is true in particular of probability theory.) However,
above all, a prerequisite is a capacity for clear thinking.
How to teach (and learn) from this book
There are partial dependencies among the 14 chapters. To understand them, it is
useful to think of the book as consisting of the following “blocks.”
Block 1, Chapters 1–2: Distributed problem solving
Block 2, Chapters 3–6: Noncooperative game theory
Block 3, Chapter 7: Learning
Block 4, Chapter 8: Communication
Block 5, Chapters 9–11: Protocols for groups
Block 6, Chapter 12: Coalitional game theory
Block 7, Chapters 13–14: Logical theories
Within every block there is a sequential dependence (except within Block 1,
in which the chapters are largely independent of each other). Among the blocks,
however, there is only one strong dependence: Blocks 3, 4, and 5 each depend on
some elements of noncooperative game theory and thus on Block 2 (though none
requires the entire block). Otherwise there are some interesting local pairwise
connections between blocks, but none that requires that both blocks be covered,
whether sequentially or in parallel.
Given this weak dependence among the chapters, there are many ways to craft
a course out of the material, depending on the background of the students, their
interests, and the time available. The book’s Web site
http://www.masfoundations.org
contains several specific syllabi that have been used by us and other colleagues, as
well as additional resources for both students and instructors.
On pronouns and gender
We use male pronouns to refer to agents throughout the book. We debated this
between us, not being happy with any of the alternatives. In the end we reluctantly
settled on the “standard” male convention rather than the reverse female convention
or the grammatically dubious “they.” We urge the reader not to read patriarchal
intentions into our choice.
1 Distributed Constraint Satisfaction
In this chapter and the next we discuss cooperative situations in which agents col-
laborate to achieve a common goal. This goal can be viewed as shared between
the agents or, alternatively, as the goal of a central designer who is designing the
various agents. Of course, if such a designer exists, a natural question is why it
matters that there are multiple agents; they can be viewed merely as end sensors
and effectors for executing the plan devised by the designer. However, there exist
situations in which a problem needs to be solved in a distributed fashion, either
because a central controller is not feasible or because one wants to make good
use of the distributed resources. A good example is provided by sensor networks.
Such networks consist of multiple processing units, each with local sensor capabil-
ities, limited processing power, limited power supply, and limited communication
bandwidth. Despite these limitations, these networks aim to provide some global
service. Figure 1.1 shows an example of a fielded sensor network used for mon-
itoring environmental quantities like humidity, temperature and pressure in an of-
fice environment. Each sensor can monitor only its local area and, similarly, can
communicate only with other sensors in its local vicinity. The question is what al-
gorithm the individual sensors should run so that the center can still piece together
a reliable global picture.
Distributed algorithms have been widely studied in computer science. We con-
centrate on distributed problem-solving algorithms of the sort studied in artificial
intelligence. We divide the discussion into two parts. In this chapter we cover
distributed constraint satisfaction, where agents attempt in a distributed fashion to
find a feasible solution to a problem with global constraints. In the next chapter
we look at agents who try not only to satisfy constraints, but also to optimize some
objective function subject to these constraints.
Later in this book we will encounter additional examples of distributed problem
solving. Each of them requires specific background, however, which is why they
are not discussed here. Two of them stand out in particular.
• In Chapter 7 we encounter a family of techniques that involve learning, some
of them targeted at purely cooperative situations. In these situations the agents
learn through repeated interactions how to coordinate a choice of action. This
material requires some discussion of noncooperative game theory (discussed in
Chapter 3) as well as general discussion of multiagent learning (discussed in
Chapter 7).

[Figure omitted: floor plan of an office environment instrumented with sensor
nodes; room labels visible include SERVER, LAB, PHONE, and QUIET.]
Figure 1.1: Part of a real sensor network used for indoor environmental monitoring.
• In Chapter 13 we discuss the use of logics of knowledge (introduced in that
chapter) to establish the knowledge conditions required for coordination, in-
cluding an application to distributed control of multiple robots.
1.1 Defining distributed constraint satisfaction problems
A constraint satisfaction problem (CSP) is defined by a set of variables, domains
for each of the variables, and constraints on the values that the variables might take
on simultaneously. The role of constraint satisfaction algorithms is to assign values
to the variables in a way that is consistent with all the constraints, or to determine
that no such assignment exists.
Constraint satisfaction techniques have been applied in diverse domains, includ-
ing machine vision, natural language processing, theorem proving, and planning
and scheduling, to name but a few. Here is a simple example taken from the do-
main of sensor networks. Figure 1.2 depicts a three-sensor snippet from the sce-
nario illustrated in Figure 1.1. Each of the sensors has a certain radius that, in
combination with the obstacles in the environment, gives rise to a particular cover-
age area. These coverage areas are shown as ellipses in Figure 1.2. As you can see,
some of the coverage areas overlap. We consider a specific problem in this setting.
Suppose that each sensor can choose one of three possible radio frequencies. All
the frequencies work equally well so long as no two sensors with overlapping cov-
erage areas use the same frequency. The question is which algorithms the sensors
should employ to select their frequencies, assuming that this decision cannot be
made centrally.
Figure 1.2: A simple sensor net problem.
The essence of this problem can be captured as a graph-coloring problem. Fig-
ure 1.3 shows such a graph, corresponding to the sensor network CSP above. The
nodes represent the individual units; the different frequencies are represented by
colors; and two nodes are connected by an undirected edge if and only if the cov-
erage areas of the corresponding sensors overlap. The goal of graph coloring is to
choose one color for each node so that no two adjacent nodes have the same color.
[Figure omitted: three nodes X1, X2, X3, each with domain {red, blue, green},
joined pairwise by ≠ edges.]
Figure 1.3: A graph-coloring problem equivalent to the sensor net problem of
Figure 1.2.
Formally speaking, a CSP consists of a finite set of variables X = {X1, . . . , Xn},
a domain Di for each variable Xi, and a set of constraints {C1, . . . , Cm}. Al-
though in general CSPs allow infinite domains, we assume here that all the domains
are finite. In the graph-coloring example above there were three variables, and they
each had the same domain, {red, green, blue}. Each constraint is a predicate on
some subset of the variables, say, Xi1, . . . , Xij; the predicate defines a relation
that is a subset of the Cartesian product Di1 × · · · × Dij. Each such constraint
restricts the values that may be simultaneously assigned to the variables participat-
ing in the constraint. In this chapter we restrict the discussion to binary constraints,
each of which constrains exactly two variables. For example, in the graph-coloring
problem above, each “not-equal” constraint applied to two nodes.
Given a subset S of the variables, an instantiation of S is an assignment of a
unique domain value for each variable in S; it is legal if it does not violate any
constraint that mentions only variables in S. A solution to a network is a legal
instantiation of all variables. Typical tasks associated with constraint networks are
to determine whether a solution exists, to find one or all solutions, to determine
whether a legal instantiation of some of the variables can be extended to a solution,
and so on. We will concentrate on the most common task, which is to find one
solution to a CSP, or to prove that none exists.
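To make the definitions concrete, here is a short Python sketch (our illustration, not code from the book; all names in it are ours) that encodes the three-variable graph-coloring CSP of Figure 1.3 and finds one solution by brute-force enumeration over the Cartesian product of the domains.

from itertools import product

# The graph-coloring CSP of Figure 1.3: three variables with identical
# domains and a binary not-equal constraint on each edge.
variables = ["x1", "x2", "x3"]
domains = {v: {"red", "green", "blue"} for v in variables}
# Each constraint is a (scope, predicate) pair.
constraints = [
    (("x1", "x2"), lambda a, b: a != b),
    (("x1", "x3"), lambda a, b: a != b),
    (("x2", "x3"), lambda a, b: a != b),
]

def solve(variables, domains, constraints):
    """Return one legal instantiation of all variables, or None."""
    for values in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, values))
        if all(pred(*(assignment[v] for v in scope))
               for scope, pred in constraints):
            return assignment
    return None

print(solve(variables, domains, constraints))
# e.g. {'x1': 'red', 'x2': 'green', 'x3': 'blue'} (set order may vary)

The exhaustive product loop is exactly what constraint satisfaction algorithms try to avoid; the distributed algorithms below additionally split the work among agents.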
In a distributed CSP, each variable is owned by a different agent. The goal is
still to find a global variable assignment that meets the constraints, but each agent
decides on the value of his own variable with relative autonomy. While he does
not have a global view, each agent can communicate with his neighbors in the
constraint graph. A distributed algorithm for solving a CSP has each agent engage
in some protocol that combines local computation with communication with his
neighbors. A good algorithm ensures that such a process terminates with a legal
solution (or with a realization that no legal solution exists) and does so quickly.
We discuss two types of algorithms. Algorithms of the first kind embody a least-
commitment approach and attempt to rule out impossible variable values without
losing any possible solutions. Algorithms of the second kind embody a more adven-
turous spirit and select tentative variable values, backtracking when those choices
prove unsuccessful. In both cases we assume that the communication between
neighboring nodes is perfect, but nothing about its timing; messages can take more
or less time without rhyme or reason. We do assume, however, that if node i sends
multiple messages to node j, those messages arrive in the order in which they were
sent.
1.2 Domain-pruning algorithms
Under domain-pruning algorithms, nodes communicate with their neighbors in or-
der to eliminate values from their domains. We consider two such algorithms. In
the first, the filtering algorithm, each node communicates its domain to its neigh-
bors, eliminates from its domain the values that are not consistent with the values
received from the neighbors, and the process repeats. Specifically, each node xi
with domain Di repeatedly executes the procedure Revise(xi, xj) for each neigh-
bor xj.
procedure Revise(xi, xj)
    forall vi ∈ Di do
        if there is no value vj ∈ Dj such that vi is consistent with vj then
            delete vi from Di
The process, known also under the general term arc consistency, terminates
when no further elimination takes place, or when one of the domains becomes
empty (in which case the problem has no solution). If the process terminates with
one value in each domain, that set of values constitutes a solution. If it terminates
with multiple values in each domain, the result is inconclusive; the problem might
or might not have a solution.
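In runnable form, the filtering process might look as follows. This is a minimal sequential sketch of the idea only: the "messages" are reads of a shared dictionary rather than asynchronous communication, and the CSP encoding (domains as sets, one predicate per directed edge) is our own. The test case is instance (a) of Figure 1.4, discussed below.

def revise(domains, xi, xj, allowed):
    """Revise(xi, xj): delete from xi's domain every value that is
    consistent with no value in xj's domain, where `allowed(vi, vj)`
    is the binary constraint. Returns True if anything was deleted."""
    removed = {vi for vi in domains[xi]
               if not any(allowed(vi, vj) for vj in domains[xj])}
    domains[xi] -= removed
    return bool(removed)

def filtering(domains, edges):
    """Repeat Revise over all directed edges until no domain changes
    (arc consistency), or until some domain becomes empty."""
    changed = True
    while changed:
        changed = False
        for (xi, xj), allowed in edges.items():
            if revise(domains, xi, xj, allowed):
                changed = True
    return domains

def not_equal(a, b):
    return a != b

# Instance (a) of Figure 1.4 (see below): x1 is already fixed to red.
domains = {"x1": {"red"},
           "x2": {"red", "blue"},
           "x3": {"red", "blue", "green"}}
edges = {(i, j): not_equal for i in domains for j in domains if i != j}
print(filtering(domains, edges))
# -> {'x1': {'red'}, 'x2': {'blue'}, 'x3': {'green'}}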
Clearly, the algorithm is guaranteed to terminate, and furthermore it is sound (in
that if it announces a solution, or announces that no solution exists, it is correct), but
it is not complete (i.e., it may fail to pronounce a verdict). Consider, for example,
the family of very simple graph-coloring problems shown in Figure 1.4. (Note that
problem (d) is identical to the problem in Figure 1.3.)
[Figure omitted: four instances (a)–(d) of the triangle graph with ≠ edges,
differing only in the domains: (a) x1 = {red}, x2 = {red, blue},
x3 = {red, blue, green}; (b) x1 = {red}, x2 = x3 = {red, blue};
(c) x1 = x2 = x3 = {red, blue}; (d) x1 = x2 = x3 = {red, blue, green}.]
Figure 1.4: A family of graph coloring problems
In this family of CSPs the three variables (i.e., nodes) are fixed, as are the “not-
equal” constraints between them. What are not fixed are the domains of the vari-
ables. Consider the four instances of Figure 1.4.
(a) Initially, as the nodes communicate with one another, only x1’s messages
result in any change. Specifically, when either x2 or x3 receive x1’s message
they remove red from their domains, ending up with D2 = {blue} and D3 =
{blue, green}. Then, when x2 communicates his new domain to x3, x3 further
reduces his domain to {green}. At this point no further changes take place and
the algorithm terminates with a correct solution.
(b) The algorithm starts as before, but once x2 and x3 receive x1’s message they
each reduce their domains to {blue}. Now, when they update each other on
their new domains, they each reduce their domains to {}, the empty set. At this
point the algorithm terminates and correctly announces that no solution exists.
(c) In this case the initial set of messages yields no reduction in any domain. The
algorithm terminates, but all the nodes have multiple values remaining. And so
the algorithm is not able to show that the problem is overconstrained and has no
solution.
(d) Filtering can also fail when a solution exists. For similar reasons as in instance
(c), the algorithm is unable to show that in this case the problem does have a
solution.
In general, filtering is a very weak method and, at best, is used as a preprocess-
ing step for more sophisticated methods. The algorithm is directly based on the
notion of unit resolution from propositional logic. Unit resolution is the following
inference rule (premises above the line, conclusion below):

A1
¬(A1 ∧ A2 ∧ · · · ∧ An)
───────────────────
¬(A2 ∧ · · · ∧ An)
To see how the filtering algorithm corresponds to unit resolution, we must first
write the constraints as forbidden value combinations, called Nogoods. For exam-
ple, the constraint that x1 and x2 cannot both take the value “red” would give rise
to the propositional sentence ¬(x1 = red ∧ x2 = red), which we write as the
Nogood {x1 = red, x2 = red}. In instance (b) of Figure 1.4, agent x2 updated
his domain based on agent x1’s announcement that x1 = red and the Nogood
{x1 = red, x2 = red}:

x1 = red
¬(x1 = red ∧ x2 = red)
───────────────────
¬(x2 = red)
Unit resolution is a weak inference rule, and so it is not surprising that the filter-
ing algorithm is weak as well. Hyper-resolution is a generalization of unit resolution
and has the following form:

A1 ∨ A2 ∨ · · · ∨ Am
¬(A1 ∧ A1,1 ∧ A1,2 ∧ · · · )
¬(A2 ∧ A2,1 ∧ A2,2 ∧ · · · )
...
¬(Am ∧ Am,1 ∧ Am,2 ∧ · · · )
──────────────────────────
¬(A1,1 ∧ · · · ∧ A2,1 ∧ · · · ∧ Am,1 ∧ · · · )
Hyper-resolution is both sound and complete for propositional logic, and indeed
it gives rise to a complete distributed CSP algorithm. In this algorithm, each agent
repeatedly generates new constraints for his neighbors, notifies them of these new
constraints, and prunes his own domain based on new constraints passed to him by
his neighbors. Specifically, he executes the following algorithm, where NGi is the
set of all Nogoods of which agent i is aware and NG∗j is a set of new Nogoods
communicated from agent j to agent i.
procedure ReviseHR(NGi, NG∗j)
    repeat
        NGi ← NGi ∪ NG∗j
        let NG∗i denote the set of new Nogoods that i can derive from NGi and
            his domain using hyper-resolution
        if NG∗i is nonempty then
            NGi ← NGi ∪ NG∗i
            send the Nogoods NG∗i to all neighbors of i
            if {} ∈ NG∗i then
                stop
    until there is no change in i’s set of Nogoods NGi
The algorithm is guaranteed to converge in the sense that after sending and re-
ceiving a finite number of messages, each agent will stop sending messages and
generating Nogoods. Furthermore, the algorithm is complete. The problem has
a solution iff, on completion, no agent has generated the empty Nogood. (Obvi-
ously, every superset of a Nogood is also forbidden, and thus if a single node ever
generates an empty Nogood then the problem has no solution.)
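For concreteness, the hyper-resolution step that an agent applies to the disjunction over his own domain can be sketched in Python as follows. The frozenset-of-pairs encoding of Nogoods and all function names are our own assumptions, not the book's.

from itertools import product

def hyper_resolve(var, domain, nogoods):
    """Derive new Nogoods for agent `var` by hyper-resolution, taking
    'var = v1 or ... or var = vk' (over var's domain) as the first
    premise and one known Nogood per value as the remaining premises.
    A Nogood is a frozenset of (variable, value) pairs."""
    derived = set()
    per_value = [[ng for ng in nogoods if (var, v) in ng] for v in domain]
    if any(not options for options in per_value):
        return derived  # some value is unconstrained: no resolvent exists
    for combo in product(*per_value):
        resolvent = frozenset().union(
            *(ng - {(var, v)} for ng, v in zip(combo, domain)))
        if all(a != var for a, _ in resolvent):  # keep only resolvents
            derived.add(resolvent)               # not mentioning var itself
    return derived

# Instance (c) of Figure 1.4, from x1's point of view:
nogoods = [frozenset({("x1", "red"), ("x2", "red")}),
           frozenset({("x1", "red"), ("x3", "red")}),
           frozenset({("x1", "blue"), ("x2", "blue")}),
           frozenset({("x1", "blue"), ("x3", "blue")})]
for ng in hyper_resolve("x1", ["red", "blue"], nogoods):
    print(sorted(ng))

Among the four Nogoods printed are {x2 = red, x3 = blue} and {x2 = blue, x3 = red}, the two used in the derivation that follows; the other two illustrate the less useful Nogoods whose proliferation is discussed at the end of this section.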
Consider again instance (c) of the CSP problem in Figure 1.4. In contrast to
the filtering algorithm, the hyper-resolution-based algorithm proceeds as follows.
Initially, x1 maintains four Nogoods—{x1 = red, x2 = red}, {x1 = red, x3 = red},
{x1 = blue, x2 = blue}, {x1 = blue, x3 = blue}—which are derived directly
from the constraints involving x1. Furthermore, x1 must adopt one of the values in
his domain, so x1 = red ∨ x1 = blue. Using hyper-resolution, x1 can reason:
x1 = red ∨ x1 = blue
¬(x1 = red ∧ x2 = red)
¬(x1 = blue ∧ x3 = blue)
─────────────────────
¬(x2 = red ∧ x3 = blue)
Thus, x1 constructs the new Nogood {x2 = red, x3 = blue}; in a similar way
he can also construct the Nogood {x2 = blue, x3 = red}. x1 then sends both
Nogoods to his neighbors x2 and x3. Using his domain, an existing Nogood and
one of these new Nogoods, x2 can reason:
x2 = red ∨ x2 = blue
¬(x2 = red ∧ x3 = blue)
¬(x2 = blue ∧ x3 = blue)
─────────────────────
¬(x3 = blue)
Using the other new Nogood from x1, x2 can also construct the Nogood {x3 =
red}. These two singleton Nogoods are communicated to x3 and allow him to
generate the empty Nogood. This proves that the problem does not have a solution.
This example, while demonstrating the greater power of the hyper-resolution-
based algorithm relative to the filtering algorithm, also exposes its weakness; the
number of Nogoods generated can grow to be unmanageably large. (Indeed, we
only described the minimal number of Nogoods needed to derive the empty No-
good; many others would be created as all the agents processed each other’s mes-
sages in parallel. Can you find an example?) Thus, the situation in which we find
ourselves is that we have one algorithm that is too weak and another that is im-
practical. The problem lies in the least-commitment nature of these algorithms;
they are restrictedto removing only provably impossible value combinations. The
alternative to such “safe” procedures is to explore a subset of the space, making
tentative value selections for variables, and backtracking when necessary. This is
the topic of the next section. However, the algorithms we have just described are
not irrelevant; the filtering algorithm is an effective preprocessing step, and the
algorithm we discuss next is based on the hyper-resolution-based algorithm.
1.3 Heuristic search algorithms
A straightforward centralized trial-and-error solution to a CSP is to first order the
variables (e.g., alphabetically). Then, given the ordering x1, x2, . . . , xn, invoke
the procedure ChooseValue(x1, {}). The procedure ChooseValue is defined recur-
sively as follows, where {v1, v2, . . . , vi−1} is the set of values assigned to variables
x1, . . . , xi−1.
procedure ChooseValue(xi, {v1, v2, . . . , vi−1})
    vi ← value from the domain of xi that is consistent with {v1, v2, . . . , vi−1}
    if no such value exists then
        backtrack¹
    else if i = n then
        stop
    else
        ChooseValue(xi+1, {v1, v2, . . . , vi})

¹ There are various ways to implement the backtracking in this procedure. The most
straightforward way is to undo the choices made thus far in reverse chronological order,
a procedure known as chronological backtracking. It is well known that more sophisticated
backtracking procedures can be more efficient, but that does not concern us here.
This exhaustive search of the space of assignments has the advantage of com-
pleteness. But it is “distributed” only in the uninteresting sense that the different
agents execute sequentially, mimicking the execution of a centralized algorithm.
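As a runnable counterpart, here is the same centralized procedure in Python; chronological backtracking falls out of the recursion, which simply returns to the previous variable when a call fails. The CSP encoding is the hypothetical one from the earlier sketches.

def choose_value(variables, domains, consistent, assignment=()):
    """Assign variables in the fixed order; backtrack chronologically.
    `assignment` is a tuple of (variable, value) pairs, and
    `consistent` checks a partial assignment against all constraints.
    Returns a complete assignment as a dict, or None."""
    i = len(assignment)
    if i == len(variables):
        return dict(assignment)
    for v in domains[variables[i]]:
        candidate = assignment + ((variables[i], v),)
        if consistent(candidate):
            result = choose_value(variables, domains, consistent, candidate)
            if result is not None:
                return result
    return None  # no value works: backtrack to the previous variable

variables = ["x1", "x2", "x3"]
domains = {v: ["red", "green", "blue"] for v in variables}
edges = {("x1", "x2"), ("x1", "x3"), ("x2", "x3")}

def consistent(assignment):
    a = dict(assignment)
    return all(a[i] != a[j] for i, j in edges if i in a and j in a)

print(choose_value(variables, domains, consistent))
# -> {'x1': 'red', 'x2': 'green', 'x3': 'blue'}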
The following attempt at a distributed algorithm has the opposite properties; it
allows the agents to execute in parallel and asynchronously, is sound, but is not
complete. Consider the following naive procedure, executed by all agents in paral-
lel and asynchronously.
select a value from your domain
repeat
    if your current value is consistent with the current values of your neighbors,
    or if none of the values in your domain are consistent with them then
        do nothing
    else
        select a value in your domain that is consistent with those of your
        neighbors and notify your neighbors of your new value
until there is no change in your value
Clearly, when the algorithm terminates because no constraint violations have
occurred, a solution has been found. But in all other cases, all bets are off. If the
algorithm terminates because no agent can find a value consistent with those of his
neighbors, there might still be a consistent global assignment. And the algorithm
may never terminate even if there is a solution. For example, consider example (d)
of Figure 1.4: if every agent cycles sequentially between red, green, and blue, the
algorithm will never terminate.
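That livelock is easy to reproduce. The toy simulation below (ours; it deliberately forces the worst-case schedule in which every agent inspects its neighbors and then all move at once) runs the naive procedure on instance (d).

colors = ["red", "green", "blue"]
values = {"x1": "red", "x2": "red", "x3": "red"}  # everyone in conflict
neighbors = {"x1": ["x2", "x3"], "x2": ["x1", "x3"], "x3": ["x1", "x2"]}

def step(values):
    """One synchronous round: every conflicted agent simultaneously moves
    to the next color (cyclically) that is consistent with the values
    its neighbors held at the start of the round."""
    new = dict(values)
    for agent, v in values.items():
        taken = [values[n] for n in neighbors[agent]]
        if v in taken:  # a constraint is violated
            for c in colors[colors.index(v) + 1:] + colors:
                if c not in taken:
                    new[agent] = c
                    break
    return new

for round_no in range(6):
    print(round_no, values)
    values = step(values)
# All three agents pick the same new color in every round, so a conflict
# persists forever even though the instance has many solutions.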
We have given these two straw-man algorithms for two reasons. Our first rea-
son is to show that reconciling true parallelism and asynchrony with soundness
and completeness is likely to require somewhat complex algorithms. And second,
the fundamental heuristic algorithm for distributed CSPs—the asynchronous back-
tracking (or ABT) algorithm—shares much with the two algorithms. From the first
algorithm it borrows the notion of a global total ordering on the agents. From the
second it borrows a message-passing protocol, albeit a more complex one, which
relies on the global ordering. We will describe the ABT in its simplest form. After
demonstrating it on an extended example, we will point to ways in which it can be
improved upon.
1.3.1 The asynchronous backtracking algorithm
As we said, the asynchronous backtracking (ABT) algorithm assumes a total order-
ing (the “priority order") on the agents. Each binary constraint is known to both
the constrained agents and is checked in the algorithm by the agent with the lower
priority between the two. A link in the constraint network is always directed from
an agent with higher priority to an agent with lower priority.
Agents instantiate their variables concurrently and send their assigned values to
the agents that are connected to them by outgoing links. All agents wait for and
respond to messages. After each update of his assignment, an agent sends his new
assignment along all outgoing links. An agent who receives an assignment (from
the higher-priority agent of the link), tries to find an assignment for his variable
that does not violate a constraint with the assignment he received.
ok? messages are messages carrying an agent’s variable assignment. When an
agent Ai receives an ok? message from agent Aj , Ai places the received assign-
ment in a data structure called agent_view, which holds the last assignment Ai
received from higher-priority neighbors such as Aj . Next, Ai checks if his current
assignment is still consistent with his agent_view. If it is consistent, Ai does
nothing. If not, then Ai searches his domain for a new consistent value. If he finds
one, he assigns his variable that value and sends ok? messages to all lower-priority
agents linked to him informing them of this value. Otherwise, Ai backtracks.
The backtrack operation is executed by sending a Nogood message. Recall
that a Nogood is simply an inconsistent partial assignment, that is, assignments of
specific values to some of the variables that together violate the constraints on those
variables. In this case, the Nogood consists of Ai’s agent_view.² The Nogood is
sent to the agent with the lowest priority among the agents whose assignments are
included in the inconsistent tuple in the Nogood. Agent Ai who sends a Nogood
message to agent Aj assumes that Aj will change his assignment. Therefore, Ai
removes from his agent_view the assignment of Aj and makes an attempt to find
an assignment for his own variable that is consistent with the updated agent_view.

² We later discuss schemes that achieve better performance by avoiding always sending
this entire set.
Because of its reliance on building up a set of Nogoods, the ABT algorithm
can be seen as a greedy version of the hyper-resolution algorithm of the previous
section. In the latter, all possible Nogoods are generated by each agent and commu-
nicated to all neighbors, even though the vast majority of these messages are not
Uncorrected manuscript of Multiagent Systems, published by Cambridge University Press
Revision 1.1 © Shoham & Leyton-Brown, 2009, 2010.
1.3 Heuristic search algorithms 11
useful. Here, agents make tentative choices of a value for their variables, only gen-
erate Nogoods that incorporate values already generated by the agents above them
in the order, and—importantly—communicate new values only to some agents and
new Nogoods to only one agent.
Below is the pseudocode of the ABT algorithm, specifying the protocol for agent
Ai.
when received (Ok?, (Aj , dj)) do
add (Aj , dj) to agent_view
check_agent_view
when received (Nogood, nogood) do
add nogood to Nogood list
forall (Ak, dk) ∈ nogood, if Ak is not a neighbor of Ai do
add (Ak, dk) to agent_view
request Ak to add Ai as a neighbor
check_agent_view
procedure check_agent_view
when agent_view and current_value are inconsistent do
if no value in Di is consistent with agent_view then
backtrack
else
select d ∈ Di consistent with agent_view
current_value← d
send (ok?, (Ai, d)) to lower-priority neighbors
procedure backtrack
nogood ← some inconsistent set, using hyper-resolution or a similar procedure
if nogood is the empty set then
broadcast to other agents that there is no solution
terminate this algorithm
else
select (Aj, dj) ∈ nogood where Aj has the lowest priority in nogood
send (Nogood, nogood) to Aj
remove (Aj , dj) from agent_view
check_agent_view
Notice a certain wrinkle in the pseudocode, having to do with the addition of
edges. Since the Nogood can include assignments of some agent Aj with which
Ai was not previously constrained, after adding Aj’s assignment to his agent_view
Ai sends a message to Aj asking him to add Ai to his list of outgoing links.
Furthermore, after adding the link, Aj sends an ok? message to Ai each time he
reassigns his variable. After storing the Nogood, Ai checks whether his assignment
is still consistent. If it is, a message is sent to the agent the Nogood was received
from. This resending of the assignment is crucial since, as mentioned earlier, the
agent sending a Nogood assumes that the receiver of the Nogood replaces his
assignment; the sender therefore needs to know that the assignment is still valid.
If the old assignment that was forbidden by the Nogood is inconsistent, Ai tries
to find a new assignment, similarly to the case when an ok? message is received.
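To make this message-driven control flow concrete, here is a minimal Python
sketch of a single ABT agent. It is only an illustration under simplifying
assumptions: the consistent and send hooks are supplied by a hypothetical
message-delivery harness, dynamic link addition is omitted, and the Nogood sent
on backtracking is the full agent_view, as in the basic version above.

# A compressed sketch of one ABT agent (ours, not the text's implementation).
# `consistent(i, di, j, dj)` and `send(recipient, kind, payload)` are assumed
# hooks; dynamic link addition is omitted for brevity.
class ABTAgent:
    def __init__(self, i, domain, lower_neighbors, consistent, send):
        self.i = i                        # priority index; lower = higher priority
        self.domain = list(domain)
        self.lower_neighbors = lower_neighbors
        self.consistent = consistent
        self.send = send
        self.agent_view = {}              # j -> dj for higher-priority agents j
        self.nogoods = []                 # stored Nogoods: dicts {agent: value}
        self.value = self.domain[0]

    def handle_ok(self, j, dj):           # receipt of (ok?, (Aj, dj))
        self.agent_view[j] = dj
        self.check_agent_view()

    def handle_nogood(self, sender, nogood):   # receipt of (Nogood, nogood)
        self.nogoods.append(dict(nogood))
        if self.acceptable(self.value):
            # current value survives: resend it so the sender knows it is kept
            self.send(sender, "ok?", (self.i, self.value))
        else:
            self.check_agent_view()

    def acceptable(self, d):
        if not all(self.consistent(self.i, d, j, dj)
                   for j, dj in self.agent_view.items()):
            return False
        # d is also ruled out if a stored Nogood matches the view and assigns d here
        return not any(ng.get(self.i) == d and
                       all(self.agent_view.get(j) == dj
                           for j, dj in ng.items() if j != self.i)
                       for ng in self.nogoods)

    def check_agent_view(self):
        if self.acceptable(self.value):
            return
        options = [d for d in self.domain if self.acceptable(d)]
        if not options:
            self.backtrack()
            return
        self.value = options[0]
        for j in self.lower_neighbors:
            self.send(j, "ok?", (self.i, self.value))

    def backtrack(self):
        nogood = dict(self.agent_view)    # the full view, as in the basic version
        if not nogood:                    # empty Nogood: the problem has no solution
            raise RuntimeError("no solution")  # (the full algorithm broadcasts this)
        j = max(nogood)                   # lowest-priority agent in the Nogood
        self.send(j, "Nogood", nogood)
        del self.agent_view[j]
        self.check_agent_view()

Note how handle_nogood implements the resending of an unchanged assignment
discussed above, and how backtrack directs the Nogood to the lowest-priority
agent appearing in it.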
1.3.2 A simple example
In Section 1.3.3 we give a more elaborate example, but here is a brief illustra-
tion of the operation of the ABT algorithm on one of the simple problems en-
countered earlier. Consider again the instance (c) of the CSP in Figure 1.4, and
assume the agents are ordered alphabetically: x1, x2, x3. They initially select val-
ues at random; suppose they all select blue. x1 notifies x2 and x3 of his choice,
and x2 notifies x3. x2’s local view is thus {x1 = blue}, and x3’s local view is
{x1 = blue, x2 = blue}. x2 and x3 must check for consistency of their local
views with their own values. x2 detects the conflict, changes his own value to red,
and notifies x3. In the meantime, x3 also checks for consistency and similarly
changes his value to red; he, however, notifies no one. Then x3 receives a second
message from x2, and updates his local view to {x1 = blue, x2 = red}. At this
point he cannot find a value from his domain consistent with his local view, and,
using hyper-resolution, generates the Nogood {x1 = blue, x2 = red}. He com-
municates this Nogood to x2, the lowest ranked agent participating in the Nogood.
x2 now cannot find a value consistent with his local view, generates the Nogood
{x1 = blue}, and communicates it to x1. x1 detects the inconsistency with his
current value, changes his value to red, and communicates the new value to x2
and x3. The process now continues as before; x2 changes his value back to blue,
x3 finds no consistent value and generates the Nogood {x1 = red, x2 = blue},
and then x2 generates the Nogood {x1 = red}. At this point x1 has the Nogood
{x1 = blue} as well as the Nogood {x1 = red}, and using hyper-resolution he
generates the Nogood {}, and the algorithm terminates having determined that the
problem has no solution.
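To spell out the final hyper-resolution step: the two unit Nogoods jointly rule
out every value in x1’s domain, which is exactly the condition under which the
empty Nogood is derivable. A two-line check (ours, with the colors written as
strings) makes this explicit.

# The two unit Nogoods from the trace cover x1's entire domain, so the
# empty Nogood {} is derivable and the problem is unsatisfiable.
domain_x1 = {"blue", "red"}
nogoods = [{"x1": "blue"}, {"x1": "red"}]
print(domain_x1 <= {ng["x1"] for ng in nogoods})  # True -> empty Nogood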
The need for the addition of new edges is seen in a slightly modified example,
shown in Figure 1.5.
As in the previous example, here too x3 generates the Nogood {x1 = blue, x2 =
red} and notifies x2. x2 is not able to regain consistency by changing his own
value. However, x1 is not a neighbor of x2, and so x2 does not have the value
x1 = blue in his local view and is not able to send the Nogood {x1 = blue} to
x1. So x2 sends a request to x1 to add x2 to his list of neighbors and to send x2 his
current value. From there onward the algorithm proceeds as before.
Uncorrected manuscript of Multiagent Systems, published by Cambridge University Press
Revision 1.1 © Shoham & Leyton-Brown, 2009, 2010.
1.3 Heuristic search algorithms 13
Figure 1.5: Asynchronous backtracking with dynamic link addition. [Figure
omitted: three snapshots (a)–(c) of a constraint network with variables
X1 ∈ {1, 2}, X2 ∈ {2}, X3 ∈ {1, 2} and constraints X1 ≠ X3 and X2 ≠ X3. In (a),
X1 and X2 send (new_val, (X1, 1)) and (new_val, (X2, 2)) to X3, whose
local_view becomes {(X1, 1), (X2, 2)}. In (b), X3 sends
(Nogood, {(X1, 1), (X2, 2)}) to X2, and X2 sends an add-neighbor request to X1,
creating a new link from X1 to X2. In (c), X2, whose local_view is now
{(X1, 1)}, sends (Nogood, {(X1, 1)}) to X1.]
1.3.3 An extended example: the four queens problem
To gain further intuition about the ABT algorithm beyond the didactic example
in the previous section, let us look at one of the canonical CSP problems: the
n-queens problem. More specifically, we will consider the four queens problem,
which asks how four queens can be placed on a 4 × 4 chessboard so that no queen
can (immediately) attack any other. We will describe ABT’s behavior in terms of
cycles of computation, which we somewhat artificially define to be the receiving
of messages, the computations triggered by received messages, and the sending of
messages due to these computations.

Figure 1.6: Cycle 1 of ABT for four queens. All agents are active.

Figure 1.7: Cycle 2 of ABT for four queens. A2, A3, and A4 are active. The
Nogood message is A1 = 1 ∧ A2 = 1 → A3 ≠ 1.
In the first cycle (Figure 1.6) all agents select values for their variables, which
represent the positions of their queens along their respective rows. Arbitrarily, we
assume that each begins by positioning his queen at the first square of his row. Each
of agents A1, A2, and A3 sends ok? messages to the agents ordered after him: A1 sends
three messages, A2 sends two, and A3 sends a single message. Agent A4
does not have any agent after him, so he sends no messages. All agents are active
in this first cycle of the algorithm’s run.
In the second cycle (Figure 1.7) agents A2, A3, and A4 receive the ok? messages
sent to them and proceed to assign consistent values to their variables. Agent A3
assigns the value 4, which is consistent with the assignments of A1 and A2 that he
received. Agent A4 has no value consistent with the assignments of A1, A2, and A3,
and so he sends a Nogood containing these three assignments to A3 and removes
the assignment of A3 from his agent_view. Then he assigns the value 2, which
is consistent with the assignments that he received from A1 and A2 (having erased
the assignment of A3, assuming that it will be replaced because of the Nogood
message). The active agents in this cycle are A2, A3, and A4. Agent A2 acts
according to his information about A1’s position and moves to square 3, sending
two ok? messages to inform his successors about his value. As can be seen in
Figure 1.7, A3 has moved to square 4 after receiving the ok? messages of agents A1
and A2. Note that agent A3 thinks that these agents are still in the first column of
their respective rows. This is a manifestation of concurrency, which causes each
agent to act at all times based only on his agent_view. The agent_view of agent
A3 includes the ok? messages he received.
The third cycle is described in Figure 1.8; only A3 is active. After receiving
the assignment of agent A2, A3 sends back a Nogood message to agent A2. He
then erases the assignment of agent A2 from his agent_view and validates that
his current assignment (the value 4) is consistent with the assignment of agent A1.
Agents A1 and A2 continue to be idle, having received no messages that were sent
in cycle 2. The same is true for agent A4. Agent A3 also receives the Nogood sent
by A4 in cycle 2 but ignores it, since it includes an invalid assignment for A2 (i.e.,
(2, 1) and not the currently correct (2, 4)).

Figure 1.8: Cycle 3. Only A3 is active. The Nogood message is A1 = 1 → A2 ≠ 3.

Figure 1.9: Cycles 4 and 5. A2, A3, and A4 are active. The Nogood message is
A1 = 1 ∧ A2 = 4 → A3 ≠ 4.
Cycles 4 and 5 are depicted in Figure 1.9. In cycle 4 agent A2 moves to square
4 because of the Nogood message he received. His former value was ruled out and
the new value is the next valid one. He informs his successors A3 and A4 of his
new position by sending two ok? messages. In cycle 5 agent A3 receives agent
A2’s new position and selects the only value that is compatible with the positions
of his two predecessors, square 2. He sends a message to his successor informing
him about this new value. Agent A4 is now left with no valid value to assign and
sends a Nogood message to A3 that includes all his conflicts. The Nogood message
appears at the bottom of Figure 1.9. Note that this Nogood message is no longer
valid. Agent A4, however, assumes that A3 will change his position, and moves to
his only valid position (given A3’s anticipated move): column 3.
Consider now cycle 6. Agent A4 receives the new assignment of agent A3 and
sends him a Nogood message. Having erased the assignment of A3 after sending
the Nogood message, he then decides to stay at his current assignment (column 3),
since it is compatible with the positions of agents A1 and A2. Agent A3 is idle in
cycle 6, since he receives no messages from either agent A1 or agent A2 (who are
idle too). So, A4 is the only active agent at cycle 6 (see Figure 1.10).
In each of cycles 7 and 8, one Nogood is sent. Both are depicted in Figure 1.11.
First, agent A3, after receiving the Nogood message from A4, finds that he has
no valid values left and sends a Nogood to A2. Next, in cycle 8, agent A2 also
discovers that his domain of values is exhausted and sends a Nogood message
to A1. Both sending agents erase the value of the Nogood’s recipient from their
agent_views and therefore remain in their positions, which are now conflict free.
Figure 1.10: Cycle 6. Only A4 is active. The Nogood message is
A1 = 1 ∧ A2 = 4 → A3 ≠ 2.

Figure 1.11: Cycles 7 and 8. A3 is active in the first cycle and A2 is active in
the second. The Nogood messages are A1 = 1 → A2 ≠ 4 and A1 ≠ 1.

Figure 1.12: Cycle 9. Only A1 is active.

Figure 1.13: Cycle 10. Only A3 is active.
Cycle 9 involves only agent A1, who receives the Nogood message from A2 and
so moves to his next value—square 2. Next, he sends ok? messages to his three
successors.
The final cycle is cycle 10. Agent A3 receives the ok? message of A1 and so
moves to a consistent value—square 1 of his row. Agents A2 and A4 check their
agent_views after receiving the same ok? messages from agent A1 and find
that their current values are consistent with the new position of A1. Agent A3
sends an ok? message to his successor A4, informing him of his move, but A4 finds
no reason to move: his value is consistent with the value assignments of all his
predecessors. After cycle 10 all agents remain idle, having no constraint violations
with the assignments in their agent_views; thus this is a final state of the ABT
algorithm, in which it has found a solution.
1.3.4 Beyond the ABT algorithm
The ABT algorithm is the backbone of modern approaches to distributed constraint
satisfaction, but it admits many extensions and modifications.
A major modification has to do with which inconsistent partial assignment (i.e.,
Nogood) is sent in the backtrack message. In the version presented earlier (the
original form of ABT), the full agent_view is sent. However, the full
agent_view is in many cases not a minimal Nogood; a strict subset of it may also
be inconsistent. In general, shorter Nogoods can lead to a more efficient search
process, since they permit backjumping further up the search tree.
Here is an example. Consider an agent A6 holding an inconsistent agent_view
with the assignments of agents A1, A2, A3, A4 and A5. If we assume that A6
is only constrained by the current assignments of A1 and A3, sending a Nogood
message to A5 that contains all the assignments in the agent_view seems to be
a waste. After sending the Nogood to A5, A6 will remove his assignment from
the agent_view and make another attempt to assign his variable, which will be
followed by an additional Nogood sent to A4 and the removal of A4’s assignment
from the agent_view. These attempts will continue until a minimal subset is
sent as a Nogood. In this example, it is the Nogood sent to A3. The assignment
with the lower priority in the minimal inconsistent subset is removed from the
agent_view and a consistent assignment can now be found. In this example the
computation ended by sending a Nogood to the culprit agent, which would have
been the outcome if the agent computed a minimal subset.
The solution to this inefficiency, however, is not straightforward, since finding a
minimal Nogood is in general intractable (specifically, NP-hard). And so various
heuristics are needed to cut down on the size of the Nogood, without sacrificing
correctness.
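As one illustration of such a heuristic (a sketch of ours, not a specific
algorithm from the literature), a greedy pass can try to drop assignments one at
a time, keeping a drop only if the remaining set still rules out the agent’s
entire domain. This yields a locally minimal Nogood, not necessarily a
minimum-size one; violates is an assumed constraint-checking predicate.

# Greedily shrink a Nogood: drop each assignment in turn, keeping the drop if
# the remaining partial assignment still rules out every value in the domain.
# `violates(view, d)` is a hypothetical predicate: True if value d for this
# agent conflicts with the partial assignment `view`.
def shrink_nogood(nogood, domain, violates):
    ng = dict(nogood)
    for agent in list(ng):
        trial = {a: v for a, v in ng.items() if a != agent}
        if all(violates(trial, d) for d in domain):
            ng = trial  # still a Nogood without this assignment; keep the smaller set
    return ng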
A related issue is the number of Nogoods stored by each agent. In the preceding
ABT version, each Nogood is recorded by the receiving agent. Since the number of
inconsistent subsets can be exponential, constraint lists with exponential size will
be created, and a search through such lists requires exponential time in the worst
case. Various proposals have been made to cut down on this number while preserv-
ing correctness. One proposal is that agents keep only Nogoods consistent with
their agent_view. While this prunes some of the Nogoods, in the worst case it
still leaves a number of Nogoods that is exponential in the size of the agent_view.
A further improvement is to store only Nogoods that are consistent with both the
agent’s agent_view and his current assignment. This approach, which is con-
sidered by some the best implementation of the ABT algorithm, ensures that the
number of Nogoods stored by any single agent is no larger than the size of the
domain.
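The following sketch (ours, under the assumption that Nogoods are stored as
agent-to-value dictionaries) shows one way to express such a compatibility
filter: a stored Nogood is kept only while its assignments for other agents
agree with the agent_view and its assignment for this agent, if any, matches
the current value.

# Keep only Nogoods compatible with the current agent_view and current value;
# the rest can no longer fire and may be discarded under this storage policy.
def prune_nogoods(nogoods, agent_view, me, current_value):
    def live(ng):
        others_match = all(agent_view.get(j) == dj
                           for j, dj in ng.items() if j != me)
        return others_match and ng.get(me, current_value) == current_value
    return [ng for ng in nogoods if live(ng)]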
Finally, there are approaches to distributed constraint satisfaction that do not
follow the ABT scheme, including asynchronous forward checking and concurrent
dynamic backtracking. Discussion of them is beyond the scope of this book, but
the references point to further reading on the topic.
1.4 History and references
Distributed constraint satisfaction is discussed in detail in Yokoo [2001] and
reviewed in Yokoo and Hirayama [2000]. The ABT algorithm was initially
introduced in Yokoo [1994]. More comprehensive treatments, including the latest
insights into distributed CSPs, appear in Meisels [2008] and Faltings [2006].
The sensor net figure is due to Carlos Guestrin.
2 Distributed Optimization
In the previous chapter we looked at distributed ways of meeting global constraints.
Here we up the ante; we ask how agents can, in a distributed fashion, optimize a
global objective function. Specifically, we consider four families of techniques and
associated sample problems. They are, in order:
• Distributed dynamic programming (as applied to path-planning problems).
• Distributed solutions to Markov Decision Problems (MDPs).
• Optimization algorithms with an economic flavor (as applied to matching and
scheduling problems).
• Coordination via social laws and conventions, and the example of traffic rules.
2.1 Distributed dynamic programming for path planning
Like graph coloring, path planning constitutes another common abstract problem-
solving framework. A path-planning problem consists of a weighted directed graph
with a set of n nodes N, directed links L, a weight function w : L ↦ R+, and two
nodes s, t ∈ N. The goal is to find a directed path from s to t having minimal total
weight. More generally, we consider a set of goal nodes T ⊂ N, and are interested
in the shortest path from s to any of the goal nodes t ∈ T.
This abstract framework applies in many domains. Certainly it applies when
there is some concrete network at hand (e.g., a transportation or telecommunication
network). But it also applies in more roundabout ways. For example, in a planning
problem the nodes can be states of the world, the arcs actions available to the agent,
and the weights the cost (or, alternatively, time) of each action.
2.1.1 Asynchronous dynamic programming
Path planning is a well-studied problem in computer science and operations re-
search. We are interested in distributed solutions, in which each node performs a
local computation, with access only to the state of its neighbors. Underlying our
solutions will be the principle of optimality: if node x lies on a shortest path from s
to t, then the portion of the path from s to x (or, respectively, from x to t) must also
be the shortest path between s and x (resp., x and t). This allows an incremental
divide-and-conquer procedure, also known as dynamic programming.
Let us represent the shortest distance from any node i to the goal t as h∗(i).
Thus the shortest distance from i to t via a node j neighboring i is given by
f∗(i, j) = w(i, j) + h∗(j), and h∗(i) = min_j f∗(i, j). Based on these facts,
the asynchronous dynamic programming algorithm, ASYNCHDP, has each node
repeatedly perform the following procedure. In this procedure, given in Figure 2.1,
each node i maintains a variable h(i), which is an estimate of h∗(i).
procedure ASYNCHDP (node i)
if i is a goal node then
h(i)← 0
else
initialize h(i) arbitrarily (e.g., to ∞ or 0)
repeat
forall neighbors j do
f(j)← w(i, j) + h(j)
h(i)← minj f(j)
Figure 2.1: The asynchronous dynamic programming algorithm.
Figure 2.2 shows this algorithm in action. The h values are initialized to ∞, and
incrementally decrease to their correct values. The figure shows three iterations;
note that after the first iteration, not all finite h values are correct; in particular, the
value 3 in node d still overestimates the true distance, which is corrected in the
next iteration.
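The local update is just a relaxation step, and can be simulated in a few lines
of Python. The sketch below is ours: it sweeps over the nodes sequentially
instead of running a concurrent process per node, and it uses a made-up graph
rather than the one in Figure 2.2.

# A minimal simulation of ASYNCHDP (ours); sequential sweeps stand in for the
# concurrent per-node processes, and the example graph is hypothetical.
import math

def asynchdp(neighbors, weights, goals, sweeps):
    # h(i) starts at infinity (0 at goals) and only decreases
    h = {i: (0.0 if i in goals else math.inf) for i in neighbors}
    for _ in range(sweeps):
        for i in neighbors:
            if i in goals:
                continue
            # the local update: h(i) <- min_j [ w(i, j) + h(j) ]
            h[i] = min(weights[i, j] + h[j] for j in neighbors[i])
    return h

graph = {"s": ["a", "b"], "a": ["t"], "b": ["a", "t"], "t": []}
w = {("s", "a"): 2, ("s", "b"): 1, ("a", "t"): 1, ("b", "a"): 1, ("b", "t"): 4}
print(asynchdp(graph, w, goals={"t"}, sweeps=4))
# -> {'s': 3.0, 'a': 1.0, 'b': 2.0, 't': 0.0}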
One can prove that the ASYNCHDP procedure is guaranteed to converge to the
true values, that is, h will converge to h∗. Specifically, convergence will require
one step for each node in the shortest path, meaning that in the worst case con-
vergence will require n iterations. However, for realistic problems this is of little
comfort. Not only can convergence be slow, but this procedure assumes a process
(or agent) for each node. In typical search spaces one cannot effectively enumer-
ate all nodes, let alone allocate them each a process. (For example, chess has
approximately 10^120 board positions, whereas there are fewer than 10^81 atoms in
the universe and there have only been 10^26 nanoseconds since the Big Bang.) So to
be practical we turn to heuristic versions of the procedure, which require a smaller
number of agents. Let us start by considering the opposite extreme in which we
have only one agent.
Figure 2.2: Asynchronous dynamic programming in action. [Figure omitted: three
snapshots (i)–(iii) of the six-node graph with start node s, intermediate nodes
a, b, c, d, and goal t. In (i) all h values are ∞ except h(t) = 0; in (ii)
h(d) = 3 and h(c) = 1; in (iii) h(a) = 3, h(b) = 3, h(d) = 2, and h(c) = 1.]

2.1.2 Learning real-time A∗

In the learning real-time A∗, or LRTA∗, algorithm, the agent starts at a given node,
performs an operation similar to that of asynchronous dynamic programming, and
then moves to the neighboring node with the shortest estimated distance to the goal,
and repeats. The procedure is given in Figure 2.3.
procedure LRTA∗
i← s // the start node
while i is not a goal node do
foreach neighbor j do
f(j)← w(i, j) + h(j)
i′ ← argminj f(j) // breaking ties at random
h(i)← max(h(i), f(i′))
i← i′
Figure 2.3: The learning real-time A∗ algorithm.
As earlier, we assume that the set of nodes is finite and that all weights w(i, j)
are positive and finite. Note that this procedure uses a given heuristic function
h(·) that serves as the initial value for each newly encountered node. For our
purposes it is not important what the precise function is. However, to guarantee
certain properties of LRTA∗, we must assume that h is admissible. This means
that h never overestimates the distance to the goal, that is, h(i) ≤ h∗(i). Because
weights are nonnegative, we can ensure admissibility by setting h(i) = 0 for all i,
although less conservative admissible heuristic functions (built using knowledge of
the problem domain) can speed up the convergence to the optimal solution. Finally,
we must assume that there exists some path from every node in the graph to a goal
node. With these assumptions, LRTA∗ has the following properties:
• The h-values never decrease, and remain admissible.
• LRTA∗ terminates; the complete execution from the start node to termination at
the goal node is called a trial.
• If LRTA∗ is repeated while maintaining the h-values from one trial to the next,
it eventually discovers the shortest path from the start to a goal node.
• If LRTA∗ finds the same path on two sequential trials, this is the shortest path.
(However, this path may also be found in one or more previous trials before it
is found twice in a row. Do you see why?)
Figure 2.4 shows four trials of LRTA∗. Do you see why admissibility of the
heuristic is necessary?
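The following Python sketch of one LRTA∗ trial (ours, reusing the made-up graph
from the ASYNCHDP sketch above) shows how the persistent h table carries
learning across trials; the all-zero initialization is admissible because all
weights are positive.

# One LRTA* trial (a sketch of Figure 2.3); h is updated in place so that
# repeated trials reuse the learned values.
import random

def lrta_star(neighbors, w, h, start, goals):
    i = start
    path = [i]
    while i not in goals:
        f = {j: w[i, j] + h[j] for j in neighbors[i]}
        best = min(f.values())
        nxt = random.choice([j for j in f if f[j] == best])  # break ties at random
        h[i] = max(h[i], best)  # h-values never decrease and remain admissible
        i = nxt
        path.append(i)
    return path

graph = {"s": ["a", "b"], "a": ["t"], "b": ["a", "t"], "t": []}
w = {("s", "a"): 2, ("s", "b"): 1, ("a", "t"): 1, ("b", "a"): 1, ("b", "t"): 4}
h = {i: 0 for i in graph}  # the zero heuristic is admissible
for _ in range(3):
    print(lrta_star(graph, w, h, "s", {"t"}))
# each trial prints a path; here both s-a-t and s-b-a-t are shortest (cost 3)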
LRTA∗ is a centralized procedure. However, we note that rather than have a
single