after it learns that the receiver process has accepted the message. Thus,
the sender and the receiver processes must synchronize to exchange a message.
The asynchronous communication model, on the other hand, is non-blocking:
the sender and the receiver do not synchronize to exchange a message. After
sending a message, the sender process does not wait for the message to be
delivered to the receiver process. The message is buffered by the system and
is delivered to the receiver process when it is ready to accept it. A buffer
overflow may occur if a process sends a large number of messages in a burst
to another process.
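Go's channels make this contrast concrete: an unbuffered channel gives synchronous (rendezvous) semantics, while a buffered channel approximates asynchronous delivery. The sketch below is illustrative only; the channel names and timings are invented for the example.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Synchronous: an unbuffered channel blocks the sender until
	// the receiver process has accepted the message.
	syncCh := make(chan string)
	done := make(chan struct{})
	go func() {
		time.Sleep(100 * time.Millisecond) // receiver is busy for a while
		fmt.Println("receiver accepted:", <-syncCh)
		close(done)
	}()
	syncCh <- "hello" // blocks for ~100 ms, until the receive above
	fmt.Println("synchronous send returned only after delivery")
	<-done

	// Asynchronous: a buffered channel lets the sender continue
	// while messages sit in the system's buffer.
	asyncCh := make(chan string, 2)
	asyncCh <- "m1" // returns immediately; message is buffered
	asyncCh <- "m2" // buffer still has room; a third send would block
	fmt.Println("asynchronous sends returned before delivery")
	fmt.Println("receiver got:", <-asyncCh, <-asyncCh)
}
```

Note that once the buffer is full, a Go sender blocks rather than overflowing; a real message-passing system must likewise decide whether to block the sender, drop messages, or grow its buffers.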
Neither communication model is superior to the other. Asynchronous
communication provides higher parallelism because the sender process can
continue executing while the message is in transit to the receiver. However,
an implementation of asynchronous communication requires more complex
buffer management. In addition, due to the higher degree of parallelism and
non-determinism, it is much more difficult to design, verify, and implement
distributed algorithms for asynchronous communication, and the state space
of such algorithms is likely to be much larger. Synchronous communication
is simpler to handle
and implement. However, due to frequent blocking, it is likely to have poorer
performance and to be more prone to deadlocks.
2.8 Chapter summary
In a distributed system, a set of processes communicate by exchanging
messages over a communication network. A distributed computation is spread
over geographically distributed processes. The processes share neither a
common global memory nor a physical global clock to which they have
instantaneous access.
The execution of a process consists of a sequential execution of its actions
(i.e., internal events, message send events, and message receive events).
The events at a process are linearly ordered by their order of occurrence.
Message exchanges between processes signify the flow of information and
establish causal dependencies between processes. The causal precedence
relation between events is captured by Lamport's "happens before" relation.
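As a concrete illustration (not from the text), the sketch below represents a computation by the number of events on each process plus the send-to-receive message edges, and decides the happens-before relation by searching the transitive closure of program order and message edges. The event encoding and all names are this sketch's own conventions.

```go
package main

import "fmt"

// Event names an event by its process id and its position in that
// process's linear order; e.g., Event{1, 2} is the second event on p1.
type Event struct{ Proc, Index int }

// happensBefore reports whether a "happens before" b: the transitive
// closure of program order (successive events on one process) and
// message edges (each send to its matching receive), explored by a
// depth-first search. perProc gives the number of events on each
// process; msg maps send events to their receive events.
func happensBefore(a, b Event, perProc map[int]int, msg map[Event]Event) bool {
	if a == b {
		return false // the relation is irreflexive
	}
	seen := map[Event]bool{}
	var dfs func(e Event) bool
	dfs = func(e Event) bool {
		if e == b {
			return true
		}
		if seen[e] {
			return false
		}
		seen[e] = true
		// Program-order edge to the next event on the same process.
		if e.Index < perProc[e.Proc] && dfs(Event{e.Proc, e.Index + 1}) {
			return true
		}
		// Message edge, if e is a send event.
		if r, ok := msg[e]; ok && dfs(r) {
			return true
		}
		return false
	}
	return dfs(a)
}

func main() {
	// Two processes with three events each; p1's second event sends
	// a message that p2 receives as its third event.
	perProc := map[int]int{1: 3, 2: 3}
	msg := map[Event]Event{{1, 2}: {2, 3}}
	fmt.Println(happensBefore(Event{1, 1}, Event{2, 3}, perProc, msg)) // true
	fmt.Println(happensBefore(Event{2, 1}, Event{1, 3}, perProc, msg)) // false
}
```

Since the relation also fails in the reverse direction, the two events in the second query are concurrent.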
The global state of a distributed system is a collection of the states of its
processes and the state of communication channels connecting the processes.
A cut in a distributed computation is a zigzag line joining one arbitrary
point on each process line. A cut represents a global state in the distributed
computation. The past of an event consists of all events that causally
affect it and the future of an event consists of all events that are causally
affected by it.
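Continuing the sketch above (same hypothetical Event type and message-edge map), a cut can be represented by the index of the last event it includes on each process; it is consistent exactly when no message is received inside the cut whose send lies outside it:

```go
// consistentCut reports whether a cut is consistent, i.e., no message
// is received within the cut while its send lies beyond it. cut[p] is
// the index of the last event of process p included in the cut; msg
// maps send events to receive events, with Event as defined in the
// happens-before sketch above.
func consistentCut(cut map[int]int, msg map[Event]Event) bool {
	inCut := func(e Event) bool { return e.Index <= cut[e.Proc] }
	for send, recv := range msg {
		if inCut(recv) && !inCut(send) {
			return false // a message crosses the cut from the future
		}
	}
	return true
}
```

With the message edge {1, 2} → {2, 3} from before, the cut {1: 1, 2: 3} includes the receive but not the send, so consistentCut returns false, while {1: 2, 2: 3} is consistent.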
2.9 Exercises
Exercise 2.1 Prove that in a distributed computation, for any event, the surface of its
past cone (i.e., all the events on the surface) forms a consistent cut. Does this mean
that all events on the surface of the past cone are always concurrent? Give an example
to support your answer.
Exercise 2.2 Show that all events on the surface of the past cone of an event are
message send events. Likewise, show that all events on the surface of the future cone
of an event are message receive events.
2.10 Notes on references
Lamport, in his landmark paper [4], defined the "happens before" relation between
events in a distributed system to capture causality. Other papers on the topic include
those by Mattern [6] and by Panangaden and Taylor [7].
References
[1] K. Birman and T. Joseph, Reliable communication in the presence of failures, ACM
Transactions on Computer Systems, 5(1), 1987, 47–76.
[2] K.M. Chandy and L. Lamport, Distributed snapshots: determining global states of
distributed systems, ACM Transactions on Computer Systems, 3(1), 1985, 63–75.
[3] A. Kshemkalyani, M. Raynal, and M. Singhal, Global snapshots of a distributed
system, Distributed Systems Engineering Journal, 2(4), 1995, 224–233.
[4] L. Lamport, Time, clocks and the ordering of events in a distributed system,
Communications of the ACM, 21(7), 1978, 558–565.
[5] A. Lynch, Distributed processing solves main-frame problems, Data Communi-
cations, 1976, 17–22.
[6] F. Mattern, Virtual time and global states of distributed systems, Proceedings of
the Parallel and Distributed Algorithms Conference, 1988, 215–226.
[7] P. Panangaden and K. Taylor, Concurrent common knowledge: a new definition
of agreement for asynchronous systems, Proceedings of the 7th ACM Symposium
on Principles of Distributed Computing, 1988, 197–209.
[8] S.M. Shatz, Communication mechanisms for programming distributed systems,
IEEE Computer, 1984, 21–28.
Chapter 3
Logical time
3.1 Introduction
The concept of causality between events is fundamental to the design and
analysis of parallel and distributed computing and operating systems. Usually,
causality is tracked using physical time. However, in distributed systems it
is not possible to have global physical time; it is possible to realize only an
approximation of it. As asynchronous distributed computations make progress
in spurts, logical time, which advances in jumps, turns out to be sufficient
to capture the fundamental monotonicity property associated with causality
in distributed systems. This chapter discusses three ways to implement
logical time (scalar time, vector time, and matrix time) that have been
proposed to capture causality between the events of a distributed
computation.
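As a preview of the simplest of the three, here is a minimal sketch of the scalar-time update rules, assuming the common increment d = 1; the type and method names are invented for illustration.

```go
// ScalarClock follows Lamport's scalar time rules with increment
// d = 1: tick before every event, and on a receive first raise the
// local clock to the maximum of its value and the piggybacked
// timestamp.
type ScalarClock struct{ c int }

// Tick is executed before an internal or send event and returns the
// timestamp assigned to that event.
func (s *ScalarClock) Tick() int {
	s.c++
	return s.c
}

// OnReceive merges the timestamp ts piggybacked on an incoming
// message, then ticks for the receive event itself.
func (s *ScalarClock) OnReceive(ts int) int {
	if ts > s.c {
		s.c = ts
	}
	return s.Tick()
}
```

A send event piggybacks the value returned by Tick() on the outgoing message. These rules guarantee that if e1 happens before e2 then C(e1) < C(e2); the converse does not hold, which is precisely the gap vector time closes.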
Causality (or the causal precedence relation) among events in a distributed
system is a powerful concept for reasoning about, analyzing, and drawing
inferences about a computation. Knowledge of the causal precedence
relation among the events of processes helps solve a variety of problems
in distributed systems. Examples of some of these problems are as
follows:
• Distributed algorithms design The knowledge of the causal precedence
relation among events helps ensure liveness and fairness in mutual exclu-
sion algorithms, helps maintain consistency in replicated databases, and
helps design correct deadlock detection algorithms to avoid phantom and
undetected deadlocks.
• Tracking of dependent events In distributed debugging, the knowledge
of the causal dependency among events helps construct a consistent state
for resuming reexecution; in failure recovery, it helps build a checkpoint;
in replicated databases, it aids in the detection of file inconsistencies in
case of a network partitioning.
• Knowledge about the progress The knowledge of the causal depen-
dency among events helps measure the progress of processes in the
distributed computation. This is useful in discarding obsolete information,
garbage collection, and termination detection.
• Concurrency measure The knowledge of how many events are causally
dependent is useful in measuring the amount of concurrency in a computa-
tion. All events that are not causally related can be executed concurrently.
Thus, an analysis of the causality in a computation gives an idea of the
concurrency in the program; a counting sketch follows this list.
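One illustrative way to quantify that last point, reusing the Event type, the perProc/msg representation, and the happensBefore function from the sketch in Section 2.8 (all of them this document's invented conventions, not the book's):

```go
// concurrencyRatio returns the fraction of distinct event pairs that
// are concurrent (neither happens before the other), one crude
// measure of the concurrency in the computation described by
// perProc and msg.
func concurrencyRatio(perProc map[int]int, msg map[Event]Event) float64 {
	var events []Event
	for p, n := range perProc {
		for i := 1; i <= n; i++ {
			events = append(events, Event{p, i})
		}
	}
	pairs, concurrent := 0, 0
	for i := 0; i < len(events); i++ {
		for j := i + 1; j < len(events); j++ {
			pairs++
			a, b := events[i], events[j]
			if !happensBefore(a, b, perProc, msg) &&
				!happensBefore(b, a, perProc, msg) {
				concurrent++
			}
		}
	}
	return float64(concurrent) / float64(pairs)
}
```

A ratio near 1 indicates a computation that is almost entirely parallelizable; a ratio near 0, an almost sequential one.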
The concept of causality is widely used by human beings, often uncon-
sciously, in the planning, scheduling, and execution of a chore or an enterprise,
or in determining the infeasibility of a plan or the innocence of an accused.
In day-to-day life, the global time used to deduce the causality relation is
obtained from loosely synchronized clocks (i.e., wrist watches, wall clocks).
However, in distributed computing systems, the rate of occurrence of events
is several orders of magnitude higher and the event execution time is several
orders of magnitude smaller. Consequently, if the physical clocks are not
precisely synchronized, the causality relation between events may not be
accurately captured. Network Time Protocols [15], which can maintain time
accurate to a few tens of milliseconds on the Internet, are not adequate