756 pág.

# Distributed Computing (2008)

DisciplinaAlgorítimos Distribuídos28 materiais272 seguidores
Pré-visualização50 páginas
```and broadcasts a request to take snapshot. When some other process receives
this request, it takes a snapshot. Channels are not FIFO.
Prove that such a collected distributed snapshot will be consistent iff the following
holds (assume there are n processes in the system and Vti denotes the vector timestamp
of the snapshot taken process pi):
\ufffdVt1\ufffd1\ufffd\ufffdVt2\ufffd2\ufffd\ufffd \ufffd \ufffd \ufffd \ufffd \ufffd Vtn\ufffdn\ufffd\ufffd=max\ufffdVt1\ufffdVt2\ufffd \ufffd \ufffd \ufffd \ufffd \ufffd Vtn\ufffd\ufffd
Exercise 4.2 What good is a distributed snapshot when the system was never in
the state represented by the distributed snapshot? Give an application of distributed
snapshots.
Exercise 4.3 Consider a distributed system where every node has its physical clock
and all physical clocks are perfectly synchronized. Give an algorithm to record global
state assuming the communication network is reliable. (Note that your algorithm
should be simpler than the Chandy\u2013Lamport algorithm.)
Exercise 4.4 What modifications should be done to the Chandy\u2013Lamport snapshot
algorithm so that it records a strongly consistent snapshot (i.e., all channel states are
recorded empty).
Exercise 4.5 Consider two consistent cuts whose events are denoted by C1 =
C1\ufffd1\ufffd\ufffdC1\ufffd2\ufffd\ufffd

\ufffdC1\ufffdn\ufffd and C2 = C2\ufffd1\ufffd\ufffdC2\ufffd2\ufffd\ufffd

\ufffdC2\ufffdn\ufffd, respectively.
Define a third cut, C3 = C3\ufffd1\ufffd\ufffdC3\ufffd2\ufffd\ufffd

\ufffdC3\ufffdn\ufffd, which is the maximum of C1
and C2; that is, for every k, C3\ufffdk\ufffd= later of C1(k) and C2\ufffdk\ufffd.
Define a fourth cut, C4 = C4\ufffd1\ufffd\ufffdC4\ufffd2\ufffd\ufffd

\ufffdC4\ufffdn\ufffd, which is the minimum of C1
and C2; that is, for every k, C4\ufffdk\ufffd= earlierof C1(k) and C2\ufffdk\ufffd.
Prove that C3 and C4 are also consistent cuts.
4.12 Notes on references
The notion of a global state in a distributed system was formalized by Chandy and
Lamport [7] who also proposed the first algorithm (CL) for recording the global state,
and first studied the various properties of the recorded global state. The space\u2013time
diagram, which is a very useful graphical tool to visualize distributed executions, was
introduced by Lamport [19]. A detailed survey of snapshot recording algorithms is
given by Kshemkalyani et al. [16].
Spezialetti and Kearns proposed a variant of the CL algorithm to optimize con-
current initiations by different processes, and to efficiently distribute the recorded
snapshot [29]. Venkatesan proposed a variant that handles repeated snapshots effi-
ciently [32]. Helary proposed a variant of the CL algorithm to incorporate message
waves in the algorithm [12]. Helary\u2019s algorithm is adaptable to a system with non-
FIFO channels but requires inhibition [31]. Besides Helary\u2019s algorithm [12], the
123 References
algorithms proposed by Lai and Yang [18], Li et al. [20], and by Mattern [23] can
all record snapshots in systems with non-FIFO channels. If the underlying network
can provide causal order of message delivery [5], then the algorithms by Acharya
and Badrinath [1] and by Alagar and Venkatesan [2] can record the global state using
O\ufffdn\ufffd number of messages.
The notion of simultaneous regions for monitoring global state was proposed by
Spezialetti and Kearns [30]. The necessary and sufficient conditions for consistent
global snapshots were formulated by Netzer and Xu [25] based on the zigzag paths.
These have particular application in checkpointing and recovery. Manivannan et al.
analyzed the set of all consistent snaspshots that can be built from a given set of
checkpoints [21]. They also proposed an algorithm to enumerate all such consistent
snapshots. The definition of the R-graph and other notations and framework used
by [21] were proposed by Wang [33, 34].
Recording the global state of a distributed system finds applications at several
places in distributed systems. For applications in detection of stable properties such as
deadlocks, see [17] and for termination, see [22]. For failure recovery, a global state
of the distributed system is periodically saved and recovery from a processor failure
is done by restoring the system to the last saved global state [15]. For debugging
distributed software, the system is restored to a consistent global state [8, 9] and the
execution resumes from there in a controlled manner. A snapshot recording method
has been used in the distributed debugging facility of Estelle [11, 13], a distributed
programming environment. Other applications include monitoring distributed events
[30], setting distributed breakpoints [24], protocol specification and verification
[4, 10, 14], and discarding obsolete information [11].
We will study snapshot algorithms for shared memory in Chapter 12.
References
[1] A. Acharya and B. R. Badrinath, Recording distributed snapshots based on
causal order of message delivery, Information Processing Letters, 44, 1992,
317\u2013321.
[2] S. Alagar, and S. Venkatesan, An optimal algorithm for distributed snap-
shots with causal message ordering, Information Processing Letters, 50, 1994,
311\u2013316.
[3] O. Babaoglu and K. Marzullo, Consistent global states of distributed systems:
fundamental concepts and mechanisms, in Mullender, S.J. (ed.) Distributed
Systems, ACM Press 1993.
[4] O. Babaoglu and M. Raynal, Specification and verification of dynamic proper-
ties in distributed computations, Journal of Parallel and Distributed Systems,
28(2), 1995, 173\u2013185.
[5] K. Birman and T. Joseph, Reliable communication in presence of failures,
ACM Transactions on Computer Systems, 3, 1987, 47\u201376.
[6] K. Birman, A. Schiper, and P. Stephenson, Lightweight causal and atomic group
multicast, ACM Transactions on Computer Systems, 9(3), 1991, 272\u2013314.
[7] K.M. Chandy and L. Lamport, Distributed snapshots: determining global states
of distributed systems, ACM Transactions on Computer Systems, 3(1), 1985,
63\u201375.
[8] R. Cooper and K. Marzullo, Consistent detection of global predicates, Pro-
ceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging,
May 1991, 163\u2013173.
124 Global state and snapshot recording algorithms
[9] E. Fromentin, N. Plouzeau, and M. Raynal, An introduction to the analysis and
debug of distributed computations, Proceedings of the 1st IEEE International
Conference on Algorithms and Architectures for Parallel Processing, Brisbane,
Australia, April 1995, 545\u2013554.
[10] K. Geihs and M. Seifert, Automated validation of a cooperation protocol for
distributed systems, Proceedings of the 6th International Conference on Dis-
tributed Computing Systems, 1986, 436\u2013443.
[11] O. Gerstel, M. Hurfin, N. Plouzeau, M. Raynal, and S. Zaks, On-the-fly replay:
a practical paradigm and its implementation for distributed debugging, Pro-
ceedings of the 6th IEEE International Symposium on Parallel and Distributed
Debugging, Dallas, TX, October 1995, 266\u2013272.
[12] J.-M. Helary, Observing global states of asynchronous distributed applications,
Proceedings of the 3rd International Workshop on Distributed Algorithms,
LNCS 392 1989, 124\u2013134.
[13] M. Hurfin, N. Plouzeau and M. Raynal, A debugging tool for distribted Estelle
programs, Journal of Computer Communications, 16(5), 1993, 328\u2013333.
[14] J. Kamal and M. Singhal, Specification and Verification of Distributed Mutual
Exclusion Algorithms, Technical Report, Department of Computer and Infor-
mation Science, The Ohio State University, Columbus, OH, 1992.
[15] R. Koo and S. Toueg, Checkpointing and rollback-recovery in distributed
systems, IEEE Transactions on Software Engineering, January, 1987, 23\u201331.
[16] A. Kshemkalyani, M. Raynal, and M. Singhal, \u2018Global snapshots of a
distributed system\u2019, Distributed Systems Engineering Journal, 2(4), 1995,
224\u2013233.
[17] A. Kshemkalyani and M. Singhal, Efficient detection and resolution of gen-
eralized distributed deadlocks, IEEE Transactions on Software Engineering,
20(1), 1994, 43\u201354.
[18] T. H. Lai and T.H. Yang, On distributed snapshots, Information Processing
Letters, 25, 1987, 153\u2013158.
[19] L. Lamport, Time, clocks, and the ordering of```