Pré-visualização50 páginas

and broadcasts a request to take snapshot. When some other process receives this request, it takes a snapshot. Channels are not FIFO. Prove that such a collected distributed snapshot will be consistent iff the following holds (assume there are n processes in the system and Vti denotes the vector timestamp of the snapshot taken process pi): \ufffdVt1\ufffd1\ufffd\ufffdVt2\ufffd2\ufffd\ufffd \ufffd \ufffd \ufffd \ufffd \ufffd Vtn\ufffdn\ufffd\ufffd=max\ufffdVt1\ufffdVt2\ufffd \ufffd \ufffd \ufffd \ufffd \ufffd Vtn\ufffd\ufffd Don\u2019t worry about channel states. Exercise 4.2 What good is a distributed snapshot when the system was never in the state represented by the distributed snapshot? Give an application of distributed snapshots. Exercise 4.3 Consider a distributed system where every node has its physical clock and all physical clocks are perfectly synchronized. Give an algorithm to record global state assuming the communication network is reliable. (Note that your algorithm should be simpler than the Chandy\u2013Lamport algorithm.) Exercise 4.4 What modifications should be done to the Chandy\u2013Lamport snapshot algorithm so that it records a strongly consistent snapshot (i.e., all channel states are recorded empty). Exercise 4.5 Consider two consistent cuts whose events are denoted by C1 = C1\ufffd1\ufffd\ufffdC1\ufffd2\ufffd\ufffd \ufffdC1\ufffdn\ufffd and C2 = C2\ufffd1\ufffd\ufffdC2\ufffd2\ufffd\ufffd \ufffdC2\ufffdn\ufffd, respectively. Define a third cut, C3 = C3\ufffd1\ufffd\ufffdC3\ufffd2\ufffd\ufffd \ufffdC3\ufffdn\ufffd, which is the maximum of C1 and C2; that is, for every k, C3\ufffdk\ufffd= later of C1(k) and C2\ufffdk\ufffd. Define a fourth cut, C4 = C4\ufffd1\ufffd\ufffdC4\ufffd2\ufffd\ufffd \ufffdC4\ufffdn\ufffd, which is the minimum of C1 and C2; that is, for every k, C4\ufffdk\ufffd= earlierof C1(k) and C2\ufffdk\ufffd. Prove that C3 and C4 are also consistent cuts. 4.12 Notes on references The notion of a global state in a distributed system was formalized by Chandy and Lamport [7] who also proposed the first algorithm (CL) for recording the global state, and first studied the various properties of the recorded global state. The space\u2013time diagram, which is a very useful graphical tool to visualize distributed executions, was introduced by Lamport [19]. A detailed survey of snapshot recording algorithms is given by Kshemkalyani et al. [16]. Spezialetti and Kearns proposed a variant of the CL algorithm to optimize con- current initiations by different processes, and to efficiently distribute the recorded snapshot [29]. Venkatesan proposed a variant that handles repeated snapshots effi- ciently [32]. Helary proposed a variant of the CL algorithm to incorporate message waves in the algorithm [12]. Helary\u2019s algorithm is adaptable to a system with non- FIFO channels but requires inhibition [31]. Besides Helary\u2019s algorithm [12], the 123 References algorithms proposed by Lai and Yang [18], Li et al. [20], and by Mattern [23] can all record snapshots in systems with non-FIFO channels. If the underlying network can provide causal order of message delivery [5], then the algorithms by Acharya and Badrinath [1] and by Alagar and Venkatesan [2] can record the global state using O\ufffdn\ufffd number of messages. The notion of simultaneous regions for monitoring global state was proposed by Spezialetti and Kearns [30]. The necessary and sufficient conditions for consistent global snapshots were formulated by Netzer and Xu [25] based on the zigzag paths. These have particular application in checkpointing and recovery. Manivannan et al. analyzed the set of all consistent snaspshots that can be built from a given set of checkpoints [21]. They also proposed an algorithm to enumerate all such consistent snapshots. The definition of the R-graph and other notations and framework used by [21] were proposed by Wang [33, 34]. Recording the global state of a distributed system finds applications at several places in distributed systems. For applications in detection of stable properties such as deadlocks, see [17] and for termination, see [22]. For failure recovery, a global state of the distributed system is periodically saved and recovery from a processor failure is done by restoring the system to the last saved global state [15]. For debugging distributed software, the system is restored to a consistent global state [8, 9] and the execution resumes from there in a controlled manner. A snapshot recording method has been used in the distributed debugging facility of Estelle [11, 13], a distributed programming environment. Other applications include monitoring distributed events [30], setting distributed breakpoints [24], protocol specification and verification [4, 10, 14], and discarding obsolete information [11]. We will study snapshot algorithms for shared memory in Chapter 12. References [1] A. Acharya and B. R. Badrinath, Recording distributed snapshots based on causal order of message delivery, Information Processing Letters, 44, 1992, 317\u2013321. [2] S. Alagar, and S. Venkatesan, An optimal algorithm for distributed snap- shots with causal message ordering, Information Processing Letters, 50, 1994, 311\u2013316. [3] O. Babaoglu and K. Marzullo, Consistent global states of distributed systems: fundamental concepts and mechanisms, in Mullender, S.J. (ed.) Distributed Systems, ACM Press 1993. [4] O. Babaoglu and M. Raynal, Specification and verification of dynamic proper- ties in distributed computations, Journal of Parallel and Distributed Systems, 28(2), 1995, 173\u2013185. [5] K. Birman and T. Joseph, Reliable communication in presence of failures, ACM Transactions on Computer Systems, 3, 1987, 47\u201376. [6] K. Birman, A. Schiper, and P. Stephenson, Lightweight causal and atomic group multicast, ACM Transactions on Computer Systems, 9(3), 1991, 272\u2013314. [7] K.M. Chandy and L. Lamport, Distributed snapshots: determining global states of distributed systems, ACM Transactions on Computer Systems, 3(1), 1985, 63\u201375. [8] R. Cooper and K. Marzullo, Consistent detection of global predicates, Pro- ceedings of the ACM/ONR Workshop on Parallel and Distributed Debugging, May 1991, 163\u2013173. 124 Global state and snapshot recording algorithms [9] E. Fromentin, N. Plouzeau, and M. Raynal, An introduction to the analysis and debug of distributed computations, Proceedings of the 1st IEEE International Conference on Algorithms and Architectures for Parallel Processing, Brisbane, Australia, April 1995, 545\u2013554. [10] K. Geihs and M. Seifert, Automated validation of a cooperation protocol for distributed systems, Proceedings of the 6th International Conference on Dis- tributed Computing Systems, 1986, 436\u2013443. [11] O. Gerstel, M. Hurfin, N. Plouzeau, M. Raynal, and S. Zaks, On-the-fly replay: a practical paradigm and its implementation for distributed debugging, Pro- ceedings of the 6th IEEE International Symposium on Parallel and Distributed Debugging, Dallas, TX, October 1995, 266\u2013272. [12] J.-M. Helary, Observing global states of asynchronous distributed applications, Proceedings of the 3rd International Workshop on Distributed Algorithms, LNCS 392 1989, 124\u2013134. [13] M. Hurfin, N. Plouzeau and M. Raynal, A debugging tool for distribted Estelle programs, Journal of Computer Communications, 16(5), 1993, 328\u2013333. [14] J. Kamal and M. Singhal, Specification and Verification of Distributed Mutual Exclusion Algorithms, Technical Report, Department of Computer and Infor- mation Science, The Ohio State University, Columbus, OH, 1992. [15] R. Koo and S. Toueg, Checkpointing and rollback-recovery in distributed systems, IEEE Transactions on Software Engineering, January, 1987, 23\u201331. [16] A. Kshemkalyani, M. Raynal, and M. Singhal, \u2018Global snapshots of a distributed system\u2019, Distributed Systems Engineering Journal, 2(4), 1995, 224\u2013233. [17] A. Kshemkalyani and M. Singhal, Efficient detection and resolution of gen- eralized distributed deadlocks, IEEE Transactions on Software Engineering, 20(1), 1994, 43\u201354. [18] T. H. Lai and T.H. Yang, On distributed snapshots, Information Processing Letters, 25, 1987, 153\u2013158. [19] L. Lamport, Time, clocks, and the ordering of