Distributed Computing (2008)
756 pág.

Distributed Computing (2008)


DisciplinaAlgorítimos Distribuídos32 materiais283 seguidores
Pré-visualização50 páginas
55 3.4 Vector time
of process P1 has smaller scalar timestamp than the third event of process P2.
However, the former did not happen before the latter. The reason that scalar
clocks are not strongly consistent is that the logical local clock and logical
global clock of a process are squashed into one, resulting in the loss causal
dependency information among events at different processes. For example,
in Figure 3.1, when process P2 receives the first message from process P1, it
updates its clock to 3, forgetting that the timestamp of the latest event at P1
on which it depends is 2.
3.4 Vector time
3.4.1 definition
The system of vector clocks was developed independently by Fidge [4],
Mattern [12], and Schmuck [23]. In the system of vector clocks, the time
domain is represented by a set of n-dimensional non-negative integer vectors.
Each process pi maintains a vector vti\ufffd1\ufffd\ufffdn\ufffd, where vti\ufffdi\ufffd is the local logical
clock of pi and describes the logical time progress at process pi. vti\ufffdj\ufffd rep-
resents process pi\u2019s latest knowledge of process pj local time. If vti\ufffdj\ufffd = x,
then process pi knows that local time at process pj has progressed till x. The
entire vector vti constitutes pi\u2019s view of the global logical time and is used
to timestamp events.
Process pi uses the following two rules R1 and R2 to update its clock:
\u2022 R1 Before executing an event, process pi updates its local logical time
as follows:
vti\ufffdi\ufffd \ufffd= vti\ufffdi\ufffd+d \ufffdd > 0\ufffd\ufffd
\u2022 R2 Each message m is piggybacked with the vector clock vt of the sender
process at sending time. On the receipt of such a message (m,vt), process
pi executes the following sequence of actions:
1. update its global logical time as follows:
1\u2264 k\u2264 n \ufffd vti\ufffdk\ufffd \ufffd=max\ufffdvti\ufffdk\ufffd\ufffd vt\ufffdk\ufffd\ufffd\ufffd
2. execute R1;
3. deliver the message m.
The timestamp associated with an event is the value of the vector clock of its
process when the event is executed. Figure 3.2 shows an example of vector
clocks progress with the increment value d = 1. Initially, a vector clock is
\ufffd0\ufffd0\ufffd0\ufffd 
 
 
 \ufffd \ufffd0\ufffd.
56 Logical time
Figure 3.2 Evolution of vector
time [19].
2
0
0
2
3
4
5
3
4
2
3
0
2
0
0
3
0
0
4
3
4
0
1
0
2
2
0
2
3
0
2
4
0
5
6
4
0
0
1
2
3
3
2
3
4
2
3
2
1
0
0
5
3
4
5
5
4
p1
p2
p3
The following relations are defined to compare two vector timestamps, vh
and vk:
vh= vk \u21d4 \u2200x \ufffd vh\ufffdx\ufffd= vk\ufffdx\ufffd
vh\u2264 vk \u21d4 \u2200x \ufffd vh\ufffdx\ufffd\u2264 vk\ufffdx\ufffd
vh < vk \u21d4 vh\u2264 vk and \u2203x \ufffd vh\ufffdx\ufffd < vk\ufffdx\ufffd
vh \ufffd vk \u21d4 ¬\ufffdvh < vk\ufffd\u2227¬\ufffdvk < vh\ufffd\ufffd
3.4.2 Basic properties
Isomorphism
Recall that relation \u201c\u2192\u201d induces a partial order on the set of events that
are produced by a distributed execution. If events in a distributed system are
timestamped using a system of vector clocks, we have the following property.
If two events x and y have timestamps vh and vk, respectively, then
x\u2192 y \u21d4 vh < vk
x \ufffd y \u21d4 vh \ufffd vk\ufffd
Thus, there is an isomorphism between the set of partially ordered events
produced by a distributed computation and their vector timestamps. This is a
very powerful, useful, and interesting property of vector clocks.
If the process at which an event occurred is known, the test to compare
two timestamps can be simplified as follows: if events x and y respectively
occurred at processes pi and pj and are assigned timestamps vh and vk,
respectively, then
x\u2192 y \u21d4 vh\ufffdi\ufffd\u2264 vk\ufffdi\ufffd
x \ufffd y \u21d4 vh\ufffdi\ufffd > vk\ufffdi\ufffd\u2227vh\ufffdj\ufffd < vk\ufffdj\ufffd\ufffd
57 3.4 Vector time
Strong consistency
The system of vector clocks is strongly consistent; thus, by examining the
vector timestamp of two events, we can determine if the events are causally
related. However, Charron\u2013Bost showed that the dimension of vector clocks
cannot be less than n, the total number of processes in the distributed com-
putation, for this property to hold [2].
Event counting
If d is always 1 in rule R1, then the ith component of vector clock at process
pi, vti\ufffdi\ufffd, denotes the number of events that have occurred at pi until that
instant. So, if an event e has timestamp vh, vh\ufffdj\ufffd denotes the number of
events executed by process pj that causally precede e. Clearly,
\u2211
vh\ufffdj\ufffd\u2212 1
represents the total number of events that causally precede e in the distributed
computation.
Applications
Since vector time tracks causal dependencies exactly, it finds a wide variety
of applications. For example, they are used in distributed debugging, imple-
mentations of causal ordering communication and causal distributed shared
memory, establishment of global breakpoints, and in determining the consis-
tency of checkpoints in optimistic recovery.
A brief historical perspective of vector clocks
Although the theory associated with vector clocks was first developed in
1988 independently by Fidge and Mattern, vector clocks were informally
introduced and used by several researchers before this. Parker et al. [17] used
a rudimentary vector clocks system to detect inconsistencies of replicated
files due to network partitioning. Liskov and Ladin [11] proposed a vector
clock system to define highly available distributed services. Similar system
of clocks was used by Strom and Yemini [26] to keep track of the causal
dependencies between events in their optimistic recovery algorithm and by
Raynal to prevent drift between logical clocks [18]. Singhal [24] used vector
clocks coupled with a boolean vector to determine the currency of a critical
section execution request by detecting the cusality relation between a critical
section request and its execution.
3.4.3 On the size of vector clocks
An important question to ask is whether vector clocks of size n are necessary
in a computation consisting of n processes. To answer this, we examine the
usage of vector clocks.
\u2022 A vector clock provides the latest known local time at each other process.
If this information in the clock is to be used to explicitly track the progress
at every other process, then a vector clock of size n is necessary.
58 Logical time
\u2022 A popular use of vector clocks is to determine the causality between a
pair of events. Given any events e and f , the test for e \u227a f if and only if
T\ufffde\ufffd < T\ufffdf\ufffd, which requires a comparison of the vector clocks of e and f .
Although it appears that the clock of size n is necessary, that is not quite
accurate. It can be shown that a size equal to the dimension of the partial
order \ufffdE\ufffd\u227a\ufffd is necessary, where the upper bound on this dimension is n.
This is explained below.
To understand this result on the size of clocks for determining causal-
ity between a pair of events, we first introduce some definitions. A linear
extension of a partial order \ufffdE\ufffd\u227a\ufffd is a linear ordering of E that is consistent
with the partial order, i.e., if two events are ordered in the partial order, they
are also ordered in the linear order. A linear extension can be viewed as
projecting all the events from the different processes on a single time axis.
However, the linear order will necessarily introduce ordering between each
pair of events, and some of these orderings are not in the partial order. Also
observe that different linear extensions are possible in general. Let \ufffd denote
the set of tuples in the partial order defined by the causality relation; so there
is a tuple \ufffde\ufffd f\ufffd in \ufffd for each pair of events e and f such that e\u227a f . Let \ufffd1,
\ufffd2 
 
 
 denote the sets of tuples in different linear extensions of this partial
order. The set \ufffd is contained in the set obtained by taking the intersection
of any such collection of linear extensions \ufffd1, \ufffd2 
 
 
 . This is because each
\ufffdi must contain all the tuples, i.e., causality dependencies, that are in \ufffd .
The dimension of a partial order is the minimum number of linear extensions
whose intersection gives exactly the partial order.
Consider a client\u2013server interaction between a pair of processes. Queries
to the server and responses to the client occur in strict alternating