EuroSys15——Putting-consistency-back-into-eventual-consistency

Step 1

题目摘要引言

Title

回到最终一致性

Abstract

For geo-replicated storage systems

consistency model -> Explicit Consistency (strengthens eventual consistency with a guarantee to preserve specific invariants defined by the application)

application-specific invariants

System supports Explicit Consistency identifies which operations would be unsafe under concurrent execution.

programs need to deal with different situations and select proper stategies.

Either violation-avoidance or invariant-repair techniques.

  • violation-avoidance: coordination off the critical path of operation execution by relying on a reservation system
  • invariant-repair: execute without restrictions, restore the invariants by applying a repair operation to the database set.

Introduction

  • Background: As for geo-replication, application's infrastructure need global supports. For low-latency
  • Trade-off: between low-latency and costly coordination(cross-datacenter synchronization)
  • Consistency models -> not strong consistency but weak consistency such as eventual consistency and causal consistency

  • Promising alternative: Combine the strengths of both approaches by supporting both weak and strong consistency, depending on the operation.
    • operations need strong consistency still incur high latency
    • hard for design

not defined in terms of execution orders but in terms of application properties

Contributions and Conclusions

  • Explicit Consistency(model to obey): specify the consistency rules that the system must maintain as a set of variants
  • A methodology to derive an efficient reservation system for enforcing Explicit Consistency(static analysis+ alternative solutions)
  • Indigo (a middleware system implementation): a middleware that can enforce Explicit Consistency on top of a causally consistent store.

Result: modified applications have performance similar to weak consistency for most operations. Rare operations that require intricate rights transfers exhibit high latency.

Future work for the high latency situations mentioned above.

基本理论概况

提出 Explicit Consistency,接近弱一致性的表现,同时配合一套中间件系统讲原有应用转化至符合 Explicit Consistency 的应用。

回答基本问题

  1. 类别 对原型系统进行描述
  1. 内容

    与之相关的文章:EPaxos PoR

  2. 正确性

    存疑,基于程序员给定的 specification

  1. 创新点

  1. 清晰度

    清晰

阅读选择

继续阅读

Step 2

细读笔记

问题记录

未读(且值得读)文献记录

Explicit Consistency

System Model and Definitions

Cloud deployment database, fully replicated in multiple datacenters and partitioned inside each datacenter.

Transaction operation

Step 3

思路复现

证明与推理复现

实验验证复现

Original Presentation(EuroSys '15)

Motivation and BG

  • Services operate on a global scale.
  • An unprecedented number of people are using internet services

WE WANT global application and low latency

BUT geo-replication an its sync is costly

With consistency we can easily get into trouble when sync, for example Mariao is enrolled in tournament that was concurrently removed.

Example: Tounament remove question

Typical solution: strong consistency

EC_01
EC_01

由于人类是了解语义的,所以在观察时序图时我们会发现其中一部分 sync 是不必要的。

Sync when needed and ahieve low latency in normal operations

EC_02
EC_02
EC_03
EC_03

Explicit Consistency

思路

  • Programmer specifies application invariant. 程序员指定程序的正确性(以不变量的形式)
  • System ensures that every state transition preserves the invariant. 系统仅允许状态在保证不变量的前提下进行状态转换
  • Opportunity to improve performance by not restricting execution ordering. 相较于其他一致性模型一个较为明显的区别,是不针对操作的执行序制定 rules,这样允许编程人员可以适时添加或删除 coordination.

实现

the three-step methodlogy to force explicit consistency

  • Identify I-offenders:
    • Static analysis to go through the invariance of the application specifications to identify that may break invariants when executed concurrently.
  • Choose reservations(reservation system?)
    • Efficient mechanism to execute I-offenders avoiding coordination.
  • Instrument application(have the benifit of finding a coordination in runtime) more in paper

Static analysis

Application model

From programmer specifies:

invariant and

  • invariant

    • "players can only participate in existing tounaments"
    • so \(Inv = enrolled(p,t) => player(p) \land tounament(t)\)
  • operations' side effects
    • enroll(p,5): \(\{ ~ enrolled(p,t) := true ~ \}\)
    • removetounament(t): \(\{ ~ tounament(t) := false ~ \}\)

Testing algorithm

The sub image find a pair of operations that when executed concurrently will break the coherence of the application -> one I-offender set

EC_04
EC_04

Programers to do:

  • specify the invariants and the post conditions of the operations (so maybe a burden?)
    • programmers provide simple annotations
    • static analysis detects conlicting operations
    • low-latency operations with reservations

Choose reservations

Mechanisms to control I-offenders without breaking invariants

select strategy? 选择合适的策略。

作者列出了一张表格表明不同的 invariant 类型对应的 reservation 方式(比如用 Escrow 机制保证 Numberic 变量)。

take multi-level lock as an example

Reservations: Multilevel lock

还是举两个 geo replications 的例子,条件与之前相同,但是现在,通过数据库来储存“哪些操作可以执行”的信息。

  • 初始情况下,允许任何用户进行 enroll 操作
  • 试图 remove lock 的本地状态不允许,无法进行 removeTournament 操作
  • 请求挂起(Waiting),并且发起和另一数据中心的通信,要求停止 enroll 操作,
  • 系统进行同步,两地的数据中心都进行状态更新
  • 发起请求的数据中心,此时解锁 remove 操作可执行
  • 执行 removeTournament 操作

Indigo

  • Middleware that provides Explicit consistency on top of KV-Stores. 一个可以运行在大多数 KVS 上的中间件
  • Requires only properties that are known to be efficient. 仅需要添加已知高效的性质 (视频没有详细介绍这些性质,典型的要包括 transaction 和 causality)
  • In short, can be extended with new reservations 可以添加新的 Reservations 方案

Evaluation

Answer the question

  • How well does the system scale?
  • What is the latency of operations?

Deployment

  • Data-centers deployed in AWS:
    • 3 Regions (EU, US-EAST/WEST);
    • N app-servers connect to local DBs;
    • Clients submit operations to the app-server in close loop
  • Compare performance
    • Causal Consistency(baseline)
    • String Consistency(provides invariants , writes to single server)
    • Red-Blue Consistency(causal + writes to single server 是前面两者的综合,如果操作可以在不需要 coordination 时执行则转向 causal,否则就是用 strong)
    • Explicit Consistency

关于 Tounament 案例 remove 的长延时,是做了 trade-off,减少了其他情形下的 coordination,但是在 remove 过程,会 revoke 其他操作,因而延迟更高。不过据作者所言,这些可以进一步通过工程上的优化来减少。

Conclusion

Explicit Consistency 的三步走,给出 annotations,静态分析检测出并发执行冲突 operations,利用 reservation 机制,执行这些操作,同时保证不变量。

如此,达成

Q&A

Q1: How simple the notations are? How many manual work do we need?

A1: Global quantifier and exist quantifier like shown in the slides.

No false-positive found. Interesting

Q2: If system consider of the monotonic(单调性)?

A2: total based on the specifications given by the programmers. So no monotonicity knowledge.

Q3: what if one of the centers fail and what will do with the reservations?

A3: We can not use the reservations as the center goes down. But with the help of network replication, each data center still make progress by using the local reservation that it holds(Lock will block if one goes down?).