Distributed Architecture Study: Paper List

Paper List

Keep an eye on the SIGOPS Hall of Fame.

Consistency Models and Consistency Protocols

State Machine Replication Meets Logs

From the replicated command logs of state machine replication to shared logs.

Slowdown Cascades

I Can't Believe It's Not Causal! Scalable Causal Consistency with No Slowdown Cascades

Once systems grow large enough, failures become an inevitable and routine part of their operation. Performance anomalies are common and can be viewed as a kind of partial failure. In a partitioned (sharded) system, a failure within one partition inevitably degrades that partition's performance; a slowdown cascade occurs when the slowdown spills over and affects other partitions.

Put simply:

The slowdown of a single node can prevent the whole system from making progress.

The paper then explains why industry links slowdown cascades to its reluctance to deploy strongly consistent systems:

Industry has long identified the spectre of slowdown cascades as one of the leading reasons behind its reluctance to build strongly consistent systems, pointing out how the slowdown of a single shard, compounded by query amplification (e.g., a single user request in Facebook can generate thousands of, possibly dependent, internal queries to many services), can quickly cascade to affect the entire system.

Observable Causal Consistency

Enforcement is pushed to the client, which locally maintains metadata encoding the updates it has already observed; on each read, the stored version is compared against this metadata to decide whether the read is safe.

I Can't Believe It's Not Causal! Scalable Causal Consistency with No Slowdown Cascades

The general outline of a system that moves the enforcement of causal consistency to read operations is straightforward. Each client c needs to maintain some metadata to encode the most recent state of the data store that it has observed. On reading an object o, c needs to determine whether the version of o that the data store currently holds is safe to read (i.e., if it reflects all the updates encoded in c’s metadata): to this end, the data store could keep, together with o, metadata of its own to encode the most recent state known to the client that created that version of o. If the version is deemed safe to read, then c needs to update its metadata to reflect any new dependency; if it is not, then c needs to decide how to proceed (among its options: try again; contact a master replica guaranteed to have the latest version of o; or trade safety for availability by accepting a stale version of o).
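
A minimal sketch of this read-side check, assuming the client metadata is a set of per-shard version counters (the names and structure below are my own illustration, not the paper's actual data types):

```rust
use std::collections::HashMap;

/// Hypothetical client-side metadata: the highest version this client has
/// observed from each shard (illustrative only, not the paper's format).
#[derive(Default, Clone)]
struct ClientMetadata {
    observed: HashMap<u32, u64>, // shard id -> highest version observed
}

impl ClientMetadata {
    /// A read from `shard` is safe if the replica's current version
    /// already reflects everything this client has observed from it.
    fn is_safe_read(&self, shard: u32, replica_version: u64) -> bool {
        self.observed
            .get(&shard)
            .map_or(true, |&seen| replica_version >= seen)
    }

    /// After a successful read (or a write), fold the metadata attached to
    /// the returned version into the client's own, so later reads carry
    /// the new dependencies.
    fn merge(&mut self, other: &ClientMetadata) {
        for (&shard, &version) in &other.observed {
            let entry = self.observed.entry(shard).or_insert(0);
            *entry = (*entry).max(version);
        }
    }
}

fn main() {
    let mut client = ClientMetadata::default();
    client.observed.insert(7, 42); // client has seen version 42 of shard 7

    // A replica of shard 7 that has only applied version 40 is not safe to read.
    assert!(!client.is_safe_read(7, 40));
    assert!(client.is_safe_read(7, 45));

    // Fold in the metadata stored alongside the version we just read.
    let mut dep = ClientMetadata::default();
    dep.observed.insert(3, 10);
    client.merge(&dep);
    assert!(client.is_safe_read(3, 10));
}
```

If the check fails, the client is left with exactly the options the quote lists: retry, contact a master replica, or trade safety for availability by accepting the stale version.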

Scaling Problems in Distributed Log-Based Dissemination

Paxos - on the order of 10K ops/s (figure introduced by Corfu, 2016)

Corfu - on the order of 600K ops/s

vCorfu

(Stream materialization, Composable SMR, Lightweight Transaction Resolution)

Group meeting, 2022/04/18

Why does Corfu reach 600K ops/s? (Is it mainly thanks to flash devices, or something else?)

When presenting each piece of work, it is best to spell out its target, the approach used to achieve it, and the improvements to the mechanism.

Group meeting, 2022/05/09

This week

  • Paper reading
    • The FuzzyLog: A Partially Ordered Shared Log [OSDI ‘18]
      • Applications built on the FuzzyLog abstraction (LogMap, CRDTMap, CAPMap)
      • Evaluation part
      • Still unclear on the implementation details
  • Play with FuzzyLog demo examples
    • hello_c: running the FuzzyLog client from C
    • simple_map: LogMap example in Rust
    • LogMap implementation details (see the sketch after this list)
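
My rough reconstruction of the LogMap pattern, using a made-up SharedLog trait rather than the real FuzzyLog client API: every mutation is appended to the log, and the local map is only a materialized view rebuilt by playing the log.

```rust
use std::collections::HashMap;

/// Hypothetical append-only log interface standing in for the FuzzyLog
/// client API (the real API differs; this only shows the pattern).
trait SharedLog {
    fn append(&mut self, entry: Vec<u8>);
    /// Returns entries appended since the last call, in a valid order.
    fn sync(&mut self) -> Vec<Vec<u8>>;
}

/// A LogMap-style map: every mutation is a log entry; the local HashMap
/// is just a materialized view, rebuilt by playing the log forward.
struct LogMap<L: SharedLog> {
    log: L,
    view: HashMap<String, String>,
}

impl<L: SharedLog> LogMap<L> {
    fn new(log: L) -> Self {
        Self { log, view: HashMap::new() }
    }

    /// put = append a (key, value) entry; the local view is only updated
    /// once the entry is observed via sync, so all replicas apply the
    /// same sequence of operations.
    fn put(&mut self, key: &str, value: &str) {
        // Toy encoding: assumes keys and values contain no '='.
        self.log.append(format!("{}={}", key, value).into_bytes());
    }

    /// get = catch up on the log, then read the materialized view.
    fn get(&mut self, key: &str) -> Option<String> {
        for raw in self.log.sync() {
            if let Ok(s) = String::from_utf8(raw) {
                if let Some((k, v)) = s.split_once('=') {
                    self.view.insert(k.to_string(), v.to_string());
                }
            }
        }
        self.view.get(key).cloned()
    }
}

/// Trivial single-process log used only to exercise the sketch.
struct LocalLog {
    entries: Vec<Vec<u8>>,
    read_pos: usize,
}

impl SharedLog for LocalLog {
    fn append(&mut self, entry: Vec<u8>) {
        self.entries.push(entry);
    }
    fn sync(&mut self) -> Vec<Vec<u8>> {
        let new = self.entries[self.read_pos..].to_vec();
        self.read_pos = self.entries.len();
        new
    }
}

fn main() {
    let log = LocalLog { entries: Vec::new(), read_pos: 0 };
    let mut map = LogMap::new(log);
    map.put("color", "red");
    assert_eq!(map.get("color"), Some("red".to_string()));
}
```

The actual FuzzyLog client works against a partially ordered log (appends and syncs are scoped to colors) rather than a single total order, but the append-then-play structure is the same idea behind LogMap, CRDTMap, and CAPMap.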

Problems:

The authors argue that the following dimensions limit shared logs that build a total order with a sequencer:

  • network diameter: the sequencer sits at a fixed point in the network, so clients at the far edge of the network inevitably pay a high communication cost.
  • network bandwidth: the sequencer is neither I/O-bound nor easy to parallelize, so it cannot keep up with the orders-of-magnitude growth in I/O bandwidth.
  • payload granularity: shared-log payloads are small (large objects are usually reduced to 64-bit pointers into an external blob store), so filling the available bandwidth with such small entries places a much stricter efficiency requirement on the sequencer (approaching 1 billion ops/s).
  • system size: scaling the system to support forty servers would require a sequencer close to 2 billion ops/s (see the back-of-envelope sketch after this list).
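
A back-of-envelope way to see where these numbers come from (the notation and plug-in figures are my own assumptions, not measurements from the paper): a single global sequencer must hand out one token per append, so with N storage servers, per-server append bandwidth B, and payload size s, it has to sustain roughly

    R_seq ≈ N × B / s

For instance, assuming N = 40 servers, B = 3 GB/s of append bandwidth each, and s = 64-byte payloads, this gives about 40 × (3 × 10^9 / 64) ≈ 1.9 × 10^9 ops/s, on the order of the "close to 2 billion ops/s" figure above.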

Evaluations:

The evaluation section responds to these problems.

Latency micro-benchmarks for Dapple on a lightly loaded deployment

In comparison with other shared-log systems, the paper only reports the latency of gets and puts for CAPMap (one of the applications built on FuzzyLog) under network partitions.