SOSP21: Understanding and Detecting Software Upgrade Failures in Distributed Systems
The First Step
Title, Abstract, Introduction
Title
Understanding and Detecting Software Upgrade Failures in Distributed Systems
Abstract
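Upgrade is one of the most disruptive yet unavoidable maintenance tasks affecting the availability of distributed systems. A failed upgrade prolongs the service disruption, and the wider adoption of CI further increases the frequency and burden of upgrades. So far, no prior work has studied the characteristics of upgrade failures. The paper also presents a testing framework, DUPTester.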
Introduction
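In this paper, software-upgrade failures are defined as failures that occur only during a software upgrade. They can be triggered, for example, by the interaction between two code versions of the same software, or between an upgrade operation and a regular software operation, and they do not show up during regular operation.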
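To make the definition concrete for myself, here is a minimal Python sketch of a failure that appears only while two versions coexist. Everything in it is hypothetical and not taken from the paper: the `encode_v1`/`decode_v2` helpers, the JSON message format, and the `region` field are assumptions made purely for illustration.

```python
import json

def encode_v1(user):
    # Hypothetical version 1 wire format: only a "name" field.
    return json.dumps({"name": user})

def decode_v1(payload):
    return json.loads(payload)["name"]

def encode_v2(user, region):
    # Hypothetical version 2 adds a "region" field.
    return json.dumps({"name": user, "region": region})

def decode_v2(payload):
    msg = json.loads(payload)
    # Assumes every peer already sends "region"; this holds once the
    # upgrade is complete, but not while v1 nodes are still running.
    return msg["name"], msg["region"]

def try_decode(decode, payload):
    # Report whether a receiver can handle a sender's message.
    try:
        decode(payload)
        return "ok"
    except KeyError as exc:
        return f"failed: missing field {exc}"

results = {
    "v1 -> v1": try_decode(decode_v1, encode_v1("alice")),
    "v2 -> v2": try_decode(decode_v2, encode_v2("alice", "eu")),
    # Mixed-version pair, as seen in the middle of a rolling upgrade.
    "v1 -> v2": try_decode(decode_v2, encode_v1("alice")),
}

for pair, outcome in results.items():
    print(pair, outcome)
```

Each version works on its own, and only the mixed pair fails. That matches the "only during upgrade, not in regular operation" part of the definition, and it is also why testing each version separately would miss this kind of bug.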
Slides
- upgrade failures are problematic
- large-scale
- persistent impact (can't easily roll back)
Traditional approaches to safe upgrade are slow
fast and safe upgrade
Focusing aspects
- Symptoms of Upgrade Failures
- Root-cause study
- Incompatible cross-version interaction (63%)
- Broken upgrade operation (33%)
- Misconfiguration (3%)
- Broken library dependency (2%)
- Triggering-condition study
Question
Hi, I'm Chuannan Zhang from USTC. Thanks for the talk. In the last two pages of your slides, you mentioned that the DUP toolchain captures not only upgrade failures but also downgrade failures. In your work, do the two differ in any way, or are they essentially the same, both caused by mismatched library versions and broken operations?
Unless otherwise stated, all posts on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!