Software Engineering Technology Research and Development Center: Announcement

“Face to Face with Top Conference Authors” Series, No. 4: Haoyuan Li, UC Berkeley
Posted by: admin, 2013-12-3

TITLE: Discretized Streams: Fault-Tolerant Streaming Computation at Scale

SPEAKER: Haoyuan Li, Computer Science, UC Berkeley

TIME: 10:00 - 10:45am, Tuesday, December 3, 2013

VENUE: Lecture Hall, 4th floor, No. 5 Building, ISCAS

CONTACT: Kai Wang (wangkai@iscas.ac.cn)

ABSTRACT:

Many “big data” applications must act on data in real time. Running these applications at ever-larger scales requires parallel platforms that automatically handle faults and stragglers. Unfortunately, current distributed stream processing models provide fault recovery in an expensive manner, requiring hot replication or long recovery times, and do not handle stragglers. We propose a new processing model, discretized streams (D-Streams), that overcomes these challenges. D-Streams enable a parallel recovery mechanism that improves efficiency over traditional replication and backup schemes, and tolerates stragglers. We show that they support a rich set of operators while attaining high per-node throughput similar to single-node systems, linear scaling to 100 nodes, sub-second latency, and sub-second fault recovery. Finally, D-Streams can easily be composed with batch and interactive query models like MapReduce, enabling rich applications that combine these modes. We implement D-Streams in a system called Spark Streaming.

TITLE: Tachyon: Memory Throughput I/O for Cluster Computing Frameworks

SPEAKER: Haoyuan Li, Computer Science, UC Berkeley

TIME: 11:00 - 11:45am, Tuesday, December 3, 2013

VENUE: Lecture Hall, 4th floor, No. 5 Building, ISCAS

CONTACT: Kai Wang (wangkai@iscas.ac.cn)

ABSTRACT:

As ever more big data computations start to be in-memory, I/O throughput dominates the running times of many workloads. For distributed storage, the read throughput can be improved using caching; however, the write throughput is limited by both disk and network bandwidth due to data replication for fault tolerance. This paper proposes a new file system architecture to enable frameworks to both read and write reliably at memory speed, by avoiding synchronous data replication on writes.

BIO:

Haoyuan Li (http://www.cs.berkeley.edu/~haoyuan/) is a Computer Science PhD student in the AMP Lab at UC Berkeley, interested in computer systems, big data, and cloud computing. His advisors are Scott Shenker and Ion Stoica. Before Berkeley, he studied at Cornell University and Peking University, and worked at Conviva and Google.