Course Calendar
Date | Lecture | Instructor | Topic |
Week 2 | |||
Oct 8, 9:00-11:00 | Lectures 1 + 2 | Peter Pietzuch | Scalable distributed systems design (slides) |
Oct 11, 14:00-16:00 | Lectures 3 + 4 | Peter Pietzuch | Data centre and cloud computing (slides) |
Week 3 | |||
Oct 15, 9:00-11:00 | Lectures 5 + 6 | Peter Pietzuch | BigTable (slides) |
Oct 18, 14:00-16:00 | Lectures 7 + 8 | Peter Pietzuch | Dynamo (slides) |
Week 4 | |||
Oct 22, 9:00-11:00 | Lectures 9 + 10 | Peter Pietzuch | Spanner (slides) |
Oct 25, 14:00-16:00 | Lectures 11 + 12 | Thomas Heinis | Introduction & Main Memory Databases (slides, slides) |
Week 5 | |||
Oct 29, 9:00-11:00 | Lectures 13 + 14 | Thomas Heinis | Solid State Disk and Databases (slides, slides) |
Nov 1, 14:00-16:00 | Lectures 15 + 16 | Thomas Heinis | Graph Databases (slides) |
Week 6 | |||
Nov 5, 9:00-11:00 | Lectures 17 + 18 | Peter Pietzuch | MapReduce (slides) Required reading: "MapReduce: Simplified Data Processing on Large Clusters" |
Nov 8, 14:00-16:00 | Lectures 19 + 20 | Peter Pietzuch | Spark (slides) Required reading: "Resilient Distributed Datasets" |
Week 7 | |||
Nov 12, 9:00-11:00 | Lectures 21 + 22 | Thomas Heinis | Document Databases (slides, slides) & XQuery/XPath slides (not examinable) |
Nov 15, 14:00-16:00 | Lectures 23 + 24 | Thomas Heinis | Graph & Document Database Tutorial |
Week 8 | |||
Nov 19, 9:00-11:00 | Lectures 25 + 26 | Thomas Heinis | Transactions on Multicores (slides) |
Nov 22, 14:00-16:00 | Lectures 27 + 28 | Thomas Heinis | Cold Storage (slides, slides) |
Week 9 | |||
Nov 26, 9:00-11:00 | No lecture | ||
Nov 29, 14:00-16:00 | No lecture | ||
Reading and discussion materials:
Week 3:
- "Bigtable: A Distributed Storage System for Structured Data", Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, Seventh Symposium on Operating System Design and Implementation (OSDI), Seattle, WA, November, 2006
  - What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
  - What is good about the paper? What is not good about the paper?
  - How does the design of BigTable compare to that of a parallel relational database management system (RDBMS)?
  - What limits the scalability of the BigTable design?
- "Dynamo: Amazon's Highly Available Key-Value Store", Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall, and Werner Vogels, ACM Symposium on Operating Systems Principles (SOSP), Stevenson, WA, October 2007
  - What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
  - What is good about the paper? What is not good about the paper?
  - To what extent is the design of Dynamo inspired by Distributed Hash Tables (DHTs)? What are the advantages and disadvantages of such a design?
  - How does the design of Dynamo compare to that of BigTable?
Week 4:
- "Spanner: Google's Globally-Distributed Database", James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford, Tenth Symposium on Operating System Design and Implementation (OSDI), Hollywood, CA, October, 2012
  - What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
  - What is good about the paper? What is not good about the paper?
  - How does the performance of Spanner depend on the workload?
  - What other applications could TrueTime have?
Week 6:
- "MapReduce: Simplified Data Processing on Large Clusters", Jeffrey Dean and Sanjay Ghemawat, Sixth Symposium on Operating System Design and Implementation (OSDI), San Francisco, CA, December, 2004
  - What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
  - What is good about the paper? What is not good about the paper?
  - What algorithms cannot be easily expressed in the MapReduce model?
  - Can you think of other techniques for handling stragglers?
- "Resilient Distributed Datasets", Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica, 9th USENIX conference on Networked Systems Design and Implementation (NSDI), San Jose, CA, April 2012.
  - What is the problem that this paper tries to solve? How would you summarise its main idea in a few sentences? How does it work in more detail?
  - What is good about the paper? What is not good about the paper?
  - Is the comparison with Hadoop fair?
  - How well can Spark be used to process graph data?
Coursework 1:
- "ZooKeeper: Wait-Free Coordination for Internet-Scale Systems", Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed, USENIX Annual Technical Conference (ATC), Boston, MA, 2010
Optional:
- "Large-scale cluster management at Google with Borg" (link)
- "Kubernetes - Scheduling the Future at Cloud Scale" (link)
Other information:
If you print the slides, we encourage you to print them with 4 slides per page. You can do this either by selecting "Multiple pages per sheet" in the "Print" dialog box of Acrobat Reader, or by simply typing the following command in Linux:
$ pdfnup --nup 2x2 file.pdf
which generates "file-nup.pdf" with 4 slides per page.