Normcore Systems Reading List
Design
- Why Distributed Computing
- A Note on Distributed Computing
- On Designing and Deploying Internet Scale Services
- The Perils of Good Abstractions
- A Philosophy of Software Design by John Ousterhout
- Practical Data Oriented Design by Andrew Kelley
- Larry Ellison’s Rant on Cloud Computing
- A View of Cloud Computing
- Building on Quicksand
- Chaotic Perspectives
- What every systems programmer should know about concurrency
CRDTs
CRDTs (conflict-free replicated data types) are data structures that restrict the allowed operations so that updates can never conflict, regardless of the order in which they are applied or how concurrently they’re performed.
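As a rough illustration of that idea, here is a minimal sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. The class and method names (`GCounter`, `replica_id`, `merge`) are made up for this example and do not come from any particular library. Each replica only increments its own slot, and merging takes the per-replica maximum, so merges can be applied in any order, any number of times, and replicas still agree.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Illustrative grow-only counter: one slot per replica."""
    replica_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # The only allowed update: grow this replica's own slot.
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative, and idempotent,
        # which is what makes concurrent updates conflict-free.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas update independently, then exchange state in both directions.
a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)  # repeating a merge changes nothing
assert a.value() == b.value() == 5
```

Restricting the counter to increments is the tradeoff: supporting decrements or deletions requires a richer structure, which the readings in this section cover.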
Consistency Models
Key to building systems that suit their environments is finding the right tradeoff between consistency and availability.
- CAP Conjecture - Consistency, Availability, and Partition Tolerance cannot all be satisfied at once
- CAP Twelve Years Later: How the “Rules” Have Changed - Eric Brewer expands on the original tradeoff description
- Consistency and Availability - Vogels
- Eventual Consistency - Vogels
- 2PC or not 2PC, Wherefore Art Thou XA? - Two phase commit isn’t a silver bullet
- Life Beyond Distributed Transactions - Helland
- If you have too much data, then ‘good enough’ is good enough - NoSQL, Future of data theory - Pat Helland
- Starbucks doesn’t do two phase commit - Asynchronous mechanisms at work
- You Can’t Sacrifice Partition Tolerance - Additional CAP commentary
- Optimistic Replication - Relaxed consistency approaches for data replication
Infrastructure
- Principles of Robust Timing over the Internet - Managing clocks is essential even for basics such as debugging
Databases
- Let’s Build a Simple Database - A great tutorial that teaches the internals of relational databases by building SQLite from scratch in C.
Real-life Distributed Systems and Data Stores
The following distributed systems papers are seminal and a must-read for anyone interested in building distributed systems.
- Amazon Dynamo: Amazon’s own key-value store
- Google File System: Google’s very own distributed file system.
- Google Bigtable: Google’s distributed datastore.
- MapReduce: Simplified Data Processing on Large Clusters: A seminal piece of work that has powered the Hadoop ecosystem
- Autopilot: Automatic Datacenter Management
Books
- Designing Data-Intensive Applications by Martin Kleppmann
- Distributed Systems for fun and profit
- Distributed Algorithms (Lynch)
Miscellaneous
- CS525 UIUC SP24: Reading List
- Readings in Distributed Systems by Christopher Meiklejohn
- Readings
- Base DS
- Class materials for a distributed systems lecture series
This list is a work in progress. If you’ve found it helpful or would like to suggest additions, please don’t hesitate to reach out.