Narayanan Shivakumar on Google's Software Infrastructure

Submitted by Dale on August 3, 2006 - 5:52pm

The following notes are part 2 of 3 from Narayanan 'Shiva' Shivakumar's presentation at the July 27th VanHPC meeting. These notes cover Shivakumar's discussion of the Google software infrastructure.

Infrastructure consists of:

Google File System (GFS)
MapReduce
BigTable

Google File System (GFS)

Why another distributed file system? - We had to have reliability over thousands of nodes
We control libraries and everything
Master system is responsible for a variety of different chunks
Chunk servers manage files
Master handles metadata
Master resolves the chunk location
50+ clusters
Redundancies, chunks on different machines
Handles variety of clients
Running on petabyte file systems
Master has to manage how chunks are transferred, and manage for bottlenecks and redundancy
Since there are multiple chunks on multiple servers serving multiple requests, requests need to be distributed appropriately across chunk servers
Question: Is data distribution and load balancing automated?
Answer: Correct. And chunk servers also do other computation
Question: Do you ever lose data?
Answer: Great question. We've lost one 64 MB chunk. Meanwhile machines have been dying and traffic growing
Question: How did you know you lost exactly 1 chunk?
Answer: We had a disaster a couple of years back. Not enough logic was built in to recover automatically, but people had enough knowledge to get it back
Question: It seems you'll need to assign value to chunks because of worthless data
Answer: Do we have an idea of priority for chunks? We have replication weights
Question: Is it data type specific?
Answer: No, it's just a blob. Google File System is data type neutral
Question: How do you check integrity?
Answer: We have error checking, some of it happens at the client end. Chunk master will keep track of how often things happen and route accordingly. Clearly not everything was thought through early on.
Question: Are chunk masters stronger than chunk servers?
Answer: I expect the master machines to be higher in reliability than the others, but the same failure logic applies. Chunk servers typically have more disks.
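
The split described above (master holds metadata and resolves chunk locations, chunk servers hold the data, replicas sit on different machines) can be illustrated with a small Python sketch. This is not Google's implementation: the class name Master, the replication factor of 3, and the least-loaded placement rule are all assumptions made for illustration. Only the 64 MB chunk size comes from the talk.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB chunks, as mentioned in the talk


class Master:
    """Toy master (hypothetical): stores metadata only, never file contents."""

    def __init__(self, replicas=3):
        self.replicas = replicas   # assumed replication factor
        self.chunk_table = {}      # (filename, chunk_index) -> chunk server names
        self.load = {}             # chunk server name -> number of chunks held

    def add_server(self, name):
        self.load.setdefault(name, 0)

    def allocate(self, filename, chunk_index):
        # Assumed policy: put replicas on the least-loaded distinct servers,
        # so copies of one chunk never share a machine.
        servers = sorted(self.load, key=lambda s: (self.load[s], s))
        servers = servers[:self.replicas]
        for s in servers:
            self.load[s] += 1
        self.chunk_table[(filename, chunk_index)] = servers
        return servers

    def locate(self, filename, byte_offset):
        # A client asks the master where a byte lives, then reads the data
        # from one of the returned chunk servers directly.
        chunk_index = byte_offset // CHUNK_SIZE
        return chunk_index, self.chunk_table[(filename, chunk_index)]


master = Master()
for name in ["cs0", "cs1", "cs2", "cs3"]:
    master.add_server(name)
master.allocate("crawl.dat", 0)
master.allocate("crawl.dat", 1)
print(master.locate("crawl.dat", 70 * 1024 * 1024))
```

The point of the sketch is that the master answers a small metadata question and then steps out of the data path, which is one way to keep a single master from becoming the bottleneck.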

MapReduce

A programming model and library to simplify large-scale computations on large clusters
Kind of comes out of functional programming
Mapping step
Reduction step
Word frequencies example given
Graph shown of execution on thousands of machines (Similar slides can be found here: http://labs.google.com/papers/mapreduce-osdi04-slides/index.html)
This makes it easy to build applications without having to worry about underlying
Need to locate task close to data
Pipelining, have more tasks per machine
MR_grep 1 terabyte in 100 seconds (using 1800 machines)
Search Google on MapReduce for paper
Gave example of 3000 lines of C++ code down to 100 lines
Question: Could MapReduce be decoupled from GFS?
Answer: Pretty hard. We publish techniques so people can create open source, as our competitors are.
Question: Are servers dedicated to a specific task?
Answer: Certain tasks are natural choices for machines (disks), otherwise no
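
The word frequencies example mentioned above can be sketched in a few lines of Python. This is a single-process illustration of the programming model only, not Google's C++ library: the function names map_words, reduce_counts and run_mapreduce are made up for this sketch, and the real system spreads the map and reduce tasks over many machines.

```python
from collections import defaultdict


def map_words(doc_id, text):
    # Mapping step: emit (word, 1) for every word in one document.
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    # Reduction step: combine all values emitted for one key.
    return word, sum(counts)


def run_mapreduce(inputs, mapper, reducer):
    # Hypothetical in-memory driver: group intermediate values by key,
    # then hand each group to the reducer.
    groups = defaultdict(list)
    for key, value in inputs:
        for k, v in mapper(key, value):
            groups[k].append(v)
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))


docs = [("doc1", "the cat the dog"), ("doc2", "The end")]
print(run_mapreduce(docs, map_words, reduce_counts))
# {'cat': 1, 'dog': 1, 'end': 1, 'the': 3}
```

The user writes only the two small functions; grouping, scheduling and fault handling belong to the library, which is consistent with the 3000-lines-to-100-lines reduction cited in the talk.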

BigTable

For semi-structured data
Like a really big hash table
Scale is too large for commercial databases; they didn't scale
Cost of commercial databases isn't nearly as interesting
Since we control the entire layer we know about the application we need to support
Multi-layer map
Fault tolerant, terabyte in memory, petabyte on disk
Chart shown on data model
Distributed multi-dimensional sparse map
Good match for most of our applications
We need versions over time, so values carry a timestamp.
Locality of versions desirable for deltas
The key is a string, the value a blob
Storage up to application
This lets you cat the entire web
Rows are broken into tablets at row boundaries
Divided among tablet servers
Beneath tablet servers is GFS
100 machines each pick up 1 tablet from failed machine
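
As a rough illustration of the points above (a sparse map with string keys, blob values and timestamped versions, rows split into tablets, tablets picked up by other machines after a failure), here is a minimal Python sketch. The names SparseTable, split_into_tablets and reassign are hypothetical, and the column dimension and the round-robin recovery rule are assumptions; the talk did not describe the actual interfaces.

```python
import bisect


class SparseTable:
    """Toy sparse map (hypothetical): (row, column, timestamp) -> blob."""

    def __init__(self):
        # Only cells that were written take any space.
        self.cells = {}  # (row, column) -> sorted list of (timestamp, blob)

    def put(self, row, column, timestamp, blob):
        versions = self.cells.setdefault((row, column), [])
        bisect.insort(versions, (timestamp, blob))

    def get(self, row, column, timestamp=None):
        # Return the newest version, or the newest at or before `timestamp`.
        versions = self.cells.get((row, column), [])
        if timestamp is not None:
            versions = [v for v in versions if v[0] <= timestamp]
        return versions[-1][1] if versions else None

    def rows(self):
        return sorted({row for row, _ in self.cells})


def split_into_tablets(row_keys, rows_per_tablet):
    # Tablets are contiguous ranges of sorted rows, cut at row boundaries.
    return [row_keys[i:i + rows_per_tablet]
            for i in range(0, len(row_keys), rows_per_tablet)]


def reassign(assignment, failed, survivors):
    # Assumed recovery rule: spread the failed server's tablets round-robin
    # over the survivors, so each picks up only a small share.
    orphaned = assignment.pop(failed, [])
    for i, tablet in enumerate(orphaned):
        assignment[survivors[i % len(survivors)]].append(tablet)
    return assignment


table = SparseTable()
table.put("com.example/", "contents", 1, b"<html>v1</html>")
table.put("com.example/", "contents", 2, b"<html>v2</html>")
print(table.get("com.example/", "contents"))               # newest version
print(table.get("com.example/", "contents", timestamp=1))  # older version
```

Spreading a failed server's tablets over many survivors means no single machine has to absorb the whole load, which matches the "100 machines each pick up 1 tablet" remark.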
