raft: consensus protocol design doc #100
Open — SUMUKHA-PK wants to merge 32 commits into `master` from `Consensus-protocol-Design-Doc`
Commits:

- 5a7a27b Create Consensus-Protocol.md (SUMUKHA-PK)
- df185c3 Update Consensus-Protocol.md (SUMUKHA-PK)
- 9657559 Merge branch 'master' into Consensus-protocol-Design-Doc (tsatke)
- 099498a Update doc/internal/parser/scanner/Consensus-Protocol.md (SUMUKHA-PK)
- 43e600e Update Consensus-Protocol.md (SUMUKHA-PK)
- bd90b1f Update Consensus-Protocol.md (SUMUKHA-PK)
- 7d26490 Update Consensus-Protocol.md (SUMUKHA-PK)
- 5923f4f Update Consensus-Protocol.md (SUMUKHA-PK)
- 02239c8 Update Consensus-Protocol.md (SUMUKHA-PK)
- 2b900c0 Update Consensus-Protocol.md (SUMUKHA-PK)
- 8527df5 Merge branch 'master' of https://github.com/tomarrell/lbadd into Cons… (SUMUKHA-PK)
- b18332a Moved doc to appropriate folder (SUMUKHA-PK)
- b11486e Update Consensus-Protocol.md (SUMUKHA-PK)
- 4e559a5 Merge branch 'master' into Consensus-protocol-Design-Doc (SUMUKHA-PK)
- 7eef110 Update Consensus-Protocol.md (SUMUKHA-PK)
- 9b209f1 Merge branch 'master' into Consensus-protocol-Design-Doc (SUMUKHA-PK)
- 17a8e1a Merge branch 'master' into Consensus-protocol-Design-Doc (SUMUKHA-PK)
- 1cb7555 Merge branch 'master' into Consensus-protocol-Design-Doc (SUMUKHA-PK)
- 1209205 Update doc/internal/consensus/Consensus-Protocol.md (SUMUKHA-PK)
- 8d25041 Apply suggestions from code review (SUMUKHA-PK)
- 01d0c32 Apply suggestions from code review (SUMUKHA-PK)
- d354ab9 Apply suggestions from code review (SUMUKHA-PK)
- 2e71890 Merge branch 'master' into Consensus-protocol-Design-Doc (SUMUKHA-PK)
- 2a8c4d9 Update Consensus-Protocol.md (SUMUKHA-PK)
- c32828c Update Consensus-Protocol.md (SUMUKHA-PK)
- 6815e8c Update Consensus-Protocol.md (SUMUKHA-PK)
- 281906e Update doc/internal/consensus/Consensus-Protocol.md (SUMUKHA-PK)
- b180d33 Update doc/internal/consensus/Consensus-Protocol.md (SUMUKHA-PK)
- 981d873 Merge branch 'master' into Consensus-protocol-Design-Doc (SUMUKHA-PK)
- fd1f272 Update Consensus-Protocol.md (SUMUKHA-PK)
- a16b909 Update Consensus-Protocol.md (SUMUKHA-PK)
- c312812 Merge branch 'master' into Consensus-protocol-Design-Doc (tsatke)
# The consensus

Before talking about consensus, we need to discuss some logistics of how the systems will co-exist.

* Communication: Distributed systems need a way to communicate with one another. Remote Procedure Calls (RPC) are the mechanism by which one standalone server talks to another. The standard Go package [net/rpc](https://golang.org/pkg/net/rpc/) serves this purpose.
* Security: Access control mechanisms need to be in place to decide on access to functions in the servers, based on their state (leader, follower, candidate).
* Routing to the leader: One of the issues with a changing leader is that clients need to know which IP address to contact for the service. We can solve this by advertising any/all IPs of the cluster and simply forwarding requests to the current leader, or by having a proxy that forwards each request to the current leader whenever it comes in. (The client-interaction section of the post describes another approach that also works.)
* The servers will be implemented in the `internal/master` and `internal/worker` folders, which will import the raft API and perform their functions.
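The Communication bullet above points at Go's standard `net/rpc`. As a minimal sketch of what one server calling another looks like, the program below registers a service and invokes it over TCP; the `Node.Ping` service and its types are purely illustrative, not part of the planned raft API.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"net/rpc"
)

// PingArgs and PingReply are placeholder request/response types.
type PingArgs struct{ From string }
type PingReply struct{ Msg string }

// Node stands in for a raft server exposing RPC methods.
type Node struct{}

// Ping follows net/rpc's required method shape:
// func (t *T) Method(args *Args, reply *Reply) error
func (n *Node) Ping(args *PingArgs, reply *PingReply) error {
	reply.Msg = "pong to " + args.From
	return nil
}

func main() {
	srv := rpc.NewServer()
	if err := srv.Register(&Node{}); err != nil {
		log.Fatal(err)
	}
	ln, err := net.Listen("tcp", "127.0.0.1:0") // pick a free port
	if err != nil {
		log.Fatal(err)
	}
	go srv.Accept(ln)

	client, err := rpc.Dial("tcp", ln.Addr().String())
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	var reply PingReply
	if err := client.Call("Node.Ping", &PingArgs{From: "follower-1"}, &reply); err != nil {
		log.Fatal(err)
	}
	fmt.Println(reply.Msg) // prints "pong to follower-1"
}
```

The same pattern (exported service type, pointer args and reply, `Server.Register`) would carry the real `RequestVotes` and `AppendEntries` calls.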
Maintaining consensus is one of the major parts of a distributed system. To achieve a stable system, we need the following two parts of the implementation.
## The Raft protocol

A Raft server may be in one of three states: leader, follower, or candidate. All requests are serviced through the leader, which then decides how and whether the logs must be replicated on the follower machines. The Raft protocol has three almost independent modules:

1. Leader Election
2. Log Replication
3. Safety

A detailed description of each module follows.
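The three states and the transitions between them can be modelled as a tiny state machine. This is an illustrative sketch under assumed names (`State`, `step`, and the event strings are inventions for this doc, not the project's types):

```go
package main

import "fmt"

// State is one of the three Raft server states.
type State int

const (
	Follower State = iota
	Candidate
	Leader
)

func (s State) String() string {
	return [...]string{"follower", "candidate", "leader"}[s]
}

// step applies one Raft-defined event to a server's state.
func step(s State, event string) State {
	switch {
	case s == Follower && event == "electionTimeout":
		return Candidate // heard no leader: start an election
	case s == Candidate && event == "wonElection":
		return Leader
	case s == Candidate && event == "sawLeader":
		return Follower // a legitimate leader appeared
	case event == "higherTerm":
		return Follower // any state steps down on seeing a higher term
	}
	return s
}

func main() {
	s := Follower
	s = step(s, "electionTimeout")
	s = step(s, "wonElection")
	fmt.Println(s) // prints "leader"
}
```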
### Leader Election
#### Spec

* Startup: All servers start in the follower state. A follower that hears from no leader before its election timeout expires becomes a candidate and requests votes to be elected leader.
* Pre-election: The server increments its `currentTerm` by one, changes to the `candidate` state and sends out `RequestVotes` RPCs in parallel to all the peers.
* Vote condition: First-come, first-served. A server votes for itself when it starts its own election; otherwise it votes for the first valid candidate that asks. (Read section 3.6 and clear up exactly when a server votes for itself.)
* Election timeout: A preset time for which the server waits to see whether a peer requested a vote. It is chosen randomly between 150 and 300 ms.
* The election is repeated after an election timeout until:
  1. The server wins the election, or
  2. A peer establishes itself as leader, or
  3. The election timer expires with no winner (for example, a split vote), in which case the process is repeated.
* Election win: A majority of votes in the term. (More details in Safety.) The winner's state is now `leader` and the others are `followers`.
* Maintaining the leader's reign: The leader sends `heartbeats` to all servers to establish its reign. Based on the responses, this also checks whether the other servers are alive, and it informs those servers that the leader is still alive too. If a server does not receive timely heartbeat messages, it transitions from the `follower` state to the `candidate` state.
* The transition from the working state to an election happens when a leader fails.
* Maintaining sanity: While waiting for votes, if the server receives an `AppendEntriesRPC` whose sender's term is greater than or equal to the waiting candidate's term, the sender is considered the legitimate leader and the candidate becomes its follower. If the sender's term is lesser, the RPC is rejected.
* The split vote problem: Though not very common, split votes can occur. To make sure this does not continue indefinitely, election timeouts are randomised, making repeated split votes improbable.
#### Implementation

* A separate `internal/raft` folder will have a raft implementation which provides APIs for each server to call.
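As a starting point, that API surface might look something like the interface below. Every name here is a hypothetical assumption; the real `internal/raft` package is still to be designed. A toy single-node fake shows the intended call pattern:

```go
package main

import (
	"errors"
	"fmt"
)

// Node is the hypothetical handle a master/worker server would hold
// on its raft instance.
type Node interface {
	Propose(cmd []byte) error // submit a command; succeeds only on the leader
	IsLeader() bool           // does this server currently lead the cluster?
	Stop() error              // shut the raft instance down
}

var errNotLeader = errors.New("raft: not the leader")

// fakeNode is a single-server stand-in used only to show the pattern.
type fakeNode struct {
	leader bool
	log    [][]byte
}

func (n *fakeNode) Propose(cmd []byte) error {
	if !n.leader {
		return errNotLeader // callers would redirect to the leader
	}
	n.log = append(n.log, cmd)
	return nil
}
func (n *fakeNode) IsLeader() bool { return n.leader }
func (n *fakeNode) Stop() error    { return nil }

func main() {
	var n Node = &fakeNode{leader: true}
	if err := n.Propose([]byte("INSERT INTO t VALUES (1)")); err != nil {
		fmt.Println("propose failed:", err)
		return
	}
	fmt.Println("command accepted by leader")
}
```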
### Log Replication

#### Spec
* Pre-log replication: Once a leader is elected, it starts servicing the client. The leader appends each new request as a new entry in its log, then issues `AppendEntriesRPC` in parallel to all its peers.
* Successful log: Once the entry has been safely replicated on a majority of the machines, the leader applies the entry to its state machine and returns the result to the client.
* Repeating `AppendEntries`: `AppendEntriesRPC`s are repeated indefinitely until all followers eventually store all log entries.
* Log entry storage: Log entries are a queue of state machine commands which are applied to that particular state machine. Each log entry carries the term number in which it was created, along with an integer index identifying that log entry's position.
* Committed entry: A log entry is called committed once it is replicated on a majority of the servers in the cluster. Committing an entry also commits all the previous entries in the leader's log, including entries created by previous leaders.
* The leader keeps track of the highest index that it knows is committed, and includes it in all future `AppendEntriesRPC`s (including heartbeats) to inform the other servers.
* Log matching property:
  * If two entries in different logs have the same index and term, then they store the same command.
  * If two entries in different logs have the same index and term, then the logs are identical in all preceding entries.
* TODO: Add more from section 3.6.
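The committed-entry rule above (committed once replicated on a majority) can be sketched by computing the highest index stored on a majority of servers. The `matchIndex` bookkeeping mirrors the Raft paper's, but this is a sketch, not the planned implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is one log entry: a state machine command tagged with the
// term in which it was created; its slice position is its index.
type Entry struct {
	Term    int
	Command string
}

// commitIndex returns the highest log index stored on a majority of
// the cluster, given each server's highest matching index (the
// leader's own log counts as one of the entries).
func commitIndex(matchIndex []int) int {
	sorted := append([]int(nil), matchIndex...)
	sort.Ints(sorted)
	// At least a majority of servers have matchIndex >= this element.
	return sorted[(len(sorted)-1)/2]
}

func main() {
	// Five servers; the leader itself is at index 5.
	match := []int{5, 5, 4, 2, 1}
	fmt.Println(commitIndex(match)) // prints 4: entries 1..4 are on a majority
}
```

Once `commitIndex` advances, everything at or below it is safe to apply, which is why committing one entry implicitly commits all preceding ones.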
#### Implementation

### Safety
## A strict testing mechanism

The testing mechanism to be implemented will help us find problems in the implementation, leading to a more resilient system.
We have to test for the following basic failures:
1. Network partitioning.
2. Unresponsive peers.
3. Overloaded peers.
4. Corruption of data in transit.
## Graceful handling of failures

Accepting that failures exist, and handling them gracefully, enables the creation of more resilient and stable systems. Having _circuit breakers_, _backoff mechanisms in clients_, and _validation and coordination mechanisms_ are some of the guidelines to follow.
## Running Lbadd on Raft

* Background: Raft is just a consensus protocol that helps keep different database servers in sync. We need methods to issue a command and enable the sync between servers.
* Logistics: The `AppendEntriesRPC` carries the command issued by the client. The command goes through the leader, is replicated to and applied by all the followers, and is then committed by the leader, thus ensuring an in-sync distributed database.
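The logistics bullet can be traced end to end with an in-memory toy cluster: the client's command goes to the leader, is replicated to the followers, and once a majority have stored it, it is applied everywhere. This is illustrative only, not the planned implementation:

```go
package main

import "fmt"

// server holds a toy log and the commands applied to its state machine.
type server struct {
	log     []string
	applied []string
}

// replicate drives one client command through the toy cluster and
// returns how many servers applied it.
func replicate(leader *server, followers []*server, cmd string) int {
	// 1. Leader appends the command to its own log.
	leader.log = append(leader.log, cmd)
	// 2. Leader replicates it to the followers (AppendEntriesRPC).
	acks := 1 // the leader counts itself
	for _, f := range followers {
		f.log = append(f.log, cmd)
		acks++
	}
	// 3. Once a majority have stored it, the entry is committed and
	//    every server applies it to its state machine.
	if acks <= (1+len(followers))/2 {
		return 0 // not committed
	}
	leader.applied = append(leader.applied, cmd)
	for _, f := range followers {
		f.applied = append(f.applied, cmd)
	}
	return 1 + len(followers)
}

func main() {
	leader := &server{}
	followers := []*server{{}, {}}
	n := replicate(leader, followers, "INSERT INTO t VALUES (1)")
	fmt.Println("applied on", n, "servers") // prints "applied on 3 servers"
}
```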