Gossip protocols, Epidemic Broadcast and Eventual Consistency in Practice

Mariano Guerra

event-fabric.com

About: Riak Core Metadata

Not About

riak_ring.png

Riak Core Metadata

How did I end up here?

Attempt #1

Make riak_core run on Windows

Attempt #2

Strategies

Read Code

Paper Pile Bankruptcy

99 papers on the pile,
take one,
read it,
follow the references,

123 papers on the pile...

Beware!

Alvaro Videla's Syndrome

alvaro-wanted-a-random-number.jpg

Tracing

Achievement Unlocked

Tracing Setup

rebar3 new rebar3_riak_core tanodb
# add recon dependency to rebar.config
make devrel

# in different consoles
make dev1-console
make dev2-console
make dev3-console

Tracing Setup

make devrel-join
make devrel-status
make devrel-cluster-plan
make devrel-cluster-commit
make devrel-status

Tracing Execution

Using recon on all 3 nodes:

Tracing Execution

ReturnTrace = fun(_) -> return_trace() end.
% print at most 1000 trace messages per 1000 ms
Rate = {1000, 1000}.
recon_trace:calls([{riak_core_broadcast, '_',
                    % skip the periodic lazy_tick messages, they are just noise
                    fun ([A, _]) when A /= lazy_tick -> return_trace() end},
                   {riak_core_metadata_hashtree, '_', ReturnTrace},
                   {riak_core_metadata_object, '_', ReturnTrace},
                   {riak_core_metadata_manager, '_', ReturnTrace},
                   {riak_core_metadata_exchange_fsm, '_', ReturnTrace},
                   {riak_core_metadata, '_', ReturnTrace}], Rate).

Tracing Execution

Write something in riak_core_metadata:

FullPrefix = {<<"tanodb">>, <<"mymeta">>}.
MDKey = my_key_1.
MDValue = <<"my metadata value">>.
riak_core_metadata:put(FullPrefix, MDKey, MDValue).

Tracing Execution

When the dust settles:

recon_trace:clear().

Tracing Homework

Observations

What did I learn?

The life of a metadata:put 1/3

sequence-diagram-final.png

The life of a metadata:put 2/3

sequence-diagram_001-final.png

The life of a metadata:put 3/3

sequence-diagram_002.png

Broadcast Stage

Node State

Eager Push

plumtree-0.png

Eager Push

plumtree-1.png

Eager Push

plumtree-2.png

Eager Push

plumtree-3.png

Eager Push

plumtree-4.png

Eager Push

plumtree-5.png

Eager Push

plumtree-6.png

Eager Push

plumtree-7.png
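A minimal sketch of the eager push steps, written as a pure Erlang state machine rather than riak_core_broadcast's real code. The module `eager_push_sketch`, the message shapes `{gossip, Id}` and `prune`, and the FIFO simulator are all hypothetical; lazy push is left out on purpose. The rule it shows: the first copy of a message is forwarded to the other eager peers, a duplicate demotes its sender to lazy and sends it a prune.

```erlang
-module(eager_push_sketch).
-export([init/1, broadcast/2, handle/3, run/3]).

%% Hypothetical node state: eager peers, lazy peers and message ids seen.
%% Lazy push (i_have/graft) is deliberately left out of this sketch.
init(Peers) -> #{eager => lists:usort(Peers), lazy => [], seen => []}.

%% Origin: remember the message and push it to every eager peer.
broadcast(Id, S = #{eager := Eager, seen := Seen}) ->
    {[{To, {gossip, Id}} || To <- Eager], S#{seen := [Id | Seen]}}.

%% First copy: keep the sender eager and forward to the other eager peers.
%% Duplicate: demote the sender to lazy and tell it to prune us.
handle(From, {gossip, Id}, S = #{seen := Seen}) ->
    case lists:member(Id, Seen) of
        false ->
            S1 = promote(From, S#{seen := [Id | Seen]}),
            {[{To, {gossip, Id}} || To <- maps:get(eager, S1), To =/= From], S1};
        true ->
            {[{From, prune}], demote(From, S)}
    end;
%% A peer already had the message from someone else: stop eager-pushing to it.
handle(From, prune, S) ->
    {[], demote(From, S)}.

promote(P, S = #{eager := E, lazy := L}) ->
    S#{eager := lists:usort([P | E]), lazy := L -- [P]}.
demote(P, S = #{eager := E, lazy := L}) ->
    S#{eager := E -- [P], lazy := lists:usort([P | L])}.

%% Tiny simulator: deliver messages in FIFO order until none are left.
run(Origin, Id, States) ->
    {Out, S0} = broadcast(Id, maps:get(Origin, States)),
    loop([{Origin, To, M} || {To, M} <- Out], States#{Origin := S0}).

loop([], States) -> States;
loop([{From, To, Msg} | Queue], States) ->
    {Out, S} = handle(From, Msg, maps:get(To, States)),
    loop(Queue ++ [{To, Dest, M} || {Dest, M} <- Out], States#{To := S}).
```

Running it from n1 on four fully connected nodes, every node sees the message, n1 stays eager to n2, n3 and n4, and each of the others ends up eager only to n1: the redundant links were pruned to lazy, leaving a spanning tree.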

Lazy Push

plumtree-lazy-0.png

Lazy Push

plumtree-lazy-1.png

Lazy Push

plumtree-lazy-2.png

Lazy Push

plumtree-lazy-3.png

Lazy Push

plumtree-lazy-4.png

Lazy Push

plumtree-lazy-5.png

Lazy Push

plumtree-lazy-6.png

Lazy Push

plumtree-lazy-7.png
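The lazy push steps can be sketched the same way. Again this is a hypothetical module (`lazy_push_sketch`), not riak_core_broadcast: new messages are eager-pushed with their payload while lazy peers only get `{i_have, Id}`; if an announced id never arrives, a timer tick (`tick/1`, standing in for a real timer) sends `{graft, Id}` to the announcer and promotes it to eager, which repairs the tree. As a simplification, a missing id is grafted only once, with no retry to a second announcer.

```erlang
-module(lazy_push_sketch).
-export([init/2, handle/3, tick/1]).

%% Hypothetical node state: eager/lazy peers, seen ids, and ids announced via
%% i_have that have not arrived yet (Id => Announcer).
init(Eager, Lazy) ->
    #{eager => Eager, lazy => Lazy, seen => [], missing => #{}}.

%% New gossip: eager peers get the payload, lazy peers only the id.
handle(From, {gossip, Id}, S = #{seen := Seen, missing := Missing}) ->
    case lists:member(Id, Seen) of
        true ->
            {[{From, prune}], move(From, lazy, S)};
        false ->
            S1 = move(From, eager, S#{seen := [Id | Seen],
                                      missing := maps:remove(Id, Missing)}),
            Eager = [{To, {gossip, Id}} || To <- maps:get(eager, S1), To =/= From],
            Lazy = [{To, {i_have, Id}} || To <- maps:get(lazy, S1)],
            {Eager ++ Lazy, S1}
    end;
%% Announcement only: remember who has it and wait before asking.
handle(From, {i_have, Id}, S = #{seen := Seen, missing := Missing}) ->
    case lists:member(Id, Seen) orelse maps:is_key(Id, Missing) of
        true -> {[], S};
        false -> {[], S#{missing := Missing#{Id => From}}}
    end;
%% A peer grafted itself back into our tree: make it eager and resend.
handle(From, {graft, Id}, S) ->
    {[{From, {gossip, Id}}], move(From, eager, S)};
handle(From, prune, S) ->
    {[], move(From, lazy, S)}.

%% Timer expiry: anything still missing is grafted from its announcer.
tick(S = #{missing := Missing}) ->
    Grafts = maps:to_list(Missing),
    S1 = lists:foldl(fun({_Id, Peer}, Acc) -> move(Peer, eager, Acc) end, S, Grafts),
    {[{Peer, {graft, Id}} || {Id, Peer} <- Grafts], S1#{missing := #{}}}.

move(Peer, eager, S = #{eager := E, lazy := L}) ->
    S#{eager := lists:usort([Peer | E]), lazy := L -- [Peer]};
move(Peer, lazy, S = #{eager := E, lazy := L}) ->
    S#{eager := E -- [Peer], lazy := lists:usort([Peer | L])}.
```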

Active Anti Entropy

How does node 3 get the values that were broadcast while it was down?

Merkle Tree

merkel-tree.jpg

Merkle Tree

A tree in which every leaf node is labelled with the hash of its value, and every non-leaf node is labelled with the hash of the labels of its child nodes.
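A minimal sketch of that definition as a binary tree built bottom-up over a list of values. The module name `merkle_sketch`, the use of `erlang:md5/1` as the hash, and promoting an odd node unchanged to the next level are choices made for this illustration only.

```erlang
-module(merkle_sketch).
-export([root/1]).

%% Leaves are labelled with the hash of their value.
root([]) -> erlang:md5(<<>>);
root(Values) ->
    Leaves = [erlang:md5(term_to_binary(V)) || V <- Values],
    build(Leaves).

%% Each parent is labelled with the hash of its children's labels,
%% one level at a time, until a single root label remains.
build([Root]) -> Root;
build(Level) -> build(pair_up(Level)).

pair_up([A, B | Rest]) -> [erlang:md5(<<A/binary, B/binary>>) | pair_up(Rest)];
pair_up([A]) -> [A];   % an odd node is promoted unchanged
pair_up([]) -> [].
```

Two trees with the same values have the same root, and changing any single value changes it, which is what lets two nodes compare whole data sets by exchanging one hash.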

Merkle Tree

Hash Tree

hashtree.png

Segments

hashtree-segments.png

Segment Hashes

hashtree-segment-hashes.png

Upper Hashes

hashtree-upper-hashes.png
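The segment and upper-hash layout above can be sketched with an in-memory map. Everything here is an assumption for illustration: the module `segment_tree_sketch`, the `NumSegments`/`Width` parameters, `erlang:phash2/2` to pick a segment and `erlang:md5/1` as the hash. Keys are hashed into segments, each segment hash covers that segment's sorted key/hash pairs, and every `Width` neighbouring hashes are combined into a parent until one root remains.

```erlang
-module(segment_tree_sketch).
-export([new/2, insert/3, levels/1, root/1]).

%% Hypothetical parameters: NumSegments leaf buckets, Width children per upper node.
new(NumSegments, Width) when NumSegments >= 1, Width >= 2 ->
    #{segments => NumSegments, width => Width, data => #{}}.

%% Each key lands in one segment, chosen by hashing the key.
insert(Key, Hash, T = #{segments := N, data := Data}) ->
    Seg = erlang:phash2(Key, N),
    Entries = maps:get(Seg, Data, #{}),
    T#{data := Data#{Seg => Entries#{Key => Hash}}}.

%% Segment hash: hash of that segment's sorted {Key, Hash} pairs.
segment_hashes(#{segments := N, data := Data}) ->
    [erlang:md5(term_to_binary(lists:sort(maps:to_list(maps:get(Seg, Data, #{})))))
     || Seg <- lists:seq(0, N - 1)].

%% Upper hashes: every Width neighbours are hashed into one parent, level by
%% level, until one root is left. Returns [RootLevel, ..., SegmentLevel].
levels(T = #{width := W}) ->
    build([segment_hashes(T)], W).

root(T) ->
    [[Root] | _] = levels(T),
    Root.

build([[_Root] | _] = Levels, _W) -> Levels;
build([Level | _] = Levels, W) ->
    Parents = [erlang:md5(iolist_to_binary(Group)) || Group <- chunk(Level, W)],
    build([Parents | Levels], W).

chunk(L, W) when length(L) =< W -> [L];
chunk(L, W) ->
    {Group, Rest} = lists:split(W, L),
    [Group | chunk(Rest, W)].
```

With 16 segments and a width of 4 this gives three levels (16 segment hashes, 4 upper hashes, 1 root), and the root depends only on the stored key/hash pairs, not on insertion order.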

Hash Tree Operations

Insert

hashtree-insert-0.png

Insert

hashtree-insert-1.png

Insert

hashtree-insert-2.png

Insert

hashtree-insert-3.png

Insert

hashtree-insert-4.png

Update

hashtree-insert-5.png

Update

hashtree-insert-6.png

Update

hashtree-insert-7.png

Update

hashtree-insert-8.png

Update

hashtree-insert-9.png

Update

hashtree-insert-10.png
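One way to read the Insert/Update split, sketched with a hypothetical two-level tree (`dirty_tree_sketch`): inserting a key only records its hash and marks its segment dirty, and a separate update step rehashes just the dirty segments and then recomputes the root from cached segment hashes. The dirty-set bookkeeping and the names used here are assumptions of this sketch, not riak_core's `hashtree` internals.

```erlang
-module(dirty_tree_sketch).
-export([new/1, insert/3, update/1, root/1, dirty/1]).

%% Hypothetical in-memory tree: segments, cached segment hashes, a cached
%% root, and the set of segments changed since the last update.
new(NumSegments) ->
    Empty = seg_hash(#{}),
    Hashes = maps:from_list([{S, Empty} || S <- lists:seq(0, NumSegments - 1)]),
    #{n => NumSegments, data => #{}, seg_hashes => Hashes,
      dirty => sets:new(), root => root_hash(Hashes)}.

%% Insert is cheap: store the key hash and mark its segment dirty.
insert(Key, Hash, T = #{n := N, data := Data, dirty := Dirty}) ->
    Seg = erlang:phash2(Key, N),
    Entries = maps:get(Seg, Data, #{}),
    T#{data := Data#{Seg => Entries#{Key => Hash}},
       dirty := sets:add_element(Seg, Dirty)}.

%% Update rehashes only the dirty segments, then the root.
update(T = #{data := Data, seg_hashes := Hashes0, dirty := Dirty}) ->
    Hashes = sets:fold(fun(Seg, Acc) ->
                               Acc#{Seg => seg_hash(maps:get(Seg, Data, #{}))}
                       end, Hashes0, Dirty),
    T#{seg_hashes := Hashes, dirty := sets:new(), root := root_hash(Hashes)}.

root(#{root := Root}) -> Root.
dirty(#{dirty := Dirty}) -> lists:sort(sets:to_list(Dirty)).

seg_hash(Entries) ->
    erlang:md5(term_to_binary(lists:sort(maps:to_list(Entries)))).

root_hash(Hashes) ->
    erlang:md5(iolist_to_binary([H || {_Seg, H} <- lists:sort(maps:to_list(Hashes))])).
```

Until `update/1` runs, the root still reflects the old contents; batching many inserts before one update is what keeps writes cheap.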

Compare

hash-compare-1.png

Compare

hash-compare-2.png

Compare

hash-compare-3.png

Compare

hash-compare-4.png

Compare

hash-compare-5.png

Compare

hashtree:compare(Tree, RemoteFun).
hashtree:compare(Tree, RemoteFun, AccFun).
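A sketch of the top-down comparison, in the spirit of the `RemoteFun` argument above: the local side asks the remote side for hashes only where they are needed, stopping at equal roots and descending only into buckets and segments whose hashes differ. The module `compare_sketch`, the query terms `root`, `{bucket, B}`, `{segment, S}`, `{keys, S}`, the fixed width of 4 and the `missing`/`remote_missing`/`different` tags are all assumptions for this sketch; here `RemoteFun` is just a fun over another in-memory tree instead of a network call.

```erlang
-module(compare_sketch).
-export([from_list/2, lookup/2, compare/2]).

-define(WIDTH, 4). % hypothetical: segments per upper-level bucket

%% A tree is {NumSegments, #{Segment => #{Key => Hash}}}.
from_list(NumSegments, KeyHashes) ->
    Data = lists:foldl(
             fun({K, H}, Acc) ->
                     Seg = erlang:phash2(K, NumSegments),
                     maps:update_with(Seg, fun(M) -> M#{K => H} end, #{K => H}, Acc)
             end, #{}, KeyHashes),
    {NumSegments, Data}.

%% Answers the questions one side asks the other during a compare.
lookup({_N, Data}, {keys, S}) -> maps:get(S, Data, #{});
lookup(T, {segment, S}) -> hash(lists:sort(maps:to_list(lookup(T, {keys, S}))));
lookup(T = {N, _}, {bucket, B}) -> hash([lookup(T, {segment, S}) || S <- segments_in(B, N)]);
lookup(T = {N, _}, root) -> hash([lookup(T, {bucket, B}) || B <- buckets(N)]).

%% Top-down: equal roots end the exchange; otherwise only differing buckets,
%% then differing segments, are explored, and only their keys are fetched.
compare(Local = {N, _}, RemoteFun) ->
    case lookup(Local, root) =:= RemoteFun(root) of
        true -> [];
        false ->
            lists:append(
              [diff_keys(lookup(Local, {keys, S}), RemoteFun({keys, S}))
               || B <- buckets(N),
                  lookup(Local, {bucket, B}) =/= RemoteFun({bucket, B}),
                  S <- segments_in(B, N),
                  lookup(Local, {segment, S}) =/= RemoteFun({segment, S})])
    end.

diff_keys(L, R) ->
    Keys = lists:usort(maps:keys(L) ++ maps:keys(R)),
    [D || K <- Keys, D <- diff_key(K, maps:find(K, L), maps:find(K, R))].

diff_key(K, {ok, _}, error) -> [{remote_missing, K}];  % only we have it
diff_key(K, error, {ok, _}) -> [{missing, K}];         % only they have it
diff_key(K, {ok, A}, {ok, B}) when A =/= B -> [{different, K}];
diff_key(_K, _, _) -> [].

buckets(N) -> lists:seq(0, (N - 1) div ?WIDTH).
segments_in(B, N) -> lists:seq(B * ?WIDTH, min(N - 1, (B + 1) * ?WIDTH - 1)).
hash(Term) -> erlang:md5(term_to_binary(Term)).
```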

Did I understand it?

Change in riak_core_metadata_manager from:

riak_core_metadata_hashtree:insert(PKey, Hash),
ok = dets_insert(dets_tabname(FullPrefix), Objs);

Did I understand it?

To:

PersistenceType = proplists:get_value(persistence_type, Opts, disk),
case PersistenceType of
    disk ->
        riak_core_metadata_hashtree:insert(PKey, Hash),
        ok = dets_insert(dets_tabname(FullPrefix), Objs);
    memory ->
        ok
end,

Did I understand it?

Implemented my own riak_core_broadcast_handler: tanodb_broadcast_handler

Add it to advanced.config in the riak_core section:

{riak_core, [
  {broadcast_mods, [riak_core_metadata_manager,
                    tanodb_broadcast_handler]}]}
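The slides don't include the handler's source, so what follows is a hypothetical reconstruction, not the author's code. It implements the callbacks of the `riak_core_broadcast_handler` behaviour (`broadcast_data/1`, `merge/2`, `is_stale/1`, `graft/1`, `exchange/1`) and hands new values to `riak_core_broadcast:broadcast/2`. The storage (a public ETS table owned by a gen_server), the last-writer-wins `{Timestamp, node()}` version, and `get/1` are assumptions of this sketch. Like the `memory` persistence case, it does no anti-entropy, so a node that was down misses updates broadcast in the meantime.

```erlang
-module(tanodb_broadcast_handler).
-behaviour(riak_core_broadcast_handler).
-behaviour(gen_server).
-compile({no_auto_import, [put/2, get/1]}).

-export([start_link/0, put/2, get/1]).
-export([broadcast_data/1, merge/2, is_stale/1, graft/1, exchange/1]).
-export([init/1, handle_call/3, handle_cast/2]).

-define(TABLE, ?MODULE).

%% The gen_server only exists to own the (hypothetical) ETS table.
start_link() -> gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    ets:new(?TABLE, [named_table, public, set]),
    {ok, nostate}.

handle_call(_Req, _From, State) -> {reply, ok, State}.
handle_cast(_Msg, State) -> {noreply, State}.

%% Store locally first, then let Plumtree disseminate the update.
put(PKey, Value) ->
    Version = {erlang:system_time(), node()},   % assumed last-writer-wins stamp
    Msg = {PKey, Version, Value},
    true = merge({PKey, Version}, Msg),
    riak_core_broadcast:broadcast(Msg, ?MODULE).

get(PKey) ->
    case ets:lookup(?TABLE, PKey) of
        [{PKey, _Version, Value}] -> {ok, Value};
        [] -> not_found
    end.

%% Message id is {PKey, Version}; the payload is the whole update.
broadcast_data({PKey, Version, _Value} = Msg) -> {{PKey, Version}, Msg}.

%% true means "new to us", so the broadcast keeps propagating it.
merge({PKey, Version} = Id, {PKey, Version, Value}) ->
    case is_stale(Id) of
        true -> false;
        false -> ets:insert(?TABLE, {PKey, Version, Value}), true
    end.

is_stale({PKey, Version}) ->
    case ets:lookup(?TABLE, PKey) of
        [{PKey, Current, _}] -> Current >= Version;
        [] -> false
    end.

%% Answer a graft with the payload if we still hold that exact version.
graft({PKey, Version}) ->
    case ets:lookup(?TABLE, PKey) of
        [{PKey, Version, Value}] -> {ok, {PKey, Version, Value}};
        [{PKey, _Other, _}] -> stale;
        [] -> {error, not_found}
    end.

%% No anti-entropy exchange for this in-memory handler.
exchange(_Node) -> {error, not_implemented}.
```

Comparing versions with `>=` makes `merge/2` idempotent and lets concurrent writes converge on the larger `{Timestamp, node()}`, at the cost of silently dropping the other write.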

Did I understand it?

Try it:

tanodb_broadcast_handler:start_link().
tanodb_broadcast_handler:put({{<<"tanodb">>, <<"memmeta">>},
                             mem_key_1}, <<"my value">>).

Did I understand it?

It works!

Papers

Dotted Version Vectors:

Papers

Gossip/Broadcast:

Thanks

@warianoguerra

github.com/marianoguerra