Listing Keys from a Bucket
==========================

Since we have already implemented several commands, you may be asking yourself
why we need a full chapter for one more. Well, think again.

Since the bucket and key are hashed together to decide which vnode a request
goes to, the keys for a given bucket may be distributed across multiple vnodes,
and if you are running a cluster, across multiple physical nodes.

This means that to list all the keys in a bucket we have to ask every vnode for
the keys it holds for that bucket, then put the responses together and return
the combined set.

For this, Riak Core provides coverage calls, a way to run a command on all
vnodes and gather the responses.

In this chapter we are going to implement the `tanodb:keys(Bucket)` function
using coverage calls.

Implementing the CORE API
-------------------------

We start as usual by adding the metric for the keys function.

Then we implement `tanodb:keys/1`, but as you may notice it is not like the
previous functions, because of what we discussed in the introduction.

In this case we call `tanodb_coverage_fsm:start({keys, Bucket}, Timeout)`.
`tanodb_coverage_fsm` is a new module that implements a behavior called
`riak_core_coverage_fsm` (short for riak_core coverage finite state machine).
It implements some predefined callbacks that are called in different states of
a finite state machine.

The start function calls
`tanodb_coverage_fsm_sup:start_fsm([ReqId, self(), Request, Timeout])`, which
starts the new process under a supervisor.

We also need to register the supervisor in the supervisor tree.

As a side note, `tanodb_coverage_fsm` uses a module called `time_compat` to
avoid problems with deprecated uses of time in Erlang, so we need to add that
module as a dependency.

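
To make the scatter and gather idea behind coverage calls concrete, here is a
toy sketch that does not use riak_core at all. Everything in it is hypothetical
and only for illustration: the module name `coverage_sketch`, its functions,
and the use of `erlang:phash2/2` as a stand-in for riak_core's consistent
hashing. It shows why keys of one bucket end up on different "vnodes" and why
listing them means asking every vnode and merging the replies.

.. code-block:: erlang

    -module(coverage_sketch).
    -export([start_vnodes/1, put/3, keys/3]).

    %% Start N toy "vnode" processes; each keeps its own list of {Bucket, Key}.
    start_vnodes(N) ->
        [spawn(fun() -> vnode_loop([]) end) || _ <- lists:seq(1, N)].

    vnode_loop(BKeys) ->
        receive
            {put, BKey} ->
                vnode_loop([BKey | BKeys]);
            {keys, Bucket, ReqId, From} ->
                %% Reply only with the keys this vnode holds for Bucket.
                From ! {ReqId, self(), [K || {B, K} <- BKeys, B =:= Bucket]},
                vnode_loop(BKeys)
        end.

    %% Bucket and key are hashed together to pick the vnode
    %% (phash2 is an assumption standing in for the real ring lookup).
    put(Vnodes, Bucket, Key) ->
        Index = erlang:phash2({Bucket, Key}, length(Vnodes)),
        lists:nth(Index + 1, Vnodes) ! {put, {Bucket, Key}},
        ok.

    %% Toy coverage call: ask every vnode, then collect one reply per vnode.
    keys(Vnodes, Bucket, Timeout) ->
        ReqId = make_ref(),
        [V ! {keys, Bucket, ReqId, self()} || V <- Vnodes],
        gather(ReqId, length(Vnodes), Timeout, []).

    gather(_ReqId, 0, _Timeout, Acc) ->
        {ok, lists:append(Acc)};
    gather(ReqId, Pending, Timeout, Acc) ->
        receive
            {ReqId, _Vnode, Keys} ->
                gather(ReqId, Pending - 1, Timeout, [Keys | Acc])
        after Timeout ->
            %% Timeout applies to each awaited reply, not to the whole call.
            {error, timeout}
        end.

The real implementation delegates the "ask every vnode and wait" part to
`riak_core_coverage_fsm`; the sketch only mirrors the overall shape of the
request.
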
When we start the FSM with a command (`{keys, Bucket}`) and a timeout in
milliseconds, the supervisor starts the finite state machine process. The init
function is called first; it initializes the state of the process and returns
some information to riak_core so it knows what kind of coverage call we want to
make. Then riak_core calls the handle_coverage function on each vnode, and with
each response it calls process_results in our process. When all the results
have been received, or if an error happens (such as a timeout), it calls the
finish callback, where we send the results to the calling process, which is
waiting for them.

The handle_coverage implementation is really simple: it uses the `ets:match/2`
function to match against all the entries with the given bucket and returns the
key from each matched result.

You can read more about ets match specs in the match spec chapter of the Erlang
documentation.

Relevant code from tanodb.erl:

.. code-block:: erlang

    keys(Bucket) ->
        tanodb_metrics:core_keys(),
        Timeout = 5000,
        tanodb_coverage_fsm:start({keys, Bucket}, Timeout).

Relevant code from tanodb_vnode.erl:

.. code-block:: erlang

    handle_coverage({keys, Bucket}, _KeySpaces, {_, RefId, _},
                    State=#state{table_name=TableName}) ->
        %% '$1' binds the key part of every {Bucket, Key} entry in the table
        Keys0 = ets:match(TableName, {{Bucket, '$1'}, '_'}),
        Keys = lists:map(fun first/1, Keys0),
        {reply, {RefId, Keys}, State};

Test It
.......

Let's start by checking keys on an empty bucket.

.. code-block:: erlang

    (tanodb@127.0.0.1)1> tanodb:keys(<<"mybucket">>).
    {ok,[{1347321821914426127719021955160323408745312813056,
          'tanodb@127.0.0.1',[]},
         ...
         {959110449498405040071168171470060731649205731328,
          'tanodb@127.0.0.1',...},
         {411047335499316445744786359201454599278231027712,...},
         {...}|...]}

The output is quite verbose and is abbreviated here for clarity, but what we
get back is:

.. code-block:: erlang

    {ok, [{Partition, Node, ListOfKeys}*64]}

That means 64 three-item tuples (one for each vnode), each with the partition
id, the node where the partition lives, and the list of keys for that vnode. In
this case all of the lists are empty, and in the following cases most of them
will be empty, so we will filter them out to clean up the output.

Now let's put a value:

.. code-block:: erlang

    (tanodb@127.0.0.1)2> tanodb:put({<<"mybucket">>, <<"k1">>}, 42).
    {ok,228359630832953580969325755111919221821239459840}

And try listing the keys again, this time filtering out the empty results:

.. code-block:: erlang

    (tanodb@127.0.0.1)3> lists:filter(fun ({_, _, []}) -> false;
                                          (_) -> true
                                      end,
                                      element(2, tanodb:keys(<<"mybucket">>))).
    [{228359630832953580969325755111919221821239459840,
      'tanodb@127.0.0.1',
      [<<"k1">>]}]

We get one partition that returns the key we just inserted. You can also check
that the partition id is the same as the one returned by the put call before.

Now let's insert another value:

.. code-block:: erlang

    (tanodb@127.0.0.1)4> tanodb:put({<<"mybucket">>, <<"k2">>}, 43).
    {ok,1210306043414653979137426502093171875652569137152}

And list again; now we get two partitions with keys:

.. code-block:: erlang

    (tanodb@127.0.0.1)5> lists:filter(fun ({_, _, []}) -> false;
                                          (_) -> true
                                      end,
                                      element(2, tanodb:keys(<<"mybucket">>))).
    [{1210306043414653979137426502093171875652569137152,
      'tanodb@127.0.0.1',
      [<<"k2">>]},
     {228359630832953580969325755111919221821239459840,
      'tanodb@127.0.0.1',
      [<<"k1">>]}]

Yet another value:

.. code-block:: erlang

    (tanodb@127.0.0.1)6> tanodb:put({<<"mybucket">>, <<"k3">>}, 44).
    {ok,1073290264914881830555831049026020342559825461248}

And the list again:

.. code-block:: erlang

    (tanodb@127.0.0.1)7> lists:filter(fun ({_, _, []}) -> false;
                                          (_) -> true
                                      end,
                                      element(2, tanodb:keys(<<"mybucket">>))).
    [{1210306043414653979137426502093171875652569137152,
      'tanodb@127.0.0.1',
      [<<"k2">>]},
     {1073290264914881830555831049026020342559825461248,
      'tanodb@127.0.0.1',
      [<<"k3">>]},
     {228359630832953580969325755111919221821239459840,
      'tanodb@127.0.0.1',
      [<<"k1">>]}]

Implementing the REST API
-------------------------

The REST API is quite straightforward. We add a new route to cowboy that allows
`GET /store/:bucket` without specifying the key. We interpret this as a request
to "get the bucket", which for us means returning its keys.

So when handling a GET where the key is undefined, we request the keys and
deduplicate them by using them as keys in a map with the values set to true,
then collecting the keys of the map.

Test It
.......

Like in the previous test, let's start by listing an empty bucket:

.. code-block:: sh

    $ http localhost:8080/store/mybucket

.. code-block:: http

    HTTP/1.1 200 OK
    content-length: 2
    content-type: application/json
    date: Sat, 31 Oct 2015 14:12:52 GMT
    server: Cowboy

    []

Let's put a value in that bucket:

.. code-block:: sh

    $ http post localhost:8080/store/mybucket/bob name=bob color=yellow

.. code-block:: http

    HTTP/1.1 204 No Content
    content-length: 0
    content-type: application/json
    date: Sat, 31 Oct 2015 14:12:58 GMT
    server: Cowboy

And list it again:

.. code-block:: sh

    $ http localhost:8080/store/mybucket

.. code-block:: http

    HTTP/1.1 200 OK
    content-length: 7
    content-type: application/json
    date: Sat, 31 Oct 2015 14:13:00 GMT
    server: Cowboy

    [
        "bob"
    ]

Yet another one:

.. code-block:: sh

    $ http post localhost:8080/store/mybucket/patrick name=patrick color=pink

.. code-block:: http

    HTTP/1.1 204 No Content
    content-length: 0
    content-type: application/json
    date: Sat, 31 Oct 2015 14:13:18 GMT
    server: Cowboy

List again:

.. code-block:: sh

    $ http localhost:8080/store/mybucket

.. code-block:: http

    HTTP/1.1 200 OK
    content-length: 17
    content-type: application/json
    date: Sat, 31 Oct 2015 14:13:20 GMT
    server: Cowboy

    [
        "bob",
        "patrick"
    ]
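
The deduplication step used by the REST handler (keys stored as map keys with
the value `true`, then read back with `maps:keys/1`) can be sketched as
follows. This is a minimal illustration, not the handler's actual code: the
module name `keys_dedup_sketch` and the function `unique_keys/1` are
hypothetical, and the input is assumed to be the list of
`{Partition, Node, ListOfKeys}` tuples returned inside `{ok, ...}` by
`tanodb:keys/1`.

.. code-block:: erlang

    -module(keys_dedup_sketch).
    -export([unique_keys/1]).

    %% Collapse per-partition key lists into one list without duplicates.
    unique_keys(CoverageResults) ->
        Set = lists:foldl(
                fun({_Partition, _Node, Keys}, Acc) ->
                        %% Each key becomes a map key; repeats overwrite each other.
                        lists:foldl(fun(K, A) -> A#{K => true} end, Acc, Keys)
                end,
                #{},
                CoverageResults),
        maps:keys(Set).

The order of the returned keys is whatever `maps:keys/1` yields, so callers
that need a stable order should sort the result themselves.
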