» RPC Protocol
The Serf agent provides a complete RPC mechanism that can be used to control the agent programmatically. This RPC mechanism is the same one used by the CLI, but can be used by other applications to easily leverage the power of Serf without directly embedding. Additionally, it can be used as a fast IPC mechanism to allow applications to receive events immediately instead of using the fork/exec model of event handlers.
A reference implementation in Go can be found here.
» Implementation Details
The RPC protocol is implemented using MsgPack over TCP. This choice is driven by the fact that all operating systems support TCP, and MsgPack provides a fast serialization format that is broadly available across languages.
All RPC requests have a request header, and some requests have a request body. The request header looks like:
{"Command": "handshake", "Seq": 0}
All responses have a response header, and some may contain a response body. The response header looks like:
{"Seq": 0, "Error": ""}
The Command
is used to specify what command the server should
run, and the Seq
is used to track the request. Responses are
tagged with the same Seq
as the request. This allows for some
concurrency on the server side, as requests are not purely FIFO.
Thus, the Seq
value should not be re-used between commands.
All responses may be accompanied by an error.
Possible commands include:
- handshake - Used to initialize the connection, set the version
- auth - Used to authenticate a client
- event - Fires a new user event
- force-leave - Removes a failed node from the cluster
- join - Requests Serf join another node
- members - Returns the list of members
- members-filtered - Returns a subset of members
- tags - Modifies tags on a running Serf agent
- stream - Starts streaming events over the connection
- monitor - Starts streaming logs over the connection
- stop - Stops streaming logs or events
- leave - Serf agent performs a graceful leave and shutdown
- query - Initiates a new query
- respond - Responds to an incoming query
- install-key - Installs a new encryption key
- use-key - Changes the primary key used for encrypting messages
- remove-key - Removes an existing encryption key
- list-keys - Provides a list of encryption keys in use in the cluster
- stats - Provides a debugging information about the running serf agent
- get-coordinate - Returns the network coordinate for a node
Below each command is documented along with any request or response body that is applicable.
» handshake
The handshake MUST be the first command that is sent, as it informs the server which version the client is using.
The request header must be followed with a handshake body, like:
{"Version": 1}
The body specifies the IPC version being used, however only version 1 is currently supported. This is to ensure backwards compatibility in the future.
There is no special response body, but the client should wait for the response and check for an error.
» auth
If the agent is configured to use an auth key, then the client must
issue an auth
command after the handshake is complete.
The auth request body looks like:
{"AuthKey": "my-secret-auth-token"}
The AuthKey
must be provided and is the authorization key.
There is no special response body.
» event
The event command is used to fire a new user event. It takes the following request body:
{"Name": "foo", "Payload": "test payload", "Coalesce": true}
The Name
is a string, but Payload
is just opaque bytes. Coalesce
is used to control if Serf should enable event coalescing.
There is no special response body.
» force-leave
This command is used to remove failed nodes from a cluster. It takes the following body:
{"Node": "failed-node-name"}
There is no special response body.
» join
This command is used to join an existing cluster using a known node. It takes the following body:
{"Existing": ["192.168.0.1:6000", "192.168.0.2:6000"], "Replay": false}
The Existing
nodes are each contacted, and Replay
controls if we will replay
old user events or if they will simply be ignored. The response body in addition
to the header is returned. The body looks like:
{"Num": 2}
The body returns the number of nodes successfully joined.
» members
The members command is used to return all the known members and associated information. There is no request body, but the response looks like:
{"Members": [
{
"Name": "TestNode"
"Addr": [127, 0, 0, 1],
"Port": 5000,
"Tags": {
"role": "test"
},
"Status": "alive",
"ProtocolMin": 0,
"ProtocolMax": 3,
"ProtocolCur": 2,
"DelegateMin": 0,
"DelegateMax": 1,
"DelegateCur": 1,
},
...]
}
» members-filtered
The members-filtered command is used to return a subset of the known members based on their metadata. It takes the following body:
{"Tags": {"key": "val"}, "Status": "alive", "Name": "node1"}
Tags
are used to filter nodes based on tag values. Status
is used to filter
nodes based on operational status. Name
is used to filter based on node names.
Both Name
and Status
, as well as all Tags
values, can contain regular
expression patterns.
Note that regular expression patterns will automatically be placed between start
(^
) and end ($
) anchors.
The response will be in the same format as the members
command.
» tags
The tags command is used to alter the tags on a Serf agent while it is running.
A member-update
event will be triggered immediately to notify the other agents
in the cluster of the change. The tags command can add new tags, modify existing
tags, or delete tags. The request body looks like:
{"Tags": {"tag1": "val1"}, "DeleteTags": ["tag2"]}
There is no special response body.
» stream
The stream command is used to subscribe to a stream of all events matching a given type filter. Events will continue to be sent until the stream is stopped. The request body looks like:
{"Type": "member-join,user:deploy"}`
The format of type is the same as the event handler,
except no script is specified. The one exception is that "*"
can be specified to
subscribe to all events.
The server will respond with a standard response header indicating if the stream
was successful. However, now as events occur they will be sent and tagged with
the same Seq
as the stream command that matches.
Assume we issued the previous stream command with Seq 50
,
we may start getting messages like:
{"Seq": 50, "Error": ""}
{
"Event": "user",
"LTime": 123,
"Name": "deploy",
"Payload": "9c45b87",
"Coalesce": true,
}
{"Seq": 50, "Error": ""}
{
"Event": "member-join",
"Members": [
{
"Name": "TestNode"
"Addr": [127, 0, 0, 1],
"Port": 5000,
"Tags": {
"role": "test"
},
"Status": "alive",
"ProtocolMin": 0,
"ProtocolMax": 3,
"ProtocolCur": 2,
"DelegateMin": 0,
"DelegateMax": 1,
"DelegateCur": 1,
},
...
]
}
{"Seq": 50, "Error": ""}
{
"Event": "query",
"ID": 1023,
"LTime": 125,
"Name": "load",
"Payload": "15m",
}
It is important to realize that these messages are sent asynchronously,
and not in response to any command. That means if a client is streaming
commands, there may be events streamed while a client is waiting for a
response to a command. This is why the Seq
must be used to pair requests
with their corresponding responses.
There is no limit to the number of concurrent streams a client can request,
however a message is not deduplicated, so if multiple streams match a given
event, it will be sent multiple times with the corresponding Seq
number.
To stop streaming, the stop
command is used.
» monitor
The monitor command is similar to the stream command, but instead of events it subscribes the channel to log messages from the Agent.
The request is like:
{"LogLevel": "DEBUG"}
This subscribes the client to all messages of at least DEBUG level.
The server will respond with a standard response header indicating if the monitor
was successful. However, now as logs occur they will be sent and tagged with
the same Seq
as the monitor command that matches.
Assume we issued the previous monitor command with Seq 50
,
we may start getting messages like:
{"Seq": 50, "Error": ""}
{"Log": "2013/12/03 13:06:53 [INFO] agent: Received event: member-join"}
It is important to realize that these messages are sent asynchronously,
and not in response to any command. That means if a client is streaming
commands, there may be logs streamed while a client is waiting for a
response to a command. This is why the Seq
must be used to pair requests
with their corresponding responses.
The client can only be subscribed to at most a single monitor instance.
To stop streaming, the stop
command is used.
» stop
The stop command is used to stop either a stream or monitor. The request looks like:
{"Stop": 50}
This unsubscribes the client from the monitor and/or stream registered
with Seq
value of 50.
There is no special response body.
» leave
The leave command is used trigger a graceful leave and shutdown. There is no request body, or special response body.
» query
The query command is used to issue a new query. It takes the following request body:
{
"FilterNodes": ["foo", "bar"],
"FilterTags": {"role": ".*web.*"},
"RequestAck": true,
"Timeout": 0,
"Name": "load",
"Payload": "15m",
}
The Name
is a string, but Payload
is just opaque bytes. The remaining fields are
optional. FilterNodes
is used to restrict the nodes that should respond to only
those named. FilterTags
is used to filter tags using a regular expression on each
tag. RequestAck
is used to ask that nodes send an "ack" once the message is received,
otherwise only responses are delivered. Timeout
can be provided (in nanoseconds) to
optionally override the default.
The server will respond with a standard response header indicating if the query
was successful. However, the channel is now subscribed to receive any acks or
responses. This is similar to stream
, except scoped only to this query. The same
Seq
is used as the query command that matches.
We will start to get the following:
{"Seq": 50, "Error": ""}
{
"Type": "ack",
"From": "foo",
}
{"Seq": 50, "Error": ""}
{
"Type": "response",
"From": "foo",
"Payload": "1.02",
}
{"Seq": 50, "Error": ""}
{
"Type": "done",
}
Each query record has a Type
to indicate what is being represented. This is
one of ack
, response
or done
. Once done
is received the client should
not expect any further messages corresponding to that query.
» respond
The respond command is with stream
to subscribe to queries and then respond.
It takes the following request body:
{"ID": 1023, "Payload": "my response"}
The ID
is an opaque value that is assigned by the IPC layer. This number is
unique per client connection and cannot be used across connections. Payload
is
just opaque bytes.
There is no special response body.
» install-key
The install-key command is used to install a new encryption key onto the cluster's keyring. The request looks like:
{"Key": "5K9OtfP7efFrNKe5WCQvXvnaXJ5cWP0SvXiwe0kkjM4="}
The Key
should be 32 bytes of base64-encoded data. 24
and 16
bytes are also accepted, but 32 bytes are recommended for improved security. This value can be generated
using the keygen command.
Once invoked, this method will begin broadcasting the new key to all members in the cluster via the gossip protocol. Once the query has completed, a response like the following will be returned:
{
"Messages": {
"node1": "message from node1",
"node2": "message from node2"
},
"NumErr": 0,
"NumNodes": 2,
"NumResp": 2
}
The Messages
field contains a per-node mapping of messages. Messages may be
informational or error messages, which can be determined by examining the other
fields in the response. The NumErr
field indicates the total number of
errors encountered by members during the query. NumNodes
indicates the total
number of members in the cluster, and NumResp
indicates the number of
responses received during the query.
» use-key
The use-key command is used to change the primary key, which is used to encrypt messages. The request looks like:
{"Key": "5K9OtfP7efFrNKe5WCQvXvnaXJ5cWP0SvXiwe0kkjM4="}
The key requested must already exist in the keyring of all agents for this call to succeed. Once invoked, this method will broadcast the desired key to all members. The members will attempt to change their current primary key pointer and respond with the result.
The response returned by this method is in the same format as the install-key
call.
» remove-key
The remove-key command is used to remove a key from the cluster's keyring. The request looks like:
{"Key": "5K9OtfP7efFrNKe5WCQvXvnaXJ5cWP0SvXiwe0kkjM4="}
The key requested must already exist in the keyring of each agent for this command to succeed. Once invoked, this method will broadcast the key requested for deletion to all members in the cluster and ask them to remove it from their internal keyring. Each node will reply with their individual results.
The response returned by this method is in the same format as the install-key
call.
NOTE: If the key requested for deletion is currently the primary key on any node, that node will report failure and refuse to remove the key.
» list-keys
The list-keys command is used to return a list of all encryption keys currently in use on the cluster. There is no request body, but the response looks like:
{
"Messages": {
"node1": "message from node1",
"node2": "message from node2"
},
"Keys": {
"5K9OtfP7efFrNKe5WCQvXvnaXJ5cWP0SvXiwe0kkjM4=": 2,
"T9jncgl9mbLus+baTTa7q7nPSUrXwbDi2dhbtqir37s=": 1
},
"NumErr": 0,
"NumNodes": 2,
"NumResp": 2
}
This response body is almost the same as the other key operations, but notice
that it contains a Keys
field. The Keys
field lists all keys known to the
cluster, and how many members know about it. Typically if key broadcasting is
successful, this number should be equivalent to the NumNodes
field. If not all
members are aware of a key, you should either rebroadcast that key using the
install-key
RPC command, or remove it using the remove-key
RPC command. More
on encryption keys can be found on the
agent encryption page.
» stats
The stats command is used to obtain operator debugging information about the running serf agent. There is no request body, but the response looks like:
{
"agent": {
"name": "node1"
},
"runtime": {
"os": "linux",
"arch": "amd64",
"version": "go1.2",
"max_procs": "1",
"goroutines": "22",
"cpu_count": "4"
},
"serf": {
"failed": "0",
"left": "0",
"event_time": "1",
"query_time": "1",
"event_queue": "0",
"members": "5",
"member_time": "5",
"intent_queue": "0",
"query_queue": "0"
},
"tags": {}
}
» get-coordinate
The get-coordinate command is used to obtain the network coordinate of a given node.
Serf builds up a set of network coordinates for all the nodes in the cluster. Agents cache these, and once the coordinates for two nodes are known, it's possible to estimate the network round trip time between them using a simple calculation.
The request looks like:
{"Node": "n1"}
Once invoked, this method will look up the coordinate in the agent's cache and return it, yielding a response like this:
{
"Coord": {
"Adjustment": 0,
"Error": 1.5,
"Height": 0,
"Vec": [0,0,0,0,0,0,0,0]
},
"Ok": true
}
The returned coordinate is valid only if Ok
is true. Otherwise, there wasn't
a coordinate available for the given node. This might mean that coordinates
are not enabled, or that the node has not yet contacted the agent.
See the Network Coordinates internals guide for more information on how these coordinates are computed, and for details on how to perform calculations with them.