This repository has been archived by the owner on Oct 18, 2018. It is now read-only.

Support for zone as a first class argument once resolved to a physical KV cluster #138

Open
2 tasks
robskillington opened this issue Jul 31, 2017 · 8 comments

Comments

@robskillington
Contributor

robskillington commented Jul 31, 2017

During a discussion with @xichen2020, @jeromefroe, and a few others, we identified some needs that must be met to make usage of m3cluster pragmatic in the near future:

  • Be able to connect to, say, zone Foo and access variables for zone Bar and zone Baz. This supports the use case of software running in zones Bar and Baz that uses the zone Foo KV cluster but needs logically separate variables from the zone Foo variables, without changing its environment.

    • i.e. the same key, environment and namespace needs a distinct value per zone; for example, key="agg-routing-config", environment="production", namespace="central-agg" and key="agg-routing-config", environment="production", namespace="meta-agg" each have distinct values in zones Foo, Bar and Baz, but all need to be served from the same KV cluster in zone Foo (a sketch of such a key layout follows below)
  • The key structure for a given namespace should sit at the same hierarchy depth as the "global" namespace, so that a consistent snapshot of one namespace can be compared against another

    • i.e. comparing the setup of a namespace called "meta" against the default global namespace is quite hard right now because you can't create a string prefix that isolates global from meta (the global namespace, which lacks any namespace qualifier at all, is already a prefix of all the "meta" namespace values)

This may mean migrating existing keys, since adding these features is a breaking change.
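A minimal sketch of what a zone-qualified key layout along these lines might look like; the kvKey helper, the segment ordering, and the _kv prefix placement here are illustrative assumptions rather than the current m3cluster API:

```go
package main

import "fmt"

// kvKey illustrates the requested layout: zone, environment and namespace are
// all explicit components of the key, so keys for zone Bar and zone Baz can
// live in the zone Foo KV cluster without colliding, and the "global"
// namespace sits at the same depth as any other namespace.
func kvKey(zone, env, namespace, key string) string {
	return fmt.Sprintf("_kv/%s/%s/%s/%s", zone, env, namespace, key)
}

func main() {
	// Same key name and environment, distinct namespaces and zones, all
	// served from the zone Foo cluster.
	fmt.Println(kvKey("bar", "production", "central-agg", "agg-routing-config"))
	fmt.Println(kvKey("baz", "production", "meta-agg", "agg-routing-config"))

	// The global namespace gets an explicit segment instead of being an
	// unqualified prefix of every other namespace.
	fmt.Println(kvKey("foo", "production", "global", "agg-routing-config"))
}
```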

cc @cw9

@cw9
Contributor

cw9 commented Aug 3, 2017

For item 1, could we do it by using a different env for zones Bar and Baz? Otherwise we basically need another layer between zone and namespace, and then for zone Foo, which is the main use case, we need to put something default there as well.

For item 2, I'm not sure if I understand it correctly. A namespaced key looks like this (/r2/control is the namespace):

/r2/control/production/whiteListFilter

a global key looks like this:

_kv/production/query.m3dbshadow.staging.sample

So _kv is the namespace prefix for the global namespace, and you can replace _kv with a namespace prefix to look up the key in a certain namespace?

@jeromefroe
Contributor

jeromefroe commented Aug 3, 2017

Said another way, Item 1 is meant to address the use case where we have services running in different zones (think datacenters, or AZs) that talk to the same etcd cluster. So, for example, we may have three zones: A, B, and C, but they all use the etcd cluster located in zone A. However, despite them using the same etcd cluster, the services in each zone are independent and require their own unique keys.

In this case, the services in each zone will use the same Zone in their clients, which is A, since that is where the etcd cluster resides. Furthermore, they should all use the production environment as well, since they are all production services. That leaves us with namespace as the only knob to turn. And we could certainly make this work with namespaces. We could, for example, decide to use the zone that the services run in as a top-level namespace (A, B, and C) and then any previous namespaces we were using would be below those zone namespaces. As an example, services running in zone B using the foo namespace would now use a new "B/foo" namespace.

Item 1, therefore, is an attempt to make the implicit assumption of a top-level zone namespace an explicit contract that carries the same weight as env or namespace. The advantage is that as the need to support more and more zones grows, the API will support it naturally instead of it being an ad-hoc addition.
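A minimal sketch of the distinction; the struct names and fields here are hypothetical, not the actual m3cluster identifier types:

```go
package main

import "fmt"

// implicitID reflects the workaround described above: the zone is folded into
// the namespace, so services in zone B using namespace "foo" end up with the
// namespace "B/foo".
type implicitID struct {
	Env       string
	Namespace string // e.g. "B/foo"
}

// explicitID reflects what Item 1 asks for: zone is a first-class field that
// carries the same weight as env and namespace.
type explicitID struct {
	Zone      string // zone the service logically belongs to, e.g. "B"
	Env       string
	Namespace string // e.g. "foo"
}

func main() {
	implicit := implicitID{Env: "production", Namespace: "B/foo"}
	explicit := explicitID{Zone: "B", Env: "production", Namespace: "foo"}
	fmt.Printf("implicit: %+v\nexplicit: %+v\n", implicit, explicit)
}
```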


The gist of item 2 is just that it's a little jarring to see kv metrics that have different prefixes. It would be easier to grok if the paths looked the same. So, taking the examples provided, kv keys that use an explicit namespace would look like

_kv/<namespace>/<environment>/<key_name>

whereas kv keys that do not use an explicit namespace would be assigned the global namespace and look like the following:

_kv/global/<environment>/<key_name>

@xichen2020
Contributor

Re item 2, I'd prefer a consistently formed path-based key structure, e.g., /<zone>/<environment>/<namespace>/<key_name>. This is also how keys are structured in ZooKeeper and resembles file paths on Linux systems. Also, if all the keys in kv start with the _kv prefix, we might as well not have the _kv prefix at all. The <zone> part of the key does not have to be the same as the zone where the kv cluster is located, to accommodate the use case where one kv cluster stores keys for multiple zones.

@jeromefroe
Contributor

jeromefroe commented Aug 4, 2017

I'm also in favor of traditional paths for the keys. However, I think prefixing the keys with kv/ would be advantageous since it would allow us to easily distinguish keys which are used in the key value store aspect of m3cluster from those that are used in the service discovery and other features offered by m3cluster.

@xichen2020
Contributor

Understood, but shouldn't that be part of the hierarchical namespace? e.g., something like /<zone>/<environment>/<namespace_1>/<namespace_2>/.../<namespace_n>/<key_name>, so you could have /zone1/production/kv/service1/foo and /zone1/production/sd/service1/foo?
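A minimal sketch of building keys in that hierarchical form, assuming a hypothetical keyPath helper (not an existing m3cluster function):

```go
package main

import (
	"fmt"
	"path"
)

// keyPath builds a key in the hierarchical layout proposed above: zone and
// environment lead, and "kv" or "sd" are just top-level namespaces under them.
func keyPath(zone, env, key string, namespaces ...string) string {
	segments := append([]string{zone, env}, namespaces...)
	segments = append(segments, key)
	return "/" + path.Join(segments...)
}

func main() {
	fmt.Println(keyPath("zone1", "production", "foo", "kv", "service1")) // /zone1/production/kv/service1/foo
	fmt.Println(keyPath("zone1", "production", "foo", "sd", "service1")) // /zone1/production/sd/service1/foo
}
```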

@jeromefroe
Contributor

Oh, I see, I didn't realize you wanted to keep kv as a namespace. Yeah, I like that idea, and if kv is now a namespace, zone and environment should take precedence.

@cw9
Contributor

cw9 commented Aug 10, 2017

Re item 1, I'd prefer not to use namespace that way. The original plan for namespace is to support a registry per namespace, basically allowing the user to define the type of each key under the namespace.

I would be more inclined to use a different environment for those special cases; right now environment more or less maps to the deployment group in udeploy (like staging or test). If zones B and C in your example are also listed as different deployment groups, I'd prefer to keep the convention the same. If we really want to use "production" as the environment for those cases, I'd rather add a new field to the identifier than change the value of the namespace.

@cw9
Contributor

cw9 commented Jan 9, 2018

To sum up, the current formats of keys are:

in cluster_for_zone1:
<namespace1>/<environment1>/<key>
<namespace1>/<environment2>/<key>
_kv/<environment1>/<key>

in cluster_for_zone2:
<namespace1>/<environment1>/<key>
<namespace1>/<environment2>/<key>
_kv/<environment1>/<key>

Given that we don't have a use case where one zone depends on another zone (in other words, no etcd cluster will serve keys from multiple "zones"), the current format should be OK?

In the worst case, if we want to host a kv cluster for zone "foo" in zone "bar" plus a kv cluster for zone "bar" in zone "bar", we could cohost the 2 clusters on the same set of hosts in zone bar.
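A rough sketch of that cohosting option, assuming plain etcd clients; the host names, ports and the go.etcd.io client import are illustrative and not m3cluster code:

```go
package main

import (
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// The cluster serving zone bar's keys, running on hosts in zone bar.
	barCluster, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"etcd1.bar:2379", "etcd2.bar:2379", "etcd3.bar:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer barCluster.Close()

	// The cluster serving zone foo's keys, cohosted on the same machines but
	// listening on a different client port.
	fooCluster, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"etcd1.bar:3379", "etcd2.bar:3379", "etcd3.bar:3379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer fooCluster.Close()

	// Each zone's kv client talks only to its own cluster, so keys can keep
	// the existing <namespace>/<environment>/<key> format with no zone segment.
	fmt.Println("zone bar keys ->", barCluster.Endpoints())
	fmt.Println("zone foo keys ->", fooCluster.Endpoints())
}
```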
