Shared configuration management


etcd as the shared configuration management system for Open vStorage

Service configuration management

An etcd cluster is deployed on the master nodes (whether through auto-discovery or static configuration is still to be decided); all other nodes become proxies in read/write mode. As such, every node can connect to the cluster locally, and the different Open vStorage services might not require any special configuration to be passed to them.

The Open vStorage Framework populates the majority of the configuration tree required for each service to start, and every service can communicate directly with etcd to retrieve its configuration.

Where possible, the intention is also to deduplicate configuration management to a certain extent. If a service requires a client configuration that can be derived from a server entry already available elsewhere, we reference it by id. E.g.: the volumedriver arakoon_cluster client config is retrieved from /ovs/arakoon/$arakoon_cluster_id

As we might want the option to use this etcd cluster for purposes other than Open vStorage, the root of our tree starts under /ovs, where the config tree is split into different components: /ovs/<component>
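
As a minimal sketch of how a service could fetch its configuration from the local proxy (assuming the python-etcd client library and the default client port 2379; neither is mandated by this design):

import json
import etcd  # python-etcd client library (assumption, any etcd client would do)

# Every node runs an etcd proxy, so services can simply connect to localhost.
client = etcd.Client(host='127.0.0.1', port=2379)

# Plain values are stored as strings...
cluster_id = client.read('/ovs/framework/cluster_id').value

# ...structured values are stored as JSON and parsed by the consumer.
memcache = json.loads(client.read('/ovs/framework/memcache').value)
print(memcache['endpoints'])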

Framework = /ovs/framework

All keys stated below are relative to /ovs/framework. E.g. /cluster_id will be /ovs/framework/cluster_id.

/cluster_id = "$cluster_id"
/external_etcd = "$external_etcd"
/registered = True|False
/memcache = {"endpoints": [$endpoint_1, $endpoint_2]},
/messagequeue = {"endpoints": [$endpoint_3, $endpoint_4],
                 "protocol": "amqp",
                 "user": "ovs",
                 "port": 5672,
                 "password": "$unencrypted_password",
                 "queues": {"storagedriver": "volumerouter"}},
/plugins/installed = {"backends": [$plugin_a],
                      "generic": [$plugin_a, $plugin_b]},
/versions = {"ovs": 4,
             $plugin_a: $plugin_a_version},
/stores = {"persistent": "pyrakoon",
           "volatile": "memcache"},
/paths = {"cfgdir": "/opt/OpenvStorage/config",
          "basedir": "/opt/OpenvStorage",
          "ovsdb": "/opt/OpenvStorage/db"},
/support = {"enablesupport": True|False,
            "enabled": True|False,
            "interval": 60},
/storagedriver = {"mds_safety": 2,
                  "mds_tlogs": 100,
                  "mds_maxload": 75},
/webapps = {"html_endpoint": "/",
            "oauth2": {"mode": "local|remote",
                       "authorize_uri": "$autorize_url_for_remote",
                       "client_id": "$client_id_for_remote",
                       "client_secret": "$client_secret_for_remote",
                       "scope": "$scope_for_remote",
                       "token_uri": "$token_uri_for_remote"}}
/hosts/$host_id/storagedriver = {"rsp": "/var/rsp",
                                 "vmware_mode": "ganesha"},
/hosts/$host_id/ports = {"storagedriver": [[26200, 26299]],
                         "mds": [[26300, 26399]],
                         "arakoon": [26400]},
/hosts/$host_id/setupcompleted = True|False,
/hosts/$host_id/type = "MASTER|EXTRA|UNCONFIGURED"

The /ovs/framework/webapps/oauth2 mode can be either local or remote. When using remote, extra keys describing how to reach the remote authentication platform should be given (see the example below the list).

  • authorize_uri: If a user is not logged in, he will be redirected to this page for authentication
  • client_id: OVS client identification towards the remote oauth2 platform
  • client_secret: OVS client password for authentication to the remote oauth2 platform
  • scope: Requested scope for OVS users
  • token_uri: URI where OVS can request the token
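
An illustrative example of this key in remote mode (all values below are placeholders, not actual endpoints):

/webapps = {"html_endpoint": "/",
            "oauth2": {"mode": "remote",
                       "authorize_uri": "https://sso.example.com/oauth2/authorize",
                       "client_id": "openvstorage",
                       "client_secret": "$client_secret_for_remote",
                       "scope": "user_info",
                       "token_uri": "https://sso.example.com/oauth2/token"}}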

Alba = /ovs/alba and for the proxies /ovs/vpools/$vpool_guid/proxies

/ovs/vpools/$vpool_guid/proxies/$proxy_guid/config/abm = "$raw_alba_manager_init_config_format"
/ovs/vpools/$vpool_guid/proxies/$proxy_guid/config/main = {"log_level": "$log_level",
                                                           "port": $proxy_port,
                                                           "ips": ["127.0.0.1"],
                                                           "manifest_cache_size": 100000,
                                                           "fragment_cache_dir": "$fc_dir",
                                                           "fragment_cache_size": "$fc_size",
                                                           "albamgr_cfg_url": "etcd://127.0.0.1:2379/$proxy_abm_key"}

All keys stated below are relative to /ovs/alba. E.g. /foo/bar will be /ovs/alba/foo/bar.

/backends/$backend_guid/maintenance/config = {"log_level": "info",
                                              "albamgr_cfg_url": "etcd://127.0.0.1:2379/ovs/arakoon/$abm_name/config"}
/asdnodes/$node_id/config/network = {"ips": [$asd_ip_1, $asd_ip_2],
                                     "port": $asd_start_port}
/asdnodes/$node_id/config/main = {"node_id": "$node_id",
                                  "password": "$random_generated_password",
                                  "username": "root",
                                  "ip": "$api_ip",
                                  "port": $api_port,
                                  "version": $api_version}
/asds/$asd_id/config = {"home": "$data_directory",
                        "node_id": "$node_id",
                        "asd_id": "$asd_id",
                        "log_level": "info",
                        "port": $asd_port}

$node_id is generated by the ASD Manager during installation. At certain points, there is a referral to an albamgr configuration path to make sure the abm configuration (which is basically an arakoon cluster configuration) is deduplicated as much as possible.

Certain configuration sections are split up because they contain different information. For example, the asdnodes contain a main key with information about the node itself (api information, ...) and a network key with information for asds that can be configured on that node (e.g. the port range for the asds).
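
As an illustration of this split and of the deduplication by reference, a minimal sketch (assuming the python-etcd client library; $node_id and $backend_guid are placeholders to substitute):

import json
import etcd

client = etcd.Client(host='127.0.0.1', port=2379)
node_id = '$node_id'            # placeholder, generated by the ASD Manager at install time
backend_guid = '$backend_guid'  # placeholder

# 'main' describes the node itself (api information), 'network' describes what can be
# configured for asds on that node (e.g. the port range).
main = json.loads(client.read('/ovs/alba/asdnodes/{0}/config/main'.format(node_id)).value)
network = json.loads(client.read('/ovs/alba/asdnodes/{0}/config/network'.format(node_id)).value)
print(network['ips'])

# The maintenance config does not embed the abm arakoon configuration, it only references
# it by URL, so the arakoon cluster config is stored once under /ovs/arakoon.
maintenance = json.loads(client.read('/ovs/alba/backends/{0}/maintenance/config'.format(backend_guid)).value)
print(maintenance['albamgr_cfg_url'])  # e.g. etcd://127.0.0.1:2379/ovs/arakoon/$abm_name/config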

Arakoon = /ovs/arakoon

The same principle applies, but again, there's an immutable part in the configuration. Cluster evolution is, in a first implementation, still orchestrated by the framework.

/ovs/arakoon/$cluster_name/config = "
[global]
cluster = UORC2S5PmYcAabOI
__tainted_tlog_entries_per_file = 5000
cluster_id = ovsdb
plugins = 

[UORC2S5PmYcAabOI]
tlog_compression = snappy
client_port = 26400
messaging_port = 26401
name = UORC2S5PmYcAabOI
fsync = true
home = /opt/OpenvStorage/db/arakoon/ovsdb/db
ip = 172.20.54.251
log_level = info
tlog_dir = /opt/OpenvStorage/db/arakoon/ovsdb/tlogs
log_dir = /var/log/arakoon/ovsdb"

Volumedriver

VolumeDriver is configured from a JSON file. A Python API call exists to pass a new JSON config to a running VolumeDriver. Keys can be divided into 3 categories:

  • dynamically reconfigurable: these settings are reconfigurable at runtime, e.g. the default DTL configuration, ...
  • reconfigurable across restarts: ClusterCache mountpoints, SCO Cache params, ...
  • not reconfigurable: cluster ID, node ID, ...

The complete list of configuration parameters can be found in the volumedriver release notes.

The Python API call update_configuration signals the volumedriver to reload its configuration and returns a list of successfully applied updates.
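
A hedged sketch of how that call could be used once the config lives in etcd; the module and class names below are assumptions based on the VolumeDriver Python bindings, only the update_configuration call itself is taken from this page:

# Assumption: the bindings expose a LocalStorageRouterClient taking the config location.
from volumedriver.storagerouter import storagerouterclient

config = 'etcd://127.0.0.1:2379/ovs/vpools/261520ba-3688-45ea-8e09-a306f430ef4a/hosts/vpool7vl2AEEYFVislxpy/config'
client = storagerouterclient.LocalStorageRouterClient(config)

# Signal the running VolumeDriver to reload its configuration; the call returns
# the list of updates that were successfully applied.
for update in client.update_configuration(config):
    print(update)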

Volumes can be created with a number of interfaces, the most limiting one at the moment being the filesystem interface: the only available input there is the path. Since the path is not stable (the parent directory could be renamed, ...), volumes are identified by UUIDs (also used for the backend namespace/bucket). To create a volume, default values (specified in the JSON config of the VolumeDriver) are used which can later on be overridden using Python API calls. Python API calls are automatically routed to the node owning the volume being targeted.

volumedriver configuration

All keys stated below are relative to /ovs/vpools/$vpool_guid/hosts/$vrouter_id/config.

$vrouter_id = unique identification string for each storagedriver instance

/content_addressed_cache = { "read_cache_serialization_path": "/var/rsp/$vpool_name", 
                             "clustercache_mount_points": [ { "path": "/mnt/ssd1/${vpool_name}_read_1/read.dat", 
                                                              "size": "267173888KiB" } ] }
/backend_connection_manager = { "alba_connection_preset": "$preset_name", 
                                "backend_type": "ALBA", 
                                "alba_connection_timeout": 15, 
                                "alba_connection_host": "$ip", 
                                "alba_connection_port": $port }
/filesystem = { "fs_raw_disk_suffix": ".raw", 
                "fs_metadata_backend_type": "MDS", 
                "fs_metadata_backend_arakoon_cluster_nodes": [], 
                "fs_dtl_mode": "Asynchronous|Synchronous", 
                "fs_metadata_backend_mds_nodes": [ { "host": "$ip", 
                                                     "port": $port }, 
                                                   { "host": "$ip", 
                                                     "port": $port } ], 
                "fs_virtual_disk_format": "raw", 
                "fs_dtl_config_mode": "Automatic|Manual", 
                "fs_file_event_rules": [ { "fs_file_event_rule_calls": [ "Mknod", 
                                                                         "Unlink", 
                                                                         "Rename", 
                                                                         "Write" ], 
                                           "fs_file_event_rule_path_regex": "(?!vmcasts)(.*.xml)" } ] }
/volume_registry = { "vregistry_arakoon_cluster_id": "voldrv", 
                     "vregistry_arakoon_cluster_nodes": [ { "host": "$ip", 
                                                            "node_id": "$node_id", 
                                                            "port": $port }, 
                                                          { "host": "$ip",
                                                            "node_id": "$node_id", 
                                                            "port": $port }, 
                                                          { "host": "$ip", 
                                                            "node_id": "$node_id", 
                                                            "port": $port } ] }
/volume_router_cluster = { "vrouter_cluster_id": "$vrouter_cluster_guid" }
/failovercache = { "failovercache_path": "/mnt/ssd2/${vpool_name}_write_dtl_1", 
                   "failovercache_transport": "TCP" }
/volume_router = { "vrouter_migrate_timeout_ms": 5000, 
                   "vrouter_sco_multiplier": 16384, 
                   "vrouter_redirect_timeout_ms": "5000", 
                   "vrouter_id": "${vpool_name}qrbINBX73FmMd2Zt", 
                   "vrouter_file_read_threshold": 1024, 
                   "vrouter_file_write_threshold": 1024, 
                   "vrouter_volume_write_threshold": 1024, 
                   "vrouter_max_workers": 16, 
                   "vrouter_min_workers": 4, 
                   "vrouter_routing_retries": 10, 
                   "vrouter_volume_read_threshold": 1024, 
                   "vrouter_backend_sync_timeout_ms": 5000 }
/volume_manager = { "clean_interval": 1, 
                    "tlog_path": "/mnt/ssd2/${vpool_name}_db_tlog_1", 
                    "non_disposable_scos_factor": 20.0, 
                    "metadata_path": "/mnt/ssd2/${vpool_name}_db_md_1", 
                    "foc_throttle_usecs": 4000, 
                    "read_cache_default_mode": "ContentBased|LocationBased", 
                    "number_of_scos_in_tlog": 1, 
                    "read_cache_default_behaviour": "CacheOnWrite|CacheOnRead|NoCache" }
/scocache = { "trigger_gap": "1GB",
              "backoff_gap": "2GB",
              "scocache_mount_points": [ { "path": "/mnt/ssd2/${vpool_name}_write_sco_1", 
                                           "size": "239910912KiB" } ] }
/distributed_lock_store = { "dls_arakoon_cluster_id": "voldrv", 
                            "dls_type": "Arakoon",
                            "dls_arakoon_cluster_nodes": [ { "host": "$ip", 
                                                             "node_id": "$node_id", 
                                                             "port": $port }, 
                                                           { "host": "$ip", 
                                                             "node_id": "$node_id", 
                                                             "port": $port }, 
                                                           { "host": "$ip", 
                                                             "node_id": "$node_id", 
                                                             "port": $port }, ], }
/file_driver = { "fd_extent_cache_capacity": "1024", 
                 "fd_namespace": "fd-${vpool_name}-69513c90-c4f1-431b-a60e-7699745e7c08", 
                 "fd_cache_path": "/mnt/ssd2/${vpool_name}_write_fd_1" }
/event_publisher = { "events_amqp_routing_key": "volumerouter", 
                     "events_amqp_uris": [ { "amqp_uri": "amqp://$user:$password@$ip" }, 
                                           { "amqp_uri": "amqp://$user:$password@$ip" }, 
                                           { "amqp_uri": "amqp://$user:$password@$ip" } ] }
/metadata_server = { "mds_nodes": [ { "host": "$ip", 
                                      "scratch_directory": "/mnt/ssd2/${vpool_name}_db_mds_1", 
                                      "port": $port, 
                                      "db_directory": "/mnt/ssd2/${vpool_name}_db_mds_1" } ] }
/threadpool_component = { "num_threads": 16 }

volume configuration

The configuration for every volume is stored as follows:

/ovs/vpool/$vpool_id/volumes/$volume_id/config = { "volume_router": $vrouter_id,
                                                   "name": $volume_name,
                                                   "filesystem_path": "/path/to/volume",
                                                   "sco_size": 64,
                                                   "read_cache_mode": "ContentBased|LocationBased",
                                                   "readcache_limit": 2048,
                                                   "write_buffer": 1024,
                                                   "dtl_mode": "Asynchronous|Synchronous",
                                                   "dtl_target": $ip,
                                                   "cache_strategy": "CacheOnWrite|CacheOnRead|NoCache",
                                                   "parent_volume_id" : $volume_id,
                                                   "snapshot_id": $snapshot_id }

how to (re)start / update components

volumedriver

On startup, VolumeDriver is passed the information needed to contact an etcd cluster and retrieves its configuration from there.

/usr/bin/volumedriver_fs.sh -f --config etcd://<etcd coords>/<path_to_vpool_config> --lock-file /opt/OpenvStorage/run/storagedriver_<VPOOL_NAME>.lock --logrotation --mountpoint <VPOOL_MOUNTPOINT> --logfile /var/log/ovs/volumedriver/<VPOOL_NAME>.log -o big_writes -o sync_read -o allow_other -o use_ino -o default_permissions -o uid=<OVS_UID> -o gid=<OVS_GID> -o umask=0002

where for example

<etcd coords> = '127.0.0.1:2379'
<path_to_vpool_config> = '/ovs/vpools/261520ba-3688-45ea-8e09-a306f430ef4a/hosts/vpool7vl2AEEYFVislxpy/config'

Updating The Configuration At Runtime

Configuration updates might need orchestration (adding a new Arakoon node, for example, requires adding its NodeId, IP address and TCP port (from the VolumeDriver POV), let alone starting it up in the first place before notifying consumers). Watching individual keys will lead to VolumeDriver having to deal with partial updates. Instead it's preferable to introduce a special key to watch for config updates. Since not all keys can be reconfigured dynamically and invalid values could be rejected, it might be preferable to offer a channel (a key in etcd) for feedback that e.g. returns the list of successfully applied updates. This, however, will not be taken care of in the initial implementation.
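
A sketch of that idea, assuming python-etcd; the /config_update and /config_update_feedback keys are hypothetical and, as stated above, not part of the initial implementation:

import json
import etcd

client = etcd.Client(host='127.0.0.1', port=2379)
base = '/ovs/vpools/261520ba-3688-45ea-8e09-a306f430ef4a/hosts/vpool7vl2AEEYFVislxpy'

# Orchestrator side: apply one or more key changes, then bump a single (hypothetical)
# update key so the VolumeDriver sees one consistent change set instead of partial updates.
vrouter_key = base + '/config/volume_router'
vrouter = json.loads(client.read(vrouter_key).value)
vrouter['vrouter_max_workers'] = 32
client.write(vrouter_key, json.dumps(vrouter))
client.write(base + '/config_update', 'update-0001')

# VolumeDriver side (conceptually, running in a different process): watch only the update
# key, reload the full config and report which updates could actually be applied on a
# (hypothetical) feedback key.
client.read(base + '/config_update', wait=True)
client.write(base + '/config_update_feedback', json.dumps(['volume_router/vrouter_max_workers']))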

how to start alba asd:

$> alba asd-start --config etcd://<etcd coords>/<path_to_asd>
  • The value would be the literal json needed (maybe later more fine-grained)

  • The asd will listen for changes, and will act accordingly if mutable configuration values (log_level, ips, port) changed.

  • It should die if immutable configuration values were changed.

    • @todo WRONG !!!! should not die, the asd should ignore these changes until the next restart
  • @ask How does an asd know that an immutable configuration value changed? It can know this if it's running and a new config was fetched, by comparing the current config with the new one. But what about a restart?

    • NO NEED ! immutable values are fetched when the asd restarts, no need for the ASD to do this

how to start alba proxy:

$> alba proxy-start --config etcd://<etcd coords>/<path_to_proxy>
  • the value behind that key is again the literal json needed.
  • The albamgr_cfg_file changes to a URI
  • we don't have any important persistent state (the fragment cache would be lost if that config is changed, but that's not a problem)

Alba maintenance

Similar changes as above, but again, maintenance drones have no persistent state. The team configuration is maintained inside the alba manager.

  • @ask not clear ???

remarks

Etcd C++ libs

Challenges / Open Questions

  • updates to keys that are reconfigurable across restarts will be out of sync with the real state until VolumeDriver is restarted.
  • VolumeDriver offers a Python API call to retrieve the running configuration.
    • @todo signals should be used here

Volume Lifecycle Management

Next to service configuration management, etcd can be used as a declarative API for volume configuration/lifecycle management. The creation or update of a volume can be requested by creating or updating a volume configuration in the etcd tree.
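
For example, requesting a new volume could come down to writing its configuration key (a sketch assuming python-etcd; names and values are illustrative):

import json
import uuid
import etcd

client = etcd.Client(host='127.0.0.1', port=2379)

# Volumes are identified by a UUID, which is also used for the backend namespace.
volume_id = str(uuid.uuid4())
volume_config = {"volume_router": "vpool7vl2AEEYFVislxpy",
                 "name": "myvm-disk-0",
                 "filesystem_path": "/myvm/myvm-disk-0.raw",
                 "sco_size": 64,
                 "dtl_mode": "Asynchronous",
                 "cache_strategy": "CacheOnRead"}
client.write('/ovs/vpool/261520ba-3688-45ea-8e09-a306f430ef4a/volumes/{0}/config'.format(volume_id),
             json.dumps(volume_config))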

Per volumedriver instance, a thread performs a watch on /ovs/vpool/$vpool_id/volumes/ and upon modification figures out the action to be performed (create/clone/update/delete).
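
A minimal sketch of such a watch thread, assuming python-etcd (a real implementation would track waitIndex so no events are missed):

import json
import etcd

MY_VROUTER_ID = 'vpool7vl2AEEYFVislxpy'  # this instance's $vrouter_id
VOLUMES = '/ovs/vpool/261520ba-3688-45ea-8e09-a306f430ef4a/volumes'

client = etcd.Client(host='127.0.0.1', port=2379)

while True:
    # Block until anything below the volumes tree changes.
    event = client.read(VOLUMES, wait=True, recursive=True)
    if event.action == 'delete':
        print('delete requested for {0}'.format(event.key))
        continue
    config = json.loads(event.value)
    # Volumes must be handled by their owning node only.
    if config.get('volume_router') == MY_VROUTER_ID:
        print('create/clone/update requested for {0}'.format(event.key))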

Challenges / Open Questions

  • consistency: volume configuration information is currently stored in the backend namespace / Arakoon. Keeping that in etcd as well poses the danger of ending up with inconsistencies, e.g. a volume is successfully migrated from node N -> M but updating etcd fails @todo you only want 1 location for your config mgmt, why would updating etcd fail? it's no different for arakoon
  • locating volumes: volumes must be reconfigured on their owning node - how can this be efficiently married with watchers? @todo each voldriver has its own watcher, it's like a signal, then it knows it needs to reload
  • which role will the Python API play, i.e. what is done via the API (check status) and what via etcd? @todo
  • What about volumes being removed from etcd which are in use by the filesystem -> Ignore and let the FUSE run into an error? @todo this is a handling error, this should never be the case; if so, the automator made a mistake