TorchCraftAI
A bot for machine learning research on StarCraft: Brood War
|
Helper class for job coordination via a central Redis instance. More...
#include <cpid2kworker.h>
Public Types | |
using | Clock = std::chrono::steady_clock |
Public Member Functions | |
Cpid2kWorker (Cpid2kWorkerInfo info, std::string prefix, std::string host, int port=6379, int64_t hbIntervalMs=10 *1000) | |
~Cpid2kWorker () | |
Cpid2kWorkerInfo const & | info () const |
std::string_view | prefix () const |
bool | consideredDead () const |
Checks whether this worker is considered dead by the scheduler. More... | |
bool | isDone () |
Checks whether the training job is considered to be done. More... | |
std::string | redisKey (std::string_view key) const |
Returns a prefixed key. More... | |
std::shared_ptr< RedisClient > | threadLocalClient () |
Returns a Redis client dedicated to the calling thread. More... | |
Cpid2kHeartBeater & | heartBeater () |
distributed::Context & | dcontext (std::string const &role=kAnyRole, std::chrono::milliseconds timeout=kDefaultTimeout) |
Provides a cpid::distributed context among workers with the given by role. More... | |
void | discardDContext (std::string const &role=kAnyRole) |
Discards the cpid::distributed context that was previously created for workers with the specified role. More... | |
std::vector< Cpid2kWorkerInfo > | peers (std::string_view role=kAnyRole) |
Provides information about peers, filtered by role. More... | |
std::vector< std::string > | serviceEndpoints (std::string const &serviceName) |
bool | waitForOne (std::string_view role, std::chrono::milliseconds timeout=kNoTimeout) |
Block until a worker with the specified role is available, or until a specified time has passed. More... | |
bool | waitForAll (std::string_view role, std::chrono::milliseconds timeout=kNoTimeout) |
Block until a all workers with the specified role are available, or until a specified time has passed. More... | |
void | appendMetrics (std::string_view metricsName, nlohmann::json const &json) |
Static Public Member Functions | |
static std::unique_ptr< Cpid2kWorker > | fromEnvVars (Cpid2kWorkerInfo const &) |
static std::unique_ptr< Cpid2kWorker > | fromEnvVars () |
Static Public Attributes | |
static std::string const | kAnyRole = "*" |
static std::chrono::milliseconds const | kNoTimeout |
static std::chrono::milliseconds const | kDefaultTimeout |
Helper class for job coordination via a central Redis instance.
In a nutshell, the Cpid2kWorker class does the following:
peers()
, isDone()
, etc.) and local status as seen by the scheduler (consideredDead()
).dcontext()
, waitForOne/All()
, etc.)For manual operations on the Redis database, use threadLocalClient()
to obtain a RedisClient instance for the current thread. Note that these will be re-used.
All public functions are thread-safe, i.e. it's alright to call them from several trainer or game threads.
using cpid::Cpid2kWorker::Clock = std::chrono::steady_clock |
cpid::Cpid2kWorker::Cpid2kWorker | ( | Cpid2kWorkerInfo | info, |
std::string | prefix, | ||
std::string | host, | ||
int | port = 6379 , |
||
int64_t | hbIntervalMs = 10 * 1000 |
||
) |
cpid::Cpid2kWorker::~Cpid2kWorker | ( | ) |
void cpid::Cpid2kWorker::appendMetrics | ( | std::string_view | metricsName, |
nlohmann::json const & | json | ||
) |
bool cpid::Cpid2kWorker::consideredDead | ( | ) | const |
Checks whether this worker is considered dead by the scheduler.
distributed::Context & cpid::Cpid2kWorker::dcontext | ( | std::string const & | role = kAnyRole , |
std::chrono::milliseconds | timeout = kDefaultTimeout |
||
) |
Provides a cpid::distributed context among workers with the given by role.
If this function succeeds, rendez-vous has been successful. If rendez-vous fails or there are no peers available for the given role, an exception will be thrown. The worker calling this function is required to match the given role.
Contexts are cached for re-use and invalidated (and hence re-constructed) if job constellation changes.
timeout
describes the timeout for all gloo primitives performed with this context. The rendez-vous will be done with timeout that is twice as high as the one specified. The default timeout value is 1.5 times the heartbeat interval – the intention is to notice job constellation changes upon retries, and failing peers will be detected once their heartbeat expires.
void cpid::Cpid2kWorker::discardDContext | ( | std::string const & | role = kAnyRole | ) |
Discards the cpid::distributed context that was previously created for workers with the specified role.
|
static |
|
static |
|
inline |
Cpid2kWorkerInfo const & cpid::Cpid2kWorker::info | ( | ) | const |
bool cpid::Cpid2kWorker::isDone | ( | ) |
Checks whether the training job is considered to be done.
This function returns true if
consideredDead()
returns truedone
flag is set in the Redis database std::vector< Cpid2kWorkerInfo > cpid::Cpid2kWorker::peers | ( | std::string_view | role = kAnyRole | ) |
Provides information about peers, filtered by role.
std::string_view cpid::Cpid2kWorker::prefix | ( | ) | const |
std::string cpid::Cpid2kWorker::redisKey | ( | std::string_view | key | ) | const |
Returns a prefixed key.
This should be used when working with the Redis database using threadLocalClient()
.
std::vector< std::string > cpid::Cpid2kWorker::serviceEndpoints | ( | std::string const & | serviceName | ) |
std::shared_ptr< RedisClient > cpid::Cpid2kWorker::threadLocalClient | ( | ) |
Returns a Redis client dedicated to the calling thread.
The client can be used to perform custom operations on the Redis database. hiredis clients are not thread-safe, hence we'lll keep around a dedicated client for each thread that requires one.
bool cpid::Cpid2kWorker::waitForAll | ( | std::string_view | role, |
std::chrono::milliseconds | timeout = kNoTimeout |
||
) |
Block until a all workers with the specified role are available, or until a specified time has passed.
Returns true once all workers are available and false if the function times out. A timeout of zero will disable timing out.
bool cpid::Cpid2kWorker::waitForOne | ( | std::string_view | role, |
std::chrono::milliseconds | timeout = kNoTimeout |
||
) |
Block until a worker with the specified role is available, or until a specified time has passed.
Returns true once a worker is available and false if the function times out. A timeout of zero will disable timing out. If there will be no worker with the given role, throw an exception.
|
static |
|
static |
|
static |