TorchCraftAI
A bot for machine learning research on StarCraft: Brood War
Public Types | Public Member Functions | Static Public Member Functions | Static Public Attributes | List of all members
cpid::Cpid2kWorker Class Reference

Helper class for job coordination via a central Redis instance. More...

#include <cpid2kworker.h>

Public Types

using Clock = std::chrono::steady_clock
 

Public Member Functions

 Cpid2kWorker (Cpid2kWorkerInfo info, std::string prefix, std::string host, int port=6379, int64_t hbIntervalMs=10 *1000)
 
 ~Cpid2kWorker ()
 
Cpid2kWorkerInfo const & info () const
 
std::string_view prefix () const
 
bool consideredDead () const
 Checks whether this worker is considered dead by the scheduler. More...
 
bool isDone ()
 Checks whether the training job is considered to be done. More...
 
std::string redisKey (std::string_view key) const
 Returns a prefixed key. More...
 
std::shared_ptr< RedisClientthreadLocalClient ()
 Returns a Redis client dedicated to the calling thread. More...
 
Cpid2kHeartBeaterheartBeater ()
 
distributed::Contextdcontext (std::string const &role=kAnyRole, std::chrono::milliseconds timeout=kDefaultTimeout)
 Provides a cpid::distributed context among workers with the given by role. More...
 
void discardDContext (std::string const &role=kAnyRole)
 Discards the cpid::distributed context that was previously created for workers with the specified role. More...
 
std::vector< Cpid2kWorkerInfopeers (std::string_view role=kAnyRole)
 Provides information about peers, filtered by role. More...
 
std::vector< std::string > serviceEndpoints (std::string const &serviceName)
 
bool waitForOne (std::string_view role, std::chrono::milliseconds timeout=kNoTimeout)
 Block until a worker with the specified role is available, or until a specified time has passed. More...
 
bool waitForAll (std::string_view role, std::chrono::milliseconds timeout=kNoTimeout)
 Block until a all workers with the specified role are available, or until a specified time has passed. More...
 
void appendMetrics (std::string_view metricsName, nlohmann::json const &json)
 

Static Public Member Functions

static std::unique_ptr< Cpid2kWorkerfromEnvVars (Cpid2kWorkerInfo const &)
 
static std::unique_ptr< Cpid2kWorkerfromEnvVars ()
 

Static Public Attributes

static std::string const kAnyRole = "*"
 
static std::chrono::milliseconds const kNoTimeout
 
static std::chrono::milliseconds const kDefaultTimeout
 

Detailed Description

Helper class for job coordination via a central Redis instance.

In a nutshell, the Cpid2kWorker class does the following:

For manual operations on the Redis database, use threadLocalClient() to obtain a RedisClient instance for the current thread. Note that these will be re-used.

All public functions are thread-safe, i.e. it's alright to call them from several trainer or game threads.

Member Typedef Documentation

using cpid::Cpid2kWorker::Clock = std::chrono::steady_clock

Constructor & Destructor Documentation

cpid::Cpid2kWorker::Cpid2kWorker ( Cpid2kWorkerInfo  info,
std::string  prefix,
std::string  host,
int  port = 6379,
int64_t  hbIntervalMs = 10 * 1000 
)
cpid::Cpid2kWorker::~Cpid2kWorker ( )

Member Function Documentation

void cpid::Cpid2kWorker::appendMetrics ( std::string_view  metricsName,
nlohmann::json const &  json 
)
bool cpid::Cpid2kWorker::consideredDead ( ) const

Checks whether this worker is considered dead by the scheduler.

distributed::Context & cpid::Cpid2kWorker::dcontext ( std::string const &  role = kAnyRole,
std::chrono::milliseconds  timeout = kDefaultTimeout 
)

Provides a cpid::distributed context among workers with the given by role.

If this function succeeds, rendez-vous has been successful. If rendez-vous fails or there are no peers available for the given role, an exception will be thrown. The worker calling this function is required to match the given role.

Contexts are cached for re-use and invalidated (and hence re-constructed) if job constellation changes.

timeout describes the timeout for all gloo primitives performed with this context. The rendez-vous will be done with timeout that is twice as high as the one specified. The default timeout value is 1.5 times the heartbeat interval – the intention is to notice job constellation changes upon retries, and failing peers will be detected once their heartbeat expires.

void cpid::Cpid2kWorker::discardDContext ( std::string const &  role = kAnyRole)

Discards the cpid::distributed context that was previously created for workers with the specified role.

std::unique_ptr< Cpid2kWorker > cpid::Cpid2kWorker::fromEnvVars ( Cpid2kWorkerInfo const &  info)
static
std::unique_ptr< Cpid2kWorker > cpid::Cpid2kWorker::fromEnvVars ( )
static
Cpid2kHeartBeater& cpid::Cpid2kWorker::heartBeater ( )
inline
Cpid2kWorkerInfo const & cpid::Cpid2kWorker::info ( ) const
bool cpid::Cpid2kWorker::isDone ( )

Checks whether the training job is considered to be done.

This function returns true if

  • consideredDead() returns true
  • the global done flag is set in the Redis database
std::vector< Cpid2kWorkerInfo > cpid::Cpid2kWorker::peers ( std::string_view  role = kAnyRole)

Provides information about peers, filtered by role.

std::string_view cpid::Cpid2kWorker::prefix ( ) const
std::string cpid::Cpid2kWorker::redisKey ( std::string_view  key) const

Returns a prefixed key.

This should be used when working with the Redis database using threadLocalClient().

std::vector< std::string > cpid::Cpid2kWorker::serviceEndpoints ( std::string const &  serviceName)
std::shared_ptr< RedisClient > cpid::Cpid2kWorker::threadLocalClient ( )

Returns a Redis client dedicated to the calling thread.

The client can be used to perform custom operations on the Redis database. hiredis clients are not thread-safe, hence we'lll keep around a dedicated client for each thread that requires one.

bool cpid::Cpid2kWorker::waitForAll ( std::string_view  role,
std::chrono::milliseconds  timeout = kNoTimeout 
)

Block until a all workers with the specified role are available, or until a specified time has passed.

Returns true once all workers are available and false if the function times out. A timeout of zero will disable timing out.

bool cpid::Cpid2kWorker::waitForOne ( std::string_view  role,
std::chrono::milliseconds  timeout = kNoTimeout 
)

Block until a worker with the specified role is available, or until a specified time has passed.

Returns true once a worker is available and false if the function times out. A timeout of zero will disable timing out. If there will be no worker with the given role, throw an exception.

Member Data Documentation

std::string const cpid::Cpid2kWorker::kAnyRole = "*"
static
std::chrono::milliseconds const cpid::Cpid2kWorker::kDefaultTimeout
static
Initial value:
=
std::chrono::milliseconds(-1)
std::chrono::milliseconds const cpid::Cpid2kWorker::kNoTimeout
static
Initial value:
=
std::chrono::milliseconds::zero()

The documentation for this class was generated from the following files: