compute package

Submodules

compute.config module

Reads the Algorithm configuration in use from the ini file.

class compute.config.AlgorithmConfig

Bases: compute.config.Config

Read the section Algorithm in global.ini.

class compute.config.Config(config_file, section)

Bases: object

Read and write ini files.

Args:

config_file (str): The name of the ini file.
section (str): The name of the section to be read or written.

read_ini_file(key)

Read a value from the ini file.

Args:

key (str): The name of the key to be read.

write_ini_file(key, value)

Write a value to the ini file.

Args:

key (str): The name of the key to be written.
value (str): The value to be written.
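As a rough illustration of the read/write pattern described above, the following sketch wraps the standard-library configparser module. The class name IniConfigSketch, the key pop_size, and the temporary file are assumptions made for the example; the actual implementation of compute.config.Config is not shown in this reference and may differ.

```python
import configparser
import os
import tempfile


class IniConfigSketch:
    """Hypothetical stand-in for compute.config.Config (not the real class)."""

    def __init__(self, config_file, section):
        self.config_file = config_file
        self.section = section

    def read_ini_file(self, key):
        # Re-read the file on each call so external changes are picked up.
        parser = configparser.ConfigParser()
        parser.read(self.config_file)
        return parser.get(self.section, key)

    def write_ini_file(self, key, value):
        parser = configparser.ConfigParser()
        parser.read(self.config_file)  # a missing file is silently skipped
        if not parser.has_section(self.section):
            parser.add_section(self.section)
        parser.set(self.section, key, str(value))
        with open(self.config_file, "w") as handle:
            parser.write(handle)


# Demo against a throwaway file; the key name "pop_size" is made up.
demo_path = os.path.join(tempfile.mkdtemp(), "global.ini")
demo_config = IniConfigSketch(demo_path, "Algorithm")
demo_config.write_ini_file("pop_size", 20)
demo_value = demo_config.read_ini_file("pop_size")
```

Note that configparser stores every value as a string, so the caller converts the result (for example with int()) when a number is expected.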

class compute.config.RedisConfig

Bases: compute.config.Config

Read the section Redis in global.ini.

compute.db module

compute.db.add_info(gpu_info, info)

Operate on two tables, gpu_list and gpu_arxiv_list. All GPU information is added to gpu_arxiv_list, while only the available GPUs are added to gpu_list; a delete operation must therefore be performed whenever gpu_list is updated.

Args:

gpu_info (dict): The GPU information.
info (list): The information to be added.
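The two-table pattern can be sketched with sqlite3 as follows. The column layout, the status values, and the function signature (conn, rows) are assumptions for illustration only; the real schema and the way add_info interprets gpu_info and info are not documented here. The sketch also assumes that the delete operation clears gpu_list before each refresh.

```python
import sqlite3

# Assumed schema: the real columns of gpu_list and gpu_arxiv_list are not documented.
SCHEMA = "(worker_ip TEXT, gpu_id INTEGER, status TEXT)"


def add_info_sketch(conn, rows):
    """Archive every row; keep only the available rows in gpu_list."""
    cur = conn.cursor()
    for table in ("gpu_list", "gpu_arxiv_list"):
        cur.execute("CREATE TABLE IF NOT EXISTS %s %s" % (table, SCHEMA))
    # The archive table is append-only.
    cur.executemany("INSERT INTO gpu_arxiv_list VALUES (?, ?, ?)", rows)
    # gpu_list holds only the latest snapshot, so clear it first.
    cur.execute("DELETE FROM gpu_list")
    available = [row for row in rows if row[2] == "available"]
    cur.executemany("INSERT INTO gpu_list VALUES (?, ?, ?)", available)
    conn.commit()


conn = sqlite3.connect(":memory:")
add_info_sketch(conn, [("192.168.50.202", 0, "available"), ("192.168.50.202", 1, "busy")])
add_info_sketch(conn, [("192.168.50.202", 0, "busy"), ("192.168.50.202", 1, "available")])
archived = conn.execute("SELECT COUNT(*) FROM gpu_arxiv_list").fetchone()[0]
current = conn.execute("SELECT gpu_id FROM gpu_list").fetchall()
```

After the two calls, the archive table holds all four rows, while gpu_list holds only the GPU that was available in the most recent snapshot.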

compute.db.check_table(conn, table_name)

Get the table

Args:

conn (class): The SQL connection.
table_name (str): The table name.

compute.db.confirmed_used_gpu(ids)

Delete the information of used gpus.

compute.db.get_available_gpus()

Get the information of available gpus.

compute.db.get_db_name()

Get the name of the SQL database.

compute.db.get_db_path()

Get the path of the SQL database.

compute.db.init_db()

Initialize the SQL database.

compute.file module

compute.file.detect_file_exit(ssh_name, ssh_password, ip, file_name)

Detect whether the file exists.

Args:

ssh_name (str): The name of the user on the node.
ssh_password (str): The password of the user on the node.
ip (str): The IP of the node.
file_name (str): The name of the file to be detected.

Returns:

bool: Whether the file exists.

compute.file.exec_cmd_remote(_cmd, need_response=True)

Run a command remotely.

Args:

_cmd (str): The command.
need_response (bool): Whether the response is required.

Returns:

str: The stdout_str.
str: The stderr_str.

compute.file.exec_python(ssh_name, ssh_password, ip, py_file, args, python_exec='/usr/bin/python3')

Execute a Python file on the node.

Args:

ssh_name (str): The name of the user on the node.
ssh_password (str): The password of the user on the node.
ip (str): The IP of the node.
py_file (str): The name of the Python file.
args (dict): The arguments of the Python file.
python_exec (str): The Python executable on the node.
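One plausible way to turn py_file, args, and python_exec into a single command string is sketched below. The helper name build_python_command and the '-key value' argument convention are assumptions; the reference does not say how exec_python serializes the args dict, and the sample script name and argument names are made up.

```python
import shlex


def build_python_command(py_file, args, python_exec="/usr/bin/python3"):
    """Join the interpreter, script, and '-key value' pairs into one shell command."""
    parts = [python_exec, py_file]
    for key, value in args.items():
        # Assumption: each dict entry becomes a '-key value' pair.
        parts.append("-" + str(key))
        parts.append(str(value))
    # Quote every part so spaces or shell metacharacters stay literal.
    return " ".join(shlex.quote(part) for part in parts)


command = build_python_command("train.py", {"gpu_id": 0, "file_id": "indi 01"})
```

Quoting each part with shlex.quote keeps a value such as "indi 01" as a single argument when the string is passed to a remote shell.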

compute.file.get_algo_local_dir()

Get the corresponding folder under the running algorithm runtime.

Returns:

str: The corresponding folder under the running algorithm runtime.

compute.file.get_algo_name()

Get the name of the current algorithm to run.

Returns:

str: The name of the current algorithm to run.

compute.file.get_all_edl_modules()

Get the names and relative paths of the modules in the edl project.

Returns:

dict: The names and relative paths of the modules in the edl project.

compute.file.get_dependences_by_module_name(module_name)

Get the dependencies of a module by its name.

Args:

module_name (str): The name of the module.

Returns:

dict: The dependencies dict.

compute.file.get_gen_number()

Get the max generation number.

Returns:

int: The max generation number.

compute.file.get_global_ini_path()

Get the absolute path of global.ini.

Returns:

str: The absolute path of global.ini.

compute.file.get_local_path()

Get the absolute path of the project.

Returns:

str: The absolute path of the project.

compute.file.get_pop_siz()

Get the population size.

Returns:

int: The population size.

compute.file.get_population_dir()

Get the populations folder in the corresponding folder under the running algorithm runtime, or create it if it does not exist.

Returns:

str: The populations folder.

compute.file.get_top_dest_dir()

Get the path of the algorithm under the root path of the server.

Returns:

str: The path of the algorithm under the root path of the server.

compute.file.get_train_ini_path()

Get the absolute path of train.ini.

Returns:

str: The absolute path of train.ini.

compute.file.get_training_file_dependences()

Get the dependencies of the training file.

Returns:

dict: The dependencies.

compute.file.init_work_dir(ssh_name, ssh_password, ip)

Initialize the directory on the compute node.

Args:

ssh_name (str): The name of the user on the node.
ssh_password (str): The password of the user on the node.
ip (str): The IP of the node.

compute.file.init_work_dir_on_all_workers()

Initialize the directory on all compute nodes.

compute.file.makedirs(ssh_name, ssh_password, ip, dir_path)

Create the directory on the node.

Args:

ssh_name (str): The name of the user on the node.
ssh_password (str): The password of the user on the node.
ip (str): The IP of the node.
dir_path (str): The directory path to be created.

compute.file.sftp_makedirs(sftp_sess, dir_path)

Create the directory through an SFTP session.

Args:

sftp_sess (paramiko.sftp_client.SFTPClient): The SFTPClient.
dir_path (str): The directory path to be created.
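A recursive directory creation over SFTP could look roughly like the sketch below. It relies only on the stat() and mkdir() methods that paramiko's SFTPClient provides; the function name sftp_makedirs_sketch, the FakeSFTP test double, and the sample path are assumptions for the example, and the real sftp_makedirs may work differently.

```python
import posixpath


def sftp_makedirs_sketch(sftp_sess, dir_path):
    """Create dir_path and any missing parents through an SFTP-like session."""
    current = "/" if dir_path.startswith("/") else ""
    for part in dir_path.split("/"):
        if not part:
            continue
        current = posixpath.join(current, part)
        try:
            sftp_sess.stat(current)   # succeeds if the path already exists
        except IOError:
            sftp_sess.mkdir(current)  # missing, so create it


class FakeSFTP:
    """In-memory stand-in exposing only stat() and mkdir(), for the demo."""

    def __init__(self):
        self.dirs = set()

    def stat(self, path):
        if path not in self.dirs:
            raise IOError("no such directory: " + path)
        return path

    def mkdir(self, path):
        self.dirs.add(path)


fake = FakeSFTP()
sftp_makedirs_sketch(fake, "edl/populations/gen_00")
created = sorted(fake.dirs)
```

With a real connection, sftp_sess would be the object returned by paramiko's SSHClient.open_sftp(); the fake session is used here only so the sketch runs without a remote node.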

compute.file.sftp_transfer(sftp_sess, src_path, dst_path)

Transfer a file from src_path to dst_path.

Args:

sftp_sess (paramiko.sftp_client.SFTPClient): The SFTPClient.
src_path (str): The source path.
dst_path (str): The destination path.

compute.file.transfer_file_relative(ssh_name, ssh_password, ip, source, dest)

Transfer a file using relative paths; both source and dest are relative paths.

Args:

ssh_name (str): The name of the user on the node.
ssh_password (str): The password of the user on the node.
ip (str): The IP of the node.
source (str): The source path.
dest (str): The destination path.

compute.file.transfer_training_files(ssh_name, ssh_password, worker_ip)

Transfer the training files to the node.

Args:

ssh_name (str): The name of the user on the node.
ssh_password (str): The password of the user on the node.
worker_ip (str): The IP of the node.

compute.gpu module

compute.gpu.detect_gpu()

Detect the usage of each GPU.

compute.gpu.get_gpu_info()

Get gpu info.

Returns:

dict: The info of gpu.

compute.gpu.gpus_all_available()

Check whether all GPUs are available.

Returns:

bool: If all GPUs are available.

compute.gpu.locate_gpu_to_be_used()

Locate an available GPU to be used.

Returns:

dict: The info of a random available GPU.

compute.gpu.parse_nvidia_info(gpu_enabled_list, nvidia_info)

Parse the information output by nvidia-smi.

Args:

gpu_enabled_list (list): A list of the GPUs on the node.
nvidia_info (str): The information output by nvidia-smi.

Returns:

dict: The gpu info on node
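To give a sense of what such parsing can involve, the sketch below handles the CSV form of nvidia-smi output. The input format (the query flags shown in the comment), the function name parse_nvidia_csv, the returned dict shape, and the 100 MiB idle threshold are all assumptions; the reference does not state which nvidia-smi output parse_nvidia_info expects or how its result is structured.

```python
def parse_nvidia_csv(gpu_enabled_list, nvidia_info):
    """Return {gpu_index: memory_used_mib} for the enabled GPUs only."""
    usage = {}
    for line in nvidia_info.strip().splitlines():
        index_str, memory_str = [field.strip() for field in line.split(",")]
        index = int(index_str)
        if index in gpu_enabled_list:
            usage[index] = int(memory_str)
    return usage


# Sample text in the shape produced by:
#   nvidia-smi --query-gpu=index,memory.used --format=csv,noheader,nounits
sample = "0, 11\n1, 7934\n2, 0\n"
parsed = parse_nvidia_csv([0, 1], sample)
# Assumed rule of thumb: a GPU using under 100 MiB is treated as idle.
idle = [gpu for gpu, used in parsed.items() if used < 100]
```

GPU 2 is dropped because it is not in the enabled list, which mirrors the role the gpu_enabled_list argument appears to play.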

compute.gpu.run_detect_gpu()

Continuously detect the usage of the GPUs.

compute.log module

class compute.log.Log

Bases: object

Class for logging

classmethod debug(_str)

Logs a message with level DEBUG on this logger.

Args:

_str (str): The message to be logged.

classmethod info(_str)

Logs a message with level INFO on this logger.

Args:

_str (str): The message to be logged.

classmethod warn(_str)

Logs a message with level WARN on this logger.

Args:

_str (str): The message to be logged.

compute.pid_manager module

class compute.pid_manager.PIDManager

Bases: object

class SuperEnd

Bases: object

static kill_all_process()

Kill all running processes on the compute node.

static query_worker_tuple()

class WorkerEnd

Bases: object

static add_worker_pid(super_node_ip, super_node_pid, worker_node_ip)

Register a process. Only called by the worker node.

static remove_worker_pid(super_node_ip, super_node_pid, worker_node_ip)

Unregister a process.

gpu_info = {'192.168.50.202': {'gpu': [0, 1], 'ssh_name': 'xiangning', 'ssh_password': '678^&*', 'worker_ip': '192.168.50.202', 'worker_name': 'cuda2'}}

class compute.pid_manager.PIDRedis

Bases: object

FORMAT_STR = 'EDL=>PID->|S:%s,SPID:%s,W:%s|'

IDENTIFIER_STR = 'EDL=>PID->|S:'

REDIS_DB_IP = '192.168.50.202'

REDIS_DB_PORT = '6379'

add_record(super_node_ip, super_node_pid, worker_node_ip, worker_node_pid)

Add a record to the Redis list.

Args:

super_node_ip (str): The central server’s IP.
super_node_pid (int): The pid of the running process on the machine at super_node_ip.
worker_node_ip (str): The worker server’s IP.
worker_node_pid (int): The pid of the running process on the machine at worker_node_ip.
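The FORMAT_STR and IDENTIFIER_STR constants suggest how a record is named. The sketch below shows the string that results from filling in the three placeholders; using it as the key of a Redis list, the helper name make_key, the plain dict standing in for Redis, and the sample IPs and pids are assumptions for illustration.

```python
FORMAT_STR = 'EDL=>PID->|S:%s,SPID:%s,W:%s|'
IDENTIFIER_STR = 'EDL=>PID->|S:'


def make_key(super_node_ip, super_node_pid, worker_node_ip):
    """Fill the three placeholders of FORMAT_STR."""
    return FORMAT_STR % (super_node_ip, super_node_pid, worker_node_ip)


key = make_key("192.168.50.1", 4242, "192.168.50.202")

# A plain dict stands in for Redis here: key -> list of worker pids.
store = {}
store.setdefault(key, []).append(31337)

# Keys written by this scheme can be recognized by their shared prefix.
edl_keys = [k for k in store if k.startswith(IDENTIFIER_STR)]
```

Because every formatted key begins with IDENTIFIER_STR, a prefix match is enough to separate these records from unrelated keys in the same database.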

g = <compute.config.RedisConfig object>

get_all_keys()

get_all_record(super_node_ip, super_node_pid, worker_node_ip, clear=True)

Get all records.

Args:

super_node_ip (str): The central server’s IP.
super_node_pid (int): The pid of the running process on the machine at super_node_ip.
worker_node_ip (str): The worker server’s IP.
clear (bool): Whether to clear the buffer.

remove_record(super_node_ip, super_node_pid, worker_node_ip, worker_node_pid)

Remove a record from the Redis list.

Args:

super_node_ip (str): The central server’s IP.
super_node_pid (int): The pid of the running process on the machine at super_node_ip.
worker_node_ip (str): The worker server’s IP.
worker_node_pid (int): The pid of the running process on the machine at worker_node_ip.

compute.process module

compute.process.dispatch_to_do(_id, _uuid)

Assign individuals to the available GPUs for evaluation.

Args:

_id (str): The name of the individual.
_uuid (str): The identifier of the individual.

compute.redis module

class compute.redis.RedisLog(name)

Bases: object

Log to the Redis database.

MSG_TYPE = ['LOG', 'WRITE_FILE']

info(info)

static run_dispatch_service()

write_file(fdir, fname, data)

compute.redis.get_logger(logger_name, log_file, level=20)

Get a logger.
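The default level=20 in the signature equals logging.INFO in the Python standard library. A file-backed logger with that signature could be built roughly as follows; the function name get_logger_sketch, the formatter string, and the log file location are assumptions, and the real get_logger may configure its handlers differently.

```python
import logging
import os
import tempfile


def get_logger_sketch(logger_name, log_file, level=logging.INFO):
    """Return a named logger that writes to log_file at the given level."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(level)
    # Attach the file handler only once, even if called repeatedly.
    if not logger.handlers:
        handler = logging.FileHandler(log_file)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


log_path = os.path.join(tempfile.mkdtemp(), "edl.log")
logger = get_logger_sketch("edl_sketch", log_path)
logger.info("population initialized")
logger.debug("not written at INFO level")
for handler in logger.handlers:
    handler.flush()
with open(log_path) as handle:
    log_text = handle.read()
```

With the default level, INFO messages reach the file while DEBUG messages are filtered out.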

compute.utils module

class compute.utils.CacheUtils

Bases: object

Utils for the cache component.

classmethod get_lock_for_write_fitness()

classmethod load_cache_data()

Load the cache.

Returns:

dict: The identifier and the corresponding fitness value.

classmethod save_fitness_to_cache(uuid, _acc)

Save fitness to cache.

Args:

uuid: The identifier of the individual.
_acc: The accuracy/fitness value of the individual.
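A fitness cache guarded by a write lock can be sketched as below. The one-record-per-line "uuid=value" file format, the use of threading.Lock, and the function names are assumptions; the reference does not describe how CacheUtils stores its data or what kind of lock get_lock_for_write_fitness returns.

```python
import os
import tempfile
import threading

# Assumed: a single process-wide lock serializes writers.
_write_lock = threading.Lock()


def save_fitness(cache_path, uuid, acc):
    """Append one 'uuid=fitness' record while holding the write lock."""
    with _write_lock:
        with open(cache_path, "a") as handle:
            handle.write("%s=%.5f\n" % (uuid, acc))


def load_cache(cache_path):
    """Return {uuid: fitness}; a missing file yields an empty cache."""
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as handle:
            for line in handle:
                if "=" in line:
                    uuid, acc = line.strip().split("=", 1)
                    cache[uuid] = float(acc)
    return cache


cache_path = os.path.join(tempfile.mkdtemp(), "cache.txt")
save_fitness(cache_path, "a1b2c3", 0.91234)
save_fitness(cache_path, "d4e5f6", 0.875)
cache = load_cache(cache_path)
```

Looking up an individual's identifier in the loaded dict before evaluation is what lets a cache like this skip retraining an architecture that has already been scored.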

Module contents