Tuning

MITuna is a distributed tuning infrastructure that provides pre-compiled kernels for MIOpen customers through automated Jenkins pipelines and a scalable SLURM architecture. MITuna also provides a scalable task management infrastructure that is ready to integrate with external libraries. The Example library provides a sample of how to achieve this.

Tuning through MIOpen

As a high-performance kernels library, MIOpen needs a substantial tuning effort to discover the optimal tuning parameters. Kernel tuning entails compiling and running MIOpen kernels with different tuning parameters to determine the best-performing parameters for each kernel. While MIOpen contains much of the logic needed to iterate over possible tuning parameters, that logic only runs on a single machine. Therefore, a mechanism is required to parallelize this inherently parallel procedure across different machines as well as across multiple GPUs. Among other features, such a framework needs to handle errors in both MIOpen and the stack on which MIOpen depends.

Tuna is the MIOpen team's library for parallelizing the tuning procedure across multiple GPUs on multiple machines. In addition to distributing jobs to servers, it keeps track of the various architectures, of whether a tuning effort succeeded or resulted in an error, and of other housekeeping. This makes it a useful automation tool. Tuna is also the custodian of the convolution layer parameters of interest to the MIOpen team, received from customers, as well as of various benchmarks. With the introduction of the ‘find database’ for immediate mode, Tuna is also responsible for generating the Find database as well as the upcoming precompiled kernels package.

Tuna uses [Celery](https://docs.celeryq.dev/en/stable/getting-started/introduction.html) as its scheduler and offloads job scheduling to it. Both the Celery backend and the Celery broker are implemented with Redis. Tuning jobs are enqueued in a Redis queue, then launched and executed by one or more Celery workers, depending on the operation: compile or eval.
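To make the compile/eval split concrete, here is a standard-library sketch of the enqueue-then-drain pattern. It is not Celery's API: an in-memory `queue.Queue` per operation stands in for the Redis broker, a dict stands in for the Redis result backend, and the job dicts are hypothetical. Jobs are routed by operation, and one or more workers drain each queue.

```python
import queue
import threading

# Stand-ins (assumptions, not Celery's API): one in-memory queue per operation
# plays the Redis broker, and a dict plays the Redis result backend.
BROKER = {"compile": queue.Queue(), "eval": queue.Queue()}
RESULTS = {}
RESULTS_LOCK = threading.Lock()


def enqueue(job):
    """Route a job to the queue that matches its operation."""
    BROKER[job["op"]].put(job)


def worker(op):
    """Drain one operation's queue until it is empty; several workers may share it."""
    q = BROKER[op]
    while True:
        try:
            job = q.get_nowait()
        except queue.Empty:
            return
        # Placeholder for the real compile or eval work.
        with RESULTS_LOCK:
            RESULTS[job["id"]] = f"{op} done"


# Hypothetical jobs: odd ids compile, even ids eval.
for i in range(6):
    enqueue({"id": i, "op": "compile" if i % 2 else "eval"})

# Two compile workers and one eval worker.
threads = [threading.Thread(target=worker, args=(op,)) for op in ("compile", "compile", "eval")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(RESULTS.items()))
```

Because each job is removed from its queue exactly once, adding workers to a queue speeds up draining without duplicating work.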

When do we tune

There are two occasions that trigger tuning:

1. Someone opens a GitHub issue that contains the configurations and network to be tuned. In this case, we only need to tune the network and the configurations specified in the issue. If the requester did not list any configurations, please ask for them; the Tuna team does not provide them.
2. Recurrent configurations need retuning when the internals of MIOpen or Tuna change. Tuning all the recurrent configurations takes up to a few days. Each network uses many configurations, so try to use as many machines as possible to speed up the tuning.

MIOpen Tuning Steps

Tuning stores final data in a central MySQL database (DB). Every table referenced below is a table in this database. Table mixins can be found in [tuna_tables.py](tuna/db/tuna_tables.py) and [session_mixin.py](tuna/db/session_mixin.py). For MIOpen, the concrete implementations of these mixins are in [miopen/db](tuna/miopen/db). Intermediate tuning data generated by the Celery workers is stored in a Redis DB. The MITuna enqueue call then drains this DB and populates the MySQL DB with the final results.
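A minimal sketch of the drain step, assuming a plain dict as a stand-in for the Redis DB and an in-memory `sqlite3` database as a stand-in for MySQL. The key names, the `tuning_results` table, its columns, and the values are hypothetical and illustrative, not MITuna's schema.

```python
import json
import sqlite3

# Stand-in for the Redis DB: hypothetical keys holding JSON written by workers.
redis_stub = {
    "result:1": json.dumps({"job_id": 1, "solver": "SolverA", "kernel_time": 0.12}),
    "result:2": json.dumps({"job_id": 2, "solver": "SolverB", "kernel_time": 0.34}),
}

# Stand-in for the central MySQL DB, with a hypothetical results table.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE tuning_results (job_id INTEGER PRIMARY KEY, solver TEXT, kernel_time REAL)"
)


def drain(store, conn):
    """Move every intermediate result into the SQL table and delete it from the store."""
    moved = 0
    for key in list(store):  # copy the keys because entries are popped in the loop
        row = json.loads(store.pop(key))
        conn.execute(
            "INSERT OR REPLACE INTO tuning_results (job_id, solver, kernel_time) VALUES (?, ?, ?)",
            (row["job_id"], row["solver"], row["kernel_time"]),
        )
        moved += 1
    conn.commit()
    return moved


print(drain(redis_stub, db))  # 2
```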

Tuning is divided into multiple steps, and each step builds on the previous ones. Before a tuning session starts, a few prerequisites must be in place: setting up configurations, getting the latest solvers and their associated applicability from MIOpen, and adding the jobs that make up the tuning session. The environment variables defined in the README must also be set in each tuning terminal. Once these prerequisites are established, the tuning session can begin. Each step, including the prerequisites, is detailed below.

Add Network Configurations (1)

Before a configuration gets tagged, a model and framework need to be added. This allows the model to be benchmarked after tuning.

./go_fish.py miopen import_configs --add_model Resnet50 --md_version 1
./go_fish.py miopen import_configs --add_framework Pytorch --fw_version 1
--add_model - model name
--md_version - model version
--add_framework - framework name
--fw_version - framework version

The config table contains network configurations. If provided with a text file of MIOpenDriver commands, the import script can translate those commands and populate the config table. Additionally, the user may provide a name to tag a configuration for easier recall later. A tag is required when adding a tuning job. Tags are stored in the config_tags table. A model and framework name and version are also required, which enables MITuna to track benchmark performance after tuning.

./go_fish.py miopen import_configs --add_model Resnet50 --md_version 1
./go_fish.py miopen import_configs --add_framework Pytorch --fw_version 1
./go_fish.py miopen import_configs -t resnet50 -f ../utils/recurrent_cfgs/resnet50.txt \
  --model Resnet50 --md_version 1 --framework Pytorch --fw_version 1
-t - tag
-f - filepath
--model - model name
--md_version - model version
--framework - framework name
--fw_version - framework version
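The import step can be pictured as parsing each MIOpenDriver line into a row and attaching the tag. This sketch assumes a subset of MIOpenDriver `conv` flags (`-n` batch size, `-c` input channels, `-H`/`-W` input height/width, `-k` output channels, `-y`/`-x` filter height/width, `-p`/`-q` padding, `-u`/`-v` stride, `-F` direction) and that every flag takes a value. The column names are hypothetical, and `sqlite3` stands in for MySQL.

```python
import shlex
import sqlite3

# Assumed subset of MIOpenDriver conv flags -> hypothetical column names.
FLAG_TO_COLUMN = {
    "-n": "batchsize", "-c": "in_channels", "-H": "in_h", "-W": "in_w",
    "-k": "out_channels", "-y": "fil_h", "-x": "fil_w",
    "-p": "pad_h", "-q": "pad_w", "-u": "conv_stride_h", "-v": "conv_stride_w",
    "-F": "direction",
}


def parse_driver_cmd(line):
    """Translate one MIOpenDriver command line into a config dict."""
    tokens = shlex.split(line)
    cfg = {"cmd": tokens[1]}  # tokens[0] is the driver binary, tokens[1] e.g. "conv"
    args = tokens[2:]
    for flag, value in zip(args[0::2], args[1::2]):  # assumes every flag takes a value
        column = FLAG_TO_COLUMN.get(flag)
        if column:  # flags outside the subset (e.g. -m, -t) are ignored here
            cfg[column] = int(value)
    return cfg


def import_configs(conn, lines, tag):
    """Insert each parsed config, tag it, and return the new config ids."""
    cols = ["cmd"] + list(FLAG_TO_COLUMN.values())
    conn.execute(f"CREATE TABLE IF NOT EXISTS config (id INTEGER PRIMARY KEY, {', '.join(cols)})")
    conn.execute("CREATE TABLE IF NOT EXISTS config_tags (config INTEGER, tag TEXT)")
    ids = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cfg = parse_driver_cmd(line)
        keys = list(cfg)
        cur = conn.execute(
            f"INSERT INTO config ({', '.join(keys)}) VALUES ({', '.join('?' * len(keys))})",
            [cfg[k] for k in keys],
        )
        conn.execute("INSERT INTO config_tags (config, tag) VALUES (?, ?)", (cur.lastrowid, tag))
        ids.append(cur.lastrowid)
    conn.commit()
    return ids


conn = sqlite3.connect(":memory:")
sample = [
    "# sample line, as might appear in a configs text file",
    "./bin/MIOpenDriver conv -n 256 -c 1024 -H 14 -W 14 -k 256 -y 1 -x 1 "
    "-p 0 -q 0 -u 1 -v 1 -l 1 -j 1 -m conv -g 1 -F 1 -t 1",
]
ids = import_configs(conn, sample, "resnet50")
print(ids)  # [1]
```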

Add Solvers (2)

The solver table contains MIOpen solvers and solver characteristics. Update it whenever a new MIOpen version modifies the solvers.

./go_fish.py miopen --update_solvers
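One way to keep a solver table in step with a new MIOpen version is an upsert plus a validity flag, sketched below under assumptions: `sqlite3` stands in for MySQL, and the `tunable`/`valid` columns and the solver names are hypothetical. Retired solvers are flagged rather than deleted so that older rows referring to them stay intact; MITuna's actual update logic may differ.

```python
import sqlite3


def sync_solvers(conn, reported):
    """Reconcile the solver table with the solvers the current MIOpen reports.

    reported: hypothetical mapping of solver name -> tunable flag.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS solver ("
        "solver TEXT PRIMARY KEY, tunable INTEGER, valid INTEGER DEFAULT 1)"
    )
    for name, tunable in reported.items():
        # Upsert: add new solvers and refresh the characteristics of known ones.
        conn.execute(
            "INSERT INTO solver (solver, tunable, valid) VALUES (?, ?, 1) "
            "ON CONFLICT(solver) DO UPDATE SET tunable = excluded.tunable, valid = 1",
            (name, int(tunable)),
        )
    # Flag solvers this MIOpen version no longer reports instead of deleting them.
    known = [row[0] for row in conn.execute("SELECT solver FROM solver")]
    for name in known:
        if name not in reported:
            conn.execute("UPDATE solver SET valid = 0 WHERE solver = ?", (name,))
    conn.commit()


conn = sqlite3.connect(":memory:")
sync_solvers(conn, {"SolverA": True, "SolverB": False})  # older MIOpen
sync_solvers(conn, {"SolverA": True, "SolverC": True})   # newer MIOpen drops SolverB
print(conn.execute("SELECT solver, valid FROM solver ORDER BY solver").fetchall())
```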

Add Tuning Session (3)

A session tracks the architecture and SKU, as well as the MIOpen and ROCm versions, for the tuning session.

This command must be run from inside the tuning environment (e.g., the MITuna Docker container). It populates the session table with the version and architecture information.

[Use backend=HIPNOGPU docker]

./go_fish.py miopen --init_session -l reason
--init_session - create a session entry
-l             - reference text description
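The session row is what later `--session_id` arguments point at. A minimal sketch, assuming `sqlite3` in place of MySQL and hypothetical column names. It also assumes that the compute-unit count (`num_cu`) is what distinguishes SKUs of the same architecture.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class SessionInfo:
    arch: str       # e.g. "gfx908"
    num_cu: int     # compute units; assumed here to identify the SKU
    miopen_v: str
    rocm_v: str
    reason: str     # the -l reference text


def init_session(conn, info):
    """Insert a session row and return its id, the value later passed as --session_id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS session (id INTEGER PRIMARY KEY, arch TEXT,"
        " num_cu INTEGER, miopen_v TEXT, rocm_v TEXT, reason TEXT)"
    )
    cur = conn.execute(
        "INSERT INTO session (arch, num_cu, miopen_v, rocm_v, reason) VALUES (?, ?, ?, ?, ?)",
        (info.arch, info.num_cu, info.miopen_v, info.rocm_v, info.reason),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
session_id = init_session(
    conn, SessionInfo("gfx908", 120, "<miopen_version>", "<rocm_version>", "reason")
)
print(session_id)  # 1
```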

Add Applicability (4)

Each network configuration has a set of applicable solvers. This step will update the solver_applicability table with applicable solvers for each configuration for the session.

[Use backend=HIPNOGPU docker]

./go_fish.py miopen --update_applicability --session_id 1
--session_id - tuning session id
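Conceptually, this step fills a (config, solver) grid for the session. In MITuna the applicability answer comes from MIOpen; in this sketch, hypothetical Python predicates and solver names stand in for it so that the shape of the resulting solver_applicability-style rows is visible.

```python
from itertools import product

# Hypothetical solver names and rules; in MITuna the answer comes from MIOpen.
SOLVER_RULES = {
    "Solver1x1": lambda cfg: cfg["fil_h"] == 1 and cfg["fil_w"] == 1,
    "Solver3x3": lambda cfg: cfg["fil_h"] == 3 and cfg["fil_w"] == 3,
    "SolverGeneric": lambda cfg: True,
}


def applicability_rows(session_id, configs):
    """Return (session, config_id, solver, applicable) rows for every config/solver pair."""
    return [
        (session_id, cfg_id, solver, rule(cfg))
        for (cfg_id, cfg), (solver, rule) in product(configs.items(), SOLVER_RULES.items())
    ]


configs = {1: {"fil_h": 1, "fil_w": 1}, 2: {"fil_h": 3, "fil_w": 3}}
rows = applicability_rows(1, configs)
applicable = sorted((cfg_id, solver) for _, cfg_id, solver, ok in rows if ok)
print(applicable)
```

Storing the non-applicable pairs as well (with `applicable` false) keeps the grid complete, which makes it easy to see that a config was checked against every solver.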

Load Jobs (5)

Time to create the jobs for the tuning session. Specify the session id, the configs that should be tuned, and the fin_steps to be executed. Configs are selected by their tag from the config_tags table. Jobs should have a compile and an eval MIFin step pair.

Fin steps include: miopen_perf_compile, miopen_perf_eval, miopen_find_compile, and miopen_find_eval.

./load_job.py --session_id 1 -t resnet50 --fin_steps miopen_perf_compile,miopen_perf_eval -o -l reason
--session_id - tuning session id
-t           - config tag
--fin_steps  - operations to be performed by MIFin (tuning handle into miopen)
-o           - only_applicable, will create a job for each applicable solver
-l           - reference text description
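With `-o`, job creation amounts to joining the tagged configs with the applicable solvers of the session. The sketch below assumes simplified config_tags, solver_applicability, and job tables in `sqlite3` (the column names are hypothetical) and stores the fin_steps as one comma-separated string, mirroring the command-line argument.

```python
import sqlite3


def load_jobs(conn, session_id, tag, fin_steps, reason):
    """Create one 'new' job per applicable solver of every config carrying `tag`."""
    cur = conn.execute(
        "INSERT INTO job (session, config, solver, fin_step, state, reason) "
        "SELECT sa.session, sa.config, sa.solver, ?, 'new', ? "
        "FROM solver_applicability AS sa "
        "JOIN config_tags AS ct ON ct.config = sa.config "
        "WHERE sa.session = ? AND ct.tag = ? AND sa.applicable = 1",
        (",".join(fin_steps), reason, session_id, tag),
    )
    conn.commit()
    return cur.rowcount  # number of jobs inserted


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE config_tags (config INTEGER, tag TEXT);
    CREATE TABLE solver_applicability (session INTEGER, config INTEGER, solver TEXT, applicable INTEGER);
    CREATE TABLE job (id INTEGER PRIMARY KEY, session INTEGER, config INTEGER, solver TEXT,
                      fin_step TEXT, state TEXT, reason TEXT);
    INSERT INTO config_tags VALUES (1, 'resnet50'), (2, 'resnet50'), (3, 'other');
    INSERT INTO solver_applicability VALUES
        (1, 1, 'SolverA', 1), (1, 1, 'SolverB', 0), (1, 2, 'SolverA', 1), (1, 3, 'SolverA', 1);
    """
)
added = load_jobs(conn, 1, "resnet50", ["miopen_perf_compile", "miopen_perf_eval"], "reason")
print(added)  # 2
```

Config 3 is skipped because it carries a different tag, and SolverB is skipped because it is not applicable to config 1.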

Compile Step (6)

Once prerequisites are set, tuning can begin. To compile the jobs, supply the session id along with the compile fin_step matching the one in the job table. This step is launched in 2 different terminals: the job-enqueue terminal and the job-execution terminal.

To enqueue the jobs, run the following on any node:

[Use backend=HIPNOGPU docker]

./go_fish.py miopen --session_id 1 --fin_steps miopen_perf_compile --enqueue_only
--session_id    - tuning session id
--fin_steps     - execute this operation
--enqueue_only  - enqueue the jobs to the redis queue

To launch the jobs through Celery workers, on the compile node run:

./go_fish.py miopen --session_id 1 --fin_steps miopen_perf_compile
--session_id    - tuning session id
--fin_steps     - execute this operation
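A sketch of why the enqueue and execution processes can run separately, assuming hypothetical job states (`new`, `queued`, `compiled`), `sqlite3` in place of MySQL, and an in-memory queue in place of Redis. The enqueue step pushes only matching `new` jobs and marks them, so running it twice does not duplicate work, while the worker simply drains whatever is queued.

```python
import queue
import sqlite3


def enqueue_only(conn, session_id, fin_step, broker):
    """Push each 'new' job of the session whose fin_step list contains `fin_step`."""
    rows = conn.execute(
        "SELECT id, fin_step FROM job WHERE session = ? AND state = 'new' ORDER BY id",
        (session_id,),
    ).fetchall()
    pushed = 0
    for job_id, steps in rows:
        if fin_step not in steps.split(","):
            continue
        broker.put({"job_id": job_id, "fin_step": fin_step})
        # Mark the job so that re-running the enqueue step does not push it again.
        conn.execute("UPDATE job SET state = 'queued' WHERE id = ?", (job_id,))
        pushed += 1
    conn.commit()
    return pushed


def compile_worker(conn, broker):
    """Stand-in for a Celery worker on the compile node: drain the queue, record results."""
    done = 0
    while True:
        try:
            msg = broker.get_nowait()
        except queue.Empty:
            conn.commit()
            return done
        # Placeholder for the actual compile of msg["job_id"].
        conn.execute("UPDATE job SET state = 'compiled' WHERE id = ?", (msg["job_id"],))
        done += 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, session INTEGER, fin_step TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO job (session, fin_step, state) VALUES (?, ?, 'new')",
    [(1, "miopen_perf_compile,miopen_perf_eval"),
     (1, "miopen_perf_compile,miopen_perf_eval"),
     (1, "miopen_find_compile,miopen_find_eval")],
)
broker = queue.Queue()
first = enqueue_only(conn, 1, "miopen_perf_compile", broker)   # 2 jobs match
second = enqueue_only(conn, 1, "miopen_perf_compile", broker)  # 0: already queued
compiled = compile_worker(conn, broker)
print(first, second, compiled)  # 2 0 2
```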

Evaluation Step (7)

Once compilation has started, evaluation can also be launched. This step mirrors the compile step: it also consists of two processes, the job enqueue process and the job execution process that launches Celery workers on the evaluation node.

[Use backend=HIP docker]

To enqueue the jobs, run the following on any node:

./go_fish.py miopen --session_id 1 --fin_steps miopen_perf_eval --enqueue_only
--session_id    - tuning session id
--fin_steps     - execute this operation
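--enqueue_only  - enqueue the jobs to the redis queue

To launch the jobs through Celery workers, on the evaluation node run:

./go_fish.py miopen --session_id 1 --fin_steps miopen_perf_eval
--session_id    - tuning session id
--fin_steps     - execute this operation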
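Evaluation can start while compilation is still running because an eval pass only picks up jobs that have already been compiled. This sketch assumes hypothetical states (`compiling`, `compiled`, `evaluated`, `errored`), `sqlite3` in place of MySQL, and a placeholder `evaluate` callable in place of the GPU evaluation.

```python
import sqlite3


def eval_pass(conn, session_id, evaluate):
    """Evaluate every job of the session that has finished compiling; return how many."""
    ready = conn.execute(
        "SELECT id FROM job WHERE session = ? AND state = 'compiled' ORDER BY id",
        (session_id,),
    ).fetchall()
    for (job_id,) in ready:
        ok = evaluate(job_id)  # placeholder for the evaluation on a GPU
        conn.execute(
            "UPDATE job SET state = ? WHERE id = ?",
            ("evaluated" if ok else "errored", job_id),
        )
    conn.commit()
    return len(ready)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, session INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO job (session, state) VALUES (?, ?)",
    [(1, "compiled"), (1, "compiling"), (1, "new")],
)
first = eval_pass(conn, 1, lambda job_id: True)  # only job 1 is ready

# Compilation of job 2 finishes later; the next eval pass picks it up.
conn.execute("UPDATE job SET state = 'compiled' WHERE id = 2")
second = eval_pass(conn, 1, lambda job_id: True)
print(first, second)  # 1 1
```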

Database Export (8)

To export the results, run the export_db.py script with options that select the session and the database type.

The outputs are database files in the format that MIOpen keeps and manages. For example, for MI100, -p produces a gfx90878.db file, -f produces gfx90878.HIP.fdb.txt, and -k produces gfx90878.kdb.

./export_db.py --session_id 1 -p
--session_id - tuning session id
-p           - export performance db
-f           - export find db
-k           - export kernel db
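The MI100 file names follow a pattern: MI100 is gfx908 with 120 compute units, and 120 is 0x78 in hexadecimal, giving the `gfx90878` prefix. The sketch below assumes this arch-plus-hex-CU rule holds for other GPUs too; that is an inference from the MI100 example, and `miopen_db_names` is a hypothetical helper.

```python
def miopen_db_names(arch, num_cu):
    """Export file names per export_db.py flag, assuming prefix = arch + hex(num_cu)."""
    prefix = f"{arch}{num_cu:x}"  # 120 -> "78", so ("gfx908", 120) -> "gfx90878"
    return {
        "-p": f"{prefix}.db",            # performance db
        "-f": f"{prefix}.HIP.fdb.txt",   # find db
        "-k": f"{prefix}.kdb",           # kernel db
    }


print(miopen_db_names("gfx908", 120))
# {'-p': 'gfx90878.db', '-f': 'gfx90878.HIP.fdb.txt', '-k': 'gfx90878.kdb'}
```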

Note

A Celery worker can also be launched manually, which requires a few extra environment variables. Launch the enqueue step in one terminal, then launch the Celery worker manually in a separate terminal. For example:

export CELERY_BROKER_URL=redis://<hostname>:6379/14
export CELERY_RESULT_BACKEND=redis://<hostname>:6379/15
cd MITuna
celery -A tuna.celery_app.celery_app worker -l info --logfile=<logfile> -n <unique_worker_name> -Q <q_name>

Sample manual launch for the eval step:

export CELERY_BROKER_URL=redis://<hostname>:6379/14
export CELERY_RESULT_BACKEND=redis://<hostname>:6379/15
cd MITuna
# one single-process worker per GPU, each started in the background
for GPUID in <gpu_ids>; do
  celery -A tuna.celery_app.celery_app worker -l info --logfile=<logfile>_gpu_id_${GPUID} -n <unique_worker_name>_gpu_id_${GPUID} -Q <q_name> -c 1 &
done
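When there are many GPUs, the per-GPU eval worker commands can also be generated rather than typed. This is a sketch that uses only the flags from the manual launch; the log file, worker name, and queue name passed in are hypothetical placeholders, and nothing is launched unless the `subprocess.Popen` line is uncommented.

```python
import shlex


def eval_worker_cmds(gpu_ids, logfile, worker_name, q_name):
    """Build one single-process (-c 1) Celery worker command per GPU id."""
    return [
        [
            "celery", "-A", "tuna.celery_app.celery_app", "worker",
            "-l", "info",
            f"--logfile={logfile}_gpu_id_{gpu}",
            "-n", f"{worker_name}_gpu_id_{gpu}",
            "-Q", q_name,
            "-c", "1",
        ]
        for gpu in gpu_ids
    ]


# Hypothetical placeholder values for <logfile>, <unique_worker_name>, and <q_name>.
for cmd in eval_worker_cmds(range(2), "eval.log", "node1_worker", "eval_q"):
    print(shlex.join(cmd))
    # To launch (from the MITuna directory, with the CELERY_* variables exported):
    # import subprocess; subprocess.Popen(cmd)
```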

Launching the worker manually can help when debugging the code that the Celery worker runs, such as the decorated Celery task (@app.task).

MIOpen Golden Database

Tuna’s MySQL database tracks versioned data for MIOpen. These versions are kept in the golden table. Each golden MIOpen version holds the complete tuning history up to that version.

Adding to the Golden Table

Once a tuning session is approved, the results in the generated find db may be used to populate the golden table.

./update_golden.py --session_id <id> --golden_v <new_ver> --base_golden_v <old_ver>
--golden_v      - create this golden version
--base_golden_v - initialize the new golden version with this previous golden data
--session_id    - id of the tuning session to populate the golden version
--overwrite     - may be used to force writing to existing golden_v

If there is no previous golden version, --base_golden_v need not be specified. Otherwise, writing a new golden version requires --base_golden_v.
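The relationship between `--golden_v`, `--base_golden_v`, `--session_id`, and `--overwrite` can be sketched with plain dicts. Assumptions: each golden version maps (config, solver) keys to a result, and when the session and the base version disagree on a key, the session result wins. `make_golden_version` is a hypothetical function, not the script's internals.

```python
def make_golden_version(golden, session_results, golden_v, base_golden_v=None, overwrite=False):
    """Create golden version `golden_v` from an optional base version plus one session's results.

    golden:          dict of version -> {(config, solver): result}, updated in place
    session_results: {(config, solver): result} from the approved tuning session
    """
    if golden_v in golden and not overwrite:
        raise ValueError(f"golden version {golden_v} already exists; use overwrite=True")
    if base_golden_v is None and any(v != golden_v for v in golden):
        # Earlier golden data exists, so the new version must start from it.
        raise ValueError("base_golden_v is required when a previous golden version exists")
    new = dict(golden[base_golden_v]) if base_golden_v is not None else {}
    new.update(session_results)  # assumption: session results win on conflicts
    golden[golden_v] = new
    return new


golden = {}
make_golden_version(golden, {("c1", "SolverA"): "r1", ("c3", "SolverA"): "r3"}, golden_v=1)
make_golden_version(golden, {("c1", "SolverA"): "r1b", ("c2", "SolverB"): "r2"},
                    golden_v=2, base_golden_v=1)
print(golden[2])
```

Copying the base version before updating it keeps version 1 unchanged, so each golden version remains a complete snapshot.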

Tuning through the Example library

An example of how to integrate external applications into MITuna.

Example is a mock library that runs the rocminfo binary. The supported tuning steps are:

./go_fish.py example --add_tables
./go_fish.py example --init_session -l my_label
./example/load_job.py -a gfx908 -n 120 -l my_label --session_id 1
./go_fish.py example --execute --session_id 1

The first step is:

./go_fish.py example --add_tables

This command creates the following new tables in the DB:
* machine
* session_example
* job

The next command is:

./go_fish.py example --init_session -l my_label

This command adds a new session to the session_example table. The session id is used to add new jobs and to track the tuning data after the execution step.

The third step is:

./example/load_job.py -a gfx908 -n 120 -l my_label --session_id 1

This step loads jobs into the job table. These jobs are picked up for execution in the next step. Once the jobs finish, their status is updated to ‘completed’ or ‘errored’.

The last step:

./go_fish.py example --execute --session_id 1

This command picks up the jobs in the ‘new’ state from the job table for session_id 1. As the jobs execute, their status is updated from ‘new’ to ‘running’ and then to ‘completed’ or ‘errored’.
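A minimal sketch of the execution loop, assuming `sqlite3` in place of the MITuna DB, a simplified hypothetical job table, and a `run` callable that defaults to invoking `rocminfo`. The status flow matches the one the Example library uses: new, then running, then completed or errored.

```python
import sqlite3
import subprocess


def run_rocminfo():
    """Run rocminfo once; True if it exits cleanly."""
    try:
        result = subprocess.run(["rocminfo"], capture_output=True, check=False)
    except OSError:  # e.g. rocminfo is not installed on this machine
        return False
    return result.returncode == 0


def execute(conn, session_id, run=run_rocminfo):
    """Run every 'new' job of the session, moving it new -> running -> completed/errored."""
    jobs = conn.execute(
        "SELECT id FROM job WHERE session = ? AND state = 'new' ORDER BY id", (session_id,)
    ).fetchall()
    for (job_id,) in jobs:
        conn.execute("UPDATE job SET state = 'running' WHERE id = ?", (job_id,))
        conn.commit()  # make 'running' visible to other readers of the table
        state = "completed" if run() else "errored"
        conn.execute("UPDATE job SET state = ? WHERE id = ?", (state, job_id))
        conn.commit()
    return len(jobs)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, session INTEGER, state TEXT)")
conn.executemany("INSERT INTO job (session, state) VALUES (?, 'new')", [(1,), (1,), (2,)])
outcomes = iter([True, False])  # fake results so the example runs without a GPU
print(execute(conn, 1, run=lambda: next(outcomes)))  # 2
```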

To integrate a new library, provide source code similar to that in /tuna/example. The full MIOpen tuning source code is in /tuna/miopen.