Python Hydra
A short introduction to Python Hydra and why it is useful.
Hydra is a configuration framework developed by FAIR (Facebook AI Research). It is especially popular in machine learning projects, although nothing about it is specific to AI.
Introduction
Its main purpose is to make configuration management in projects easy. With Hydra, your configuration lives in `.yaml` files inside a config directory (you choose its name; the example below uses `configurations`).
The main advantage of Hydra is that the configuration files integrate seamlessly with the command line: any value in the configuration can be overridden from the CLI when you run your script. In that sense, Hydra doubles as a CLI framework.
It also makes it easy to run the same script several times with different parameter values to see how each parameter affects the outcome. This is particularly useful for data science projects and any other work that involves experimentation.
To get started, install the required packages (`omegaconf` is also pulled in automatically as a dependency of `hydra-core`):
```shell
pip install hydra-core omegaconf
```
Now create the configuration file `configurations/my-script.yaml`:
```yaml
myclass:
  _target_: my_classes.MySuperBeautifulClass
  name: just a random name
  title: heeey!
logger:
  # logging.getLogger accepts only `name`; the console output and INFO level
  # come from Hydra's default job logging configuration.
  _target_: logging.getLogger
  name: Logger
foo: bar
verbose: true
number: 22
```
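The `_target_` fields are special. They tell Hydra that the node they appear in describes an object to create. `hydra.utils.instantiate` imports the class or function named by `_target_` and calls it, passing the node's remaining keys as keyword arguments. Nested nodes with their own `_target_` are instantiated first. Every key must therefore match a parameter of the target.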
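To make the `_target_` mechanism concrete, here is a toy re-implementation that uses only the standard library. It is a simplified sketch, not Hydra's code. `locate` and `toy_instantiate` are hypothetical names. The real `hydra.utils.instantiate` supports much more, such as `_args_`, `_partial_`, `_convert_` and `_recursive_`.

```python
import importlib
from typing import Any


def locate(path: str) -> Any:
    """Resolve a dotted path such as "logging.Formatter" to the object it names."""
    module_path, _, attr_name = path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


def toy_instantiate(node: Any) -> Any:
    """Recursively build objects from dicts that carry a "_target_" key."""
    if isinstance(node, list):
        return [toy_instantiate(item) for item in node]
    if not isinstance(node, dict):
        return node  # plain values (str, int, bool, ...) pass through unchanged
    # Children are built first, so nested targets exist before their parent is called.
    kwargs = {key: toy_instantiate(value) for key, value in node.items() if key != "_target_"}
    if "_target_" not in node:
        return kwargs  # an ordinary mapping without a target stays a mapping
    target = locate(node["_target_"])
    return target(**kwargs)  # remaining keys become keyword arguments


# Example: a handler whose stream is itself an instantiated object.
config = {
    "_target_": "logging.StreamHandler",
    "stream": {"_target_": "io.StringIO"},
}
handler = toy_instantiate(config)
print(type(handler).__name__, type(handler.stream).__name__)  # StreamHandler StringIO
```

Unlike this toy, Hydra's `_locate` can also resolve attributes nested inside classes, and it reports errors with the config key that failed.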
Next, define the class in `my_classes.py`:
```python
class MySuperBeautifulClass:
    def __init__(self, name, title):
        self.name = name
        self.title = title
        # ...
```
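`logging.getLogger` accepts only a `name`, so keys such as `handlers`, `level` and `propagate` cannot be passed to it through `_target_`. Hydra's `instantiate` also does not understand `ext://sys.stdout`; that prefix belongs to `logging.config.dictConfig`.

If you want to keep handler and formatter settings in YAML, one option is to point `_target_` at small factory functions of your own. The sketch below adds two hypothetical helpers to `my_classes.py`: `make_stream_handler` and `make_logger`. Their names and parameters are illustrative assumptions, not part of Hydra. A config node would then use `_target_: my_classes.make_logger`, and each handler entry would use `_target_: my_classes.make_stream_handler` with `stream: stdout`.

```python
import logging
import sys
from typing import Iterable, Optional


def make_stream_handler(
    stream: str = "stdout",
    level: str = "INFO",
    formatter: Optional[logging.Formatter] = None,
) -> logging.StreamHandler:
    """Hypothetical helper: `stream` names a sys attribute ("stdout" or "stderr")."""
    handler = logging.StreamHandler(getattr(sys, stream))
    handler.setLevel(level)  # setLevel accepts level names such as "INFO"
    if formatter is not None:
        handler.setFormatter(formatter)
    return handler


def make_logger(
    name: str,
    handlers: Iterable[logging.Handler] = (),
    level: str = "INFO",
    propagate: bool = False,
) -> logging.Logger:
    """Hypothetical helper: return the named logger wired with the given handlers."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = propagate  # False keeps records away from the root handlers
    for handler in handlers:
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    formatter = logging.Formatter(
        "[%(asctime)s] [%(levelname)s] %(name)s: %(message)s", "%Y-%m-%d %H:%M:%S"
    )
    logger = make_logger("Logger", [make_stream_handler("stdout", "INFO", formatter)])
    logger.info("hello")
```

With Hydra's default recursive instantiation, the nested formatter and handler nodes are built first and passed in as objects, so the factories only wire them together. Calling `make_logger` twice for the same name attaches the handlers twice.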
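Finally, write `main.py`:
```python
import hydra
from hydra.utils import instantiate
from omegaconf import DictConfig


# Hydra loads configurations/my-script.yaml (relative to this file),
# applies any command-line overrides and passes the result to main().
@hydra.main(version_base=None, config_path="configurations", config_name="my-script")
def main(configuration: DictConfig) -> None:
    print("[configuration]:", configuration)
    # Plain values are read with attribute access.
    print("[foo]:", configuration.foo)
    print("[number]:", configuration.number)
    print("[verbose]:", configuration.verbose)

    # Nodes with a _target_ are turned into real objects.
    logger = instantiate(configuration.logger)
    myclass = instantiate(configuration.myclass)
    print("[myclass.title]:", myclass.title)

    number = configuration.number
    for i in range(number):
        logger.info(f"loop-{i}...")
    logger.info("done.")


if __name__ == "__main__":
    main()
```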
To run the script, execute:
```shell
python main.py
```
This will run the main function and pass the configurations defined in the YAML file.
We can also override one or more of the predefined parameters:
```shell
python main.py number=47 myclass.title=ANewCustomTitle
```
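Conceptually, each `key=value` argument walks down the dotted path and replaces one value. Here is a minimal sketch of that idea on a plain dict. `apply_overrides` and `parse_value` are hypothetical helpers, and `json.loads` is a stand-in for Hydra's own override grammar, which is richer (quoting, lists, sweeps, `+key=value` to add keys, `~key` to delete them). Like Hydra, the sketch rejects keys that do not already exist in the config.

```python
import copy
import json
from typing import Any, Dict, List


def parse_value(text: str) -> Any:
    """Interpret "47" as int, "true" as bool, etc.; fall back to the raw string."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Return a copy of `config` with each "dotted.key=value" override applied."""
    result = copy.deepcopy(config)  # leave the caller's config untouched
    for override in overrides:
        dotted_key, _, raw_value = override.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            node = node[key]  # walk down to the parent mapping (KeyError if missing)
        if leaf not in node:
            raise KeyError(f"unknown key: {dotted_key}")  # plain overrides cannot add keys
        node[leaf] = parse_value(raw_value)
    return result


base = {"number": 22, "verbose": True, "myclass": {"name": "just a random name", "title": "heeey!"}}
updated = apply_overrides(base, ["number=47", "myclass.title=ANewCustomTitle"])
print(updated["number"], updated["myclass"]["title"])  # 47 ANewCustomTitle
```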
We can also run the script multiple times with different combinations of parameters:
```shell
python main.py --multirun number=47,56 myclass.title=ACustomTitle,AnotherCustomTitle
```
The script now runs a total of four times, once for each combination of the swept values (Hydra takes their Cartesian product): (47, ACustomTitle), (47, AnotherCustomTitle), (56, ACustomTitle), (56, AnotherCustomTitle).
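The job list for a sweep like this is just a Cartesian product of the comma-separated values. Here is a minimal sketch using `itertools.product`. `expand_sweep` is a hypothetical helper. Its naive comma split ignores quoting and Hydra's other sweep forms such as `range(...)`, and it only prints the command each job corresponds to; it does not run anything.

```python
import itertools
from typing import Dict, List


def expand_sweep(sweep: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn {"number": "47,56", ...} into one override dict per job."""
    keys = list(sweep)
    value_lists = [sweep[key].split(",") for key in keys]
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


jobs = expand_sweep({"number": "47,56", "myclass.title": "ACustomTitle,AnotherCustomTitle"})
for job_id, overrides in enumerate(jobs):
    args = " ".join(f"{key}={value}" for key, value in overrides.items())
    print(f"job {job_id}: python main.py {args}")
# job 0: python main.py number=47 myclass.title=ACustomTitle
# job 1: python main.py number=47 myclass.title=AnotherCustomTitle
# job 2: python main.py number=56 myclass.title=ACustomTitle
# job 3: python main.py number=56 myclass.title=AnotherCustomTitle
```

With its default launcher, Hydra runs these jobs one after another and gives each its own numbered output directory under `multirun/`.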
This was a basic introduction to Hydra. There is much more to it, such as interpolation and custom resolvers (which can run Python code from inside a configuration), configuration composition and hierarchies, and missing or required values.