
Convergent hierarchical curiosity loops in a multi-objective intrinsic reinforcement learning agent

Ms. Gal Aviram
Tel Aviv University ~ Department of Biomedical Engineering
Prof. Goren Gordon
Tel Aviv University ~ Curiosity Lab, Department of Industrial Engineering

During the learning process, the brain integrates input from various sources, including sensory and motor signals, memory, and attention. However, it can only produce a single output at a time, i.e., motor commands, which are crucial for actively controlling the learning process. In real-world scenarios, a reinforcement learning agent may likewise need to balance multiple conflicting objectives. Furthermore, the field of artificial curiosity, in which intrinsic reward is linked to learning progress, has gained much attention. Here, we investigate the use of convergent intrinsic reward functions, which incentivize the agent to pursue learning goals that align with multiple objectives. Previous work in Multi-Objective Reinforcement Learning (MORL) with intrinsic reward has focused on developing algorithms that optimize multiple objectives simultaneously while incorporating intrinsic rewards to encourage exploration and the discovery of new strategies, using multi-objective optimization frameworks such as Pareto optimization and hierarchical RL. To investigate the scenario of a learning agent with multiple sensory-motor learning objectives that can take only one action at a time, we developed a hierarchical composition framework in which all possible learners are co-learned, each represented by a neural network that explores a unique two-to-one correlation. This adaptable system has a generic structure and can adjust to different input types. These networks' learning progress is translated into intrinsic rewards, which are fed to a single actor. Our findings indicate that certain networks achieve convergence while others do not, shedding light on which types of correlations are learnable and which are not. Moreover, the single actor eventually develops a policy that produces more successful learning networks than a random policy.
This study has the potential to provide insights into how convergent intrinsic reward functions correlate with mechanisms underlying human cognition and behavior.
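The curiosity-loop mechanism described above can be illustrated with a minimal sketch: each learner fits one two-to-one correlation (predicting one signal from two others), its learning progress (the drop in prediction error) serves as intrinsic reward, and a single epsilon-greedy actor chooses which correlation to exercise at each step. The linear learners, the bandit-style actor, and all names here are illustrative assumptions, not the authors' implementation.

```python
import random

class Learner:
    """Fits a two-to-one correlation z ~ w1*x + w2*y by stochastic gradient descent."""
    def __init__(self, lr=0.05):
        self.w = [0.0, 0.0]
        self.lr = lr
        self.prev_err = None

    def update(self, x, y, z):
        pred = self.w[0] * x + self.w[1] * y
        err = (pred - z) ** 2
        grad = 2 * (pred - z)
        self.w[0] -= self.lr * grad * x
        self.w[1] -= self.lr * grad * y
        # Intrinsic reward = learning progress (reduction in prediction error).
        progress = 0.0 if self.prev_err is None else self.prev_err - err
        self.prev_err = err
        return progress

def env_sample(kind):
    """Hypothetical environment: one learnable and one unlearnable correlation."""
    x, y = random.random(), random.random()
    if kind == 0:
        return x, y, x + y          # learnable: z = x + y
    return x, y, random.random()    # unlearnable: z is pure noise

random.seed(0)
learners = [Learner(), Learner()]
q = [0.0, 0.0]  # the single actor's value estimate per learner
for t in range(2000):
    # Epsilon-greedy single actor: pick which correlation to exercise.
    a = random.randrange(2) if random.random() < 0.1 else q.index(max(q))
    reward = learners[a].update(*env_sample(a))
    q[a] += 0.1 * (reward - q[a])

print("learned weights for correlation 0:", [round(w, 2) for w in learners[0].w])
```

In this toy version, the learner facing the learnable correlation converges (its weights approach [1, 1]) and generates transient learning-progress reward, while the noise-facing learner never converges, mirroring the distinction between learnable and unlearnable correlations reported in the abstract.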



multi-objective optimization
artificial curiosity
reinforcement learning
intrinsic reward
hierarchical networks


Cite this as:

Aviram, G., & Gordon, G. (2023, July). Convergent hierarchical curiosity loops in a multi-objective intrinsic reinforcement learning agent. Abstract published at MathPsych/ICCM/EMPG 2023.