Neural Graph Evolution
Tingwu Wang*, Yuhao Zhou*, Sanja Fidler, Jimmy Ba

Despite the recent successes in robotic locomotion control, the design of robots relies heavily on human engineering. Automatic robot design has been a long-studied subject, but recent progress has been slowed by the large combinatorial search space and the difficulty of evaluating candidate designs. To address these two challenges, we formulate automatic robot design as a graph search problem and perform evolutionary search in graph space. We propose Neural Graph Evolution (NGE), which iteratively selects among current candidates and evolves new ones. Unlike previous approaches, NGE uses graph neural networks to parameterize the control policies, which reduces the evaluation cost of new candidates through skill transfer from previously evaluated designs. In addition, NGE applies Graph Mutation with Uncertainty (GM-UC), which incorporates model uncertainty to reduce the search space by balancing exploration and exploitation. We show that NGE significantly outperforms previous methods, by an order of magnitude. As shown in our experiments, NGE is the first algorithm that can automatically discover kinematically preferred robotic graph structures, such as a fish with two symmetrical flat side-fins and a tail, or a cheetah with athletic front and back legs. Instead of using thousands of CPU cores for weeks, NGE solves the search problem within a day on a single 64-core Amazon EC2 machine.
Paper |
|
Citation: Paper PDF [9.5 MB] | OpenReview | Codes (* denotes equal contribution) |
Neural Graph Evolution
|
We highlight the novelty of the algorithm in the following points: (1) automatic robot design is formulated as a search over graphs, with selection and mutation performed directly in graph space; (2) graph neural networks parameterize the control policies, so skills transfer from previously evaluated designs and new candidates are cheaper to evaluate; (3) Graph Mutation with Uncertainty (GM-UC) incorporates model uncertainty to balance exploration and exploitation, reducing the effective search space.
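The selection-then-mutation outer loop described above can be sketched in a few lines. This is a minimal illustrative version: the function names, the `survival_ratio` hyper-parameter, and the plain top-k selection are our assumptions, not the authors' exact interface (in NGE, `fitness_fn` would involve training a graph-neural-network controller, and selection uses GM-UC rather than raw fitness).

```python
import random

def nge_search(init_population, fitness_fn, mutate_fn, n_generations,
               survival_ratio=0.5):
    """Sketch of an evolutionary outer loop: evaluate, select, mutate.

    Placeholder interface for illustration only; in NGE the candidates
    are robot graphs and fitness comes from training a GNN controller.
    """
    population = list(init_population)
    for generation in range(n_generations):
        # Evaluate every candidate and rank by fitness.
        scored = sorted(((fitness_fn(g), g) for g in population),
                        key=lambda pair: pair[0], reverse=True)
        # Selection: keep the fitter portion of the population.
        n_keep = max(1, int(len(scored) * survival_ratio))
        survivors = [g for _, g in scored[:n_keep]]
        # Mutation: refill the population by perturbing survivors.
        children = [mutate_fn(random.choice(survivors))
                    for _ in range(len(population) - n_keep)]
        population = survivors + children
    return max(population, key=fitness_fn)
```

As a toy usage example, searching over integers with fitness `-abs(x - 10)` and a random-walk mutation converges toward 10, since the best survivor is never discarded.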
|
|
Quantitative Results
|
We created two customized environments to benchmark our algorithm. The first has no gravity, simulating locomotion in water; the other has gravity, simulating locomotion on land. Both environments reward the agent for moving faster in a fixed direction. Structure design in reinforcement learning is an under-explored topic, so we provided a total of 5 baselines, ranging from naive evolutionary structure search to modern variants of related work. From the curves (left: environment without gravity; right: environment with gravity), we can see that our algorithm performs significantly better than the other methods.
|
|
|
Qualitative Results Demo
|
We here present qualitative results of our algorithm running in the first environment. The text at the bottom-left corner has the following meaning: "Generation" is the generation (outer loop) of the evolutionary algorithm; "Reward" is the amortized fitness, in this case the speed of the fish in a fixed direction. A side view of the agents' animation is presented.
|
We highlight some representative species that occurred in the genealogy tree.
In the early generations, the agent started with randomly generated body parts.
It made little progress because its body parts were unbalanced.
It quickly learnt to discard the side fins and paddle with the back fin in order to swim faster.
As training continued, it gradually discovered that symmetric side fins, which originally appeared as a small body part, performed better.
Eventually, the agent learnt a controller matched to the structure it had evolved, which agrees with kinematic phenomena found in nature.
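The body-part changes traced above (gaining a fin, losing a fin) are the kind of graph edits an evolutionary mutation step performs. A toy sketch of such add-node and remove-node mutations on a kinematic tree follows; the dict-of-children representation and the helper names are ours for illustration, not the paper's code.

```python
import random

def add_limb(tree, new_id):
    """Attach a new body part under a randomly chosen existing node.

    `tree` maps each node id to a list of child ids; node 0 is the root.
    Mirrors the spirit of an add-node mutation, not NGE's exact operator.
    """
    parent = random.choice(list(tree.keys()))
    tree[parent].append(new_id)
    tree[new_id] = []
    return tree

def remove_limb(tree, node_id):
    """Delete a leaf body part (never the root); no-op otherwise."""
    if node_id == 0 or tree.get(node_id):
        return tree
    del tree[node_id]
    for children in tree.values():
        if node_id in children:
            children.remove(node_id)
    return tree
```

For example, starting from a root body with two fins `{0: [1, 2], 1: [], 2: []}`, `add_limb` grows a third part somewhere on the tree and `remove_limb` can later prune it away if it is a leaf.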
|
We study how symmetric structures were generated by visualizing the genealogy tree of the species. For visualization purposes, we pruned the genealogy tree to highlight the significant developments of one species. Most notably, the entire species was eventually dominated by kinematically fit individuals. Those without a good structure (lopsided or minimal bodies) ceased to exist due to competition from fitter structures. The genealogy tree of the species trained in the second environment is attached below.
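One simple way to prune a genealogy tree for visualization, as done above, is to keep only the ancestral chain of a chosen individual. A minimal sketch, assuming each species records its parent (the `parents` mapping and function name are hypothetical, not the paper's tooling):

```python
def prune_to_lineage(parents, best):
    """Return the ancestral chain of `best`, ordered root-first.

    `parents` maps each species id to its parent id; the root maps
    to None. Everything off this chain is dropped from the plot.
    """
    lineage = []
    node = best
    while node is not None:
        lineage.append(node)
        node = parents[node]
    return lineage[::-1]
```

Plotting only this chain shows how the dominant design accumulated its changes generation by generation, without the clutter of extinct branches.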
|
|
|
Acknowledgement |
|
The research is supported by NSERC. We thank NVIDIA for GPU donation. We thank Relu Patrascu for infrastructure support. |
|
Credits: University of Toronto, Vector Institute: Yuhao Zhou, Tingwu Wang, Sanja Fidler, Jimmy Ba |