Neural Graph Evolution
Towards Efficient Automatic Robot Design

Tingwu Wang* Yuhao Zhou*, Sanja Fidler, Jimmy Ba
{tingwuwang, henryzhou, fidler, jba} @ cs.toronto.edu
University of Toronto, Canada
Vector Institute

Despite the recent successes in robotic locomotion control, the design of robot relies heavily on human engineering. Automatic robot design has been a long studied subject, but the recent progress has been slowed due to the large combinatorial search space and the difficulty in evaluating the found candidates. To address the two challenges, we formulate automatic robot design as a graph search problem and perform evolution search in graph space. We propose Neural Graph Evolution (NGE), which performs selection on current candidates and evolves new ones iteratively. Different from previous approaches, NGE uses graph neural networks to parameterize the control policies, which reduces evaluation cost on new candidates with the help of skill transfer from previously evaluated designs. In addition, NGE applies Graph Mutation with Uncertainty (GM-UC) by incorporating model uncertainty, which reduces the search space by balancing exploration and exploitation. We show that NGE significantly outperforms previous methods by an order of magnitude. As shown in experiments, NGE is the first algorithm that can automatically discover kinematically preferred robotic graph structures, such as a fish with two symmetrical flat side-fins and a tail, or a cheetah with athletic front and back legs. Instead of using thousands of cores for weeks, NGE efficiently solves searching problem within a day on a single 64 CPU-core Amazon EC2 machine.



Paper

Citation:
Tingwu Wang*, Yuhao Zhou*, Sanja Fidler, and Jimmy Ba
Neural Graph Evolution: Towards Efficient Automatic Robot Design

Paper PDF [9.5 MB] | OpenReview | Codes


(* denotes equal contribution)

Bib format


Neural Graph Evolution

We highlight the novelty of the algorithm in the following points.
Toggle each part to learn more.
An overview of the algorithm is shown combining all the points.
For a more detailed explanation, please refer to the paper.

Evolutionary Algorithm

Graph Mutation on Species

Policy Sharing across Generations

Amortized Fitness

Bayesian Pruning for Species Selection




Quantitative Result

We created two customized environment to benchmark our algorithm. The first is an environment without gravity to simulate the environment of water. The other is an environment with gravity to simulate land. Both of the environments have the reward that encourages the agent to move faster in a fixed direction.

The topic studying the design of structures of Reinforcement Learning is under-explored. We provided a total of 5 baselines to benchmark our algorithm. The baselines range from naive evolutionary structure search to modern variants of relevant works. From the curves (Left - Environment without gravity and Right - Environment with gravity), we can see that our algorithm performs significantly better comparing to other methods.




Qualitative Result Demo

We here present the qualitative result of our algorithm running in the first environment. The texts at the bottom-left corner has the following meaning: "Generation" is the generation (outer-loop) of the evolutionary algorithm; "Reward" is the amortized fitness. In this case, it is the speed of the fish in a fixed direction. The sideview of the agents' animation is presented.



We highlight some representative species occurred in the genealogy tree. In the early generation, the agent started with randomly generated body parts. It was not making much progress due to the unbalanced body parts. It quickly learnt to get rid of side fins and instead used the back-fin to paddle in order to swim faster. As the agent was being trained, it gradually learnt that symmetric side fins, which originally was a small body part, performed better. Eventually, the agent learnt its corresponding controller given by structure it evolved, which matched with the kinematic phenomenon found in nature.

An additional study of finetuning the body parts in the environment of 'cheetah' is also presented. In this experiment, the topological structure of the agents is fixed, whereas the attributes of each body part is allowed to change. Notice that the front leg of the cheetah is evolved into a craw-like structure to strike the ground.




We study how symmetric structures were generated by visualizing the genealogy tree of the species. For visualization purposes, we pruned the genealogy tree to highlight the significant development for one species. Most notably, the entire species were eventually dominated by those kinematically-fit individuals. Those that don’t possess a good structure (lopsided and minimal structure) will cease to exist due to the competitiveness from fitted structure. The genealogy tree of species trained in the second environment is attached below.




Acknowledgement

The research is supported by NSERC. We thank NVIDIA for GPU donation. We thank Relu Patrascu for infrastructure support.

Credits: University of Toronto, Vector Institute: Yuhao Zhou, Tingwu Wang, Sanja Fidler, Jimmy Ba