Alternative dispatch techniques for the Tcl VM

Benjamin Vitale and Mathew Zaleski
Tcl 2005

A Brief Slide Presentation


Abstract: We compare the performance of several virtual machine dispatch strategies in Tcl, including traditional, highly portable techniques and newer techniques that sacrifice some portability for performance.

Tcl's high-level opcodes have large implementation bodies and contain C function calls. Compared with the opcodes of other VMs, they take many cycles to execute, so dispatch overhead is relatively low: the large bodies consume far more execution time than dispatch does. Direct threaded code improves tclbench benchmarks by about 5% over switch dispatch. We review our catenation technique, which compiles bytecode by copying templates of Sparc code taken from the normal C-compiled VM's implementation of each virtual opcode. This eliminates all dispatch, but it is impractical because it is poorly portable and because the copying amplifies Tcl's heavy instruction cache load.
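
To make the difference between switch dispatch and direct threaded code concrete, here is a minimal sketch on a toy stack machine. The opcodes, the `cell` type, and both function names are hypothetical and are not taken from the Tcl VM. The threaded variant assumes the GCC/Clang "labels as values" extension (`&&label` and `goto *`), which is exactly the kind of portability cost the abstract refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy opcodes; not Tcl's real instruction set. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* Switch dispatch: after each opcode body, control returns to the top
   of the loop and the switch selects the next body. */
static long run_switch(const long *code) {
    long stack[16];
    size_t sp = 0;
    for (const long *pc = code;;) {
        switch (*pc++) {
        case OP_PUSH: assert(sp < 16); stack[sp++] = *pc++; break;
        case OP_ADD:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  assert(sp >= 2); sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: assert(sp >= 1); return stack[sp - 1];
        default:      assert(0 && "bad opcode"); return 0;
        }
    }
}

/* A threaded-code cell holds either a body address or an operand. */
typedef union { void *target; long operand; } cell;

/* Direct threaded dispatch: opcodes are first translated to the
   addresses of their bodies, and each body ends by jumping straight
   to the next one (requires the GCC/Clang labels-as-values extension). */
static long run_threaded(const long *code, size_t n) {
    static void *const labels[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };
    cell threaded[64];
    long stack[16];
    size_t sp = 0;

    assert(n <= 64);
    for (size_t i = 0; i < n; i++) {           /* translation pass */
        long op = code[i];
        assert(op >= OP_PUSH && op <= OP_HALT);
        threaded[i].target = labels[op];
        if (op == OP_PUSH) {                   /* copy the inline operand */
            assert(i + 1 < n);
            i++;
            threaded[i].operand = code[i];
        }
    }

    const cell *pc = threaded;
#define NEXT() goto *(pc++)->target
    NEXT();
do_push: assert(sp < 16); stack[sp++] = (pc++)->operand; NEXT();
do_add:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; NEXT();
do_mul:  assert(sp >= 2); sp--; stack[sp - 1] *= stack[sp]; NEXT();
do_halt: assert(sp >= 1); return stack[sp - 1];
#undef NEXT
}
```

For the program `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT}`, both functions evaluate (2 + 3) * 4. With bodies this small the dispatch cost dominates; with bodies as large as Tcl's, the same change buys much less, consistent with the roughly 5% figure above.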

Based on subroutine threading, context threading generates native call instructions for dispatch. It is simpler than catenation and imposes a much lower I-cache load. It also preserves more interpreter state, is a better vehicle for mixed-mode execution, and accommodates interesting optimizations. Our implementation for Tcl on Sparc improves 97% of the tclbench benchmarks that execute more than 1000 dispatches, by an average of 9.5% over switch dispatch (12.0% and 16.5% for benchmarks with more than 10000 and 100000 dispatches, respectively).
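
The idea behind subroutine threading can be approximated in portable C, with one important caveat: a real implementation emits native call instructions at run time, which C cannot express. In the sketch below, each opcode body is an ordinary function, and `compiled_example` is a hand-written stand-in for the code a generator might emit for one bytecode sequence. The `vm` type, the `op_*` functions, and passing the operand as an argument are all assumptions made for illustration, not details of the authors' implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interpreter state for a toy stack machine. */
typedef struct { long stack[16]; size_t sp; } vm;

/* Each opcode body is a callable routine that ends with a return. */
static void op_push(vm *m, long v) {
    assert(m->sp < 16);
    m->stack[m->sp++] = v;
}

static void op_add(vm *m) {
    assert(m->sp >= 2);
    m->sp--;
    m->stack[m->sp - 1] += m->stack[m->sp];
}

static void op_mul(vm *m) {
    assert(m->sp >= 2);
    m->sp--;
    m->stack[m->sp - 1] *= m->stack[m->sp];
}

/* Stand-in for generated code: the bytecode for (2 + 3) * 4 becomes a
   straight-line sequence of calls, one per virtual instruction, so no
   dispatch loop or jump table remains. */
static long compiled_example(void) {
    vm m = { {0}, 0 };
    op_push(&m, 2);
    op_push(&m, 3);
    op_add(&m);
    op_push(&m, 4);
    op_mul(&m);
    assert(m.sp == 1);
    return m.stack[m.sp - 1];
}
```

Because the interpreter state stays in `vm` and the opcode bodies remain ordinary compiled routines, this structure keeps more of the interpreter intact than copying code templates does. Note that an optimizing C compiler may inline these calls, so the sketch shows the shape of the generated code rather than its exact machine-level behavior.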