Brief History of BGBScript

Earlier Efforts

Well, once I was young, and early on spent a lot of time messing around with Quake. Quake had a scripting language, known as QuakeC, which was fairly limited but made it a lot easier to tweak things in-game, at least compared with the use of C for scripting in Quake II.

Eventually, I was working on more of my own projects, and came across a language known as Lisp. I had tried using some of the existing scripting VMs I could find, which IIRC was a standalone variant of Emacs Lisp, and the SCM / Guile implementations of Scheme.

After running into frustrations with these VMs, and noting the general simplicity of the language, decided to do my own VM. It wasn’t very good, having relied on a sort of modified S-Expressions for interpretation, and later wrote a backend which spit out C. This project, however, turned into a mess, and Scheme’s syntax and language design wasn’t particularly usable, so I later ended up abandoning it.

At first, I thought maybe I was done messing with scripting VMs, but started to face the issue that writing code purely in C is limiting, given even if only a small amount of code ever ends up as scripts, the parts that do make a difference. Around this time, I had also become basically familiar with HTML and JavaScript, and had recently done an implementation of XML-RPC. So, I did a nasty hack: I built a script interpreter for a JavaScript knock-off interpreter on top of XML-RPC, which parsed into an XML tree and then directly interpreted the XML expressions (this was around mid 2004). This first VM suffered from horrible performance, and basically spewed garbage at an incredible rate, so in not too long I wrote a replacement.

The next iteration of the language was partially redesigned (in 2006), and basically copy-pasted a lot of parts from my Scheme interpreter, but made use of a bytecode interpreter. This VM was a lot faster, but suffered some of the problems of my original Scheme interpreter, for example the relative pain of trying to deal with a precise garbage collector from C land.

The next VM (2009) was basically a lot of the machinery from the 2006 VM, re-merged with parts of the 2004 VM, namely using a more C friendly memory manager. In 2010 and 2011 I started adding a lot of features I had sort of copied from ActionScript 3.0, as well as some features from Java and C#.

The bytecode used was originally a dynamically-typed stack-oriented bytecode, but opcode types annotations were added on after the fact via prefixes. The VM then used an awkward mix of static and dynamic typing.

Had also been running side efforts trying to do natively compiled and 3AC compiled code, but my early efforts here were very hit or miss (my first real code-generator collapsed into being unmaintainable, …). A lot of optimizations for my more efficient 3AC interpreters, and a few other VM designs, ended up back-ported to the stack machine VM, so there ended up being not a huge pressure to replace it with something different.

 

First BGBTech Engine

Around this time, I also wrote the first BGBTech Engine by pulling together parts from my various previous projects, so BGBScript was the primary scripting VM in BGBTech. The engine was developed out of a combination of a 3D modeller and mapper I had previously written mostly for the Quake 2 engine, mostly as by that time QuArK had decayed into an almost unusable mess, and couldn’t really be easily extended to deal with heavily modified games or engines.

Its performance was overall pretty bad, early on me feeling it was doing good when framerates were in the double-digits, and I later ended up switching over mostly to a Minecraft style terrain system after noting that my mapping skills were pretty bad.

Most of the horrible performance was due to me early on not really understanding OpenGL that well, and doing lots of things in ways which were pretty but not all that fast, then noting that they scaled poorly to larger scenes. A lot of the performance gains were mostly figuring out how to draw things more efficiently, and cutting things back to limit their performance costs.

By around 2014, progress on this engine was pretty much stalled.

 

Formation of BGBScript 2

Likewise, I had gone off to some side projects, mostly working on trying to build robots, so I put a lot of this on hold.

At this point I ran into a problem that my older BGBScript VM was too heavyweight for use in my robot projects, and wasn’t really well suited to low-latency use-cases. Had started on a redesigned version of the language I had called BGBScript-RT (for BGBScript Real-Time). It was redesigned to be much more like C in many areas, switching from garbage collection to the use of manual memory management.

In the meantime, needing something more immediate, I had used a variation of C compiled to a 3AC Bytecode. The bytecode VM was originally designed as part of a new faster experimental backend for the BGBScript VM, but it was easier to get C working on it. While the VM was pretty fast, it had a number of more subtle design issues.

Partly as a result, the newer BGBScript RT/2 VM had made use of a statically typed stack-machine, more like that in the JVM. This initial form of the VM stalled out with the general stall of my robot projects.

 

BGBTech 2 Engine

Around late 2015, and the general stall of my robot-building efforts, I ended up starting on a reboot of my 3D engine. My initial thinking was that I could take what I learned from my prior engine, and probably reboot it into a much faster and better engineered engine.

Since it quickly became apparent I needed scripting, I ended up reviving the BGBScript 2 VM, and worked a lot on beating the compiler/VM into a usable form. This was along with working a lot on stuff relevant to “engine architecture”.

This Script VM has ended up a fair bit bigger and more complicated than my original vision, but a lot of this was for things relevant to a game engine, and to performance: A simpler VM with a smaller / simpler ISA either comes at the cost of time or complexity in the back-end, or in worse performance. Generally, getting better performance comes to some extent at the cost of VM complexity.

As of the beginning of 2017, while it hasn’t really caught up to the old engine in many areas, and it is debatable whether the “better engineering” part is actually paying off, I at least seem to be making progress (and my new engine is much less prone to running out of memory or having awkward GC stalls).

 

Bytecode Tradeoffs

Over time, it can be noted that different ways of expressing operations has different tradeoffs.

Bytecode Tradeoffs

  • Dynamically Typed Stack Machine
    • Simplest to design/implement
      • Front-end mostly ignores types.
    • Naive case, is also the slowest
      • Deterimining types and operations at runtime does not come for free.
    • Getting decent performance requires a very complicated back-end.
      • Back-end needs to deal with type inference and JIT
    • FWIW: A few VM’s (such as Parrot) have a Dynamically-Typed Register IR
  • Implicit-Type Statically-Typed Stack Machine
    • Used, for example, in the .NET CLR.
    • Most elegant design on paper
    • Reliance on propagating types via the stack is problematic for interpreters
      • Sensible if the VM is JIT only
  • Explicitly-Typed Stack Machine
    • An example of this is the JVM
    • Requires use of more opcodes and/or prefixes to express types.
      • Less elegant design due to lots of repeated operations per-type.
      • May seem less orthogonal as operations may only exist for certain types.
    • Is a decent design for an interpreter.
      • Explicit type information means back-end doesn’t need to figure this out.
      • May also be easier for a simplistic JIT.
  • Statically-Typed Register-IR (Non-SSA)
    • An example of this would include the Dalvik VM
    • Can allow for a faster interpreter than what is typical from a stack machine.
    • A drawback is that things can also get more complicated.
      • Compiler also needs to deal with the management of temporary variables.
      • No convenient way implicitly pass arguments to functions.
  • Statically-Typed Register-IR (SSA)
    • An example of this is LLVM BitCode
    • Best suited for compiler analysis than interpretation
      • The additional temporary registers and phi operations are counter-productive for an interpreter.
      • Though, sub-scripting rather than duplicating registers is an option.
        • In the subscripted case, identity is shared, and phi is made implicit.
    • Is good for knowing register lifetime and may aid in register allocation.
      • Can know more easily when the lifespan of a register has ended.

 

In its current form, the BS2 VM uses a mix of an explicitly typed stack machine with some number of register ops mixed in. The use of hybrid register-ops can in some cases be used to reduce the length of instruction sequences, and thus improve performance for an interpreter, at the cost of a larger ISA.

There is some possible debate as to whether this is better or worse than a purely register IR would have been.

Seen another way, the stack can be thought of as a collection implicit temporaries which are accessed in LIFO order. Within the interpreter (if using threaded-code), there is no need to maintain a stack-pointer at runtime (the stack pointer then only really exists when decoding functions). A side-effect of this is needing some non-executed instructions which primarily serve the purpose of keeping the stack aligned around labels and branches.

For example, compiling “z=25*x+y;”with a pure stack model:

LDI L0   //(=>S)=x
LDC.I 25 //(=>S)=25
MULI     //(=>S)=(S=>)+(S=>)
LDI L1   //(=>S)=y
ADDI     //(=>S)=(S=>)+(S=>)
STI L2   //z=(S=>)

Or, with some additional “hybrid” opcodes:

MULISLC L0, 25  //(=>S)=x*25
ADDILSL L2, L1  //z=(S=>)+y

Which in this case, can match the density of a pure register form.

 

Typically, the bytecode is internally converted into a sort of threaded code, where instructions are decoded into structures with a function pointer, and instructions are executed by calling the function pointers.

These instructions are then stored in arrays, and in C side, unrolled loops can be used to execute a sequence of instructions via such arrays. Execution then primarily works via a main trampoline loop, which calls function pointers holding the unrolled loops for executing instructions.

A simplistic JIT strategy for this is to emit machine code blobs which contain, in the general cases, function calls to the embedded function pointers. For operations handled directly by the JIT, it may instead emit a blob of instructions for executing its operation. Extra fancy is one can add a makeshift register allocator, so values may be cached in CPU registers rather than using memory loads/stores.

But, if the bytecode is kept friendly to interpreters, an interpreter is a good fallback in case a JIT is not available or is too expensive to support on the target, and directly interpreting bytecode operations can be good if for whatever reason threaded-code is not a good option, or if the code will probably only ever be executed maybe once or twice.

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s