The virtual machine
Inko runs your code using a custom register based bytecode interpreter, written in Rust. The VM has over a hundred instructions, though many of these are high-level instructions such as "send a message to a process" or "get the length of a byte array".
For slower operations such as IO, the VM goes through an extra function call,
with each operation exposed as a function. These functions are called using the
instruction BuiltinFunctionCall
. The decision to make something an instruction
or a built-in function is a bit arbitrary, but over time we'll probably turn
more instructions into built-in functions, meaning instructions are only used
for core operations such as comparing integers or getting the length of a
string.
Running processes
Processes are scheduled onto a fixed-size pool of OS threads, with the default
size being equal to the number of CPU cores. This can be changed by setting the
environment variable INKO_PROCESS_THREADS
to a value between 1 and 65 535.
The main thread
The main OS thread isn't used for anything special, instead it waits for the process threads to finish. This means that C libraries that require the use of the main thread won't work with Inko. Few libraries have such requirements, most of which are GUI libraries, and these probably won't work with Inko anyway due to their heavy use of callbacks, which Inko doesn't support.
Load balancing
Work is distributed using a work stealing algorithm. Each thread has a bounded local queue that they produce work on, and other threads can steal work from this queue. Threads also have a single slot for a process to run at a higher priority. This is used in certain cases where we want to reduce latency.
When new work is produced but the queue is full, the work is instead pushed onto a global queue all threads have access to. Threads perform work in these steps:
- Run the process in the priority slot
- Run all processes in the local queue
- Steal processes from another thread
- Steal processes from the global queue
- Go to sleep until new work is pushed onto the global queue
Reductions
Processes maintain a reduction counter, starting at a pre-determined value. Certain operations reduce this counter. When the counter reaches zero it's reset and the process is rescheduled. This ensures processes performing CPU intensive work can't block OS threads indefinitely.
The default reduction count is 1000 and can be changed by setting the
environment variable INKO_REDUCTIONS
to a value between 1 and 65 535. The
higher the value, the more time a process is allowed to run for.
Reductions are performed using the Reduce
instruction, and it's up to the
compiler to insert these instructions in the right place.
IO operations
Sockets
For network IO the VM uses non-blocking sockets. When performing an operation that would block, the process and its socket are registered with "the network poller". This is a system/thread that polls a list of sockets until they are ready, rescheduling their corresponding processes. Polling is done using APIs such as epoll on Linux, kqueue on macOS/BSD, and IO completion ports on Windows.
By default a single network poller thread is spawned, and each process thread
uses the same poller. The number of poller threads is configured using the
INKO_NETPOLL_THREADS
environment variable. This variable can be set to a value
between 1 and 127. When the value is greater than one, network poller threads
are assigned to process threads in a round-robin fashion. Most programs won't
need more than a single thread, but if you make heavy use of (many) sockets you
may want to increase this value.
Blocking IO
For blocking operations, such as file IO, Inko uses a fixed amount of backup threads. When an OS thread is about to enter a blocking operation, it sets a flag indicating when it did so. This is implemented such that it in most cases it won't take more than 100-200 nanoseconds.
In the background a monitor thread periodically examines all OS threads. If it finds an OS thread is blocking for too long, it wakes up a backup thread to take over the work of this blocking OS thread. When the blocking OS thread finishes the blocking call it continues running its process. When the process is rescheduled and the OS thread would pick up new work, it becomes a backup thread instead.
The number of backup threads is controlled using the environment variable
INKO_BACKUP_THREADS
and defaults to four times the number of CPU cores. The
monitor thread runs at an interval of 100 microseconds, though the exact
interval may differ between platforms. This interval can't be changed.
Timeouts
Processes can suspend themselves with a timeout, or await a future for up to a certain amount of time. A separate thread called the "timeout worker" handles managing such processes. The timeout worker uses a binary heap for storing processes along with their timeouts, sorting them such that those with the shortest timeout are processed first.
When a process suspends itself with a timeout, it stores itself in a queue owned by the timeout worker.
The timeout worker performs its work in these steps:
- Move messages from the synchronised queue into an unsynchronised local FIFO queue
- Defragment the heap by removing entries that are no longer valid (e.g. a process got rescheduled before its timeout expired)
- Process any new entries to add into the heap
- Sleep until the shortest timeout expires, taking into account time already spent sleeping for the given timeout
- Repeat this cycle until we shut down
If the timeout worker is asleep and a new entry is added to the synchronised queue, the worker is woken up and the cycle starts anew.
Memory management
The VM uses the system allocator for allocating memory. You can also build the VM with jemalloc support as follows:
cargo build --release --features jemalloc
The use of jemalloc is recommended for production environments that don't already use jemalloc as the system allocator.
In earlier versions of Inko we used a custom allocator based on Immix. We moved away from this for the following reasons:
- The implementation was quite complex and difficult to debug
- Immix suffers from fragmentation, and without a GC (what it's designed for) it's hard to clean up the fragmentation
- Our implementation was unlikely to outperform highly optimised allocators such as jemalloc, so we figured we may as well use an existing allocator and direct our attention elsewhere
FFI
The VM provides an FFI using libffi. By default the VM builds libffi from source, but you can use a system wide installation of libffi by building Inko as follows:
cargo build --release --features libffi-system
To use the FFI, the VM provides a few built-in functions such as
ffi_library_open
and ffi_function_attach
, called using the
BuiltinFunctionCall
instruction.
Bytecode format
The VM uses a custom bytecode format and is subject to change. The best way to
understand how it works is to look at the source code of the Image
type in the
VM, found in vm/src/image.rs
.
VM instructions need 12 bytes of space and support up to 5 arguments.
The VM provides the instruction GetConstant
to read a constant (e.g. a string
literal) into a register. When the compiler generates these instructions, it
encodes a module-local constant index into the instruction. When the VM loads
the bytecode it rewrites these instructions to include a pointer to the
constant. This means that at runtime a GetConstant
instruction just interprets
its arguments as a pointer and stores it in a register, instead of having to
deference several pointers and look up a value in an array using an index.
Method dispatch
The VM supports two ways of calling regular methods: virtual dispatch and dynamic dispatch. For virtual dispatch the VM reads the class of a receiver, then looks up the method in the class' method table.
For dynamic dispatch we use hashing. The hash code for a method is based on its name and generated at compile time by the bytecode compiler. The hash is embedded into the instruction as an argument. To handle conflicts, the VM uses linear probing. Due to rounding method table sizes up to the nearest power of two (based on its raw/initial size), conflicts are quite rare and a method can be found in at most a few probes. The VM doesn't perform inline caching, though we may implement this in the future.
Static methods are called using virtual dispatch on the class. We may add a dedicated instruction for this in the future.
Method arguments are pushed onto a stack instead of using registers. This is
needed because certain operations may need more arguments than we can store in
an instruction. Arguments are pushed onto the stack using a Push
instruction
and popped off the stack with a Pop
instruction. When entering a method,
arguments are popped off the stack in reverse order, so the last argument is
popped first.
Each method call writes its result to a "result" field stored in a process. The
instruction MoveResult
takes this value and stores it in a register, clearing
the "result" field in the process.
Jump tables
The VM has a JumpTable
instruction, primarily used for pattern matching
against enums. Jump tables are encoded in the bytecode image, and generated by
the compiler. The instruction is simple: it takes the value to test, which must
be an Int
, and an index to the jump table (stored in the surrounding method).
The jump is then performed using essentially
goto method.jump_tables[table_index][jump_value]
.
Throwing
A throw is just a specialised return: the Throw
instruction writes its value
to the "result" field, then sets the thrown
flag to true
. Error handling is
done using the BranchResult
instruction, which checks the thrown
flag and
branches accordingly. The compiler inserts this instruction after every method
call that may throw.
Immediate values
The VM is able to optimise the allocations of small values, such as integers and booleans. This is achieved using pointer tagging.
If a pointer has its lowest bit set to 1
it means the pointer is an immediate
value, instead of pointing to a heap allocated object.
If the lowest two bits are set to 1
(so xx11
), the pointer is a tagged
integer capable of storing integers up to 62 bits. 63 and 64 bits integers are
heap allocated.
If the lowest two bits of a pointer are set to 10
, it means the pointer is a
pointer to a permanently allocated heap object, such as a string literal.
If the third lowest bit of a pointer is set to 1
(e.g. the pattern is 100
)
it means the pointer is a reference to an owned value.
Here's an overview of all these patterns (x
indicates the bit could be set or
unset):
Value | Pattern |
---|---|
Heap object | xxxx x000 |
Permanent object | xxxx xx10 |
Reference | xxxx x1x0 |
Tagged integer | xxxx xx11 |
true |
0000 1101 |
false |
0000 0101 |
nil |
0000 0001 |
undefined |
0000 1001 |
For more information, refer to vm/src/mem.rs
, which implements most of the
memory management logic of the VM.
Memory layout and fields
The VM uses the structure Object
for regular class instance, and Process
for
instances of processes. Both these Rust structures are variable-sized
structures, with the user-defined fields stored at the end. Each object has a
header that's 16 bytes. Instances of a class without any fields thus need only
16 bytes of memory, while an instance of a class with two fields needs 32 bytes
of memory (= each field is 8 bytes).
Different objects such as strings and arrays have a different layout, but all objects always start with a header.
When a class is defined, the number of fields is used to calculate the size of instances of said class. When allocating such an instance we read the size from the class and allocate accordingly.
Fields are just offsets to a structure, much like fields in e.g. C structures.
The VM doesn't use hashing for fields unlike e.g. Ruby and Python. Fields are
read using the GetField
instruction, and written using the SetField
instruction. For processes the VM uses ProcessGetField
and ProcessSetField
respectively, as processes have a slightly different memory layout.
Strings
Strings are immutable, and need at least 41 bytes of space. To allow easy passing of strings to C, each string ends with a NULL byte on top of storing its length. This NULL byte is ignored by Inko code. When passing a string to C, we just pass the pointer to the string's bytes which includes the NULL byte.
Since C strings must be NULL terminated, the alternative would've been to create a copy of the Inko string with a NULL byte at the end. When passing C strings to Inko we'd then have to do the opposite, leading to a lot of redundant copying. Our approach instead means we can pass strings between C and Inko with almost no additional cost.
Strings use atomic reference counting when copying, meaning that a copy of a string increments the reference count instead of creating a full copy.