(っ◔◡◔)っ  ∇

Preface

The Nabla VM is a system I've been working on for the last few months. These posts are ones I wrote while developing the Nabla VM and the "Del" language. The GitHub project space for the NablaVM can be found here. For some cool information regarding the bytecode and assembly you can check this out.

For deeper information regarding the NablaVM check out the GitHub wiki.

The Beginning - 6/2/2020

The Nabla Virtual Machine is a side project that I began working on during the onset of COVID-19. I'm thankfully still employed, but I find myself having significantly more free time than I've had in a while, so I decided to explore VM, compiler, and language design. This project has been a lot larger than I thought, and now, a few months into working on it, I'm still finding it quite interesting. I think that I'll be working on this for a long time to come, so I decided to create some web pages to showcase the project and host its documentation.

As of today (3 June 2020) the Nabla project is set up to compile large assembly projects to bytecode, and the VM can execute all instructions.

To get a proof of concept in place, the VM can also perform TCP/UDP actions using its Network Device, and CLI/File IO using its IO Device. Using these devices requires drivers written in Nabla ASM, which isn't too friendly, so before I expand and harden networking and IO I intend to develop the yet-to-be-named high-level language so that the drivers can be written in a human-friendly dialect.

The First Program - 6/4/2020

I've been working on the compiler for the HLL "Del" for about 12 hours today, and about 11 hours into the session I got the very first program compiled and executed. This is a big moment for me. I've been working on the compiler for almost a month, and I've had to learn a lot just to get to where I am.

This post is meant to be a freeform stream-of-consciousness update. My brain is pretty much mush from programming all day.

To start off, I picked up this book to get me going on the lexer / parser, and since then I've been chugging along. I recommend it if you're interested in Flex and Bison.

Right now the compiler can only handle integers, doubles, and single chars, but it can handle assignment and reassignment expressions!

This is the program that I was able to get compiled:



def main() -> int {

  int a = 2 ^ 4;
}

I opted to keep '^' as power instead of its standard bitwise XOR meaning.

The resulting ASM code is a bit nuts:




;
;   This file was auto generated by the DEL compiler 
;   At the time of this generation DEL compiler was heavily under development
;   by someone who knows very little about compiler design
;       - Josh A. Bosley

.file "DEL_Lang"

.init __del__program__init



.int64 __MEM_ALLOC_COUNTER__ 73



<__del__program__init:

    ; Allocate the space required for the GS
    ; Codegen put the number in GS 0 for us. 
    mov r0 $0
    ldw r1 $0(gs) 
    mov r2 $0
top:
    add r0 r0 $1    ; Add to the counter
    pushw gs r2     ; Expand GS by a byte
    blt r0 r1 top   ; Loop until complete

    call main       ; Entry to the user's application

    exit
>


;
;	MODULUS
;	Performs modulus operation on r1, and r2
;	Results in r0
;	Ex:  r0 = r1 % r2
<__del__math__MODULUS:

    blt r1 r2 modspecial
    pushw ls r1
    pushw ls r2 
    pushw ls r3
    pushw ls r4

	mov r3 $0   		; Init temp, and result
	div r3 r1 r2		; Divide Num/Denom
    mov r4 $0
	beq r3 r4 moduiz	; Check if mod is 0
	jmp moduinz	    	; If not, go to moduinz
moduiz:
		mov r0 $0
		jmp moddone
moduinz:
		mul r3 r2 r3 	; Mult denom by result
		sub r0 r1 r3	; Sub result from num
		jmp moddone
moddone:
    popw r4 ls 
    popw r3 ls 
    popw r2 ls 
    popw r1 ls 
    ret
modspecial:
    mov r0 r1
>


;
;	POWER
;	Performs power operation on r1, and r2
;	Results in r0
;	Ex:  r0 = r1 pow( r2 )

<__del__math__POWER:
    pushw ls r3
    mov r0 $1
    mov r3 $1
top:
    bgt r3 r2 bottom
    add r3 r3 $1
    mul r0 r0 r1
    jmp top

bottom:
    popw r3 ls 
    ret
>


;
;	MODULUS_D
;	Performs modulus operation on r1, and r2
;	Results in r0
;	Ex:  r0 = r1 % r2
<__del__math__MODULUS_D:

    blt.d r1 r2 modspecial
    pushw ls r1
    pushw ls r2 
    pushw ls r3
    pushw ls r4

	mov r3 $0   		; Init temp, and result
	div.d r3 r1 r2		; Divide Num/Denom
    mov r4 $0
	beq.d r3 r4 moduiz	; Check if mod is 0
	jmp moduinz	    	; If not, go to moduinz
moduiz:
		mov r0 $0
		jmp moddone
moduinz:
		mul.d r3 r2 r3 	; Mult denom by result
		sub.d r0 r1 r3	; Sub result from num
		jmp moddone
moddone:
    popw r4 ls 
    popw r3 ls 
    popw r2 ls 
    popw r1 ls 
    ret
modspecial:
    mov r0 r1
>


;
;	POWER_D
;	Performs power operation on r1, and r2
;	Results in r0
;	Ex:  r0 = r1 pow( r2 )

<__del__math__POWER_D:
    pushw ls r3
    pushw ls r4
    mov r4 $1
    mov r0 $1
    mov r3 $1
top:
    bgt.d r3 r2 bottom
    add.d r3 r3 r4
    mul.d r0 r0 r1
    jmp top

bottom:
    popw r4 ls 
    popw r3 ls 
    ret
>

<main: 


	; <<< SETUP INT >>> 

	; ---- a ----
	lsh r0 $2 $0	 ; Byte 0

	; ---- Move the value ----
	pushw ls r0	 ; Place on local stack for calculation

	; <<< SETUP INT >>> 

	; ---- a ----
	lsh r0 $4 $0	 ; Byte 0

	; ---- Move the value ----
	pushw ls r0	 ; Place on local stack for calculation

	; <<< POW >>> 

	popw r2 ls 	 ; Calculation RHS
	popw r1 ls 	 ; Calculation LHS

	call __del__math__POWER ; Call to perfom power

	pushw ls r0	; Push value to local stack for calculation

	; <<< STORE WORD >>> 

	; ---- Address for [a] ----
	lsh r0 $64 $0	 ; Byte 0
	lsh r1 $0 $8	 ; Byte 1
	lsh r2 $0 $16	 ; Byte 2
	lsh r3 $0 $24	 ; Byte 3
	lsh r4 $0 $32	 ; Byte 4
	lsh r5 $0 $40	 ; Byte 5
	lsh r6 $0 $48	 ; Byte 6
	lsh r7 $0 $56	 ; Byte 7

	; ---- OR value together ---- 
	or r0 r0 r1
	or r0 r0 r2
	or r0 r0 r3
	or r0 r0 r4
	or r0 r0 r5
	or r0 r0 r6
	or r0 r0 r7

	; ---- Get result ----
	popw r8 ls

	; ---- Store result ---- 
	stw r0(gs) r8	 ; Store in memory
> 



It's a bit much, I know, but the gist is this: the compiler detects that a special math function is being used and then emits some built-in routines for mathematical operations on doubles and ints.

It then takes an RPN representation of the expression and calculates the result, storing it to an assigned address at the end.
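To make the RPN evaluation concrete, here's a minimal sketch in Python (not the compiler's actual code; `ipow` is a stand-in for the `__del__math__POWER` routine) of how the generated code works: operands get pushed onto a local stack, and each operator pops two values, applies the built-in routine, and pushes the result back.

```python
def ipow(base, exp):
    # Mirrors the POWER routine above: repeated multiplication with an
    # integer exponent, starting from 1.
    result = 1
    for _ in range(exp):
        result *= base
    return result

def eval_rpn(tokens):
    ops = {"+": lambda a, b: a + b,
           "-": lambda a, b: a - b,
           "*": lambda a, b: a * b,
           "^": ipow}
    stack = []                      # plays the role of the local stack (ls)
    for tok in tokens:
        if tok in ops:
            rhs = stack.pop()       # popw r2 ls  ; Calculation RHS
            lhs = stack.pop()       # popw r1 ls  ; Calculation LHS
            stack.append(ops[tok](lhs, rhs))
        else:
            stack.append(int(tok))  # pushw ls r0
    return stack.pop()

print(eval_rpn(["2", "4", "^"]))    # prints 16, as in the example program
```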

Right now I'm using a shift / OR-ing mechanism for getting 64-bit addresses stored into the 64-bit registers. Once I get the base functionality of the compiler written I will revisit and optimize these things. I'm trying to be as verbose as possible because, on top of the compiler, I've also written the assembler, bytecode, and the VM it runs on. A bug in something written in the HLL might actually be coming from the underlying computation mechanisms, so I want to see every step until I am sure everything is bulletproof.
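The shift / OR mechanism can be sketched in a few lines of Python (a hypothetical helper, not the compiler's actual code): each byte of a 64-bit value is shifted into position, and the partial results are OR'd together, just like the `lsh` / `or` sequence in the generated ASM.

```python
def build_u64(byte_values):
    """byte_values[0] is the least significant byte, as in the ASM above."""
    word = 0
    for i, b in enumerate(byte_values):
        # lsh rN $b $(8*i) puts byte i in place; or r0 r0 rN combines them
        word |= (b & 0xFF) << (8 * i)
    return word

# The address 64 from the generated code: byte 0 is 64, the rest are 0.
addr = build_u64([64, 0, 0, 0, 0, 0, 0, 0])
print(addr)   # 64
```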

The Nabla VM - 6/11/2020

The Nabla VM is a virtual machine I designed from an idea that started with wanting to make a programming language. I wanted it to be higher level, and I didn't want to just define a grammar and plug it into LLVM. I wanted to really MAKE the language. By "make" I mean I wanted to write things from top to bottom, and I wanted that bottom to be special. What do I mean by special? I mean that I wanted the bottom-most layer of the language to be an executable bytecode so I wouldn't have to target different platforms. I figured making a VM that executed bytecode ought to be easy enough.

Stack Based, or Register Based?

As I started to investigate what the world was doing with VMs I noticed that there were two primary types of VMs that languages leveraged to perform their operations: stack based and register based. There is an article here talking about the two, but basically it comes down to the means by which you move data around: do you use a stack, or pre-defined locations for storage (registers)? I only thought about this for about five minutes before deciding that what I wanted was something... more similar to actual computer architecture. Now I'm not claiming that the Nabla VM represents physical computer architecture 1-for-1, but it is definitely closer to that than to a simple stack. Realistically, having designed the VM around this register idea, the distinction is kind of pointless when it comes to the Nabla VM: it does in fact use registers, but it also leverages stacks for computation... as do most things.
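To see the distinction in miniature, here are two toy interpreters in Python (invented instruction names, nothing to do with Nabla's actual encoding) computing the same sum: the stack machine shuffles operands through a stack, while the register machine names its storage locations explicitly.

```python
def run_stack(program):
    # Stack machine: operands live on an implicit stack.
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
    return stack[-1]

def run_register(program, nregs=16):
    # Register machine: operands live in numbered registers.
    regs = [0] * nregs
    for op, *args in program:
        if op == "mov":
            regs[args[0]] = args[1]
        elif op == "add":          # add dest, src1, src2
            d, a, b = args
            regs[d] = regs[a] + regs[b]
    return regs[0]

# Same computation (2 + 3), two encodings:
stack_prog = [("push", 2), ("push", 3), ("add",)]
reg_prog = [("mov", 1, 2), ("mov", 2, 3), ("add", 0, 1, 2)]
print(run_stack(stack_prog), run_register(reg_prog))   # 5 5
```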

So what does this architecture look like?

I'm glad you asked. The Nabla VM was built around a concept I've wanted to play with for a while: a physical distinction between data and instructions. Usually when you have assembly code, everything is stored in the same place and executed sequentially. Instructions and data are in the same area, so fun things like stack smashing / stack overflows can allow mischievous users to overwrite code in memory and execute their own routines. With a physical distinction between data and instructions, we can mitigate some (only some; the idea is not bulletproof) of the attack vectors that people could use to poke and prod around on a machine.

Now, being up front, the Nabla VM isn't meant to be secure or anything fancy like that. I just wanted to take that idea and have it influence my design. I realize that the VM exists on a regular machine and the distinction I'm making between data and instructions is just a facade.

I decided that there would be physical areas that separated out the functionality of code, such that these areas' data could even be separated. Now, in order for these distinct things to communicate, they needed methods to transfer information. As discussed earlier, I decided to go with a register-based VM, so obviously there are some registers, but what about big swaths of data? With this in mind I decided to add a global area for shared data storage. Wild! Making things even crazier, I added stacks to each of the function units so they could process data without leveraging the storage stack for arithmetic calculations. This will make sense later on when I start to cover the pseudo-parallelism that the VM can do.

So what does this thing look like ?

Stunning!

I gave the machine 16 registers, a global stack (GS), function units (FU), and stacks local to a function unit (LS).

Instruction Picking

So now we have a box, with boxes in it, and in some cases another layer of boxes. Now we just need a stick to poke them with and make them do things. This stick, of course, is an instruction set. When it comes to instruction sets there are a few overall ideas I got to choose from, called RISC, CISC, and MISC. What a world! These stand for "Reduced Instruction Set Computing," "Complex Instruction Set Computing," and "Minimal Instruction Set Computing." The gist is that I can either have a literal metric ton of instructions (CISC), a reasonable amount of instructions (RISC), or barely enough to do anything more than basic computation, meaning we'd need tons of execution to do literally anything (MISC). My descriptions here are clearly biased by my opinion, and it should be clear that I chose RISC. Mostly because I didn't want to write all of the instructions, but also because having lots of types of instructions in a virtual environment would have a detrimental impact on the execution cycle (it would be slower).

Armed with the idea that I wanted a 'reasonable' amount of instructions, I started writing out the things I would need to facilitate computation in / with the boxes above. As I started the project I realized that I also needed to decide whether I would have fixed-width or variable-width instructions. There are pros and cons to both, but ultimately I decided to go with fixed-width 64-bit instructions (for execution). Why 64-bit? Primarily because I could stuff a whole lot more information directly into the instruction with 64 bits than with, say, 8, 16, or 32 bits. Why not 128 / 512? Because that's ridiculous, and because I didn't need that much space; a lot of it would have been wasted.
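To show why 64 bits is roomy, here's a sketch of a fixed-width encoding in Python. This is an invented layout, not Nabla's real instruction format: an 8-bit opcode, three 8-bit register fields, and a 32-bit immediate still fit in a single word with bits left over.

```python
# Invented field layout for illustration only.
OPCODE_SHIFT = 56
DEST_SHIFT   = 48
SRC1_SHIFT   = 40
SRC2_SHIFT   = 32

def encode(opcode, dest, src1, src2, imm):
    # Pack every field into one 64-bit word via shifts and ORs.
    return ((opcode & 0xFF) << OPCODE_SHIFT |
            (dest   & 0xFF) << DEST_SHIFT |
            (src1   & 0xFF) << SRC1_SHIFT |
            (src2   & 0xFF) << SRC2_SHIFT |
            (imm & 0xFFFFFFFF))

def decode(word):
    # Reverse the packing: mask each field back out.
    return ((word >> OPCODE_SHIFT) & 0xFF,
            (word >> DEST_SHIFT) & 0xFF,
            (word >> SRC1_SHIFT) & 0xFF,
            (word >> SRC2_SHIFT) & 0xFF,
            word & 0xFFFFFFFF)

word = encode(0x01, 0, 1, 2, 42)
assert word < 2**64                # still a single fixed-width word
print(decode(word))                # (1, 0, 1, 2, 42)
```

With a 16- or 32-bit word, an immediate that large simply wouldn't fit alongside the opcode and register fields, which is the trade-off being described above.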

The Instruction Set

The instruction set documentation can be found here. I don't want to go into too much detail about the instruction set here, but this is an example of the ASM that the Nabla VM can run:

.file "Example"
.init main

.int8 my_gs_int 24 ; Special instruction to load a value into GS on init

; This 'function' represents one of the yellow boxes above
<main:

    mov r0 $42     ; Load 42 into register 0
    pushw ls r0    ; Push the entire register into the local stack
    pushw gs r0    ; Push the entire register into the global stack
    ret
>

As you can see, the instruction set specifies things like in which of the boxes above to store data, but what you don't see is that the instructions themselves reside outside of the data storage areas! You can't access the instructions at run time at all. Beautiful.

What about File IO and Networking?

Good question. There isn't much use in a programming language if you can't load data to/from disk or chat online with it. For these things I made it possible to extend the VM using devices. The use of these gets a bit heady, but in brief, they have their own instruction sets (like drivers) that need to be formed within the ASM itself. What do I mean? The byte code that the devices actually execute has to be written in the Nabla ASM. I linked the pages that describe these devices' byte codes. Right now the devices support basic TCP/UDP, and IO to disk / stdin and stdout. The networking portion is in line to be beefed up once I get the high-level language developed to a workable state. WAIT. Did I just say high-level language? Good transition time.

High Level Language

As I mentioned at the start of this post, I wanted to make a programming language. I guess I kind of did by making the assembly language, but that was more of a means to an end, the end being the high-level language I wanted to make. Now, I didn't have an idea of what I wanted the language to look or feel like. I just wanted to do it. So that's what I'm doing. The language "Del" is currently being developed. It's pretty primitive right now, but here is an example of what it looks like:


def add_one(int some_int) -> int {

    return some_int + 1;
}

def main() -> int {

    int my_int = add_one(41);
    
    return 0;
}

It isn't too pretty yet, but it's getting there. As of the writing of this post the language compiles to NablaVM's ASM code, supports ints, doubles, and chars for types, and calculates every expression I've thrown at it. Just earlier today I got parameter passing completed. That's how primitive it is.

Reentrant Memory Units - 6/14/2020

First off, I want to say thanks to Gurge, who seems to take at least some interest in this project, for hearing me rant on about different aspects of what I'm doing. You're the best.

During the creation of DEL I have come to a point where I need to do dynamic memory allocation! I have been reading about different methods for handling memory management and how to minimize memory fragmentation, but I want to try something... different.

In the Nabla ASM I created something kind of interesting, but not quite unique: reentrant functions. What does this mean? Typically when you call a function, it executes all of its instructions, then returns. Awesome. A reentrant function executes some of its instructions, yields execution back to the caller, and upon being called again picks up where it left off. Simple. Wild. Unique. Awesome. So how on earth does this relate to memory? I'm glad you asked. I am going to explore leveraging this behavior to actually store data.

If you recall from this post, I describe how functions in Nabla ASM are actually representations of individual units of space that store their own data and instruction set. They aren't your typical "function"; they are only called functions because it's a familiar name that matches what they do closely enough. Now, when these functions return, their local storage is wiped, and the next time you call that function it's just like new: no memory allocated, with an instruction set just waiting to be executed. However, if you look at this you'll see that I've included a special instruction called "yield" that, upon being executed, tells the function to stop executing and give execution back to the caller. When a yield instruction is hit, the local storage of the function is not cleared, and upon being called again the function will pick up where it left off. Wild.

So how can this behavior be leveraged for memory storage? Well, the details of how it will work specifically for DEL are still being worked out, but I've already done a small POC that shows it's possible. Take the following code for instance:


<storage:

    ; interp commands for construction

top:

    ; Read commands from a global space to 
    ;  Add, remove, update, clear, etc

    ; example :
    pushw ls r9 ; Stores some word from r9 into the local stack
    
    ; We could also pull in data from GS 

    yield   ; Returns without clearing local object memory

    ; This is where the next entry comes when called again

    bne r1 r2 top   
    ; If method is told to terminate, then it returns for real and memory is cleared
    
    ; (Note: the bne probably isn't how I'll handle instructing a return, its just an example)
>

This shows a rough idea of how we can instruct one of these function units to store / load / move data internally, and how we can keep it alive until we're done using it.
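The yield mechanic maps pretty closely onto Python generators, so here's an analogy (just an analogy with a made-up command set, not the real DEL memory unit): a "function unit" whose local storage survives across re-entries until it's told to terminate.

```python
def storage_unit():
    local_stack = []            # survives between yields, like LS after a yield
    while True:
        command = yield local_stack   # "yield": hand control back, keep state
        if command is None:
            break               # a "real" return: local memory goes away
        op, *args = command
        if op == "store":
            local_stack.append(args[0])
        elif op == "remove":    # hypothetical op, for illustration
            local_stack.pop()

unit = storage_unit()
next(unit)                          # first call: runs up to the yield
print(unit.send(("store", 42)))     # re-entry: state preserved, prints [42]
print(unit.send(("store", 7)))      # prints [42, 7]
```

Each `send` is the "call it again and it picks up where it left off" behavior described above; the local list plays the role of the function unit's uncleared local stack.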

The code that will be written to fully support this in Del will not be something I ever post here; it will be long and require a lot of walk-through to understand what's going on. In tl;dr format, it's this:

A section of global memory will be used for bookkeeping, and another section for passing instructions / data.
We call the function to execute memory operations.

We now have dynamic memory without having to implement algorithms that track used / freed bytes, and memory fragmentation is kept down to only that which occurs because of the C++ compiler that the VM is built with.

It isn't all roses though

A moment of thought makes it clear that this will be a pain to implement if something like this happens:

loop {
    int * p = create int[10];
}

The compiler will have to do a lot of semantic analysis to determine (if it can) how many of these units will need to be generated, or whether it should tell the user it just won't do it. In the above case the compiler wouldn't be able to comply, because we would need an infinite number of memory units.

This means I will need to introduce some sort of specifier that tells the compiler some information regarding the intention of the user.

If, for instance, we get the following (the syntax for loops and user input is not yet defined for Del; this is pseudo code):

int x = <user_input>;

loop i for 0..x {
    int * p = create int[1];
    &p = i * 2;
}

The compiler has NO idea how many units to make, BUT it does know that the number is finite. Interesting. What if we added a specifier?

int x:range(0,100) = <user_input>; 

....

That could work!

There are a few more cases that will have to be considered for this to work, but I think it is definitely doable.

Conditional Code; A Small Update - 6/18/2020

One of the most important pieces of any given language is flow control. Making things compute expressions sure is important, but arguably more important is the ability to make sure that things are done in a correct order when there are multiple pieces to a problem. Controlling the flow of instruction execution is typically done with things like loops, calls, and if statements.

Why am I saying this? Because Del now has if statements, and it's kind of a big deal. Of the aforementioned "things" that are used for flow control, "if statements" are one of the slightly more complex to implement. Sure, using them isn't too complex, but ensuring that this:


if(conditional_expression) {
    // Do something fancy
} elif (other_conditional_expression) {
    // Do something even more fancy
} else {
    // Don't do anything fancy, do something cool
}

Is turned into executable code, with the standard scope mechanics enforced to ensure we don't double-spend memory locations for variables, along with a variety of other rules along the way, isn't so easy. I wouldn't call it a triumph, but it definitely took me about 7 hours to get implemented correctly before any optimization.
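The scope mechanics mentioned above can be sketched with a scope stack. This is a minimal illustration with hypothetical names, not Del's actual implementation: each branch of the if/elif/else pushes a scope, declarations claim a fresh frame offset in the innermost scope, and redeclaring a visible name is rejected.

```python
class ScopeStack:
    def __init__(self):
        self.scopes = [{}]          # outermost (function) scope
        self.next_offset = 0        # next free slot in the frame

    def enter(self):
        self.scopes.append({})      # e.g. entering an if-branch body

    def leave(self):
        self.scopes.pop()           # branch's names stop being visible

    def declare(self, name):
        # Reject a name already visible in any enclosing scope, so two
        # variables can't double-spend one memory location.
        if any(name in s for s in self.scopes):
            raise NameError(f"'{name}' already declared in a visible scope")
        self.scopes[-1][name] = self.next_offset
        self.next_offset += 1
        return self.scopes[-1][name]

scopes = ScopeStack()
scopes.declare("a")        # offset 0, function scope
scopes.enter()             # entering the if-branch
scopes.declare("b")        # offset 1, only visible inside the branch
scopes.leave()             # "b" is gone again
```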

NVM Development Update - 6/25/2020

Working on this project for the last few months has been awesome. I've learned a lot about VM / compiler design. At this moment, though, the language I've been working on for high-level usage has begun to stall. I created a grammar, preprocessor, error handler, semantic analyzer, and code generator, and implemented quite a bit of functionality you'd expect to see in a language. Right now loops (while and for), if/elif/else statements, and function calls are implemented, and, using a preprocessor directive, you can include files to extend a program.

So with all of this done, why is development stalling?

Primarily it comes down to me finding deep problems with my design decisions. Here is how the Del language breaks down:

0. Entry file is passed into the driver
1. File is pre-processed and handles importing of external files
2. The Lexer / Parser begin to consume the resulting program
3. As statements are found, an AST is built and passed to the "Analyzer"
4. Semantic analysis is run, and preparation is done for the "Intermediate" layer
5. The Intermediate layer forms commands for the code generator
6. Code generator uses an "Aggregator" that populates ASM code based on "Code Blocks"
7. Resulting code given back to the driver
8. Driver calls the libnabla assembler to generate bytecode based on resulting ASM
9. ASM and Bytecode written to file.
10. Program exit (I just wanted a 10th point here)

Problem 1

Starting around point 3.5 is the first problem: how I handle expressions. As the AST is walked to verify the typing of assignments, a postfix string is constructed.

int x = 4 + 4; => ASSIGN 4 4 +
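The postfix construction falls out of a post-order walk of the expression tree. Here's a sketch with illustrative node classes (not DEL's actual AST types):

```python
class Num:
    def __init__(self, value):
        self.value = value

class BinOp:
    def __init__(self, op, lhs, rhs):
        self.op, self.lhs, self.rhs = op, lhs, rhs

def to_postfix(node):
    if isinstance(node, Num):
        return [str(node.value)]
    # Visit children first, emit the operator last: post-order traversal.
    return to_postfix(node.lhs) + to_postfix(node.rhs) + [node.op]

# int x = 4 + 4;  becomes the postfix string "4 4 +"
tree = BinOp("+", Num(4), Num(4))
print(" ".join(to_postfix(tree)))   # prints: 4 4 +
```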

In the beginning, when I was only working with simple expressions, this worked great. The postfix expression and memory information for the assignment are handed to the intermediate layer so code generator instructions can be generated. Where this began to break down is when there are calls inside of expressions, such as:

int x = my_func() + 3;

In order to facilitate encoding the function call into a postfix string I created the EnDecoder. The purpose of this object was to encode the call into a string that stored the passed-in parameters, the call address, and a few other pieces of data required to instruct the code generator to call the function, get the result, and include it in the calculation. The problem here is that once encoded, the intermediate layer had to decode it again before moving on. Had I chosen to use an emitter instead of string concatenation, I could have bypassed the intermediate layer entirely and had instructions generated off of the tree walk. It's just downright inefficient; I don't know what I was thinking.

Problem 2

Memory management. Since the NablaVM isn't your standard VM setup, I have to monitor an "artificial" stack pointer that dictates where in the global stack we are regarding function frames. This isn't too big of a deal, and easily manageable, BUT how that impacted variable storage messed me up. In the compiler's memory manager, each item is stored as an "offset" from the starting point in memory of the function that the variable belongs to. Not too big of a deal. Where it becomes a big deal is how variable-length data needs to be handled.
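The offset scheme can be sketched like this (a hypothetical helper, not the real memory manager): a variable's absolute slot in the global stack is its function's frame start, tracked by that artificial stack pointer, plus a per-variable offset.

```python
class FrameMap:
    def __init__(self):
        self.frames = {}   # function name -> {variable: offset within frame}
        self.sizes = {}    # function name -> words claimed so far

    def add_var(self, func, name):
        # Each new variable claims the next offset in its function's frame.
        frame = self.frames.setdefault(func, {})
        frame[name] = self.sizes.get(func, 0)
        self.sizes[func] = frame[name] + 1

    def address(self, func, name, frame_start):
        # Absolute slot in the global stack = frame start + stored offset.
        return frame_start + self.frames[func][name]

fm = FrameMap()
fm.add_var("main", "x")    # offset 0
fm.add_var("main", "y")    # offset 1
print(fm.address("main", "y", frame_start=10))   # prints 11
```

The scheme only works while every variable has a size known at compile time, which is exactly why variable-length data breaks it.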

What I decided to do was... cheat a bit. I made a VM device called a "DataStore". Similar to how the other devices work, it executes its own byte code put together in Nabla ASM. This made it so I could store all variables "external" to the global stack, and each entry in the global stack is an address that directs you to the data in the data store. This requires more instructions for load/store than before, which is fine, but as more complex data types come into play, things will get tricky and verbose to load / store, causing a bloom of code to simply retrieve a value. Not ideal.

Problem 3

Types. Starting off I decided to have "int", "char", and "real". Ints are int32 or uint64, reals are a fancy name for double, and a char... is an 8-byte single character... which isn't ideal, but I figured "it's just a starting point." These aren't so much an issue as how I model data types. It's pretty much restricted to the base types, and allowing user-defined types will require editing almost every single object from the lexer to the generator. Maybe unavoidable, but definitely not ideal. I discovered this while wanting to implement a "struct". To make things worse, the means by which I store data in the external device makes the amount of code required to access elements beyond belief. It's ridiculous and bad.

Problem 4

A fixable, but cumbersome-to-implement, problem: code generation. Right now things are working fine, but it isn't ideal. As the code generator receives instructions from the intermediate layer, "blocks" are created that generate Nabla ASM; these get stored in "aggregators" that create specific sets of code for functionality, which get placed into logical units ("functions") for the end code result.

If it works, what is the problem? It's bulky and tightly coupled to the syntax of Nabla ASM. If that's a problem, what's the fix? Abstraction. I knew this would be an issue "for later," but I truly regret not taking it into account in the beginning. Similar to other tools out there, it would have been nice to have an "intermediate representation" of ASM-like code that is generated and then, at the end, converted directly to Nabla bytecode using some abstractions. This would allow us to bypass the assembler in libnabla and to easily reflect changes to the underlying instruction set by simply updating the Nabla abstraction class for the compiler.

Which way is forward?

I don't know. The Nabla VM seems pretty solid. It's fully tested and works great on its own. It has TCP/UDP, file IO, and all that jazz, but with its strange (unique?) setup it's difficult to realize a HLL for it. I considered learning some LLVM to see if I couldn't target it as a back end, but after a day or so of looking into it I believe that the VM itself is too different from conventional hardware to get it to work.

At the moment I am considering locking development on the current NablaVM and Del implementations and starting a fresh implementation with the lessons learned so far.