This part of the tutorial will be longer than most, because we will examine each component of our program until we understand every single element of it. It won't always be like this (in future posts, I will try to explain only new concepts as they arise). But for this groundwork, it is essential that we look at everything in detail.
If you have done assembly language coding before, you may want to skim-read the code itself, and then move on to the next Part... Otherwise... take a deep breath!
Writing the code
Open up Notepad++ and type the following. Note that whitespace at the start of a line is important here; you can use either the spacebar or the Tab key to add it:
<space>processor 6502
<space>org $1000

my_label:
Note the blank line between the org and my_label? This is purely for layout purposes, so the code is easier to read. Use blank lines to separate logically distinct areas of code - when you come back to your code in six months' time, you'll thank me for this!
NB: In future, I will not specify whitespace with <space> - just remember that all instructions expect a preceding space on their line, except for labels (e.g. my_label), and code relocation statements (see later).
So, keep an eye out for such formatting in all future code samples!
Save the file you have created in Notepad++ (File menu, Save) - when you are prompted for a filename, use the filter drop-down to select ".asm" as the extension / file type. Then choose a name, e.g. "MyFirstPrg".
(Once you have performed the initial save, you can use the save toolbar button or Ctrl+S to save your file to the same place without a folder browser, just like in any other Windows app.)
Now add the following lines to the end of your .asm file. Don't worry about their meaning yet - we shall cover this shortly.
    lda #$08
    sta $0400
    lda #$05
    sta $0401
    lda #$0C
    sta $0402
    lda #$0C
    sta $0403
    lda #$0F
    sta $0404
    lda #$20
    sta $0405
    lda #$17
    sta $0406
    lda #$0F
    sta $0407
    lda #$12
    sta $0408
    lda #$0C
    sta $0409
    lda #$04
    sta $040A
    rts
Save the file.
Now, from the command prompt (ensuring you are in the E:\C64\src folder), type:
dasm MyFirstPrg.asm -oMyFirstPrg.prg
and press Enter.
If all has gone to plan, you should be returned to the command prompt with no errors. There will now be a "MyFirstPrg.prg" file in the src folder.
From the command prompt once more, type:
x64 MyFirstPrg.prg
This should launch the C64 emulator, VICE, with your newly created program.
Next, we need to execute the code we have loaded into the emulator. Type:
SYS 4096
into the running C64 and press Enter. You should see this:
Notice the text that has appeared on the first line of the screen!
Congratulations! You have just written your first Commodore 64 assembly language program :D
What does it all mean?
So, you would probably like to know how that all worked? The first two lines of our .asm file are fairly simple.
The first line tells 'dasm': "We are targeting the 6502 processor". The C64 actually uses the 6510, a close variant of the 6502 with the same instruction set, so this is the setting we choose.
The second line says: "We are starting our code at memory location $1000".
The dollar sign tells the assembler that the number that follows is in Hexadecimal. If you are unsure what this means, take a look at this Wikipedia page.
Hexadecimal (or Hex) numbers are handy for computer programming, because they concisely represent numbers in a form appropriate for machines that, at their lowest level, use binary. Binary is extremely long-winded to type out; hex numbers are much shorter, and each hex digit corresponds to exactly 4 bits, so the number stays organised in logical groups.
For this reason, most numbers in assembly language programs will be represented in Hex, rather than either binary (the machine's notation) or Decimal (our usual numbering system). It's basically a "best" of both worlds.
$1000 in decimal works out as "4096".
The C64's BASIC interpreter (what you start in when powering up a C64) uses decimal numbering to be human friendly. This is why we type "SYS 4096" to launch our code (SYS simply means "execute SYStem code at the following location").
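To make the arithmetic concrete: each hex digit is worth 16 times the digit to its right, so $1000 is 1 x 4096 + 0 x 256 + 0 x 16 + 0 x 1 = 4096. As a small sketch (assuming dasm's usual number prefixes, where no prefix means decimal and % means binary), the org line could equally be written in decimal:

```asm
    processor 6502

    ; All three of these describe the same address:
    ;   $1000               hexadecimal
    ;   4096                decimal (no prefix)
    ;   %0001000000000000   binary
    org 4096            ; same effect as "org $1000"

    rts                 ; return straight to BASIC (explained later in this post)
```

Assembled and loaded in the same way as before, this would still be started with SYS 4096. It simply returns to BASIC without doing anything visible.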
NB: If any of this is confusing at this point, it would be a good idea to take a break from this tutorial and read up on number bases (specifically the Hex and Binary representations, and their relation to our usual Decimal system) before proceeding. All that follows will assume familiarity with these number forms.
The Actual Code
Right, so with the processor choice and code placement out of the way... What does the actual code do?
Our first line of code is:
lda #$08
This translates as: "Load the Accumulator with the value #$08".
The accumulator is the 6502 processor's addition/subtraction register. It is the primary storage location for values during most calculations.
Shorthand for this register is the letter "a". Hence, "lda" is the same as saying "load a".
A register is a storage location within the CPU (the 6502 in this case). Because it is within the processor, it is extremely fast to access - unlike system memory (i.e. the storage cells in the RAM chips).
If you wish to perform calculations, you should prefer using registers over memory when the choice is available.
Now, we know from our "org" statement that $ means Hex, so $08 clearly means 8-in-Hex - which happens to be the same as 8 in Decimal (or 00001000 in binary).
But what about the # character?
CPUs have several different ways to address memory. If instead we had written:
lda $08
this would mean "Load the Accumulator with the value stored at memory location $08". That is to say, load a with the contents of memory location 8.
In a newly started system, a memory value could be anything (though it will most likely be either part of the built-in ROM code, or reset to zero by the operating system).
This is not what we want - we wish to choose not the location, but the value itself.
The # character says "the value" instead of "the value at memory location", turning our command into:
lda #$08
or "load the accumulator with the value 8".
(NB: This form of CPU addressing is referred to as immediate mode, because the immediate value following the instruction is used, rather than the memory location or address).
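To see the two forms side by side, here is a minimal sketch (not part of our program, just an illustration). The byte values in the comments are the standard 6502 opcodes for these two forms of lda:

```asm
    processor 6502
    org $1000

    lda #$08        ; immediate: a = the number 8            (assembles to A9 08)
    lda $08         ; memory:    a = the byte stored at $08  (assembles to A5 08)
    rts             ; return to BASIC (explained later in this post)
```

Running this does nothing visible, since the values are loaded into the accumulator and never stored anywhere. The point is only that one missing # changes which instruction the assembler produces.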
Phew! If you have never touched assembly language before (or even programmed a computer before), all of this might seem a lot to take in - so relax, and ponder what you've read up until now before proceeding with the rest of this post.
***
So back with us? Good!
The next command is:
sta $0400
This means "store the contents of the accumulator at memory location $0400".
As it happens, in a newly started C64, memory location (or address) $0400 is the start of the screen memory. The C64 starts off in "character mode", which is the mode used to draw letters on the screen.
Combined with our previous command, this effectively means: "store the value 8 at location $0400".
The value 8 turns out to be the character code for the letter "H". So in human terms, we are actually saying: "store the letter H at the first location in character screen memory", which displays the letter H on the screen!
If I then tell you that the C64's character codes start at #$01 for A, #$02 for B, and so on - can you now guess what the rest of the (mostly similar) code is doing?
(Clue: value #$20 is a SPACE character...)
Hopefully, you can see now exactly what is happening here. Each line of code sets the next memory location with the character we require to make up the text "HELLO WORLD".
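If you want to check your understanding, here is a shorter sketch using the same rule. H is the 8th letter of the alphabet and I is the 9th, so their character codes are $08 and $09:

```asm
    processor 6502
    org $1000

    lda #$08        ; character code for "H" (8th letter)
    sta $0400       ; first character position on the screen
    lda #$09        ; character code for "I" (9th letter)
    sta $0401       ; second character position on the screen
    rts             ; back to BASIC
```

Assembled and started with SYS 4096 like our main program, this should put "HI" at the top-left of the screen instead of "HELLO WORLD".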
There is just one last thing to clear up: what does "rts" mean?
Well, when we call our code using the "SYS" command, under the hood this uses an instruction called "jsr" to jump into our code (in this case at 4096, or $1000).
"jsr" means "Jump to SubRoutine".
A sub-routine is a self-contained block of code that, prior to entry, remembers where it was launched from (we will cover this much later in the course, but for now, assume the previous location is stored somewhere by the computer, to be retrieved when needed).
"rts" means "Return from SubRoutine". Since our code was called from BASIC via the SYS command, "rts" tells the C64 to "Return to BASIC".
So all that instruction means is: finish our program now, and go back to where we came from!
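You can use the same jsr / rts pairing inside your own code. The following is only a sketch, and the label names start and print_h are made up for this illustration:

```asm
    processor 6502
    org $1000

start:
    jsr print_h     ; jump to the subroutine, remembering where we came from
    rts             ; this rts matches the call made by SYS: return to BASIC

print_h:
    lda #$08        ; character code for "H"
    sta $0400       ; first character position on the screen
    rts             ; this rts matches the jsr above: continue after it
```

Each rts goes back to whoever made the most recent call that has not yet returned: the one in print_h returns to our own code, and the one after the jsr returns to BASIC.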
What a lot of effort just to say Hello World!
Well... yes! It might look that way upfront. It is possible you have heard of assembly language being spoken of in hushed tones by experienced programmers, who insist that assembly language is "hard".
In fact, assembly language itself is very simple, in the sense that each instruction does a very small amount of work. It is precisely this that makes assembly language trickier than higher level languages such as BASIC or C/C++. Assembly language is, ironically, "hard" because it's "simple"!
You have to string together many smaller commands to make up the code for what you want to achieve. That is (possibly) the downside.
But the upside is that you have total control over the CPU in your machine. You are speaking the language of the machine itself, and that gives you unprecedented power over what you wish to achieve.
As with all things, with such power comes great responsibility. Use it wisely!
Which is what we shall now try to do in the following posts :D