There was a reason...
Just about every corner was cut in producing the ZX Spectrum: it had no custom hardware of any kind, short of a very simple ULA chip that did as little as it could get away with in order to have a working computer system, and the 16K ROM chip, with BASIC in it.
One consequence of this cost-cutting exercise was the strange, non-linear screen memory layout...
The Spectrum provided 256 pixels along the x-axis, and 192 lines along the y-axis.
Each byte in screen memory contained eight pixels, with no colour information. The colour was set in the attributes, a sequence of single bytes that were placed in memory immediately after the pixel information, and which specified the foreground and background colours for each 8x8 pixel cell.
Though the attributes are placed linearly, the pixels themselves are not. You can see this quite clearly when Spectrum games load their title screens:
The loader here is simply reading all the bytes from $4000 (16384) to the end of screen RAM at $5AFF, in order. As you can see, the pixels are arranged in a very "interesting" fashion...
This makes accessing screen memory a little bit more involved than simply adding Y lots of TotalXBytes to find the line.
Finding screen line addresses from a Y Coordinate
Given a Y coordinate stored in an 8-bit register:
Y Coordinate bit layout (MSB to LSB):
zzxx xnnn
we can produce a 16-bit address in a register pair by placing these bits as follows:
Register pair bit layout (R are the bits for the x-axis offset, 010 provides the base address $4000):
010z znnn xxxR RRRR
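To make the shuffle concrete, here is the mapping worked through for a single line, written as Z80 comments. The value Y = 100 is an arbitrary choice for illustration, and R is taken as 0 (the leftmost byte of the line).

```z80
; Worked example: Y = 100 = %01100100
; zz  = 01  -> third 1 of the screen (lines 64-127)
; xxx = 100 -> character row 4 within that third
; nnn = 100 -> pixel row 4 within the character cell
;
; H = 010 zz nnn  = %01001100 = $4C
; L = xxx RRRRR   = %10000000 = $80
LD HL,$4C80 ; HL = first byte of screen line 100
```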
Who knows why on earth the engineers designing the Spectrum decided to swap the position of the nnn and xxx bits... but they did.
(I'm sure it was the same kind of reasoning as providing a console with half an altivec unit... ahem.)
Geek Moment - Some code to use this stuff
Here is some Z80 code that makes use of this knowledge, to clear the screen - not in the order as shown in the loader above, but sequentially (from the user's point of view).
; Clear the screen in an ordered, top to bottom fashion
;
; Entry: None
; Exit: A, BC, D, HL all trashed
; (push on stack if needed)
;
; Notes:
;
; This is obviously slower than simply blatting from
; $4000 to $5800, but that would reveal the three sectors
; of the screen AND the alternate line pattern
; (both seen when loading screens)
; This is an example method to show how to address screen
; RAM using a Y coordinate instead, so we can move from
; top to bottom, line by line.
;
clearscreen_ordered:
LD B,192 ; num y lines
XOR A ; clear the accumulator
; (4 T-states and 1 byte, against 7 T-states and 2 bytes for LD A,$00)
LD C,A ; c == y coord
; Here, we create the screen line pointer in HL,
; based upon the given y-coordinate in C.
; The y coordinate needs to be shuffled about a bit to be
; in the correct format for the Spectrum's bizarre
; hardware.
ylinesloop:
LD A,C
AND $7 ; low three bits (nnn) stay
; in the same position in H
LD H,A
LD A,C
AND $38 ; next three bits (xxx) need to be
; shifted left twice,
; and placed in L
; (AND clears carry, so RLA shifts in zeros)
RLA
RLA
LD L,A
LD A,C
AND $C0 ; top two bits (zz) need to be
; shifted right three times,
; and placed in H
; (AND clears carry, so RRA shifts in zeros)
RRA
RRA
RRA
OR H
OR $40 ; also include the base address
; (screen mem starts at $4000)
LD H,A ; HL = ptr to line
; Individual lines are thankfully arranged in a linear
; fashion in memory. We can simply increase the
; pointer by one each time to clear a single line.
LD D,32 ; bytes per line (256 pixels)
xbytesloop:
LD (HL),$00 ; all pixels cleared
INC HL
DEC D
JP NZ,xbytesloop
INC C
DEC B
JP NZ,ylinesloop
RET
This code ignores the attribute memory for convenience's sake (that is linear, and would be a straight blat anyway... though it might be nice to do it in sync with each group of 8 lines... ;-))
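For completeness, the attribute "blat" mentioned above could look something like this. It is only a sketch: the label clear_attributes and the fill value $38 (white paper, black ink, no bright or flash) are assumptions made for illustration.

```z80
; Fill all 768 attribute bytes ($5800-$5AFF) with one value
;
; Entry: None
; Exit: BC, DE, HL all trashed
clear_attributes:
LD HL,$5800 ; first attribute byte
LD DE,$5801 ; destination is one byte ahead of the source
LD BC,767 ; remaining bytes to fill
LD (HL),$38 ; seed value: paper 7, ink 0
LDIR ; each copy carries the seed one byte further on
RET
```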
I bet the hardware guys were thinking 'This is a trivial problem to solve in software, we'll leave it to them to sort out'. Hehe
IIRC, the Speccy inherited much of its screen layout from the ZX80/81. With 2K of RAM, they simply couldn't afford to be wasteful.
The Speccy had more leeway, but presumably they wanted to keep it reasonably compatible while adding functionality.
Nah, literally none of the Speccy's screen layout is from the ZX81.
Actually it is a trivial problem to solve in software and it shows what geniuses the Speccy designers were...
In the words of Toni Baker (if you don't know who she is, google "Mastering Machine Code on your ZX Spectrum"):
"If the print position for a given square has address HL then the eight bytes representing the “pixel expansion” of a character must be stored at addresses HL, HL + 0100, HL + 0200, HL + 0300, HL + 0400, HL + 0500, HL + 0600 and HL + 0700. The instruction INC H is effectively the same thing as ADD HL,0100."
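As a rough sketch of the idea in the quote, the loop below copies an 8-byte character bitmap into one cell, using INC H to step between pixel rows. The label draw_char and the register assignments are assumptions for illustration, and HL must already point at the top row of a character cell.

```z80
; Entry: HL = screen address of the top pixel row of a cell
;        DE = address of 8 bytes of character bitmap
; Exit: A, B, DE, H trashed (H ends up 8 higher)
draw_char:
LD B,8 ; eight pixel rows per cell
draw_char_loop:
LD A,(DE) ; next row of the bitmap
LD (HL),A ; write it to the screen
INC DE
INC H ; same as adding $0100: next pixel row of this cell
DJNZ draw_char_loop
RET
```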
Ah, but that only works for 8 pixel line groups (key phrase being "given square") - I know I didn't show this in this example, but I was well aware you can do this.
It is less useful for more general pixel placement, however, because if I start at an arbitrary Y coordinate, I cannot simply use INC H 16 times to show my sprite.
I still need to know precisely where I am before I can do such operations. It did occur to me that I could test for that instead of the more general method above, but I was doing the article as a retro piece for people who (mostly) never had a Spectrum, but are interested in hardware :-)
Good to see interest in the article, though!
Easiest way to solve this would be a lookup table that maps y coord to an address. It may even be one of the faster ways as well.
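A lookup along those lines might be sketched as follows. Everything here is an assumption for illustration: the labels, the choice of a 192-entry table of 16-bit addresses, and the DEFS directive (some assemblers spell it DS). The table is built once at startup by stepping an address down the screen one pixel line at a time.

```z80
; Build the table. Exit: A, B, DE, HL trashed
build_line_table:
LD HL,line_table
LD DE,$4000 ; address of line 0
LD B,192
build_loop:
LD (HL),E ; store low byte, then high byte
INC HL
LD (HL),D
INC HL
INC D ; next pixel row in the cell
LD A,D
AND 7
JR NZ,build_next
LD A,E ; cell finished: next character row
ADD A,32
LD E,A
JR C,build_next ; crossed into the next third
LD A,D
SUB 8 ; same third: undo the carry into D
LD D,A
build_next:
DJNZ build_loop
RET

; Entry: A = y (0-191)
; Exit: HL = address of line y; A, DE trashed
line_address_lookup:
LD L,A
LD H,0
ADD HL,HL ; two bytes per entry
LD DE,line_table
ADD HL,DE
LD A,(HL) ; low byte
INC HL
LD H,(HL) ; high byte
LD L,A
RET

line_table:
DEFS 384 ; 192 entries x 2 bytes
```

Whether this beats the bit-shuffling version depends on register pressure in the caller, since the lookup itself ties up DE and HL.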
I always wondered if it was a design fault, or intended.
I did in fact try this as an alternative (I also used that technique quite frequently on the Amiga - in spite of its linear frame buffer, it still helped enormously with isometric calculations, for example).
But with the Z80, it actually took longer with a lookup due to the limited number of registers that can be used for memory addressing and for caching coordinate values (I only needed one more register pair, annoyingly - I considered (ab)using the IX/IY registers for the purpose, but they are far too slow).
I ended up having to swap values about between registers and memory, which negated the advantage of the approach.
There may still be a more cunning way to achieve this, though - let me know if you find it! (Preferably with a code example :-))
It’s great to see that younger developers are still interested in learning this kind of stuff. Not only is this the best way of becoming a competent developer, but these older ROMs are also packed with software-math you simply cannot learn by programming an FPU alone.
As for calculating a screen address from an X,Y coordinate pair, the following routine is slightly tighter than the one presented above:
;Input:
; D = Y Coordinate
; E = X Coordinate (byte column, 0-31)
;
;Output:
; DE = Screen Address
;
calculate_screen_address:
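ld a,d ; A = Y (zzxxxnnn)
rla
rla ; xxx now in bits 7-5
and 224 ; keep only xxx
or e ; merge the column into bits 4-0
ld e,a ; E = low byte: xxxRRRRR
ld a,d
rra
rra ; zz now in bits 5-4
or 128 ; plant a 1 that the next rotate moves to bit 6
rra ; A = 010zz...: base $40 plus zz in bits 4-3
xor d ; xor-and-xor merge:
and 248 ; keep bits 7-3 of A,
xor d ; take bits 2-0 (nnn) from D
ld d,a ; D = high byte: 010zznnn
ret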
Note the xor-and-xor bit-merge code above - this was a common technique for avoiding additional round-trip loads on CPUs where ALU ops were bound to a common accumulator.
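In isolation, the merge idiom looks like the fragment below: it keeps the bits of A selected by a mask and takes the remaining bits from D, without needing a spare register. The mask value 248 is just the one used above.

```z80
; result = (A and mask) or (D and not mask)
xor d ; A = A xor D
and 248 ; keep the difference only where the mask is 1
xor d ; mask 1: (A xor D) xor D = A; mask 0: 0 xor D = D
```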
If you already have a valid screen address in DE, then the following routine can be used to increment its Y location by one:
;Input:
; DE = Current screen address
;
;Output:
; DE = (Y + 1) screen address
;
increment_y:
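inc d ; next pixel row within the character cell (nnn + 1)
ld a,d
and 7
ret nz ; nnn did not wrap: still inside the same cell
ld a,e
add a,32 ; nnn wrapped: move to the next character row (xxx + 1)
ld e,a
ret c ; xxx wrapped too: inc d has already carried into zz
ld a,d
sub 8 ; otherwise undo the carry into zz
ld d,a
ret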
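As a usage sketch, increment_y makes it straightforward to draw a tall sprite from an arbitrary starting line. The routine name, the 8x16 sprite size and the register assignments below are assumptions for illustration; the loop relies on increment_y touching only A and DE.

```z80
; Entry: DE = screen address of the sprite's top line
;        HL = address of 16 bytes of sprite data
; Exit: A, B, DE, HL trashed
draw_sprite_8x16:
ld b,16 ; sprite height in pixel lines
sprite_loop:
ld a,(hl) ; next byte of sprite data
ld (de),a ; write it to the screen
inc hl
call increment_y ; DE = same column, one line down
djnz sprite_loop
ret
```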
Whilst there may be few practical reasons to actually clear the screen in linear fashion, the following routine illustrates a slightly different approach from that shown above:
clear_screen_linear:
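ld hl,16384 ; HL = $4000, start of screen line 0
ld a,l ; A = 0 (fill value, and low byte of the current line)
ld d,h
ld e,1 ; DE = $4001
ld b,a ; B = 0
ld (hl),a ; clear the first byte; LDIR smears it along line 0
ld c,30 ; BC = 30 for the first pass
exx
ld b,192 ; B' = number of scan lines
next_scan:
exx
ld l,b ; B is 0 here, so HL = $4000 (line 0 is the source)
ldir ; copy cleared bytes into the current line
ld c,32
ldd ; copy the last byte, leaving BC = 31 for the next LDIR
ld e,a ; rewind E to the start of the line
inc d ; step down one pixel row
ld a,d
and 7
jr z,check_block_end
ld a,e ; restore A = low byte of the line address
exx
djnz next_scan
ret
check_block_end:
ld a,e
add a,32 ; next character row
ld e,a
jr c,check_scan ; crossed into the next third: D is already correct
ld a,d
sub 8 ; same third: undo the carry into bits 4-3 of D
ld d,a
ld a,e ; restore A = low byte of the line address
check_scan:
exx
djnz next_scan
ret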
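NOTE: JR (not JP) is employed in the above code, in order to capitalize on its faster fall-through: a conditional JR that is not taken costs 7 T-states against 10 for JP, although a taken JR costs 12.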
Whilst I'm sure the above code is far from perfect - it's been ~30 years since I last coded Z80 - I have certainly enjoyed this little trip down memory lane.
Oh...I forgot to mention that if you employ the EXX command, BASIC expects the alternate HL' register to be intact upon return.
Great stuff! I had a similar routine for working out the memory location, but yours is faster (xor d parts primarily). Also, I didn't think to make a single line increment routine for valid addresses - much nicer than always using the do-it-all method.
There seems to be a fair bit of interest in this stuff still, and as you say there is a lot to learn even today from the ROM code (I used to have a fantastic ROM disassembly book that explained everything line by line, and I learned more from that than any other source on Z80 coding).
I'm personally probably not as young as you might think, though - I originally learnt the beginnings of coding on the Speccy when very young, but moved on rapidly to other machines. It's interesting to me to go back to it now, with a more solid programming foundation, and see what can be done under those tight constraints.
I think the life this one small article seems to have probably makes a complete project worthwhile, if only for academic purposes! (The Raspberry Pi might make this kind of endeavour very timely, in fact.)
I would love to see the feedback from such a thing (say, a small game, which was my original intent with all this). I really hope I can find the time to do it - watch this space!
The real reason they chose this layout is to speed up VRAM access.
Let's suppose the address bits of an 8x1 pixel byte are 010ttwww yyyxxxxx; then its color attribute address is 010110tt yyyxxxxx. Note that the low byte is the same.
This allows the use of the so-called 'page mode' of the DRAM chips of the era. A DRAM access normally works by feeding the chip a 'row address part' and then a 'column address part' before data bits can be read or written. 'Page mode' means the 'row part' can be left in place across a quick series of accesses, so only the 'column part' has to be fed each time. This saves time. For the first models of the ZX Spectrum it was crucial to speed up VRAM access, because that memory was shared with the CPU. To minimize stalls, the VRAM structure was chosen to keep the pixel bits and their corresponding color attributes within the same 'DRAM page': the low 7 bits of their addresses are the same, so they have the same 'row address part'.
That's the trick. Within this requirement, the layout was chosen to simplify character access: as was said above, incrementing the high byte of the address selects the next pixel line of the same character. That is it.
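Given those two bit patterns, converting a pixel byte address to its attribute address only touches the high byte. A minimal sketch (the label pixel_to_attr and the use of HL are assumptions for illustration):

```z80
; Entry: HL = address of a pixel byte (010ttwww yyyxxxxx)
; Exit: HL = address of its attribute (010110tt yyyxxxxx); A trashed
pixel_to_attr:
LD A,H
RRCA
RRCA
RRCA ; tt now in bits 1-0
AND 3 ; keep only tt
OR $58 ; add the attribute base (%01011000)
LD H,A ; L is unchanged
RET
```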