Backup of https://joelrees-novels.blogspot.com/2020/09/33209-rocks-moving-ahead.html.
Chapter 14.1: Rocks -- Bit Multiply
Chapter 14.2: Rocks -- Moving Ahead
[Please pardon the layout change. Google is being the 800-pound prima donna and making all blogspot users use a buggy blog editor now.]
"Are you sure that there's nothing in what you and Julia just went over that you won't be claiming as IP?" Bill wrinkled his forehead.
I shrugged. "Trying to claim IP on this kind of thing is only one step
beyond trying to claim IP on binary addition. No real circuitry to base
claims on, nothing but ideas and math."
(We won't mention a very famous, wealthy corporation that did, in fact,
attempt patent claims on binary addition in an ALU, buried in its claims
concerning a programming language and programming environment they
developed and sold. We also won't dwell too much here on the fact that,
once upon a time, ideas, math, and algorithms were considered outside
the domain of patents in the USA.)
"Would it be possible," I asked, pressing my own agenda, "to reduce the
cycle count to one on reads and two on writes in the direct page RAM,
without blowing the transistor budget on a 6805 or 6801? That alone
could better than double the speed of software multiply and divide."
There was a bit of uncomfortable chuckling and clearing of throats.
"No?"
Several engineers looked at Pete. He shrugged.
Tobias tilted his head apologetically. "We'd have to fix the prefetch/decode circuit so it's a real pipeline of depth one."
"It's not a real pipeline?"
"The eight-bit designs don't have a place to keep the instruction in its partial and fully decoded states, so we go back and redo the prefetch if we don't use it immediately."
"Oh."
"And then we'd
have to test it. Testing is what we get stuck on budgeting time for. You
should talk with your brother about that."
"Denny's not in charge of test, is he?"
"No, but he could tell you something about the backlog."
Our Bob spoke up, "Could interns help with the grunt work?"
Motorola's Bob exchanged glances with Bill, then turned to Jesse. "Should we look at that?"
"Maybe we should," Jesse frowned. "I'll discuss it with my group on
Monday, see if we can separate something out that a non-engineering tech
could handle."
"Remember that these guys seem to have a bit more of a handle on the tech than our usual crop of interns."
"We'll take that into consideration."
I ventured a bit further. "I'm not just thinking of fast direct-page
RAM, though. The 6809 and the 68000 have enough index registers to
support separating the parameter stack from the return pointer stack,
and that means one might profitably attach a hysteric cache to both
pointers, with the appropriate control signals."
That got me looks of confusion and amusement.
"A cache that tracks the stack pointer with hysteresis." I borrowed Julia's notepad again and sketched out something like this:
Ms. Philips reached over and lifted the notepad and waved it at me. "I am sure this is IP."
"It's just a lousy diagram of a spill-fill cache tied to a stack pointer. Calling it hysteric is a bit of a pun, is all. Not even a really good pun, at that."
Jesse started chuckling. "If it works," he commented, "it'd be more appropriate to call it an anti-histrionic stack cache."
A number of other engineers echoed his chuckles of appreciation.
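[Technical aside, outside the story: one way to picture a "hysteric" stack cache is as a small on-chip buffer holding the top of a stack, which spills several cells to memory when it fills and refills several cells when it runs dry, so that a program working near one stack depth doesn't bounce memory traffic back and forth. The Python toy below is only a sketch under assumed parameters (an 8-cell cache that spills or refills 4 cells at a time); the class and parameter names are made up for illustration and are not the narrator's sketch or any real Motorola design.]

```python
class HystereticStackCache:
    """Toy spill/fill cache for the top of a stack (names and sizes are hypothetical)."""

    def __init__(self, capacity: int = 8, spill_count: int = 4, fill_count: int = 4):
        self.capacity = capacity
        self.spill_count = spill_count  # cells written to memory when the cache overflows
        self.fill_count = fill_count    # cells read back when the cache runs dry
        self.cache = []                 # on-chip cells; cache[-1] is the top of stack
        self.memory = []                # spilled cells; memory[-1] was spilled most recently
        self.mem_writes = 0
        self.mem_reads = 0

    def push(self, value) -> None:
        if len(self.cache) == self.capacity:
            # Spill the deepest cells, several at once, so the next few pushes are free.
            spilled = self.cache[:self.spill_count]
            del self.cache[:self.spill_count]
            self.memory.extend(spilled)
            self.mem_writes += len(spilled)
        self.cache.append(value)

    def pop(self):
        if not self.cache:
            # Refill several cells at once, so the next few pops are free.
            n = min(self.fill_count, len(self.memory))
            if n == 0:
                raise IndexError("stack underflow")
            self.cache = self.memory[-n:]
            del self.memory[-n:]
            self.mem_reads += n
        return self.cache.pop()


stack = HystereticStackCache()
for i in range(9):        # the ninth push overflows the 8-cell cache once
    stack.push(i)
print(stack.mem_writes)   # 4
for _ in range(100):      # working near one depth causes no further spills
    stack.push(0)
    stack.pop()
print(stack.mem_writes)   # 4
```

[With these numbers a spill leaves the cache about half full, which is where the hysteresis comes from: the cache has to move a good distance in either direction before memory gets touched again.]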
Ms. Philips and Ms. Steward put their heads together and started working on something. Bill and Motorola's Bob refrained from comment, keeping an unobtrusive eye on what they were working out.
I added, "If the cache could also be accessed in single-cycle reads and two-cycle writes, local variables would be almost as good as registers."
Bill leaned forward. "We've taken a lot of your time on this blue-sky brainstorming, but Bob and I wanted to get your opinion on something."
I let the amusing, but perhaps meaningful, mixed metaphor pass and nodded.
"If you were designing a mass-market personal computer using an existing CPU, would you use Intel's 8086 or 8088?"
It was my turn to be confused. "Sloppy segments. Instruction set is an improvement over the 8080, but not much. No. I'd use the 6809 for the instruction set and register set before I'd use the 8088, even though it's a bit slower on multiplies and a lot slower on divides, and would require bank-switching or the 6829 MMU for a PC."
(PC? I had
become familiar with the abbreviation in Japan while I was there as a
missionary. How quickly people forgot, in our real world, that there was
more than half a decade of PCs before the IBM PC.)
"And I'd use the 68000 over the 8086 even though the 68000 costs significantly more, because the 8086 just doesn't make sense. It requires 16-bit-wide memory, but it still gives only 16-bit addresses without playing bad-programming-practice games with your code. Sloppy segments are a security booby-trap, and a bug generator."
Bob nodded. "Are you
sure your antipathies are not colored by family loyalties? Tech doesn't
forgive misplaced family loyalties."
"Family loyalties may induce some of the heat, but, really, if they want to map 16-bit logical addresses into a 20-bit physical address space, they should make the segments fully 20 bits wide. 24 or 32 bits wide would make more sense, even if the top four or twelve bits aren't brought out of the package or don't even physically exist. And the segments should have limit registers, as well, if they're going to mean anything besides crude bank-switching with the improvement of being able to tie specific banks of memory to specific index registers, including the instruction pointer. Half-baked MMU."
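[Technical aside: the 8086 forms its 20-bit physical address by shifting a 16-bit segment register left four bits and adding a 16-bit offset, wrapping around at one megabyte. The Python sketch below (all function names are hypothetical) shows what the narrator means by sloppy: the same byte can be reached through thousands of different segment:offset pairs, and nothing stops an offset from running past whatever the segment was meant to hold. The base-and-limit function is only an illustration of the kind of check the narrator is asking for, not any shipping CPU's mechanism.]

```python
def real_mode_address(segment: int, offset: int) -> int:
    """8086-style translation: (segment << 4) + offset, wrapped to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF


def alias_count(physical: int) -> int:
    """Count segment values that reach `physical` with some 16-bit offset (ignoring wraparound)."""
    count = 0
    for segment in range(0x10000):
        offset = physical - (segment << 4)
        if 0 <= offset <= 0xFFFF:
            count += 1
    return count


def limit_checked_address(base: int, limit: int, offset: int) -> int:
    """Hypothetical full-width base register plus limit register: out-of-range offsets fault."""
    if offset > limit:
        raise IndexError(f"offset {offset:#x} exceeds segment limit {limit:#x}")
    return base + offset


print(hex(real_mode_address(0x1234, 0x0005)))  # 0x12345
print(hex(real_mode_address(0x1000, 0x2345)))  # 0x12345, the same byte
print(alias_count(0x12345))                    # 4096
```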
"But potentially useful, no?"
"With extreme caution."
"How about segment registers for the 6809 or 68000?"
"You can use the 68000's address registers for segmentation if you want, although the segment limit problem remains, and there is a memory cycle penalty if you don't handle the segments well."
I stopped to think my next words through.
"If I were adding segmentation to the 6809, I'd want full 32-bit segment registers. The limit registers would be as wide as the index registers, so if you had a derivative with only 16-bit-wide index registers, the limit registers would also be 16-bit. Instead of a segment override prefix like the 8086's, I'd just have the register-to-register transfer instructions move the segment and limit registers, as well."
Bill and Bob were both nodding. Bill asked, "You've taken a look at the 68008, haven't you?"
"Yeah. But I'm letting Mike be the one to have fun with it."
Mike snickered.
"If it were available in, say, three months, in small lots, would you use it?"
"There are a lot of things that a 4-megahertz 68000 is going to do no faster than a 1-megahertz 6809, because of the memory cycle speed, the extra width of instructions, and other things. Many of those things are precisely what a personal computer is going to be used for. A 4-megahertz 68008 is going to run at about half to two thirds of the speed of the 68000, I think. The only advantage is the megabyte address space, which really won't be quite enough in the near future."
Bill and Bob both frowned.
I continued, "Now, if we had a further evolution of the 6801 with an additional 8 bits on the top of the index register and program counter, a long jump, and either a long load of X or a transfer A to XHi or some such, at a price not too much higher than the 6801, that would make a good cheap personal computer. Or that evolved 6809 with PC, X, Y, U, and S extended by 16 bits, and new load effective address modes to make the long addresses accessible, again, at a price not much higher than the 6809, that would be ideal for the current market."
"One megabyte is too tight?" Bob asked.
"64 kilobytes is too tight?" Bill asked.
"Look at the 6847. Julia and I and my sister write reports using that because we are patient with the narrow window on the text, and we like the ability to type, think, erase, and type again. But my mom just gets frustrated, and my dad barely avoids going to sleep. People with no reason to be patient won't get it, and they are the ones who will be buying most of the personal computers sold. A personal computer has to be able to show the equivalent of a typewritten page on its screen, at minimum, or at least have a clear upgrade path to get there. That's what's stalling Radio Shack's Color Computer in the market right now. Besides the lack of an MMU."
Pete said, "But that would only be a 2-kilobyte screen buffer. I've seen the Japanese personal computers, and they're pretty functional with only 16 bits of address."
"How functional?"
"All the useful characters."
"Not by a long shot. Fewer than two thousand. The real count for a good newspaper is estimated at over 3,000 characters, but they aren't taking into account that what will be included in that 3,000 will vary from month to month. And even newspapers will use really oddball characters regularly, when they need something more precise in meaning, and if you include the ability to display all the oddball characters, you're well into 9,000 characters or more. Add historical characters and you easily triple that count. Chinese is on the order of a hundred thousand characters. Sixteen bits doesn't cut it, except for very limited purposes like cash register receipts and utility bills."
"You can't be serious."
"I've lived over there. I know the hype they give the current crop of PCs and the sell-job they give the new student of the language, and I know the reality when you start reading serious literature."
"How does anyone remember them all?"
"They don't, but that's going to be one of the things a real personal computer will be good for, helping them find and use the ones that they have trouble remembering. The personal computers they have now are very limited in scope relative to what they need, and what they will have in the future. They sell because they don't have anything better."
I continued after a moment's thought, "If the characters are to have decently defined glyphs, you want bit-mapped characters that are 32 by 32 pixels, not 16 by 16. 10,000 characters at 128 bytes per glyph is going to eat up a megabyte of address space pretty quickly." (Vector glyphs were still a bit exotic for a conversation like this, that year.)
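[Technical aside: the glyph arithmetic works out as follows, assuming one bit per pixel and no compression. A 16-by-16 glyph is 256 bits, or 32 bytes; a 32-by-32 glyph is 1,024 bits, or 128 bytes; and 10,000 of the larger glyphs come to 1,280,000 bytes, more than the 1,048,576 bytes a 20-bit (one megabyte) address space holds. A few lines of Python, with a made-up helper name, check the numbers.]

```python
def font_bytes(glyph_count: int, width: int, height: int, bits_per_pixel: int = 1) -> int:
    """Bytes needed for an uncompressed bitmap font (helper name is illustrative)."""
    bits = glyph_count * width * height * bits_per_pixel
    return (bits + 7) // 8  # round up to whole bytes


ONE_MEGABYTE = 1 << 20  # everything reachable with 20 bits of address

print(font_bytes(1, 16, 16))                      # 32
print(font_bytes(1, 32, 32))                      # 128
print(font_bytes(10_000, 32, 32))                 # 1280000
print(font_bytes(10_000, 32, 32) > ONE_MEGABYTE)  # True
```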
"And graphics." I pointed at the TV. "How many kilobytes is
the graphics mode screen buffer on the 6847, for just fuzzy monochrome
on a color TV?"
"Six."
"How big would the same-resolution graphics in four colors be, if the 6847 supported it, or if you modified the output and added the RAM?"
"An extra bit per pixel, so twelve."
"That takes 12K out of the program space on the 6801 or 6809, just for four colors, and everyone will want a much bigger gamut of color. And resolution at least double what the 6847 offers. 64K was tight to start with, and a megabyte will soon be tight for color graphics. One advantage, I guess, to the 68008 is the implicit upgrade path to the 68000, but 24 bits of address will shortly be too few, also."
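[Technical aside: the 6847's highest-resolution graphics mode is 256 by 192 pixels. At one bit per pixel that is 6,144 bytes, the "six" kilobytes in the dialogue, and at two bits per pixel, for four colors, 12,288 bytes. Reading "resolution at least double" as doubling both dimensions (an assumption; the dialogue doesn't say which), a 512-by-384 screen in four colors would need 49,152 bytes, three quarters of a 64K address space, and sixteen colors would need 98,304 bytes, more than the whole 64K. A small Python sketch of the arithmetic, with an illustrative function name:]

```python
def frame_buffer_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes for a packed-pixel frame buffer (function name is illustrative)."""
    return width * height * bits_per_pixel // 8


ADDRESS_SPACE_64K = 1 << 16

for label, w, h, bpp in [
    ("6847 monochrome", 256, 192, 1),
    ("same, four colors", 256, 192, 2),
    ("double each axis, four colors", 512, 384, 2),
    ("double each axis, sixteen colors", 512, 384, 4),
]:
    size = frame_buffer_bytes(w, h, bpp)
    print(f"{label}: {size} bytes, {size / ADDRESS_SPACE_64K:.0%} of 64K")
```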
"16 megabytes too tight? RAM is expensive," Sharon pointed out.
"If you don't want to be a foundry for other companies' designs, you have to have a base technology where you develop your testing and manufacturing techniques. That's RAM. It pays for itself without even being on the market by helping you get your other products right, faster."
"That kind of thinking'll push the price of RAM right through the floor," Motorola's Bob said with a frown.
"But you won't care, because RAM pays for itself in shortening your development cycles for your more profitable products. RAM should be like candy, anyway."
"RAM should be like candy." Bill harrumphed. "I think you've said that before." He reached into his briefcase and pulled out an advance information datasheet and handed it to me. "Has Denny shown you this?"
The datasheet described the 68010. I scanned it quickly. "No. Can Julia and Mike also take a look at this?"
"Sure. And anyone else in this room, really."
I showed Julia the changes in the addressing modes, allowing 32-bit constant offsets, and the short-loop cache mode.
She tilted her head and grinned apologetically. "I guess it's an improvement?"
"Definitely. And the exception frame looks more manageable."
I passed it to Mike, and Bob and Jennifer looked over his shoulder.
After a quick scan, he looked up. "Why isn't the 68008 based on this? The short loop execution mode would be especially useful when memory's only eight bits wide."
"Timing. Market and management." Bob shrugged.
"If I were you guys, I'd hold the 68008 off until I could make it an 8-bit version of the 68010. In spite of the fact that I personally really want to get my hands on one."
I nodded my agreement with Mike. "Or, if
you just have to have an eight-bit 68000 now and this allows testing to
complete more quickly, plan and advertise a 68018 that will be an 8-bit
68010."
"What if we have plans for adding more addressing modes
and wider math, and dropping the loop mode for a small general cache, in
a CPU in the early planning stages?" Bill's face was unreadable. "Not
saying we do, but what if?"
I took a deep breath. "You know, extended mode was added to the index post-byte for doing memory indirect on absolute addresses. I'm wondering how much more it would have cost to include direct page in the index post-byte, as well. That would allow using the load effective address instruction to get the address of a direct page variable without using the accumulator. But adding much more would get into negative trade-offs."
"You're not talking about the 6809?"
"Exactly. The 6502 needs two kinds of memory indirect because it's so register-poor. And those two kinds were a very strategic choice. The 68000 already effectively has both kinds, because it has lots of indexable registers. It doesn't need more, not considering how much it will cost to test and get right. And it especially doesn't need addressing modes whose effect can be achieved just as quickly with existing instructions and a register or two. Sure, eight address registers is a shade tight for some uses, but you don't want to clutter the upgrade path to a 64-bit CPU with a bunch of untestable addressing modes."
There was a chorus of cleared throats and exchanged glances.
"Instead, would it cost too much to somehow allow engineers to experiment with variations of your primary designs, to push the envelope?"
"What do you mean?" asked Bill.
"Like a skunkworks, but officially supported."
Motorola's Bob leaned forward. "Assuming we dare put our fab facilities at risk, where are we going to get the manpower?"
"Just let your engineers take up to eight hours a week on blue-sky projects on company time, no questions asked."
Sharon shook her head. "We're already short of time."
"Blue-sky projects give you a chance to figure out better ways to do things. You'll end up being more efficient and closer to on-schedule."
"Hard to believe," Pete complained.
I shrugged. "Well, you guys have the experience, not me. I've said my opinion."
"Okay, we have another addendum." Ms. Philips and Ms. Steward looked up from their writing and interrupted, and Ms. Philips showed Bill what they had. He passed the addendum to Bob, and Bob looked it over and passed it to me.
It consisted of mutual permission to use ideas and
concepts we had talked about over the course of a couple of hours that
night with a promise of best effort to offer each other consideration.
The five of us figured that was more than agreeable, and added it to our
agreement contracts.
As we wrapped up, Jesse asked me, "Could you put a Forth interpreter on a 6805?"
"Self-hosted?"
"Of course."
Julia looked up from the notes she and Ms. Steward were arranging to make copies of.
"Self-hosted?" she asked. "That's where the language
runs on the same processor that compiles the code, kind of the opposite
of the cross-assembler that runs on the 6800 but produces code for the
6805?"
I nodded. "Yeah. Maybe self-hosted could be done, if you have enough ROM and RAM. The virtual
instruction pointer needs more than 8 bits, but self-modifying code
might work -- using an extended mode jump where the code writes over the
jump address before executing the jump. Cheating, but it might work."
Jesse smirked and I chuckled.
Julia asked, "Can you show me an example?"
She handed me her pad again, and I wrote out some code:
NEXTIP
        LDA     IP+1
        STA     SELFMO+2        ; direct-threaded
        LDA     IP
        STA     SELFMO+1
SELFMO
        JMP     $EEEE           ; provisional target address
* The 16-bit address $EEEE just got overwritten by the target address.
She looked at it with a frown. "What's the purpose in this?"
"It's the part of the virtual machine emulator where the CPU calls the
code to emulate each virtual instruction. And each emulation routine
ends in a jump back to NEXT."
She tilted her head. "Sorry. I'm totally lost."
"For example, the routine to add two numbers on the stack would look something like this:"
PLUS
        LDX     USP             ; parameter stack
        LDA     3,X             ; low bytes
        ADD     1,X
        STA     3,X
        LDA     2,X             ; high bytes
        ADC     ,X
        STA     2,X
        INCX                    ; drop argument
        INCX
        STX     USP             ; update the stack pointer
        JMP     NEXT
"The routine for a jump would look something like this:"
BRANCH
        LDX     IP              ; IP is pointing at the in-line offset.
        LDA     IP+1
        ADD     #2              ; bump past offset
        BCC     BRANC0
        INC     IP
BRANC0
        ADD     1,X             ; add the low byte of the offset
        STA     IP+1
        LDA     IP
        ADC     ,X              ; and the high byte
        STA     IP
        JMP     NEXT
"And the routine for nesting calls would look something like this:"
CALL
        LDX     RSP             ; return address stack
        DECX                    ; room for old IP
        DECX
        STX     RSP
        LDA     IP+1
        ADD     #2              ; bump past call address
        BCC     CALL0
        INC     IP
CALL0
        STA     1,X             ; tuck the address to return to away
        LDA     IP
        STA     ,X
And then I was stuck.
"Wait. This isn't going to work."
Jesse chuckled again.
I went back to the NEXT routine. "Yep. I'm forgetting to actually get the jump address in the NEXT routine, and maybe a bit more."
Jesse agreed with a grunt.
I laughed. After staring at the code for NEXTIP for a minute or two while Jesse smirked and Julia looked puzzled, I shook my head. "Not having a sixteen-bit pointer is a real pain."
Julia met my eyes and sighed. "Don't worry about it. I don't think the eight-kilobyte maximum address space is going to leave much room for a program to run in, anyway."
"Yeah, but they're going to eventually make a chip with a full sixteen-bit wide CPU. I want to convince myself of this."
Her forehead creased.
"We need to grab two bytes pointed at by the sixteen-bit IP in the direct page."
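NEXTIP
        CLR     NXADD2+1        ; carry into the high byte of the second address
        LDA     IP+1
        STA     NXADD1+2        ; low byte of IP for the first fetch
        INCA
        STA     NXADD2+2        ; low byte of IP+1 for the second fetch
        BNE     NEXT00
        INC     NXADD2+1        ; low byte wrapped, so carry one
NEXT00
        LDA     IP
        STA     NXADD1+1        ; high byte of IP for the first fetch
        ADD     NXADD2+1        ; add the carry (0 or 1)
        STA     NXADD2+1        ; high byte of IP+1
NXADD1
        LDA     $EEEE           ; extended mode, address patched above
        STA     NXJMP+1
NXADD2
        LDA     $EEEE           ; extended mode, address patched above
        STA     NXJMP+2
NXJMP
        JMP     $EEEE           ; provisional target address
* Had to overwrite lots of addresses.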
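[Technical aside: the scheme the narrator is fumbling toward is easier to see without the 6805's eight-bit index register getting in the way. The Python toy below models a threaded inner interpreter with sixteen-bit, high-byte-first cells. It assumes, as the BRANCH and CALL listings imply, that NEXT fetches the cell at IP and leaves IP pointing just past it; that BRANCH's in-line offset counts from the cell after the offset; and that CALL saves the address past its in-line target on the return stack. The opcode values and the LIT, EXIT, and HALT routines are inventions for the sketch, not part of the story's code or of any particular Forth.]

```python
# Hypothetical "addresses" standing in for the machine-code routines.
LIT, PLUS, BRANCH, CALL, EXIT, HALT = 0xF000, 0xF002, 0xF004, 0xF006, 0xF008, 0xF00A


class ThreadedVM:
    def __init__(self, size: int = 256):
        self.mem = bytearray(size)
        self.ip = 0            # virtual instruction pointer
        self.pstack = []       # parameter stack
        self.rstack = []       # return-address stack
        self.running = False
        self.routines = {LIT: self.do_lit, PLUS: self.do_plus, BRANCH: self.do_branch,
                         CALL: self.do_call, EXIT: self.do_exit, HALT: self.do_halt}

    def cell(self, addr: int) -> int:
        # Two bytes, high byte first, the way the 6800 family stores addresses.
        return (self.mem[addr] << 8) | self.mem[addr + 1]

    def load(self, addr: int, cells: list) -> None:
        for i, value in enumerate(cells):
            self.mem[addr + 2 * i] = (value >> 8) & 0xFF
            self.mem[addr + 2 * i + 1] = value & 0xFF

    def next(self) -> None:
        # NEXT: fetch the routine address at IP, step IP past it, then "jump" there.
        routine = self.cell(self.ip)
        self.ip += 2
        self.routines[routine]()

    def run(self, start: int) -> list:
        self.ip, self.running = start, True
        while self.running:
            self.next()
        return self.pstack

    def do_lit(self) -> None:      # push the in-line cell and skip over it
        self.pstack.append(self.cell(self.ip))
        self.ip += 2

    def do_plus(self) -> None:     # add the top two cells, 16-bit wraparound
        b = self.pstack.pop()
        a = self.pstack.pop()
        self.pstack.append((a + b) & 0xFFFF)

    def do_branch(self) -> None:   # IP + 2 + signed offset, as in the BRANCH listing
        offset = self.cell(self.ip)
        if offset & 0x8000:
            offset -= 0x10000
        self.ip = self.ip + 2 + offset

    def do_call(self) -> None:     # save the address past the in-line target, then go there
        self.rstack.append(self.ip + 2)
        self.ip = self.cell(self.ip)

    def do_exit(self) -> None:     # resume at the saved address
        self.ip = self.rstack.pop()

    def do_halt(self) -> None:
        self.running = False


vm = ThreadedVM()
vm.load(0x00, [LIT, 2, LIT, 3, CALL, 0x40, HALT])  # main thread
vm.load(0x40, [PLUS, EXIT])                        # a tiny "colon definition"
print(vm.run(0x00))                                # [5]
```

[Python has no ROM, so Jesse's trick of splitting NEXT between ROM and patched RAM has no counterpart here; the model only shows the order in which IP is read, advanced, saved, and restored.]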
I sighed. "And that's going to run us out of RAM."
Jesse let out a horse laugh.
"I guess this needs to be done a bit more simply."
"No. I think you nailed it. But put the code up to NXADD1 in ROM, followed by a jump to NXADD1 in RAM." He continued to chuckle.
Julia said, "It's okay. I don't care. We're all tired. Let's go home, or, well, back to your brother's place."
"But I want to work the rest of this out. Borrow from ..."
She took the Forth listing I had picked back up, her pencil, and the sheet of paper I was trying to work on out of my hands while Jesse laughed.
"You got a real jewel there, Joe," he said. "You better
listen to her. And don't worry about the Forth on the 6805. That's about
as good as it gets, and as Julia says, it's not much use until we have a
6805 MPU with fourteen bits of address. And I look forward to working
with you as an intern, and having you join us when you graduate. I like
the way you think. I think we all do." He looked around at the engineers
and his managers, and everyone nodded in agreement.
I suddenly turned Japanese and ducked my head. "Sorry. I mean, thanks."
Chapter 14.3: Rocks -- what?
[Backed up at
https://joel-rees-economics.blogspot.com/2020/09/bk-33209-rocks-moving-ahead.html.]