After probing the waters in this thread:
https://www.applefritter.com/content/how-important-are-iwm-features-apple-iic
I finally got enough motivation to proceed with reverse engineering the IWM myself. I'm well aware that others did that long ago, as is obvious from the above thread, but everything I found on the usual sites like GitHub is Verilog or VHDL based, and I can't use that with my much older (early 1980s era) tools for the smaller PLDs I use. ispLEVER could of course do Verilog / VHDL, but that tower PC is down and needs repair. And having an IWM in an ispLSI1016 would not help me much, as I only have a few of them. And no, I will never again buy any ICs from Chinese vendors. Most are fake and a total waste of time and money. Smaller PLDs I have in huge numbers, and they have never left the USA since they were made.
Here is the current state of the IWM reverse engineering work:
Over the weekend, I finally built a hardware rig to exercise the real IWM, see here:
The illuminated yellow LED No. "0" means that stepper motor phase 0 is on, and the illuminated green LED means that drive #1 is selected and its motor is running (of course, there is no floppy disk drive attached). You may notice that the hardware is based on the MMU test rig I used for this work:
https://www.applefritter.com/content/uncle-bernies-iou-and-mmu-substitutes-plds
... just that I pulled some TTLs and added a DIL-28 socket for the IWM. What is left in terms of TTLs is a 74LS14 hex Schmitt Trigger and a 74LS374 octal DFF (hidden under the grey probe clip). The '374 produces all the slower input signals for the IWM, such as RESET, RDDATA, SENSE, A3, A2, A1, A0. The faster signals like the FCLK and Q3 clocks and the /DEV device enable signal are generated directly from the control port of the LPT printer port and go through the 74LS14. Here is how FCLK and Q3 look on a scope:
The time base is 2us, so the clock period of 4us at FCLK is as fast as I can get when generating all the timing by software. What you can see on the scope is one CPU cycle. The lower trace is a marker signal for CPU cycle start. The CPU cycle ends with the last FCLK falling edge.
The software, running on a notebook computer under DOS, contains a routine which feeds any desired RDDATA bit stream to the IWM. By reading the shift register of the IWM (every CPU cycle) I can tell exactly how the data separator works.
This primitive rig comprising only two TTLs and a DIL-28 socket for the IWM has all the capabilities needed to explore all modes of the IWM, which are: 7M or 8M clock, SLOW or FAST bit cells, synchronous or asynchronous mode (and PORT mode), and reading and writing of floppy disk data streams.
So of the two weeks I originally planned for reverse engineering the IWM, 6 1/2 days have already been consumed (the work started in earnest on 5th of March, 2024). Alas, I had some other things to do in and around the house, so it may have been only 3 days of work on the IWM reverse engineering, and the rest were household chores. But within this small amount of time, I have built the test rig and written the low level driver software for it. I'm already getting data out which looks almost as if it works as expected. But a deeper analysis of course involves much more programming work.
Once I have confirmed / corrected my state diagrams for the control state machines I suspect to be within the IWM, I can proceed to synthesize the logic for the IWM substitute and integrate the gate level logic equations into the software driving the rig. Then, the original IWM and my equations will run in lockstep and any difference will be automatically discovered and flagged.
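The lockstep comparison can be sketched in C. This is a minimal illustration under assumptions, not the actual rig software: each device is hidden behind a hypothetical `step_fn` callback (one call per FCLK, returning the 8-bit data bus value), so the same harness could compare a model against the real IWM behind the LPT port, or, as here, against a second model. `toy_shift` is a made-up stand-in device.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical device interface: advance by one FCLK with the given
   input bits (e.g. RDDATA, A3..A0 packed into 'inputs') and return the
   8-bit value seen on the data bus. 'ctx' points to the device state. */
typedef unsigned char (*step_fn)(void *ctx, unsigned inputs);

/* Feed the same stimulus to both devices for n FCLK cycles and flag
   every cycle where their outputs differ. Returns the number of
   mismatches; *first receives the first mismatching cycle, or -1. */
static long run_lockstep(step_fn real, void *real_ctx,
                         step_fn model, void *model_ctx,
                         const unsigned *stimulus, long n, long *first)
{
    long mismatches = 0;
    assert(n >= 0 && first != NULL);
    *first = -1;
    for (long t = 0; t < n; t++) {
        unsigned char r = real(real_ctx, stimulus[t]);
        unsigned char m = model(model_ctx, stimulus[t]);
        if (r != m) {
            if (*first < 0)
                *first = t;
            mismatches++;
            printf("FCLK %ld: real=$%02X model=$%02X\n",
                   t, (unsigned)r, (unsigned)m);
        }
    }
    return mismatches;
}

/* Toy stand-in device for demonstration only: an 8-bit register that
   shifts in bit 0 of 'inputs' on every step. */
static unsigned char toy_shift(void *ctx, unsigned inputs)
{
    unsigned char *sr = ctx;
    *sr = (unsigned char)((*sr << 1) | (inputs & 1u));
    return *sr;
}
```

Starting one toy register at $00 and the other at $40 and feeding three zero bits gives a single flagged mismatch at FCLK 0, after which both copies agree again: a toy version of a model falling out of lockstep and recovering.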
This reverse engineering approach is a little bit different from the MMU approach, because I already have the RTL (and the logic equations) for the DISK II floppy disk controller. So what is left is to put in the changes for the extra functions within the IWM, rather than starting from scratch. This knowledge base is also the reason why I only allocated 2 weeks for reverse engineering the IWM. It's basically the same as the DISK II controller but with two octal latches / pipeline registers added: one to hold the data byte to be written to floppy disk, and one to hold the data byte that was read from floppy disk. In between the two octal latches sits the same shift register as in the DISK II controller (functionally). The octal latches are in transparent mode when the IWM is in the synchronous (legacy) mode where it mimics the DISK II, with a small exception related to the "freeze" period where the shifting stops so the 6502 gets enough time to grab the data byte. These are the little added details I need to figure out.
I'm very curious about what I will find out about the data separator of the IWM vs. the DISK II. If it's not the same STG in legacy mode, then Apple shot themselves in the foot (again), because the original STG seen in the DISK II has some ugly warts which may or may not be exploited by copy protection formats. In my exploration work on the DISK II I discovered that these "ugly warts" can be removed from the STG, and the resulting floppy disk controller is cleaner and works fine with all "official" Apple GCR. But I doubt it would work with all the copy protections out there, so I kept these "warts" in my PLD substitutes for the original DISK II controller. I also intend to put the same STG into my own flavor of IWM substitute, and to switch to another STG in all the other (non-DISK II) modes.
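A rough C sketch of the read side of the data path described above, under stated assumptions: a DISK II-style shift register with a write latch in front of it and a read latch behind it, where the read latch is transparent in synchronous (legacy) mode and, in latched mode, is assumed to capture only a completed byte (MSB set). The "freeze" exception, SR clearing and the exact update timing (the "little added details") are deliberately not modeled, and all names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only: names and update rules are assumptions. */
struct iwm_datapath {
    unsigned char write_latch; /* byte from the CPU, to go to disk (unused here) */
    unsigned char sr;          /* shift register, DISK II-like */
    unsigned char read_latch;  /* byte presented to the CPU when reading */
    bool transparent;          /* true in synchronous (legacy) mode */
};

/* Shift one read bit into the SR (shifting left, MSB-first assembly). */
static void dp_shift_in(struct iwm_datapath *dp, int bit)
{
    assert(bit == 0 || bit == 1);
    dp->sr = (unsigned char)((dp->sr << 1) | bit);
    /* A transparent latch simply follows the SR; in latched mode we
       assume it only captures a completed byte (MSB set). */
    if (dp->transparent || (dp->sr & 0x80))
        dp->read_latch = dp->sr;
}

/* What the CPU sees when it reads the data register. */
static unsigned char dp_read(const struct iwm_datapath *dp)
{
    return dp->read_latch;
}
```

In transparent mode the CPU sees every intermediate SR value; in the assumed latched mode it sees $00 until the eighth bit of, for example, $D5 arrives, and then $D5.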
Stay tuned !
- Uncle Bernie
https://github.com/mamedev/mame/blob/master/src/devices/machine/iwm.cpp
... but it's C++ and not any synthesizable RTL. It could be analyzed and then rewritten as synthesizable RTL, but this may take longer than starting from scratch, based on the real hardware. It's also about the (software) infrastructure needed around the iwm.cpp core to make it work. No way to run MAME on a 640 kByte RAM DOS machine.
What I want is a synthesizable IWM which is based on slight changes/additions to my existing RTL and test code base for the DISK II. The work of the MAME coders is too far away from the actual hardware to make that a viable path for me.
But maybe one of those MAME coders could answer the question about the compatibility of the IWM vs. the DISK II controller when it comes to copy-protected Apple II floppy disks. I have already found enough differences to suspect that at least some copy protections may not work on the IWM (see my next post).
- Uncle Bernie
Running the first disk nibble / data separator tests, I already found evidence that the IWM is a different animal than the DISK II controller. This may have profound and dire consequences for copy-protected Apple II disks.
The most striking difference is the behaviour of the MSB when reading from floppy disk. In the DISK II, the shift register ("SR") shifts data in to the left until it is "full", meaning the MSB is set. A program looking at the SR contents can see the MSB coming as it gets shifted through the SR. When the MSB is set, the shift register in the DISK II stops shifting while the data separator waits for the next "1" bit coming from the floppy disk. This typically takes 4 us. Then, for the next 6 us (12 Q3 cycles), it watches whether another "1" comes. Only after that time does it clear the SR and shift in "11" or "10", depending on what it saw. This temporary "stall" of the SR gives the 6502 just enough time to capture the finished disk byte, which always has the MSB set.
The IWM does it differently. Once the leftmost bit (bit #6) should be shifted into the MSB (bit #7), it disappears for exactly one FCLK cycle. And then it reappears. And then the shift register stalls as in the DISK II.
Assuming we read in a bit sequence 11111001 (0xF9) from the floppy disk, the DISK II will produce:
0x7C 0xF9 0xF9 .... (stall)
while the IWM will produce:
0x7C 0x79 0xF9 .... (stall)
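The two sequences can be reproduced with a small C sketch. It only models the values visible after each bit is shifted in, and reduces the IWM quirk to the single rule observed above: on the FCLK in which bit 6 moves into bit 7, bit 7 reads back as 0 for one sample before it appears. The function name and the one-sample-per-shift simplification are assumptions for illustration; the stall after the MSB is set is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Shift the bits of 'disk_byte' (MSB first, as read from disk) into an
   empty SR and record the value visible after each step. If 'iwm_quirk'
   is set, the step that sets bit 7 is preceded by one extra sample in
   which bit 7 is still hidden. Returns the number of samples. */
static size_t shift_trace(unsigned char disk_byte, int iwm_quirk,
                          unsigned char *out, size_t cap)
{
    unsigned char sr = 0;
    size_t n = 0;
    for (int i = 7; i >= 0; i--) {
        sr = (unsigned char)((sr << 1) | ((disk_byte >> i) & 1u));
        if (iwm_quirk && (sr & 0x80)) {
            assert(n < cap);
            out[n++] = (unsigned char)(sr & 0x7F); /* MSB "disappears" */
        }
        assert(n < cap);
        out[n++] = sr;
    }
    return n;
}
```

For $F9 the DISK II trace ends in $7C $F9 and the IWM trace ends in $7C $79 $F9, matching the two lines above ($79 is simply $F9 with bit 7 masked off).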
I conjecture that this behaviour is intentional, because there is a "port mode" in the IWM in which /DEV (device select) is permanently asserted and the MSB (D7) is used as a clock signal whose rising edge pumps the data byte stream into some downstream data pipeline. The "disappearing" MSB generates a data setup time for the first register of this data path, which is exactly one FCLK cycle long. Otherwise the "port mode" would not work, or it would need an external DFF to delay the MSB rising edge by the appropriate amount.
In Apple document 19840288, which can be found on the web, there is a warning: "... IWM will not work reliably in synchronous mode when using a NMOS 6502 ... but it will work with the WDC 65C02." The next paragraph of said document then goes on saying that "... IWM may change DB7 just before the end of the read window of the CPU ..."
I think that this bug is rooted in the design decision to have the "disappearing" MSB to make "port mode" work without an external DFF. What a shame ! This weird behaviour should only be present in "port mode", which is defined as an asynchronous mode (ASYNC = 1) with LATCH = 0, where ASYNC and LATCH are bits #1 and #0 of the IWM mode register that is so cunningly hidden from inadvertent access by legacy Apple II software.
I would think that in legacy modes (ASYNC = 0, LATCH = 0) which are supposed to be DISK II compatible the "disappearing" MSB should not happen. But it does.
IMHO Apple should have fixed that bug in a later revision of the IWM, so that the IWM would also work with an NMOS 6502. But in the specimen I have (Apple part #344-0041-A) this evil MSB behaviour is definitely also present in synchronous mode.
There also seem to be some other, subtle timing differences in how the IWM SR shifts, but so far my software is not yet in shape to probe into that.
I'm not sure whether such differences, however small, could affect copy protections and cause legacy copy-protected Apple II floppy disks to fail to load. Maybe somebody more knowledgeable about this topic could make a comment. In other words: are there Apple II copy protections known which fail to load on an Apple IIc but work on an Apple IIe, where the failure is not caused by differences in the firmware between the two machines?
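The mode classification used in this post can be written down as a tiny C helper, which also shows where a substitute could confine the MSB quirk. Only bits 0 (LATCH) and 1 (ASYNC) are taken from the text above; all other mode register bits are ignored, the enum and function names are made up, and the ASYNC = 1, LATCH = 1 combination is not discussed here, so it is just reported as its own class.

```c
#include <assert.h>

#define IWM_MODE_LATCH 0x01u /* bit 0, per the text */
#define IWM_MODE_ASYNC 0x02u /* bit 1, per the text */

enum iwm_mode_class {
    IWM_LEGACY_SYNC,   /* ASYNC = 0, LATCH = 0: DISK II-like */
    IWM_SYNC_LATCHED,  /* ASYNC = 0, LATCH = 1 */
    IWM_PORT_MODE,     /* ASYNC = 1, LATCH = 0, as defined above */
    IWM_ASYNC_LATCHED  /* ASYNC = 1, LATCH = 1: not discussed here */
};

static enum iwm_mode_class classify_mode(unsigned mode_reg)
{
    assert(mode_reg <= 0xFFu);
    int async = (mode_reg & IWM_MODE_ASYNC) != 0;
    int latch = (mode_reg & IWM_MODE_LATCH) != 0;
    if (!async)
        return latch ? IWM_SYNC_LATCHED : IWM_LEGACY_SYNC;
    return latch ? IWM_ASYNC_LATCHED : IWM_PORT_MODE;
}

/* Ideally the "disappearing" MSB would be confined to port mode; the
   344-0041-A part, however, shows it in synchronous mode as well. */
static int msb_quirk_wanted(unsigned mode_reg)
{
    return classify_mode(mode_reg) == IWM_PORT_MODE;
}
```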
- Uncle Bernie
Here is a screen shot showing the disappearing MSB:
All IWMs (about 5 pcs. incl. one burnt) I have around in several serviced devices are 344-0041-B. I wonder where you got your -A version from. In my opinion the -A variant is very rare, most likely because Apple corrected bug(s) in it.
In post #5, retro_devices wrote:
" All IMWs (about 5 pcs. incl. one burnt) I have around in several serviced devices are 344-0041-B . I wonder where did you take your -A version from? On my opinion -A variant is very rare and this most likely is a result of corrected bug(s) by Apple. "
Uncle Bernie answers:
My IWM (A version) came out of a very early Apple IIc. I was not aware that these are so rare. Of course, further down the road I also need a "B" version to see what the differences between "A" and "B" might be. But at the moment I'm fine with the "A" version, it is good enough to develop and debug my code. You see, part of the "secret sauce" I use for such reverse engineering tasks is to have a hardware rig in which the original resides, driven by software running under DOS (no Windows or Linux can do that ;-) and within the software there also lives the RTL / logic equations which, when the software is complete, run concurrently with the real IC. And any differences get flagged. This setup (once complete) allows a very thorough automatic compare of the RTL to the real hardware.
I ran about 50 copy-protected games (WOZ files on a BMOW Floppy Emu) and about 20 copy-protected real floppy disks (all originals !) on this Apple IIc before I took out its IWM. As far as I can tell, none of them refused to work, except for the original of "Centipede" by Atarisoft, which seems to be damaged despite having come out of an unopened shrink-wrapped box. The WOZ file of the same game works.
So the "A" version of the IWM may not be too bad, at least in the legacy mode. The other modes may have bugs; I'm not there yet. But in the long run I will need to find a way to get a "B" version of the IWM and an early Mac in which this IWM was used. I don't really care whether my IWM substitute has working "Mac" modes, which were never used in an Apple IIe or IIc (after all, this is for my Replica 2e project), but as long as it's not excessive additional work, I prefer to do a complete job before I take the test rig apart again. Its destiny is to morph into a card that plugs into the Apple-1 and will have the final versions of all my Apple-1 add-ons (DRAM expansion, Apple II compatible color graphics, and the floppy disk controller). Normally I don't repurpose such cards, but this one is rare and hard to find in good unused condition.
- Uncle Bernie
" You see, part of the "secret sauce" I use for such reverse engineering tasks is to have a hardware rig in which the original resides, driven by software running under DOS (no Windows or Linux can do that ;-)"
Actually you can do anything under Linux you could do with MS-DOS, you'd just need to do things a little differently. Something like an LKM (Loadable Kernel Module) would be one approach for doing low level hardware interfacing. Theoretically you could also write a Windows device driver, but Microsoft's closed source model basically makes that out of reach for anyone other than big corporations these days.
FWIW, part of the reason I know it can be done in Linux is that all the test rigs at the place I currently work (a major fabless semiconductor company) run Linux. They only use Windows for end-stage testing because they can't tailor and control it enough to do the early-stage, low-level hardware testing on CPUs and GPUs.
The pen is not important for the writer, it is the written that is important.
Linux is not an RTOS. That is not critical for the analysis of this static chip, but there is no point in making a simple LPT bit-banging task coexist with a super-duper complicated multitasking OS. DOS is more than fine and has less execution jitter than Linux. I would have used DOS too. There is no need to comply with, and study, the huge number of restrictions of a full-fledged multitasking OS, let alone to study its way of writing kernel-level drivers, just to perform a few simple I/O operations that in essence are single CPU instructions.
In post #8, 'retro_devices' wrote:
" DOS is more than fine and has less execution jitter than linux ..."
Uncle Bernie comments:
Spot on. The execution jitter caused by preemptive multitasking systems is really bad and foils oscilloscope work. This is why I don't use Linux for this kind of reverse engineering work, even though I know how to bit-bang an LPT port under Linux. It does not even need a special kernel, just some system calls to gain access to the port (only possible if the program has root privileges). But there is no way to disable the interrupts in Linux (at least I did not find a way to do that).
DOS, with all its limitations, is perfect for that kind of work. I use MS C7.00, and it even has a library function to enable/disable interrupts.
Forget Windows for any such low level hardware work. Not worth the effort / learning curve dealing with their closed system. It's mental poison.
Yet another topic is building hardware add-ons which require fast data transfer rates. I have a cute little development from the 1990s which was later named "Ratweasel" after I learned that a "Greaseweazle" exists. The pun is intended. Cross a rat with a weasel and you get something that is really lean and mean. The downside of this development is that it requires ECP / DMA mode for the parallel printer port from so-called "Super I/O" chipsets to make it work. These can only be found in older laptops. It's a pity that technical progress has made this development obsolete. It can read and write flux transition streams with 50 ns timing resolution (plenty of resolution even for 2 us bit cells / MFM) and interfaces to Apple II disk drives (so all the half tracks and quarter tracks can be reached). It could read and write any format if the software to analyze and synthesize flux streams were written or ported to it. I never had the time or motivation to do that, so I'll wait until the Applesauce guys make some new hardware that is cheap enough to buy. "Ratweasel" is only 3 cheap TTLs and 2 GAL16V8s. This is what I think is "lean and mean" ;-)
This said, here is the problem: if you want to do anything like that with today's computers you need USB, and then you need a complete 32 bit computer in that hardware to run the USB protocol stack. The "Greaseweazle" uses a cheap ST microcontroller but does not support Apple II floppy disk drive interfaces. What I fear is that whatever next generation of Applesauce they might brew up, it might be too expensive for casual use. Over the past decades, there were heinously expensive flux engines around, huge PCBs with lots of ICs and an FPGA on them. I wonder why they made them so expensive and used more than 3 TTLs and 2 GALs, all costing less than $5. It seems that lack of brains can be compensated for by throwing lots of expensive ICs at a mission. The "Greaseweazle" is definitely the smartest solution: only brain, almost no hardware needed. Hence, cheap.
But as it turned out, for the IWM reverse engineering work I don't need a flux engine. I have another ongoing project of lower priority, which is reverse engineering the Western Digital FDCs, including those with analog PLLs. This can't be done without a flux engine. At the moment the rig is such that "Ratweasel" plugged into a first DOS notebook makes a flux stream which goes into a WDC177x plugged into a second DOS notebook. This allows me to explore the digital data separator of the WDC177x family, and once that is done, I can proceed to the FDCs with analog PLL based data separators. Of course, the WDC177x exploration could be done with a similar rig as for the IWM. But when I make the clocks by software (at a much lower speed), I can't use it to read a real floppy disk, which is part of the mission.
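To illustrate the first step such flux-analysis software would have to do, here is a minimal C sketch that turns a flux-transition stream into bit cells. It assumes timestamps in 50 ns ticks (the Ratweasel resolution mentioned above) and 4 us Apple GCR cells, i.e. 80 ticks per cell. Each interval between transitions is rounded to a whole number of cells and emitted as that many bits ending in a "1". Real decoders track speed drift (e.g. with a PLL); this fixed-clock rounding is a simplification, and `flux_to_bits` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define TICK_NS 50u
#define CELL_TICKS (4000u / TICK_NS) /* 4 us cell = 80 ticks */

/* Convert successive flux-transition timestamps (in ticks) into bits.
   An interval of k cells becomes (k-1) zeros followed by a one.
   Returns the number of bits written to 'bits'. */
static size_t flux_to_bits(const unsigned long *ts, size_t nts,
                           unsigned char *bits, size_t cap)
{
    size_t n = 0;
    for (size_t i = 1; i < nts; i++) {
        assert(ts[i] >= ts[i - 1]);
        unsigned long dt = ts[i] - ts[i - 1];
        /* round to the nearest whole number of cells, at least one */
        unsigned long cells = (dt + CELL_TICKS / 2) / CELL_TICKS;
        if (cells == 0)
            cells = 1;
        for (unsigned long c = 1; c < cells; c++) {
            assert(n < cap);
            bits[n++] = 0;
        }
        assert(n < cap);
        bits[n++] = 1;
    }
    return n;
}
```

For transitions at ticks 0, 80, 160, 400 and 485, the intervals of 1, 1, 3 and about 1 cells decode to the bits 1 1 0 0 1 1.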
You can see that I am very interested in this floppy disk controller stuff. I hope I can complete all the work before all my floppy disks have deteriorated to a point where they are useless.
As for the IWM work, I'm currently exploring the read windows and have already found a lot of head-scratching behavior patterns that are NOT compatible with what the DISK II controller does.
- Uncle Bernie
Even when the IWM reproduced copy-protection signatures correctly, sometimes the detection logic needed to be tweaked to get the desired result from the IWM.
When Jordan Mechner archived the source code for Prince of Persia, it included the copy-protection logic. In POPBOOT0.S at line 364 there's a comment from someone (Roland?) who was troubleshooting triple-E7 copy protection with an Apple //c. (Source code here.)
That comment on line 364 suggests that the copy-protection had to be modified to make its data-register-synchronization trick suitable for the Apple //c. And that shouldn't be a surprise because the Disk II had a single register for both data and WP-status, whereas the IWM has separate registers. On the Disk II, merely activating WP-sense (with a write-protected disk) would instantly reset the state machine and force the data register to be cleared. Apparently the IWM was ultimately capable of doing that, but it took a little extra tweaking to make it work.
For what it's worth, Bank Street Music Writer used that same triple-E7 copy protection scheme, but it booted unreliably on my roommate's ROM 255 Apple //c. I'll bet they used a detection routine that omitted the fix [hack] shown in the source code above, and thus their detection only worked intermittently with the IWM.
I remember back in the 80s that the Apple IIc was "not 100% compatible" with the Apple IIe.
This may have had something to do with it.
In post #10, 'S.Elliott' wrote:
" For what it's worth, Bank Street Music Writer used that same triple-E7 copy protection scheme, but it booted unreliably on my roommate's ROM 255 Apple //c. I'll bet they used a detection routine that omitted the fix [hack] shown in the source code above, and thus their detection only worked intermittently with the IWM."
Uncle Bernie thanks you !
This insight from long ago is very valuable for my work on the IWM substitute, because I now know that I had better include a "DISK II" compatibility mode, along with a fix for the unreliable operation of the IWM with an NMOS 6502, the reason for which is well documented by Apple. This is not a design bug, as I see it; it was put in intentionally to make PORT MODE work reliably with good data setup/hold time. Apple shot themselves in the foot with that noble intent, because the same timing trick that makes PORT MODE work without an extra external DFF (how much would that have cost them ?) screwed up the data setup/hold time for the NMOS 6502. The CMOS 6502 would 'regenerate' the crumbling data on the data bus internally, and so it worked with the IWM ... barely, I think.
During the two days of snow storm here in Colorado Springs I could not leave the house and so I could not do the required research on the internet to solve another riddle I found in the IWM. As I planned in post #1, reverse engineering the IWM should have been smooth sailing, no more than 2 weeks planned for it, because of two factors:
a) I have a complete RTL solution for the DISK II with clock cycle exact "C" language models and test benches
b) I had a printout of U.S.-Pat. 4,742,448 which describes the IWM in great detail, and had already coded the "stuff" seen in the patent as "C" models
So I thought it's gonna be a quick job.
But I soon discovered erratic behaviour of the IWM in the test rig: the same test, run several times, would sometimes fail. This was traced to the fact that the RESET pin of the IWM does NOT reset everything and does NOT bring it into a defined state. In hindsight, this should have been obvious from the '448 patent Fig. 2 --- in the figure, RESET only goes to two blocks, and the read circuitry has no RESET (ouch).
What is worse is that it does not use Q3 as a timing signal for the SLOW modes. Instead, it has an internal divider (most likely, a toggle FF) which divides FCLK by 2. The state of this internal FF is erratic and unknown and it is not affected by RESET. So some software tricks must deduce the state of this FF by observing the response of the IWM to the challenges. And then force synchronization of the "C" models to the presumed internal state of the IWM.
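One way to do that deduction, sketched in C under assumptions: run one copy of the model for each possible state of the hidden divide-by-2 flip-flop, feed all copies the same stimulus, and drop every copy whose output disagrees with what the real IWM returned. Once a single candidate remains, the model is synchronized. The toy model below (a counter that only advances on every other FCLK, starting from 0) is a made-up stand-in, not the IWM logic.

```c
#include <assert.h>

/* Toy model with a hidden phase bit: the visible counter advances only
   on FCLK cycles where the internal divide-by-2 FF is 1. */
struct toy {
    unsigned phase;      /* hidden toggle FF, not affected by RESET */
    unsigned char count; /* visible output */
};

static unsigned char toy_step(struct toy *t)
{
    t->phase ^= 1u; /* toggle FF divides FCLK by 2 */
    if (t->phase)
        t->count++;
    return t->count;
}

/* Given outputs observed from the real device over n FCLK cycles
   (counter assumed to start at 0), return the initial phase that is
   consistent with them: 0 or 1, or -1 if both or neither still fit. */
static int deduce_phase(const unsigned char *observed, int n)
{
    int alive[2] = {1, 1};
    struct toy cand[2] = {{0u, 0}, {1u, 0}};
    assert(n >= 0);
    for (int t = 0; t < n; t++)
        for (int p = 0; p < 2; p++)
            if (alive[p] && toy_step(&cand[p]) != observed[t])
                alive[p] = 0; /* this hypothesis contradicts the chip */
    if (alive[0] && !alive[1])
        return 0;
    if (alive[1] && !alive[0])
        return 1;
    return -1;
}
```

Observed outputs 0, 1, 1, 2 can only come from initial phase 1, while 1, 1, 2 can only come from phase 0; with no observations both hypotheses are still open and the result is -1.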
And this is greatly complicated by the fact that the update of the read data register from the internal shift register IS NOT happening as described by the '448 patent. The '448 patent claims that the update happens after each shift operation. This is not seen in the real IWM on the lab bench. It does indeed update the read data register after each "0" shifted in, but the update after a "1" shifted in happens with a delay. So far I have not found out with 100% certainty how that is controlled (but I have an idea). This behaviour for a "1" shifted in is, however, more consistent with the behaviour of the DISK II controller, which takes 4 x Q3 cycles (2 CPU cycles) from detection of a negative transition on RDDATA to the "1" being shifted into the shift register. However, the delay observed in the IWM is much longer than that. (Note that the "rules" for the case when a disk byte is complete, MSB of SR set, are different --- the above discussion is for what happens before the MSB is set.)
So the IWM Rev "A" does not do what is claimed in the '448 patent. This patent has quite an interesting story that was not told (AFAIK):
U.S.-Pat. 4,742,448
Inventors: Wendell B. Sander, Robert Bailey
Assignee: Apple Computer, Inc.
Filed: Dec 18, 1986
Continuation of Ser.No. 573,067 Jan 24, 1984, abandoned.
The '448 patent was granted May 3, 1988, shortly before the Apple IIc was discontinued in August the same year. (ouch !)
I am going to look into the Application of Jan 24, 1984 to see if there are any differences in the description of the inner workings (it turned out the new search system of the USPTO does not find it, but there are other ways). This is the literature research work that was greatly hampered by the snow storm. But as long as internet service providers spy on their customers, collect all the webpages visited, and build a "personality profile" they sell to third parties, I refuse to have internet at home, linked to my person, and must use anonymous public wifi. Of course with a specially tailored notebook computer which has zero personal stuff on it. Not good enough to protect me against malicious state actors, but probably good enough that internet providers can't build a 'personality profile' for me which they then sell to marketing or plain criminal organizations. The internet is both a curse and a blessing ... I remember a past when patent research involved travel to the patent office.
- Uncle Bernie
I now have a "C" model which runs in lockstep with the original IWM in the test rig when fed with 'possible' GCR codes (including gross timing variations; the data separators finally seem to be the same). But it still fails (different data seen on the outputs) when I feed it with 'impossible' GCR codes. It seems that the system is particularly sensitive to '1' bit cells being too close together. This can't happen on normal magnetic-media floppy disks due to the limitations of the read amplifier, but it might happen with certain copy protections based on 'flux holes', zones with no flux change or no magnetization. This is bad news because it means there still is a difference between the real IWM and my "C" model. The good news is that despite the different outputs seen under these strange conditions, in most cases the 'lockstep' is re-established by the time the MSB is set, indicating that a disk byte is ready for the RWTS code. I'm now focusing on finding the reasons / exact mechanisms where this recovery does not happen. This should give enough insight into how the "C" model must be improved until 100% state machine equivalence is reached for the read mode.
For the write mode, I already have a "C" model that nicely runs in lockstep with the original IWM. No issues there seen yet.
Alas, the two weeks I allocated for the reverse engineering of the IWM are over now, and I must spend more time on other things. And I don't even have RTL yet; it's still only "C" models. If I had a boss, he would now start yelling at me. If he was a psycho. (Most bosses are, only few are OK).
- Uncle Bernie
As mentioned above in the prior post, I do have a "C" model of the IWM read channel which matches the behaviour of the 'real' IWM in the test rig 100%, as long as I feed it with valid, honest, real-world Apple-ish GCR flux streams. I can even add some complications like longer streams of no flux, and the "C" model still stays in lockstep with the 'real' IWM.
Where it gets weird is when pathological flux patterns are fed into the RDDATA input of the IWM. These violate Apple GCR rules by providing more flux changes (active transitions on RDDATA) than would be "normal". It is, however, possible to produce such flux change / RDDATA pulse patterns by using an MFM speed rated floppy disk drive, or one which is fit for the 'FAST' mode used by Apple in the Mac (2 us bit cells).
'Normal', that is, 'SLOW' mode Apple GCR has 4us bit cells which correspond to 28 FCLK cycles of the IWM when it runs in 'SLOW' and '7M' mode. I call this the 'legacy' mode as it is supposed to behave the same as the DISK II controller (it doesn't, though). This is the configuration of the IWM in the Apple IIc.
It boils down to the question: "What happens if the IWM 'sees' faster flux changes (i.e. closer RDDATA pulses) than expected by the recording?"
I found that under the above conditions (SLOW and 7M) everything behaves as expected as long as the distance between RDDATA pulses is >= 14 FCLK cycles, which corresponds to 14 cycles of 7.159 MHz, or 1.96 us. You won't get two RDDATA pulses spaced that closely out of a legacy / single density floppy disk drive. But a higher density floppy disk drive could produce these ~2 us RDDATA pulse distances, and any 'gadget' which attaches to the external floppy disk drive connector could pump data at much faster rates, with much shorter distances between RDDATA pulses. This is why I called these patterns 'pathological'. They should never occur with real floppy disk operations, but the IWM may be used with a 'gadget' that is not a floppy disk drive.
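The numbers above follow from FCLK = 7.159 MHz (half of the 14.318 MHz master clock of the Apple II). A small C sketch converts FCLK counts to microseconds and builds a RDDATA stimulus as a list of FCLK indices, one pulse at the start of every "1" cell, for a given cell width. A cell width of 28 gives a legal SLOW/7M stream; shrinking it below 14 produces the pathological spacings discussed here. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define FCLK_MHZ 7.15909 /* 14.31818 MHz / 2 */

static double fclk_to_us(unsigned cycles)
{
    return cycles / FCLK_MHZ;
}

/* Build RDDATA pulse positions (in FCLK cycles) for the bits of 'byte',
   MSB first, one pulse at the start of every '1' cell. 'start' is the
   FCLK index of the first cell. Returns the number of pulses. */
static size_t gcr_pulses(unsigned char byte, unsigned start,
                         unsigned cell_fclk, unsigned *pulse, size_t cap)
{
    size_t n = 0;
    for (int i = 7; i >= 0; i--) {
        if ((byte >> i) & 1u) {
            assert(n < cap);
            pulse[n++] = start + (unsigned)(7 - i) * cell_fclk;
        }
    }
    return n;
}
```

Note that 28 FCLK is about 3.91 us, i.e. the nominal "4 us" cell, and 14 FCLK is about 1.956 us. With 28-FCLK cells, the prologue byte $D5 yields pulses at FCLK 0, 28, 84, 140 and 196.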
So I deemed it necessary to explore what the IWM does when such shorter RDDATA distances occur, and I found something weird:
It seems that the 'bit cell timing window' generator has a memory for more than one flux transition. Which means it's likely not a simple counter or state machine. There is more going on within the IWM.
Here is an example:
If I put in RDDATA pulses at FCLK = 45 and 57, then the expected two "1" bits will be seen in the IWM data output byte (which may not be the state of the read shift register at that same instant of time) at FCLK = 67 and 79. The first "0" will appear at FCLK = 107. This is 28 FCLK after the last "1" was seen. So far this behaviour is not weird, it is expected.
But if I put in RDDATA pulses at FCLK = 45 and 55, then the expected two "1" bits will be seen in the IWM data output byte (which may not be the state of the read shift register) at FCLK = 67 and 77. The first "0" will appear at FCLK = 95 (67 + 28) and the next "0" will be seen at FCLK = 105 (77 + 28). This is 28 FCLK after each respective "1" was seen. This is weird because it would imply that each "1" seen starts its own 28 FCLK wide window after it. If it were a simple counter (or state machine) making the window, I would expect the 2nd shifted "1" at FCLK = 77 to kill / restart the window opened by the 1st shifted "1" at FCLK = 67. But it doesn't do that. I have some tentative conjectures on what the circuit doing that may look like: either an earlier update of the shift register (less latency to the actual shift operation) combined with a further delayed update of the read data holding register from the shift register, or they used shift registers to make the variable timing window, which in NMOS are much easier to do than programmable counters.
This weird behavior persists for further reductions of the RDDATA pulse distance down to 6 FCLK cycles. Below that the behaviour gets even weirder: at a distance of only 4 FCLK the two RDDATA pulses are still "seen", but the readout from the IWM jumps from $0E to $3B, without the intermediate state $1D. Since a normal deserializer shift register with one input can't jump by 2 positions within one shift register clock, this hints that whatever the IWM puts out for read is not updated after every shift, but only every so often, controlled by some (yet unknown) logic. The IWM patent describes the logic which blocks the update of the read data register while the MSB is set, but this is NOT the same thing as a delayed update while the MSB is not yet set.
RDDATA pulses with a distance of 4 FCLK or less are treated as if only the first pulse were present. This can be explained by how the synchronizer / edge detector works.
I'm not sure if it is worth pursuing this weird behaviour for pathological RDDATA patterns any further. I think it may not matter at all for regular floppy disk operations (or floppy disk emulators). IMHO, they simply should not produce these pathological patterns when used with SLOW/7M mode. But in the strict sense of 'state machine equivalence', this exact behaviour of the real IWM should also be present in any substitute claiming to be a faithful reproduction of the original IWM behavior.
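The two candidate explanations can be checked against the measured numbers with a small C sketch. Hypothesis A is a single window counter that restarts on every "1" (one "0" expected 28 FCLK after the last "1"); hypothesis B gives every "1" its own 28 FCLK window (a "0" expected 28 FCLK after each "1"). Only the first "0" position(s) after a pair of "1"s are predicted; everything else about the IWM is left out, and the names are made up.

```c
#include <assert.h>

#define WINDOW_FCLK 28u

/* Predict the FCLK positions of the first '0' outputs after two '1'
   outputs at times one1 < one2. Writes them to zeros[], returns count. */
static int predict_zeros(unsigned one1, unsigned one2, int per_one_window,
                         unsigned zeros[2])
{
    assert(one1 < one2);
    if (per_one_window) {          /* hypothesis B: one window per '1' */
        zeros[0] = one1 + WINDOW_FCLK;
        zeros[1] = one2 + WINDOW_FCLK;
        return 2;
    }
    zeros[0] = one2 + WINDOW_FCLK; /* hypothesis A: window restarted */
    return 1;
}
```

Hypothesis A matches the 45/57 measurement ("1"s at 67 and 79, first "0" at 107), while B would wrongly predict an extra "0" at 95 there. B matches the 45/55 measurement ("0"s at 95 and 105), where A would predict only 105. Neither simple model explains both cases, which is exactly what makes this behaviour weird.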
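The two-position jump is easy to verify with a few lines of C: shifting a "1" into $0E gives $1D, and a second "1" gives $3B. So seeing $0E followed directly by $3B means that the visible register skipped one intermediate shift state.

```c
#include <assert.h>

/* Left shift with serial input, as in a deserializer shift register. */
static unsigned char shl_in(unsigned char sr, int bit)
{
    assert(bit == 0 || bit == 1);
    return (unsigned char)((sr << 1) | bit);
}
```

In binary: 0000 1110 becomes 0001 1101 after one shift and 0011 1011 after the second.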
Seen in a more abstract way, imagine the following:
You have a black box, and a panel with switches and lamps attached to the black box. You are not allowed to look into the black box. But you can put any number of switch combinations in and observe which lamps are lit.
You also know that there could be a specimen "A" or a specimen "B" of a digital device in the black box.
Can you find out which one is in the black box ? Well, put stimuli in (set switches and give a clock pulse) and observe the lights. If you can show a sequence of inputs producing a different output, you have proof that that specimen is different from the other.
The first question, of course, is whether you think it's worth your time to find such a sequence. And the next question, if you choose to spend the time to find such a sequence proving a difference, can you ignore that finding without endangering your whole mission ?
(These are trick questions, of course, please comment what you think)
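The first of these questions has a constructive answer when both candidates "A" and "B" are known machines. Run them in lockstep as a product machine, and search breadth-first for the first input on which their outputs differ. The C sketch below does this for Mealy machines with a 1-bit input (think: RPD present this clock or not). `MAXS`, `Mealy` and `distinguish` are hypothetical names for illustration.

```c
#define MAXS 16  /* hypothetical size limit for the candidate machines */

/* A Mealy machine with a 1-bit input. */
typedef struct {
    int n;              /* number of states actually used (<= MAXS) */
    int start;          /* reset state */
    int next[MAXS][2];  /* next[state][input] */
    int out[MAXS][2];   /* out[state][input]: what the "lamps" show */
} Mealy;

/* Breadth-first search over the product machine (a, b).
   Writes a shortest distinguishing input sequence to seq[] (which must hold
   MAXS * MAXS entries) and returns its length, or returns -1 if no input
   sequence can ever tell A and B apart from their reset states. */
int distinguish(const Mealy *A, const Mealy *B, int *seq) {
    int parent[MAXS * MAXS], via[MAXS * MAXS], seen[MAXS * MAXS] = {0};
    int queue[MAXS * MAXS], head = 0, tail = 0;
    int s0 = A->start * MAXS + B->start;
    seen[s0] = 1;
    parent[s0] = -1;
    queue[tail++] = s0;
    while (head < tail) {
        int p = queue[head++], a = p / MAXS, b = p % MAXS;
        for (int in = 0; in < 2; in++) {
            if (A->out[a][in] != B->out[b][in]) {
                int len = 1;  /* inputs on the path to p, plus this one */
                for (int q = p; parent[q] != -1; q = parent[q]) len++;
                seq[len - 1] = in;  /* the input that exposes the difference */
                for (int q = p, i = len - 2; i >= 0; i--, q = parent[q])
                    seq[i] = via[q];
                return len;
            }
            int np = A->next[a][in] * MAXS + B->next[b][in];
            if (!seen[np]) {
                seen[np] = 1;
                parent[np] = p;
                via[np] = in;
                queue[tail++] = np;
            }
        }
    }
    return -1;
}
```

Suppose two candidate "C" models could be flattened into tables like this, state counts permitting. Then the returned sequence is an RDDATA stimulus worth feeding to the real chip, and whichever candidate disagrees with the silicon is ruled out.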
CONCLUSION (for now)
One takeaway is for sure: the IWM is NOT state machine equivalent to the DISK II controller, even when the IWM is in SLOW/7M mode. What's worse, unlike the DISK II, the IWM does NOT have a 'dead time' after detection of a RDDATA pulse. All of the above was discovered when looking for that 'dead time'. Instead of the expected behavior (dead time), I found something weird.
It will take me a while to ponder these findings. Maybe somebody else who has already made an 'IWM substitute' can comment on whether this weird behaviour was also seen, and whether it is deemed worth replicating. (Note that my IWM specimen is the "A" version, not the "B" version. So far I have not found anything on the internet explaining what the differences between "A" and "B" were.)
Comments invited !
- Uncle Bernie
(Continuation of post #14)
Such weird behaviour of ICs in the dark corners outside their (known) specifications can have several root causes:
1. It could be unintentional behaviour not actively sought by the designers. Unintentional behaviour patterns can creep in when logic reduction algorithms are used. Some states (here, pulses being too close together) could enter the reduction stage with "Don't Cares" on inputs. This happens because it's assumed that the condition could never occur in the real world, or that if it does, something is broken / defective anyway. For non-mission-critical systems, the logic may then be allowed to do anything it wants. So when explicitly probing (shining light) into the dark corners of the design, weird things may be found.
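Here is a toy example of the don't-care effect as a C sketch; the specification and function names are made up for illustration. A two-input function is specified for only three of its four input combinations. Both implementations below satisfy that specification, yet they disagree on the "impossible" input. So what the silicon does there depends on whatever the reduction happened to produce.

```c
/* Specification for inputs (a,b): 00 -> 0, 01 -> 1, 10 -> 1.
   a = b = 1 is assumed never to happen, i.e. a don't care (-1). */
static const int SPEC[4] = {0, 1, 1, -1};

/* Two implementations that both meet the specification. */
int impl_or(int a, int b)  { return a | b; }  /* drives 11 to "1" */
int impl_xor(int a, int b) { return a ^ b; }  /* drives 11 to "0" */

/* Returns 1 if f agrees with SPEC on every input that is cared for. */
int meets_spec(int (*f)(int, int)) {
    for (int i = 0; i < 4; i++)
        if (SPEC[i] >= 0 && f(i >> 1, i & 1) != SPEC[i])
            return 0;
    return 1;
}
```

A minimizer working in sum-of-products form would pick `a | b` here, since it needs only one product term. The point is that the specification itself does not decide what happens at 11.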
2. It could be intentional behaviour sought by the designers for some envisioned extra functionality which, however, was not documented in the datasheets. There can be various reasons for that. One is hiding the extra function from the world until the company uses it in an upcoming product. Another is hiding it because it didn't work as expected. The latter happens with 'last minute' improvements more often than not: "See, if we put in this extra gate(s), it costs us almost nothing and we can have this magical extra feature ..." Because it should cost almost nothing, they put it in with minimum effort, without properly simulating the purported new feature. Then it doesn't work in first silicon and is never mentioned again, as if it had never existed ... and 40 years later we scratch our heads over what it may be.
3. It could be an extra feature designed in for test purposes. Speeding up tests (the least amount of test vectors on the production testers) is essential to decrease test costs and increase profits. Every second counts because every second of tester time costs money. Some test modes may be entered by supervoltages, others by input patterns which never occur in the real application (e.g. asserting read and write signals at the same time). The IWM does not have such signals. But the RDDATA patterns leading to the weird behavior are not normal Apple II GCR and should not occur in the application, as long as a real floppy disk drive is used.
For the purpose of making IWM substitutes, in all three cases, the behaviour of the 'dark corners' or 'undocumented features' could be ignored and could be different, because the reason(s) for having them do not apply anymore.
But what if somebody tinkering with the IWM in the past 40 years had found out about these 'undocumented features' and used them in some product ?
Such as fast serial communications ? Or some floppy disk copy protection ?
Who knows? I can't know, because I only have a very limited collection of Apple II stuff, and back in the day I did not care much about the Apple II. The exception was wanting to understand how the DISK II really worked. It has fascinated me ever since it came out in 1978, and it always looked like 'magic' due to the low IC count and no use of the LSI FDCs which everybody else used.
Now, 45 years later, I know everything about the DISK II (at the hardware level and up to the RWTS; I never cared about mundane, boring things like the DOS, which is for pure software guys) and something about the IWM, but still not enough about the latter.
I can't start to write the RTL for the IWM substitute before I have decided whether to put the 'undocumented features' in or not. Note this is the "A" version of the IWM. They may be gone or different in the "B" version. That may be irrelevant for the Apple II world, but may matter in the Mac world. I just don't know.
This is why I'm asking for comments by people "in the know". The first question is whether there was any device made by Apple which could (or did) exploit these 'undocumented features'. The next question is whether the original 'Liron' controller had an 'A' or 'B' version of the IWM. And whether anybody out there knows about weird applications for the IWM where its RDDATA input is not directly hooked to a floppy disk drive.
Comments invited !
- Uncle Bernie
P.S.: It is of course always possible to update a "C" model to exhibit the same behaviour as seen in the real IC. I did that, and got a 100% match even for the "weird cases". This proves they are not just "random" events, but systematic. However, this "C" model would not yield a clean, lean, mean RTL implementation. It's more like a 'Rube Goldberg machine'.
I also do have a version which covers almost all of the weird cases, and would lead to a very elegant RTL state machine. So it's the most likely candidate. But it does not cover the most bizarre cases unless I put in "Rube Goldberg machine" features again. So some detail / wart is missing.
All this work is greatly hampered by the fact that the real IWM 'hides' the contents of the shift register sometimes, as it reloads the read data register only at certain internal states. This reloading logic is not described in the patent - they only explain the case where the MSB is set, indicating completion of a GCR disk byte (all Apple II GCR codes start with a "1" bit cell). But I observe the lack or delay of read data register reloads even before the MSB gets set. The best way to dodge this is to enable compare operations between the "C" model and the real IWM only when the real one changes its read data register. At that point, when something becomes observable on the real IC, the data must match. But it does not tell everything about the internal states which are not directly observable.
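The compare-on-change rule can be sketched in C as follows. The struct and function names are hypothetical, not taken from the author's test software. The real IWM's read data is sampled every CLK, but a comparison against the model is only made at the instants where the real chip's value changes.

```c
#include <stdint.h>

typedef struct {
    uint8_t last_real;  /* last value read back from the real IWM */
    int have_last;      /* 0 until the first sample has been taken */
    long compares, mismatches;
} Lockstep;

/* Call once per CLK with the real IWM's read data and the C model's data
   hold register. Compares only when the real value changes (the first
   sample always counts as a change); returns 1 on a mismatch. */
int lockstep_clk(Lockstep *ls, uint8_t real, uint8_t model) {
    int changed = !ls->have_last || real != ls->last_real;
    ls->last_real = real;
    ls->have_last = 1;
    if (!changed)
        return 0;  /* nothing new is observable, so nothing to check */
    ls->compares++;
    if (real != model) {
        ls->mismatches++;
        return 1;
    }
    return 0;
}
```

Under this rule, a model whose hold register shows $1D one CLK before the real chip jumps to $3B is not penalized. Only values the real chip actually presents are checked.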
Over the weekend I ran some automatic exhaustive tests covering all the cases with RDDATA pulses being too close together, i.e. closer than Apple GCR in SLOW mode would allow. SLOW mode has 4 us bit cells, or 28 FCLK cycles.
I think that, from the data collected, the following can be proven about the sequential logic (or state machine) that defines the bit cell windows and initiates each shift of a "0" or "1" into the deserializer shift register. Whatever it may be, it cannot be a single, simple counter. Nor can it be a single, simple state machine with an internal state S and an input RPD (Read Pulse Detected), similar to the one seen in the DISK II controller.
Because what happens with such a simple state machine is that once it decides to make a transition (based on the current state S and the RPD input signal, RPD = read pulse detect), it 'forgets' the prior history to some extent. So unless a much larger state space is provided than necessary for handling a single RPD event within the bit cell window, it would have no way to 'remember' any RPD event that preceded the current RPD event.
But from the data I have, I can see that it remembers all RPD events until they (or the lack thereof, meaning no new read pulse) have been processed into shift commands to the deserializer. Each RPD seems to initiate its own sequence of events timed by FCLK. After a short delay (depending on the latency of the synchronizer / edge detector), each pulse causes a "1" to be shifted into the deserializer shift register. Then a new timing window of 28 FCLKs is opened, after which a "0" is shifted in unless a new RPD event comes along in that window. Such an event would change the bit value to be shifted in at the end of that window to "1".
The astounding behavior seen is that each RPD event seems to trigger a completely independent sequence of timing windows and shift events, with the delay between the events in terms of FCLK cycles being the same.
There are two ways to model this behavior. One is a sort of stack mechanism that stores the FCLK count value X when RPD happens, and then does actions at X + N, X + M, and so on, until the sequence expires after 28 FCLKs. The other is to use shift registers for the actions themselves. In other words, some form of memory must be there beyond the four state bits a DISK II style state machine would need.
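The first option, a "stack" of RPD time stamps, can be sketched in C as below. This is a minimal illustration, not the author's model:

- `LAT_ONE = 22` is simply the 45 to 67 FCLK distance from the measurement in the earlier post.
- `MAX_PENDING` is arbitrary.
- The rule that a new RPD inside a window turns the pending "0" into a "1" is deliberately left out.

The sketch only shows that independent per-RPD sequences reproduce the two-pulse observation: for pulses at FCLK 45 and 55, ones appear at 67 and 77 and zeros at 95 and 105.

```c
#define MAX_PENDING 8  /* hypothetical: how many RPDs can be "in flight" */
#define LAT_ONE 22     /* FCLKs from RDDATA pulse to the visible "1" */
#define WINDOW 28      /* SLOW/7M bit cell window in FCLKs */

typedef struct { long t; int bit; } ShiftEvent;

typedef struct {
    long x[MAX_PENDING];  /* FCLK count X of each pending RPD */
    int n;
} RpdStack;

/* Record an RPD at FCLK count x (dropped if the stack is full). */
void rpd_push(RpdStack *s, long x) {
    if (s->n < MAX_PENDING)
        s->x[s->n++] = x;
}

/* Process FCLK count t: every pending RPD independently shifts a "1" at
   X + LAT_ONE and a "0" at X + LAT_ONE + WINDOW, then expires.
   Due events are written to out[] (up to cap); returns how many. */
int rpd_tick(RpdStack *s, long t, ShiftEvent *out, int cap) {
    int k = 0;
    for (int i = 0; i < s->n;) {
        long x = s->x[i];
        if (t == x + LAT_ONE && k < cap)
            out[k++] = (ShiftEvent){t, 1};
        if (t == x + LAT_ONE + WINDOW) {
            if (k < cap)
                out[k++] = (ShiftEvent){t, 0};
            s->x[i] = s->x[--s->n];  /* sequence expired: drop it */
            continue;                /* re-examine the entry moved into slot i */
        }
        i++;
    }
    return k;
}
```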
I don't see any such contraptions and complications in the die photograph of the IWM, although there are some regular structures which may be shift registers (in the red frames):
Apple IWM_middle.jpg
I have neither the time nor the motivation to make wall-sized printouts of the IWM die photo and annotate every transistor and every net, so the die photo is not of much help. The only thing I can do to proceed is to analyze the data I have obtained further. I have not yet written the filter for the "three close RPD" event cases. Once I have that, I can see what happens with three RPD events within one bit cell window. If only two are recognized, then I know there is a simple circuit level implementation to handle all cases of two RPD events in 28 FCLKs. But if all three are recognized, then most likely it is a shift register which triggers the events down the timeline. It could just be fed with the RPD events and have taps to cause shift actions. The 7/8 mode bit could be handled by selecting different sets of taps with pass transistors. That may allow a much simpler and more flexible circuit for sequencing the events than a programmable counter with two sets of preload values and two sets of decoders for actions. Last but not least, there is the possibility that they really did use a 6 bit counter, as implied by the IWM documentation found on the web. But the real IWM does not wait for 48 counts to decide to shift in a '100' sequence. It shifts these 0's and 1's at certain times related to the 28 FCLK wide timing windows defined by the RPD events.
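The shift register variant can be sketched in C as a 64-stage delay line clocked by FCLK and fed with RPD. One tap triggers a "1" shift and another a "0" shift. Swapping the tap set stands in for the pass transistor selection by the 7/8 mode bit. All tap positions are assumptions: the 7M values reuse the 22 and 22 + 28 FCLK timing from the measurements, and the 8M values merely assume a 32 FCLK bit cell. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Which delay-line taps trigger which shift (hypothetical values). */
typedef struct { int tap_one, tap_zero; } TapSet;
const TapSet TAPS_7M = {22, 22 + 28};
const TapSet TAPS_8M = {22, 22 + 32};

/* 64-stage shift register clocked by FCLK and fed with RPD. */
typedef struct { uint64_t line; } DelayLine;

/* Clock one FCLK. Returns bit 0 set if a "1" shift is due and bit 1 set
   if a "0" shift is due. Every RPD travels down the line on its own, so
   closely spaced pulses are all remembered until they fall off the end. */
int delay_tick(DelayLine *d, int rpd, const TapSet *ts) {
    d->line = (d->line << 1) | (uint64_t)(rpd & 1);
    int act = 0;
    if ((d->line >> ts->tap_one) & 1u) act |= 1;
    if ((d->line >> ts->tap_zero) & 1u) act |= 2;
    return act;
}
```

Fed with three RPDs at FCLK 45, 50 and 55 using the 7M taps, this sketch recognizes all three: ones at 67, 72 and 77, and zeros at 95, 100 and 105. That is the signature the planned "three close RPD" filter would look for. Unlike a single counter, nothing here ever forgets an earlier pulse. The cost is one register stage per FCLK of delay.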
Is there anybody out there who also has seen this weird behaviour of the real IWM ? (just want to avoid hunting ghosts).
Comments invited !
- Uncle Bernie
Could it be that Apple implemented for some reason a linear feedback shift register (LFSR)? If so the stack memory you are looking for won't be in the IWM.
In post #17, retro_devices asks:
" Could it be that Apple implemented for some reason a linear feedback shift register (LFSR)?
If so the stack memory you are looking for won't be in the IWM. "
Uncle Bernie answers:
An LFSR of this length could be used as a large counter with almost no logic, just shift register stages. The only candidate for such a contraption I can think of is the 1 second timer for the ENBL1 / ENBL2 change delays. It has to count 8 million FCLKs (assuming an 8 MHz clock). This could be done with 23 bits and only two feedback taps into the XOR. But most likely, they used some prescaler. I'm not so much interested in how that delay is made, because in my IWM substitute I will use an RC time delay for it.
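To illustrate the LFSR-as-counter idea, here is a C sketch of a 23-bit maximal-length Fibonacci LFSR with taps at stages 23 and 18. That is the polynomial x^23 + x^18 + 1, the usual PRBS23 choice. Whether the IWM uses anything like this is unknown. The period is 2^23 - 1 = 8,388,607 steps, which is enough for 8 million FCLKs. A timer of this kind starts from a seed and decodes one terminal state.

```c
#include <stdint.h>

#define LFSR_BITS 23
#define LFSR_MASK ((1u << LFSR_BITS) - 1u)

/* One step of a 23-bit Fibonacci LFSR, taps 23 and 18 (x^23 + x^18 + 1).
   Any nonzero seed cycles through all 2^23 - 1 nonzero states. */
uint32_t lfsr23_step(uint32_t s) {
    uint32_t fb = ((s >> 22) ^ (s >> 17)) & 1u;
    return ((s << 1) | fb) & LFSR_MASK;
}

/* State reached after n steps from seed: the "terminal count" that a
   hardware timer would decode with one wide AND gate. */
uint32_t lfsr23_after(uint32_t seed, uint32_t n) {
    while (n--)
        seed = lfsr23_step(seed);
    return seed;
}
```

In hardware, the terminal state would be computed once at design time, as `lfsr23_after(seed, 8000000)` does here, and wired into the decoder. No adder or incrementer is needed.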
No, I'm not looking for a "stack memory" in the IWM. I just mentioned how the peculiar behaviour for closely spaced RDDATA pulses could be implemented in a "C" model. The fact is that the IWM 'remembers' not only the position (in time) of the latest RDDATA pulse but also the positions of the RDDATA pulses before it, unless the distance between the pulses is larger than the bit cell window. All these are "pathological cases" not found in real world floppy disk data streams. The exception is configuration mistakes, such as using a double density drive / media with 2 us bit cells in the SLOW mode of the IWM.
So far I have found sixteen different possible "C" models, all of which can run in lockstep with the real IWM for all non-pathological RDDATA streams. But only a few can do the same (running in lockstep) with pathological RDDATA streams having RDDATA pulses with lower than normal spacing. And none of these yield an elegant RTL implementation which is lean and mean. Given the calibre of the designers involved in the IWM, I am quite sure that their logic is elegant, lean and mean.
You might wonder why I'm so nit-picking about these "pathological" cases being modelled correctly. At some point down the road I want to invoke automatic reverse engineering tools. They are not "AI", so they are dumb, and at times they will inevitably produce RDDATA test streams which trigger these pathological cases. I wrote these tools more than 35 years ago and have forgotten too much about how they work to add features which would avoid generating certain 'prohibited' patterns. Back in the day there never was a case where such 'prohibited' patterns had to be handled. A whole new formal description language would be required to implement that. The only alternative would be to hard code the 'prohibited' cases right into the backtracking algorithms, which is not a viable option. I'm running out of time.
More about the current state of work in the next post which I prepared offline.
- Uncle Bernie
Some more insights into the IWM reverse engineering process and the progress with it (or the lack thereof):
Superficially, as seen in the IWM patent, the read channel in the IWM looks deceptively simple: like in the DISK II, it is a deserializer shift register controlled by a state machine.
Complications, complications.
One complication over the DISK II is that the IWM adds a data hold register which is clocked at certain times to capture the contents of the deserializer shift register. This hampers any attempt to observe the deserializer shift register contents in real time. There is always a delay between a shift and when the new contents can be "seen" from the outside. This delay can be anywhere between one and several clock cycles after a shift. Or, if the data hold register is a latch that can be transparent, zero clock cycles after a shift.
The main difference from the DISK II is the way the "freeze" of the data upon the MSB being set is implemented. This happens when a disk byte is complete. The data seen by the 6502 cannot change for the duration of the sampling software loop, otherwise data may go unnoticed / be lost. In the DISK II, this is done by a side branch of the state machine which does not shift but still waits for a "10" or "11" bit cell sequence. This takes 7-8 CPU clock cycles (a bit cell is 4 us, i.e. 4 CPU cycles, but spindle speed variations can shorten the time available). Shortly after the "10" or "11" bit cell sequence has been detected, the shift register gets cleared and the "10" or "11" gets shifted in, ending the "freeze".
In the IWM, there is no such trickery. There is a counter which gets preset (or reset) for each incoming RDDATA pulse. It counts up (or down) every CLK cycle. At certain values of the counter, a shift is initiated. The bit shifted in is a "1" if a RDDATA pulse occurred since the last shift, or a "0" if not. In the IWM, there never is a "freeze" of the deserializer shift register. The same effect as in the DISK II is accomplished by just not updating the data hold register for a while after the MSB has been set. The updates resume after a "1" has been shifted into bit #1 (the second bit; the first one is #0) of the previously cleared deserializer shift register, plus a few CLKs of further delay. Seen from the outside, this has almost the same effect as the "freeze" of the DISK II. The difference is that the first new value ($02 or $03) appears instantaneously, without the intermediate values $00 and $01. These cannot be observed on the IWM outputs, but can be seen on the DISK II. Software would struggle to capture all of them for the same MSB event, but a logic analyzer of course would be able to show exactly what happens.
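The hold register rule just described can be sketched per shift in C, under these simplifying assumptions:

- The "few CLKs of further delay" and all other CLK-level timing are left out.
- The shift register is assumed to be cleared at the moment a complete byte is latched.
- The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified IWM-style read path, evaluated once per shift. */
typedef struct {
    uint8_t sr;    /* deserializer shift register, never frozen */
    uint8_t hold;  /* data hold register, what the 6502 reads */
    bool frozen;   /* a complete byte (MSB set) is being held */
} ReadPath;

void read_shift(ReadPath *r, int bit) {
    r->sr = (uint8_t)((r->sr << 1) | (bit & 1));
    if (r->frozen) {
        /* No hold update until a "1" has reached bit #1 of the
           previously cleared shift register ($02 or $03). */
        if (!(r->sr & 0x02))
            return;
        r->frozen = false;
    }
    r->hold = r->sr;
    if (r->sr & 0x80) {  /* disk byte complete */
        r->frozen = true;
        r->sr = 0;       /* start collecting the next byte (assumption) */
    }
}
```

For example, shift in $D5 and then "1", "0". The hold register keeps $D5 while the shift register holds $01, then jumps straight to $02. So, as described above, $00 and $01 never become visible.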
To complete the IWM read channel, there is an RDDATA synchronizer and edge detector, which is another shift register clocked by CLK. The first few stages (actual number as yet unknown) just synchronize RDDATA to CLK. The following stages have at least one gate which detects a negative edge. It provides a signal one CLK wide to a state machine controlling the deserializer shifting and the update of the data hold register.
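Here is a minimal C sketch of such a synchronizer / edge detector. It assumes two synchronizing stages plus one stage for edge detection; the real stage count is unknown, as stated above, and the names are hypothetical.

```c
#include <stdint.h>

/* RDDATA synchronizer + falling-edge detector, clocked by CLK.
   bit 0 = first synchronizer stage, bit 1 = synchronized sample,
   bit 2 = synchronized sample from one CLK earlier. */
typedef struct { uint8_t stages; } RdSync;

/* Clock in one RDDATA sample; returns 1 (RPD) for exactly one CLK
   after a high -> low transition has made it through the synchronizer. */
int rdsync_clk(RdSync *s, int rddata) {
    s->stages = (uint8_t)(((s->stages << 1) | (rddata & 1)) & 0x07);
    int newer = (s->stages >> 1) & 1;
    int older = (s->stages >> 2) & 1;
    return older && !newer;
}
```

In this model, a second pulse only produces a second RPD if RDDATA is sampled high for at least one CLK between the two low periods. That is one plausible mechanism by which closely spaced pulses get merged. Each extra synchronizer stage adds one CLK of latency.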
The above is what is known for certain, as it is described in the IWM patent, U.S. Pat. 4,742,448, which was granted on May 3, 1988. That was shortly before the Apple IIc was discontinued in August of the same year (oh the irony). But the IWM lived on in the first Macs. Alas, the description in the '448 patent lacks a lot of the finer details of how the actual logic in the IWM was implemented.
Lack of observability of inner nodes
You see, the central issue with the various function blocks mentioned above is that they are not directly observable from the outside (other than the data hold register). They also involve a multitude of clocked circuitry. Between a RDDATA event and some internal action, such as presetting / resetting the counter or shifting the deserializer shift register, there is an unknown delay in terms of "number of clocks".
Proposed exercise to see that multiple solutions for logic implementation exist
As an exercise, you can draw the block diagram and make some assumptions about what happens when. Then you add a number of stages in the synchronizer and adjust counter preload values and counter states which trigger shifts accordingly, to get the same sequence of events as observed by reading out the data hold register, for the same input sequence at RDDATA.
You can then see that there are many possible solutions. Some of them conform better to border cases, i.e. the "dark corners" of the design spec, such as what happens if Apple's GCR rules are violated. Alas, none of the documents from Apple which can be found on the web specifies exactly what should happen in such cases.
Border cases
As an example of such a border case, I found out that the real IWM "stalls" once three "0" have been shifted into the deserializer. Apple GCR rules allow no more than two "0" in a row. All Apple firmware and formatting conforms to this rule, even for the SYNC bytes. There are no exceptions. But still, the IWM accepts three "0" in a row. And then, 28 FCLKs later (in SLOW/7M mode), when the 4th "0" would be shifted into the deserializer shift register, it refuses to do that. It "stalls". No shift happens. But again 28 FCLKs later, it does accept the "0" and shifts it in.
The bit cell window counter
How come? The simplest explanation is that the IWM has a bit cell window counter which is longer than a bit cell. This is implied by some IWM spec documents (Apple drawing 343-0041-B seems to be the last revision available on the web). This seems to describe the "B" revision of the IWM (I'm working with the "A" revision now). This revision business is to be discussed later. The important piece of information to explain the three "0" reaction is the table on page 6 of said document, "Read data bit cell window":
IWM_bitcell_Window_snip.JPG
For SLOW/7M mode, the counter intervals are spaced 14 CLKs (28 FCLKs) apart. The table shows only the intervals for valid Apple GCR sequences having no more than two "0" in a row. But by adding "14" to the last row shown (for "100") we can guess how the "1000" came about:
35-48 "100"
49-62 "1000" (conjecture, added)
63 no shift (next "0" shift scheduled 14 CLKs = 28 FCLKs later, unless a "1" comes along)
It can be seen that if the counter counted further than shown in the spec, the additional 49-62 window could indeed allow the creation of a "1000" sequence in the deserializer shift register. But what happens at 63? And to which value does the counter roll over? To 0 or to 7? We know that a "0" bit cell after crossing 63 does not cause a shift. The next "0" is only shifted in 28 FCLKs later. That would imply that the counter rolls over from 63 to 7 (1 CLK) and the shift would happen at the transition from count 20 to 21 (14 CLKs from 63). The number of FCLKs is twice that, 14 x 2 = 28 (same as the "28" above).
This would explain the observed behaviour with the "stall" after shifting in three "0" in a row. Note that the spec says nothing about the rollover behaviour of the counter, nor does it mention how a "1000" pattern could ever get into the deserializer shift register. At least I could not find that spelled out. These "holes" in a spec are the bane of every IC designer and every test engineer (and reverse engineer, too ;-)
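The conjectured counter can be written down as a small C model. It rests on these labeled assumptions:

- In SLOW/7M mode there is one count per CLK (2 FCLKs).
- An RPD shifts its "1" immediately and resets the counter to 0.
- A "0" is shifted whenever the count enters 21, 35 or 49, the starts of the "10", "100" and conjectured "1000" windows.
- The counter rolls over from 63 to 7, as conjectured above.

The names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint8_t count;  /* 6-bit bit cell window counter, 0..63 */
    uint8_t sr;     /* deserializer shift register */
} WindowCounter;

static void shift_bit(WindowCounter *w, int bit) {
    w->sr = (uint8_t)((w->sr << 1) | (bit & 1));
}

/* Advance one CLK. rpd = read pulse detected in this CLK.
   Returns the bit shifted in, or -1 if no shift happened. */
int window_clk(WindowCounter *w, int rpd) {
    if (rpd) {
        shift_bit(w, 1);  /* simplification: the "1" goes in at once */
        w->count = 0;
        return 1;
    }
    w->count = (w->count == 63) ? 7 : (uint8_t)(w->count + 1);
    if (w->count == 21 || w->count == 35 || w->count == 49) {
        shift_bit(w, 0);  /* entering the "10", "100", "1000" windows */
        return 0;
    }
    return -1;  /* includes reaching 63: the observed "stall" */
}
```

After a single RPD at CLK 0, this sketch yields zeros at CLK 21, 35 and 49 and no shift at 63 (the "stall"). The next zero comes at CLK 78, when the count reaches 21 again after the rollover. That is 15 CLKs after count 63 in this sketch. Matching the measured 28 FCLKs exactly would need the rollover target or the shift point moved by one count.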
About the "B" revision of the IWM:
Page 4 of said IWM spec spells out that the "B" revision adds a "window" to the edge detector, which I initially suspected to be there in any case, being familiar with various FDC designs. Even the DISK II controller has "dead times" where the state machine ("Woz Machine") does not react to transitions on RDDATA. Here is a snip from the spec:
IWM_B_Revision.JPG
For me, this is good and bad news. The bad news first: this certainly affects some of the "bizarre" behaviour patterns observed with "pathological" GCR sequences (mentioned in post #13 of this thread). The good news is that this difference between the "A" and "B" revisions is only small. It is an easy add-on once I have a faithful substitute for the "A" revision I have on the lab bench right now. Still, I would need a working specimen of a "B" revision for this step (any donations?). Note that even a "defective" IWM might still work for checking the read channel. I found that all "blown up" IWMs I desoldered from broken Apple IIc had damaged stepper motor control outputs, and the rest worked fine.
IWM bug not fixed by Apple.
What I find somewhat disturbing is that in the "B" revision Apple did not fix the bug (or "feature") of the IWM which makes it useless for NMOS 6502 based systems. They just put a warning in the spec, explaining what happens. It's a shame. As I see it, it's most likely a side effect of a "PORT MODE" feature which allows them to use the MSB as a clock for a downstream data path, a feature which is only useful for "PORT MODE".
"PORT MODE" is defined as ASYNC mode bit ON and LATCH mode bit OFF. So it would have been easy to disable this feature for the "legacy" (DISK II compatible) mode where the ASYNC mode bit always is OFF.
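The gating argued for above is a one-line decode. Below is a C sketch. It assumes the mode register layout as I understand it from Apple's IWM documentation (bit 0 = LATCH, bit 1 = ASYNC); please verify that against the spec. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed IWM mode register bit positions (verify against the spec). */
#define IWM_MODE_LATCH 0x01u  /* bit 0: LATCH mode */
#define IWM_MODE_ASYNC 0x02u  /* bit 1: ASYNC handshake */

/* PORT MODE as defined above: ASYNC on, LATCH off. A feature gated by
   this is automatically off in DISK II compatible settings (ASYNC off). */
bool port_mode(uint8_t mode) {
    return (mode & IWM_MODE_ASYNC) != 0 && (mode & IWM_MODE_LATCH) == 0;
}
```

With the DISK II compatible mode value $00, or any value with ASYNC off, the function returns false. So a feature gated by it could never interfere with legacy software.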
Too many possible solutions (or non-solutions) how to build an IWM substitute.
So far I have found 16 (sixteen!) different ways to implement the IWM read channel as a "C" model ("C" means the programming language, not yet another IWM revision). All of them pass a "lockstep" test with the real IWM as long as no two RDDATA pulses are too close together. Those are the "pathological" cases which should not happen with a real floppy disk drive. But who knows what weird things Apple's primitive floppy disk drives, stripped down to the bare bones, could do under certain borderline conditions, such as in some copy protection schemes.
This is a real issue. "Full feature" floppy disk drives typically have elaborate digital signal conditioning logic after the read amplifier. It guarantees a certain, constant pulse width of the RDDATA pulses, and the absence of erratic pulses coming too soon after a valid one. Apple floppy disk drives (from the DISK II system) don't do that. They route the output signal from the MC3470 read amplifier directly to RDDATA without any such conditioning (the 74LS125 tristate driver does not "condition" anything). But don't get me wrong: this criticism is not meant to slam Apple's floppy disk system design as being bad. Stripping things down to the bare bones makes great sense to get the lowest cost, which then makes the product more competitive.
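The "no erratic pulses too soon after a valid one" part of such conditioning can be sketched in C as a simple holdoff gate. It works on the same principle as a retriggerable-free one-shot. A floppy emulator could also use it to avoid emitting pathological RDDATA streams. The names and the spacing value used below are illustrative assumptions only.

```c
/* Hypothetical RDDATA conditioner: any pulse arriving less than
   min_spacing FCLKs after the last accepted pulse is suppressed. */
typedef struct {
    long last;  /* FCLK count of the last accepted pulse */
    int armed;  /* 0 until the first pulse has been accepted */
} PulseGate;

/* Returns 1 if the pulse at FCLK count t is passed on, 0 if suppressed.
   Suppressed pulses do not extend the holdoff period. */
int gate_pulse(PulseGate *g, long t, long min_spacing) {
    if (g->armed && t - g->last < min_spacing)
        return 0;
    g->armed = 1;
    g->last = t;
    return 1;
}
```

With a holdoff of 14 FCLKs (half a SLOW/7M bit cell, an arbitrary choice), pulses at FCLK 0 and 28 pass and a pulse at 4 is dropped.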
Some comment on the merits of the DISK II approach (paragraph may be skipped for those in the know)
The DISK II system, when it came out, was truly revolutionary because it was the cheapest 5.25" floppy disk system on the marketplace, and not by a small margin. This was the first time in known human history that a floppy disk system was affordable for typical microcomputer owners who were private persons or small businesses. Every other computer manufacturer had much more expensive and elaborate solutions. The low cost of the DISK II combined with the Visicalc software was pivotal for the success of the Apple II in the small business world. Without these two winning factors Apple probably would have gone out of business like so many of the other microcomputer companies of the time period did. Viable small business solutions were where the real money could be made. Those microcomputer companies who only catered to hobbyists and gamers were losers, not enough money to be made from that clientele. This is still true today for the hobbyist market segment, but gamers of course turned into a huge market worldwide. Somehow the unemployed masses of adults living in the basement of their parents have to be entertained other than with mind altering drugs of all sorts (legal and illegal). Aldous Huxley predicted this ("Soma" in "Brave New World"). But enough of that. Just wanted to put this in a context of current society, or, to be more precise, the decline of society. Readers of this post in 200 years (if mankind still exists) can then understand under which conditions my work was done. Back to the IWM reverse engineering.
How the verification runs are done.
Here is a screen shot of the result of such a verification run:
PDRM4598_Score.JPG
The "scoreboard" at the bottom shows that after 1 Billion (1e9) FCLK cycles, using random (but non-pathological) RDDATA streams, almost all possible bit combinations have been read from that RDDATA stream, some of which are not even valid Apple GCR. But all of them of course must have MSB set. This is enforced by both the hardware and the firmware in any Apple II system, and the reason why the table starts with $80. You can also see that certain values never get hit. These typically contain more than four "0" in a row - definitely not valid Apple GCR.
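Here is a C sketch of such a scoreboard; the names are hypothetical. It keeps one hit counter per value $80-$FF. A helper measures the longest run of "0" bits, which makes it easy to check that the never-hit values are the ones with long zero runs.

```c
#include <stdint.h>

/* One hit counter per value with MSB set ($80..$FF). */
typedef struct { unsigned long hits[128]; } Scoreboard;

/* Record a value read from the IWM; values without MSB set are ignored. */
void score(Scoreboard *sb, uint8_t v) {
    if (v & 0x80)
        sb->hits[v & 0x7F]++;
}

/* Longest run of consecutive "0" bits in v (all 8 bits examined). */
int max_zero_run(uint8_t v) {
    int best = 0, run = 0;
    for (int i = 7; i >= 0; i--) {
        if ((v >> i) & 1)
            run = 0;
        else if (++run > best)
            best = run;
    }
    return best;
}

/* Number of never-hit values whose longest zero run is >= min_zero_run. */
int count_unhit(const Scoreboard *sb, int min_zero_run) {
    int n = 0;
    for (int i = 0; i < 128; i++)
        if (sb->hits[i] == 0 && max_zero_run((uint8_t)(0x80 | i)) >= min_zero_run)
            n++;
    return n;
}
```

After a run, compare `count_unhit(&sb, 5)` with `count_unhit(&sb, 0)`. If the two are equal, every never-hit value has five or more "0" in a row.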
Limitations of the approach / theoretical background
With the current test rig I can't do many more clock cycles because of the runtime. To reach 1 Trillion (1e12) clock cycles, a run would take several years. With a rig running at the full clock speed of 8 MHz this could be reduced to a few months, running 24/7. I once developed a mathematical theory which estimates the number of clock cycles needed to exercise every state and every transition of a given STG with a given residual uncertainty, when using quality pseudorandom input stimuli. It assumes there are no reachable "lockup" states. Such states would be "booby traps" planted by the designers, either intentionally to thwart reverse engineering attempts or unintentionally out of sheer incompetence.
This theory was verified with a number of example state machines and published in an electronics magazine. I think it's mathematically sound (there was never a rebuttal from academia), but back in the 1980s computing power was too low to tackle complexities like the IWM. I mention this to explain why I chose 1 Billion clock cycles for that run, and no less. It's based on my theory, not on a "gut" feeling. Alas, I had to use some estimates for key parameters of the formula, because the actual implementation of the various state machines and their interaction in the IWM is as yet unknown. A "black box". This state machine essentially has only one input (RDDATA), and only two bits of the deserializer shift register influence the transitions (other than the trivial shifting itself). That allows a reduction of the state variables entering the equation. Otherwise the approach using random number based stimuli would be hopeless. For more complex state machines, this brute force approach would have runtimes on the order of the age of the known Universe, billions of years, or even more. Much more. Such is the devastating power of the exponential function, which only few people understand. This is why so many people get enslaved by compound interest, just saying "MAFF IS HARD", and lack of brains to tackle "EVIL MAFF" can always be compensated by doing more slave labor. Not my words: "The borrower is the slave of the lender." (citation from: Proverbs 22:7-9 English Standard Version 2016 (ESV)). So the scam / topic is 1000's of years old, and people have been warned by scripture.
In conclusion, it is worth mentioning that any such pseudorandom number based approach to exercising unknown state machines is futile once the state machine gets too complex. The power of the exponential function makes the search space grow beyond the capabilities of our computers.
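For a feel of why a run length like 1e9 clocks is not arbitrary, here is a generic back-of-envelope bound. It is a standard union bound, not the theory referred to above. Assume k transitions to be exercised, each hit with per-clock probability at least p_min under random stimuli. Also treat clocks as independent trials, which is optimistic for a real state machine, where successive clocks are correlated.

```latex
% Assumption: clocks behave like independent trials.
P_{\text{miss}}(\text{one transition}) \le (1 - p_{\min})^{N} \le e^{-p_{\min} N}

P_{\text{miss}}(\text{any of } k) \le k\, e^{-p_{\min} N} \le \varepsilon
\;\Longrightarrow\;
N \ge \frac{1}{p_{\min}} \ln\frac{k}{\varepsilon}

% Example: p_min = 10^{-7}, k = 10^{3}, epsilon = 10^{-3}
N \ge 10^{7} \ln 10^{6} \approx 1.4 \times 10^{8}
```

Suppose a transition is only reached after one specific d-clock pattern of random binary input. Then p_min falls off roughly like 2^{-d}, so N grows exponentially with d. This is the exponential wall described above.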
Possible improvements of the method in case of small PLDs
The algorithm can be greatly improved when the state variables can be readily observed after each clock. Such as on small early PLDs (16R4, 16R6, 16R8, ..., 22V10). This allows the deployment of self-learning algorithms which apply prior learned knowledge about the STG to speed up the search for yet undiscovered transitions and states. The algorithm can also use sophisticated logic reduction algorithms (like ESPRESSO) from time to time to "clean up" the transition functions it has found. Then, based on knowledge about the limitations of the given PLD in terms of product term count per macrocell / output, it can further reduce the search space.
Proprietary software tools for reverse engineering
Back in the 1980s / early 1990s I had written some proprietary CAD tools using these techniques, and some of them were sold commercially with great success. My automatic "black box" reverse engineering tools however never reached the maturity to be a product. And the rise of the complex PLDs put an end to that automatic reverse engineering approach anyways. But it worked fine for simpler PLDs.
Apple TMG HAL automatically reverse engineered
I used these tools to (almost) automatically reverse engineer Apple's TMG HAL, which is a 16R8, and published the results on Applefritter. For more complex designs like the IWM, it's hopeless to use the self-learning algorithms which work so well for small PLDs.
Approach for more complex ICs (like the IWM)
Once I have a match with the brute force approach, and a netlist for that, my tools can automatically produce test vector sets which exercise every aspect of the state machines in the netlist. These test vector sets then can be fed into the real IWM on the test rig to discover differences, if any. Alas, not seeing differences does not prove that the state machine(s) within the IWM are the same as in the netlist (or the RTL which was synthesized into the netlist). All it proves is that the state machine in the netlist/RTL is at least a subset of the STG of the state machine in the real IWM. But if there is a difference found, it's not even a valid subset, but has a flaw. Back to the "drawing board" ... modifying the RTL.
Reason for also implementing "pathological" border cases
Hope this explains why I'm so nit-picking about the "pathological" cases of RDDATA sequences. Without having a match including those, I can't unleash these tools on the subject at hand. First I need a "C" model which gives me a 100% match with the IWM in the test rig, then I can hand code synthesizable RTL following the "C" model, then I can synthesize the logic into a netlist, unleash the tools on that netlist, and feed the result (test vector sets) back into the real IWM sitting in the test rig, to see if there is a difference (= flaw).
This is a lot of work. But I'm getting closer every day.
Comments invited !
(Especially comments from those 2-3 people in the world who already have reverse engineered IWMs - but please don't send me your RTL. Don't be a spoilsport. For me it's a welcome mental exercise for my skills and a good pastime. But if I made a gross mistake you can see from the above descriptions, any comment along the lines of "It's not like that" would be welcome. But don't spill the beans!)
- Uncle Bernie
I am inclined to think the IWM emulators some people are claiming to have done (so far) are only partial; they cover only some aspects of the IWM, for example sufficient functionality to be used in //c's. What is the NMOS 6502 problem? In Liron Smartport controllers the IWM works properly in the NMOS 6502 equipped computers.
In post #20, retro_devices asked:
" What is the NMOS 6502 problem ? "
Uncle Bernie answers:
It appears to be a data setup (or hold) time issue involving the MSB when the RX register is read, looking for a valid data byte. The Apple spec for the "B" revision (and some other documents) mention that the 65C02 regenerates insufficient logic levels on the (internal) data bus, but the NMOS 6502 does not (which I can confirm from its transistor level schematic).
These documents claim that this timing bug makes the operation of the IWM with an NMOS 6502 "unreliable".
I don't have the Liron controller card schematic but I think they could have added some ICs to dodge the issue. Look for some register between the IWM and the data bus. They also might have chosen to just fix the MSB (Data bus bit DB7).
- Uncle Bernie
The IWM data bus is in parallel with the card's ROM data bus and passes through an LS245 to the slot's data bus. The LS245 delay could be sufficient to act as a register, but since the IWM issue is still not entirely clear (to me?), this cannot be concluded.
Just pinging in to say that I really enjoy your brain dumps UncleBernie! I've only come close to the disk controller when writing my emulator lately, so my knowledge is JUST enough to make sense of a little of it, but I enjoy reading it anyway, thanks for typing it all ;-)
This is quite interesting. According to Apple's IWM spec for the "B" revision, the "B" revision just adds a "read window" after each RDDATA pulse. This window, which could also be called a "dead zone", prevents any further RDDATA pulse from entering the internal IWM machinery. In other words, it ignores any 2nd RDDATA pulse arriving too soon after an earlier pulse (see the above posts on the topic for more details).
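A "dead zone" of this kind takes only a few lines in a C model. The sketch below assumes the window is counted in CLKs starting with the clock that accepted the pulse, and uses a hypothetical width `DEAD_CLKS` of 6; which clock the real Rev. B counts from is exactly the open question discussed further down in this thread.

```c
#include <assert.h>

/* Hypothetical window width in CLKs (the exact Rev. B definition is not known). */
#define DEAD_CLKS 6u

typedef struct { unsigned remaining; } dead_zone_t;   /* CLKs left in the current window */

/* Called once per CLK with the synchronized RDDATA edge. Returns 1 if the edge may
   enter the read machinery, 0 if it is swallowed by the dead zone (or there is none). */
static int dead_zone_clk(dead_zone_t *dz, int edge) {
    assert(DEAD_CLKS > 0u);
    if (dz->remaining > 0u) {
        dz->remaining--;                 /* inside the window: ignore any edge */
        return 0;
    }
    if (edge) {
        dz->remaining = DEAD_CLKS - 1u;  /* the accepting CLK counts as window CLK #1 */
        return 1;
    }
    return 0;
}
```

With edges at CLKs 0, 3 and 6, the edge at CLK 3 is ignored and the one at CLK 6 is accepted again.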
This morning I added the "dead zone" to my existing "C" model and did a few experiments with its effects.
The interesting find is that having such a "dead zone" will remove most of the weird behavior I see with closely spaced RDDATA pulses on the real IWM in the test rig (an "A" revision not having that dead zone / windowing function yet).
It is very difficult to find some digital logic which would make sense (from an IC designer's point of view) and produce exactly the same behaviour as seen in the "A" revision of the IWM. The best solution for that riddle which I found was a long shift register into which RDDATA is fed, clocked with the same CLK = FCLK/2 as in the usual slow mode, with several taps on that shift register which trigger certain actions, such as "shift in a 0" or "shift in a 1". As I mentioned in previous posts, there is a long shift register visible in the IWM die photo. Without spending far too much time analyzing the die photo further, I can't say if that is the shift register I need in my "C" model to produce the same behaviour. It could also be just a long LFSR counter, e.g. to make the 1 second delay. In MOS technology, shift registers cost far fewer transistors and less die area than counters, so this could be a reason for using shift registers.
"State machines" in the more general sense were universally hated by IC designers back in the 1970s and 1980s because there were no automatic software tools to synthesize them and to do the layout. Everything had to be done by hand, and the best way to make "state machines" was to use a PLA (programmable logic array), which comprises a regular matrix of transistors that can form a sum-of-products array. Putting an MOS transistor at a certain crosspoint of the matrix gives the "product term" or "sum term" (corresponding to AND and OR) another input. The act of putting transistors in at specific places is the "programming" of the logic array. Any other attempt to make a "state machine" using random logic would get you into a quagmire of irregular combinatorial logic sprinkled with flipflops (or dynamic storage nodes). The design and layout of this style of implementation is very error prone, and the layout consumes more time (= money).
But depending on the logic equations, it may be smaller and faster than a PLA. A lot of the early microprocessors did use this more costly approach for the control part outside the data path. But if you look closely on the die photo of the 6502, you can see a very regular array of transistors which looks like a PLA. This is the "instruction decoder" of the 6502. But it's not a true PLA. It's only product terms. There is no regular "sum" array. Instead, there is a very irregular mess of random logic between the "instruction decoder" and the data path (which comprises eight regular bit slices). This, of course, was a design decision to make the 6502 small and fast. But it certainly did cost them a lot of money to do that design and the layout.
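For readers who have not met a PLA: the AND plane and OR plane described above map directly onto bit masks. This C sketch is a generic model, not a representation of any array in the IWM or the 6502; a transistor at a crosspoint becomes a 1 bit in a mask, and the example programming (XOR and AND of two inputs) is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One product term (AND-plane row): a set bit in use_true means the true input is
   wired in, a set bit in use_comp means the complemented input is wired in. */
typedef struct { uint16_t use_true, use_comp; } pterm_t;

/* Evaluates a PLA: first all product terms (AND plane), then each output ORs the
   terms selected by its OR-plane mask (bit k set = product term k connected). */
static uint8_t pla_eval(const pterm_t *terms, unsigned n_terms,
                        const uint32_t *or_plane, unsigned n_outputs, uint16_t in) {
    uint32_t p = 0;
    assert(n_terms <= 32u && n_outputs <= 8u);
    for (unsigned k = 0; k < n_terms; k++) {
        int hit = ((in & terms[k].use_true) == terms[k].use_true) &&
                  ((~in & terms[k].use_comp) == terms[k].use_comp);
        if (hit) p |= 1u << k;
    }
    uint8_t out = 0;
    for (unsigned j = 0; j < n_outputs; j++)
        if (p & or_plane[j]) out |= (uint8_t)(1u << j);
    return out;
}
```

Example programming with A = bit 0 and B = bit 1: terms {A & ~B, ~A & B, A & B}, output 0 = terms 0|1 (XOR), output 1 = term 2 (AND). Keeping only the product term vector `p` and dropping the OR plane gives the "product terms only" structure described for the 6502 decoder.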
Based on this historical perspective, I can see a good reason for the designers of the IWM to use a shift register based design style to implement the timing sequences in the IWM. This would then lead to the weird behaviour observed, where it seems to "remember" earlier RDDATA pulse positions even after one or two further RDDATA pulses have entered the digital machinery. This cannot be explained easily without invoking shift registers in lieu of "normal" state machines. (Of course, from a mathematical standpoint, even a shift register can be treated as a state machine, but believe me, you will not be able to draw the STG :-) ... unless it's a very short / trivial shift register. For two bits it can be done, for three it's messy, and for more bits it gets intractable.)
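The shift register hypothesis can be sketched in C like this: synchronized RDDATA edges are shifted into a register once per CLK, and fixed taps fire actions when an edge reaches them. Register length and tap positions here are pure assumptions for illustration, not values taken from the die. What the sketch does show is the "memory" effect: an old pulse keeps travelling down the register and still fires its taps after newer pulses have entered.

```c
#include <assert.h>
#include <stdint.h>

#define SR_LEN 16u   /* hypothetical register length in CLKs */
#define TAP_A   7u   /* hypothetical tap: edge seen 7 CLKs ago  */
#define TAP_B  11u   /* hypothetical tap: edge seen 11 CLKs ago */

typedef struct { uint16_t bits; } rd_shifter_t;   /* bit k = edge seen k CLKs ago */

/* Shifts one CLK. Returns a mask of fired taps (bit 0 = TAP_A, bit 1 = TAP_B). */
static unsigned rd_shifter_clk(rd_shifter_t *sr, int edge) {
    assert(SR_LEN <= 16u && TAP_A < SR_LEN && TAP_B < SR_LEN);
    sr->bits = (uint16_t)((sr->bits << 1) | (edge ? 1u : 0u));
    unsigned fired = 0;
    if (sr->bits & (1u << TAP_A)) fired |= 1u;
    if (sr->bits & (1u << TAP_B)) fired |= 2u;
    return fired;
}
```

Feeding edges at CLKs 0 and 3, TAP_B still fires at CLK 11 for the first edge even though a second edge entered in between.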
However, with the "dead zone" or "read window" seen in the "B" revision of the IWM, most (if not all) of the bizarre behavior for closer-than-normal spaced RDDATA pulses goes away. Which leads to a very clean, lean, mean "C" model. Which can be implemented with a trivial state machine having only four state bits / 16 states, and which would be mostly compatible with the state machine seen in the DISK II.
So this is my preferred solution. No need to waste time on implementing bizarre behaviour patterns which most likely are just a side effect of the logic level implementation method used in the original IWM, revision "A". The 4 bit state machine that suffices for the 'cleaned up' revision "B" is also much better suited to a PLD or CPLD. (Imagine wasting 16 macrocells on a stupid shift register just for the timing sequences.)
So you can see where this is heading: I will only re-implement the revision "B" of the IWM.
But to do that I first need to find a "B" revision. I only have two "A" revisions. Does anyone want to swap one of his "B" revisions for an "A" revision ?
- Uncle Bernie
The only IWM that I have is on a LiRON card (card for connecting a UniDisk 3.5 drive or a SmartPort device to an Apple II). Those are fairly rare and valuable (though not as much as they were before the Yellowstone card came out) and I only have one but I might be willing to loan it for non-destructive testing. I'm pretty sure it is one of the later revisions but I'd have to dig it out and check the chip to be sure it is a -B.
I can help building the same LPT tester and send you the logs from an IWM-B version with your DOS software. The LIRON card's IWM is not socketed.
I seem to recall that there is a DIP-28 IWM chip aboard early all-in-one Macs, too.
Mac 128, 512, Plus, SE, SE/30...
There's a far greater likelihood of finding a dead Mac for harvesting than of pulling one from a dead IIc or Liron card.
In post #26, 'retro_devices' wrote:
" I can help building the same LPT tester and send you the logs from an IWM-B version with your DOS software.
The LIRON card's IWM is not socketed. "
Uncle Bernie answers:
Thanks for the offer, but I don't want to waste your time with that. Building such a LPT tester takes a while and unless you later turn it into something more useful (as I intend to do, it will go into the Apple-1 as a memory expansion/graphics card/floppy disk card) it's a waste of time.
The difference between "A" and "B" is very small. I can turn the "A" in my test rig into the "B" situation by just changing one constant which defines the minimum RDDATA pulse distance. So it's a very trivial thing, not worth investing a lot of time or money in.
There is only one catch / possible pitfall, which is known in the field as the "fencepost bug": if you have to build a fence over a distance of N units of length, with one post per unit of length, how many fence posts do you need? The naive answer is N, but N + 1 fence posts are needed.
The problem applied to the IWM is the lack of precision in the language they (Apple) use in their documentation about the 'read windowing': if they say it's 6 CLKs wide, what does that really mean ? Six clocks beginning with the clock which detected the RDDATA edge or six clocks after that clock or six clocks after the RDDATA pulse goes away ?
This can only be examined using a real IWM, rev. "B".
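To pin down the ambiguity, the three readings of "6 CLKs wide" can be written as a hypothetical enum, each giving the first and last CLK in which a new RDDATA edge would be ignored. `detect_clk` is the CLK in which the synchronizer first sees the edge and `end_clk` the CLK in which RDDATA goes away; both are illustration parameters, and which reading (if any) matches the Rev. B is what the experiment has to decide.

```c
#include <assert.h>

typedef enum {
    WIN_FROM_DETECT,   /* window includes the CLK that detected the edge  */
    WIN_AFTER_DETECT,  /* window starts on the CLK after detection        */
    WIN_AFTER_PULSE    /* window starts on the CLK after RDDATA goes away */
} win_reading_t;

typedef struct { unsigned first, last; } clk_span_t;   /* inclusive CLK indices */

static clk_span_t dead_window(win_reading_t r, unsigned width,
                              unsigned detect_clk, unsigned end_clk) {
    assert(width > 0u && end_clk >= detect_clk);
    unsigned start = (r == WIN_FROM_DETECT)  ? detect_clk
                   : (r == WIN_AFTER_DETECT) ? detect_clk + 1u
                                             : end_clk + 1u;
    /* `width` CLKs inclusive: last - first + 1 == width (the fencepost trap) */
    clk_span_t s = { start, start + width - 1u };
    return s;
}
```

For an edge detected at CLK 10 whose pulse ends at CLK 12, the three readings block CLKs 10..15, 11..16 and 13..18 respectively.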
If I had one specimen, it would take me 10 minutes to find out. But as you mentioned, the "Liron" card (and the Macs) always have the IWM soldered in, and the same is true for all the Apple IIc I ever repaired due to defective DRAMs or IWMs (there is a link: when the DRAM fails but the machine still runs, because the bootstrap loader is in ROM, it may say "Disk Error"; then users try to mess around with the floppy disk drive, blowing up the IWM in the process).
I'm looking for an IWM which does not come out of a very rare and valuable card.
Could buy one from UTSOURCE but as these are Chinese sellers, it may be fake (more likely fake than not).
I really appreciate your kind offer to build such an LPT card, but I don't even have a complete schematic for it: I build these simple little gadgets from 'schematics' in my head, as I have done it often enough over the past 40 years to remember all the pin numbers on the LPT connector and the pin numbers on these 74xxx ICs (it's always the same types I use). But in the case of the IWM part, I drew a schematic just to make 100% sure I don't make a wiring mistake which would blow up a precious IWM, as they are so notoriously hard to find. I could send you a photocopy of that. Send me a PM if you are interested.
- Uncle Bernie
@Uncle Bernie, you can send me whatever handwritten schematic you have. Two TTL ICs and one 28 DIP socket connected to the LPT is rather simple to me. Did that kind of programmers, readers, dongles back in the day a lot.
On the other hand if just one pulse width must be measured maybe this can be done with a scope and a LIRON card without desoldering anything from it? Just by controlling the IWM from the Apple2 itself?
In post #29, 'retro_devices' wrote:
" On the other hand if just one pulse width must be measured maybe this can be done with a scope and a LIRON card without desoldering anything from it? Just by controlling the IWM from the Apple2 itself ? "
Uncle Bernie answers:
It's not just 'pulse width'. I'm not sure if 'pulse width' matters in the IWM at all. My software allows setting the RDDATA pulse width, and as I found out, the IWM does not behave differently when the RDDATA pulse width is varied. The minimum pulse width is one FCLK. This means they run the synchronizer at FCLK (~7 MHz, or 8 MHz in the Mac). But all the other state machines run at FCLK/2 in the SLOW (legacy) mode I'm most interested in.
The exceptions are some downstream details, like the MSB being delayed (this is always an FCLK event).
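A minimal C sketch of that clocking arrangement, under stated assumptions: RDDATA passes a two-stage synchronizer clocked on every FCLK, and a detected rising edge is held in a "pending" flag until the next CLK (= FCLK/2) tick consumes it, so a pulse only one FCLK wide is never lost. The two-flop structure, the pending flag and all names are assumptions for illustration, not the IWM's actual circuit.

```c
#include <assert.h>

typedef struct {
    unsigned char s1, s2;   /* two synchronizer stages, clocked by FCLK            */
    unsigned char pending;  /* rising edge seen, not yet consumed by a CLK tick    */
    unsigned char phase;    /* toggles every FCLK; CLK ticks when it returns to 0  */
} rd_sync_t;

/* Advances one FCLK period. Returns 1 on a CLK (FCLK/2) tick that consumes a
   pending RDDATA rising edge, 0 otherwise. */
static int rd_sync_fclk(rd_sync_t *s, int rddata) {
    unsigned char prev = s->s2;
    s->s2 = s->s1;                 /* second stage */
    s->s1 = rddata ? 1 : 0;        /* first stage samples the asynchronous input */
    if (s->s2 && !prev)
        s->pending = 1;            /* rising edge seen at FCLK resolution */
    assert(s->phase <= 1u);
    s->phase ^= 1u;
    if (s->phase == 0 && s->pending) {   /* every second FCLK is a CLK tick */
        s->pending = 0;
        return 1;
    }
    return 0;
}
```

A 1-FCLK pulse and a 4-FCLK pulse each produce exactly one event at CLK rate, which matches the observation that pulse width does not change the IWM's behaviour.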
It is true that a LIRON card could be exercised in an Apple II without desoldering the IWM, but specific hardware would need to be built to generate the RDDATA pattern in real time, dictated by the clocks made by the Apple II itself.
This could be done but IMHO is a waste of time. It is quicker to add a 50-pin "slot socket" to such a test rig to be able to exercise the IWM in a LIRON card without desoldering the IWM.
BTW, if you use truly professional desoldering equipment (a soldering iron with a hollow tip with suction provided by an electric vacuum pump) the risk of destroying the IWM in the process is low, but alas, it is not zero. These are 40 year old ICs now, and the formation of intermetallic compounds at the point where the gold bond wire meets the aluminum bond pad has progressed for 40 years. This is a solid-state diffusion process which cannot be stopped, and the thin layer of intermetallic compound is very brittle. So a little bit of mechanical / thermal stress can cause a crack to develop, and then this pin has an intermittent contact (or no contact at all).
For rare "unobtainum" ICs like the IWM it's better to leave them where they are. Note that even extraction from an IC socket causes enough mechanical stress to crack these nasty intermetallics.
Desoldering a defective IC, of course, is a different story. It will go into the trash can anyway.
So at the moment I'm exploring ways in which a complete Liron card could be plugged into my test rig. I can test the hardware and the software with a DISK II controller card. Then I need a Liron card with a Rev. "B" IWM as a loaner. I did some searching for Liron card photos on the web and it seems that many have Rev "B" IWMs. Ironically, judging from the photos on the web, early 128 kByte Macs seem to have Rev "A" IWMs. Not sure if the photos on the web are a good representation of the statistical distribution of IWM revisions on these boards, but at least we know that Liron cards with Rev "B" IWMs do exist and that a blindly bought 128k Mac may not contain the wanted Rev "B" IWM.
This Liron card interfacing work of course slows me down. The only schematics I found on the web for the Liron card are in Eagle PCB format, which first needs to be installed. They were put on github by Steve Chamberlin of BMOW fame, when he gave up on the 'Yellowstone' controller design for a while. This was in 2019. Now, five years later, he has completed the design and sells the 'Yellowstone' controller on his website. Which is good news, because it substitutes for the 'Liron' card, and this means that the 'value' of real Liron cards will fall. Maybe to the point where I can risk desoldering a Rev "B" IWM from one?
- Uncle Bernie
P.S.: will send you the schematics for the test rig as a PM
Hi. Contrary to your prediction, the single burnt IWM I have does not work at all; it's not only its phase outputs that are failing. It is high impedance on almost all of its pins. Luckily I have a -B variant that is in its PCB (device) but is removable, and I am willing to try with it. I sent you a couple of questions directly to your email because I am worried about the way the data bus input is accomplished in your test circuit.
Hi fans -
unfortunately this project made no progress since I found out about the "dead zone" feature of the IWM Rev. B
Since I can't get my hands on a functional desoldered IWM Rev. B, but I might be able to do the experiment with a Liron card (without desoldering anything on it) my plan was to extend my test rig with a 50-pin slot socket and plug a Liron card in.
But for planning this I need a schematic of the original Liron card - just to inspect it and make sure how to drive the IWM on it correctly.
Now, Steve Chamberlin of BMOW fame has put his early (Y2019) work on reverse engineering the Liron card on github, see here:
https://github.com/steve-chamberlin/fpga-disk-controller
it contains his reverse engineered Liron schematic in the eagle/Liron - original subdirectory. So my grand plan was to
a) re-install Windows XP on an old machine
b) re-install Eagle
c) start Eagle, load the schematic, and print it.
After wasting several days on this effort, it turned out to be futile, because the Eagle I have is version 4 and not version 6.6.0, so it refused to read the schematic. Eagle 4 does not even recognize the newer file format they used later.
So all my precious RQLT was wasted on this futile effort and I still don't have the schematic.
So here is a humble request:
If anyone out there who follows this project, and has Eagle 6.6.0 and up, please download Steve's work from github and print the schematic of the original Liron card as a pdf, and send it to me.
I can't continue this project without that schematic, so any help would be much appreciated.
- Uncle Bernie
P.S.: There is an important lesson from this waste of time
If you decide to put something up for grabs ("open source"), either on github or anywhere else, always add a pdf of your schematics, Gerbers for the PCB, plain text files (editable with vi) for the source code, etc., so that interested people not having the license to proprietary CAD software still can read, print, and use it.
Anything using just proprietary file formats is useless (unless people would want to buy a license for said proprietary software), and when it's useless it's worthless and wastes everybody's time. Imagine what happens when somebody wants to read or use your work in 10, 25, 50 years and the proprietary software is unobtainium.
(As far as Eagle is concerned: no, I'm not going to buy yet another license, and worse, Eagle was sold to some larger outfit and these greedy bastards only offer time limited software licenses. I predict that this will be the death of Eagle, which was a fine PCB layout software before it was sold. I bought a DIPTRACE license instead, because it is not time limited. I would never, ever buy any software which has a time limit or recurring license fees. Not even Cadence (it's too expensive for hobbyists or small businesses, but that doesn't matter, because typical hobbyists can't do full custom IC design and small businesses could not pay the salary of an experienced full custom IC designer anyway).)
Is this what you're looking for?
https://www.applefritter.com/files/2024/04/05/liron-schematic.pdf
Justin
In post #33, justinmc wrote:
" is this what you're looking for ?
https://www.applefritter.com/files/2024/04/05/liron-schematic.pdf "
Uncle Bernie thanks you !
And, to be honest, I did not look for it on Applefritter. What I did was a Google search, which led me to Steve Chamberlin's github site.
It's the same issue all the time:
- almost everything you want to look at or need is on the internet, somewhere.
- with the right search engine and the right keywords you can find it.
- use the "wrong" search engine or the "wrong" keywords and it won't be found, or buried after 100's of pages of bogus hits.
I had that happen to me more than once. Happens all the time !
Now I'm glad to have the schematic and from a quick inspection I see no reason why interfacing the Liron card to my test rig should be difficult.
- Uncle Bernie
Also, you can use their online viewer here: https://viewer.autodesk.com/
You need to "Sign up for free", though. But I was able to use the .sch files from the github repo you mentioned and generate the schematics.
You're welcome. I had eagle 9.6 already installed so I just printed the pdf and attached it to my reply to you. So it wasn't on here before 15 minutes ago.
Justin
In case they are not already known, there are several Apple documents about the IWM here:
http://mirrors.apple2.org.za/Apple%20II%20Documentation%20Project/Chips/IWM/Documentation/
In particular, it seems the notes by Bob Bailey (iwm_19831129.pdf) may pertain to questions about the read chain timing.
There is another doc by Bailey (apple2_IWM_INFO_19840228.pdf) that describes the one-shot multivibrator used to adjust to out-of-spec magnetic domain flips on the floppy disk.
Not sure if this is helpful at all, but the Lisa 2/5 used a derivative of the Apple ii Floppy Controller design, that I feel is closer in design to the IWM.
https://github.com/alexthecat123/Lisa-PCBs/blob/main/2%3A5%20I%3AO%20Board%20Schematic.pdf
Page 4 shows the floppy controller interface.
.... and I'm currently trying to figure out how the motor turn off timer is implemented. I have some strong evidence that the shift register like structures seen in post #16 are indeed a LFSR for the motor timer (as mentioned in post #18) and not necessarily double used for the read bit cell timing, which still is a mystery (I already had the pdfs mentioned by 'robespierre' in post #37). That read one shot can be modelled as a simple 4 bit counter, which, with the appropriate slow or fast clocks, does the same thing as the real IWM (and runs in lockstep with it, proven by the long run in post #19), so I'm almost there. But this run had limited the minimum distance between RDDATA pulses to 'sane' values which occur in real floppy disk drives using 4us bit cells. The exact mechanism of what can be observed for shorter (pathological) RDDATA pulse distances is still unknown, and very weird, because so far I can explain it only with yet another shift register to do the timing ... think an extension of the RDDATA synchronizer shift register. These pathological cases may go away with the Rev. B's added 'dead zone' logic. At least I'm sure that there will be fewer 'pathological' cases to implement.
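To show how little state such a "read one shot" needs, here is a hedged C sketch of a bit cell decoder built on a single 4-bit counter. It assumes a tick rate of 8 ticks per 4us bit cell and a "no pulse for 1.5 cells means 0" rule; both are illustrative assumptions in the spirit of the DISK II, not the IWM's measured thresholds, and the sketch only covers the "sane" pulse spacings, not the pathological ones.

```c
#include <assert.h>

#define CELL_TICKS 8u                              /* assumed ticks per 4us bit cell  */
#define ZERO_AT (CELL_TICKS + CELL_TICKS / 2u)     /* 12: 1.5 cells without a pulse   */

typedef struct { unsigned count; } one_shot_t;     /* count stays below 16: 4 state bits */

/* Advances one tick; `pulse` = synchronized RDDATA edge in this tick.
   Returns 1 or 0 when a bit is decoded, -1 when no bit is produced this tick. */
static int one_shot_tick(one_shot_t *s, int pulse) {
    if (pulse) {
        s->count = 0;                      /* re-synchronize on every flux transition */
        return 1;
    }
    s->count++;
    if (s->count == ZERO_AT) {
        s->count = ZERO_AT - CELL_TICKS;   /* next '0' is due one full cell later */
        return 0;
    }
    assert(s->count < 16u);
    return -1;
}
```

Pulses at ticks 0, 8, 24 and 48 (spacings of one, two and three cells) decode to the bit sequence 1101001.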
The motor timer is yet another mystery. I'm quite sure it's said LFSR. Here is a bit of theory and my line of reasoning:
* binary counters are very expensive to implement in that NMOS technology, in terms of area and transistor count. LFSR based counters are the most efficient. As a fun side note, those of you who had the LED version of the Texas Instruments TI-30 calculator of the 1970s and marveled at the nice 'spinning' digit while it was calculating transcendental functions now know why: the program counter of that very primitive CPU was based on an LFSR and did double duty driving the segments of the display.
* The IWM has an internal CLK which is FCLK/2, so we can assume the designers of the IWM used that, to shorten the LFSR by one bit.
* I ran some experiments after IWM power up and found that the motor timer would produce inconsistent delays (in terms of FCLK count) unless the motor control bit (L4) was set for more than a certain number of clock cycles (I have not explored the exact threshold yet). The most logical explanation for this is that a set L4 just shifts '1's into the LFSR, to initialize it. Once L4 gets cleared, the LFSR goes through its pseudorandom sequence until it reaches a value where some logic detects an end value. This would then be interpreted as the motor timer having run out, and the motor would be turned off (unless the IWM is configured with the motor timer disabled).
* There is evidence for this logic on the shift register. If you look at the IWM die photo under magnification, you can see that the (alleged) shift register has two horizontal parallel metal lines carrying the internally generated non-overlapping clock phases (PHI1, PHI2, typical for dynamic NMOS logic of the time), but there are also two further horizontal parallel metal lines which form two NOR gates with many inputs, coming out of the shift register. Here is an annotated snip of the die photo (please note that I did not look too long at it, so take this with a grain of salt):
[Image: IWM_shifter.JPG, annotated snip of the IWM die photo]
The "L" I drew in red is a poly line which forms a gate of a NMOS transistor, the source being grounded (the GND ? means "tentative") and the drain going to the summing metal line of one of the NORs. The usual depletion load transistor to pull up the line (and complete the NOR) is elsewhere.
This type of circuit can detect any state of the shift register. But I can't know if the shift register is negative logic or not, nor its length, nor its feedback terms (the polynomial). This would require a more thorough and time consuming analysis of the die photo. Which I don't want to do.
* Now, if we assume that the designers of the IWM wanted minimum transistor count, the most likely candidate would be a 22 bit LFSR with the polynomial P(x) = x^22 + x^21 + 1. This would require a two input XOR gate driven by the last and the second-to-last shift register stages, feeding back into the first shift register stage.
When initialized with all '1's, this will return to all '1's after 2^22 - 1 = 4,194,303 shifts. This would be 8,388,606 FCLKs. But the motor timer in the IWM runs for 8,322,817 FCLKs. So they either use a different polynomial or a different start value, or they detect the state of the LFSR at (or around) 4,161,408 shifts using said NORs. (There is an uncertainty of a few FCLKs, give or take, as there may be additional clocked dynamic logic stages downstream before the motor turns off.)
* My take on this (unless I'm completely mistaken or lost in the woods, and I see ghosts, aka LFSRs which are not there) is that this complication over a plain vanilla LFSR with P(x) = x^22 + x^21 + 1 is, most likely, intentional. You may ask why this makes sense (or not).
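The LFSR numbers in the bullets above are easy to check. The C sketch below models the candidate 22-bit Fibonacci LFSR with P(x) = x^22 + x^21 + 1 (feedback from the last two stages into the first), the conjectured "L4 set shifts in 1s" initialization, and a period count. It models the hypothesis only; the real polynomial, start value and end-state detector in the IWM remain unknown, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define LFSR_BITS 22u
#define LFSR_MASK ((1ul << LFSR_BITS) - 1ul)    /* 0x3FFFFF */

/* One CLK with L4 cleared: feedback = stage 22 XOR stage 21 (bits 21 and 20 here),
   shifted into stage 1 (bit 0). */
static uint32_t lfsr22_step(uint32_t s) {
    uint32_t fb = ((s >> 21) ^ (s >> 20)) & 1u;
    return (uint32_t)(((s << 1) | fb) & LFSR_MASK);
}

/* One CLK with L4 set (conjecture from the text): a '1' is shifted in instead. */
static uint32_t lfsr22_load_one(uint32_t s) {
    return (uint32_t)(((s << 1) | 1u) & LFSR_MASK);
}

/* Number of shifts until the register returns to `seed` (seed must be non-zero). */
static uint32_t lfsr22_period(uint32_t seed) {
    assert(seed != 0u && seed <= LFSR_MASK);
    uint32_t s = seed, n = 0;
    do {
        s = lfsr22_step(s);
        n++;
    } while (s != seed);
    return n;
}
```

After 22 CLKs with L4 set, any starting state becomes all '1's, which fits the observation that shorter L4 pulses give inconsistent delays. The all-'1's period comes out as 4,194,303 shifts, i.e. 8,388,606 FCLKs at CLK = FCLK/2, so a plain full-period run cannot by itself explain the measured 8,322,817 FCLKs.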
A LOGICAL REASON FOR THE MOTOR TURN OFF TIMER BEING COMPLICATED / OBFUSCATED
Well, we all know that Apple was plagued by copycats from the moment the Apple II came out. The Apple II, in its original TTL-only form, was ridiculously easy to copy. The Taiwanese knock-off artists just had to desolder all components from an original Apple motherboard (or slot card) and then digitize it to get the films to make clones of the PCB. The contents of the ROMs and PROMs were copied verbatim. The rest of the components were industry standard ICs. Bingo: after maybe two weeks of reverse engineering work, they had an Apple II clone fabrication line up and running, and exported the copycat product to all markets worldwide (Apple was really hurting from that). I remember that back in the 1970s and early 1980s, most "Apple II" computers I saw were Taiwanese clones, and not the real deal.
So it is quite obvious that Apple sought to put a stop to that, and the result was the Apple IIe and IIc with the MMU, IOU and IWM full custom ICs. And it was (and still is) quite common for full custom ICs to have a few little added complications in them which are not documented anywhere, up to outright "boobytraps" (like in the early Z80) which render any attempt to copy the mask layout futile. So why would the IWM designers not spend a few extra transistors to make the motor timer harder to duplicate? Software made by Apple could check the number of CPU cycles needed until the timer expires, and detect any IWM copy where the timing is off by a few clocks.
I don't know if my somewhat paranoid conjectures about the motor timer's other potential purpose (detecting knockoff IWMs) are justified, and whether such Apple software (dealership / service center diagnostics?) checking the motor timeout that stringently indeed exists. But it would be the only way to check the authenticity of an IWM. And in my career as an IC designer I have seen things in terms of IC "copycat poisoning" you wouldn't believe. This may also explain why some Chinese knockoffs of Western ICs can never reach the specs of the originals. Even their voltage regulators suck. Their opamps are catastrophically bad, too. Which does not stop Chinese IC counterfeiters from re-labeling this trash with the type number and manufacturer logo known for excellent opamps with stellar specs. These counterfeits look so good that they may slip through the incoming inspection of OEMs and end up on their circuit boards. Which then fail tests. And the "bad" ICs ended up at our company's QA department. The customer claimed they failed our datasheet spec. Sure enough, QA found a 741 type die inside, and a poorly performing 741 at that. No wonder the fake could not meet our specs (for our super high performance opamp, selling at boutique prices).
As we have seen with the Bulgarian clones of the Apple IIe using knockoff MMU and IOU ICs, use of full custom ICs for protection against copycats does not protect against nation state actors, especially Communist ones, where the work of their slaves (everybody other than inner party members who live in luxury are slaves) is essentially free for the parasitic government who enslaves them (leading to the slogan: "they pretend to pay us, we pretend to work", which, alas, is also coming to "Western" countries in which wages can't keep up with inflation. Take this just as a warning for things to come.).
THE MYSTERIOUS IWM TEST MODE
Part of that complexity is the software programmable TEST mode of the IWM, which shortens the motor timeout to ~65,570 FCLK cycles. It may also do funny things like reconfiguring some other logic in it. I don't think they overdid that (in terms of added transistors), because all this costs money, but it's another thing to deal with.
CONCLUSION
Faithful reproduction of full custom ICs is not trivial, even for simple ones like the IWM. But there are reasons to believe that a faithful reproduction that is clock cycle exact in all functions may be justified to avoid nasty surprises. I naively thought I could use an RC time constant for the motor off delay, but now I have second thoughts. And a 22 bit LFSR made with PLDs is costly. FPGAs fare better, but once a real binary counter is invoked by the HDL, it gets costly too, at least for small FPGAs. Maybe it should be an optional feature, just in case: add the exact delay IC as an option, and only if needed. But I also think that no third party software (other than Apple's own software) would try to verify that the IWM is the original, because third party software manufacturers couldn't care less whether their software runs on authentic hardware or on a knockoff.
So far for today.
Comments invited !
- Uncle Bernie
Hi fans -
this work has stalled (and still is) because I'm waiting to get a Rev B IWM into my hands. Being bored, I decided to write up my findings on the inner workings of the IWM which explain why the IWM works with both the NMOS and the CMOS 6502. There has been confusion / controversy over this for a long time, as evidenced by many posts on the topic. I'm now able to solve the mystery (for those who are interested).
First, read mode has never been questioned. On the software side, it's just a loop which checks whether the MSB of the disk data deserializer shift register is set. If it is set, the shift register contains a complete disk byte (or disk "nibble", confusing Apple terminology which I want to avoid - it came from the very first DISK II system where any "disk byte" only contained 4 bits = a "nibble". Alas, they kept this terminology when the improved encoding could do 6 bits per "disk byte").
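The read side can be modelled in a few lines of C: bits from the disk are shifted into an 8-bit register, and because every valid disk byte starts with a '1' bit, the byte is complete as soon as the MSB is set. On the 6502 side this is the familiar polling loop on the data register that branches back (BPL) while bit 7 is clear. `deserializer_t`, `deser_bit` and the "start over after a completed byte" rule are illustrative assumptions, not the IWM's documented internals.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t sr; } deserializer_t;

/* Shifts in one bit cell. Returns the completed disk byte in the bit cell where the
   MSB becomes set, 0 otherwise (0 is never a valid disk byte). Assumption: the
   register starts over with the first bit cell after a completed byte, so the
   extra '0' bits of sync bytes simply fall away. */
static uint8_t deser_bit(deserializer_t *d, int bit) {
    if (d->sr & 0x80u)
        d->sr = 0;                                  /* previous byte complete */
    d->sr = (uint8_t)((d->sr << 1) | (bit ? 1u : 0u));
    assert(bit == 0 || bit == 1);
    return (d->sr & 0x80u) ? d->sr : 0;
}
```

A 10-bit sync byte (eight '1's plus two '0's) followed by the bits of $D5 yields exactly the two bytes $FF and $D5.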
THE READ MODE BUG
Not questioning read mode, however, was a fallacy, as it turned out. The IWM has a bug which renders its synchronous read mode (the default after power up) too unreliable for the NMOS 6502. Apple document 343-0041-B (same number as the IWM Rev. B, also known as the Rev 19 IWM spec) has a warning about this bug on page 2. It boils down to the MSB being handled differently from the other 7 bits, probably a leftover from the "port mode", so the read timing requirements of the NMOS 6502 get violated occasionally. The document also claims that the 65C02 has regenerative feedback on the internal data latch and hence does not suffer from the bug. I think this is NOT the proper way to sweep such a problem under the rug. But all the Apple IIc in the field prove that their "solution", or better, "non-solution", works. All Apple IIc use a 65C02.
THE WRITE MODE PROBLEM - and how they solved it
It was believed by many Apple II aficionados that the IWM would not work in write mode if the 65C02 was replaced with a NMOS 6502. As it turned out, this was a fallacy. Because the IWM designers found a subtle trick to make it work with both the NMOS 6502 a n d the CMOS 65C02.
This is what was suspected: the DISK II floppy disk controller critically depends on the "phantom read" cycle of the STA abs,x instruction to make the write mode work. In the "phantom read cycle", which is CPU cycle #4 if the instruction's opcode fetch is CPU cycle #1, the 6502 produces a read to the effective address EA without accounting for a potential page crossing. This gives the 6502 enough time to bump up the EA to the next page, if the index X caused a page crossing. In the next cycle, CPU cycle #5, the write occurs and the 6502 drives the data onto the data bus. The state machine in the DISK II immediately grabs it from there and loads it into the shift register, without even checking whether it was a write cycle at all. In fact, in that CPU cycle #5 the DISK II does not even check whether it was addressed, because it "saw" the L6 control bit being set in CPU cycle #4 and consequently assumes that the very next CPU cycle is the one where the data byte to be written occurs on the data bus. Based on this assumption, it fetches it blindly. The very next instruction in the RWTS usually turns this "load" mode off again by resetting L6. The state machine ("Woz machine") now is in shift mode and will shift out a bit every 4 CPU cycles, for a total of 4 x 8 = 32 CPU cycles. At exactly that point, in cycle #33, the next STA abs,x must have provided the next "disk byte" to be loaded into the shift register. For this, L6 must have been set to get into LOAD mode (as described before). If L6 was not set, the state machine will write two more bits to the floppy disk, both zero. This makes a total of 40 CPU cycles and produces the "SYNC" bytes on the floppy disk.
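The bus activity of STA abs,x on the NMOS 6502 can be tabulated in a few lines. This Python sketch follows the documented cycle-by-cycle behaviour of the NMOS part; the function name and the example operands ($C08D with X = $60, a slot 6 style offset) are assumptions chosen for illustration.

```python
def sta_abs_x_cycles_nmos(pc, base, x):
    """Bus cycles of STA abs,X on an NMOS 6502 as (cycle, access, address)."""
    bal, bah = base & 0xFF, (base >> 8) & 0xFF
    ea = (base + x) & 0xFFFF                        # effective address
    uncorrected = (bah << 8) | ((bal + x) & 0xFF)   # high byte not fixed up yet
    return [
        (1, "read", pc),                    # opcode fetch
        (2, "read", (pc + 1) & 0xFFFF),     # operand low byte
        (3, "read", (pc + 2) & 0xFFFF),     # operand high byte
        (4, "read", uncorrected),           # the "phantom read"
        (5, "write", ea),                   # data byte driven onto the bus
    ]


for cycle in sta_abs_x_cycles_nmos(0x0300, 0xC08D, 0x60):
    print(cycle)
```

With no page crossing, the cycle #4 phantom read hits the same I/O address as the cycle #5 write, which is what the DISK II relies on. With a page crossing, cycle #4 hits the uncorrected address one page lower.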
WHY THE PHANTOM READ CYCLE IS NEEDED IN THE DISK II
The problem is this: the state machine is clocked with a gated Q3, and the gate is a NAND. So for each CPU cycle, the state machine gets two clocks, but, alas, the active positive clock edge from the NAND occurs when Q3 falls, close to the end of the PHI1 and PHI2 phases of the CPU clock (actually, three 14M cycles before the end). For a timing diagram, refer to the "SYSTEM TIMING" section of the "Apple II Reference Manual" of 1979, which came with every Apple II.
In the state machine clock of PHI1, the state machine can't "see" the upcoming change of L6, because the 74LS259 which contains L6 is gated by PHI2 (via the device select line, pin #41 of the slots). So it can "see" the change of L6 no sooner than the PHI2 phase, and it can change state when Q3 falls near the end of this cycle. But this is too late to tell the shift register to load a data byte from the bus ! The control signals to the shift register can be changed to LOAD on that state machine clock, but the LOAD can only happen on the next clock of the state machine (and shift register), and this happens towards the end of the following PHI1 cycle, when the data on the data bus is long gone (or only is 'bit vapors' which disappear in the mist ...)
The "Phantom read" cycle, CPU cycle #4, comes to the rescue. It will set the L6 in its PHI2 phase, the state machine will advance to the load state and set the shift register up for the actual load. The following CPU cycle #5 is the write cycle and the shift register will be loaded exactly at the time when the wanted data is present on the data bus.
This is the most simplified description I could cook up. The actual process is a little bit more complicated, because once in write mode (L7 set) there is no provision to synchronize the state machine to the 32/40 CPU cycle sequence. Instead, the write mode is always entered from the "write protect sense" mode, and this, with proper CPU cycle counting, will synchronize the state machine to the write loop. It is of utmost importance that from the point in time where the write mode is entered, a l l the write cycles from the STA abs,x must be e x a c t l y 32 CPU cycles or 40 CPU cycles apart. Otherwise the software side will lose synchronization with the state machine, which keeps happily huffing and puffing along and picking up whatever random data is found on the data bus when the state machine reaches its LOAD state. Needless to say, this is catastrophic, because once synchronization is lost, only trash will be written to the floppy disk. But as long as no interrupt intervenes, the system works robustly.
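The 32/40 cycle rule lends itself to a simple check. A minimal Python sketch, assuming 4 CPU cycles per bit cell and that each byte is either a normal 8 bit byte or a 10 bit sync byte (the names `required_write_cycles` and `first_desync` are hypothetical): it computes when each STA write must land, counted from write mode entry, and reports the first write that misses its slot.

```python
CYCLES_PER_BIT = 4   # DISK II bit cell: one bit every 4 CPU cycles


def required_write_cycles(sync_flags, start=0):
    """CPU cycle at which each STA write must occur.

    sync_flags[i] is True if byte i is a sync byte (8 bits plus two extra
    zero bits = 40 cycles), False for a normal byte (32 cycles).
    """
    t = start
    times = []
    for is_sync in sync_flags:
        times.append(t)
        t += (10 if is_sync else 8) * CYCLES_PER_BIT
    return times


def first_desync(actual, sync_flags, start=0):
    """Index of the first write that misses its slot, or None if all are on time."""
    expected = required_write_cycles(sync_flags, start)
    for i, (got, want) in enumerate(zip(actual, expected)):
        if got != want:
            return i
    return None


print(required_write_cycles([True, True, False, False]))
```

A single write that is off by even one cycle is flagged, which mirrors the point above: from that write on, the state machine picks up whatever happens to be on the bus.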
NO PHANTOM READ IN THE CMOS 6502
Alas, the CMOS 6502 fixes several "bugs" in the NMOS 6502, and one group of these "bugs" is related to the handling of page crossings. I think these are not really "bugs" but a "side effect" which is known and documented and rooted in the limits of what the NMOS 6502 designers at MOS Technology could do at the time (Y1975). But in the NMOS 6502, any addressing mode with page crossing causes access to a bogus address, and the designers of the CMOS 6502 deemed this to be a threat. This has to do with flags in peripheral ICs which may be reset by access to certain addresses, and so some users may have fallen into that trap and their system didn't work as expected. The fix is easy (avoid page crossings for such operations), but programmers must read the manual, and if they don't, they are not aware of the possible pitfall. I have known the 6502 well since it came out, and despite this, I fell into such a trap myself: when designing the improved ACI for the Apple-1, it happened that the RTS at the end of my added code page would do the prefetch of a bogus opcode (it does not execute) in the next page, which, due to the minimalistic address decoding, would toggle TAPE OUT, ruining the recording. This is a "side effect" or "bug" that is so deeply rooted in the whole 6502 system architecture that even the CMOS versions did not fix it. Oh, these were the times when CPUs were simple ... today such behaviour would cause segmentation faults galore.
So to defuse the perceived threat, the designers of the CMOS 6502 eliminated the "phantom read" by forcing cycle #4 of a STA abs,x to never access a bogus address. Instead, they made the 65C02 output a valid, guaranteed harmless address, such as the address of the opcode of the STA, or the address of the opcode of the following instruction. Different flavors of 65C02 may differ here; I didn't look all of them up, there were too many manufacturers, and not all datasheets are available or show the cycle diagrams. But whatever solution they adopted, the "phantom read" is gone and the DISK II state machine will not enter the LOAD mode at the correct time, unless the RWTS software is changed. But this code then would not work with a NMOS 6502. Note that I don't say it cannot be done, because a 6502 program can detect whether it runs on a NMOS or CMOS 6502 and then use the appropriate write routines with different timing to make the DISK II state machine do the right thing. It's a very slight adjustment, just at the entry of the write mode.
HOW THE PROBLEM WITH NO PHANTOM READ WAS FIXED IN THE IWM
Well, first, the IWM runs its inner state machines with the 7M clock (7 MHz), so it can change states more often per CPU cycle phase. This is true even in the synchronous write mode, which is the power up reset default configuration of the IWM, even though this configuration has the added twist that the disk write bit cell timing is controlled by the Q3 clock input. So you can't use this configuration to write 2us bit cells; only 4us bit cells are possible. But here is the trick they used to make the "phantom read" optional: the IWM never loads its shift register directly from the data bus. Instead, it has a write data hold register. And whenever the "data register" address of the IWM is accessed with address line A0 = 1, which is a write command, the data byte from the data bus will be loaded into the data hold register first, while the shift register is still shifting out data ! According to the IWM spec, there also is a "window" in which the data hold register accepts data, and this is two CPU clock cycles wide. The IWM's inner state machine, when reaching the LOAD state after 32 or 40 CPU cycles, will load the shift register from that data hold register, and not from the data bus. The contents of the data hold register is whatever data byte the CPU wrote last during the two cycle long write window. This relaxes the critical timing compared to the DISK II and allows the IWM to work with both the NMOS 6502 and the CMOS 65C02, regardless of whether the "phantom read" cycle is there or not.
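The difference between loading straight from the bus and loading from a hold register can be sketched in Python. This is a toy model under stated assumptions, not the IWM netlist: CPU writes are given as a dict of cycle to byte, the LOAD points are passed in, a write in the same cycle as a LOAD is assumed to reach the hold register first, and the two-cycle write window from the spec is not modeled. The function name `serialize` is hypothetical.

```python
def serialize(writes, load_cycles, direct=False):
    """Return the byte captured at each LOAD point.

    writes      -- dict: CPU cycle -> byte written to the data register then
    load_cycles -- CPU cycles at which the serializer reaches its LOAD state
    direct      -- True: load straight from the data bus (the byte must be on
                   the bus in exactly that cycle, DISK II style);
                   False: load from a hold register keeping the last write
    """
    hold = None
    loaded = []
    for cycle in range(max(load_cycles) + 1):
        if cycle in writes:
            hold = writes[cycle]          # assumed: write lands before LOAD
        if cycle in load_cycles:
            loaded.append(writes.get(cycle) if direct else hold)
    return loaded


early = {31: 0xD5, 63: 0xAA}              # each write one cycle before LOAD
print(serialize(early, [32, 64]), serialize(early, [32, 64], direct=True))
```

A write that arrives one cycle early is lost in the direct-load model but is still picked up from the hold register, which is the kind of slack that lets the phantom read become optional.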
CONCLUSION
The mystery about the IWM working or not working with the NMOS 6502 vs. the CMOS 65C02 has been solved. It turned out that it's the opposite of what some people (including me) had suspected:
It turned out that the IWM synchronous write mode (the power on reset default) works well with both the NMOS and CMOS 6502, because its designers have accounted for the "phantom read" cycle being present or not. The IWM works in both cases.
But it also turned out that the IWM synchronous read mode (the power on reset default) does not work reliably with the NMOS 6502, due to a bug in the IWM which Apple did not even correct in IWM revision B.
So the perception of the problem shifted to losing the synchronous read mode (vs. the suspected / debunked loss of the synchronous write mode) when the IWM is used with a NMOS 6502.
Note that Apple only tells us that the synchronous read mode of the IWM gets "unreliable" when a NMOS 6502 is used, so it may work under certain circumstances and may not work under others. I intend to do some experiments on this (plugging various NMOS 6502 into an Apple IIc) to find out more. Alas, the firmware ROM of the IIc also must be changed to a version which does not use the extra instructions of the CMOS 65C02.
Also note that all the asynchronous read and write modes of the IWM work with any flavor of the 6502. Alas, despite the holding registers for both read and write mode, and the further relaxed timing compared to the synchronous modes, the FAST configuration with 2 us wide bit cells (double the storage capacity, equivalent to the step from FM to MFM in industry standard floppy disk systems) has only 16 us per disk byte, and a 6502 running at 1 MHz can't handle that unless coding tricks are used which increase the RWTS code space to a point where it becomes nonsensical to do that.
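The FAST mode budget is simple arithmetic. A small Python sketch, assuming the nominal 1 MHz CPU clock used in the text (the real Apple II clock is slightly higher) and 8 bit cells per disk byte. `POLL_ITERATION` counts one LDA abs,X (4 cycles) plus a taken BPL (3 cycles), both without page crossing, as a rough measure of how long a ready byte can sit before a polling loop notices it.

```python
CPU_MHZ = 1.0            # nominal, as in the text; the real clock is a bit faster
POLL_ITERATION = 4 + 3   # LDA abs,X (4) + taken BPL (3), no page crossings


def cpu_cycles_per_disk_byte(bit_cell_us, bits=8, cpu_mhz=CPU_MHZ):
    """CPU cycles available per disk byte for a bit cell width in microseconds."""
    return bit_cell_us * bits * cpu_mhz


def spare_cycles(bit_cell_us):
    """Rough budget left after one polling iteration's worth of latency."""
    return cpu_cycles_per_disk_byte(bit_cell_us) - POLL_ITERATION


for cell in (4, 2):
    print(cell, "us cells:", cpu_cycles_per_disk_byte(cell), "cycles per byte,",
          spare_cycles(cell), "left after polling")
```

With 2 us cells, roughly 9 of the 16 cycles remain after polling latency, before the byte has even been stored or decoded.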
So far my intermediate report on the IWM reverse engineering findings. I will add more once I have a Rev B. IWM at hand. I'm not motivated to faithfully reproduce the weird effects seen in the IWM Rev A read mode when the distance of the RDDATA pulses gets far too low. This is a useless dirt effect which - hopefully - goes away with the Rev. B having the added read pulse windowing function.
- Uncle Bernie
Hi Fans -
after 4+ weeks of this project being on the back burner because of lack of a Rev. B IWM, the postman finally brought me a Mac Plus motherboard which was advertised on Ebay as "non working", and so it cost only $25 ... but with postage and tax it was more than $40. Damn. Under these conditions, the famous SEARS mail order house never would have gotten out of their startup phase more than 100 years ago. (I'm still looking for a non-musty Y1968 SEARS catalog to marvel at the prices back then).
I desoldered the IWM Rev B. and it turned out to be working ! So I was lucky. The sad rest of the Mac Plus motherboard looks like this:
[Photo: the Mac Plus motherboard after the IWM was desoldered]
Yellow circle is where the IWM was. Now I have to wonder what I shall do with that motherboard. The circuitry looks deceptively simple, mostly a bunch of TTLs and PALs around the 68000 CPU and the DRAM strips. So it could be possible to reverse engineer that one, too. But for the moment, I'm occupied with the IWM reverse engineering and the further development of the Replica 2e project.
Now having a Rev. B IWM, a lot of the headaches with the Rev. A IWM went away, thanks to the "read pulse windowing" which was added to the Rev. A to make the Rev. B. All the weird and quite intractable behaviour for too closely spaced RDDATA pulses now is gone. Because thanks to the read pulse windowing, these outliers just get ignored.
However, I do see some other differences vs. the Rev A IWM, which throw my read mode RTL (which worked fine with the Rev. A) out of lockstep with the real silicon. This may be a small change the designers of the IWM made somewhere in the clocking scheme, or it may have been a larger change to the whole RX state machines.
I'm currently working to figure out what is going on there.
A bug which they did not fix is the lack of a clear init state for the internal clock divider, which, in all the SLOW modes, divides FCLK by 2, which then yields the 4us wide bit cell timing. This clock divider is not initialized by a RESET. This means that all the internal activity in terms of state machines in SLOW mode may occur in two different incarnations, on odd FCLK cycles or on even FCLK cycles. I consider this to be a bug, because it complicates the wafer sort and end test involving test vectors. While the more expensive testers can be coded to look for events and then figure out how to proceed, they increase the test cost per second. So semiconductor companies try to have the cheapest testers on the manufacturing test floor, which, at least back in the 1970s, could not do that. They would blindly apply a rigid set of test vectors and ink the DUT as bad whenever a difference was found. So I wonder how they got around this issue in mass production. Maybe their testers could do a branch. Maybe they relied on a power up state. Who knows. For me it's a major obstacle, because to get the IWM into a known state, I have to clock it and apply a RDDATA input pattern to get it out of a "hang" state that comes from waiting for a RDDATA pulse when the MSB has been set. This is normal behaviour, and is needed to make the synchronous read mode work. The problem is that nothing can be seen when reading the IWM data register until it gets out of this state. Only then can it be found out in which internal clock phase the IWM is. And so far I've not found a good way to "correct" that phase by adding another FCLK, because this seems to advance some other states within the IWM, and so the outcome (the total internal state) is never the same, which then may throw other things off track down the road.
At the moment I "cheat" by synchronizing the FCLK divider in my RTL to the observed changes of the IWM read data register, while keeping the RDDATA synchronizer shift register blanked. This allows me to acquire lockstep of the RTL and the real IWM. It's an ugly kludge. I would prefer they had wired the RESET pin to at least this one clock divider flipflop. But I have a hunch that doing so might have foiled their RESET scheme for the rest of the circuit, which may depend on the output of this divider toggling. This is why "asynchronous reset" inputs are allowed to exist even in fully synchronous systems. The whole issue of getting a digital system into a known RESET state is not trivial, especially if a clock divider is involved. For my CPLD implementation I will run all state machines at FCLK, with no such divider for SLOW and FAST. And, of course, I will synchronize the RESET input to FCLK before allowing it to influence any state machine. IMHO this is the proper way to solve the problem with partial reset of the IWM. Somehow the logic designers at Synertek didn't do it that way.
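Both points can be sketched in Python: a divide-by-2 flip-flop with an unknown power-up phase, a phase search that matches the model's events against the FCLK cycles where the real IWM's output was seen to change (the same idea as the "cheat" described above), and a conventional two flip-flop synchronizer for RESET. All names are hypothetical, and treating events as happening when the divider output goes to 1 is an assumption for illustration.

```python
class Fclk2Divider:
    """Hypothetical FCLK/2 toggle flip-flop with no reset input."""

    def __init__(self, phase):
        self.q = phase          # power-up value: 0 or 1, unknown in the real chip

    def clock(self):
        self.q ^= 1             # toggles on every FCLK
        return self.q


def events(phase, n_fclk):
    """FCLK cycles on which SLOW-mode logic advances (assumed: when q becomes 1)."""
    div = Fclk2Divider(phase)
    return [t for t in range(n_fclk) if div.clock()]


def align_phase(observed_changes, n_fclk):
    """Pick the divider phase whose events include every observed output change."""
    for phase in (0, 1):
        ev = set(events(phase, n_fclk))
        if all(t in ev for t in observed_changes):
            return phase
    return None                 # neither phase explains the observations


def sync_reset(raw_samples):
    """Two flip-flop synchronizer: each sample reaches the output one FCLK edge later."""
    ff1 = ff2 = 0
    out = []
    for sample in raw_samples:
        ff2, ff1 = ff1, sample  # both flip-flops update on the same FCLK edge
        out.append(ff2)
    return out


print(align_phase([1, 5, 9], 20), sync_reset([1, 1, 1, 0, 0, 0]))
```

Running every state machine directly at FCLK, as planned for the CPLD, makes the phase search unnecessary; the synchronizer then only has to keep an asynchronous RESET edge from reaching the state machines mid-cycle.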
So far for today !
Stay tuned !
- Uncle Bernie
We all are tuned bring it on :-)
Hi Fans -
it's been now about a week since I have the Rev. B IWM and a lot of progress has been made based on it.
Alas, I had to find out that they did change a lot of things in the read channel (from IWM Rev A. to Rev. B) so I had to rework my whole read channel RTL. In the process I discovered a few oddities:
- in synchronous read mode (mostly used as a legacy DISK II compatible read mode, example: Apple IIc) the latency in the read data path is four FCLK cycles longer (from a RDDATA event to output change seen when reading the IWM), compared to the asynchronous read mode. This is weird. From a digital logic design standpoint it makes no sense, because it should be the opposite (synchronous read mode if LATCH = 0 can make the read data hold register transparent, so no further clock delays would occur between a shift register change and the output seen when reading the IWM data port).
- in asynchronous read mode, the time duration of the MSB staying active after a read is NOT always 14 FCLK cycles as described in the IWM patent and in the Rev 19 IWM spec (which was 'leaked' and can be found on the web). In some cases, it's 15 FCLK cycles. I've traced that back to a RDDATA event (a "1" bit read) during the MSB timer running. But it's not very consistent. It also seems to be dependent on which state the bit cell state machine is in, when the RDDATA event occurs.
- motor turn off delay is inconsistent. It is only consistent after the IWM has been powered up the first time (and presumably, the internal state is all zeroed out). Whenever there has been any read or write activity, the motor turn off delay is shorter. (It is further shortened if the TEST mode bit is set). Note that the Rev 19 IWM spec also does not specify a fixed number of FCLK cycles for motor turn off, but a range.
So what is the possible explanation for this weird behavior ? The most likely explanation is that the digital designers at Synertek used some foul tricks to save logic and active area by using the same counters/flipflops/sub-state-machines for multiple purposes, depending on the configuration of the IWM. For instance, the motor turn off timer has 23 bits (most likely a LFSR, as mentioned in posts above), but these 23 shift register stages just sit around uselessly until the motor timeout is called for. So why not use the lower 8 bits, e.g. as the serializer/deserializer shift register ? I'm not sure yet if they actually did that, because the design and implementation of tests to find out is time consuming. But this could explain why the motor turn off delay depends on the previous history of what the IWM did in terms of reading and writing.
The inconsistent duration of the MSB clear delay (from a read data with MSB set to MSB cleared) of 14 or 15 FCLK cycles could come from a trick which I suspect to be there, but again, due to the time expenditure needed to design and implement tests for that trick, not proven yet. Here is a line of reasoning:
They have to count 14 FCLK cycles, which means a 4 bit counter. But what they do have is a 3 bit counter which in the asynchronous write modes counts the 8 bits to be shifted out. It so happens, if they repurposed this 3 bit counter to do the MSB turn off delay in asynchronous read modes, that they could set the counter to 111 and then count it down to 000, at which point the MSB would be cleared (7 counts, which gives 14 FCLK cycles if the counter advances once every 2 FCLKs). They might use a state bit of the bit cell state machine to clock this 3 bit counter. As long as no RDDATA event happens during the count down, all is fine, and they get the 14 FCLK MSB clear delay as specified in the Rev. 19 IWM spec (and spelled out in the patent). But if a RDDATA event causes a change in the state sequence of the bit cell state machine, then one clock of the timer count down may get lost, and the MSB clear delay is prolonged to 15 FCLK cycles. Which, in the light of the Rev. 19 IWM spec, must be considered a bug in the implementation.
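The conjecture can be checked for arithmetic consistency with a tiny Python model. Everything here is hypothetical: a 3 bit counter preset to 111, decremented by an enable pulse assumed to arrive every 2 FCLK cycles, with a RDDATA event modeled as pushing one pulse back by a single FCLK. A pulse that is lost outright would instead give 16 FCLKs, not the observed 15, which is why the slip is modeled as one FCLK.

```python
def msb_clear_delay(slipped_pulse=None):
    """FCLK cycles from start of the MSB timer until the MSB is cleared.

    Hypothetical model: a 3-bit counter preset to 0b111 counts down once per
    enable pulse; pulses nominally come every 2 FCLK.  slipped_pulse is the
    index of a pulse that a RDDATA event delays by one FCLK (None = no event).
    """
    counter = 0b111
    fclk = 0
    pulse = 0
    while counter:
        fclk += 2
        if pulse == slipped_pulse:
            fclk += 1          # bit cell state sequence perturbed by one FCLK
        counter -= 1
        pulse += 1
    return fclk


print(msb_clear_delay(), msb_clear_delay(slipped_pulse=3))
```

The model only shows that the 14 and 15 FCLK observations are consistent with such a shared counter; it says nothing about whether the IWM actually works this way.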
All the above of course is just based on unproven conjectures of mine, which are meant to make sense of the observations in silicon.
More time is needed to design and execute experiments which could prove these conjectures.
So you can see that reverse engineering an unknown blob of silicon is not as straightforward as it seems. Modern (that means: 40 years later) IWM implementations based on VHDL or Verilog would be 'clean' in the sense that the logic synthesis software would not dare to share / re-use different state machines in the design for other purposes. There would be a 23 bit motor off delay counter, a 4 bit MSB turn off delay counter, a 3 bit BIT counter, and, likely (unless the designer explicitly writes it differently), two 8 bit shift registers, one for reading, one for writing. Nothing would be shared or re-used. Wasting lots of flipflops ! (But at 5nm CMOS, no one cares ... the whole IWM would be smaller than a single bond pad, much smaller; I did not try, but maybe it would look like a speck in some corner of a bond pad.) This, BTW, is the reason why there are functions which simply cannot be produced on such advanced technologies: you would end up with a huge bond pad frame and a very tiny speck of actual logic somewhere in the center, and most of the die area would be empty. This is why trailing edge process technologies are still being used.
For CPLD implementations of the IWM, being wasteful with macrocells is not wanted. The IWM might not fit in one of the smaller CPLDs able to run on 5V supply voltage. The later, larger, CPLDs often are only fit for 3.3V, which brings further complications, and hence, is unwanted.
For my reverse engineering method, this weird behaviour must be understood and reproduced faithfully, because otherwise I could not use my proprietary software tools for the automatic compare phase of the work. Which is bad, because it forces me to analyze and duplicate these anomalous behaviours even though I know that this would not be needed for a clean slate design. So the work will take longer than expected, sorry.
So far for today. If any designer of the original IWM reads this, you may send me a PM, because your knowledge of these tricks to save logic would greatly speed up my work. (I hate to waste my precious, irreplaceable RQLT on chasing down such useless phenomena. Or worse, wild goose chases.)
Comments invited !
- Uncle Bernie
Hi, U.B.! I just stumbled upon this thread this morning, and what a riveting read! I'm eager to have a proper replacement for these chips, as this comes up often when repairing vintage hardware. Though, your use case is of course intriguing as well. I'm following along closely, even though some of it definitely sails over my head.
I did stop by to ask, and perhaps I've missed it if you mentioned it: have you confirmed the Rev B chip you've gleaned from the Macintosh board functions correctly in a working machine? The one you pulled your Rev A chip from, perhaps? I may be talking nonsense, but I have seen other silicon fail in ways that make it behave incorrectly in only some modes. This might account for some of the baffling differences you're observing.
Thanks for the wonderful work you're doing. Exciting!!
In post #44, 'RadioC1ash' wrote:
" ... have you confirmed the Rev B chip you've gleaned from the Macintosh board functions correctly in a working machine? The one you pulled your Rev A chip from, perhaps ? "
Uncle Bernie comments:
Oh, I could not test the salvaged Rev. B IWM in a Mac, because I have no Mac. This will be a problem later on when I have the substitutes. I can only test them in an Apple IIc from which my Rev. A IWM came. And the Apple IIc does not use all modes of the IWM.
BTW, I did not test the Rev. B IWM in the Apple IIc, as this was not necessary. At that point in time my work had advanced to a point where my RTL model was able to run in lockstep with a real IWM in my test rig in most modes (except one group, more on this later). So all I had to do was plug the Rev. B IWM into my test rig and run my software. As it exercises all nooks and crannies of the IWM behaviour (as far as I knew about them at the time), it's a much better test for the IWM being OK / undamaged than just trying it in an Apple IIc. The 'read window' was ignored at the time, which is easy enough to do: it's just a matter of changing the minimum distance between RDDATA pulses to a value high enough to never generate a pulse while the IWM is in the 'read pulse ignore window', which only exists in the Rev B.
The current state of the work is that I have RTL models which run in lockstep with the real IWM in all modes for many millions of FCLK cycles with no flaw, including the exact behaviour of the Rev B 'read window'. Except for a remaining imperfection with the ASYNC LATCHED modes. The RTL models do match in ASYNC LATCHED modes under the following conditions:
- DEV always asserted (permanent read of the IWM)
or
- DEV deasserted briefly before the read shift register MSB is set (a disk byte is complete) and asserted just after the transfer of the shift register contents to the read hold latch happened.
These two cases work fine, so my state machine controlling these transfers can't be off all too badly from the mystery what is hidden inside the IWM. In other words, I'm close to have a perfect understanding of the IWM.
But there is another test which fails in ASYNC LATCHED, and this involves what I call 'spaced read cycles'. In this case, DEV is deasserted most of the time, but asserted for N FCLK cycles every M FCLKs, where M is preferably a prime number (other than 2 or 7, so it shares no factor with the cycle lengths), so the 'event' will 'slide' over all internal states of the IWM, which always are cycles of 14, 16, 28 or 32 FCLKs. When DEV is asserted, the IWM read data hold register is read, and the result is compared to the RTL. Alas, every few thousand FCLKs there is a discrepancy in the MSB behaviour between the real IWM and my RTL model. The rest of the bits are good. So far I have not collected enough data to find out what is going on there. I suspect that the designers of the IWM may have done something to prevent MSB changes from '1' to '0' at certain times. The Rev. 19 IWM spec which can be found on the web says this:
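Why the spacing M matters: read number k starts on phase (k * M) mod L of an internal cycle of length L, and these phases cover all L states only if M and L share no common factor. A Python sketch checking this for the cycle lengths named above (function names are hypothetical):

```python
from math import gcd

INTERNAL_CYCLES = (14, 16, 28, 32)   # FCLK cycle lengths named in the text


def phases_hit(spacing, cycle_len):
    """Internal phases seen by reads that start every `spacing` FCLKs."""
    return {(k * spacing) % cycle_len for k in range(cycle_len)}


def covers_all(spacing, cycle_lens=INTERNAL_CYCLES):
    """True if the sliding read start visits every phase of every cycle."""
    return all(len(phases_hit(spacing, n)) == n for n in cycle_lens)


def coprime_to_all(spacing, cycle_lens=INTERNAL_CYCLES):
    return all(gcd(spacing, n) == 1 for n in cycle_lens)


for m in (97, 7, 2, 15):
    print(m, covers_all(m))
```

Any prime other than 2 or 7 works, and so does a non-prime such as 15, since only the factors 2 and 7 occur in 14, 16, 28 and 32.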
" In asynchronous mode the data register will latch the shift register when a one is shifted into the MSB and will be cleared 14 FCLK periods after a valid data data read takes place (/DEV being low and D7 outputting a one from the data register for at least one FCLK period) "
Source: Sheet 5 of 14, Apple document 343-0041-B
But on the test rig I can definitely see that this is not the case. The real IWM will occasionally extend the period during which the MSB stays set to more than 14 FCLK cycles. This is not related to the similar extension when a RDDATA pulse hits the 'blackout' section of the read window while the 14 cycle MSB timer runs, because it also can be seen if the RDDATA pulse spacing is increased such that the 'blackout' section is never being hit.
This is the last riddle I need to solve before I can go to the next step of the IWM reverse engineering process, which would be synthesizing my logic and putting it into PLDs to run these in lockstep with the real IWM (much the same procedure which I did with the MMU and IOU substitutes, the PCB had the originals on them, too, and there was logic to compare the output states of the PLD substitutes with the original ICs).
I do have a hunch why they may have added this extension of the 14 FCLK period under certain circumstances which are as yet unknown, and this may be related to the read data setup time of the 6502 or 68000. My reasoning for this conjecture is as follows: if the MSB timer starts with the MSB being read as a one (as described in the IWM spec) and always counts 14 FCLK cycles, without looking at the DEV input any further, before it turns off the MSB, then what if, for some unknown reason, the read cycle still is active at that point, but towards its end ? Such that a turn-off of the MSB would violate the 6502 or 68000 read data setup time at the read cycle end, because no data bit is allowed to change at that critical point in the read cycle ?
A counter argument against this conjecture is that so far I did not find any scenario where this could happen. Neither 6502 nor 68000 read cycles take 14 FCLKs.
So maybe I'm chasing phantoms here. But the undeniable fact is that the real IWM sometimes extends the 14 FCLK cycle period, and unless I find out how and why and can put the exact same behaviour into my RTL model, I'm stuck. Because the next step in my proven reverse engineering method involves automatic equivalence proofs using ATVG CAD tools, which do not allow coding of any exceptions in the test vectors. They stupidly look at the netlist and generate vectors for its perceived functionality, and any functionality that is not in the netlist but is in the 'device under test' will most likely cause a test vector fail, which would expose the difference being present. This is a 'no-go' for the process. No differences allowed ! (For the mathematically inclined who know the theory of state machine equivalence: the proof is not perfect if one of the two specimens is a black box for which the netlist is not known. A 100% perfect proof is only possible if both netlists are known. This is the case for logic synthesis tool chains that morph and re-time state machines to meet timing constraints: there will be a netlist 'before' and a netlist 'after' the changes, and the equivalence prover can work on them. But in case of reverse engineering, where one netlist is not known, all that can be proven by a slightly modified algorithm based on one netlist only is that the exact functionality of that netlist is present inside the 'real' silicon (if a 100% coverage test vector set runs without fail; for a 98% coverage test vector set there is a 2% uncertainty of uncovered terrain). But the limitation of this approach is that if the 'real' silicon has a superset of the functionality being known and implemented in said netlist, then these 'extras' will not necessarily be discovered by a test vector fail.)
The bottom line here is that my reverse engineering method has some known limitations, and besides the 'stupid' but very meticulous automated tools, some human intellect and judgement is required to assess the validity of the results. Back in the late 1980s and early 1990s this reverse engineering method was adequate for the complexity of the simpler ICs of the time, but nowadays it's of course hopeless, given the fact that modern ICs contain whole worlds (so to speak, as a metaphor), and you would need to probe them from the outside to draw all the maps and all the cities and all the buildings down to the last detail in every room in every building. So theft of the design database by industrial spies nowadays is the only financially viable way to 'reverse engineer' one of these complex 21st Century ICs having more transistors than humans on the planet. Which ain't honest reverse engineering as we used to do it back in the day, 30-40 years ago, but just espionage and theft of IP. Decline of civilization ...
Anyways, I'm continuing to work on this last riddle. If anyone out there knows more about that FCLK count extension in ASYNC LATCHED mode, let me know, you may use "send PM" for initial contact, and I will provide you with a secure communications channel to discuss the details.
- Uncle Bernie
For a while now, I have had a RTL model for the IWM which runs in lockstep with the real IWM in the test rig in all modes, reading and writing. There are a few fine points which I could not fathom, though, so this project was on the back burner for a few weeks. I have other things to do than electronics, such as figuring out why my Italian Parsley gets white spots, which under the microscope turn out to be no fungus, but just dead specks of leaf, devoid of chlorophyll / plant juice. I suspect tiny sucking insects, but resisted the urge to 'nuke' everything with Malathion and the like, as professional growers would do, because they want to sell perfect looking Parsley but of course don't intend to eat it. For me it's the opposite: I want to eat it, and not sell it (see why most of our 'food' is poisoned for profit ?). But back to the IWM.
For a long time I did not look into the motor off timer because under certain conditions it gave me inconsistent results (different motor off delays), as mentioned in an earlier post. Only when NOTHING was done with the IWM just after power up and RESET would I get consistent motor off delays of 8,322,817 or 8,322,818 FCLK cycles. The one FCLK uncertainty hints that the timer actually is clocked with FCLK/2, and depending on which state of this clock divider the "MOTOR OFF" command is given in, it takes one FCLK more or not.
This is how I approached the reverse engineering of this timer. I looked at the die photo and saw a shift register structure which appears to have two parts, a 15 bit part, and a 7 bit part, for a total of 22 bits, see here:
[Image: IWM_motor_timer.JPG, die photo snippet of the motor off timer shift register]
This shift register has no parallel inputs and no interruptions in the regular chain of cells, which hints at a so-called "Fibonacci LFSR". For those not familiar with LFSRs but interested, here is a link to the relevant Wikipedia page:
https://en.wikipedia.org/wiki/Linear-feedback_shift_register
I also saw a number of paralleled NMOS transistors whose gates are connected to each shift register cell. These gates are marked in the above snippet of the die photo. This seems to be an "all zeros" detector, as it is, in the end, a wide NOR gate. LFSRs are known to get stuck in the all zeros state, so this gate brings the remedy, and it is another hint that the motor off timer of the IWM was implemented as an LFSR. If the TEST mode bit gets set, the motor off delay shrinks to 65570 FCLK cycles, which would explain the split of the shift register into two sections. At least there is some random logic at the split location, so 'something logical' happens between these two sections: they are not directly connected.
Now, based on this conjecture, how to proceed ?
We can immediately see it's not a maximum sequence LFSR, because this would give 2^22 - 1 = 4,194,303 counts. But we only see 8,322,818 / 2 = 4,161,409 counts. Which, BTW, violates the 'spec' of the IWM, as can be found on the web as Apple document / drawing number 343-0041-B from Y1983. To cite sheet #10:
" If the 1-second timer bit is a zero then the enable (/ENBL1 or /ENBL2) selected by Drive-Sel will be held low for 2^23 +/-100 FCLK periods (about 1 seconds) after the LMotor-On state bit is reset to zero. If the latch mode bit is set the timer is not guaranteed to count up to 2^23. "
Obviously, the "missing" ~65790 FCLKs are a gross violation of that 'spec'.
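Written out, the arithmetic behind the last few paragraphs is (the second line assumes the FCLK/2 timer clock):

```latex
\begin{aligned}
2^{22} - 1 &= 4{,}194{,}303 && \text{(maximum-length 22-bit LFSR)}\\
\tfrac{8{,}322{,}818}{2} &= 4{,}161{,}409 && \text{(observed period at FCLK/2)}\\
2^{23} - 8{,}322{,}818 &= 8{,}388{,}608 - 8{,}322{,}818 = 65{,}790 \gg 100 && \text{(shortfall vs. the spec tolerance)}\\
\tfrac{65{,}790}{2^{23}} &\approx 0.78\,\% &&
\end{aligned}
```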
What is more disconcerting is the note about all bets being off when the LATCHed mode bit is turned on. This confused me for a long time and has not been fully sorted out yet, but I have a conjecture about the reason for that behaviour.
HOW TO CONSTRUCT NON-MAXIMUM SEQUENCE LFSRs
Now, while the above wiki page is very helpful in telling us the polynomials for maximum sequence LFSRs of a given length (N stages, the maximum sequence is 2^N - 1 counts), it tells us nothing about polynomials for less-than-maximum sequences (also called 'periods' of the LFSR). So how do we find the polynomial which makes 4,161,409 counts in an N = 22 bit LFSR ?
By trial and error you will never find it. But here is the approach I used:
1. Find the prime factorisation of the counts. I used this online calculator:
https://www.calculatorsoup.com/calculators/math/prime-factors.php
It gave me: 7 x 31 x 127 x 151 = 4,161,409
Note that the factor '127' is the maximum sequence length of a 7-bit LFSR. Using the table from the wiki page of the first link, we get the polynomial:
P1(x) = x^7 + x^6 + 1
for our 7-bit section of the 22-bit LFSR. So only the 15 bit section is still unknown.
2. Now let's see what the 15 bit section must do: 4,161,409 / 127 = 32767, which happens to be 2^15 - 1, the maximum length sequence of a 15 bit LFSR. We pick the polynomial from the table on the LFSR wiki page:
P2(x) = x^15 + x^14 + 1
3. Calculate the final polynomial P3(x) = P2(x) P1(x) ... but note that this is done over a Galois field, more precisely over GF(2). This is a special arithmetic on binary numbers (the '2') where the 'add' operation involves no carries into the next higher bit, i.e. addition is a bitwise XOR. It is also used for Cyclic Redundancy Checks (CRC).
Here is a trick I've worked out (or dimly remembered from many decades ago, I can't say):
P1(x) has the exponents: 7 6 0 (x^0 = 1)
P2(x) has the exponents: 15 14 0
Instead of tediously multiplying out the polynomials, we just add up all combinations of exponents which would happen in the tedious multiplication:
 P2 \ P1 |  7   6   0
---------+------------
     +0  |  7   6   0
    +14  | 21  20  14
    +15  | 22  21  15
The exponent '21' appears twice, an even number of times, so the two terms cancel (x^21 XOR x^21 = 0) and it is deleted. (Remember the no-carry rule for GF(2) mentioned above.)
The resulting polynomial is:
P3(x) = x^22 + x^20 + x^15 + x^14 + x^7 + x^6 + 1
This leads to the hexadecimal tap mask 0x286060 (bit i of the mask stands for the term x^(i+1); the '+1' term is implied), which can be plugged into any suitable LFSR 'C' code. And it turns out we get exactly 4,161,409 counts. Times 2 is 8,322,818 FCLKs, which is the delay we sought. Note that this is not the only polynomial with that period for a 22 bit shift register; there are many more. I quickly hacked together a small "C" program which systematically tries all possible products of 7-bit and 15-bit polynomials, the tables for which I stole via this link referenced in the wiki page on LFSRs:
https://users.ece.cmu.edu/~koopman/lfsr/
This program found more than a dozen polynomials which need one fewer XOR feedback term than the one above.
But this is only an academic exercise, as I do not intend to waste 22 macrocells (plus the ones needed for the 5...6 input XOR) in the final implementation of my IWM substitute. Alas, as long as I want stuff running in lockstep and automatic state machine equivalence proofs, I can't take such shortcuts. Replacing the 22+X macrocells with an RC-based timer would allow me to use a much cheaper CPLD.
THE LATCHED MODE MYSTERY
Finally, what is the thing with the LATCHed mode compromising the motor delay timer ? Ain't that weird ? I did not have the time to get to the bottom of this effect, but I do have the conjecture that the motor timer LFSR does double duty under certain conditions. I once suspected that the read data shift register may become part of the motor timer once read mode is turned off, which would explain the different motor turn off delays after read mode had been invoked. But now it's more likely that they just re-used part of the motor timer LFSR to generate the 14 FCLK delay controlling the MSB clear in LATCHed mode. At least I have found a way to "knock out" a new byte coming from the floppy disk with a certain data register read timing pattern shortly before the new byte is loaded. This can only be explained by having a shift register as a timer, because a timer based on a counter can't have two "MSB clear" events in it. Constructing a test case is a bit involved, but I will do this once I have the time to revisit this project. For the next few weeks it will be on the back burner again, as I have other things to do.
- Uncle Bernie