25 July 2012

LOCTR - very powerful, and makes it very easy to shoot yourself in the foot

I promise that I will get Martin's blog up in a couple of days, and I hope that part 3 of "TCBs, SRBs, and FRRs, oh my!" will be posted in a couple of weeks (shifting work priorities), but I thought I'd throw this little tidbit out there for you.

I am updating our standard entry/exit macros to support AMODE 64, and support various types of AMODEs and ASCs on entry. As a part of this, I've had to add some more LOCTRs to the definition of our work areas. But let me demonstrate this little trap for you:

WORK     DSECT ,
WORKLOC1 LOCTR ,
DW10     DS    D
WORKLOC2 LOCTR ,
DW20     DS    D
WORKLOC1 LOCTR ,
DW11     DS    D
WORKLOC2 LOCTR ,
DW21     DS    D
WORKLOC1 LOCTR ,
DW13     DS    D
* All done!
WORKLEN  EQU   *-WORK

         LHI   R0,WORKLEN
         STORAGE OBTAIN,LENGTH=(0)

What value is loaded into R0 by the LHI instruction, and how many bytes of storage will be obtained? Raise your hand. How many say 40? OK, a couple. How many say 24? Aha, those are the ones who have worked with LOCTR and have been bitten by this little gotcha.

Those of you who said 40, can you figure out why it's 24? Yep, that's right. I'll go ahead and redo the above with location counters, so the reason becomes clear:


00000000 WORK     DSECT ,

00000000 WORKLOC1 LOCTR ,
00000000 DW10     DS    D
00000018 WORKLOC2 LOCTR ,
00000018 DW20     DS    D
00000008 WORKLOC1 LOCTR ,
00000008 DW11     DS    D
00000020 WORKLOC2 LOCTR ,
00000020 DW21     DS    D
00000010 WORKLOC1 LOCTR ,
00000010 DW13     DS    D
* All done!
00000018 WORKLEN  EQU   *-WORK

The EQU is in WORKLOC1, so it correctly calculates the length of the WORKLOC1 LOCTR area only. There are two ways to correct this: 1) insert a WORKLOC2 LOCTR before the equate; 2) my preferred solution, which is to introduce a new LOCTR (call it WORK_END) before the EQU. In either case, WORKLEN will now be the actual length of the WORK DSECT, 40 bytes.
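
As a minimal sketch of the second fix, here is the same DSECT with a trailing LOCTR. WORK_END is just the name suggested above; because it is the last location counter defined in the section, it is assigned storage after WORKLOC2, so `*` sits at the true end of the DSECT.

```hlasm
WORK     DSECT ,
WORKLOC1 LOCTR ,
DW10     DS    D
WORKLOC2 LOCTR ,
DW20     DS    D
WORKLOC1 LOCTR ,
DW11     DS    D
WORKLOC2 LOCTR ,
DW21     DS    D
WORKLOC1 LOCTR ,
DW13     DS    D
* All done! Switch to a counter that follows every other one.
WORK_END LOCTR ,
WORKLEN  EQU   *-WORK             Now X'28' (40), the whole DSECT
```

The advantage over resuming WORKLOC2 is that the EQU keeps working if someone later adds a WORKLOC3, as long as WORK_END stays the last LOCTR defined.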

LOCTR is very powerful: it lets you define items inside macros yet keep them segregated, such as putting required constants in a CONSTANT LOCTR area, out of the instruction pipeline. But, like any powerful tool in the system developer's arsenal, it must be used with great caution, so always give the assembler listing a sanity check. The hour of debugging you save may be your own!

And a few more things...

  1. Please try to avoid hard-coded offsets. It may have made sense to the person who originally wrote this code 20 years ago, but when someone new comes along, having nice, identifiable labels and DSECTs will make that person's life easier when trying to decipher your control block chaining. The time you spend coding the DSECT and labels now may preserve someone's sanity down the road.
  2. When you do switch to DSECTs, remember to put the right register in the USING statement. (That's more of a *headdesk* moment for me.)
  3. Don't use PRINT OFF unless you really need it. Nothing ticks me off more than looking at a listing file and seeing that the entry macro has issued PRINT OFF, and nothing follows.
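
To illustrate the first two points, here is a small sketch. The control block MYCB and its fields are made up for the example, and it assumes the usual R0-R15 register equates are in effect.

```hlasm
* Hard-coded offset: the reader has to know what lives at +4.
         L     R3,4(,R2)
*
* Same load through a DSECT. The USING must name the register that
* really holds the block address (R2 here), or the assembler will
* quietly generate the wrong base.
         USING MYCB,R2
         L     R3,MYCBNEXT
         DROP  R2
*
* Hypothetical control block; every name here is illustrative only.
MYCB     DSECT ,
MYCBID   DS    CL4                Eye-catcher
MYCBNEXT DS    A                  Address of next block on the chain
MYCBDATA DS    XL8                Payload
MYCBLEN  EQU   *-MYCB             Length of the mapped block
```

Both loads generate the same instruction; only the second one tells the next person what is being chained.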

24 July 2012

Es freut mich sehr, euch Martin Trübner, unseren Gastblogger, vorzustellen (I am very pleased to introduce to you our guest blogger, Martin Trübner)


(That's enough German for today.)

Martin's blog entry will be published this week, but first I thought I would give you a little bit of information about him. 


Martin is a freelance technical software developer with deep system roots in z/VSE, and he has carried his vast CICS applications knowledge over into z/OS. He has also written some software for Windows platforms. He was born in 1953 and met his first bit (an EBCDIC one) in 1972. He speaks German, English, HLASM, Cobol, PLI, REXX and some lesser known dialects like Pascal, BASIC, and Perl (which, if you didn't know, is really a town near the Luxembourgish/French/German border, located near Borg and Apach(e)). Before becoming a freelancer he worked for various shops including applications, central development, a vendor of system programs, and a PCM (Plug-Compatible Manufacturer, a producer of hardware that supported the IBM System/370 and 390 architecture).


I met Martin during my days in Software AG R&D. I was a developer for Entire System Server, working on the two supported IBM platforms (the third being BS2000/OSD). I don't remember all the details, but I know VSE/ESA was involved. He and I became good friends, and we finally met when I travelled to Germany in 2003, and we got together again in 2005. (From that last visit, I seem to recall lots of good beer and yelling back in unison "Wir wissen!" at the GPS' repetitive "links fahren" as he drove me back to the hotel.)


Martin is one of a small group of fellow mainframe developers whose opinions and advice I wholeheartedly trust. I am glad he asked me if he could be a guest blogger, and as I stated in the title, es freut mich sehr.


Welcome aboard, Martin!


(Und vielen Dank, Martin, for making my title more German-sounding. I was pretty close with the original, just a couple of minor transgressions, like forgetting that vor- is a separable prefix, and the wrong case for "unser".)

20 July 2012

Guest blogger coming soon – and an open invite to others

So, after a few blog entries, it looks like this blog is going to be a lot about the development and debugging experiences when writing system tools software, as well as the trials and tribulations of working with computers on what should be simple tasks (like updating Windows) from an old-school mainframe perspective.


(GUIs? Ha! We don't need no stinkin' GUIs! When I was your age we had switches on the panel and blinking lights, and we liked it!)


A good friend of mine - or should I say ein guter Freund - Martin Trübner asked if he could contribute his experiences working with one of those other IBM mainframe operating systems, z/VSE. I said sure! I do miss working with z/VSE; it's been about 8 years since I worked with it on a regular basis. It requires a bit more hands-on management from a systems programming perspective, which is good, as it keeps the brain engaged. I wouldn't mind getting a chance to dip my fingers back in the platform.


Look for his first guest blog post in the next few days.


I am also extending an open invitation to all mainframe software developers to guest blog. You've seen what I've done so far. If you want to share your thought process on the designing or debugging that is part of your daily routine (nothing proprietary, of course!), please send me an email, drop me a note on FB, G+, LinkedIn, XING, or Twitter, or send smoke signals in the Sacramento vicinity, and please make sure they do not look like the recent spate of grass fires we've had.

16 July 2012

TCBs and SRBs and FRRs, oh my! - part 2

Debugging the SRB and associated FRR

UPDATE: See the end for a nice piece of helpful information that can remove hours, gray hairs, and wrinkles from the debugging process!


Debugging an SRB and an associated FRR can be a real PITA. You can't get a traditional SYSUDUMP/SYSABEND; the only thing you can do is try to get an SVC dump, as they are known. These are binary dumps of storage, similar to SYSMDUMP, and have to be processed with IPCS (or AMDPRDMP in the old days).

So, the first problem I ran into was that the SRB was abending with an S0C4. No problem, look at the FRR…but there's no FRR WTO output. There's also no WTO output from the SRB, even the one near the very beginning. Hmmm…so I do what all programmers do – I start limiting the code. I reduce my SRB to just the WTO. Now it shows up. I eye suspiciously my accessing of the product CVT via GPR1, which is the PARM= parameter of the IEAMSCHD macro. Oops. Apparently, I had a bout of fuzzy logic. I was interpreting GPR1 as being a pointer to the fullword specified in PARM=. Nope. GPR1 has the value contained in the area pointed to by PARM=. So in the example from part 1, GPR1 will have the value of GPR10, which was stored in SRBEPARM, which is then pointed to by the PARM=. So I fixed that, and I got the WTO.
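
In code terms, the difference is a single instruction at SRB entry. This is only a sketch of the idea, not the actual SRB; the choice of R10 is illustrative.

```hlasm
* Assumes IEAMSCHD was issued with PARM=SRBEPARM, where the fullword
* SRBEPARM contains the product CVT address.
*
* What I had coded: treats GPR1 as the address of that fullword.
*        L     R10,0(,R1)         Wrong: picks up first word of the CVT
*
* What is needed: GPR1 already holds the fullword's contents.
         LR    R10,R1             R10 = product CVT address
```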


Now I want to drive the FRR, so I added a good old DC H'0' to force an S0C1. Instead, I got an S0C4. I also noticed garbage in the job name/job number fields for the WTO in the hardcopy log. These fields are obtained from the product CVT, so I realized that the product CVT pointer might be zero, uninitialized, or otherwise garbage. I changed the code so that if the product CVT pointer was invalid, it would use an eye-catching value for the job name and job number. That helped, but I was still not getting the WTOs with the PSW and other information. I then noticed that a system dump had been produced from the WTO. I saved that off, but then I realized that I needed to make sure that the PARM= was actually the product CVT. I added a WTO to print the contents of the area and return, and, sure enough, it was fine. But I was still getting an S0C4 and my FRR was not firing…why was this happening? Then I thought that it must be occurring upon return from the FRR. But after a few more tests, I noticed that the FRR WTO was firing randomly.


During this time, I was involved in an IM conversation with my friend "R", discussing my frustrations. He reminded me that WTOs are not synchronous by default, and in an SRB this can lead to timing issues. I added SYNCH=YES, and I also added a parameter to do testing inside my own address space rather than the external address space, as this helps reduce issues and the exposure to screwing up the system. And then begins the part of debugging I hate the most – things suddenly start working.


I ran with an option that sets the FRR to be the wonderful z/XDC product from ColeSoft. And it hits my forced S0C1. That's good. I set it back to my FRR. It doesn't hit. I then begin to notice that the LPAR is having issues, and it looks like we will need to IPL. Time to quit for the weekend, the highlight of which was seeing the blues band that two of my high school classmates play in, at a show in the East Bay. Another classmate, who was on vacation, sat in as well. (All three, along with a fourth classmate, had had their own band during our high school era.)


Just another manic Monday…wish it were Sunday…

So, after a weekend of not looking at this, back to figuring out what's going on. But on Sunday, The Shower Principle hit again; I realized that by scheduling the SRB for testing purposes in the same address space, I can use GTF. Unfortunately, a SLIP Instruction Fetch trap has a limitation of 1M on the RANGE= keyword, so I have to think about how to handle this.


Meanwhile, when I looked again at the S0C4 I had on Friday, I realized that it's not dying in the FRR, but on a QSAM (Queued Sequential Access Method) PUT. Since it doesn't occur when z/XDC is in the way, this is when GTF becomes useful. I moved to our sandbox LPAR so if my testing is messing with the LPAR, I will only shoot myself. Working with SRBs is like working with fire - you need to be very careful, because if you make a mistake, you can burn not only yourself, but the system. (And, as it turns out, I was. A wayward store in *MASTER* or in other common storage can have far reaching negative consequences.) Therefore, you need to be testing these types of routines on a sandbox system so you burn only yourself, and in case you need to IPL, you can quickly do so.

So I started GTF and reran the job…and of course, because this is how it's going, it sort of works. It finished with CC 16. That may prove useful, though, and I have output. Oh, wait, I forgot that the LPAR was IPLed on Friday night, so there is no APF authorization. I issue the appropriate APF commands, running a convenient batch job that issues operator commands that I run after an IPL. And the job completes successfully. The FRR fires, but the return information from the IEAMSCHD shows an S0C4. Remember, I have a forced S0C1 to drive the FRR, so the FRR itself must be abending with the S0C4.


Now I download it to my PC (remember, binary!) because I have a Perl script which parses a GTF trace data set and creates a CSV file that can be loaded into Excel for easier perusal and analysis. (I may make it available for download on my web site.) Two things cropped up, though; 1) MATCHLIM was too small; 2) HSM reallocated my TRACE data set and I only collected about 2500 records. So I fix those and try again, more to prove that my GTF parameters work, and they do.


Now how do I get GTF to trace the SRB? Listings show that the ESQA allocation address is different every time, so I can't really guarantee a range. (ESQA allocation is volatile, so it's not surprising, really.) z/XDC can show me where it's allocated, so I make a test run under z/XDC but without using it as the SRB FRR. For various reasons which I won't go into, this doesn't work. And I'm beginning to think that maybe the S0C4 in the FRR isn't always triggering, and other times random storage is being overlaid. But it appears that I cannot reliably get a GTF trace of the SRB and FRR.


Since I'm dealing with something that is apparently random, I started looking at the FRR code after the two WTOs that I'm seeing. And, oh, boy. I made a horrible coding mistake. I can lay it at the foot of "it's been a while since I worked with SRBs and FRRs," but, still, I should not have made this.


What did I do that is worthy of a *headdesk*? The coding error was in the SETRP macro. If you do not specify WKAREA=, it defaults to assuming that GPR1 points to the SDWA. And which register is used by the WTO macro for the service's parameter list? Yep, you guessed it. So SETRP was storing into whatever GPR1 pointed to after the last WTO, and that is why it was purely random, as the address of the SRB code and data area moved around constantly.
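
A minimal sketch of the fix, assuming the SDWA address arrives in GPR1 on entry to the FRR. R9 is an arbitrary choice of save register, not necessarily the one used in the real code.

```hlasm
* FRR entry: GPR1 -> SDWA. Save it before anything reuses GPR1.
         LR    R9,R1              R9 = SDWA address for the whole FRR
*
*        (WTOs issued here are free to load GPR1 with their parm lists)
*
* Tell SETRP explicitly where the SDWA is, rather than letting it
* default to WKAREA=(1).
         SETRP WKAREA=(R9),RC=0   RC=0: percolate
```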


Unfortunately, there also seems to be some sort of conflict with z/XDC and my SRB and FRR, so I'm going to take a suggestion that my friend "C" gave me during a conversation last week. Instead of obtaining ESQA and moving the code into the storage, I can split the SRB and FRR into their own load module. I can then use a LOAD GLOBAL=YES,EOM=YES and I then should be able to use GTF to trace any activity there. But there is a question of how I can set up the SLIP trap, because the only ways to specify modules are LPA or private. LOAD GLOBAL loads them into CSA, which can be addressed by RANGE. But I don't know where it will be loaded until after the LOAD has executed, which makes it rather difficult to set up the SLIP, unless I do something like coding a WTOR with the address in it. I'd rather not do this, but it may come to that. But back to the head-scratching…
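
For reference, the LOAD in question might look something like the sketch below. The module name SRBFRRM and the save field SRBEP@ are hypothetical, and this has not been run; it only shows why the address is not known in advance.

```hlasm
* Load the combined SRB/FRR module into common storage and keep it
* there until the address space ends.
         LOAD  EP=SRBFRRM,GLOBAL=YES,EOM=YES
         ST    R0,SRBEP@          Entry point address comes back in GPR0
*
SRBEP@   DS    A                  Hypothetical field for the address
```

Only after the LOAD returns is there an address to hand to SLIP, which is what makes the WTOR idea tempting.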


I'm beginning to suspect that the parameter coming in via GPR2 to the FRR isn't quite what I expect it to be. I added a WTO that writes out the value of GPR2 coming in, and my first run looks suspicious – in fact, it's X'00000C00'. That looks rather strange, so I run it again. It's the same. I check with a couple of people…and yes, that is a legitimate address. The FRR stack is located between X'800' and X'1000'. So something else is happening; maybe the SRB is building this 24-byte area wrong.


I added WTOs dumping the 24 bytes pointed to by GPR2. (Note that SDWAPARM also has this address.) And this shows what I've done. Somewhere along the way, the instruction to store the address of the product CVT has disappeared from the SRB, so the FRR was loading 0 instead. Big oops. Big, big oops. And this may explain a lot of the randomness as well.


So I fix that and rerun, and now the 24-byte area looks correct. But things still don't seem to be working right. I add a WTO here and there, and I realize that I've miscoded an NI instruction to turn off a recursion-prevention flag, forgetting the "255-" part; thus it thought it was recursing. I fixed that and the FRR is working for the code I have there. I now add the full FRR code (dump the register/PSW info out of the SDWA), but something is still breaking. I narrow it down to a load of another part of the FRR parm area. Now I haven't purposely munched GPR2, but I add some WTOs to see if maybe branch entry WTO is changing it (the documentation says it does not). Nope, that's not it.
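
For anyone who hasn't tripped over the NI mistake before, here is a sketch with made-up names (FRRFLAGS and FRRRECUR are hypothetical, not the real field names):

```hlasm
         OI    FRRFLAGS,FRRRECUR       Set the bit on entry
*
* To turn one bit off, AND with the complement of its mask.
         NI    FRRFLAGS,255-FRRRECUR   Correct: clears only that bit
*
* Forgetting the 255- keeps that bit and clears all the others.
*        NI    FRRFLAGS,FRRRECUR       Wrong: the flag stays on
*
FRRFLAGS DS    X                  Hypothetical flag byte
FRRRECUR EQU   X'80'              Hypothetical "FRR already active" bit
```

With FRRRECUR equal to X'80', the correct mask assembles as X'7F'.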


I start an IM session with "R"; I can't understand why it fails if I include an instruction and works if I skip it. Finally, I move the code up to the very beginning, and suddenly my WTOs become garbage. I then look and I realize that I am munching my constant area pointer with one of these loads. This was left over from when I started working on this code and my register settings were in flux. I change the register and voilà, it works.


*headdesk* *headdesk* *headdesk*


Part of what made this so hard is that I was so familiar with the code, it looked right. I'd lost track of the fact that the original use of that register had changed. This is why when you are really at a loss, a second set of eyes can help immensely. The second set is not familiar with the code, and they can spot anomalies pretty quickly. Back in university, I became known as the person who looked at the code and could spot the issue quickly, and it was because I was the second set of eyes. I did pass this along to others, though, and gradually people picked up on this. Never be afraid to ask others to look at your code, and never be embarrassed if they find something really, really stupid. You just didn't see it, because your brain interpreted the code as you thought you wrote it, not how you did write it.


But, after all this, I have a working FRR routine. Now I can fully test the SRB code, which is connecting an ETX…and it works perfectly the first time out. (This is always suspicious, but it worked when it was not a part of the SRB, so I suspect it is fine.) Now I will need to add the destroy ETX code. Once this is working, I will pass this version onwards so my colleague can begin testing, and then I will code the ESTAEX, which is the subject of part 3.  I will then make some enhancements, including the aforementioned LOAD GLOBAL for the SRB, and also adding LOAD GLOBAL for the PC code.


So, a post-mortem on this pretty much lays it on three things which had overwriting consequences: 

  1. incorrectly interpreting the value in R1 when the SRB gets control;
  2. forgetting the WKAREA= on the SETRP macro; and
  3. wiping out my constant base register.
Oh, one piece of advice I received from "C" that I should not neglect: if you do move the FRR directly into SQA, put it on a doubleword boundary.

Part three, detailing adventures in ESTAEXing, will probably come later this week or early next.


UPDATE: Jim Mulder, who is one of the z/OS people in Pok, reminded me that normally when an FRR abends, a record is written to LOGREC. You can print this off and it will give you lots of information. I know this would have saved me several hours of teeth-gnashing.

13 July 2012

TCBs and SRBs and FRRs, oh my! (with ESTAEXs thrown in) - part 1

I'm writing this blog entry to help clear my head. Sometimes, just talking about a problem helps you find the solution to something that has been vexing you, or otherwise making your brain dizzy. It's akin to The Shower Principle, which I have subscribed to for decades.

The past few days I've been working on some interesting code, which involves scheduling an SRB (Service Request Block) in another address space. The work this SRB is doing is straightforward, but it's the type of code that during development is pretty prone to abends. So, to properly protect it, and to help debug (if necessary), I'm also creating an FRR (Functional Recovery Routine). It's a pretty simple FRR, just WTOing the PSW (no, you don't get an explanation of that), interrupt code, instruction length code, and registers, and some additional identifying information about which SRB code is involved and what it was doing, then issuing an SDUMPX. But this identifying information is where the rub begins. (And, yes, that is sort of an FRR retry pun.)
("Abend" here also includes program interrupts, your traditional S0C1s, S0C4s, etc., i.e., abends not triggered by SVC 13.) 

There are three sources of identifying information in this code: the scheduling TCB (Task Control Block), the product CVT (Communications Vector Table) (chained off our allocated vendor CVT entry), and the SRB code itself. IBM, in its infinite wisdom 40 years ago, limited the amount of memory passed around. For example, the SRB gets only one fullword parameter, which is passed in GPR1. An FRR receives a 24-byte parameter area whose address in this case comes from the IEAMSCHD macro, as under the covers it is issuing the SETFRR macro where you normally specify this area. The address of this FRR parm is also in GPR2 when the SRB gets control, assuming you put FRR=YES on the IEAMSCHD or SCHEDULE macro invocation. (If not, good luck with that.)


So, at a minimum, the list of addresses must have the address of the product CVT and the address of the SRB data area, and/or the SRB code itself (as I write this, the SRB is not reentrant, and does not obtain a reentrant work area; this may change). 

So now let's list what data my FRR desires, and where I can get it from...


Piece of data                    From where?
PSW, registers at abend          SDWA (System Diagnostic Work Area), passed via GPR1
Job name/number                  Product CVT (long architectural story)
SRB CSECT in control             SRB code (either grabbed from there or moved into another data area)
Tracing point in SRB execution   SRB data area, or...?

The next question is how do I get the product CVT address and the SRB code address to the FRR. The answer is obvious, of course–use the FRR parm, as this info only takes up 8 bytes. But how do I populate the FRR parm? The scheduling TCB cannot do this when using IEAMSCHD (it does with SCHEDULE, but as it requires much more setup, I am not using it), so the SRB has to populate it. And how do we get data to the SRB? Only through that one fullword parameter.  And in this case, this is fine, because we only need the product CVT address; the SRB-specific data can be filled in by the SRB itself as the code starts executing.


There's an additional possible complication that I have alluded to, but not brought up yet. Remember when I said the SRB is being scheduled in another address space (through ENV=STOKEN)? That means that the parameter passed to the SRB has to be located in common storage, either CSA (Common System Area) or SQA (System Queue Area), or their extended (i.e., above the line) counterparts. (It cannot be 64-bit storage, but you can put a 64-bit pointer in the area pointed to by the parm; it has to point to a common object (IARV64 REQUEST=GETCOMMON), though.) But, as it turns out, this isn't an issue. The product CVT area is in ECSA already, since it needs to be accessible by any address space. And since there is nothing required (at this time) from the scheduling TCB, passing the product CVT to the SRB is sufficient.


Next is the tracing point. This will be presented in both WTOs and the SDUMPX-generated dump header to note which system call or other event was (going to be) executed when things went 'splodey, set by a simple MVC. The best place is for this to be moved directly into the FRR parm. So far, I'm only using 8 bytes of it (product CVT address and SRB code address), so using 8 bytes is not that bad. If it turns out I need more bytes, I can move it into the SRB data area. And, as it turns out, I like to be prepared, so we'll address it via the SRB code pointer, rather than using 8 bytes.
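
As a sketch of what "set by a simple MVC" means here: SRBTRACE and the trace-point values below are made-up names, and the field is assumed to sit in the SRB data area that the FRR reaches through the SRB code pointer.

```hlasm
         MVC   SRBTRACE,=CL8'ETXCONN'   About to connect the ETX
*        (the system call itself is issued here)
         MVC   SRBTRACE,=CL8'ETXDONE'   Connect completed
*
SRBTRACE DC    CL8' '             Last trace point reached
```

If the SRB abends between the two MVCs, the FRR finds ETXCONN in the field and can put it in the WTOs and the dump header.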


(Remember, my SRB is technically not reentrant, with the data area following the code area. If I need to make this reentrant some time down the road, then I can just store the reentrant work area address in the FRR parm area. I will note that FRRs are invoked as reentrant, as GPR0 contains the address of a 304-byte work area that the FRR can use as it pleases. In my case, I'm using it for editing and for MF=E areas.)


So, here's my final FRR input design:
  • PARM= on IEAMSCHD points to a fullword that contains the product CVT address;
  • I will use 2 fullwords of the FRR parm, one being the product CVT address, the other being the SRB code address;
  • Between the two, I can get all the data I want.
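
The second bullet can be sketched as a small DSECT over the FRR parm. All of the names are hypothetical, and the register choices (R10 for the product CVT, R12 for the SRB code base) are assumptions for the example.

```hlasm
* In the SRB, shortly after entry. GPR2 -> 24-byte FRR parm area.
         USING FRRPARM,R2
         ST    R10,FRRPCVT@       First fullword: product CVT address
         ST    R12,FRRPSRB@       Second fullword: SRB code address
         DROP  R2
*
FRRPARM  DSECT ,                  Hypothetical map of the FRR parm
FRRPCVT@ DS    A                  Product CVT address
FRRPSRB@ DS    A                  SRB code address
         DS    XL16               Remaining 16 bytes, unused so far
FRRPLEN  EQU   *-FRRPARM          24
```

The FRR can then put the same DSECT over the address it receives in GPR2 and pick up both pointers by name.
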
One confusing thing about some of the keyword parameters on the IEAMSCHD macro: many are actually labels of fullwords that contain the address of the item you want passed, for example, the all-important PARM and the various SYNCH...ADDR ones. This means that you code something like:
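SRBEPARM DS    A                  Fullword handed to the SRB via PARM=
SYNCCMP@ DS    A                  Holds the address of SYNCCMP
SYNCCMP  DS    F                  Filled in when the SRB completes

         LA    R0,SYNCCMP         The keyword wants a pointer to the
         ST    R0,SYNCCMP@          field, so store the field's address

         ST    R10,SRBEPARM       R10 = product CVT address
         IEAMSCHD …,PARM=SRBEPARM,SYNCHCOMPADDR=SYNCCMP@,…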
Once the FRR gets control and does its thing (WTOing and SDUMPXing), there are two possibilities–retry or percolate. Retry means that we specify an address of where to reestablish control in the code that abended. In this case, it does not make sense to retry in the SRB, because everything needs to execute successfully. If it doesn't, we will just stop and let the scheduling TCB know. So we will use SETRP RC=0, which indicates percolation, i.e., pass the abend along to whatever recovery routine precedes this one. 


And thus we are done with the FRR design. And, so far, writing this has helped me consolidate a few things, and made some simpler implementations more obvious. Never underestimate the power of writing it down, bouncing it off a friend, or even talking to yourself in the shower. Just the act of thinking how you phrase something to communicate it to someone else can lead you to the realization of what is causing that bug you've looked at for a day.


I will continue with this theme in part 2, along with debugging the SRB and FRR. That has its own set of issues that, if you haven't dealt with them before, or haven't in a few years, can vex you mightily.

08 July 2012

Windows 7 upgrade - the good, the ugly, and the really ugly

5 years ago, I bought an HP Pavilion dv9000 laptop with an AMD 64-bit processor. Now, yes, I made a few mistakes, such as not springing for the extra $100 for a slightly faster processor. But I was not prepared for the fantastic lemon this laptop was.


First off, I found out why the machine came with 2 x 120GB HDDs...because the 64-bit Windows Vista Ultimate used over half of one of the drives, between the WoW and the SxS. Then, within a month, I bought 2GB more to get to the maximum of 4GB. Even with that, disabling Aero, splitting the paging across 2 hard disks, and adding a then-high-capacity 4GB SD card for Ready"Boost", it still spent over half its resources paging. BSODs were a regular occurrence. I was so happy when SP1 came out...and then I discovered that it wasn't released for my laptop, because of driver issues with the NVidia does-everything-poorly-including-the-kitchen-sink chip. It took a year before I could install SP1. The same thing happened with SP2; I had to wait a year for the same reasons. And the same for Windows 7.


I drive my computer hard - not for entertainment, but for development. And this machine was not up to the task. Eventually, as all computers do, it started slowing down: BSODs became more frequent (the always-inexplicable "DRIVER_IRQL_NOT_LESS_OR_EQUAL" was a personal favorite), it took forever to boot (I would turn it on or restart and then wait 30 minutes until the HDD activity indicator was no longer solid), and frankly it was just piss-poor. (Even most Linux-based HDD recovery programs won't work on this crappy graphics card.)


(I will say this: the two times I had to return it for warranty work, HP was fantastic, probably because the line was a pile of crap. They even replaced the keyboard and the DVD-RW drive when the C drive crapped out. It isn't enough, though, for me to take HP computers off of my no-buy list.)


I realized I had two choices -- either get a new computer, or install Windows 7 (which finally became "supported" last year...how long had Win 7 been available?). I have used Win 7 on my work laptop, and Cheri has had it over a year. Many others recommended upgrading. Upgrading is definitely the cheaper option, and could extend the life of it for a year or more. Also, because of extensive customizations, wiping and installing fresh was not a desirable option. So I bit the bullet, buying the upgrade DVDs from Discount Mountain Software in Denver, CO for $160 or thereabouts (definitely the cheapest out there for credible vendors).


I knew I was going to have a rocky battle...and I was not disappointed. The biggest roadblocks were:

  • Issues with registry keys and userids and other random crap that all were tied to IIS 6. I only used IIS as an FTP server, so I uninstalled it.
  • Registry error with msacm.l3codecp: At some point (I've seen either base Vista -> SP1 or SP1 -> SP2) the data for HKLM\Software\Wow6432Node\Microsoft\Windows NT\CurrentVersion\Drivers32\msacm.l3codecp is set to an empty string. It needs to point to the file of that name in the same directory as the immediately preceding entry.
  • "Process exit error 11 (0x0000000b)" or response code 0x8009000b. This is because the Microsoft\Crypto\RSA\MachineKeys file entries are screwed up due to, again, a problem with one of the aforementioned SPs. One recommendation is to change the sharing of three of the files; unfortunately, that did not work. Another is to delete three files; that also did not work. The eventual solution was to delete every file in the directory (the location can be different for various reasons). They're not necessary because the keys are different in Win 7, and you don't want the Vista versions anyway.
EDIT: Of all the logs written during the upgrade, the most important log file is  (install_drive):\$WINDOWS.~BT\Sources\Panther\setuperr.log. It has just the error messages. Many can be ignored; the ones that are getting you are the ones that trigger a rollback. Note that you should look at this immediately after the error message and subsequent restart, because sometimes this directory disappears.

I now have Windows 7 Ultimate x64 running, and 134 required updates later (including SP1), I am now up to speed. I'm now re-downloading the language packs and a couple of optional updates, and things should be fine.

Am I surprised at how much fun (FSVO "fun") this was? No. Windows is a complex beast, and unfortunately has to cover so many situations. Combined with the crappy design of this HP craptop (as I call it) I didn't expect less.

I'm not going to Win 8. It's in the Win95/WinME/WinVista cycle, and the loss of desktop functionality and a more tablet-like experience on said desktop will cause great gnashing of teeth and much whining - so much so that I predict not only rollback licensing (like XP for Vista), but also that the desktop interface, which was removed at some point during the Consumer Preview, will return.

Welcome!

Cat Herder Software, LLC, is now a reality

After over nine years, an opportunity dropped into my lap that required me to incorporate. As a part of setting up my web site, I decided to create a blog where I could share my experiences.

Although I am primarily a mainframe programmer, I have had many experiences with other platforms, including many involving pulling of hair, eye-rolling, and the occasional headdesk. Sometimes I may relate those experiences here as well, primarily for documentation.

And who knows, a picture or two of a cat might show up here as well.

Thanks for following!