13 July 2012

TCBs and SRBs and FRRs, oh my! (with ESTAEXs thrown in) - part 1

I'm writing this blog entry to help clear my head. Sometimes, just talking about a problem helps you find the solution to something that has been vexing you, or otherwise making your brain dizzy. It's akin to The Shower Principle, which I have subscribed to for decades.

The past few days I've been working on some interesting code, which involves scheduling an SRB (Service Request Block) in another address space. The work this SRB is doing is straight-forward, but it's the type of code that during development is pretty prone to abends. So, to properly protect it, and to help debug (if necessary), I'm also creating an FRR (Functional Recovery Routine). It's a pretty simple FRR, just WTOing the PSW (no, you don't get an explanation of that), interrupt code, instruction length code, and registers, and some additional identifying information about which SRB code is involved and what it was doing, then issues an SDUMPX. But this identifying information is where the rub begins. (And, yes, that is sort of an FRR retry pun.)
("Abend" here also includes program interrupts, your traditional S0C1s, S0C4s, etc., i.e., abends not triggered by SVC 13.) 

There are three sources of identifying information in this code: the scheduling TCB (Task Control Block), the product CVT (Communications Vector Table) (chained off our allocated vendor CVT entry), and the SRB code itself. IBM, in its infinite wisdom 40 years ago, limits the amount of memory passed around. For example, the SRB gets only one fullword parameter, whose address is passed via GPR1. An FRR receives a 24-byte parameter area whose address in this case comes from the IEAMSCHD macro, as under the covers it is issuing the SETFRR macro where you normally specify this area. The address of this FRR parm is also in GPR2 when the SRB gets control, assuming you put FRR=YES on the IEAMSCHD or SCHEDULE macro invocation. (If not, good luck with that.)


So, at a minimum, the list of addresses must have the address of the product CVT and the address of the SRB data area, and/or the SRB code itself (as I write this, the SRB is not reentrant, and does not obtain a reentrant work area; this may change). 

So now let's list what data my FRR desires, and where I can get it from...


Piece of dataFrom where?
PSW, registers at abendSDWA (System Diagnostic Work Area), passed via GPR1
Job name/numberProduct CVT (long architectural story)
SRB CSECT in controlSRB code (either grabbed from there or moved into another data area)
Tracing point in SRB executionSRB data area, or...?

The next question is how do I get the product CVT address and the SRB code address to the FRR. The answer is obvious, of course–use the FRR parm, as this info only takes up 8 bytes. But how do I populate the FRR parm? The scheduling TCB cannot do this when using IEAMSCHD (it does with SCHEDULE, but as it requires much more setup, I am not using it), so the SRB has to populate it. And how do we get data to the SRB? Only through that one fullword parameter.  And in this case, this is fine, because we only need the product CVT address; the SRB-specific data can be filled in by the SRB itself as the code starts executing.


There's an additional possible complication that I have alluded to, but not brought up yet. Remember when I said the SRB is being scheduled in another address space (through ENV=STOKEN)? That means that the parameter passed to the SRB has to be located in common storage, either CSA (Common System Area) or SQA (System Queue Area), or their extended (i.e., above the line) counterparts. (It cannot be 64-bit storage, but if you can put a 64-bit pointer in the area pointed to by the parm; it has to be a common object (IARV64 REQUEST=GETCOMMON), though.) But, as it turns out, this isn't an issue, The product CVT area is in ECSA already, since it needs to be accessible by any address space. And since there is nothing required (at this time) from the scheduling TCB, passing the product CVT to the SRB is sufficient.


Next is the tracing point. This will be presented in both WTOs and the SDUMPX-generated dump header to note which system call or other event was (going to be) executed when things went 'splodey, set by a simple MVC. The best place is for this to be moved directly into the FRR parm. So far, I'm only using 8 bytes of it (product CVT address and SRB code address), so using 8 bytes is not that bad. If it turns out I need more bytes, I can move it into the SRB data area. And, as it turns out, I like to be prepared, so we'll address it via the SRB code pointer, rather than using 8 bytes.


(Remember, my SRB is technically not reentrant, with the data area following the code area. If I need to make this reentrant some time down the road, then I can just store the reentrant work area address in the FRR parm area. I will note that FRRs are invoked as reentrant, as GPR0 contains the address of a 304-byte work area that the FRR can use as it pleases. In my case, I'm using it for editing and for MF=E areas.)


So, here's my final FRR input design:
  • PARM= on IEAMSCHD points to a fullword that contains the product CVT address;
  • I will use 2 fullwords of the FRR parm, one being the product CVT address, the other being the SRB code address;
  • Between the two, I can get all the data I want.
A note about one confusing thing about some of the keyword parameters on the IEAMSCHD macro: many are actually labels which are fullwords that contain the address of the item you want passed, for example, the all-important PARM and the various SYNCH...ADDR ones.  This means that you code something like:
SRBEPARM DS    A
SYNCCMP@ DS    A
SYNCCMP  DS    F

         LA    R0,SYNCCMP
         ST    R0,SYNCCMP@

         ST    R10,SRBEPARM
         IEAMSCHD …,PARM=SRBEPARM,SYNCHCMPADDR=SYNCCMP@,…
Once the FRR gets control and does its thing (WTOing and SDUMPXing), there are two possibilities–retry or percolate. Retry means that we specify an address of where to reestablish control in the code that abended. In this case, it does not make sense to retry in the SRB, because everything needs to execute successfully. If it doesn't, we will just stop and let the scheduling TCB know. So we will use SETRP RC=0, which indicates percolation, i.e., pass the abend along to whatever recovery routine precedes this one. 


And thus we are done with the FRR design. And, so far, writing this helped me consolidate a few things, and made some simpler implementations more obvious. Never underestimate the power of writing it down, or even bouncing it off a friend, or even to yourself in the shower. Just the act of thinking how you phrase something to communicate it to someone else can lead you to the realization of what is causing that bug you've looked at for a day. 


I will continue with this theme in part 2, along with debugging the SRB and FRR. That has its own set of issues that, if you haven't dealt with them before, or even in a few years, can vex you mightily.

No comments:

Post a Comment

Feel free to leave a comment or ask questions.