08 November 2012

IBM Software for System z For Dummies ®

Yes, you read that right.

You do need to give up some contact info, but you can download and read IBM Software for System z for Dummies ® and learn how System z is the right path for your growing business. Integrity, transaction handling, technological superiority, all are hallmarks of the System z family, a tradition going back to System/360 days.

07 November 2012

An ongoing list of z/Architecture assembler things that you forget if you don't use them often enough

This blog entry will be updated and reposted whenever I come across things that I should remember but my brain has forgotten due to infrequent contact with the item.

  • TRT munches GPR2. That's why the subsequent compare failed miserably.
  • Man, there are a lot of SVCs and PCs that won't work in AR mode. Make sure you switch out of AR mode before calling macros like WAIT.
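
A minimal sketch of the TRT gotcha, with FIELD, TABLE, and NOTFOUND as made-up names for the example. TRT puts the address of the byte that stopped the scan in GPR1 and the function byte in the low-order byte of GPR2, so anything you had in GPR2 needs to be parked elsewhere first:

```hlasm
         LR    R3,R2              Park R2; TRT is about to change it
         TRT   FIELD,TABLE        Scan FIELD using the 256-byte TABLE
         JZ    NOTFOUND           CC0: every function byte was zero
* R1 -> argument byte that stopped the scan
* Low-order byte of R2 = function byte from TABLE
* From here on, compare against R3, not R2
```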

16 October 2012

Little bitty z/Architecture assembler tips and tricks not worthy of their own blog entry

This blog entry will be updated continually and reposted when I come up with little tips and techniques not worthy of their own blog entry.

High half register check

Can't use those bitchin' new high-half register instructions because you have customers not on the latest z196/z114? Need to check the upper half of a register for non-zero so your AMODE 64 doesn't run amok? Use the following:
CLG   R0,=X'00000000FFFFFFFF'
And you also have an example of a label longer than 8 characters.
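
Here is a sketch of how that compare might be used. HIGH_HALF_SET is a made-up label, and the literal needs a base register to be addressable. If you are baseless, the second form avoids the literal at the cost of a scratch register (GPR1 here, purely as an assumption):

```hlasm
* With a base register: compare all 64 bits against the literal
         CLG   R0,=X'00000000FFFFFFFF'
         JH    HIGH_HALF_SET      High: something is in bits 0-31
*
* Baseless: shift the high half down into a scratch register
         SRLG  R1,R0,32           R1 = bits 0-31 of R0, R0 unchanged
         LTGR  R1,R1              Anything there?
         JNZ   HIGH_HALF_SET      Yes, upper half is non-zero
```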

Make sure your base register is clean in AR mode

LAE is a nice instruction for setting a base register if you aren't sure whether your code will be called in primary or AR mode. In AR mode, if the B2 field is 0, 00000000 is placed into the access register corresponding to R1. Because a lone register in parentheses assembles as the index (X2) with a base (B2) of 0, you can code

LAE   R12,0(R15)

and get your entry point without the possibility of getting messed up by a stray ALET in AR12.

WAIT LONG=YES and baseless programming (z/OS, 1.12 and earlier)

If you've tried this combination before, you've gotten bit because the macro generates a BCR GPR8,0 instruction followed by an ICM of the second byte via an "*-1" operand. If you're baseless, this fails because the "*-1" operand needs a base register to resolve. However, there is no need to fret. The following code works and gets around this issue.

         IILH  GPR0,X'8000'
         IILL  GPR0,1
         WAIT  (0),ECB=MYECB

The macro generates an unnecessary LR GPR0,GPR0 followed by the appropriate code. If you are using an ECBLIST, just put the number of this-many-POSTed ECBs in GPR0; in this case, it might be easier to load GPR0 first, then use an OILH GPR0,X'8000' instruction.
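
For the ECBLIST case, a sketch along these lines should do. MYLIST and the count of 2 are made up for the example:

```hlasm
         LHI   GPR0,2             Number of ECBs that must be POSTed
         OILH  GPR0,X'8000'       Turn on the LONG=YES bit
         WAIT  (0),ECBLIST=MYLIST
```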

As noted in the comments below, starting with 1.13, if you are SYSSTATE ARCHLVL=2, WAIT will generate appropriate code for baseless situations.

More to come as I think of them...

05 September 2012

An interesting way to implement a multi-entry-point API for an LE assembler module

Today I came up with an idea for implementing an API assembler module when LE is involved. Let me first say that although  LE is very functional and usually makes life easier for high-level languages, when assembler gets involved it complicates things until you get used to LE thinking.

The original code had an entry point which did the STM for the save area and then loaded a function code into a register. Unfortunately, this does not work well with LE because the CEEENTRY macro has the STM in it. So I came up with this interesting implementation to get the function code and have only one CEEENTRY macro. Here is a very simple example, without the required CEEPPA, CEEDSA, CEECAA, and CEETERM macros:

INIT     DC    0H
         J     LEENTRY
READ     DC    0H
         J     LEENTRY
WRITE    DC    0H
         J     LEENTRY
TERM     DC    0H
         J     LEENTRY
API      CEEENTRY MAIN=NO,BASE=R11 (default base)
         L     R15,4(,R13)        Backchain save area
         L     R15,16(,R15)       Get R15 at call
* R15 now is the address of the entry point
         LA    R14,0(,R11)        Clear out HOB just in case (SEE BELOW)
         LA    R15,4(,R15)        Clear out HOB and make 1-based 
         SR    R14,R15            CSECT start minus (entry point + 4)
         LCR   R14,R14            Flip the sign: offset from beginning of CSECT
         SRA   R14,2              Divide by 4, preserving sign (allows use of JM BAD)
* R14 now has a function code ranging from 1 to 4
* Obviously we add some additional checking of R14 to make sure
* it is valid for our purposes.

So now GPR14 has a value ranging from 1 to 4, with 1 being INIT, 2 being READ, etc. Good programmers will test GPR14 for falling within the required boundaries. You could also forgo the SRA and use the value of 4, 8, 12, or 16. In current releases of z/OS, CEEENTRY uses LARL to load the base register with the address of API, so the LA of GPR14 from GPR11 could just be an LR. This also works with non-LE assembler code; if you have a base register use LA, and if "baseless" use an LARL of the CSECT name. Just make sure that if you have a standard entry macro (most shops do) that generates a CSECT, the CSECT name used at the beginning of the module matches the one generated by your macro.
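
The boundary test mentioned above could look something like this. It is only a sketch: BAD is a hypothetical error label, and the limit of 4 assumes the four entry points shown.

```hlasm
         LTR   R14,R14            Zero or negative function code?
         JNP   BAD                Yes, not one of our entry points
         CHI   R14,4              Past the last entry point (TERM)?
         JH    BAD                Yes, reject it
* R14 is now known to be 1 (INIT) through 4 (TERM)
```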

31 August 2012

Some uncommonly used features of WTO in z/OS (and maybe z/VSE)

The past few days I have worked on a couple enhancements that exploited some lesser-known WTO features. None of these require APF authorization.

Multiple-line WTOs

So, you ask, why multiple-line WTOs? Why don't I just send lots of WTOs? Well, that sucks up system resources, and if there are any console exits, you're driving them every time. A multiple-line WTO can give the console subsystem and your hardcopy log a little breather. There is one drawback – you can only have 71 characters per line…but we're all professionals, that's not a big deal. (A normal single-line WTO can have up to 126 characters.)

Everybody who is a z/OS mainframe programmer has already been exposed to one example - the IEA995I message. This is the one issued when an abend occurs and it has eventually percolated to RTM. You can identify multiple-line WTOs in the JES message log (first SYSOUT data set), JES system messages (third SYSOUT data set), the hardcopy log, or your TSO session as there is no repetition of the date, time, or JES job identifier. They are replaced by the three-digit number which appears to the right of the first line of the message. Here's a contrived example (if you don't know, S806-4 means that the load module could not be found in the STEPLIB/JOBLIB/link pack/LPA chain), and I apologize for the small print but since I'm using a basic blogger.com template, this is the only way I could get it to show up without line wrapping:

13.14.51 JOB74042  IEA995I SYMPTOM DUMP OUTPUT  864                                    
   864             SYSTEM COMPLETION CODE=806  REASON CODE=00000004                    
   864              TIME=13.14.51  SEQ=28924  CPU=0000  ASID=0062                      
   864              PSW AT TIME OF ERROR  070C1000   8141F276  ILC 2  INTC 0D          
   864                NO ACTIVE MODULE FOUND                                          
   864                NAME=UNKNOWN                                                    
   864                DATA AT PSW  0141F270 - 8400181E  0A0D18FB  180C181D            
   864                AR/GR 0: 936AF47A/00000000_00001F00   1: 00000000/00000000_84806000
   864                      2: FFFFFFFF/FFFFFFFF_00000000   3: FFFFFFFF/FFFFFFFF_00000000
   864                      4: FFFFFFFF/FFFFFFFF_00000000   5: FFFFFFFF/FFFFFFFF_007E69F8
   864                      6: FFFFFFFF/FFFFFFFF_000000FF   7: FFFFFFFF/FFFFFFFF_00000000
   864                      8: FFFFFFFF/FFFFFFFF_007B5150   9: FFFFFFFF/FFFFFFFF_0141F778
   864                      A: FFFFFFFF/FFFFFFFF_00000000   B: FFFFFFFF/FFFFFFFF_00000000
   864                      C: FFFFFFFF/FFFFFFFF_00000000   D: FFFFFFFF/FFFFFFFF_007B5150
   864                      E: 00000000/00000000_84806000   F: 00000000/00000000_00000004
   864              END OF SYMPTOM DUMP                                                  

There are two variations of how to code the WTO macro to do this. Both utilize the second subparameter of either the first positional parameter (traditional WTO) or the TEXT= operand, and in both cases it is specified for each line of text. The first of these is "D", which indicates that the text is part of a multiple-line WTO. The other is your choice - "DE" or "E". "E" indicates that the WTO is finished; "DE" means that the text supplied on this invocation is the last line. I have found that the easiest way to implement this is to use TEXT= and MF=L/E, something similar to this (only the required stuff shown so as not to clutter with DESC= and ROUTCDE=):

         MVC   @WTOD,#WTOD
         MVC   WTOTEXT,MSG1
* put in variable parts of MSG1
         ST    R1,WTOCONN
         MVC   WTOTEXT,MSG2
* put in variable parts of MSG2
         MVC   @WTOE,#WTOE


@WTOD    DS    CL(#WTOD_L)
         ORG   @WTOD
@WTOE    DS    CL(#WTOE_L)
         ORG   ,

(Now, yes, I could have mapped the WTO parm list and manipulated things directly for the E invocation, but I believe this explains the concept better.)

The "D" and "E" invocations are tied together via the CONNECT parameter. The CONNECT token is returned in GPR1 if a multiple-line WTO is started. It is then referenced in subsequent WTO invocations. (In the old days, before CONNECT, you put it in GPR0 before the call, and you can still do this if you don't use TEXT=.) The WTO is not issued until you make the "E" invocation.

You can use the field where you save the CONNECT token as a switch in a loop. If it is zero, it's the first time, so you invoke the WTO without CONNECT and store the token, and with subsequent iterations you invoke the WTO with CONNECT.

But remember – you need that "E" call, otherwise your partially built WTO will be discarded after a few seconds, which is eons to a 5.5 GHz model 2827 (zEC12) CPU.

Non-authorized callers are limited to 10 lines maximum, but authorized callers can have up to 255 lines, and that's a whole lotta lines.

Command response WTO

Have you entered an operator command via "/" in SDSF and wondered how it was able to (usually) display the response from the command? It is not an "ancient Chinese secret," if you are old enough to remember that USA commercial from the 1970s. One crucial piece of information is required: the command-and-response token (CART).

When your task that processes operator MODIFY or STOP commands (or if you're really fancy, from the START command, or through operator command or subsystem interface exits) retrieves the command, the CART is available in the CIBX (addressable by adding CIBXOFF to the CIB address), the CSCB, or the aforementioned exits. (Modern processing uses the CIB.) Once you have this piece of information, you can specify it via the CART= operand on the WTO. This identifies this WTO as a command response, and certain processes that use the console subsystem can retrieve just these WTOs and tie them to the originating command. This is how SDSF does it, and it becomes useful for the next section…

Limiting a command response WTO to the console where the command was issued

When a product uses the console subsystem for issuing operator commands and retrieving WTOs, as IBM's NetView or Software AG's Entire System Server do, it is nice to send the WTO command response only to the originating console, rather than to every console (hardware and subsystem) in the LPAR. You limit this by specifying the console ID on the WTO through CONSID=. The console ID is available in the CIBX, the CSCB, and the aforementioned exits. It is also possible to supply a console ID that was obtained elsewhere, if you have an environment that has well-defined consoles. If you know the console name, you can use CONSNAME=. (CONSNAME and CONSID are mutually exclusive, however.)


If you are feeling adventurous, you can build your own parameter list to SVC 35 using the IEZWPL DSECT (found in SYS1.MODGEN). However, you really need to be careful in building this list, because a mistake can lead to an SD23 abend, incomplete WTO, or pure garbage and messages incorrectly flagged as critical.

If you want to try this, I recommend that you do some sample assemblies with various expansions of the WTO macro to see how it builds the parameter list. I also recommend you talk to your systems programmer, and if you are nice maybe s/he will allow you to try this on a sandbox system.

WTOs and commands in z/VSE

In the late 1990s, the OS/390 console support was ported to VSE/ESA, retaining many of the same design concepts. Therefore, many of the concepts above apply as well; however, I have to admit that I have not done a multiple-line or a specific command-response WTO in a z/VSE task as a response to the MSG console command. If there is interest, I'll get our z/VSE expert Martin to elaborate.

16 August 2012

My high school is no more

Some sad news crossed my path the other day. My high school alma mater, Montclair College Preparatory School, shut down a month ago, a victim of the economy. They'd been around for 56 years.

The founder and operator, Dr. Vernon E. Simpson, passed away about 4 years ago. Apparently, since then, the school had had difficulty getting endowments and drawing students. I'm sure the economy had a lot to do with this, as well as the passing of the visible presence of the school to parents and the community. Last year, they had cut all sports programs, which is never a good sign.

Montclair is where I learned all about computers. Up until the last semester of my senior year, I had absolutely no clue as to what I wanted to do. Then, that Christmas, I received a TI-58 programmable calculator. I started playing with the programming, and it was like the proverbial light bulb. Then a computer class became available, and I started working on a TRS-80 Model 1. Within a month I was working on the school's PDP-8, and the rest, as is said, is history. And here I am today.

On the auditorium stage when I got up to receive my diploma (actually a rolled-up piece of paper because the original diplomas had the school's name misspelled by the printer), Dr. Simpson whispered to me "and honors in computers", as he knew that I spent lots of time after school hours working on the PDP-8, just writing programs.

Now that part of my life really only exists in memories. No longer can I drive on Sepulveda Blvd. heading to Western Bagel and look at the school, and remember that's where I got the kick in the tushie that pointed me to what I am today. But I do have good friendships with several of my classmates, and I've seen a few of them in the past month. Our 35th reunion is coming up in a couple of years. I helped get people together via Facebook for our 30th, and social networking will help us again. And Montclair Prep will live on in our collective memories.

Thank you, Montclair. You may have had your problems off and on over the years, but you taught me a lot about life, and got me into a career field I love. 

07 August 2012

Sometimes being a pack rat works out well, sometimes you wind up on "Hoarders"

This past weekend was dedicated to a big rearrangement and purge of my office (although my wife says I still need to purge more, and I actually agree with this sentiment). The catalyst was the receipt of a recent woot.com deal of a combination of two HP 22" monitors with a dual stand. I rearranged the work areas so that my day job workspace is much bigger, and I cleared enough to put my Mac Mini, my Windows 2003 Server server, and two laptops destined to become Linux/BSD machines in one corner.

The wired keyboard and mouse I was using for the Mac were a pain, so I decided to use one of a pair of Microsoft wireless keyboard/mice I had bought a while ago from woot.com. I opened the package, and started to work on it. I saw all of these dire warnings about "install software first!" (exclamation points included), and went about this. That's when I remembered something - the Mac Mini does not have an optical drive. No big deal, I'll just copy the contents of the CD to a USB drive and use Sneakernet. But when I loaded the CD into my laptop, I noticed something - no .dmg files (the format used by Apple OS for installation). I didn't dig further, but I realized that this was a dual-format CD, and Windows would not acknowledge the OS partition.

Then I remembered something - I had an ancient external CD writer. Purchased around the turn of the century, we used it to back up an old Sony VAIO laptop used extensively during our eBay selling days. It was temperamental with writing CDs, but, hey, who cares, I'm reading, not writing. It was also early USB technology, probably 1.1. Off to one of my two junk boxes; I tore through the contents, and found it and its power supply. I grabbed a USB cable, plugged it in (both places), and inserted the CD. Surprisingly, OS X Lion recognized it, and, even better, it recognized the CD! I then clicked through Finder and started the dmg package, and it installed. There were a couple of dire warning messages, but after the restart, I was able to plug in the remote receiver and my wireless keyboard and mouse are working perfectly.

Moral of the story: Don't throw anything out, you might need it. Wait, revised moral: You can get rid of some things, for example, the 10Mb network switches and the parallel and serial mechanical A/B switches.

(A victim of the purge is my dearly loved IBM P96 flat-screen CRT monitor, made by Sony, with Trinitron innards. It is unique because it was an early dual-input device, and drives an incredible-for-its-time 1600x1200 (but I swear I drove it harder than that at 1920x1440). I call it the boat anchor, because it's pretty darned heavy, and is really not conducive to relocation. It suffers from a little tic in that when it warms up the display is a bit cut off; all you have to do, though, is power-off and -on and it clears up. The reason why I am digressing is that despite its little glitch, it is still a great monitor, and I hate to send it to the recycling facility. If you're in the Sacramento area, drop me a note via my web site, and it's yours for free.)

01 August 2012

Getting the CPU-Measurement Facility to work in z/VSE

At long last - the promised guest blog entry from Martin Trübner! Vielen dank!

Let me first introduce myself

My name is Martin Trübner. I am German and work from here, but did (when being on site was vital) gigs all over Europe and the US. I was educated and did my first job in z/OS predecessors but did my first sysgen on DOS/VS R31. During my activities I do run into z/OS and it is a wonderful environment. You can do everything and everything can be avoided and/or forbidden (through real measures), but not so in DOS/VS. Today I am bit-wise active in z/VSE and CICS/TS, and as an application person (with enough background to be dangerous) in z/OS. 

First Encounter

With the announcement of the z196 I stumbled over the CPU-Measurement Facility (CPU-MF). I was delighted to read that it is available on the z10 as well. The various counters did not really interest me, but the sampling was something for which I could imagine a very rewarding use case. If you want details of what is collected, here is a picture with mouseover explanations. But I read that it is only available for native operating systems. Since most of the z/VSE shops do run with z/VM on top, there was no point for me to investigate further. 

During a presentation at the end of June 2012, I was alerted by a single sentence that read: 
New z/VM CPU MF COUNTERS support – APAR VM64961
I thought to myself: “Now this is good news. Let's dive into the details”. 

Preliminary checks

I Googled for it and found a few presentations, the book that documents the instructions and structures used for it, and some bits and pieces about it; I must also not forget documentation regarding Hardware Instrumentation Services (HIS) for z/OS (thanks to "E."). Here are my findings:
  • For sampling, it needs a setting in the SE/HWC
  • For VM it needs the above PTF
  • z/OS has HIS that explores it and makes lots of extra information  available
  • z/HISR from Phoenix has software that uses the z/OS HIS data.
  • The CPU-MF "POP-equivalent" manual, Load-Program-Parameter and the Set-Program-Parameter and CPU-Measurement Facilities (IBM manual SA23-2260-02) says that this facility is a prerequisite for the CPU-measurement sampling facility.
  • The structures for sampling are very clever and suitable for any number of samples
  • There is even provision for emptying buffers while sampling is active  
  • There is even extra space for extension (as it was when TRAP–ping was introduced, shortly before 64-bit was introduced).

Setup in z/VSE / Problems

Nothing is a show-stopper, so I decided to go for it and write a POC with only 1–minute sampling at every 2 milliseconds (just one SDB (sampling data buffer) at 1MB), and here is what happened…

(Please do not forget - this is all occurring in z/VSE 5.1, which is the first version of VSE that supports 64-bit virtual.)
  • My hardware does not have it. Okay, there are customers with a z10/z196/z114 that are willing to help but the switching on via the SE/HWC is not an everyday activity. So here we have a delay.
  • It needs IARV64 – my first chance to use it – one more reason to go for it.
  • For debugging I would like to display what I “prepared for the HW”/“got back from HW”.
  • Ah…well, SDUMPX does support areas above the bar–but how do I get an address space STOKEN? I know what it is but heck – should I really go for control blocks to get it? Thanks to "I.", who showed me an undocumented operand of ALESERV. So now we can display these areas even without the famous 0C3 or my debugger (TRAPPER, which does understand 64-bit storage). (Ray adds: I know of this operand. It can also be used to access other partitions via access registers.)
  • IARV64 REQUEST=PAGEFIX,RANGLIST=RLIST_PTR,NUMRANGE=1 abends with an ABEND 2C5 reason 0598, which reads: “0598 IARV64: Parameter list or range list must be in 31-bit storage”. How can I guarantee 31-bit storage? I can do 24, below the bar, and any boundary - but 31? With the help of "I." it was identified as a typo (meaning “not 64”) and a coding error on my side.
  • I have yet to find a place where my program needs to use LPP. I am not part of the operating system, and have no way to transport (and complement later) any specific information via these 8 bytes (as opposed to z/VM or z/OS which certainly do utilize the 3 fields (basic, logical, virtual) and would certainly produce confusing results with a program injecting its own 8 bytes).


I coded it, commented out the QSI and LSCTL instructions, created entries as I expect the HW to do it, tested it, and removed the test code…and then I learned that I had not read the fine print of the PTF (thanks to "E.", saving me hours of frustration in front of a z196). Here is the fine print:
Support for the CPU-Measurement Sampling Facility and ... interfaces for guest use are not provided.

So I can put this to rest and wait. Maybe one day support for guests will show up.


Next time I get a chance, I will write about CICS/TS and coding a simple TRUE and a GLUE.

25 July 2012

LOCTR - very powerful, and makes it very easy to shoot yourself in the foot

I promise that I will get Martin's blog up in a couple of days, and I hope that part 3 of "TCBs, SRBs, and FRRs, oh my!" will be posted in a couple of weeks (shifting work priorities), but I thought I'd throw this little tidbit out there for you.

I am updating our standard entry/exit macros to support AMODE 64, and support various types of AMODEs and ASCs on entry. As a part of this, I've had to add some more LOCTRs to the definition of our work areas. But let me demonstrate this little trap for you:

DW10     DS    D
DW20     DS    D
DW11     DS    D
DW21     DS    D
DW13     DS    D
* All done!

         LHI   R0,WORKLEN

What value is loaded into R0 by the LHI instruction, and how many bytes of storage will be obtained? Raise your hand. How many say 40? OK, a couple. How many say 24? Aha, those are the ones who have worked with LOCTR and have been bit by this little gotcha.

Those of you who said 40, can you figure out why it's 24? Yep, that's right. I'll go ahead and redo the above with location counters, so the reason becomes clear:

00000000 WORK     DSECT ,

00000000 WORKLOC1 LOCTR ,
00000000 DW10     DS    D
00000018 WORKLOC2 LOCTR ,
00000018 DW20     DS    D
00000008 WORKLOC1 LOCTR ,
00000008 DW11     DS    D
00000020 WORKLOC2 LOCTR ,
00000020 DW21     DS    D
00000010 WORKLOC1 LOCTR ,
00000010 DW13     DS    D
* All done!
00000018 WORKLEN  EQU   *-WORK

The EQU is in WORKLOC1, and it correctly calculates the length of the WORKLOC1 LOCTR area. There are two ways to correct this: 1) insert a WORKLOC2 LOCTR before the equate; 2) my preferred solution, which is to introduce a new LOCTR (call it WORK_END) before the EQU. In either case,  WORKLEN will now be the actual length of the WORK DSECT, 40 bytes.
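
As a sketch of the second fix, the tail end of the DSECT would look something like this. Because WORK_END is first declared after WORKLOC1 and WORKLOC2, the assembler places it behind both of them:

```hlasm
WORKLOC1 LOCTR ,
DW13     DS    D
* All done!
WORK_END LOCTR ,                  New LOCTR, lands after WORKLOC2
WORKLEN  EQU   *-WORK             Now X'28' (40 bytes), not X'18'
```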

LOCTR is very powerful, as it allows you to define items in macros yet keep them segregated, such as putting required constants in a CONSTANT LOCTR area, out of the instruction pipeline. But, like any powerful tool in the system developer's arsenal, it must be used with great caution, and you should always take a sanity look at the assembler listing. The hour of debugging you save may be your own!

And a few more things...

  1. Please try to avoid hard-coded offsets. It may have made sense to the person who originally wrote this code 20 years ago, but when someone new comes along, having nice, identifiable labels and DSECTs will make that person's life easier when trying to decipher your control block chaining. The time you spend coding the DSECT and labels now may preserve someone's sanity down the road.
  2. When you do switch to DSECTs, remember to put the right register in the USING statement. (That's more of a *headdesk* moment for me.)
  3. Don't use PRINT OFF unless you really need it. Nothing ticks me off more than looking at a listing file and seeing that the entry macro has issued PRINT OFF, and nothing follows.

24 July 2012

Es freut mich sehr, euch Martin Trübner, unseren Gastblogger, vorzustellen

(That's enough German for today.)

Martin's blog entry will be published this week, but first I thought I would give you a little bit of information about him. 

Martin is a freelance technical software developer, with deep system roots in z/VSE, and he has carried his vast CICS applications knowledge over into z/OS. He has also written some software for Windows platforms. He was born in 1953 and met his first bit (an EBCDIC one) in 1972. He speaks German, English, HLASM, Cobol, PLI, REXX and some lesser known dialects like Pascal, BASIC, and Perl (which, if you didn't know, is really a town near the Luxembourgish/French/German border, located near Borg and Apach(e)). Before becoming a freelancer he worked for various shops including applications, central development, a vendor of system programs, and for a PCM (Plug-Compatible Manufacturer, a producer of hardware that supported the IBM System/370 and 390 architecture).

I met Martin during my days in Software AG R&D. I was a developer for Entire System Server, working on the two supported IBM platforms (the third being BS2000/OSD). I don't remember all the details, but I know VSE/ESA was involved. He and I became good friends, and we finally met when I travelled to Germany in 2003, and we got together again in 2005. (From that last visit, I seem to recall lots of good beer and yelling back in unison "Wir wissen!" at the GPS' repetitive "links fahren" as he drove me back to the hotel.)

Martin is one of a small group of fellow mainframe developers whose opinions and advice I wholeheartedly trust. I am glad he asked me if he could be a guest blogger, and as I stated in the title, es freut mich sehr.

Welcome aboard, Martin!

(Und vielen dank, Martin, for making my title more German-sounding. I was pretty close with the original, just a couple minor transgressions, like forgetting that vor- is a separable prefix, and the wrong case for "unser".)

20 July 2012

Guest blogger coming soon – and an open invite to others

So, after a few blog entries, it looks like this blog is going to be a lot about the development and debugging experiences when writing system tools software, as well as the toils and tribulations of working with computers on what should be simple tasks (like updating Windows) from an old-school mainframe perspective.

(GUIs? Ha! We don't need no stinkin' GUIs! When I was your age we had switches on the panel and blinking lights, and we liked it!)

A good friend of mine - or should I say ein gut Freund - Martin Trübner asked if he could contribute his experiences working with one of those other IBM mainframe operating systems, z/VSE. I said sure! I do miss working with z/VSE; it's been about 8 years since I worked with it on a regular basis. It requires a bit more hands-on management from a systems programming perspective, which is good, as it keeps the brain engaged. I wouldn't mind getting a chance to dip my fingers back in the platform.

Look for his first guest blog post in the next few days.

I also am extending an open invitation to all mainframe software developers to guest blog. You've seen what I've done so far. If you want to share your thought process on designing or debugging of your daily routine (nothing proprietary, of course!), please send me an email, drop me a note on FB, G+, LinkedIn, XING, or Twitter, or send smoke signals in the Sacramento vicinity, and please make sure they do not look like the recent spate of grass fires we've had.

16 July 2012

TCBs and SRBs and FRRs, oh my! - part 2

Debugging the SRB and associated FRR

UPDATE: See the end for a nice piece of helpful information that can remove hours, gray hairs, and wrinkles from the debugging process!

Debugging an SRB and an associated FRR can be a real PITA. You can't get a traditional SYSUDUMP/SYSABEND; the only thing you can do is try to get an SVC dump, as they are known. These are binary dumps of storage, similar to SYSMDUMP, and have to be processed with IPCS (or AMDPRDMP in the old days).

So, the first problem I ran into was that the SRB was abending with an S0C4. No problem, look at the FRR…but there's no FRR WTO output. There's also no WTO output from the SRB, even the one near the very beginning. Hmmm…so I do what all programmers do – I start limiting the code. I reduce my SRB to just the WTO. Now it shows up. I eye suspiciously my accessing of the product CVT via GPR1, which is the PARM= parameter of the IEAMSCHD macro. Oops. Apparently, I had a bout of fuzzy logic. I was interpreting GPR1 as being a pointer to the fullword specified in PARM=. Nope. GPR1 has the value specified in the area pointed to by PARM=. So from the example from part 1, GPR1 will have the value of GPR10, which was stored in SRBEPARM, which is then pointed to by the PARM=. So I fixed that, and I got the WTO. 
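
In code, the difference looks something like this at the top of the SRB routine. R10 is only an assumption here, chosen to match the register that held the product CVT address in the part 1 example:

```hlasm
* Wrong: treats GPR1 as a pointer to the fullword named on PARM=
*        L     R10,0(,R1)
* Right: GPR1 already holds the value taken from that fullword
         LR    R10,R1             R10 = product CVT address
```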

Now I want to drive the FRR, so I added a good old DC H'0' to force an S0C1. Instead, I got an S0C4. I also noticed garbage in the job name/job number fields for the WTO in the hardcopy log. These fields are obtained from the product CVT, so I realized that the product CVT might be zero, uninitialized, or otherwise garbage. I changed the code so that if the product CVT pointer was invalid, it would use an eye-catching value for the job name and job number. That helped, but then I was still not getting the WTOs with the PSW and other information. I then noticed that a system dump had been produced from the WTO. I saved that off, but then I realized that I needed to make sure that the PARM= was actually the product CVT. I added a WTO to print the contents of the area and return, and, sure enough, it was fine. But I was still getting an S0C4 and my FRR was not firing…why was this happening? Then I thought that it must be occurring upon return from the FRR. But after a few more tests, I noticed that the FRR WTO was firing randomly.

During this time, I was involved in an IM conversation with my friend "R",  discussing my frustrations. He reminded me that WTOs are not synchronous by default, and in an SRB this can lead to timing issues. I added SYNCH=YES, and I also added a parameter to do testing inside my address space, not the external address space, as this will help reduce issues and exposure to screwing up the system. And then begins the part of debugging I hate the most – things suddenly start working.

I ran with an option that sets the FRR to be the wonderful z/XDC product from ColeSoft. And it hits my forced S0C1. That's good. I set it back to my FRR. It doesn't hit. I then begin to notice that the LPAR is having issues, and it looks like we will need to IPL. Time to quit for the weekend, the highlight of which was seeing the blues band that two of my high school classmates are in playing in the East Bay. Another classmate, who was on vacation, sat in as well. (All three, along with a fourth classmate, had had their own band during our high school era.)

Just another manic Monday…wish it were Sunday…

So, after a weekend of not looking at this, back to figuring out what's going on. But on Sunday, The Shower Principle hit again; I realized that by scheduling the SRB for testing purposes in the same address space, I can use GTF. Unfortunately, a SLIP Instruction Fetch trap has a limitation of 1M on the RANGE= keyword, so I have to think about how to handle this.

Meanwhile, when I looked again at the S0C4 I had on Friday, I realized that it's not dying in the FRR, but on a QSAM (Queued Sequential Access Method) PUT. Since it doesn't occur when z/XDC is in the way, this is when GTF becomes useful. I moved to our sandbox LPAR so if my testing is messing with the LPAR, I will only shoot myself. Working with SRBs is like working with fire - you need to be very careful, because if you make a mistake, you can burn not only yourself, but the system. (And, as it turns out, I was. A wayward store in *MASTER* or in other common storage can have far reaching negative consequences.) Therefore, you need to be testing these types of routines on a sandbox system so you burn only yourself, and in case you need to IPL, you can quickly do so.

So I started GTF and reran the job…and of course, because this is how it's going, it sort of works. It finished with CC 16. That may prove useful, though, and I have output. Oh, wait, I forgot that the LPAR was IPLed on Friday night, so there is no APF authorization. I issue the appropriate APF commands, running a convenient batch job that issues operator commands that I run after an IPL. And the job completes successfully. The FRR fires, but the return information from the IEAMSCHD shows an S0C4. Remember, I have a forced S0C1 to drive the FRR, so the FRR itself must be abending with the S0C4.

Now I download it to my PC (remember, binary!) because I have a Perl script which parses a GTF trace data set and creates a CSV file that can be loaded into Excel for easier perusal and analysis. (I may make it available for download on my web site.) Two things cropped up, though; 1) MATCHLIM was too small; 2) HSM reallocated my TRACE data set and I only collected about 2500 records. So I fix those and try again, more to prove that my GTF parameters work, and they do.

Now how do I trace the GTF? Listings show that the ESQA allocation address is different every time, so I can't really guarantee a range. (ESQA allocation is volatile, so it's not surprising, really.) z/XDC can show me where it's allocated, so I make a test run under z/XDC but without using it as the SRB FRR. For various reasons which I won't go into, this doesn't work. And I'm beginning to think that maybe the S0C4 in the FRR isn't always triggering, and other times random storage is being overlaid. But it appears that I cannot reliably get a GTF trace of the SRB and FRR.

Since I'm dealing with something that is apparently random, I started looking at the FRR code after the two WTOs that I'm seeing. And, oh, boy. I made a horrible coding mistake. I can lay it at the foot of "it's been a while since I worked with SRBs and FRRs," but, still, I should not have made this.

What did I do that is worthy of a *headdesk*? The coding error was in the SETRP macro. If you do not specify WKAREA=, it defaults to assuming that GPR1 points to the SDWA. And which register is used by the WTO macro for the service's parameter list? Yep, you guessed it. So by the time SETRP ran, GPR1 no longer held the SDWA address, and SETRP was storing into whatever GPR1 happened to point at. And that is why it was purely random, as the address of the SRB code and data area moved around constantly.
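One way to avoid this trap is to save the SDWA address on entry and name it explicitly on SETRP. A minimal sketch, assuming a hypothetical FRR entry label and R9 as a free register (neither is from the original code):

```hlasm
R1       EQU   1
R9       EQU   9
FRRENTRY DS    0H                 Hypothetical FRR entry label
         LR    R9,R1              Save SDWA address before any WTO
*        ... WTOs issued here are free to change GPR1 ...
         SETRP WKAREA=(R9),RC=0   Percolate; SDWA named explicitly
```

With WKAREA= coded, it no longer matters what the intervening macros leave in GPR1.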

Unfortunately, there also seems to be some sort of conflict with z/XDC and my SRB and FRR, so I'm going to take a suggestion that my friend "C" gave me during a conversation last week. Instead of obtaining ESQA and moving the code into the storage, I can split the SRB and FRR into their own load module. I can then use a LOAD GLOBAL=YES,EOM=YES and I then should be able to use GTF to trace any activity there. But there is a question of how I can set up the SLIP trap, because the only ways to specify modules are LPA or private. LOAD GLOBAL loads them into CSA, which can be addressed by RANGE. But I don't know where it will be loaded until after the LOAD has executed, which makes it rather difficult to set up the SLIP, unless I do something like coding a WTOR with the address in it. I'd rather not do this, but it may come to that. But back to the head-scratching…
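A minimal sketch of that LOAD, assuming a hypothetical module name SRBMOD and a hypothetical fullword SRBEP@ to hold the result. The entry point address comes back in GPR0 (with the AMODE indicator in the high-order bit for a 31-bit module), and that is the value a WTOR would have to display before the SLIP trap could be set:

```hlasm
R0       EQU   0
         LOAD  EP=SRBMOD,GLOBAL=YES,EOM=YES   Hypothetical module name
         ST    R0,SRBEP@          Entry point address returned in GPR0
*        ...
SRBEP@   DS    A                  Hypothetical save area for the address
```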

I'm beginning to suspect that the parameter coming in via GPR2 to the FRR isn't quite what I expect it to be. I added a WTO that writes out the value of GPR2 coming in, and my first run looks suspicious – in fact, it's X'00000C00'. That looks rather strange, so I run it again. It's the same. I check with a couple of people…and yes, that is a legitimate address. The FRR stack is located between X'800' and X'1000'. So something else is happening; maybe the SRB is building this 24-byte area wrong.

I added WTOs dumping the 24 bytes pointed to by GPR2. (Note that SDWAPARM also has this address.) And this shows what I've done. Somewhere along the way, the instruction to store the address of the product CVT has disappeared from the SRB, so the FRR was loading 0 instead. Big oops. Big, big oops. And this may explain a lot of the randomness as well.

So I fix that and rerun, and now the 24-byte area looks correct. But things still don't seem to be working right. I add a WTO here and there, and I realize that I've miscoded an NI instruction to turn off a recursion-prevention flag, forgetting the 255- part; thus it thought it was recursing. I fixed that and the FRR is working for the code I have there. I now add the full FRR code (dump the register/PSW info out of the SDWA), but something is still breaking. I narrow it down to a load of another part of the FRR parm area. Now I haven't purposely munched GPR2, but I add some WTOs to see if maybe branch entry WTO is changing it (the documentation says it does not). Nope, that's not it.
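For reference, the NI idiom in question looks like this. A minimal sketch with a hypothetical flag byte and bit name (not the names in the original code):

```hlasm
INFRR    EQU   X'80'              Hypothetical "already in FRR" bit
         OI    FRRFLAGS,INFRR         Turn the bit on
         NI    FRRFLAGS,255-INFRR     Turn off only that bit
*        NI    FRRFLAGS,INFRR         Wrong: keeps INFRR, clears the rest
*        ...
FRRFLAGS DS    X                  Hypothetical flag byte
```

Without the 255- the mask is inverted: the bit you meant to clear is the only one that survives, so the flag stays on.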

I start an IM session with "R"; I can't understand why it fails if I include an instruction and works if I skip it. Finally, I move the code up to the very beginning, and suddenly my WTOs become garbage. I then look and I realize that I am munching my constant area pointer with one of these loads. This was left over from when I started working on this code and my register settings were in flux. I change the register and voilà, it works.

*headdesk* *headdesk* *headdesk*

Part of what made this so hard is that I was so familiar with the code, it looked right. I'd lost track of the fact that the original use of that register had changed. This is why when you are really at a loss, a second set of eyes can help immensely. The second set is not familiar with the code, and they can spot anomalies pretty quickly. Back in university, I became known as the person who looked at the code and could spot the issue quickly, and it was because I was the second set of eyes. I did pass this along to others, though, and gradually people picked up on this. Never be afraid to ask others to look at your code, and never be embarrassed if they find something really, really stupid. You just didn't see it, because your brain interpreted the code as you thought you wrote it, not how you did write it.

But, after all this, I have a working FRR routine. Now I can fully test the SRB code, which is connecting an ETX…and it works perfectly the first time out. (This is always suspicious, but it worked when it was not a part of the SRB, so I suspect it is fine.) Now I will need to add the destroy ETX code. Once this is working, I will pass this version onwards so my colleague can begin testing, and then I will code the ESTAEX, which is the subject of part 3.  I will then make some enhancements, including the aforementioned LOAD GLOBAL for the SRB, and also adding LOAD GLOBAL for the PC code.

So, a post-mortem on this pretty much lays it on three things which had overwriting consequences: 

  1. incorrectly interpreting the value in R1 when the SRB gets control;
  2. forgetting the WKAREA= on the SETRP macro; and
  3. wiping out my constant base register.
Oh, one piece of advice I received from "C" that I should not neglect: if you do move the FRR directly into SQA, put it on a doubleword boundary.

Part three, detailing adventures in ESTAEXing, will probably come later this week or early next.

UPDATE: Jim Mulder, who is one of the z/OS people in Pok, reminded me that normally when an FRR abends, a record is written to LOGREC. You can print this off and it will give you lots of information. I know this would have saved me several hours of teeth-gnashing.

13 July 2012

TCBs and SRBs and FRRs, oh my! (with ESTAEXs thrown in) - part 1

I'm writing this blog entry to help clear my head. Sometimes, just talking about a problem helps you find the solution to something that has been vexing you, or otherwise making your brain dizzy. It's akin to The Shower Principle, which I have subscribed to for decades.

The past few days I've been working on some interesting code, which involves scheduling an SRB (Service Request Block) in another address space. The work this SRB is doing is straight-forward, but it's the type of code that during development is pretty prone to abends. So, to properly protect it, and to help debug (if necessary), I'm also creating an FRR (Functional Recovery Routine). It's a pretty simple FRR, just WTOing the PSW (no, you don't get an explanation of that), interrupt code, instruction length code, and registers, and some additional identifying information about which SRB code is involved and what it was doing, and then issuing an SDUMPX. But this identifying information is where the rub begins. (And, yes, that is sort of an FRR retry pun.)
("Abend" here also includes program interrupts, your traditional S0C1s, S0C4s, etc., i.e., abends not triggered by SVC 13.) 

There are three sources of identifying information in this code: the scheduling TCB (Task Control Block), the product CVT (Communications Vector Table) (chained off our allocated vendor CVT entry), and the SRB code itself. IBM, in its infinite wisdom 40 years ago, limits the amount of memory passed around. For example, the SRB gets only one fullword parameter, whose value is passed in GPR1. An FRR receives a 24-byte parameter area whose address in this case comes from the IEAMSCHD macro, as under the covers it is issuing the SETFRR macro where you normally specify this area. The address of this FRR parm is also in GPR2 when the SRB gets control, assuming you put FRR=YES on the IEAMSCHD or SCHEDULE macro invocation. (If not, good luck with that.)

So, at a minimum, the list of addresses must have the address of the product CVT and the address of the SRB data area, and/or the SRB code itself (as I write this, the SRB is not reentrant, and does not obtain a reentrant work area; this may change). 

So now let's list what data my FRR desires, and where I can get it from...

Piece of data                  | From where?
PSW, registers at abend        | SDWA (System Diagnostic Work Area), passed via GPR1
Job name/number                | Product CVT (long architectural story)
SRB CSECT in control           | SRB code (either grabbed from there or moved into another data area)
Tracing point in SRB execution | SRB data area, or...?

The next question is how do I get the product CVT address and the SRB code address to the FRR. The answer is obvious, of course–use the FRR parm, as this info only takes up 8 bytes. But how do I populate the FRR parm? The scheduling TCB cannot do this when using IEAMSCHD (it does with SCHEDULE, but as it requires much more setup, I am not using it), so the SRB has to populate it. And how do we get data to the SRB? Only through that one fullword parameter.  And in this case, this is fine, because we only need the product CVT address; the SRB-specific data can be filled in by the SRB itself as the code starts executing.

There's an additional possible complication that I have alluded to, but not brought up yet. Remember when I said the SRB is being scheduled in another address space (through ENV=STOKEN)? That means that the parameter passed to the SRB has to be located in common storage, either CSA (Common System Area) or SQA (System Queue Area), or their extended (i.e., above the line) counterparts. (It cannot be 64-bit storage, but you can put a 64-bit pointer in the area pointed to by the parm; it has to be a common object (IARV64 REQUEST=GETCOMMON), though.) But, as it turns out, this isn't an issue. The product CVT area is in ECSA already, since it needs to be accessible by any address space. And since there is nothing required (at this time) from the scheduling TCB, passing the product CVT to the SRB is sufficient.

Next is the tracing point. This will be presented in both WTOs and the SDUMPX-generated dump header to note which system call or other event was (going to be) executed when things went 'splodey, set by a simple MVC. The best place for this would be directly in the FRR parm. So far, I'm only using 8 bytes of it (product CVT address and SRB code address), so using another 8 bytes is not that bad. If it turns out I need more bytes, I can move it into the SRB data area. And, as it turns out, I like to be prepared, so we'll address it via the SRB code pointer, rather than using 8 bytes of the FRR parm.
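A minimal sketch of such a tracing point, assuming a hypothetical 8-byte field TRACEPT in the SRB data area and a made-up step name (the real field names and values are not shown in this post):

```hlasm
         MVC   TRACEPT,=CL8'ETXCONN'   Note the step about to run
*        ... issue the system call here ...
TRACEPT  DS    CL8                Hypothetical field in SRB data area
```

If the abend hits, the FRR can reach TRACEPT through the SRB code address and put the value in its WTOs and the dump title.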

(Remember, my SRB is technically not reentrant, with the data area following the code area. If I need to make this reentrant some time down the road, then I can just store the reentrant work area address in the FRR parm area. I will note that FRRs are invoked as reentrant, as GPR0 contains the address of a 304-byte work area that the FRR can use as it pleases. In my case, I'm using it for editing and for MF=E areas.)

So, here's my final FRR input design:
  • PARM= on IEAMSCHD points to a fullword that contains the product CVT address;
  • I will use 2 fullwords of the FRR parm, one being the product CVT address, the other being the SRB code address;
  • Between the two, I can get all the data I want.
A note about one confusing thing about some of the keyword parameters on the IEAMSCHD macro: many actually name fullwords that contain the address of the item you want passed, for example, the all-important PARM and the various SYNCH...ADDR ones. This means that you code something like:

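         LA    R0,SYNCCMP         Address of the actual data field
         ST    R0,SYNCCMP@        The keyword names this pointer

         ST    R10,SRBEPARM       Product CVT address; PARM= names this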
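To show where those fullwords end up, here is a rough sketch of an IEAMSCHD invocation. Only SRBEPARM, SYNCCMP and SYNCCMP@ appear in the post; SRBEP@, TGTSTKN and SRBFRR@ are hypothetical names, and the choice of keywords is an assumption about what the real call looks like, so check it against the macro reference before borrowing it:

```hlasm
         LA    R0,SYNCCMP         Address of the actual data field
         ST    R0,SYNCCMP@        The keyword names this pointer
         ST    R10,SRBEPARM       Product CVT address for the SRB
         IEAMSCHD EPADDR=SRBEP@,PRIORITY=LOCAL,                        X
               ENV=STOKEN,TARGETSTOKEN=TGTSTKN,                        X
               PARM=SRBEPARM,FRRADDR=SRBFRR@,                          X
               SYNCH=YES,SYNCHCOMPADDR=SYNCCMP@
*        ...
SRBEP@   DS    A                  Hypothetical: address of SRB routine
TGTSTKN  DS    XL8                Hypothetical: target space STOKEN
SRBFRR@  DS    A                  Hypothetical: address of the FRR
SRBEPARM DS    F                  Value handed to the SRB in GPR1
SYNCCMP  DS    F                  Receives the completion type
SYNCCMP@ DS    A                  Pointer to SYNCCMP
```

Note the extra level of indirection on every ...ADDR keyword: the operand is the label of the pointer, not of the thing pointed to.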
Once the FRR gets control and does its thing (WTOing and SDUMPXing), there are two possibilities–retry or percolate. Retry means that we specify an address of where to reestablish control in the code that abended. In this case, it does not make sense to retry in the SRB, because everything needs to execute successfully. If it doesn't, we will just stop and let the scheduling TCB know. So we will use SETRP RC=0, which indicates percolation, i.e., pass the abend along to whatever recovery routine precedes this one. 

And thus we are done with the FRR design. And, so far, writing this helped me consolidate a few things, and made some simpler implementations more obvious. Never underestimate the power of writing it down, or even bouncing it off a friend, or even to yourself in the shower. Just the act of thinking how you phrase something to communicate it to someone else can lead you to the realization of what is causing that bug you've looked at for a day. 

I will continue with this theme in part 2, along with debugging the SRB and FRR. That has its own set of issues that, if you haven't dealt with them before, or even in a few years, can vex you mightily.