Porting Cray shmem-codes to IBM Regatta using TurboSHMEM

The Research Center Jülich is acquiring a new IBM Regatta supercomputer (initially called JUPP, now JUMP) as a replacement for the aging Cray T3Es. Many T3E users have preferred the shmem library over MPI, either to gain speed or because of shmem's simplicity. Porting shmem codes to MPI can be a lot of work, but on JUPP a porting tool is available: TurboSHMEM.

TurboSHMEM emulates the shmem-library on JUPP. Porting from Cray shmem to TurboSHMEM is easy and straightforward:

Ideally, this should suffice to port your program from Cray to IBM. You should still check the results, i.e. run the program with exactly the same parameters on the Cray and on the IBM and compare the output. The results should agree, although floating-point values may differ slightly because of rounding.
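
Because of such rounding differences, a plain diff of the two outputs is often too strict. The following is a minimal sketch of a tolerance-based comparison. It assumes that each run wrote one number per line in a format awk can parse (for example 1.5E+00, but not Fortran's 1.5D+00). The function name compare_results and the default relative tolerance of 1e-10 are illustrative choices, not part of TurboSHMEM.

```shell
# compare_results FILE_A FILE_B [RELTOL]
# Hypothetical helper: succeeds if both files hold the same count of numbers
# (one per line) and every pair agrees within the relative tolerance.
compare_results() {
  paste "$1" "$2" | awk -v tol="${3:-1e-10}" '
    NF != 2 { bad++; next }            # a value is missing in one file
    {
      d = $1 - $2; if (d < 0) d = -d   # absolute difference
      m = ($1 < 0 ? -$1 : $1)          # |a|
      n = ($2 < 0 ? -$2 : $2)          # |b|
      if (n > m) m = n                 # scale = max(|a|, |b|)
      if (d > tol * m) bad++           # outside the relative tolerance
    }
    END { exit (bad > 0) }'
}
```

A call such as compare_results cray.out ibm.out 1e-8 returns exit status 0 when all values agree within the tolerance; the file names here are placeholders.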

Supplying 32 bit parameters to shmem-routines

Because TurboSHMEM is based on MPI, and the IBM Power machines have a 32 bit history, TurboSHMEM has some problems with 64 bit integers: it can transfer 64 bit integer data, but the ancillary integer arguments of shmem calls (such as lengths and PE numbers) have to be of kind=4. For example:

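include 'shmem.fh'
! Ancillary arguments (length, PE number) must be 32 bit
integer*4 pe_me, pe_other, npes, len
! The data itself may be 64 bit
integer*8 mydata(100)

call shmem_init()
npes = shmem_n_pes()
pe_me = shmem_my_pe()
! Partner PE; this assumes exactly two PEs (0 and 1)
pe_other = 1-pe_me
len = 50
! Copy local elements 51..100 into elements 1..50 on the partner PE
call shmem_put8(mydata(1), mydata(51), len, pe_other)
call shmem_finalize()

end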

As you can see, you can transmit 64 bit data, but the length and PE number arguments to shmem routines have to be integer*4. You can stay portable: because the default integer is 64 bit on the Cray and 32 bit on the IBM, you can simply declare integer, and the compiler will choose the right kind. But if you have hard-coded integer*8 for ancillary parameters, you have to change that by hand.
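
To locate such hard-coded declarations, a simple search over the sources can help. This is only a sketch: the function name find_int8 is hypothetical, and the pattern merely lists candidate lines (integer*8, integer(8), integer(kind=8)). You still have to decide for each hit whether the variable holds data, which may stay 64 bit, or is an ancillary argument, which has to become 32 bit.

```shell
# Hypothetical helper: list declarations such as integer*8, integer(8) or
# integer(kind=8) in the given Fortran sources, with line numbers
# (and file names when several files are given). Case is ignored.
find_int8() {
  grep -n -i -E 'integer[[:space:]]*(\*[[:space:]]*8|\([[:space:]]*(kind[[:space:]]*=[[:space:]]*)?8[[:space:]]*\))' "$@"
}
```

For example, find_int8 *.f90 prints every matching line in the current directory.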

Also note: when you transmit 64 bit data, call shmem_put8() explicitly instead of shmem_put(), to be on the safe side (and to know exactly what you are doing).

Preparing your environment

Put this into your .profile:

if [ -r /home/admin/beta/etc/initmodules.sh ]
then
  . /home/admin/beta/etc/initmodules.sh
fi
module load TurboSHMEM
export MP_EUILIB=ip
export MP_RESD=no
export LAPI_USE_SHM=only
export MP_MSG_API=mpi,lapi
export MP_SHARED_MEMORY=yes
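
After logging in again, you may want to confirm that the settings are active before starting a job. The following is a sketch of such a check. The function name check_turboshmem_env is hypothetical, and it only compares the five exported variables with the values listed above; it does not check that the module was loaded.

```shell
# Hypothetical helper: report any setting from the .profile above that is
# missing or different in the current shell. Returns non-zero on a mismatch.
check_turboshmem_env() {
  status=0
  for pair in MP_EUILIB=ip MP_RESD=no LAPI_USE_SHM=only \
              MP_MSG_API=mpi,lapi MP_SHARED_MEMORY=yes
  do
    name=${pair%%=*}          # text before the first '='
    want=${pair#*=}           # text after the first '='
    eval "have=\${$name-}"    # current value, empty if unset
    if [ "$have" != "$want" ]; then
      echo "$name: expected '$want', found '$have'"
      status=1
    fi
  done
  return $status
}
```

Calling check_turboshmem_env prints nothing when all five variables match, and one line per deviating variable otherwise.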

Compiling your programs

mpxlf90_r -qsuffix=f=f90 -q64 -qwarn64 -O3 -qstrict -qarch=pwr4 -qtune=pwr4 test.f90 -lsmaf -lturbo
or, if you use 'old school' Fortran 77:
mpxlf_r -q64 -qwarn64 -O3 -qstrict -qarch=pwr4 -qtune=pwr4 test.f -lsmaf -lturbo

You have to use the MPI/reentrant version of the compiler (mpxlf90_r or mpxlf_r). libsmaf.a is the TurboSHMEM library; libturbo.a is the TurboMPI library, which speeds up TurboSHMEM.

Running your programs

llrun -p2 a.out

Voilà!