- Login to the workshop machine
Workshops differ in how this is done. The instructor will go over this
beforehand.
- Copy the example files
- In your home directory, create a subdirectory for the POE example codes
and then cd to it.
mkdir ~/poe
cd ~/poe
- Copy either the Fortran or the C version of the exercise files to your
poe subdirectory:
C:
    cp /usr/local/spclass/blaise/poe/samples/C/* ~/poe
Fortran:
    cp /usr/local/spclass/blaise/poe/samples/Fortran/* ~/poe
- List the contents of your poe subdirectory:
ls ~/poe
You should have the poe_hello, poe_bandwidth, and smp_bandwidth source
files, plus the prog1 - prog4 scripts used later in this exercise.
- Understand your system configuration
- Display the pool configuration for the workshop machine:
js
Questions:
- Which pool number has been configured?
- What is the name of the pool?
- How many nodes are in the pool?
- What are the names of the nodes in the pool?
Make a note of the workshop pool and node names. You'll need to know this
for later.
- Try the following commands, which also display information about
running jobs:
ju
spjstat
Note: since the workshop machine is reserved for the class, you probably
won't see much. On a production machine, such as white or frost, you would
see more.
- Authentication
- LLNL has already taken care of this step for you; you need to do
nothing.
- You can verify (if you want) that LLNL has authorized you to use
these nodes.
Check the /etc/hosts.equiv file. It should contain
the names of all nodes in the system.
Note that not all SP sites use this method for authentication.
- Compile the poe_hello program
Depending upon your language preference, use one of the IBM parallel
compilers to compile the poe_hello program.
C:
    mpcc -o poe_hello poe_hello.c
Fortran:
    mpxlf -o poe_hello poe_hello.f
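If you're curious what poe_hello does internally, here is a minimal sketch
of a typical MPI "hello" program, modeled on the sample output shown later
in this exercise. It is not the actual exercise source; the exact calls and
output format are assumptions.

    /* Sketch of a poe_hello-style MPI program (not the actual exercise
       source). Each task reports its rank and the host it runs on. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, ntasks, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &ntasks);
        MPI_Get_processor_name(host, &len);

        if (rank == 0)
            printf("Total number of tasks = %d\n", ntasks);
        printf("Hello! From task %d on host %s\n", rank, host);

        MPI_Finalize();
        return 0;
    }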
- Set up your POE environment
In this step you'll set a few POE environment variables - specifically,
those which answer three questions:
- How many tasks/nodes do I need?
- How will nodes be allocated?
- How will communications be conducted (protocol and network)?
Depending upon your shell, set the following environment variables as shown:
Environment Variable | Setting | Description
MP_PROCS             | 4       | Request 4 MPI tasks (processes)
MP_RESD              | yes     | Non-specific allocation (let the Resource Manager decide which nodes to use)
MP_RMPOOL            | 0       | The node pool number; set it to zero
MP_EUIDEVICE         | css0    | Use the SP switch network interface
MP_EUILIB            | us      | User Space protocol
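For example, under csh/tcsh the settings above look like this (ksh/sh
users would use export instead, e.g. "export MP_PROCS=4"):

    setenv MP_PROCS 4
    setenv MP_RESD yes
    setenv MP_RMPOOL 0
    setenv MP_EUIDEVICE css0
    setenv MP_EUILIB us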
- Run your poe_hello executable
- This is the simple part. Just issue the command:
poe_hello
- Provided that everything is working and set up correctly, you should
receive output that looks something like below (your node names may
vary, of course).
0:Total number of tasks = 4
0:Hello! From task 0 on host berg05.pacific.llnl.gov
1:Hello! From task 1 on host berg06.pacific.llnl.gov
2:Hello! From task 2 on host berg07.pacific.llnl.gov
3:Hello! From task 3 on host berg08.pacific.llnl.gov
- Maximize your use of all 4 CPUs on a node
The previous step was the most "wasteful" way to run a POE program, since
by default, POE will load only one task on a node. To make better use of
the SMP nodes, try the following:
- Run four poe_hello tasks on each of two nodes. Three different
ways to do this are shown below, all of which use command line flags.
The corresponding environment variables could be used instead (a sketch
of one such equivalent follows the three methods); see the POE man page
for details.
Method 1: Specify POE flags for number of nodes and number of tasks:
poe_hello -nodes 2 -procs 8
Method 2: Specify POE flags for number of tasks per node and
number of tasks:
poe_hello -tasks_per_node 4 -procs 8
Method 3: Specify POE flags for number of nodes and number of
tasks per node:
unsetenv MP_PROCS
poe_hello -nodes 2 -tasks_per_node 4
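For reference, the environment-variable equivalent of Method 1 would look
like this under csh (-nodes corresponds to MP_NODES, -procs to MP_PROCS,
and -tasks_per_node to MP_TASKS_PER_NODE; see the POE man page):

    setenv MP_NODES 2
    setenv MP_PROCS 8
    poe_hello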
- Try the poe_bandwidth exercise code
- Depending upon your language preference, compile the poe_bandwidth
source file as shown:
C:
    mpcc -o poe_bandwidth poe_bandwidth.c
Fortran:
    mpxlf -o poe_bandwidth poe_bandwidth.f
- Change a couple of environment variables:
setenv MP_PROCS 2
setenv MP_EUILIB ip
- Run the executable:
poe_bandwidth
As the program runs, it will display the effective communications bandwidth
between two nodes using Internet protocol (ip) over the SP switch.
Sample output from poe_bandwidth using IP communications:
0:
0:****** MPI/POE Bandwidth Test ******
0:Message start size= 100000 bytes
0:Message finish size= 1000000 bytes
0:Incremented by 100000 bytes per iteration
0:Roundtrips per iteration= 10
0:Task 0 running on: berg05.pacific.llnl.gov
0:Task 1 running on: berg06.pacific.llnl.gov
0:
0:Message Size Bandwidth (bytes/sec)
0: 100000 131277120
0: 200000 160914005
0: 300000 155605856
0: 400000 159537625
0: 500000 169940602
0: 600000 192885165
0: 700000 193172080
0: 800000 222912305
0: 900000 223830179
0: 1000000 224911334
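The numbers above come from a simple ping-pong measurement. Here is a
rough sketch of the kind of loop poe_bandwidth likely uses - not the
actual exercise source. The message sizes, roundtrip count, and output
format are taken from the sample output; the rest is an assumption.

    /* Sketch of a ping-pong bandwidth loop (not the actual poe_bandwidth
       source). Task 0 bounces messages off task 1 and times the
       roundtrips, then reports bytes/sec per message size. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    #define START  100000      /* starting message size, bytes  */
    #define FINISH 1000000     /* final message size, bytes     */
    #define INCR   100000      /* increment per iteration       */
    #define TRIPS  10          /* roundtrips per message size   */

    int main(int argc, char *argv[])
    {
        int rank, size, i;
        double t1, t2;
        char *buf;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        buf = (char *) malloc(FINISH);

        for (size = START; size <= FINISH; size += INCR) {
            t1 = MPI_Wtime();
            for (i = 0; i < TRIPS; i++) {
                if (rank == 0) {        /* task 0: send, await echo  */
                    MPI_Send(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                    MPI_Recv(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                             &status);
                } else if (rank == 1) { /* task 1: echo it back      */
                    MPI_Recv(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                             &status);
                    MPI_Send(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
                }
            }
            t2 = MPI_Wtime();
            if (rank == 0)              /* 2 transfers per roundtrip */
                printf("%12d %21.0f\n", size,
                       (2.0 * size * TRIPS) / (t2 - t1));
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }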
- Now, try running the executable again, but this time use a command line
flag to specify the User Space communications protocol.
Note that using the command line flag ensures this by overriding
whatever the MP_EUILIB environment variable is set to.
poe_bandwidth -euilib us
Note: It is very possible that when you try this step, you will get an
error message that looks something like:
ERROR: 0031-124 Less than XX nodes available from pool N
- or -
ERROR: 0031-365 LoadLeveler unable to run job, reason:
LoadL_negotiator: 2544-870 Step blue199.pacific.llnl.gov.11575.0 was not
considered to be run in this scheduling cycle due to its relatively low
priority or because there are not enough free resources.
This is because there may be others in the workshop using
nodes in User Space mode at the same time as you. Recall that only one
user at a time may run US tasks on a node. If you get this error
message, just try running again in a few seconds/minutes.
- Notice the output. You should see a significant increase in bandwidth.
Sample output from poe_bandwidth using US communications:
0:
0:****** MPI/POE Bandwidth Test ******
0:Message start size= 100000 bytes
0:Message finish size= 1000000 bytes
0:Incremented by 100000 bytes per iteration
0:Roundtrips per iteration= 10
0:Task 0 running on: berg05.pacific.llnl.gov
0:Task 1 running on: berg06.pacific.llnl.gov
0:
0:Message Size Bandwidth (bytes/sec)
0: 100000 393351214
0: 200000 437020474
0: 300000 450266125
0: 400000 456386278
0: 500000 480783135
0: 600000 991952069
0: 700000 985073913
0: 800000 985199935
0: 900000 983501016
0: 1000000 968929957
- Determine per-task communication bandwidth behavior
In this exercise, pairs of tasks, located on two different nodes,
will communicate with each other.
- First, make sure that the User Space protocol is used for communications:
setenv MP_EUILIB us
- Compile the code:
C:
    mpcc -o smp_bandwidth smp_bandwidth.c
Fortran:
    mpxlf -o smp_bandwidth smp_bandwidth.f
- Then use the smp_bandwidth code to determine per-task bandwidth
characteristics on an SMP node:
smp_bandwidth -nodes 2 -procs 2
smp_bandwidth -nodes 2 -procs 4
smp_bandwidth -nodes 2 -procs 8
smp_bandwidth -nodes 2 -procs 16
What happens to the per-task bandwidth as the number of tasks increases?
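If you prefer, a short csh loop can run all four cases in one go
(assuming csh/tcsh syntax):

    foreach p (2 4 8 16)
        smp_bandwidth -nodes 2 -procs $p
    end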
- Optimize intra-node communication bandwidth
When all of the task communications occur "on-node", it is
possible to optimize the effective per-task bandwidth by utilizing
shared memory instead of the network.
- First use shared memory and note the per-task bandwidth:
setenv MP_SHARED_MEMORY yes
smp_bandwidth -nodes 1 -procs 4
smp_bandwidth -nodes 1 -procs 8
- Now try it without shared memory (using the network):
setenv MP_SHARED_MEMORY no
smp_bandwidth -nodes 1 -procs 4
smp_bandwidth -nodes 1 -procs 8
What differences do you notice?
- Try using POE's Multiple Program Multiple Data (MPMD) mode
POE allows you to load and run different executables on different nodes.
This is controlled by the MP_PGMMODEL environment variable.
- First, set some environment variables:
Environment Variable | Setting | Description
MP_PGMMODEL          | mpmd    | Specify MPMD mode
MP_PROCS             | 4       | Use 4 tasks again
MP_NODES             | 1       | Use one node for all four tasks
MP_STDOUTMODE        | ordered | Sort the output by task
- Then, simply issue the poe command.
- After a moment, you will be prompted to enter your executables one
at a time. Notice that the machine name where the executable will
run is displayed as part of the prompt.
In any order you choose, enter these four program names, one per prompt:
prog1 prog2 prog3 prog4
Note: these four programs are just simple shell scripts used to
demonstrate how to use the MPMD programming model.
- After the last program name is entered, POE will run all four executables.
Observe their different outputs.
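As an alternative to typing program names at the prompts, POE can read
them from a file named by the MP_CMDFILE environment variable, one program
per line (see the POE man page). For example:

    cat > cmdfile << EOF
    prog1
    prog2
    prog3
    prog4
    EOF
    setenv MP_CMDFILE cmdfile
    poe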
- Try specific node allocation using a host list file
Generally speaking, there aren't many cases where you'll need to "manually"
select which nodes should be used to run your POE job. This step will
demonstrate how to do it though, should you ever have the need.
- First, use your favorite UNIX editor and create a file in your POE
executables directory. Call it hostfile. As its contents, enter 4
different node names from the workshop node pool - one node name per
line (an example appears below). Use the node names you noted earlier
when you displayed the workshop pool configuration.
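For example, using the node names from the sample output earlier
(substitute the actual names from your workshop pool):

    berg05.pacific.llnl.gov
    berg06.pacific.llnl.gov
    berg07.pacific.llnl.gov
    berg08.pacific.llnl.gov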
- Set the appropriate POE environment variables which specify specific
node allocation:
Environment Variable | Setting    | Description
MP_RESD              | no         | Turn off selection by the Resource Manager - just to be sure
MP_HOSTFILE          | hostfile   | Specify the host file you created
MP_SAVEHOSTFILE      | hosts_used | Save the names of the hosts used to run your program
MP_EUILIB            | ip         | Required protocol for specific node allocation
MP_PGMMODEL          | spmd       | Reset from mpmd used in the previous step
- Run the poe_hello executable again and observe the output. Does
it match what you specified in your hostfile?
- Check your hosts_used file, which was created when your
program ran. Do the names match the ones specified in your hostfile?
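One quick way to compare the two files, assuming both are in your current
directory:

    diff hostfile hosts_used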
- Review relevant LC documentation (or at least know where to find it):
- LC Home Page
- especially note the Machine Configurations link in the
Machine Info section. Try the OCF Machine Status link also.
- news job.limits command
- news job.lim.blue command
- news job.lim.frost command
- Review the /etc/environment file
- ASCI Blue web pages - see the "Running Jobs" section
- ASCI White web pages - see the "Running Jobs" section