IBM SP Systems Overview Exercise


  1. Log in to the workshop machine

    Workshops differ in how this is done. The instructor will go over this beforehand.

  2. Copy the example files

    1. In your home directory, create a subdirectory for the POE example codes and then cd to it.

      mkdir ~/poe
      cd ~/poe

    2. Copy either the Fortran or the C version of the exercise files to your poe subdirectory:

      C: cp /usr/local/spclass/blaise/poe/samples/C/*    ~/poe
      Fortran: cp /usr/local/spclass/blaise/poe/samples/Fortran/*    ~/poe

  3. List the contents of your poe subdirectory

    You should have the following files:

    C Files            Fortran Files      Description
    poe_hello.c        poe_hello.f        Simple MPI program that prints a task's rank and hostname.
    poe_bandwidth.c    poe_bandwidth.f    An MPI communications bandwidth test between two tasks only.
    smp_bandwidth.c    smp_bandwidth.f    An MPI communications bandwidth test between any even number of tasks.
    prog1 prog2        prog1 prog2        Simple shell scripts used for MPMD mode.
    prog3 prog4        prog3 prog4
  4. Understand your system configuration

    1. Display the pool configuration for the workshop machine:

      js

      Questions:

      • Which pool number has been configured?
      • What is the name of the pool?
      • How many nodes are in the pool?
      • What are the names of the nodes in the pool?

      Note the workshop pool number and node names (ask the instructor if you are unsure). You will need them in later steps.

    2. Try the following commands, which also display information about running jobs:

      ju
      spjstat

      Note: since the workshop machine is reserved for the class, you probably won't see much. On a production machine, such as white or frost, you would see more.

  5. Authentication

    1. LLNL has already taken care of this step for you. You do not need to do anything.

    2. You can verify (if you want) that LLNL has authorized you to use these nodes. Check the /etc/hosts.equiv file. It should contain the names of all nodes in the system.

    Note: Not all SP sites use this method for authentication.
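
    The check in item 2 can be sketched as a small shell function for Bourne/Korn-style shells. The helper name node_listed and the node names are hypothetical, and the sketch assumes each entry in the file begins with a host name in its first field. It runs against a throwaway file standing in for /etc/hosts.equiv.

```shell
# Hypothetical helper: succeed if host name $1 is the first field of a line in file $2.
node_listed() {
    awk -v host="$1" '$1 == host { found = 1 } END { exit !found }' "$2"
}

# Demonstration on a temporary file with made-up node names,
# standing in for /etc/hosts.equiv.
demo_file=$(mktemp)
printf '%s\n' node01.example.gov node02.example.gov > "$demo_file"

if node_listed node01.example.gov "$demo_file"; then
    listed=yes
else
    listed=no
fi
echo "node01.example.gov listed: $listed"
rm -f "$demo_file"
```

    On the workshop machine, the second argument would be /etc/hosts.equiv and the first would be one of the node names reported by js.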

  6. Compile the poe_hello program

    Depending upon your language preference, use one of the IBM parallel compilers to compile the poe_hello program.

    C:
    mpcc -o poe_hello poe_hello.c
    Fortran:
    mpxlf -o poe_hello poe_hello.f 

  7. Set up your POE environment

    In this step you'll set a few POE environment variables. Specifically, those that answer three questions:

    • How many tasks/nodes do I need?
    • How will nodes be allocated?
    • How will communications be conducted (protocol and network)?

    Depending upon your shell, set the following environment variables as shown:

    Environment Variable   Setting   Description
    MP_PROCS               4         Request 4 MPI tasks (processes)
    MP_RESD                yes       Non-specific allocation (let the Resource Manager decide which nodes to use)
    MP_RMPOOL              poolid    Set poolid to the workshop pool number (from the js output in step 4, or ask the instructor)
    MP_EUIDEVICE           css0      Use the SP switch network interface
    MP_EUILIB              us        User Space protocol
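
    As a sketch, the settings in the table look like this in a Bourne/Korn-style shell. csh users would write setenv NAME value instead, as in the later steps of this exercise. The pool number 0 is only a placeholder for poolid and is an assumption. Replace it with the workshop pool number.

```shell
# POE settings from the table above (sh/ksh syntax).
export MP_PROCS=4          # request 4 MPI tasks
export MP_RESD=yes         # let the Resource Manager choose the nodes
export MP_RMPOOL=0         # ASSUMPTION: placeholder for poolid; use your workshop pool number
export MP_EUIDEVICE=css0   # SP switch network interface
export MP_EUILIB=us        # User Space protocol

# Show what was set.
env | grep '^MP_' | sort
```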

  8. Run your poe_hello executable

    1. This is the simple part. Just issue the command:

      poe_hello

    2. Provided that everything is working and set up correctly, you should see output similar to the following (your node names will vary, of course).
      Total number of tasks = 4
      Hello! From task 1 on host blue281.llnl.gov
      Hello! From task 2 on host blue282.llnl.gov
      Hello! From task 3 on host blue283.llnl.gov
      Hello! From task 0 on host blue284.llnl.gov
      

  9. Maximize your use of all 4 CPUs on a node

    The previous step was the most "wasteful" way to run a POE program, since by default, POE will load only one task on a node. To make better use of the SMP nodes, try the following:

    1. Run four poe_hello tasks on each of 2 nodes. Three different ways to do this are shown below, all of which use command line flags. The corresponding environment variables could be used instead. See the POE man page for details.

      Method 1: Specify POE flags for number of nodes and number of tasks:

      poe_hello -nodes 2 -procs 8

      Method 2: Specify POE flags for number of tasks per node and number of tasks:

      poe_hello -tasks_per_node 4 -procs 8

      Method 3: Specify POE flags for number of nodes and number of tasks per node:

      unsetenv MP_PROCS
      poe_hello -nodes 2 -tasks_per_node 4
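
    The environment-variable form mentioned in item 1 might look like the sketch below, shown for Method 3 in sh/ksh syntax (csh users would use setenv and unsetenv). It assumes MP_NODES and MP_TASKS_PER_NODE are the variables corresponding to the -nodes and -tasks_per_node flags. Check the POE man page to confirm. The variable total_tasks is only an illustration of how the three quantities relate.

```shell
# Environment-variable form of Method 3 (nodes and tasks per node).
export MP_NODES=2
export MP_TASKS_PER_NODE=4
unset MP_PROCS   # leave the task count unset, as Method 3 does

# The three quantities are related by: tasks = nodes * tasks per node.
total_tasks=$((MP_NODES * MP_TASKS_PER_NODE))
echo "Expecting $total_tasks tasks in total"
```

    With these variables set, running poe_hello with no flags should request the same layout as Method 3.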

  10. Try the poe_bandwidth exercise code

    1. Depending upon your language preference, compile the poe_bandwidth source file as shown:

      C:
      mpcc -o poe_bandwidth poe_bandwidth.c
      Fortran:
      mpxlf -o poe_bandwidth poe_bandwidth.f 

    2. Change a couple of environment variables:

      setenv MP_PROCS 2
      setenv MP_EUILIB ip

    3. Run the executable:

      poe_bandwidth

      As the program runs, it will display the effective communications bandwidth between two nodes using Internet Protocol (IP) over the SP switch.

      Sample output from poe_bandwidth using IP communications
      
         0: ****** MPI/POE Bandwidth Test ******
         0: Message start size=   100000 bytes
         0: Message finish size=  1000000 bytes
         0: Incremented by   100000 bytes per iteration
         0: Roundtrips per iteration=  10
         0: Task 0 running on: smurf01.llnl.gov              
         0: Task 1 running on: smurf02.llnl.gov 
         0:
         0: Message Size   Bandwidth (bytes/sec)
         0:    100000         16287971
         0:    200000         22837133
         0:    300000         26188293
         0:    400000         26179723
         0:    500000         27529502
         0:    600000         23452768
         0:    700000         27418902
         0:    800000         27829474
         0:    900000         29754525
         0:   1000000         29817072
      

    4. Now run the executable again, but this time use a command line flag to specify the User Space communications protocol. The command line flag overrides whatever the MP_EUILIB environment variable is set to.

      poe_bandwidth -euilib us

      Note: When you try this step, you may well get an error message that looks something like one of the following:

      ERROR: 0031-124 Less than XX nodes available from pool N

      - or -

      ERROR: 0031-365 LoadLeveler unable to run job, reason:
      LoadL_negotiator: 2544-870 Step blue199.pacific.llnl.gov.11575.0 was not
      considered to be run in this scheduling cycle due to its relatively low
      priority or because there are not enough free resources.

      This is because there may be others in the workshop using nodes in User Space mode at the same time as you. Recall that only one user at a time may run US tasks on a node. If you get this error message, just try running again in a few seconds/minutes.

    5. Notice the output. You should see a significant increase in bandwidth.

      Sample output from poe_bandwidth using US communications
      
         0: ****** MPI/POE Bandwidth Test ****** 
         0: Message start size=  100000
         0: Message finish size=  1000000
         0: Incremented by  100000  bytes per iteration
         0: Roundtrips per iteration=  10
         0: Task 0 running on: smurf01.llnl.gov              
         0: Task 1 running on: smurf02.llnl.gov              
         0:  
         0: Message Size   Bandwidth (bytes/sec)
         0:   100000        55275330
         0:   200000        63483464
         0:   300000        68636231
         0:   400000        72111640
         0:   500000        73518283
         0:   600000        75819661
         0:   700000        76557262
         0:   800000        77621591
         0:   900000        78142147
         0:  1000000        77918613
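
      To put a number on the increase, the sketch below compares the two sample outputs at the largest message size. The figures are copied from the sample runs above. Your own measurements will differ.

```shell
# Bandwidth (bytes/sec) for 1,000,000-byte messages, copied from the sample outputs.
ip_bw=29817072
us_bw=77918613

# Ratio of User Space to IP bandwidth, rounded to two decimals.
speedup=$(LC_ALL=C awk -v us="$us_bw" -v ip="$ip_bw" 'BEGIN { printf "%.2f", us / ip }')
echo "US is about ${speedup}x faster than IP at this message size"
```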
      

  11. Determine per-task communication bandwidth behavior

    In this exercise, pairs of tasks, located on two different nodes, will communicate with each other.

    1. First, make sure that the User Space protocol is used for communications:

      setenv MP_EUILIB us

    2. Compile the code:

      C:
      mpcc -o smp_bandwidth smp_bandwidth.c 
      Fortran:
      mpxlf -o smp_bandwidth smp_bandwidth.f  

    3. Then use the smp_bandwidth code to determine per-task bandwidth characteristics on an SMP node:

      smp_bandwidth -nodes 2 -procs 2
      smp_bandwidth -nodes 2 -procs 4
      smp_bandwidth -nodes 2 -procs 6
      smp_bandwidth -nodes 2 -procs 8

      What happens to the per-task bandwidth as the number of tasks increases?
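
      The four runs can also be written as a loop. This sketch is a dry run: it only prints each command together with the number of tasks that land on each node, assuming POE divides the tasks evenly between the two nodes as it did in step 9. The variable names are illustrative.

```shell
nodes=2
summary=""
for procs in 2 4 6 8; do
    # Tasks per node, assuming an even split across the nodes.
    per_node=$((procs / nodes))
    echo "smp_bandwidth -nodes $nodes -procs $procs   # $per_node task(s) per node"
    summary="$summary$per_node "
done
```

      Removing the echo and the surrounding quotes turns the dry run into the real sequence of runs.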

  12. Optimize intra-node communication bandwidth

    When all of the task communications occur "on-node", it is possible to optimize the effective per-task bandwidth by utilizing shared memory instead of the network.

    1. First use shared memory and note the per-task bandwidth:

      setenv MP_SHARED_MEMORY yes
      smp_bandwidth -nodes 1 -procs 2
      smp_bandwidth -nodes 1 -procs 4

    2. Now try it without shared memory (using the network):

      setenv MP_SHARED_MEMORY no
      smp_bandwidth -nodes 1 -procs 2
      smp_bandwidth -nodes 1 -procs 4

      What differences do you notice?

  13. Try using POE's Multiple Program Multiple Data (MPMD) mode

    POE allows you to load and run different executables on different nodes. This is controlled by the MP_PGMMODEL environment variable.

    1. First, set some environment variables:

      Environment Variable   Setting   Description
      MP_PGMMODEL            mpmd      Specify MPMD mode
      MP_PROCS               4         Use 4 tasks again
      MP_NODES               4         Use one node per task
      MP_STDOUTMODE          ordered   Sort the output by task

    2. Then, simply issue the poe command.

    3. After a moment, you will be prompted to enter your executables one at a time. Notice that the machine name where the executable will run is displayed as part of the prompt. In any order you choose, enter these four program names, one per prompt:

      prog1 prog2 prog3 prog4

      Note: these four programs are just simple shell scripts used to demonstrate how to use the MPMD programming model.

    4. After the last program name is entered, POE will run all four executables. Observe their different outputs.
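
    For reference, the settings from item 1 of this step can be sketched in sh/ksh syntax as follows (csh users would use setenv). All names and values come from the table in item 1.

```shell
# MPMD settings from item 1 (sh/ksh syntax).
export MP_PGMMODEL=mpmd        # different executable per task
export MP_PROCS=4              # 4 tasks
export MP_NODES=4              # one node per task
export MP_STDOUTMODE=ordered   # sort output by task

echo "Program model: $MP_PGMMODEL, tasks: $MP_PROCS, nodes: $MP_NODES"
```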

  14. Try specific node allocation using a host list file

    Generally speaking, there aren't many cases where you'll need to "manually" select which nodes should be used to run your POE job. This step will demonstrate how to do it though, should you ever have the need.

    1. First, use your favorite UNIX editor to create a file in your POE executables directory. Call it hostfile. As its contents, enter 4 different node names from the workshop node pool, one node name per line. The js command from step 4 lists the nodes in the workshop node pool.

    2. Set the appropriate POE environment variables which specify specific node allocation:

      Environment Variable   Setting      Description
      MP_RESD                no           Turn off selection by the Resource Manager - just to be sure
      MP_HOSTFILE            hostfile     Specify the host file you created
      MP_SAVEHOSTFILE        hosts_used   Save the names of the hosts used to run your program
      MP_EUILIB              ip           Required protocol for specific node allocation
      MP_PGMMODEL            spmd         Reset from mpmd used in the previous step

    3. Run the poe_hello executable again and observe the output. Does it match what you specified in your host list file?

    4. Check your hosts_used file, which was created when your program ran. Do the names match the ones specified in your host list file?
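
    Items 1 and 2 of this step can be sketched together in sh/ksh syntax. The node names below are placeholders, not real workshop nodes. Replace them with four names from the workshop pool. The variable host_count is only there to confirm the file has one name per line.

```shell
# ASSUMPTION: placeholder node names; substitute four names from the workshop pool.
cat > hostfile <<'EOF'
node01.example.gov
node02.example.gov
node03.example.gov
node04.example.gov
EOF

# Settings for specific node allocation, from the table in item 2.
export MP_RESD=no
export MP_HOSTFILE=hostfile
export MP_SAVEHOSTFILE=hosts_used
export MP_EUILIB=ip
export MP_PGMMODEL=spmd

# Confirm the host list file has one entry per line.
host_count=$(wc -l < hostfile | tr -d ' ')
echo "hostfile lists $host_count nodes"
```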

  15. Review relevant LC documentation (or at least know where to find it):
    • LC Home Page - especially note the Machine Configurations link in the Machine Info section. Try the OCF Machine Status link also.
    • news job.limits command
    • news job.lim.blue command
    • news job.lim.frost command
    • Review the /etc/environment file
    • ASCI Blue web pages - see the "Running Jobs" section
    • ASCI White web pages - see the "Running Jobs" section

This concludes the POE exercise.

