Example hpmcount Output

% hpmcount a.out -s 1
 adding counter 5 event 12 Cycles
 adding counter 0 event 1 Instructions completed
 adding counter 7 event 0 TLB misses
 adding counter 2 event 9 Stores completed
 adding counter 3 event 5 Loads completed
 adding counter 4 event 5 FPU 0 instructions
 adding counter 1 event 35 FPU 1 instructions
 adding counter 6 event 9 FMAs executed

 Running pipe()...
 s=  80200000000.0000000
 Running unroll()...
 s=  51840000160.0000000
 Running strength()...
 s=  84147.0984807570785
 Running block()...
 c(N,N) =  67239936.0000000000

 hpmcount (V 1.1) summary

 Total execution time (wall clock time): 73.797731 seconds

 ########  Resource Usage Statistics  ########

 Total amount of time in user mode            : 73.660000 seconds
 Total amount of time in system mode          : 0.120000 seconds
 Maximum resident set size                    : 7648 Kbytes
 Average shared memory use in text segment    : 118016 Kbytes*sec
 Average unshared memory use in data segment  : 24794792 Kbytes*sec
 Number of page faults without I/O activity   : 1912
 Number of page faults with I/O activity      : 0
 Number of times process was swapped out      : 0
 Number of times file system performed INPUT  : 0
 Number of times file system performed OUTPUT : 0
 Number of IPC messages sent                  : 0
 Number of IPC messages received              : 0
 Number of signals delivered                  : 0
 Number of voluntary context switches         : 27
 Number of involuntary context switches       : 81

 #######  End of Resource Statistics  ########

  PM_CYC (Cycles)                            :     27292091054
  PM_INST_CMPL (Instructions completed)      :     20517716625
  PM_TLB_MISS (TLB misses)                   :       137885794
  PM_ST_CMPL (Stores completed)              :      1722228796
  PM_LD_CMPL (Loads completed)               :      6770324209
  PM_FPU0_CMPL (FPU 0 instructions)          :      1485301893
  PM_FPU1_CMPL (FPU 1 instructions)          :       200251380
  PM_EXEC_FMA (FMAs executed)                :      1259220642

  Average number of loads per TLB miss       :          49.101
  Total loads and stores                     :        8492.553 M
  Instructions per load/store                :           2.416
  Cycles per instruction                     :           1.330
  Instructions per cycle                     :           0.752
  Total floating point operations            :        2944.774 M
  Hardware float point rate                  :          39.903 Mflop/sec
  Computation intensity                      :           0.347
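
The derived metrics at the end of the summary can be reproduced from the raw counter values and the wall clock time. The sketch below is not taken from hpmcount itself: the formulas are inferred from the numbers in this listing (for example, the floating point total is assumed to be FPU 0 + FPU 1 + FMAs, and the Mflop/sec rate is assumed to divide by wall clock time). The function name derived_metrics is hypothetical.

```python
# Raw counter values copied from the hpmcount summary above.
COUNTERS = {
    "PM_CYC": 27292091054,        # Cycles
    "PM_INST_CMPL": 20517716625,  # Instructions completed
    "PM_TLB_MISS": 137885794,     # TLB misses
    "PM_ST_CMPL": 1722228796,     # Stores completed
    "PM_LD_CMPL": 6770324209,     # Loads completed
    "PM_FPU0_CMPL": 1485301893,   # FPU 0 instructions
    "PM_FPU1_CMPL": 200251380,    # FPU 1 instructions
    "PM_EXEC_FMA": 1259220642,    # FMAs executed
}
WALL_CLOCK_SECONDS = 73.797731


def derived_metrics(c, wall_seconds):
    """Recompute the derived lines of the summary.

    Assumption: these formulas were inferred by matching the numbers in the
    listing; they are not quoted from the tool's documentation.
    """
    loads_stores = c["PM_LD_CMPL"] + c["PM_ST_CMPL"]
    # An FMA is assumed to count as one extra operation on top of the
    # FPU instruction that issued it.
    flops = c["PM_FPU0_CMPL"] + c["PM_FPU1_CMPL"] + c["PM_EXEC_FMA"]
    return {
        "Average number of loads per TLB miss": c["PM_LD_CMPL"] / c["PM_TLB_MISS"],
        "Total loads and stores (M)": loads_stores / 1e6,
        "Instructions per load/store": c["PM_INST_CMPL"] / loads_stores,
        "Cycles per instruction": c["PM_CYC"] / c["PM_INST_CMPL"],
        "Instructions per cycle": c["PM_INST_CMPL"] / c["PM_CYC"],
        "Total floating point operations (M)": flops / 1e6,
        "Hardware float point rate (Mflop/sec)": flops / 1e6 / wall_seconds,
        "Computation intensity": flops / loads_stores,
    }


for name, value in derived_metrics(COUNTERS, WALL_CLOCK_SECONDS).items():
    print(f"{name:<40s}: {value:12.3f}")
```

Under these assumptions, each printed value agrees with the corresponding summary line to three decimal places. Computation intensity here is the ratio of floating point operations to loads and stores, so a value of 0.347 means roughly three memory operations per floating point operation.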