Performance measurements were obtained with the High-Performance Linpack (HPL) benchmark.
Performance
Rack # | # of Nodes | | Cores per Node | | DP FP Factor | | Clock Speed | | R_peak* |
---|---|---|---|---|---|---|---|---|---|
0 | 30 | x | 2 | x | 2 | x | 1.80 GHz | = | 216.0 |
1 | 15 | x | 4 | x | 2 | x | 2.40 GHz | = | 288.0 |
2 | 32 | x | 8 | x | 4 | x | 2.33 GHz | = | 2385.9 |
Total | 77 | | 376 cores | | | | | | 2889.9 |
* R_peak and R_max are values in GFLOPS. R_max has been scaled from non-full cluster tests and is not representative. R_peak is the theoretical peak: number of nodes x cores per node x DP FP factor (double-precision floating-point operations per cycle) x clock speed in GHz.
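The R_peak arithmetic in the table can be checked with a short script. This is a minimal sketch: the rack values are copied from the table, while the names `RACKS`, `r_peak_gflops` and `cluster_totals` are hypothetical and chosen only for illustration.

```python
# Rack parameters copied from the table above:
# (rack, nodes, cores per node, DP FP factor, clock speed in GHz)
RACKS = [
    (0, 30, 2, 2, 1.80),
    (1, 15, 4, 2, 2.40),
    (2, 32, 8, 4, 2.33),
]


def r_peak_gflops(nodes, cores_per_node, dp_fp_factor, clock_ghz):
    """Theoretical peak of one rack in GFLOPS."""
    return nodes * cores_per_node * dp_fp_factor * clock_ghz


def cluster_totals(racks):
    """Return (total nodes, total cores, total R_peak in GFLOPS)."""
    total_nodes = sum(nodes for _, nodes, _, _, _ in racks)
    total_cores = sum(nodes * cores for _, nodes, cores, _, _ in racks)
    total_peak = sum(
        r_peak_gflops(nodes, cores, factor, ghz)
        for _, nodes, cores, factor, ghz in racks
    )
    return total_nodes, total_cores, total_peak


if __name__ == "__main__":
    for rack, nodes, cores, factor, ghz in RACKS:
        peak = r_peak_gflops(nodes, cores, factor, ghz)
        print(f"rack {rack}: {peak:.1f} GFLOPS")
    nodes, cores, peak = cluster_totals(RACKS)
    print(f"total: {nodes} nodes, {cores} cores, {peak:.1f} GFLOPS")
```

The per-rack results and the totals (77 nodes, 376 cores) should agree with the table rows once rounded to one decimal place.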
Settings
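320 cores, 1 GB memory per core
Optimal performance at 80% memory usage:
#entries = sqrt(#cores * mem * 0.8 / 8) = 185 kentries (mem in bytes per core, 8 bytes per double-precision entry; 185 k corresponds to 1 GB = 2^30 bytes)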
NB = 320
N = 320 * x = y
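The sizing rule above can be sketched as follows. Two assumptions are made here that the text does not state outright: "1GB" is taken as 2^30 bytes (which reproduces the 185 k figure, whereas 10^9 bytes would give roughly 179 k), and N is rounded down to a multiple of NB, which is one reading of `N = 320 * x`. The function names are hypothetical.

```python
import math

BYTES_PER_DOUBLE = 8  # one double-precision matrix entry


def max_problem_size(cores, mem_per_core_bytes, percent=80):
    """Largest N for which an N x N matrix of doubles fits in
    `percent` of the total memory (integer arithmetic only)."""
    usable_bytes = cores * mem_per_core_bytes * percent // 100
    return math.isqrt(usable_bytes // BYTES_PER_DOUBLE)


def round_down_to_multiple(n, nb):
    """Largest multiple of the block size nb that does not exceed n."""
    return (n // nb) * nb


if __name__ == "__main__":
    n_max = max_problem_size(cores=320, mem_per_core_bytes=2**30)
    print(n_max)                               # 185363
    print(round_down_to_multiple(n_max, 320))  # 185280
```

Under these assumptions the upper bound is about 185 k entries per matrix dimension, and the nearest multiple of NB = 320 below it is 185280.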
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
6 device out (6=stdout,7=stderr,file)
1 # of problems sizes (N)
40000 Ns
3 # of NBs
44 88 132 NBs
1 # of process grids (P x Q)
2 Ps
8 Qs
16.0 threshold
3 # of panel fact
0 1 2 PFACTs (0=left, 1=Crout, 2=Right)
1 # of recursive stopping criterium
8 NBMINs (>= 1)
1 # of panels in recursion
2 NDIVs
1 # of recursive panel fact.
2 RFACTs (0=left, 1=Crout, 2=Right)
1 # of broadcast
1 BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1 # of lookahead depth
1 DEPTHs (>=0)
2 SWAP (0=bin-exch,1=long,2=mix)
80 swapping threshold
0 L1 in (0=transposed,1=no-transposed) form
0 U in (0=transposed,1=no-transposed) form
1 Equilibration (0=no,1=yes)
8 memory alignment in double (> 0)
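To get a feel for what this input file asks of HPL, the sketch below tallies the parameter sweep. It assumes HPL runs one test per combination of the listed values; the values are copied from the file above, and the names `HPL_SWEEP`, `run_count`, `processes_needed` and `matrix_bytes` are hypothetical.

```python
from math import prod

# Values copied from the input file above; one list per swept parameter.
HPL_SWEEP = {
    "Ns": [40000],
    "NBs": [44, 88, 132],
    "grids": [(2, 8)],  # (P, Q)
    "PFACTs": [0, 1, 2],
    "NBMINs": [8],
    "NDIVs": [2],
    "RFACTs": [2],
    "BCASTs": [1],
    "DEPTHs": [1],
}


def run_count(sweep):
    """Number of tests, assuming one test per parameter combination."""
    return prod(len(values) for values in sweep.values())


def processes_needed(grid):
    """MPI processes required by a P x Q process grid."""
    p, q = grid
    return p * q


def matrix_bytes(n):
    """Memory taken by an n x n matrix of 8-byte doubles."""
    return n * n * 8


if __name__ == "__main__":
    print(run_count(HPL_SWEEP))                     # 9
    print(processes_needed(HPL_SWEEP["grids"][0]))  # 16
    print(matrix_bytes(HPL_SWEEP["Ns"][0]) / 1e9)   # 12.8 (GB)
```

With these values the file describes 9 runs on a 2 x 8 grid of 16 processes, each run factoring a matrix of about 12.8 GB, which is well below the 320-core configuration listed under Settings.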