HPCToolkit
The HPCToolkit Performance Tools
2011/02/22
Version 2017.10
HPCToolkit is a collection of performance analysis tools for node-based performance analysis.
It has been designed around the following principles:
- Language independence.
- Avoid code instrumentation.
- Avoid blind spots.
- Context is essential for understanding layered and object-oriented software.
- Any one performance measure produces a myopic view.
- Derived performance metrics are essential for effective analysis.
- Performance analysis should be top down.
- Hierarchical aggregation is important in the face of approximate attribution.
- With instruction-level parallelism, aggregate properties are vital.
- Measurement and analysis must be scalable.
HPCToolkit's website (hpctoolkit.org) contains papers that explain these design principles in more detail.
A typical performance analysis session consists of:
- Measurement.
Collect low-overhead, high-accuracy profiles using statistical sampling.
hpcrun(1) collects call path profiles, while hpcrun-flat(1) collects `flat' profiles (IP histograms, where IP is the instruction pointer).
hpclink(1) is used to collect call path profiles for statically linked applications.
- Recovering static source code structure.
hpcstruct(1) recovers static program structure such as procedures and loop nests.
It accounts for procedure and loop transformations such as inlining and software pipelining.
Technically, this is an optional step, but critical information is lost without it.
- Correlating dynamic profiles with static source code structure.
HPCToolkit's correlation tool combines dynamic profile information with the static program structure from hpcstruct(1) to attribute the costs of optimized object code to useful source code constructs such as loop nests.
The result of correlation is called an Experiment database.
Currently, hpcprof(1) is used for call path profiles and hpcprof-flat(1) is used for flat profiles.
- Top-down visualization.
hpcviewer(1) is a Rich Client Platform-based tool for presenting the resulting Experiment databases.
An important feature of the Experiment database is that it is relocatable, containing profile information and copies of application source files.
This means that the first three steps can be performed remotely on a cluster and then the database can be viewed locally on a laptop or workstation.
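Because the database is relocatable, one way to move it from a cluster to a workstation is to archive it. The following is a minimal sketch, assuming the Experiment database is an ordinary directory named hpctoolkit-database; the helper names pack_db and unpack_db are hypothetical and not part of HPCToolkit.

```shell
#!/bin/sh
# Hypothetical helpers (not HPCToolkit commands) for moving an
# Experiment database directory between machines.

# pack_db DIR: write DIR's basename plus ".tar.gz" into the current directory.
pack_db() {
    db=$1
    tar -czf "$(basename "$db").tar.gz" \
        -C "$(dirname "$db")" "$(basename "$db")"
}

# unpack_db ARCHIVE DEST: extract ARCHIVE below DEST, creating DEST if needed.
unpack_db() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Typical use (the copy step between machines is left to scp, rsync, etc.):
#   on the cluster:      pack_db hpctoolkit-database
#   on the workstation:  unpack_db hpctoolkit-database.tar.gz "$HOME/profiles"
#                        hpcviewer "$HOME/profiles/hpctoolkit-database"
```

The archive carries both the profile data and the copied source files, so hpcviewer(1) on the workstation needs no access to the original build tree.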
For flat profiles, an alternative session that produces textual output consists of:
- Measurement.
Collect low-overhead, high-accuracy `flat' profiles using statistical sampling (hpcrun-flat(1)).
- Recovering static source code structure.
hpcstruct(1) is used for the same purpose described above.
- Correlating dynamic profiles with procedures and source lines.
hpcproftt(1) correlates `flat' profiles with source structure and produces textual output.
Assume we have an application called zoo whose source code is located in path-to-zoo.
- Compile.
First compile and link the application normally with full optimization and as much debugging information as possible.
Typically, this involves compiler options similar to -O3 -g.
(See hpcstruct(1) for options for specific compilers.)
- Measure.
Profile with hpcrun(1) or hpcrun-flat(1).
Assume we wish to use two different sets of events.
hpcrun[-flat] <event-set-1> zoo
hpcrun[-flat] <event-set-2> zoo
hpcrun(1) will by default place the results in a measurement directory named hpctoolkit-zoo-measurements.
hpcrun-flat(1) by default creates data files in the current directory; assume the results are placed in profile-file-1 and profile-file-2.
- Recover static source code structure.
Use hpcstruct(1) to recover program structure and write it to the file zoo.hpcstruct:
hpcstruct zoo
- Correlate call path or flat metrics with static source code structure.
Create an Experiment database using hpcprof(1) or hpcprof-flat(1).
(The version of hpcprof(1) must match the version of hpcrun(1).)
Assume the generated Experiment database is named hpctoolkit-database.
hpcprof -I path-to-zoo/'*' -S zoo.hpcstruct hpctoolkit-zoo-measurements
or
hpcprof-flat -I path-to-zoo/'*' -S zoo.hpcstruct profile-file-1 profile-file-2
- Visualize.
Visualize the Experiment database using hpcviewer(1):
hpcviewer hpctoolkit-database
Derived metrics may be computed on-the-fly with hpcviewer(1).
See The hpcviewer User Interface Guide for more information.
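The call path variant of the steps above can be collected into one script. This is a sketch only: the run wrapper, the workflow function, and the APP, SRC, EVENTS_1, EVENTS_2, and DRY_RUN variables are hypothetical names introduced for illustration, and the event sets remain the placeholders used in the text. By default the script only prints the commands; set DRY_RUN=0 to execute them on a system where HPCToolkit is installed.

```shell
#!/bin/sh
# Sketch of the zoo walkthrough (call path variant) as a single script.
# All variable and function names below are illustrative assumptions.
APP=${APP:-zoo}
SRC=${SRC:-path-to-zoo}
EVENTS_1=${EVENTS_1:-"<event-set-1>"}   # placeholder, as in the text
EVENTS_2=${EVENTS_2:-"<event-set-2>"}   # placeholder, as in the text
DRY_RUN=${DRY_RUN:-1}                   # 1: print commands, 0: run them

# Print the command in dry-run mode; otherwise run it and stop on failure.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@" || exit 1
    fi
}

workflow() {
    # Measure: one run per event set (left unquoted so that an event
    # set may expand to several arguments).
    run hpcrun $EVENTS_1 "$APP"
    run hpcrun $EVENTS_2 "$APP"
    # Recover static structure into $APP.hpcstruct.
    run hpcstruct "$APP"
    # Correlate; "$SRC/*" is passed literally, like path-to-zoo/'*'.
    run hpcprof -I "$SRC/*" -S "$APP.hpcstruct" "hpctoolkit-$APP-measurements"
    # Visualize, assuming the database name used in the text.
    run hpcviewer hpctoolkit-database
}

workflow
```

Printing the commands first makes it easy to check the measurement directory and structure file names before committing to a long profiling run.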
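Alternatively, to generate textual summaries from flat profiles, begin with the first three steps above (Compile, Measure, Recover static source code structure).
- Correlate metrics with static source code structure and generate textual summaries.
Use hpcproftt(1).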
To compute raw metrics for each native event and generate the default program and load module summaries:
hpcproftt -I path-to-zoo/'*' -S zoo.hpcstruct profile-file-1 profile-file-2
To compute raw and summary metrics (but only show the latter) and generate summaries for all program structure elements:
hpcproftt --src=sum --metric=sum-only -I path-to-zoo/'*' -S zoo.hpcstruct profile-file-1 profile-file-2
To compute raw and summary metrics and generate summaries for all program structure elements and generate annotated source files:
hpcproftt --src=all --metric=sum -I path-to-zoo/'*' -S zoo.hpcstruct profile-file-1 profile-file-2
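The three invocations above differ only in their --src and --metric options. The sketch below builds the matching command line from a mode name. The mode names (default, summary, annotated) and both function names are hypothetical; only the hpcproftt options themselves come from the examples above.

```shell
#!/bin/sh
# Map a hypothetical mode name to the hpcproftt options shown in the text.
proftt_opts() {
    case "$1" in
        default)   echo "" ;;                             # raw metrics, default summaries
        summary)   echo "--src=sum --metric=sum-only" ;;  # show summary metrics only
        annotated) echo "--src=all --metric=sum" ;;       # also annotate source files
        *)         echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Print the full command line for MODE followed by the profile files.
proftt_cmd() {
    mode=$1; shift
    opts=$(proftt_opts "$mode") || return 1
    # $opts is deliberately unquoted so it splits into separate options.
    echo hpcproftt $opts -I "path-to-zoo/'*'" -S zoo.hpcstruct "$@"
}

proftt_cmd summary profile-file-1 profile-file-2
# prints: hpcproftt --src=sum --metric=sum-only -I path-to-zoo/'*' -S zoo.hpcstruct profile-file-1 profile-file-2
```

The function prints the command instead of running it, so the output can be inspected or pasted into a job script.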
hpcrun(1), hpclink(1) (hpcrun-flat(1))
hpcstruct(1)
hpcprof(1), hpcprof-mpi(1) (hpcprof-flat(1))
hpcproftt(1)
hpcsummary(1)
hpcviewer(1), hpctraceviewer(1)
Version: 2017.10 of 2011/02/22.
- Copyright: © 2002-2017, Rice University.
- License: See README.License.
Email: hpctoolkit-forum =at= rice.edu
WWW: http://hpctoolkit.org.