hpcprof-flat:
Correlation of Flat Profile Metrics with Static Program Structure
The HPCToolkit Performance Tools
2011/02/22
Version 2017.10
hpcprof-flat correlates `flat' (IP histograms) profile metrics with static source code structure. See hpctoolkit(1) for an overview of HPCToolkit.
hpcprof-flat [output-options] [correlation-options] profile-file...
hpcprof-flat [output-options] --config config-file
hpcprof-flat correlates flat profiling metrics with static source code structure and (by default) generates an Experiment database for use with hpcviewer(1).
hpcprof-flat is invoked in one of two ways.
In the first, correlation options are specified on the command line along with a list of flat profile files.
In the second, these options, along with derived metrics, are specified in the configuration file config-file.
The first mode is generally sufficient, since derived metrics may be computed in hpcviewer(1).
However, to facilitate batch processing with the second mode, the first mode generates a sample configuration file (config.xml) within the Experiment database.
See the section Configuration File below for more details about its syntax.
For optimal results, structure information from hpcstruct(1) should be provided.
Without structure information, hpcprof-flat defaults to correlation using line map information.
- profile-file...
- A list of flat profile files.
- config-file
- The hpcprof-flat configuration file.
Default values for an option's optional arguments are shown in {}.
- -v [n], --verbose [n]
- Verbose: generate progress messages to stderr at verbosity level n. {1} (Use n=3 to debug path replacement if metrics and program structure are not properly matched.)
- -V, --version
- Print version information.
- -h, --help
- Print help.
- --debug [n]
- Debug: use debug level n. {1}
- --name name, --title name
- Set the database's name (title) to name.
- -I path, --include path
- Use path when searching for source files. For a recursive search, append a '*' after the last slash, e.g., '/mypath/*' (quote or escape to protect from the shell). May pass multiple times.
- -S file, --structure file
- Use hpcstruct(1) structure file file for correlation. May pass multiple times (e.g., for shared libraries).
- -R 'old-path=new-path', --replace-path 'old-path=new-path'
- Substitute instances of old-path with new-path; apply to all paths (e.g., a profile's load map and source code) for which old-path is a prefix. Use '\' to escape instances of '=' within a path. May pass multiple times.
Use this option when a profile or binary contains references to files that have been relocated, such as might occur with a file system change.
- -o db-path, --db db-path, --output db-path
- Specify Experiment database name db-path. {./experiment-db}
- --src [yes | no], --source [yes | no]
- Whether to copy source code files into the Experiment database. {yes} By default, hpcprof-flat copies source files that have performance metrics and that can be reached by PATH/REPLACE statements, resulting in a self-contained dataset that does not rely on an external source code repository. If copying is suppressed, the database is no longer self-contained.
Select different output formats and optionally specify the output filename file (located within the Experiment database). The output is sparse in the sense that it ignores program areas without profiling information. (Set file to '-' to write to stdout.)
- -x [file], --experiment [file]
- Default. ExperimentXML format. {experiment.xml}
NOTE: To disable, set file to no.
An hpcprof-flat configuration file is an XML document of type HPCPROF.
The following briefly describes its syntax.
(Optional) Use my-title to name the Experiment database.
(Optional) A set of PATH directives specifying path names to search for source files.
path is a relative or absolute path containing source code to which performance data should be correlated.
In order to recursively search a directory, append an escaped `*' after the last slash, e.g., /mypath/\* (escaping is for the shell).
(Optional) A set of REPLACE directives, each of which makes one path prefix match another prefix occurring in profile data files or in a program structure file. This is particularly useful when comparing performance metrics between machines with different file structures, e.g., because the executables or the source files are installed in different places.
<REPLACE in="old-path-prefix" out="new-path-prefix"/>
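As an illustration, suppose the profiled binary was built under /build/app but the sources now live under /home/user/app (both paths are hypothetical). A directive along these lines would let a recorded path such as /build/app/src/main.c be matched against /home/user/app/src/main.c:

```xml
<!-- Hypothetical prefixes: "in" is the old prefix recorded in the profile or
     structure data, "out" is the prefix under which the files are found now. -->
<REPLACE in="/build/app" out="/home/user/app"/>
```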
(Optional) A set of STRUCTURE directives providing program structure files created using hpcstruct(1).
<STRUCTURE name="program.psxml"/>
<STRUCTURE name="key-dso1.psxml"/>
<STRUCTURE name="key-dso2.psxml"/>
One or more metrics. A metric may be of two types, native or derived. Metrics are introduced using the METRIC element
<METRIC name="name"
displayName="name-in-display"
display="true|false"
percent="true|false">
...
</METRIC>
which has several attributes.
- name. A unique name. This name is used when creating derived metrics that are expressions of other metrics.
- displayName. A display name need not be unique.
- display. Controls metric visibility. One might read a metric and not display it because it is only useful as input to some computed metric.
- percent. Indicates whether the viewer should display a column of percentages computed as the ratio of the metric for this scope to the metric for the whole program. Percents are useful when metrics are computed by summing contributions from descendants in the scope tree, but are meaningless for computed metrics such as ratio of flops/memory access in a scope.
The two metric types are:
- Native or FILE metrics. This metric is read from a file which can be of type HPCRUN (from hpcrun-flat(1)) or PROFILE (from hpcproftt(1)).
<METRIC name="m1" ...>
<FILE name="file1"
select="short-name-in-file1"
type="HPCRUN|PROFILE"/>
</METRIC>
Since a file may contain multiple metrics, the FILE element has an optional `select' attribute to identify a particular metric from the file. Metrics are identified by their `shortName' values, which are typically zero-based indices. The default `select' value is 0, which corresponds to the first metric.
There is one important difference in how each type is handled. HPCRUN files are correlated to source code using the object code address annotations of STRUCTURE files; thus, they require valid STRUCTURE information. In contrast, PROFILE files are correlated using source line level information.
- Derived or COMPUTE metrics. Derived metrics are specified by a COMPUTE element containing a MathML equation in terms of metrics defined earlier in the HPCPROF document. hpcprof-flat supports the following operands:
- constants: <cn>2</cn>
- variables: <ci>m1</ci> (used to refer to other metrics)
and the following MathML operators (used within <apply>):
- negation: <minus/> (1-ary)
- subtraction: <minus/> (2-ary)
- addition: <plus/> (n-ary)
- multiplication: <times/> (n-ary)
- division: <divide/> (2-ary)
- exponentiation: <power/> (2-ary)
- minimum: <min/> (n-ary)
- maximum: <max/> (n-ary)
- mean (arithmetic): <mean/> (n-ary)
- standard deviation: <sdev/> (n-ary)
Assuming we have a document with two metrics, PAPI_TOT_CYC (cycles) and PAPI_FP_INS (floating point operations), we could compute cycles/FLOP:
<METRIC name="cyc/fp" displayName="..." percent="false">
<COMPUTE>
<math>
<apply> <divide/>
<ci>PAPI_TOT_CYC</ci>
<ci>PAPI_FP_INS</ci>
</apply>
</math>
</COMPUTE>
</METRIC>
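Operators can be nested, and the n-ary ones accept more than two operands. As a sketch, assume three native metrics named cyc-p0, cyc-p1, and cyc-p2 were defined earlier in the document (the names are hypothetical, e.g., cycle counts read from three profile files of the same program). Their relative spread could then be expressed as standard deviation over mean:

```xml
<!-- Hypothetical metric names; each must be defined by an earlier METRIC. -->
<METRIC name="cyc-spread" displayName="..." percent="false">
  <COMPUTE>
    <math>
      <apply> <divide/>
        <!-- n-ary standard deviation over the three inputs -->
        <apply> <sdev/> <ci>cyc-p0</ci> <ci>cyc-p1</ci> <ci>cyc-p2</ci> </apply>
        <!-- n-ary arithmetic mean over the same inputs -->
        <apply> <mean/> <ci>cyc-p0</ci> <ci>cyc-p1</ci> <ci>cyc-p2</ci> </apply>
      </apply>
    </math>
  </COMPUTE>
</METRIC>
```

Because the result is a ratio, percent is set to false, following the guidance on the percent attribute above.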
End document type.
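Putting the pieces above together, a small configuration file might look like the following sketch. It assumes that the root element is named HPCPROF, after the document type; any XML declaration or DOCTYPE line the tool expects is not shown in this text and is omitted. The file name app.hpcrun-flat, and the assumption that its first two metrics are cycles and floating point instructions, are hypothetical.

```xml
<HPCPROF>
  <!-- Program structure produced by hpcstruct(1); required for HPCRUN files. -->
  <STRUCTURE name="program.psxml"/>

  <!-- Native metrics: two metrics selected from the same (hypothetical) file
       by their zero-based shortName index. -->
  <METRIC name="PAPI_TOT_CYC" displayName="Cycles" display="true" percent="true">
    <FILE name="app.hpcrun-flat" select="0" type="HPCRUN"/>
  </METRIC>
  <METRIC name="PAPI_FP_INS" displayName="FP ins" display="true" percent="true">
    <FILE name="app.hpcrun-flat" select="1" type="HPCRUN"/>
  </METRIC>

  <!-- Derived metric: refers only to metrics defined earlier in the document. -->
  <METRIC name="cyc/fp" displayName="Cycles per FLOP" display="true" percent="false">
    <COMPUTE>
      <math>
        <apply> <divide/>
          <ci>PAPI_TOT_CYC</ci>
          <ci>PAPI_FP_INS</ci>
        </apply>
      </math>
    </COMPUTE>
  </METRIC>
</HPCPROF>
```

The sample config.xml that the command-line mode writes into the Experiment database is the authoritative starting point; this sketch only shows how the elements described above relate to one another.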
- Cycles per instruction.
<METRIC name="CPI" displayName="..." percent="false">
<COMPUTE>
<math>
<apply> <divide/>
<ci>PAPI_TOT_CYC</ci>
<ci>PAPI_TOT_INS</ci>
</apply>
</math>
</COMPUTE>
</METRIC>
- Unrealized FLOPS. Assume a processor core that can issue three floating point operations per cycle. Then unrealized FLOPS could be computed as:
<METRIC name="unrealized FLOPS" displayName="..." percent="false">
<COMPUTE>
<math>
<apply> <minus/>
<apply> <times/> <ci>PAPI_TOT_CYC</ci> <cn>3</cn> </apply>
<ci>PAPI_FP_INS</ci>
</apply>
</math>
</COMPUTE>
</METRIC>
- Wait cycles. Assume a processor core with a 2 GHz frequency and a wall clock metric (WALLCLK) measured in seconds. Then we can compute wait cycles as the difference between wall clock cycles and user cycles.
<METRIC name="wall-cyc" displayName="..." percent="true">
<COMPUTE>
<math>
<apply> <times/>
<ci>WALLCLK</ci>
<cn>2000000000</cn>
</apply>
</math>
</COMPUTE>
</METRIC>
<METRIC name="wait-cyc" displayName="..." percent="true">
<COMPUTE>
<math>
<apply> <minus/>
<ci>wall-cyc</ci>
<ci>PAPI_TOT_CYC</ci>
</apply>
</math>
</COMPUTE>
</METRIC>
- TLB miss rate. (data TLB misses + instruction TLB misses)/cycles
- Memory traffic at level L_k. (L_{k-1} load misses + L_{k-1} store misses) * (L_{k-1} cache line size)
- Memory bandwidth consumed at level L_k. (L_k traffic)/(wall clock time)
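The last three formulas can be written as COMPUTE metrics in the same style as the earlier examples. The sketch below takes k=2 and assumes that native metrics named after the PAPI preset events PAPI_TLB_DM, PAPI_TLB_IM, PAPI_L1_LDM, and PAPI_L1_STM, along with PAPI_TOT_CYC and WALLCLK (in seconds), were defined earlier in the document. The 64-byte cache line size is also an assumption; substitute the value for the target processor.

```xml
<!-- TLB miss rate: (data TLB misses + instruction TLB misses) / cycles -->
<METRIC name="tlb-miss-rate" displayName="..." percent="false">
  <COMPUTE>
    <math>
      <apply> <divide/>
        <apply> <plus/> <ci>PAPI_TLB_DM</ci> <ci>PAPI_TLB_IM</ci> </apply>
        <ci>PAPI_TOT_CYC</ci>
      </apply>
    </math>
  </COMPUTE>
</METRIC>

<!-- L2 traffic in bytes: (L1 load misses + L1 store misses) * line size;
     the 64-byte line size is an assumed value. -->
<METRIC name="L2-traffic" displayName="..." percent="true">
  <COMPUTE>
    <math>
      <apply> <times/>
        <apply> <plus/> <ci>PAPI_L1_LDM</ci> <ci>PAPI_L1_STM</ci> </apply>
        <cn>64</cn>
      </apply>
    </math>
  </COMPUTE>
</METRIC>

<!-- L2 bandwidth in bytes per second: L2 traffic / wall clock time -->
<METRIC name="L2-bandwidth" displayName="..." percent="false">
  <COMPUTE>
    <math>
      <apply> <divide/>
        <ci>L2-traffic</ci>
        <ci>WALLCLK</ci>
      </apply>
    </math>
  </COMPUTE>
</METRIC>
```

L2-bandwidth refers to L2-traffic, so the traffic metric has to appear first in the document.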
hpctoolkit(1).
Version: 2017.10 of 2011/02/22.
- Copyright
- © 2002-2017, Rice University.
- License
- See README.License.
Nathan Tallent
John Mellor-Crummey
Rob Fowler
Rice HPCToolkit Research Group
Email: hpctoolkit-forum =at= rice.edu
WWW: http://hpctoolkit.org.