Quick performance evaluation with VTune APS, or detailed hotspot, memory, or threading analysis with VTune Profiler.

First load the environment module:

Codeblock
languagebash
module add vtune/XXXX

Intro:
https://ci.spdk.io/download/2019-summit-prc/02_Presentation_02_VTune_and_Analyzers_Overview_Sri.pdf
www.intel.com/content/www/us/en/docs/vtune-profiler/get-started-guide/2023/linux-os.html

Manuals:
Intel_APS.pdf
VTune_features_for_HPC.pdf

Codeblock
languagebash
vtune -help

Run VTune via the command line interface

...

Run your application with the VTune wrapper as follows:
www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2023-0/command-line-interface.html

...

Codeblock
languagebash
titleMinimal Application Performance Snapshot (APS)
mpirun -np 4 aps ./path-to_your/app.exe args_of_your_app

# after completion, explore the results:
aps-report aps_result_*


Codeblock
languagebash
titleFull Hotspot Analysis
mpirun -np 4 vtune -collect hotspots -result-dir vtune_hotspot ./path-to_your/app.exe args_of_your_app

# after completion, explore the results:
vtune -report summary -r vtune_*
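
If you want to keep the results of repeated runs apart, one option is to pick a fresh result directory name for each run. The following is only a sketch: the helper `next_result_dir` and its `<prefix>_NNN` numbering scheme are assumptions made for illustration, not something VTune provides.

```shell
#!/bin/bash
# Hypothetical helper: print the first unused name of the form <prefix>_NNN.
# The numbering scheme is an assumption for illustration, not a VTune convention.
next_result_dir() {
    local prefix="$1"
    local n=1
    local candidate in_use entry
    while :; do
        candidate=$(printf '%s_%03d' "$prefix" "$n")
        in_use=no
        # The trailing * also matches names that carry an extra suffix.
        for entry in "$candidate"*; do
            if [ -e "$entry" ]; then
                in_use=yes
            fi
        done
        if [ "$in_use" = no ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
        n=$((n + 1))
    done
}

# Usage, left commented out so the sketch can be sourced without starting a run:
# mpirun -np 4 vtune -collect hotspots -result-dir "$(next_result_dir vtune_hotspot)" ./path-to_your/app.exe args_of_your_app
```

The report command shown above still works with such names, because its `vtune_*` pattern matches any directory that starts with `vtune_`.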

Run VTune-GUI (not recommended)

Log in with X Window support (ssh -X) and then start:

Codeblock
languagebash
vtune-gui

Run VTune-GUI remotely on your local browser (recommended)

Log in to the supercomputer with local port forwarding and start your VTune server on an exclusive compute node (1h job):

Codeblock
languagebash
ssh -L 127.0.0.1:55055:127.0.0.1:55055 blogin.hlrn.de
salloc -p standard96:test -t 01:00:00
ssh -L 127.0.0.1:55055:127.0.0.1:55055 $SLURM_NODELIST
module add intel/19.0.5 impi/2019.9 vtune/2022
vtune-backend --web-port=55055 --enable-server-profiling &
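
The same port number appears in both ssh hops and in `--web-port`, so all three must agree. If port 55055 is already taken, another unprivileged port can be substituted consistently. A minimal sketch, assuming a hypothetical helper `tunnel_cmd` (not part of VTune or ssh) that only prints the forwarding command for a given port and host:

```shell
#!/bin/bash
# Hypothetical helper: print the ssh port-forwarding command for a port and host.
tunnel_cmd() {
    local port="$1" host="$2"
    # Reject anything that is not a plain number.
    case "$port" in
        ''|*[!0-9]*) echo "port must be a number" >&2; return 1 ;;
    esac
    # Accept only unprivileged ports (1024-65535).
    if [ "$port" -lt 1024 ] || [ "$port" -gt 65535 ]; then
        echo "port must be in 1024-65535" >&2
        return 1
    fi
    printf 'ssh -L 127.0.0.1:%s:127.0.0.1:%s %s\n' "$port" "$port" "$host"
}

# Example: print both hops for port 55155 (55155 is an arbitrary example value).
tunnel_cmd 55155 blogin.hlrn.de
tunnel_cmd 55155 '$SLURM_NODELIST'
```

With a different port, `vtune-backend` would then be started with the matching `--web-port` value, and the browser pointed at the same port on 127.0.0.1.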

Open 127.0.0.1:55055 in your browser (allow the security exception; on first use, set an initial password).

In the 1st "Welcome" VTune tab (run an MPI-parallel Performance Snapshot):

Click: Configure Analysis
-> Set application: /path-to-your-application/program.exe
-> Check: Use app. dir. as work dir.
-> In case of MPI parallelism, expand "Advanced": keep defaults but paste the following wrapper script and check "Trace MPI":

Codeblock
languagebash
#!/bin/bash
# Prefix script
echo "Target process PID: ${VTUNE_TARGET_PID}"
# Run VTune collector (here with 4 MPI ranks)
mpirun -np 4 "$@"
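
The wrapper script hard-codes the number of MPI ranks. As a variation, the rank count could be read from an environment variable with a default. This is only a sketch: `NRANKS` and `MPI_LAUNCHER` are hypothetical variable names introduced here, not settings known to VTune or MPI, and the guard around the final call is there so the script does nothing when no command is passed in.

```shell
#!/bin/bash
# Wrapper sketch. NRANKS and MPI_LAUNCHER are hypothetical variables introduced
# here for illustration; they are not VTune or MPI settings.
run_target() {
    "${MPI_LAUNCHER:-mpirun}" -np "${NRANKS:-4}" "$@"
}

# Prefix script
echo "Target process PID: ${VTUNE_TARGET_PID}"

# Run the target under the launcher only if a command line was passed in.
if [ "$#" -gt 0 ]; then
    run_target "$@"
fi
```

Without `NRANKS` set, this behaves like the wrapper above and starts 4 ranks.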

Under HOW, run: Performance Snapshot.
(After completion/result finalization a 2nd result tab opens automatically.)

In the 2nd "r0..." VTune tab (explore Performance Snapshot results):

-> Here you find several analysis results, e.g. the HPC Perf. Characterization.
-> Under Performance Snapshot, depending on the snapshot outcome, VTune suggests more detailed follow-up analysis types (see % in the figure below):

[Figure: Performance Snapshot results]
--> For example, select/run a Hotspot analysis:

In the 3rd "r0..." VTune tab (Hotspot analysis):

-> Expand sub-tab Top-down Tree
--> In Function Stack expand the "_start" function and expand further down to the "main" function (first with an entry in the source file column)
--> In the source file column double-click on "filename.c" of the "main" function

-> In the new sub-tab "filename.c" scroll down to the line with maximal CPU Time: Total to find hotspots in the main function

To quit the session, press "Exit" in the VTune "Menu" (the "three horizontal bars" symbol in the upper left). Then close the browser page. Exit your compute node via CTRL+D and cancel your interactive job:

Codeblock
languagebash
squeue -l -u $USER
scancel your-job-id
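
To pick the job ID out of the `squeue` listing, a small filter can help. A minimal sketch, assuming a hypothetical helper `job_ids` that reads the listing on standard input and assumes plain numeric job IDs in the first column:

```shell
#!/bin/bash
# Hypothetical helper: read squeue output on stdin and print the job IDs,
# i.e. every first column that is purely numeric (header lines are skipped).
job_ids() {
    awk '$1 ~ /^[0-9]+$/ { print $1 }'
}

# Usage, left commented out because it needs a Slurm installation:
# squeue -l -u "$USER" | job_ids
# scancel your-job-id
```

Check the printed IDs before cancelling anything, since the list contains all of your jobs, not only the interactive one.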

...