User Manual

    VTune profiler

    Apr. 27, 2023

    Quick performance evaluation with VTune APS or detailed hotspot, memory, or threading analysis with VTune profiler.

    First load the environment module:

    module add vtune/XXXX
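
After loading the module, it can be worth confirming that the tools used on this page are actually on the PATH. A minimal sketch; the function name `check_tools` is made up for illustration and is not part of VTune or the module system:

```shell
# check_tools: report which of the given commands are found on PATH.
# Returns non-zero if at least one of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (after "module add vtune/XXXX"):
#   check_tools vtune aps aps-report vtune-backend vtune-gui
```

If a tool is reported as missing, the loaded module version may simply not ship it.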

    Intro:
    https://ci.spdk.io/download/2019-summit-prc/02_Presentation_02_VTune_and_Analyzers_Overview_Sri.pdf
    https://www.intel.com/content/www/us/en/docs/vtune-profiler/get-started-guide/2023/linux-os.html

    Manuals:
    Intel_APS.pdf
    VTune_features_for_HPC.pdf

    vtune -help

    Run VTune via the command line interface

    Run your application with the VTune wrapper as follows. The command line interface is documented at:
    https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2023-0/command-line-interface.html

    Minimal Application Performance Snapshot (APS)
    mpirun -np 4 aps ./path-to_your/app.exe args_of_your_app
    
    # after completion, explore the results:
    aps-report aps_result_*
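
After several runs the glob `aps_result_*` can match more than one directory. The following sketch picks the most recently modified one; the helper name `latest_result` is hypothetical:

```shell
# latest_result PREFIX: print the most recently modified directory whose
# name starts with PREFIX (e.g. aps_result_), or fail if none exists.
latest_result() {
    prefix=$1
    newest=$(ls -dt "$prefix"*/ 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no result directory matching $prefix*" >&2
        return 1
    fi
    # strip the trailing slash added by the directory glob
    echo "${newest%/}"
}

# Example:
#   aps-report "$(latest_result aps_result_)"
```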

    Full Hotspot Analysis
    mpirun -np 4 vtune -collect hotspots -result-dir vtune_hotspot ./path-to_your/app.exe args_of_your_app
    
    # after completion, explore the results:
    vtune -report summary -r vtune_*
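
When started under mpirun, VTune typically writes one result directory per node, with the host name appended to the name given in -result-dir, so `vtune_*` may match several directories. This sketch prints a summary for each of them. The function name `report_all` is illustrative, and the per-node naming is an assumption to check against your own run:

```shell
# report_all PREFIX: run "vtune -report summary" once for every result
# directory whose name starts with PREFIX. Fails if none is found.
report_all() {
    prefix=$1
    found=0
    for dir in "$prefix"*/; do
        [ -d "$dir" ] || continue      # glob matched nothing
        found=1
        echo "=== ${dir%/} ==="
        vtune -report summary -r "${dir%/}"
    done
    [ "$found" -eq 1 ]
}

# Example:
#   report_all vtune_hotspot
```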

    Run VTune-GUI (not recommended)

    Log in with X11 forwarding (ssh -X), then start:

    vtune-gui

    Run VTune-GUI remotely in your local browser (recommended)

    Log in to the supercomputer with local port forwarding and start your VTune server on an exclusive compute node (here a 1-hour job):

    ssh -L 127.0.0.1:55055:127.0.0.1:55055 blogin.hlrn.de
    salloc -p standard96:test -t 01:00:00
    ssh -L 127.0.0.1:55055:127.0.0.1:55055 $SLURM_NODELIST
    module add intel/19.0.5 impi/2019.9 vtune/2022
    vtune-backend --web-port=55055 --enable-server-profiling &

    Open 127.0.0.1:55055 in your browser (allow security exception, set initial password).
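
Port 55055 is only an example. On a shared system it may already be taken (an assumption, not something this page states), in which case the same port has to be changed in all three commands. The sketch below prints the commands for a port of your choice without executing anything; `tunnel_cmds` is a made-up helper name:

```shell
# tunnel_cmds PORT LOGIN_HOST: print the two ssh commands and the
# vtune-backend command from the steps above with PORT substituted.
# Nothing is executed.
tunnel_cmds() {
    port=$1
    login=$2
    case $port in
        ''|*[!0-9]*) echo "port must be a number" >&2; return 1 ;;
    esac
    fwd="127.0.0.1:${port}:127.0.0.1:${port}"
    echo "ssh -L $fwd $login"
    echo "ssh -L $fwd \$SLURM_NODELIST"
    echo "vtune-backend --web-port=$port --enable-server-profiling &"
}

# Example:
#   tunnel_cmds 55066 blogin.hlrn.de
```

The browser address changes accordingly, e.g. 127.0.0.1:55066.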

    In the first "Welcome" VTune tab (run an MPI-parallel Performance Snapshot):

    Click: Configure Analysis
    -> Set application:  /path-to-your-application/program.exe
    -> Check: Use app. dir. as work dir.
    -> In case of MPI parallelism, expand "Advanced": keep defaults but paste the following wrapper script and check "Trace MPI":

    #!/bin/bash
    
    # Run VTune collector (here with 4 MPI ranks)
    mpirun -np 4 "$@"

    Under HOW, run: Performance Snapshot.
    (After completion/result finalization a 2nd result tab opens automatically.)
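
The wrapper script pasted under "Advanced" hard-codes 4 MPI ranks. A possible variant reads the rank count from an environment variable instead. The variable name `NRANKS` is hypothetical, and it would have to be set in the shell that starts vtune-backend:

```shell
#!/bin/bash

# Variant of the wrapper: rank count taken from the (hypothetical) NRANKS
# environment variable, falling back to 4 as in the original script.
run_ranks() {
    mpirun -np "${NRANKS:-4}" "$@"
}

# Only launch when a command was passed in, as is the case when the
# wrapper is invoked with the application and its arguments.
if [ "$#" -gt 0 ]; then
    run_ranks "$@"
fi
```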

    In the second "r0..." VTune tab (explore Performance Snapshot results):

    -> Here you find several analysis results, e.g. the HPC Performance Characterization.
    -> Under Performance Snapshot, depending on the snapshot outcome, VTune suggests more detailed follow-up analysis types (see % in the figure below):


    --> For example select/run a Hotspot analysis:

    In the third "r0..." VTune tab (Hotspot analysis):

    -> Expand sub-tab Top-down Tree
    --> In Function Stack expand the "_start" function and expand further down to the "main" function (first with an entry in the source file column)
    --> In the source file column double-click on "filename.c" of the "main" function

    -> In the new sub-tab "filename.c" scroll down to the line with maximal CPU Time: Total to find hotspots in the main function

    To quit the profiling session, press "Exit" in the VTune "Menu" (the "three horizontal bars" symbol in the upper left), then close the browser page. Exit your compute node via CTRL+D and cancel your interactive job:

    squeue -l -u $USER
    scancel your-job-id
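
Inside the shell opened by salloc, Slurm sets SLURM_JOB_ID, so the job ID does not have to be copied by hand. A small sketch along these lines (the function name `end_job` is made up):

```shell
# end_job: cancel the current allocation if SLURM_JOB_ID is set (as it is
# inside the salloc shell); otherwise just list your jobs so that the
# right ID can be passed to scancel manually.
end_job() {
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        scancel "$SLURM_JOB_ID"
    else
        squeue -l -u "$USER"
    fi
}

# Example (in the salloc shell on the login node):
#   end_job
```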