

    Unspecific error messages when reading huge input files

    May 06, 2021

    Problem

    In a job that requires "staging" of new, huge input files (8 GB in 650 files) during runtime, the job fails with error messages like "invalid file format". Inspecting the files afterwards does not reveal any errors; the input files are sane:

    cp repository/* input_area    # stage the input files on the Lustre file system
    mpirun ...                    # the parallel job starts reading immediately

    This appears to be a Lustre cache coherency problem: the parallel job starts up faster than Lustre can synchronise the freshly copied files across all nodes.

    Solution

    Add some delay after copying large file sets:

    cp repository/* input_area
    sleep 20                      # give Lustre time to synchronise the new files
    mpirun ...
    sleep 20                      # settle again before later steps read the job's output
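
    A fixed 20-second pause is only a guess at how long Lustre needs. A slightly more defensive variant (a sketch; the file-count check and the 60-second timeout are illustrative assumptions, not site policy) polls until all staged files are visible before launching:

    cp repository/* input_area
    # Wait (up to 60 s) until input_area lists as many files as repository.
    expected=$(ls repository | wc -l)
    for i in $(seq 1 60); do
        [ "$(ls input_area | wc -l)" -ge "$expected" ] && break
        sleep 1
    done
    mpirun ...

    Note that the listing is taken on the launching node only, so this does not guarantee that every compute node's cache is coherent; it merely replaces a blind wait with a bounded check.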

    Alternatively, the nocache tool works around this issue (thanks, John):

    nocache cp repository/* input_area   # copy without retaining pages in the page cache
    mpirun ...
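
    For background: nocache is an LD_PRELOAD wrapper that issues posix_fadvise(POSIX_FADV_DONTNEED) hints around file accesses, so the copied data is not retained in the node's page cache; presumably subsequent reads then fetch the files from the Lustre servers rather than from a possibly stale client-side cache.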

    Related articles

    • Page: Metadata Usage on WORK
    • Page: Unspecific error messages when reading huge input files
    • Page: Multiple programs multiple data




    Labels: huge, files, invalid, file, format, kb-troubleshooting-article