diff --git a/docs/Documentation/Development/Debug_Tools/ARM/ddt.md b/docs/Documentation/Development/Debug_Tools/ARM/ddt.md
index e69de29bb..ec76143d8 100644
--- a/docs/Documentation/Development/Debug_Tools/ARM/ddt.md
+++ b/docs/Documentation/Development/Debug_Tools/ARM/ddt.md
@@ -0,0 +1,74 @@
+# Linaro (ARM) DDT
+
+**Documentation:** [Linaro (ARM) DDT](https://developer.arm.com/documentation/101136/22-1-3/DDT?lang=en)
+
+*Linaro DDT (formerly ARM DDT) is a powerful GUI-based parallel debugger. It is part of the Linaro Forge suite of parallel tools, alongside Linaro MAP and Linaro Performance Reports.*
+
+The focus of this page is on setting up and running DDT rather than on its features. For an overview of parallel debuggers, with a focus on DDT and its capabilities, see our [parallel debugging overview](/Documentation/Development/Debug_tools) page. For links to in-depth tutorials and guides on DDT, see our [resources](#resources) section. For help setting up ARM DDT, contact [HPC help](mailto:hpc-help@nrel.gov).
+
+## Compiling for debugging
+
+To use any debugger effectively on compiled code, including DDT, we must compile with the `-g` flag and, preferably, with the `-O0` optimization flag.
+
+The `-g` flag produces debugging information.
+
+The `-O0` flag disables optimization, ensuring that no variables or functions are optimized out, which simplifies debugging.
+
+An example compilation including the proper debug flags might look like:
+
+`mpicc -O0 -g application.c -o application.exe`
+
+## Remote GUI Set-Up
+
+DDT involves working with a GUI, so we first need to connect to the cluster in a way that supports fast and efficient visualization of applications.
+
+To do this, we'll use the FastX application. Follow the instructions on the [FastX page](/Documentation/Viz_Analytics/FastX/fastx) to set up the application and connect to the cluster.
+
+## Launching an application with DDT
+
+Once we're connected to the cluster via FastX, open a terminal inside FastX and initiate an interactive session (replace `<allocation_handle>` with your project allocation):
+
+`salloc --nodes=1 --account=<allocation_handle> --time=1:00:00`
+
+Then, once the interactive session launches, load the arm module:
+
+`module load arm`
+
+Next, make sure that you have loaded any additional modules that you need to run your application.
+
+Finally, launch DDT by typing `ddt` or `vglrun ddt`. This will launch the DDT GUI. From the GUI, click the `run` button. This opens a dialog in which you can specify the path to the executable and your working directory, and choose how many tasks and threads to run your application with, among other settings.
+
+!!! note
+
+    There are multiple ways to launch DDT. If your job will take a long time to run, it may be better to submit a job that launches DDT in offline mode. To do this, open DDT in GUI mode, set the desired breakpoints, and save the session file (mysession.session). Then, submit a job with the following command:
+
+    `ddt --offline --session=mysession.session -o report.html ./executable`
+
+    This will produce a debugging report (`report.html` in this example) that you can read when the job completes.
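+
+For long-running jobs, the offline workflow described in the note above can also be wrapped in a batch script. The sketch below is illustrative only: the allocation handle, walltime, session file, and report name are placeholders, and the module and BerkeleyGW paths are borrowed from the example that follows.
+
+```bash
+#!/bin/bash
+#SBATCH --account=<allocation_handle>   # replace with your project allocation
+#SBATCH --nodes=1
+#SBATCH --time=1:00:00
+
+# Load DDT along with any modules your application needs
+module load arm
+
+# Run from the directory that contains the application's input files
+cd /projects/scatter/OH/GaN/eps_ff
+
+# Launch DDT in offline mode, reusing a session file saved from the GUI;
+# the session and report file names are placeholders
+ddt --offline --session=mysession.session -o ddt_report.html \
+    /projects/scatter/OH/BerkeleyGW/bin/epsilon.cplx.x
+```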
+
+Be sure to set your working directory and application directory correctly:
+
+![DDT setup](/assets/images/Debugging/ddt_app_path.png)
+
+In this example, the application to debug is the epsilon executable of BerkeleyGW (epsilon.cplx.x). The development version of this code that we wish to debug can be found in /projects/scatter/OH/BerkeleyGW/bin. Be sure to set this `Application` line correctly.
+
+We want to run a particular BerkeleyGW calculation out of the /projects/scatter/OH/GaN/eps_ff directory, which contains all of the input files needed for BerkeleyGW to execute properly. Be sure that your `Working Directory` is set correctly.
+
+Next, we want to choose how many MPI tasks to launch the application with, and provide any additional arguments to srun:
+
+![DDT srun](/assets/images/Debugging/ddt_srun_options.png)
+
+Finally, you can set the number of OpenMP threads to launch, among other options, by checking the corresponding boxes (OpenMP, CUDA, Memory Debugging, etc.).
+
+Then, click the `run` button. A window will appear that states "listening for your program." When DDT is done listening, it will show a "paused" view of your source code. From here, you can add breakpoints, etc. When you're ready to run the program again, click the green triangle toward the top-left corner.
+
+## Summary of steps
+
+1. Compile your code with the `-g` and `-O0` flags
+2. Connect to the HPC machine via the [FastX](/Documentation/Viz_Analytics/Fastx/fastx) program
+3. Launch an interactive session on the HPC machine with `salloc`
+4. Launch DDT with the command `vglrun ddt` or `ddt`
+5. Provide DDT with the path to the executable and the path to your working directory
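+
+Putting these steps together, a typical interactive session (started from a terminal inside FastX) might look like the following sketch; the allocation handle and file names are placeholders:
+
+```bash
+# 1. Compile with debug flags (shown here for an MPI C code)
+mpicc -O0 -g application.c -o application.exe
+
+# 2. Connect to the HPC system with FastX and open a terminal (GUI step, no command)
+
+# 3. Request an interactive session, replacing <allocation_handle> with your project allocation
+salloc --nodes=1 --account=<allocation_handle> --time=1:00:00
+
+# 4. Load DDT (plus any modules your application needs) and launch the GUI
+module load arm
+vglrun ddt
+
+# 5. In the DDT run dialog, point the Application field at the executable
+#    and set the Working Directory to the directory containing your input files
+```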
+
+## Resources
+
+* NERSC has an excellent in-depth DDT tutorial [here](https://docs.nersc.gov/tools/debug/ddt/#basic-debugging-functionality)
+* See [these slides](https://www.alcf.anl.gov/sites/default/files/2020-05/Hulguin_Arm_DDT.pdf) for a high-level overview of the Linaro Forge toolkit, including DDT
+* See [this tutorial](https://www.bsc.es/support/DDT-ug.pdf) for a user-friendly walkthrough of DDT
diff --git a/docs/Documentation/Development/Debug_Tools/ARM/index.md b/docs/Documentation/Development/Debug_Tools/ARM/index.md
index 07394211f..6547bab0b 100644
--- a/docs/Documentation/Development/Debug_Tools/ARM/index.md
+++ b/docs/Documentation/Development/Debug_Tools/ARM/index.md
@@ -1 +1,7 @@
-# ARM
\ No newline at end of file
+# Linaro (ARM)
+
+Linaro Forge (formerly ARM Forge) is a suite of powerful parallel tools, including the parallel debugger [DDT](/Documentation/Development/Debug_Tools/ARM/ddt), the parallel profiler [MAP](/Documentation/Development/Performance_Tools/ARM/map), and the parallel profiling "overview" generator [Performance Reports](/Documentation/Development/Performance_Tools/ARM/performance_rep).
+
+For an overview of parallel debuggers and their utility, focused on DDT, see [here](/Documentation/Development/Debug_Tools/).
+
+For an overview of parallel profiling tools and their utility, focused on MAP, see [here](/Documentation/Development/Performance_Tools/).
diff --git a/docs/Documentation/Development/Debug_Tools/index.md b/docs/Documentation/Development/Debug_Tools/index.md
index c076d26d9..8562616a1 100644
--- a/docs/Documentation/Development/Debug_Tools/index.md
+++ b/docs/Documentation/Development/Debug_Tools/index.md
@@ -1 +1,103 @@
-# Debug Tools
\ No newline at end of file
+# Debugging Overview
+
+Debugging code can be difficult on a good day, and parallel code introduces additional complications. Thankfully, we have a few tools available on NREL HPC machines to help us debug parallel programs. The intent of this guide is to serve as an overview of the types of information one can obtain with a parallel debugger on a supercomputer, and how this information can be used to solve problems in your code.
+
+Ultimately, the various parallel debugging and profiling tools tend to work similarly, providing information about individual task and thread performance, parallel memory usage and stack tracing, task-specific bottlenecks and fail points, and more.
+
+We offer several suites of parallel tools:
+
+* The [Linaro Forge suite](https://developer.arm.com/documentation/101136/latest/) (includes the DDT debugger)
+* The [Intel oneAPI HPC Toolkit]()
+* HPE's tool suite (Kestrel-only):
+    * [Cray Debugger Support Tools (CDST)](https://support.hpe.com/hpesc/public/docDisplay?docId=a00113947en_us&page=Cray_Debugger_Support_Tools_CDST.html), including [gdb4hpc](https://support.hpe.com/hpesc/public/docDisplay?docLocale=en_US&docId=a00115304en_us&page=Debug_Crashed_Applications_With_gdb4hpc.html), a command-line parallel debugger based on [GDB](gdb.md)
+    * [Cray Performance Measurement and Analysis Tools (CPMAT)](https://support.hpe.com/hpesc/public/docDisplay?docId=a00113947en_us&page=Cray_Performance_Measurement_and_Analysis_Tools_CPMAT.html)
+
+For a low-overhead serial debugger available on all NREL machines, see our [GDB documentation](/Documentation/Development/Debug_tools/gdb).
+
+To skip to a walk-through example of parallel debugging, click [here](#walk-through).
+
+## Key Parallel Debugging Features
+
+Parallel debuggers typically come equipped with the same features available in serial debuggers (breakpoint setting, variable inspection, etc.). However, unlike serial debuggers, parallel debuggers also provide valuable MPI task- and thread-specific information. We present some key parallel features here.
+
+Note that while we present features of Linaro DDT (formerly ARM DDT) below, most parallel debugging tools function in similar ways and offer similar features.
+
+### Fail points
+
+Sometimes, some MPI tasks will fail at a particular point while others will not. This can happen for a number of reasons (an MPI task-defined variable goes out of bounds, etc.). Parallel debuggers can help us track down which tasks and/or threads are failing, and why. See the [walk-through](#walk-through) for an example.
+
+### Parallel variable inspection
+
+Other times, your code may not fail, but will produce an obviously incorrect answer. Such a situation is even less desirable than your code failing outright, since tracking down the problem variables and problem tasks is often more difficult.
+
+In these situations, the parallel variable inspection capabilities of parallel debuggers are valuable. We can first check if our code runs as expected in serial. If so, the fault doesn't lie with the parallelism of the code, and we can proceed using serial debugging techniques. If the code succeeds in serial but yields incorrect results in parallel, then the code is likely afflicted with a parallel bug.
+
+Inspecting key parallel variables may help identify this bug. For example, we can inspect the variables that dictate how the parallel work is divided among MPI tasks. Is it as expected? Such a process will vary greatly on a code-by-code basis, but inspecting task-specific variables is a good place to start.
+
+![doiownc](/assets/images/Debugging/DDT_BGW_doiownc.png)
+
+The above image shows a comparison across 8 MPI tasks of the first entry of the `doiownc` variable of the BerkeleyGW code. In BerkeleyGW, this variable states whether or not the given MPI task "owns" a given piece of data. Here, we can see that Task 0 owns this piece of data, while tasks 1-7 do not.
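+
+As a concrete version of the serial-versus-parallel check described above, a quick comparison outside the debugger might look like the following sketch; the executable name, input file, and task count are hypothetical:
+
+```bash
+# Run the same case in serial and in parallel, then compare the results.
+# If the serial output is correct but the parallel output is not, suspect a parallel bug.
+srun --ntasks=1 ./application.exe input.in > serial.out
+srun --ntasks=8 ./application.exe input.in > parallel.out
+diff serial.out parallel.out
+```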
+
+### Advanced parallel memory debugging
+
+In addition to detecting segmentation faults and out-of-bounds errors, parallel debuggers may offer more advanced memory debugging features. For example, DDT allows for advanced task-specific [heap debugging](https://developer.arm.com/documentation/101136/2012/DDT/Memory-debugging).
+
+### Walk-through
+
+To highlight some of the above features, we've introduced an out-of-bounds error to the BerkeleyGW code. We've changed a write to the `pol%gme` variable from:
+
+```fortran
+do ic_loc = 1, peinf%ncownactual
+  do ig = 1, pol%nmtx
+    pol%gme(ig, ic_loc, iv_loc, ispin, irk, freq_idx) = ZERO
+  enddo
+enddo
+```
+
+to:
+
+```fortran
+do ic_loc = 1, peinf%ncownactual
+  do ig = 1, pol%nmtx
+    pol%gme(ig, ic_loc, iv_loc, ispin, irk, freq_idx+1) = ZERO
+  enddo
+enddo
+```
+
+With this change, the sixth dimension of the array will go out of bounds. The details of the code aren't important, just the fact that we know the code will fail!
+
+Now, when we run the code in DDT, we receive the following error:
+
+![prog_stopped](/assets/images/Debugging/DDT_BGW_prog_stopped.png)
+
+When we click `pause`, we are immediately taken to the line that caused the failure:
+
+![fail line](/assets/images/Debugging/DDT_BGW_fail_line.png)
+
+We can inspect the variables on the line of failure in the box on the right-hand side, and we can control which task's variables we are examining in the blue bar across the top. In the above two images, we are examining MPI Task 27, which was the first task to fail.
+
+One particularly useful feature is the "current stack" view. When we click this header, we are taken to a stack trace. Clicking on each line in the stack trace takes us to the corresponding line in the corresponding source file, making stack tracing an error fast and simple. This is a common component of debuggers, even serial debuggers, but paired with our ability to choose which MPI task to focus on, it is a powerful feature! We get a task-specific view of the issue.
+
+![stack trace](/assets/images/Debugging/DDT_BGW_stack.png)
+
+\#0 on the stack trace corresponds to our `pol%gme` line. If we were to click, for example, on \#5, we are taken to a routine "further upstream" that is implicated in the call:
+
+![stack trace 2](/assets/images/Debugging/DDT_BGW_stack_trace2.png)
+
+If we want to compare the offending array, `pol%gme`, across MPI tasks, we only need to right-click on it in the "locals" box:
+
+![compare gme 1](/assets/images/Debugging/DDT_BGW_compare1.png)
+
+which launches a box that allows us to examine how `pol%gme` fared on each MPI task:
+
+![compare gme 2](/assets/images/Debugging/DDT_BGW_compare2.png)
+
+The tasks listed next to `` have not yet reached this line of code, but the 18 tasks that have (the tasks listed in the lines above `