Commit f419557
committed
Adds pause-resume elasticity if the proxy backend is used.
Added the changes to the jobset for elastic training to enable elasticity.
Added changes to launch_trainer so that the pause_resume decorator is used.
Set logging.raiseExceptions=True so that DATA_LOSS errors that occur in debug/info/other log calls are raise exceptions immediately.1 parent ae9a2c5 commit f419557
File tree
3 files changed
+33
-6
lines changed- axlearn
- cloud/gcp
- common
3 files changed
+33
-6
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
343 | 343 | | |
344 | 344 | | |
345 | 345 | | |
| 346 | + | |
346 | 347 | | |
347 | 348 | | |
348 | 349 | | |
| |||
604 | 605 | | |
605 | 606 | | |
606 | 607 | | |
| 608 | + | |
| 609 | + | |
| 610 | + | |
| 611 | + | |
| 612 | + | |
| 613 | + | |
| 614 | + | |
607 | 615 | | |
608 | 616 | | |
609 | 617 | | |
610 | 618 | | |
611 | 619 | | |
612 | 620 | | |
613 | 621 | | |
614 | | - | |
| 622 | + | |
615 | 623 | | |
616 | 624 | | |
617 | 625 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3 | 3 | | |
4 | 4 | | |
5 | 5 | | |
| 6 | + | |
6 | 7 | | |
7 | 8 | | |
8 | 9 | | |
| |||
15 | 16 | | |
16 | 17 | | |
17 | 18 | | |
| 19 | + | |
| 20 | + | |
18 | 21 | | |
19 | 22 | | |
20 | 23 | | |
| |||
158 | 161 | | |
159 | 162 | | |
160 | 163 | | |
161 | | - | |
162 | | - | |
163 | | - | |
164 | | - | |
| 164 | + | |
| 165 | + | |
| 166 | + | |
| 167 | + | |
| 168 | + | |
| 169 | + | |
| 170 | + | |
| 171 | + | |
| 172 | + | |
| 173 | + | |
| 174 | + | |
| 175 | + | |
| 176 | + | |
| 177 | + | |
| 178 | + | |
| 179 | + | |
| 180 | + | |
| 181 | + | |
| 182 | + | |
| 183 | + | |
165 | 184 | | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
108 | 108 | | |
109 | 109 | | |
110 | 110 | | |
111 | | - | |
| 111 | + | |
112 | 112 | | |
113 | 113 | | |
114 | 114 | | |
| |||
0 commit comments