You should set the number of devices to be the same for CatBoost and XGBoost.
The benchmark is not fair without CUDA_VISIBLE_DEVICES=id, because CatBoost uses all devices by default, and that is not a good idea for small benchmark datasets on 8xV100 servers.
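
One way to pin every run to a single GPU is to set CUDA_VISIBLE_DEVICES in the environment of the benchmark process. This is only a minimal sketch: the helper names `pinned_env` and `run_pinned` are made up for illustration, and the command being launched is a stand-in that just prints the variable, not a real benchmark script.

```python
import os
import subprocess
import sys


def pinned_env(device_id: int) -> dict:
    """Return a copy of the current environment restricted to one GPU."""
    env = dict(os.environ)
    # CUDA applications only see the devices listed in this variable.
    env["CUDA_VISIBLE_DEVICES"] = str(device_id)
    return env


def run_pinned(cmd, device_id=0):
    """Run a command so that it only sees a single CUDA device."""
    return subprocess.run(
        cmd,
        env=pinned_env(device_id),
        check=True,
        capture_output=True,
        text=True,
    )


# Stand-in for a benchmark script (hypothetical): echo the variable back.
result = run_pinned(
    [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
)
print(result.stdout.strip())
```

Launching both the CatBoost and the XGBoost runs through the same wrapper with the same device id keeps the device count equal for both libraries. From a shell, prefixing the command with CUDA_VISIBLE_DEVICES=0 should have the same effect.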