Closed
56 commits
ab8a43c
fix: Enable CI on IBM repo
zhouyuan Aug 26, 2025
7187c10
Set ccache maximum size to 1G
FelixYBW Nov 22, 2025
cfd0039
Remove sed command from Gluten workflow
FelixYBW Nov 23, 2025
ee2c5c0
Modify get-velox.sh to change 'ibm' to 'ibm-xxx'
FelixYBW Nov 23, 2025
11183c2
Update sed command to be case-insensitive
FelixYBW Nov 23, 2025
d3bc030
Update gluten.yml
FelixYBW Nov 23, 2025
fa5e032
fix iceberg unit test
zhouyuan Nov 24, 2025
18a60f6
Update gluten.yml
FelixYBW Nov 24, 2025
df73ec9
Enable enhanced features in gluten build script
FelixYBW Nov 24, 2025
67841f1
Update cache keys for Gluten workflow
FelixYBW Nov 25, 2025
e336226
feat(plan_builder): Add fluent expression builder API (#15615)
pedroerp Nov 25, 2025
fd0682b
refactor: Add Iceberg connector (#15581)
PingLiuPing Nov 25, 2025
e953f06
fix: Revert refactor: Remove enableConstantFolding flag from ExprComp…
amitkdutta Nov 26, 2025
33248d5
feat: Enhance PlanConsistencyChecker to check columns used in lambda …
mbasmanova Nov 26, 2025
fea5baa
feat(plan_builder): Support untyped filter expressions (#15611)
pedroerp Nov 26, 2025
1c7df46
feat(plan_builder): Alias support for expression builder (#15634)
pedroerp Nov 26, 2025
4d3409f
refactor: Improve VeloxPromise API (#15628)
mbasmanova Nov 26, 2025
bc8d017
perf: Avoid string copy in IPAddressType (#15638)
pedroerp Nov 26, 2025
a62c85d
fix: Fix flaky test SharedArbitrationTestWithParallelExecutionModeOnl…
duxiao1212 Nov 26, 2025
bcc64b7
misc: Add TopNRowNumber in MemoryArbitrationFuzzer (#15598)
duxiao1212 Nov 26, 2025
6375476
fix: Fix flaky test freeUnusedCapacityWhenReclaimMemoryPool (#15612)
duxiao1212 Nov 26, 2025
8ecb1a1
fix: Exception handling in cast from JSON (#15617)
Nov 26, 2025
be0837f
perf: Avoid StringView copy in folly/Conv.h (#15639)
pedroerp Nov 26, 2025
d1ec7e7
feat: Add map to struct pushdown support to file readers (#15545)
Yuhta Nov 26, 2025
fc70dbe
feat: Add fuzzer type transform for KHLL (#15644)
natashasehgal Nov 26, 2025
c045573
Revert "feat: Add rewrite for IN special form" (#15649)
Nov 27, 2025
5e3d5a7
fix: Do not wait for async load at ParallelUnitLoader destructor (#15…
Nov 27, 2025
0007f37
refactor: Extract common BaseSerializedPage API (#15626)
tanjialiang Nov 27, 2025
7eaf45f
misc: Updated MEM_POOL_CAP_EXCEEDED error message (#15616)
duxiao1212 Nov 28, 2025
1c59485
[OAP][NA] Register merge extract companion agg functions without suffix
zhztheplayer Dec 29, 2023
762b3b2
[OAP][5962]Support struct schema evolution matching by name
rui-mo Mar 18, 2025
9a7cfbd
[OAP][15173][15343]Allow reading integers into smaller-range types
rui-mo Sep 18, 2025
64413d5
[OAP][11771]fix: Fix smj result mismatch issue in semi, anit and full…
zhouyuan Sep 4, 2025
49a1cb8
[OAP][7066]Stream input row to hash table when addInput for left semi…
liujiayi771 Nov 24, 2024
d19ce5c
[14722] Fix memory leak caused by asynchronous prefetch
rui-mo Nov 21, 2025
bb51ce5
Revert "refactor: Add Iceberg connector (#15581)"
PingLiuPing Nov 27, 2025
0000732
Revert "feat: Add Iceberg partition name generator (#15461)"
PingLiuPing Nov 17, 2025
e7c7cd6
Revert "feat: Add support for evaluating Iceberg partition transforms…
PingLiuPing Nov 14, 2025
1ebdc9d
Revert "feat: Add Iceberg partition name generation utility (#15443)"
PingLiuPing Nov 11, 2025
fa14ad1
Revert "feat: Add iceberg partition specification (#15423)"
PingLiuPing Nov 7, 2025
314431d
refactor: Move toValues from InPredicate.cpp to Filter.h
yingsu00 Mar 15, 2025
53bd55c
feat(connector): Support reading Iceberg split with equality deletes
yingsu00 May 1, 2024
c26744d
Support insert data into iceberg table.
PingLiuPing Oct 3, 2025
74228d0
Add iceberg partition transforms.
PingLiuPing Jul 1, 2025
b789939
Add NaN statistics to parquet writer.
PingLiuPing Sep 4, 2025
a8995b2
Collect Iceberg data file statistics in dwio.
PingLiuPing Jul 28, 2025
217e4a6
Fix incorrect min max stats when the column value are infinity or -in…
PingLiuPing Aug 26, 2025
b769a00
Integrate Iceberg data file statistics and adding unit test.
PingLiuPing Sep 1, 2025
c8b0f5d
Support write field_id to parquet metadata SchemaElement.
PingLiuPing Sep 5, 2025
5c5f218
Implement iceberg sort order
PingLiuPing May 30, 2025
58ae621
Add clustered Iceberg writer mode.
PingLiuPing Sep 1, 2025
1ead19b
adding daily tests
zhouyuan Jul 3, 2025
619b561
remote gluten daily build
FelixYBW Nov 25, 2025
0e9e465
fix: remove website folder to bypass the security issues
zhouyuan Jul 9, 2025
33c8ab4
Disable flaky test: partialAggregateWithTableScan
zhouyuan Nov 21, 2025
867d04e
Merge branch 'ci-fix-pr' into staging-7eaf45f96-pr
wanglinsong Nov 28, 2025
225 changes: 225 additions & 0 deletions .github/workflows/gluten.yml
@@ -0,0 +1,225 @@
# Copyright (c) Facebook, Inc. and its affiliates.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: Gluten Build

on:
  pull_request:
    paths:
      - .github/workflows/gluten.yml
env:
  MVN_CMD: mvn -ntp

jobs:

  gluten-cpp-build:
    name: gluten cpp build
    # prevent errors when forks ff their main branch
    if: ${{ github.repository == 'IBM/velox' }}
    runs-on: ubuntu-22.04
    env:
      CCACHE_DIR: "${{ github.workspace }}/.ccache"
    steps:
      - uses: actions/checkout@v4
      - name: Get Ccache
        uses: actions/cache/restore@v4
        with:
          path: '${{ env.CCACHE_DIR }}'
          key: ccache-ibm-velox
      - name: Setup Gluten
        run: |
          git clone --depth 1 https://github.com/apache/incubator-gluten gluten && cd gluten
          sed -i 's/ibm/ibm-xxx/gI' ep/build-velox/src/get-velox.sh
          BRANCH=$(echo ${GITHUB_REF#refs/heads/})
          # sed -i 's/VELOX_BRANCH=2025.*/VELOX_BRANCH=${BRANCH}/g' ep/build-velox/src/get-velox.sh
      - name: Build Gluten native libraries
        run: |
          docker pull apache/gluten:vcpkg-centos-7
          docker run -v $GITHUB_WORKSPACE:/work -w /work apache/gluten:vcpkg-centos-7 bash -c "
            git config --global --add safe.directory /work
            set -e
            df -a
            cd /work
            git log -n 3
            cd /work/gluten
            git log -n 3
            export CCACHE_DIR=/work/.ccache
            export CCACHE_SLOPPINESS=file_macro,locale,time_macros
            mkdir -p /work/.ccache
            ccache -M 1G
            ccache -sz
            source /opt/rh/devtoolset-11/enable
            source /opt/rh/rh-git227/enable
            export NUM_THREADS=4
            ./dev/builddeps-veloxbe.sh --enable_vcpkg=ON --build_arrow=OFF --build_tests=OFF --build_benchmarks=OFF \
              --build_examples=OFF --enable_s3=ON --enable_gcs=ON --enable_hdfs=ON --enable_abfs=ON --enable_enhanced_features=ON --velox_home=/work
            pushd /work
            git log -n 3
            popd
            ccache -s
            mkdir -p /work/.m2/repository/org/apache/arrow/
            cp -r /root/.m2/repository/org/apache/arrow/* /work/.m2/repository/org/apache/arrow/
          "
      - name: "Save ccache"
        if: always()
        uses: actions/cache/save@v4
        id: ccache
        with:
          path: '${{ env.CCACHE_DIR }}'
          key: ccache-ibm-velox

      - uses: actions/upload-artifact@v4
        with:
          name: velox-native-lib-centos-7-${{github.sha}}
          path: ./gluten/cpp/build/releases/
          if-no-files-found: error
      - uses: actions/upload-artifact@v4
        with:
          name: arrow-jars-centos-7-${{github.sha}}
          path: .m2/repository/org/apache/arrow/
          if-no-files-found: error

  spark-test-spark32:
    needs: gluten-cpp-build
    runs-on: ubuntu-22.04
    container: apache/gluten:centos-8-jdk8
    steps:
      - name: Setup Gluten
        run: |
          git clone --depth 1 https://github.com/apache/incubator-gluten gluten && cd gluten
      - name: Download All Artifacts
        uses: actions/download-artifact@v4
        with:
          name: velox-native-lib-centos-7-${{github.sha}}
          path: ./gluten/cpp/build/releases
      - name: Download Arrow Jars
        uses: actions/download-artifact@v4
        with:
          name: arrow-jars-centos-7-${{github.sha}}
          path: /root/.m2/repository/org/apache/arrow/
      - name: Build package for Spark 3.2
        run: |
          cd $GITHUB_WORKSPACE/gluten
          export SPARK_SCALA_VERSION=2.12
          yum install -y java-17-openjdk-devel
          export JAVA_HOME=/usr/lib/jvm/java-17-openjdk
          export PATH=$JAVA_HOME/bin:$PATH
          java -version
          $MVN_CMD clean package -Pspark-3.2 -Pbackends-velox -Piceberg -Pdelta -Phudi -DskipTests

  spark-test-spark34:
    needs: gluten-cpp-build
    runs-on: ubuntu-22.04
    container: apache/gluten:centos-8-jdk8
    steps:
      - name: Setup Gluten
        run: |
          git clone --depth 1 https://github.com/apache/incubator-gluten gluten && cd gluten
      - name: Download All Artifacts
        uses: actions/download-artifact@v4
        with:
          name: velox-native-lib-centos-7-${{github.sha}}
          path: ./gluten/cpp/build/releases
      - name: Download Arrow Jars
        uses: actions/download-artifact@v4
        with:
          name: arrow-jars-centos-7-${{github.sha}}
          path: /root/.m2/repository/org/apache/arrow/
      - name: Prepare spark.test.home for Spark 3.4.4 (other tests)
        run: |
          dnf module -y install python39 && \
          alternatives --set python3 /usr/bin/python3.9 && \
          pip3 install setuptools==77.0.3 && \
          pip3 install pyspark==3.4.4 cython && \
          pip3 install pandas==2.2.3 pyarrow==20.0.0
      - name: Build and Run unit test for Spark 3.4.4 (other tests)
        run: |
          cd $GITHUB_WORKSPACE/gluten
          export SPARK_SCALA_VERSION=2.12
          yum install -y java-17-openjdk-devel
          export JAVA_HOME=/usr/lib/jvm/java-17-openjdk
          export PATH=$JAVA_HOME/bin:$PATH
          java -version
          export SPARK_HOME=/opt/shims/spark34/spark_home/
          ls -l $SPARK_HOME
          $MVN_CMD clean test -Pspark-3.4 -Pjava-17 -Pbackends-velox -Piceberg -Piceberg-test -Pdelta -Phudi -Pspark-ut \
            -DtagsToExclude=org.apache.spark.tags.ExtendedSQLTest,org.apache.gluten.tags.UDFTest,org.apache.gluten.tags.EnhancedFeaturesTest,org.apache.gluten.tags.SkipTest \
            -DargLine="-Dspark.test.home=$SPARK_HOME ${EXTRA_FLAGS}"
      - name: Upload test report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: ${{ github.job }}-report
          path: '**/surefire-reports/TEST-*.xml'
      - name: Upload unit tests log files
        if: ${{ !success() }}
        uses: actions/upload-artifact@v4
        with:
          name: ${{ github.job }}-test-log
          path: |
            **/target/*.log
            **/gluten-ut/**/hs_err_*.log
            **/gluten-ut/**/core.*
      - name: Upload golden files
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: ${{ github.job }}-golden-files
          path: /tmp/tpch-approved-plan/**

  spark-test-spark34-slow:
    needs: gluten-cpp-build
    runs-on: ubuntu-22.04
    container: apache/gluten:centos-8-jdk8
    steps:
      - name: Setup Gluten
        run: |
          git clone --depth 1 https://github.com/apache/incubator-gluten gluten && cd gluten
      - name: Download All Artifacts
        uses: actions/download-artifact@v4
        with:
          name: velox-native-lib-centos-7-${{github.sha}}
          path: ./gluten/cpp/build/releases
      - name: Download Arrow Jars
        uses: actions/download-artifact@v4
        with:
          name: arrow-jars-centos-7-${{github.sha}}
          path: /root/.m2/repository/org/apache/arrow/
      - name: Build and Run unit test for Spark 3.4.4 (slow tests)
        run: |
          cd $GITHUB_WORKSPACE/gluten
          yum install -y java-17-openjdk-devel
          export JAVA_HOME=/usr/lib/jvm/java-17-openjdk
          export PATH=$JAVA_HOME/bin:$PATH
          java -version
          export SPARK_HOME=/opt/shims/spark34/spark_home/
          ls -l $SPARK_HOME
          $MVN_CMD clean test -Pspark-3.4 -Pjava-17 -Pbackends-velox -Piceberg -Pdelta -Pspark-ut -Phudi \
            -DtagsToInclude=org.apache.spark.tags.ExtendedSQLTest \
            -DargLine="-Dspark.test.home=$SPARK_HOME ${EXTRA_FLAGS}"
      - name: Upload test report
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: ${{ github.job }}-report
          path: '**/surefire-reports/TEST-*.xml'
      - name: Upload unit tests log files
        if: ${{ !success() }}
        uses: actions/upload-artifact@v4
        with:
          name: ${{ github.job }}-test-log
          path: |
            **/target/*.log
            **/gluten-ut/**/hs_err_*.log
            **/gluten-ut/**/core.*
15 changes: 0 additions & 15 deletions .github/workflows/linux-build.yml
@@ -15,21 +15,6 @@
 name: Linux Build using GCC

 on:
-  push:
-    branches:
-      - main
-    paths:
-      - velox/**
-      - '!velox/docs/**'
-      - CMakeLists.txt
-      - CMake/**
-      - scripts/setup-ubuntu.sh
-      - scripts/setup-common.sh
-      - scripts/setup-versions.sh
-      - scripts/setup-helper-functions.sh
-      - .github/workflows/linux-build.yml
-      - .github/workflows/linux-build-base.yml
-
   pull_request:
     paths:
       - velox/**
4 changes: 2 additions & 2 deletions velox/common/base/AsyncSource.h
@@ -121,7 +121,7 @@ class AsyncSource {
     return nullptr;
   }
   if (making_) {
-    promise_ = std::make_unique<ContinuePromise>();
+    promise_ = std::make_unique<ContinuePromise>("AsyncSource::move");
     wait = promise_->getSemiFuture();
   } else {
     if (!make_) {
@@ -178,7 +178,7 @@ class AsyncSource {
 {
   std::lock_guard<std::mutex> l(mutex_);
   if (making_) {
-    promise_ = std::make_unique<ContinuePromise>();
+    promise_ = std::make_unique<ContinuePromise>("AsyncSource::close");
     wait = promise_->getSemiFuture();
   } else if (make_) {
     make_ = nullptr;
2 changes: 1 addition & 1 deletion velox/common/caching/AsyncDataCache.h
@@ -445,7 +445,7 @@ class CoalescedLoad {
     return state_;
   }

-  void cancel() {
+  virtual void cancel() {
     setEndState(State::kCancelled);
   }

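Making `CoalescedLoad::cancel()` virtual lets a subclass add its own cleanup while callers keep cancelling through the base type. The sketch below illustrates that dispatch only. `Load`, `PrefetchLoad`, `pendingBuffers_` and `cancelAll` are hypothetical stand-ins, not Velox types, and the diff shown here does not say which subclass overrides `cancel()`.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a cancellable load; not Velox's CoalescedLoad.
class Load {
 public:
  enum class State { kPlanned, kCancelled };

  virtual ~Load() = default;

  // Virtual so a call through a base pointer reaches the subclass override.
  virtual void cancel() {
    state_ = State::kCancelled;
  }

  State state() const {
    return state_;
  }

 private:
  State state_{State::kPlanned};
};

// Hypothetical subclass that owns extra resources to release on cancel.
class PrefetchLoad : public Load {
 public:
  explicit PrefetchLoad(std::size_t numBuffers)
      : pendingBuffers_(numBuffers, std::string(16, 'x')) {}

  void cancel() override {
    pendingBuffers_.clear(); // Subclass-specific cleanup.
    Load::cancel(); // Then record the end state, as the base class does.
  }

  std::size_t numPending() const {
    return pendingBuffers_.size();
  }

 private:
  std::vector<std::string> pendingBuffers_;
};

// Callers only know the base type.
inline void cancelAll(std::vector<std::unique_ptr<Load>>& loads) {
  for (auto& load : loads) {
    load->cancel();
  }
}
```

Without `virtual`, `cancelAll` would run only `Load::cancel()` and the subclass buffers would stay allocated until destruction.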
19 changes: 11 additions & 8 deletions velox/common/future/VeloxPromise.h
@@ -28,12 +28,15 @@ class VeloxPromise : public folly::Promise<T> {
   VeloxPromise() : folly::Promise<T>() {}

   explicit VeloxPromise(const std::string& context)
-      : folly::Promise<T>(), context_(context) {}
+      : folly::Promise<T>(), context_(context) {
+    if (context.empty()) {
+      LOG(WARNING)
+          << "PROMISE: VeloxPromise must be constructed with a context.";
+    }
+  }

-  VeloxPromise(
-      folly::futures::detail::EmptyConstruct,
-      const std::string& context) noexcept
-      : folly::Promise<T>(folly::Promise<T>::makeEmpty()), context_(context) {}
+  explicit VeloxPromise(folly::futures::detail::EmptyConstruct) noexcept
+      : folly::Promise<T>(folly::Promise<T>::makeEmpty()) {}

   ~VeloxPromise() {
     if (!this->isFulfilled()) {
@@ -52,8 +55,8 @@ class VeloxPromise : public folly::Promise<T> {
     return *this;
   }

-  static VeloxPromise makeEmpty(const std::string& context = "") noexcept {
-    return VeloxPromise<T>(folly::futures::detail::EmptyConstruct{}, context);
+  static VeloxPromise makeEmpty() noexcept {
+    return VeloxPromise<T>(folly::futures::detail::EmptyConstruct{});
   }

  private:
@@ -72,7 +75,7 @@ using ContinueFuture = folly::SemiFuture<folly::Unit>;
 /// exception throwing and stack unwinding thus performance issue. See
 /// https://github.com/prestodb/presto/issues/26094 for details.
 static inline std::pair<ContinuePromise, ContinueFuture>
-makeVeloxContinuePromiseContract(const std::string& promiseContext = "") {
+makeVeloxContinuePromiseContract(const std::string& promiseContext) {
   auto p = ContinuePromise(promiseContext);
   auto f = p.getSemiFuture();
   return std::make_pair(std::move(p), std::move(f));
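The VeloxPromise.h change makes the context string mandatory at the contract helper and warns when it is empty. A minimal, std-only sketch of that idea follows. It is not the folly-based implementation: `SketchPromise`, `SketchFuture`, `makeContract` and `warningLog` are hypothetical names, the class is single-threaded, warnings go to a vector instead of `LOG(WARNING)`, and the destructor's report of an unfulfilled promise is an assumption, since the destructor body is elided in the diff.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Observable sink for warnings; the diff uses LOG(WARNING) instead.
inline std::vector<std::string>& warningLog() {
  static std::vector<std::string> log;
  return log;
}

// Single-threaded stand-in for a future: it only reports readiness.
class SketchFuture {
 public:
  explicit SketchFuture(std::shared_ptr<bool> ready)
      : ready_(std::move(ready)) {}

  bool isReady() const {
    return *ready_;
  }

 private:
  std::shared_ptr<bool> ready_;
};

// Hypothetical promise that must be labeled with a context string.
class SketchPromise {
 public:
  explicit SketchPromise(const std::string& context)
      : ready_(std::make_shared<bool>(false)), context_(context) {
    if (context.empty()) {
      warningLog().push_back("promise constructed without a context");
    }
  }

  // Moving transfers the obligation; the moved-from object owes nothing.
  SketchPromise(SketchPromise&& other) noexcept
      : ready_(std::move(other.ready_)), context_(std::move(other.context_)) {}

  ~SketchPromise() {
    // Assumption: an unfulfilled promise is reported together with its context.
    if (ready_ != nullptr && !*ready_) {
      warningLog().push_back("unfulfilled promise: " + context_);
    }
  }

  SketchFuture getFuture() const {
    return SketchFuture(ready_);
  }

  void setValue() {
    *ready_ = true;
  }

 private:
  std::shared_ptr<bool> ready_;
  std::string context_;
};

// No default argument: every call site has to name its context.
inline std::pair<SketchPromise, SketchFuture> makeContract(
    const std::string& context) {
  SketchPromise promise(context);
  SketchFuture future = promise.getFuture();
  return std::make_pair(std::move(promise), std::move(future));
}
```

Dropping the default argument turns a missing label into a compile error at the call site, which is why call sites such as `AsyncSource` now pass strings like `"AsyncSource::move"`.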
10 changes: 7 additions & 3 deletions velox/common/memory/SharedArbitrator.cpp
@@ -897,13 +897,16 @@ void SharedArbitrator::growCapacity(ArbitrationOperation& op) {
   RETURN_IF_TRUE(maybeGrowFromSelf(op));

   if (!ensureCapacity(op)) {
+    const auto maxCapacity = op.participant()->maxCapacity();
     MEM_POOL_CAP_EXCEEDED(
         fmt::format(
-            "Can't grow {} capacity with {}. This will exceed its max capacity "
+            "Can't grow {} capacity with {}. This will exceed its {} "
             "{}, current capacity {}.",
             op.participant()->name(),
             succinctBytes(op.requestBytes()),
-            succinctBytes(op.participant()->maxCapacity()),
+            capacity_ < maxCapacity ? "arbitrator capacity"
+                                    : "memory pool capacity",
+            succinctBytes(std::min(capacity_, maxCapacity)),
             succinctBytes(op.participant()->capacity())),
         op.participant()->pool());
   }
@@ -1401,7 +1404,6 @@ uint64_t SharedArbitrator::reclaim(
   if (participant->aborted()) {
     removeGlobalArbitrationWaiter(participant->id());
   }
-  freeCapacity(reclaimedBytes);

   updateMemoryReclaimStats(
       reclaimedBytes, reclaimTimeNs, localArbitration, stats);
@@ -1413,6 +1415,8 @@ uint64_t SharedArbitrator::reclaim(
       << " stats " << succinctBytes(stats.reclaimedBytes)
       << " numNonReclaimableAttempts "
       << stats.numNonReclaimableAttempts;
+
+  freeCapacity(reclaimedBytes);
   if (reclaimedBytes == 0) {
     FB_LOG_EVERY_MS(WARNING, 1'000) << fmt::format(
         "Nothing reclaimed from memory pool {} with reclaim target {}, memory pool stats:\n{}\n{}",
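The `growCapacity` hunk in SharedArbitrator.cpp changes the error message so that it names whichever limit is actually binding: the arbitrator's total capacity or the pool's own max capacity. The selection logic can be sketched in isolation as below. `CapacityInfo`, `BindingLimit`, `bindingLimit` and `capExceededMessage` are hypothetical names, and the sketch prints raw byte counts where the real code uses `succinctBytes`.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the two limits growCapacity() compares.
struct CapacityInfo {
  uint64_t arbitratorCapacity; // Plays the role of capacity_.
  uint64_t poolMaxCapacity; // Plays the role of participant()->maxCapacity().
};

struct BindingLimit {
  std::string name;
  uint64_t bytes;
};

// Mirrors the ternary and std::min in the diff: the arbitrator capacity is
// named only when it is strictly smaller than the pool's max capacity.
inline BindingLimit bindingLimit(const CapacityInfo& info) {
  const bool arbitratorBinds = info.arbitratorCapacity < info.poolMaxCapacity;
  return {
      arbitratorBinds ? "arbitrator capacity" : "memory pool capacity",
      std::min(info.arbitratorCapacity, info.poolMaxCapacity)};
}

// Builds a message in the same shape as the one in the diff, with raw bytes.
inline std::string capExceededMessage(
    const std::string& poolName,
    uint64_t requestBytes,
    const CapacityInfo& info,
    uint64_t currentBytes) {
  const BindingLimit limit = bindingLimit(info);
  return "Can't grow " + poolName + " capacity with " +
      std::to_string(requestBytes) + "B. This will exceed its " + limit.name +
      " " + std::to_string(limit.bytes) + "B, current capacity " +
      std::to_string(currentBytes) + "B.";
}
```

With a 6 GB arbitrator and a 5 MB pool limit, as in `MemoryCapExceededTest`, the pool limit is the smaller one, so the message names the memory pool capacity. When both limits are equal, the strict `<` also falls through to the memory pool wording.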
2 changes: 1 addition & 1 deletion velox/common/memory/tests/MemoryCapExceededTest.cpp
@@ -69,7 +69,7 @@ TEST_P(MemoryCapExceededTest, singleDriver) {
   // why).
   std::vector<std::string> expectedTexts = {
       "Can't grow ",
-      "capacity with 2.00MB. This will exceed its max capacity 5.00MB, current "
+      "capacity with 2.00MB. This will exceed its memory pool capacity 5.00MB, current "
       "capacity 5.00MB.\n"
       "ARBITRATOR[SHARED CAPACITY[6.00GB] STATS[numRequests 1 numRunning 1 "
       "numSucceded 0 numAborted 0 numFailures 0 numNonReclaimableAttempts 0 "
3 changes: 2 additions & 1 deletion velox/common/memory/tests/SharedArbitratorTest.cpp
@@ -760,7 +760,7 @@ DEBUG_ONLY_TEST_P(
   folly::EventCount taskPauseWait;
   auto taskPauseWaitKey = taskPauseWait.prepareWait();

-  const auto fakeAllocationSize = kMemoryCapacity - (32L << 20);
+  const auto fakeAllocationSize = kMemoryCapacity - (2L << 20);

   std::atomic<bool> injectAllocationOnce{true};
   fakeOperatorFactory_->setAllocationCallback([&](Operator* op) {
@@ -1379,6 +1379,7 @@ TEST_P(
   if (e.errorCode() != error_code::kMemCapExceeded.c_str() &&
       e.errorCode() != error_code::kMemAborted.c_str() &&
       e.errorCode() != error_code::kMemAllocError.c_str() &&
+      e.errorCode() != error_code::kMemArbitrationTimeout.c_str() &&
       (e.message() != "Aborted for external error")) {
     std::rethrow_exception(std::current_exception());
   }