Draft
184 commits
25209fb
wip
taniabogatsch Jan 20, 2025
cd9b27a
some tidying
taniabogatsch Jan 21, 2025
6068d57
Merge branch 'v1.2-histrionicus' into reclaim-storage
taniabogatsch Jan 21, 2025
62d6d87
more reclaim space fixes
taniabogatsch Jan 21, 2025
cbfdb45
removing written blocks and moving the optimistic row group collectio…
taniabogatsch Jan 22, 2025
00474d4
Merge branch 'v1.2-histrionicus' into reclaim-storage
taniabogatsch Jan 23, 2025
6a807b8
implemented the filter
Tmonster Jan 23, 2025
b335726
added one test
Tmonster Jan 23, 2025
63b53f0
some refactoring to prepare for batch insert
taniabogatsch Jan 24, 2025
9203af5
use local table storage for batch insert
taniabogatsch Jan 25, 2025
ae5e494
turn into reference
taniabogatsch Jan 25, 2025
9ebf87d
fix broken test
Tmonster Jan 27, 2025
2a2d359
generate metric enum
Tmonster Jan 27, 2025
4f17c52
add missing includes
Tmonster Jan 27, 2025
f024e1d
tidying some stuff up
taniabogatsch Jan 27, 2025
82c2b89
Merge branch 'v1.2-histrionicus' into reclaim-storage
taniabogatsch Jan 27, 2025
3ff8737
this aligns the behavior of varchar->list with that of varchar->struct
Tishj Jan 27, 2025
edd2747
WIP: support for escaping in string -> list/struct cast, struct isn't…
Tishj Feb 5, 2025
f7503c6
Merge branch 'v1.2-histrionicus' into reclaim-storage
taniabogatsch Feb 5, 2025
49558e9
removing unescaped quotes, perhaps a little too aggressively, still WIP
Tishj Feb 5, 2025
8866325
leave the escapes in deeper list levels alone
Tishj Feb 5, 2025
595e4a8
add escaped doublequote to test
Tishj Feb 6, 2025
bb5ca2e
more WIP, worked on supporting the same escaping in MAP
Tishj Feb 6, 2025
8de6cd5
map, struct and list should all work correctly now
Tishj Feb 6, 2025
9b9df84
messed up one piece of escape handling logic
Tishj Feb 6, 2025
fe5a76f
Do duckdb_extract_statements to be able to execute pivot
pdet Feb 10, 2025
1dbbb6c
Issue #8265: AsOf Nested Loop
Feb 10, 2025
5d8434a
Issue #8265: AsOf Nested Loop
Feb 10, 2025
110808f
allow escaping whitespace
Tishj Feb 10, 2025
a5305f5
better way of dealing with escaped spaces
Tishj Feb 11, 2025
f65c097
yet another case of missing backslash escape logic
Tishj Feb 11, 2025
602fa82
don't trim escaped backslashes at the end of the input
Tishj Feb 11, 2025
80a8c37
whis is this leaking
pdet Feb 11, 2025
fb59f61
add support for unnamed struct format to VARCHAR->STRUCT cast
Tishj Feb 11, 2025
1748b07
need to recognize ( as a scope as well
Tishj Feb 11, 2025
1da281b
add another test, with escaped leading and trailing spaces, and escap…
Tishj Feb 11, 2025
8252d4d
some more nesting
Tishj Feb 11, 2025
d2a2d3d
adjust tests
Tishj Feb 11, 2025
a5173d3
fix tidy issues
Tishj Feb 11, 2025
50613b6
fix up test
Tishj Feb 12, 2025
384f5f0
moved the escaped case to the start of the cases, reduces complexity …
Tishj Feb 12, 2025
26e8d22
give varchar->struct the same treatment, escaped case should be on top
Tishj Feb 12, 2025
12e51ba
Ensure MergeCollectionTask has a writer
ywelsch Feb 12, 2025
e78d96e
and varchar->map as well
Tishj Feb 12, 2025
9bdf2d5
same for unnamed structs
Tishj Feb 12, 2025
6a25a90
I think I should use duckdb_destroy_extracted instead of delete
pdet Feb 12, 2025
3edeac6
turn off RESPECT_SCOPES for struct keys, so escapes are interpreted e…
Tishj Feb 12, 2025
da9ab90
more tests
Tishj Feb 12, 2025
ba1cb2e
add escape tests for maps, also fix a bug: map keys are allowed to be…
Tishj Feb 12, 2025
89eccf1
Woopsie
pdet Feb 12, 2025
f1179bf
fix unused variable
Tishj Feb 12, 2025
c640ee1
improve performance of hashing longer strings
Feb 12, 2025
f503fba
implement bit-packing for RleBpEncoder and allow for larger dictionaries
Feb 12, 2025
c8e5916
init primitive dictionary
Feb 12, 2025
9b8fff6
Ensure MergeCollectionTask has a writer (#16207)
Mytherin Feb 12, 2025
1e5d01d
support unnamed structs to appear in the other casts (MAP KEY+VALUE, …
Tishj Feb 12, 2025
7642be6
Issue #8265: AsOf Nested Loop
Feb 12, 2025
8629291
Also use destroy_statement
pdet Feb 12, 2025
5649f91
Issue #8265: AsOf Nested Loop
hawkfish Feb 12, 2025
de5a838
Issue #8265: AsOf Nested Loop
hawkfish Feb 13, 2025
e136bc7
integrate PrimitiveDictionary into Parquet writer and improve writing…
Feb 13, 2025
2917298
more destroy
pdet Feb 13, 2025
61be508
simplify SkipToClose
Tishj Feb 13, 2025
3f2430c
trigger a cast to unnamed struct
Tishj Feb 13, 2025
c674964
no need for a 'lvl' list nesting tracker
Tishj Feb 13, 2025
7de75f6
remove temp_state, IsNull just needs a buf, start and end
Tishj Feb 13, 2025
9ca9ca9
use random seeds for bernoulli sample when parallel is enabled
Tmonster Feb 13, 2025
981b4c2
deduplicate some logic
Tishj Feb 13, 2025
90ff46c
use the same function in the list value
Tishj Feb 13, 2025
201ba3f
also use the same function in struct value
Tishj Feb 13, 2025
a460765
also use the same function in unnamed struct cast
Tishj Feb 13, 2025
a6cc094
improve cast error message for VARCHAR -> nested type
Tishj Feb 13, 2025
591f090
make string_dictionary_page_size_limit for Parquet writer configurable
Feb 13, 2025
5d79f5d
some more fast paths/optimizations for parquet writer
Feb 13, 2025
503d0c0
add another fast path
Feb 13, 2025
301cc55
WIP generalize rowid column to "virtual columns", and make "filename"…
Mytherin Feb 13, 2025
b3a67cd
Backport #16115
Mytherin Feb 11, 2025
32a0f49
Move get_virtual_columns to a separate table function instead of tryi…
Mytherin Feb 13, 2025
cd514e9
format-fix
Tmonster Feb 13, 2025
d2cec3e
Backport #16115 (#16227)
Mytherin Feb 13, 2025
6549a0e
Correctly deal with and propagate virtual columns
Mytherin Feb 13, 2025
2a75b22
Issue #8265: AsOf Nested Loop
hawkfish Feb 13, 2025
b6e8b33
Fix building Duckdb on Windows with MSVC 2022. _win32 is the correct …
cfis Feb 14, 2025
012b693
Building the Python bindings on Windows fails with MSVC and having Py…
cfis Feb 14, 2025
535fe5a
-std=c++11 is invalid with MSVC. It it is set correctly here - https:…
cfis Feb 14, 2025
057c2d4
Various fixes for virtual columns <> MultiFileReader interaction
Mytherin Feb 14, 2025
94e5002
use cast operator for src/target types in primitive dictionary, and a…
Feb 14, 2025
19ca17f
Add EMPTY column that can be used for COUNT(*) - but not queried - an…
Mytherin Feb 14, 2025
fad8546
take vectors larger than standard into account
Feb 14, 2025
8cb3c56
Support virtual columns in the CSV reader
Mytherin Feb 14, 2025
4b827f0
Format fix
Mytherin Feb 14, 2025
1990d37
take parent nulls into account in fast path writing define levels
Feb 14, 2025
09a5da2
prefer allocator over unique array
Feb 14, 2025
e14c6ac
Add missing includes
Mytherin Feb 14, 2025
4a7d440
Fix for statistics propagation in Parquet for virtual columns
Mytherin Feb 14, 2025
8cb2b66
cast and directly write to memorystream in primitive dictionary so we…
Feb 14, 2025
32e1058
all parquet tests working again
Feb 14, 2025
63f15c8
Use _win32 with MSVC (#16235)
Mytherin Feb 14, 2025
deefb6f
Fix Python 3 executable name on Windows (#16236)
Mytherin Feb 14, 2025
3148218
Fix for filename on windows
Mytherin Feb 14, 2025
5f27683
Deleted copy constructor of pending query
NiclasHaderer Feb 14, 2025
20a0961
format fix
NiclasHaderer Feb 14, 2025
efc5413
slightly tweak hash function
Feb 14, 2025
245f034
add clickbench write benchmark
Feb 14, 2025
d99ceb6
backslashes only escape double/single quotes outside of quotes, insid…
Tishj Feb 14, 2025
d0fbc86
arena allocator for minmaxn and just skip nulls when creating enum
Feb 14, 2025
4a20971
codequality fixes and buffer-manage parquet columndatacollections
Feb 14, 2025
2951ea0
Fix -std=c++11 (#16237)
Mytherin Feb 14, 2025
7d90767
Issue #8265: AsOf Nested Loop (#16218)
Mytherin Feb 14, 2025
0682cec
Include extension_util.hpp in libduckdb
mlafeldt Feb 15, 2025
dfdd6a5
Deleted copy constructor of pending query (#16242)
Mytherin Feb 16, 2025
28c95be
Include extension_util.hpp in libduckdb (#16255)
Mytherin Feb 16, 2025
141c449
Report errors caused by get_database in C extensions
mlafeldt Feb 15, 2025
a46237b
Simplify SetError
mlafeldt Feb 15, 2025
2b87326
Correctly report errors caused by get_database in C extensions (#16253)
Mytherin Feb 16, 2025
381f75e
Issue #16250: Window Range Performance
hawkfish Feb 17, 2025
5ac9f9e
format/test fixes for parquet writer
Feb 17, 2025
3c90da4
whenever seed is set, parallel sink is false
Tmonster Feb 17, 2025
7ab1893
Generalize `rowid` into the concept of virtual columns, and make `fil…
Mytherin Feb 17, 2025
63502f1
Merge branch 'v1.2-histrionicus' into reclaim-storage
taniabogatsch Feb 17, 2025
0483d90
merge resolution
taniabogatsch Feb 17, 2025
694ad70
some ci fixes
Feb 17, 2025
7b9d464
Modify histogram test to statement ok since the test can be inconsist…
Mytherin Feb 17, 2025
bf1d472
Check avg count
Mytherin Feb 17, 2025
8651a48
move optimistic writers
taniabogatsch Feb 17, 2025
3136585
Execute does not like a dirty validity mask, use vector caches (throu…
Tishj Feb 17, 2025
4b0167d
Merge branch 'main' into parquet_stuff
Feb 17, 2025
ee5cc90
change result order now that string hash has changed
Feb 17, 2025
b27267e
Avoid caching the compressed buffer in the ColumnReader
Mytherin Feb 17, 2025
b21d19b
improve performance of boolean column writer too
Feb 17, 2025
7d720df
Fix #16260: correctly handle parameters in getvariable
Mytherin Feb 17, 2025
9c3cd8a
Handle macros as well
Mytherin Feb 17, 2025
52811a9
use random seeds for bernoulli sample when parallel is enabled (#16223)
Mytherin Feb 17, 2025
ba6fe78
change extension install mode to not_installed instead of null
samansmink Feb 17, 2025
80fa4cd
add the correct variant of the flag based on the compiler (MSVC or not)
Tishj Feb 17, 2025
7c3296f
add test
samansmink Feb 17, 2025
f066290
Avoid calling SetFilterAlwaysTrue multiple times in RowGroup::CheckZo…
Mytherin Feb 17, 2025
6637b90
Add safeguard to SetFilterAlwaysTrue
Mytherin Feb 17, 2025
fe56c8f
fix scanning from normal leaf to nested leaf
taniabogatsch Feb 17, 2025
1d06c91
Fix #16231: refer to order by condition in ARRAY(SUBQUERY) by alias i…
Mytherin Feb 17, 2025
7dec52e
add pragma to truncate log
samansmink Feb 17, 2025
fc58d8b
increase max variation for linux
taniabogatsch Feb 17, 2025
16f1151
fix #16257
Feb 17, 2025
9dfc067
Modify histogram test to more fuzzily check boundaries since the test…
Mytherin Feb 17, 2025
73e15e8
even faster boolean writing
Feb 17, 2025
81bd903
Adding fuzzer tests
pdet Feb 10, 2025
9c74515
Fix for buffer_size=1 in encoding, and check for conflicts between ma…
pdet Feb 10, 2025
7cba0a9
Adjust a couple more tests
pdet Feb 10, 2025
fb12980
More tests and fixes
pdet Feb 12, 2025
b2ac033
Add hangs
pdet Feb 12, 2025
0110b24
This needs to be slightly bigger for windows
pdet Feb 12, 2025
28fd731
Format
pdet Feb 12, 2025
a04d906
Verify that the table names are valid
pdet Feb 12, 2025
36538e6
Drastrically minimize the number of tests
pdet Feb 17, 2025
abc3a3e
[Dev] Fix issue in `TRY` expression with `dictionary_expression` Vect…
Mytherin Feb 17, 2025
c27c052
Parquet Reader: avoid caching the compressed buffer in the ColumnRead…
Mytherin Feb 17, 2025
df68ebd
Fix #16260: correctly handle parameters in getvariable (#16264)
Mytherin Feb 17, 2025
3942e67
[Python Dev] Add the correct variant of the `-std=c++11` flag based o…
Mytherin Feb 17, 2025
5adf302
Fix extension install mode null (#16268)
Mytherin Feb 17, 2025
2ba2b65
Avoid calling SetFilterAlwaysTrue multiple times in RowGroup::CheckZo…
Mytherin Feb 17, 2025
4c8d1f9
[Fix] Scanning from normal leaf to nested leaf (#16270)
Mytherin Feb 17, 2025
cfafb10
Fix #16231: refer to order by condition in ARRAY(SUBQUERY) by alias i…
Mytherin Feb 17, 2025
7e72364
Better comment in optimizer.cpp
JasonPunyon Feb 1, 2025
aadd543
Issue #16250: Window Range Performance
hawkfish Feb 18, 2025
d1062f1
Fix #16257 (#16275)
Mytherin Feb 18, 2025
e1d1131
AFL Tests for the CSV reader (#16280)
Mytherin Feb 18, 2025
8605f80
Improve Parquet writer performance (#16243)
Mytherin Feb 18, 2025
aa82eb9
change string hash function again, now inlined strings are hashed bra…
Feb 18, 2025
42859d2
Issue #16250: Window Range Performance (#16276)
Mytherin Feb 18, 2025
12e96f6
some more fast paths
Feb 18, 2025
5c0fc80
Merge branch 'v1.2-histrionicus'
Mytherin Feb 18, 2025
e2ce2ca
Merge branch 'main' into parquet_stuff
Feb 18, 2025
b2bf617
fix hash function for empty strings and fix test output now that hash…
Feb 18, 2025
85bb66f
Merge branch 'main' into arena_allocate_minmaxn
Feb 18, 2025
138639a
Merge v1.2-histrionicus into main (#16284)
Mytherin Feb 18, 2025
afe41f7
Many reclaim storage fixes (#15825)
Mytherin Feb 18, 2025
53edbbc
Arena allocator for `MinMaxN` and skip `NULL`s when creating enum (#1…
Mytherin Feb 18, 2025
6682a52
make ValidityMask::RowIsValidUnsafe really unsafe
xuke-hat Feb 18, 2025
7f22b1e
Add pragma to truncate duckdb log storage (#16274)
Mytherin Feb 18, 2025
219bafa
Some more Parquet writer performance improvements (#16287)
Mytherin Feb 18, 2025
0876e92
Do duckdb_extract_statements to be able to execute pivot in ADBC (#16…
Mytherin Feb 18, 2025
1e8caf8
[Dev] Improve/Add handling of escapes in VARCHAR -> list/struct/map a…
Mytherin Feb 18, 2025
e249a40
make ValidityMask::RowIsValidUnsafe really unsafe (#16302)
Mytherin Feb 18, 2025
b929ea7
Merge branch 'main' into optimizer_remove_unnecessary_projections
Tmonster Feb 19, 2025
17 changes: 13 additions & 4 deletions CMakeLists.txt
@@ -193,7 +193,7 @@
else()
if(APPLE AND CMAKE_SYSTEM_PROCESSOR MATCHES "arm64")
if("${CMAKE_CXX_COMPILER_VERSION}" VERSION_GREATER 14.0)
message(
WARNING
"Not disabling vptr sanitizer on M1 Macbook - set DISABLE_VPTR_SANITIZER manually if you run into issues with false positives in the sanitizer"
)
@@ -203,7 +203,7 @@
endif()
endif()

if(${ENABLE_UBSAN})
if(${ENABLE_THREAD_SANITIZER})
message(
WARNING
@@ -1065,7 +1065,7 @@
macro(register_external_extension NAME URL COMMIT DONT_LINK DONT_BUILD LOAD_TESTS PATH INCLUDE_PATH TEST_PATH APPLY_PATCHES LINKED_LIBS SUBMODULES EXTENSION_VERSION)
include(FetchContent)
if (${APPLY_PATCHES})
-set(PATCH_COMMAND python3 ${CMAKE_SOURCE_DIR}/scripts/apply_extension_patches.py ${CMAKE_SOURCE_DIR}/.github/patches/extensions/${NAME}/)
+set(PATCH_COMMAND ${Python3_EXECUTABLE} ${CMAKE_SOURCE_DIR}/scripts/apply_extension_patches.py ${CMAKE_SOURCE_DIR}/.github/patches/extensions/${NAME}/)
endif()
FETCHCONTENT_DECLARE(
${NAME}_extension_fc
@@ -1389,7 +1389,7 @@

add_custom_target(
duckdb_merge_vcpkg_manifests ALL
-COMMAND python3 scripts/merge_vcpkg_deps.py ${VCPKG_PATHS} ${EXT_NAMES}
+COMMAND ${Python3_EXECUTABLE} scripts/merge_vcpkg_deps.py ${VCPKG_PATHS} ${EXT_NAMES}
WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}
COMMENT Generates a shared vcpkg manifest from the individual extensions)
string(REPLACE ";" ", " VCPKG_NAMES_COMMAS "${VCPKG_NAMES}")
@@ -1420,6 +1420,15 @@
set(ALL_COMPILE_FLAGS "${CMAKE_CXX_FLAGS}")
endif()

+# Check for MSVC compiler and set the correct C++ standard flag
+if(MSVC)
+# MSVC does not support `-std=c++11` or `-std=c++14`, use `/std:c++14`
+set(ALL_COMPILE_FLAGS "${ALL_COMPILE_FLAGS} /std:c++14")
+else()
+# For non-MSVC compilers, use the `-std=c++11`
+set(ALL_COMPILE_FLAGS "${ALL_COMPILE_FLAGS} -std=c++11")
+endif()
+
get_target_property(duckdb_libs duckdb LINK_LIBRARIES)

set(PIP_COMMAND
@@ -1432,9 +1441,9 @@
)

if(PYTHON_EDITABLE_BUILD)
-set(PIP_COMMAND ${PIP_COMMAND} python3 -m pip install --editable .)
+set(PIP_COMMAND ${PIP_COMMAND} ${Python3_EXECUTABLE} -m pip install --editable .)
else()
-set(PIP_COMMAND ${PIP_COMMAND} python3 -m pip install .)
+set(PIP_COMMAND ${PIP_COMMAND} ${Python3_EXECUTABLE} -m pip install .)
endif()

if(USER_SPACE)
23 changes: 23 additions & 0 deletions benchmark/parquet/clickbench_write.benchmark
@@ -0,0 +1,23 @@
# name: benchmark/parquet/clickbench_write.benchmark
# description: Write ClickBench data to Parquet
# group: [parquet]

require httpfs

require parquet

name ClickBench Write Parquet
group Clickbench

cache clickbench.duckdb

load benchmark/clickbench/queries/load.sql

init
set preserve_insertion_order=false;

run
COPY hits TO '${BENCHMARK_DIR}/hits.parquet';

result I
10000000
Binary file added data/csv/afl/20250211_csv_fuzz_crash/case_53.csv
Binary file not shown.
Binary file added data/csv/afl/4172/case_4.csv
Binary file not shown.
4 changes: 4 additions & 0 deletions data/csv/flights.csv
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
FlightDate|UniqueCarrier|OriginCityName|DestCityName
1988-01-01|AA|New York, NY|Los Angeles, CA
1988-01-02|AA|New York, NY|Los Angeles, CA
1988-01-03|AA|New York, NY|Los Angeles, CA
@@ -545,8 +545,8 @@ class ArgMinMaxNState {
BinaryAggregateHeap<K, V, COMPARATOR> heap;

bool is_initialized = false;
-void Initialize(idx_t nval) {
-heap.Initialize(nval);
+void Initialize(ArenaAllocator &allocator, idx_t nval) {
+heap.Initialize(allocator, nval);
is_initialized = true;
}
};
@@ -601,7 +601,7 @@ static void ArgMinMaxNUpdate(Vector inputs[], AggregateInputData &aggr_input, id
if (nval >= MAX_N) {
throw InvalidInputException("Invalid input for arg_min/arg_max: n value must be < %d", MAX_N);
}
-state.Initialize(UnsafeNumericCast<idx_t>(nval));
+state.Initialize(aggr_input.allocator, UnsafeNumericCast<idx_t>(nval));
}

// Now add the input to the heap
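The two hunks above thread an allocator from the aggregate input into the state's `Initialize`, so the heap storage no longer has to be owned by the state itself. A minimal sketch of that pattern follows, assuming a simple bump allocator. `SimpleArena` and `SmallestNState` are hypothetical names for illustration and are not DuckDB's `ArenaAllocator` or `BinaryAggregateHeap`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical bump allocator: carves allocations out of large chunks and
// releases everything at once when the arena is destroyed.
class SimpleArena {
public:
	explicit SimpleArena(std::size_t chunk_size = 4096) : chunk_size_(chunk_size) {
	}

	void *Allocate(std::size_t bytes) {
		// Round up so every returned pointer stays suitably aligned.
		const std::size_t align = alignof(std::max_align_t);
		bytes = (bytes + align - 1) / align * align;
		if (chunks_.empty() || used_ + bytes > capacity_) {
			capacity_ = std::max(bytes, chunk_size_);
			chunks_.emplace_back(new unsigned char[capacity_]);
			used_ = 0;
		}
		void *result = chunks_.back().get() + used_;
		used_ += bytes;
		return result;
	}

	std::size_t ChunkCount() const {
		return chunks_.size();
	}

private:
	std::size_t chunk_size_;
	std::size_t capacity_ = 0;
	std::size_t used_ = 0;
	std::vector<std::unique_ptr<unsigned char[]>> chunks_;
};

// Hypothetical aggregate state that keeps the N smallest values seen so far.
// It owns no memory: the storage comes from the arena passed to Initialize.
struct SmallestNState {
	std::int64_t *values = nullptr;
	std::size_t capacity = 0;
	std::size_t size = 0;
	bool is_initialized = false;

	void Initialize(SimpleArena &arena, std::size_t n) {
		values = static_cast<std::int64_t *>(arena.Allocate(n * sizeof(std::int64_t)));
		capacity = n;
		is_initialized = true;
	}

	void Insert(std::int64_t v) {
		assert(is_initialized);
		if (size < capacity) {
			values[size++] = v;
			std::push_heap(values, values + size); // max-heap: largest kept value on top
		} else if (capacity > 0 && v < values[0]) {
			// Replace the current largest kept value with the smaller newcomer.
			std::pop_heap(values, values + size);
			values[size - 1] = v;
			std::push_heap(values, values + size);
		}
	}
};
```

In this sketch the state needs no destructor, because the arena frees all storage when it goes out of scope. Whether that is the motivation for the change in this PR is not stated in the diff itself.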
1 change: 1 addition & 0 deletions extension/core_functions/scalar/generic/least.cpp
@@ -2,6 +2,7 @@
#include "core_functions/scalar/generic_functions.hpp"
#include "duckdb/function/create_sort_key.hpp"
#include "duckdb/planner/expression/bound_function_expression.hpp"
+#include "duckdb/planner/expression_binder.hpp"

namespace duckdb {

2 changes: 2 additions & 0 deletions extension/json/include/json_scan.hpp
@@ -360,6 +360,8 @@ struct JSONScan {
const TableFunction &function);
static unique_ptr<FunctionData> Deserialize(Deserializer &deserializer, TableFunction &function);

+static virtual_column_map_t GetVirtualColumns(ClientContext &context, optional_ptr<FunctionData> bind_data);
+
static void TableFunctionDefaults(TableFunction &table_function);
};

11 changes: 10 additions & 1 deletion extension/json/json_scan.cpp
@@ -171,7 +171,7 @@ unique_ptr<GlobalTableFunctionState> JSONGlobalTableFunctionState::Init(ClientCo
const auto &col_id = input.column_ids[col_idx];

// Skip any multi-file reader / row id stuff
-if (col_id == bind_data.reader_bind.filename_idx || IsRowIdColumnId(col_id)) {
+if (col_id == bind_data.reader_bind.filename_idx || IsVirtualColumn(col_id)) {
continue;
}
bool skip = false;
@@ -1025,6 +1025,14 @@ unique_ptr<FunctionData> JSONScan::Deserialize(Deserializer &deserializer, Table
return std::move(result);
}

+virtual_column_map_t JSONScan::GetVirtualColumns(ClientContext &context, optional_ptr<FunctionData> bind_data) {
+auto &csv_bind = bind_data->Cast<JSONScanData>();
+virtual_column_map_t result;
+MultiFileReader::GetVirtualColumns(context, csv_bind.reader_bind, result);
+result.insert(make_pair(COLUMN_IDENTIFIER_EMPTY, TableColumn("", LogicalType::BOOLEAN)));
+return result;
+}
+
void JSONScan::TableFunctionDefaults(TableFunction &table_function) {
MultiFileReader().AddParameters(table_function);

@@ -1039,6 +1047,7 @@ void JSONScan::TableFunctionDefaults(TableFunction &table_function) {

table_function.serialize = Serialize;
table_function.deserialize = Deserialize;
+table_function.get_virtual_columns = GetVirtualColumns;

table_function.projection_pushdown = true;
table_function.filter_pushdown = false;
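The JSON scan changes above register a callback that returns a map of virtual columns and replace the row-id check with a broader `IsVirtualColumn` test. The sketch below illustrates that idea in isolation. The id values, `ColumnInfo`, and every function here are assumptions for illustration; the real definitions of `virtual_column_map_t`, `COLUMN_IDENTIFIER_EMPTY`, and `IsVirtualColumn` are not shown in this diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using column_id_t = std::uint64_t;

// Hypothetical reserved ids taken from the top of the id range, so they can
// never collide with the index of a column that is physically in the file.
constexpr column_id_t kVirtualColumnStart = column_id_t(1) << 63;
constexpr column_id_t kFilenameColumnId = kVirtualColumnStart;
constexpr column_id_t kEmptyColumnId = ~column_id_t(0) - 1;

// Hypothetical stand-in for a column description (name plus type name).
struct ColumnInfo {
	std::string name;
	std::string type;
};

using VirtualColumnMap = std::unordered_map<column_id_t, ColumnInfo>;

inline bool IsVirtualColumn(column_id_t id) {
	return id >= kVirtualColumnStart;
}

// Analogue of a get_virtual_columns callback: reports which virtual columns
// a scan can produce in addition to the columns stored in the data.
VirtualColumnMap GetVirtualColumns(bool expose_filename) {
	VirtualColumnMap result;
	if (expose_filename) {
		result.insert(std::make_pair(kFilenameColumnId, ColumnInfo {"filename", "VARCHAR"}));
	}
	// Unnamed placeholder column, mirroring the empty-column entry in the diff.
	result.insert(std::make_pair(kEmptyColumnId, ColumnInfo {"", "BOOLEAN"}));
	return result;
}

// Keeps only the ids that refer to columns physically read from the file.
std::vector<column_id_t> PhysicalColumnsOnly(const std::vector<column_id_t> &projected) {
	std::vector<column_id_t> physical;
	for (column_id_t id : projected) {
		if (IsVirtualColumn(id)) {
			continue; // produced by the reader itself, not parsed from the data
		}
		physical.push_back(id);
	}
	return physical;
}
```

Reserving a range of ids keeps the check a single comparison, which is one plausible reason to generalize a dedicated row-id test into a range test.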
10 changes: 4 additions & 6 deletions extension/parquet/column_reader.cpp
@@ -304,7 +304,8 @@ void ColumnReader::PreparePageV2(PageHeader &page_hdr) {

auto compressed_bytes = page_hdr.compressed_page_size - uncompressed_bytes;

-AllocateCompressed(compressed_bytes);
+ResizeableBuffer compressed_buffer;
+compressed_buffer.resize(GetAllocator(), compressed_bytes);
reader.ReadData(*protocol, compressed_buffer.ptr, compressed_bytes);

DecompressInternal(chunk->meta_data.codec, compressed_buffer.ptr, compressed_bytes, block->ptr + uncompressed_bytes,
@@ -319,10 +320,6 @@ void ColumnReader::AllocateBlock(idx_t size) {
}
}

-void ColumnReader::AllocateCompressed(idx_t size) {
-compressed_buffer.resize(GetAllocator(), size);
-}
-
void ColumnReader::PreparePage(PageHeader &page_hdr) {
AllocateBlock(page_hdr.uncompressed_page_size + 1);
if (chunk->meta_data.codec == CompressionCodec::UNCOMPRESSED) {
@@ -333,7 +330,8 @@ void ColumnReader::PreparePage(PageHeader &page_hdr) {
return;
}

-AllocateCompressed(page_hdr.compressed_page_size + 1);
+ResizeableBuffer compressed_buffer;
+compressed_buffer.resize(GetAllocator(), page_hdr.compressed_page_size + 1);
reader.ReadData(*protocol, compressed_buffer.ptr, page_hdr.compressed_page_size);

DecompressInternal(chunk->meta_data.codec, compressed_buffer.ptr, page_hdr.compressed_page_size, block->ptr,
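The `column_reader.cpp` hunks replace a compressed buffer that was sized through `AllocateCompressed` with a `ResizeableBuffer` declared locally in each `PreparePage*` function, in line with the commit title "Avoid caching the compressed buffer in the ColumnReader". A rough sketch of the trade-off follows, using `std::vector` in place of `ResizeableBuffer`. `FakeDecompress`, `CachingReader`, and `LocalBufferReader` are hypothetical names, and the "codec" simply copies bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a codec: "decompresses" by copying the bytes.
static void FakeDecompress(const std::vector<std::uint8_t> &src, std::vector<std::uint8_t> &dst) {
	dst.assign(src.begin(), src.end());
}

// Before: the scratch buffer is a member, so its capacity stays allocated
// for as long as the reader lives, even between pages.
struct CachingReader {
	std::vector<std::uint8_t> compressed;
	std::vector<std::uint8_t> block;

	void PreparePage(const std::vector<std::uint8_t> &page) {
		compressed.assign(page.begin(), page.end());
		FakeDecompress(compressed, block);
	}

	std::size_t RetainedScratchBytes() const {
		return compressed.capacity();
	}
};

// After: the scratch buffer is local and is released when PreparePage returns.
struct LocalBufferReader {
	std::vector<std::uint8_t> block;

	void PreparePage(const std::vector<std::uint8_t> &page) {
		std::vector<std::uint8_t> compressed(page.begin(), page.end());
		FakeDecompress(compressed, block);
	} // `compressed` is freed here

	std::size_t RetainedScratchBytes() const {
		return 0; // nothing besides `block` outlives the call
	}
};
```

The local variant allocates on every page but holds no scratch memory between pages, which matters when many column readers are alive at once. The diff does not state which cost motivated the change.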