Releases · geopandas/pyogrio
Version v0.12.1
Version v0.12.0
Potentially breaking changes
- Return JSON fields (as identified by GDAL) as dicts/lists in `read_dataframe`;
  these were previously returned as strings (#556).
- Drop support for GDAL 3.4 and 3.5 (#584).
Improvements
- Add `datetime_as_string` and `mixed_offsets_as_utc` parameters to `read_dataframe`
  to choose the way datetime columns are returned, plus several fixes when reading
  and writing datetimes (#486); see the sketch after this list.
- Add listing of GDAL data types and subtypes to `read_info` (#556).
- Add support to read list fields without arrow (#558, #597).
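A minimal sketch (not part of the release notes) of how the new datetime options might be used; `"example.gpkg"` is a placeholder path:

```python
from pyogrio import read_dataframe

# Return datetime columns as ISO 8601 strings rather than pandas datetimes,
# leaving timezone offsets untouched for downstream parsing.
df_str = read_dataframe("example.gpkg", datetime_as_string=True)

# Keep pandas datetimes, but convert columns with mixed timezone offsets to UTC.
df_utc = read_dataframe("example.gpkg", mixed_offsets_as_utc=True)
```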
Bug fixes
- Fix decode error reading an sqlite file on Windows (#568).
- Fix wrong layer name when creating .gpkg.zip file (#570).
- Fix segfault on providing an invalid value for `layer` in `read_info` (#564).
- Fix error when reading data with `use_arrow=True` after having used the
  Parquet driver with GDAL >= 3.12 (#601).
Packaging
- Wheels are now available for Python 3.14 (#579).
- The GDAL library included in the wheels is upgraded from 3.10.3 to 3.11.4 (#578).
- Add libkml driver to the wheels for more recent Linux platforms supported
  by manylinux_2_28, macOS, and Windows (#561).
- Add libspatialite to the wheels (#546).
- Minimum required Python version is now 3.10 (#557).
- Initial support for free-threaded Python builds, with the extension module
declaring free-threaded support and wheels for Python 3.13t and 3.14t being
built (#562).
Version 0.11.1
Version v0.11.0
Improvements
- Capture all errors logged by GDAL when opening a file fails (#495).
- Add support to read and write ".gpkg.zip" (GDAL >= 3.7), ".shp.zip", and ".shz"
  files (#527); see the example after this list.
- Compatibility with the string dtype in the upcoming pandas 3.0 release (#493).
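A minimal sketch (not part of the release notes) of round-tripping a zipped GeoPackage; the path and the one-row GeoDataFrame are placeholders:

```python
import geopandas as gpd
from shapely.geometry import Point

from pyogrio import list_layers, read_dataframe, write_dataframe

gdf = gpd.GeoDataFrame({"name": ["a"]}, geometry=[Point(0, 0)], crs="EPSG:4326")

# ".gpkg.zip" requires GDAL >= 3.7; ".shp.zip" and ".shz" are handled the same way.
write_dataframe(gdf, "data.gpkg.zip")

print(list_layers("data.gpkg.zip"))
roundtrip = read_dataframe("data.gpkg.zip")
```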
Bug fixes
- Fix WKB writing on big-endian systems (#497).
- Fix writing fids to e.g. GPKG file with `use_arrow` (#511).
- Fix error in `write_dataframe` when writing an empty or all-None object
  column with `use_arrow` (#512).
Packaging
- The GDAL library included in the wheels is upgraded from 3.9.2 to 3.10.3 (#499).
Version 0.10.0
Improvements
- Add support to read, write, list, and remove `/vsimem/` files (#457);
  see the sketch below.
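A minimal sketch (not part of the release notes) of using GDAL's in-memory `/vsimem/` filesystem through the dataframe API; the path and data are placeholders, and the listing/removal helpers added in #457 are not shown here:

```python
import geopandas as gpd
from shapely.geometry import Point

from pyogrio import read_dataframe, write_dataframe

gdf = gpd.GeoDataFrame(
    {"name": ["a", "b"]},
    geometry=[Point(0, 0), Point(1, 1)],
    crs="EPSG:4326",
)

# Paths under /vsimem/ never touch disk; GDAL keeps the dataset in memory.
write_dataframe(gdf, "/vsimem/example.gpkg")
roundtrip = read_dataframe("/vsimem/example.gpkg")
```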
Bug fixes
- Silence warning from `write_dataframe` with `GeoSeries.notna()` (#435).
- Enable mask & bbox filter when geometry column not read (#431).
- Raise NotImplementedError when user attempts to write to an open file handle (#442).
- Prevent seek on read from compressed inputs (#443).
Packaging
- For the conda-forge package, change the dependency from `libgdal` to
  `libgdal-core`. This package is significantly smaller as it doesn't contain
  some large GDAL plugins. Extra plugins can be installed as separate conda
  packages if needed: more info here.
  This also leads to `pyproj` becoming an optional dependency; you will need
  to install `pyproj` in order to support spatial reference systems (#452).
- The GDAL library included in the wheels is updated from 3.8.5 to GDAL 3.9.2 (#466).
- pyogrio now requires a minimum version of Python >= 3.9 (#473).
- Wheels are now available for Python 3.13.
Version 0.9.0
Version v0.8.0
Improvements
- Support for writing based on Arrow as the transfer mechanism of the data
  from Python to GDAL (requires GDAL >= 3.8). This is provided through the
  new `pyogrio.raw.write_arrow` function, or by using the `use_arrow=True`
  option in `pyogrio.write_dataframe` (#314, #346).
- Add support for `fids` filter to `read_arrow` and `open_arrow`, and to
  `read_dataframe` with `use_arrow=True` (#304).
- Add some missing properties to `read_info`, including layer name, geometry name
  and FID column name (#365).
- `read_arrow` and `open_arrow` now provide GeoArrow-compliant extension metadata,
  including the CRS, when using GDAL 3.8 or higher (#366).
- The `open_arrow` function can now be used without a `pyarrow` dependency. By
  default, it will now return a stream object implementing the
  Arrow PyCapsule Protocol (i.e. having an `__arrow_c_stream__` method). This
  object can then be consumed by your Arrow implementation of choice that
  supports this protocol. To keep the previous behaviour of returning a
  `pyarrow.RecordBatchReader`, specify `use_pyarrow=True` (#349); see the
  sketch after this list.
- Warn when reading from a multilayer file without specifying a layer (#362).
- Allow writing to a new in-memory datasource using an `io.BytesIO` object (#397).
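A minimal sketch (not part of the release notes) of the two `open_arrow` modes described above, assuming a placeholder path `"example.gpkg"` and a pyarrow version recent enough to consume PyCapsule streams:

```python
import pyarrow as pa

from pyogrio.raw import open_arrow

# Default since 0.8.0: the second item yielded is a stream object exposing
# __arrow_c_stream__, which any Arrow implementation supporting the PyCapsule
# Protocol can consume (pyarrow is used here only as an example consumer).
with open_arrow("example.gpkg") as (meta, stream):
    table = pa.table(stream)

# Previous behaviour: request a pyarrow.RecordBatchReader explicitly.
with open_arrow("example.gpkg", use_pyarrow=True) as (meta, reader):
    table = reader.read_all()
```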
Bug fixes
- Fix error in `write_dataframe` if input has a date column and
  non-consecutive index values (#325).
- Fix encoding issues on Windows for some formats (e.g. ".csv") and always write ESRI
  Shapefiles using UTF-8 by default on all platforms (#361).
- Raise exception in `read_arrow` or `read_dataframe(..., use_arrow=True)` if
  a boolean column is detected due to error in GDAL reading boolean values for
  FlatGeobuf / GPKG drivers (#335, #387); this has been fixed in GDAL >= 3.8.3.
- Properly ignore fields not listed in the `columns` parameter when reading from
  the data source not using the Arrow API (#391).
- Properly handle decoding of ESRI Shapefiles with user-provided `encoding`
  option for `read`, `read_dataframe`, and `open_arrow`, and correctly encode
  Shapefile field names and text values to the user-provided `encoding` for
  `write` and `write_dataframe` (#384).
- Fixed bug preventing reading from bytes or file-like in `read_arrow` /
  `open_arrow` (#407).
Packaging
- The GDAL library included in the wheels is updated from 3.7.2 to GDAL 3.8.5.
Potentially breaking changes
- Using a `where` expression combined with a list of `columns` that does not include
  the column referenced in the expression is not recommended and will now
  return results based on driver-dependent behavior, which may include either
  returning empty results (even if non-empty results are expected from the `where`
  parameter) or raising an exception (#391). Previous versions of pyogrio incorrectly
  set ignored fields against the data source, allowing it to return non-empty
  results in these cases.
Version 0.7.2
Version 0.7.1
Bug fixes
- Fix unspecified dependency on `packaging` (#318).
Version 0.7.0
Improvements
- Support reading and writing datetimes with timezones (#253).
- Support writing dataframes without geometry column (#267).
- Calculate feature count by iterating over features if GDAL returns an
  unknown count for a data layer (e.g., OSM driver); this may have significant
  performance impacts for some data sources that would otherwise return an
  unknown count (the count is used in `read_info`, `read`, and `read_dataframe`) (#271).
- Add `arrow_to_pandas_kwargs` parameter to `read_dataframe` and reduce memory usage
  with `use_arrow=True` (#273).
- In `read_info`, the result now also contains the `total_bounds` of the layer as well
  as some extra `capabilities` of the data source driver (#281).
- Raise error if `read` or `read_dataframe` is called with parameters to read no
  columns, geometry, or fids (#280).
- Automatically detect supported driver by extension for all available
  write drivers and addition of `detect_write_driver` (#270).
- Addition of `mask` parameter to `open_arrow`, `read`, `read_dataframe`,
  and `read_bounds` functions to select only the features in the dataset that
  intersect the mask geometry (#285). Note: GDAL < 3.8.0 returns features that
  intersect the bounding box of the mask when using the Arrow interface for
  some drivers; this has been fixed in GDAL 3.8.0. See the sketch after this list.
- Removed warning when no features are read from the data source (#299).
- Add support for `force_2d=True` with `use_arrow=True` in `read_dataframe` (#300).
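A minimal sketch (not part of the release notes) of the `mask` parameter; the path is a placeholder and any shapely geometry can serve as the mask:

```python
from shapely.geometry import box

from pyogrio import read_dataframe

# Only features whose geometry intersects the mask are returned
# (GDAL < 3.8.0 may fall back to the mask's bounding box on the Arrow path).
mask = box(0, 0, 10, 10)
gdf = read_dataframe("example.gpkg", mask=mask)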
Other changes
- The test suite requires Shapely >= 2.0.
- Using `skip_features` greater than the number of features available in a data
  layer now returns empty arrays for `read` and an empty DataFrame for
  `read_dataframe` instead of raising a `ValueError` (#282).
- Enabled `skip_features` and `max_features` for `read_arrow` and
  `read_dataframe(path, use_arrow=True)`. Note that this incurs overhead
  because all features up to the next batch size above `max_features` (or the size
  of the data layer) will be read prior to slicing out the requested range of
  features (#282); see the sketch after this list.
- The `use_arrow=True` option can be enabled globally for testing using the
  `PYOGRIO_USE_ARROW=1` environment variable (#296).
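A minimal sketch (not part of the release notes) of paging with `skip_features` / `max_features`, which these changes also enable for the Arrow path; the path is a placeholder:

```python
from pyogrio import read_dataframe

# Read features 100..149 only. With use_arrow=True, whole batches are read up
# to the requested range before slicing, so expect some extra overhead.
page = read_dataframe(
    "example.gpkg", skip_features=100, max_features=50, use_arrow=True
)
```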
Bug fixes
- Fix int32 overflow when reading int64 columns (#260)
- Fix `fid_as_index=True` not setting the fid as index when using `read_dataframe` with
  `use_arrow=True` (#265)
- Fix errors reading OSM data due to invalid feature count and incorrect
  reading of OSM layers beyond the first layer (#271)
- Always raise an exception if there is an error when writing a data source
  (#284)
Potentially breaking changes
- In `read_info` (#281):
  - the `features` property in the result will now be -1 if calculating the
    feature count is an expensive operation for this driver. You can force it to be
    calculated using the `force_feature_count` parameter (see the sketch after this list).
  - for boolean values in the `capabilities` property, the values will now be
    booleans instead of 1 or 0.
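A minimal sketch (not part of the release notes) of handling the new `-1` feature count; the OSM path is a placeholder for a driver where counting is expensive:

```python
from pyogrio import read_info

info = read_info("example.osm.pbf")
if info["features"] == -1:
    # Counting is expensive for this driver; request an exact count explicitly.
    info = read_info("example.osm.pbf", force_feature_count=True)

# "capabilities" values are now real booleans rather than 1/0.
print(info["features"], info["capabilities"])
```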
Packaging
- The GDAL library included in the wheels is updated from 3.6.4 to GDAL 3.7.2.