Conversation
- Use previous result if current result would result in an empty geometry - Explode Multi- into LineStrings (because MultiLineStrings can cause issues in other places)
p-snft
left a comment
There was a problem hiding this comment.
I agree that improvements to the current welding can still be made. Honestly, however, I did not get all of the details of the current implementation, which looks very complicated to me. Thus, it is hard for me to evaluate your changes.
|
|
||
|
|
||
| def gdf_to_df(gdf): | ||
| def gdf_to_df(df): |
There was a problem hiding this comment.
Does it make sense to rename this to something like drop_geometry_data? In particular, as the new parameter name suggests that that also plain DataFrames (not only GeoDataFrames) are okay.
There was a problem hiding this comment.
At first my idea was to only write df = gdf.drop(columns=gdf.geometry.name). But I thought it would be saver to check if the given DataFrame is actually a GeoDataFrame. In fact, the previous version of that function actually works with DataFrames, because it would not change anything if no column named "geometry" is found.
Therefore plain DataFrames were ok before, and are ok now. I feel like renaming the input to "df" is only logical, in that sense.
I do not feel the need to rename the function, but I would not argue against it.
| if len(gdf_line_net_new) == 0: | ||
| gdf_line_net_new = gdf_line_net_last | ||
| break |
There was a problem hiding this comment.
Can you please add a comment when this might happen?
There was a problem hiding this comment.
If I remember correctly, I had a case where weld_segments would just delete all lines in a network. Adding that change did not solve my actual problem (because it only returns the previous state of an almost deleted network) but it simplified searching for the root of the problem by not breaking the following code (which would throw other errors with an empty network).
I must admit that I did not store the problematic input network and cannot remember the details. As I said, it was a rare edge case.
| # Merge the MultiLineString into a single object | ||
| merged_line = linemerge(multi_line) | ||
| gdf_merged = gpd.GeoDataFrame(geometry=[merged_line], crs=crs) | ||
| gdf_merged = gdf_merged.explode() # Explode Multi- into LineStrings |
There was a problem hiding this comment.
There are other places in the code (e.g. specifically geometry_operations.split_multilinestr_to_linestr() called at the beginning of connect_points.process_geometry()) which make sure there are no MultiLineStrings in the network. I cannot tell where exactly, but this causes issues somewhere down the road.
Once again I sadly do not have a reproducible example of this, but this single line once fixed a problematic network I worked with.
|
I wrote However I can say that I have used code with these changes enabled for a couple of months and did not run into any new errors. That is often how long I test these kinds changes before I propose them here. But that is also why I cannot easily reproduce the problematic networks. I would welcome any replacement that is more elegant and faster. With that said, I still think the current version is worth it. I just benchmarked two cases:
|
I agree. As the suggested code improves the current version, I will now merge it. |
I have noticed PR #157 and I approve any attempts to make the welding faster. I did not have a chance to look at that approach yet.
In the meantime, the changes here were waiting to be pushed. They catch some rare edge cases where the welding did not behave as expected. Maybe this is not required if PR #157 gets merged instead.
For good measure, another tiny fix is included in this PR, where
gdf_to_df()depended ongdf.geometry.name == "geometry", which is not always true. The fix should be much saver.