support hdf5 serialization/unserialization of all the world data #153
Conversation
I have not checked every single line, it's a bit too late for that (here). But I love it anyway.^^ What are the major advantages of this format, by the way? I know that pickle isn't fully compatible between Python 2 and 3, and I read about you pointing out some problems with Protobuf (which I honestly don't like that much either, because Google is in charge of it - and Google tends to change course by 180 degrees regardless of how many million lives are on the line xP; I hate relying on their work).

Honestly, it's just that I want to open worlds from Java, and using protobuf I cannot do that right now. HDF5 seems like a good format with great support for many languages, and Bret has been suggesting it for a while, so I thought... why not?

tox was not aware of h5py. Let's try again.

But sure, I also have to install the library... for both Travis and AppVeyor. This is going to be a lot of fun, I am sure.

Yeah... new libraries are hell. I imagine that pyinstaller is going to be worse. ;)
worldengine/hdf5_serialization.py
Outdated
I'm not sure how it works, but it seems like both of these are numpy arrays - is it really necessary to copy them element by element instead of using numpy.copy() or something similar?
If not, there are some more arrays that are copied the same way (further down: ocean, sea_depth, ... - maybe even the biome map).
I tried this:
numpy.copyto(elevation_data, world.elevation['data'])
But I got this:
TypeError: argument 1 must be numpy.ndarray, not Dataset
I guess it is not exactly a numpy array.
But I can use:
elevation_data.write_direct(world.elevation['data'])
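For reference, a minimal sketch of both bulk-copy options, using a throwaway array in place of world.elevation['data'] (not the actual code in this PR):

import h5py
import numpy

elevation = numpy.random.rand(512, 512)  # stand-in for world.elevation['data']

with h5py.File('world.h5', 'w') as f:
    # create_dataset() returns an h5py Dataset, not an ndarray, which is why
    # numpy.copyto() refuses it as a destination.
    elevation_data = f.create_dataset('elevation', elevation.shape,
                                      dtype=numpy.float64)
    # Either line copies the whole array in one call, no Python loop needed:
    elevation_data.write_direct(elevation)
    # elevation_data[...] = elevation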
I think this looks good - although it might be possible to get rid of almost all of the element-by-element copy processes, since all participants are numpy arrays anyway. Maybe I am overlooking something there? Other than looking at it, I think a field test is the easiest way to see whether all data is properly written and read.

What's up with the future replacement of pickle and protobuf?^^

I don't see this replacing protobuf, since protobuf will eventually support Java. HDF5 is just a stop-gap.

Update status: the code is done and the round-trip works (serialization -> unserialization -> back to the original world). Then I started thinking about having to install another library on AppVeyor and having to explain to users how to install it. So I cried a little, opened a bottle of whiskey and did not touch this PR for a while... Now that I am sober again I am not sure... It is true that we could make hdf5 optional, as GDAL is. What do you think, guys?
The comparison part should be moved to its own function. I think this piece of code exists three times now, and should another save format ever be supported, things would get even messier.
EDIT: Yes, three times.
I still think this code should be moved into its own function and then reused by the tests for all formats.
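Something along these lines, as a rough sketch (helper and attribute names are illustrative, not the ones currently in the repo):

import numpy

def assert_worlds_equal(testcase, original, loaded):
    # Shared comparison used by the pickle, protobuf and HDF5 round-trip tests.
    # Illustrative only: the real World may name these fields differently.
    testcase.assertEqual(original.name, loaded.name)
    numpy.testing.assert_array_equal(original.elevation['data'],
                                     loaded.elevation['data'])
    numpy.testing.assert_array_equal(original.ocean, loaded.ocean)
    numpy.testing.assert_array_equal(original.sea_depth, loaded.sea_depth)

Each round-trip test would then shrink to: save the world, load it back, call assert_worlds_equal(self, world, loaded).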
As long as it isn't trivial to add another format, I would actually prefer seeing us focus on exactly one. Pickle is kind of worthless as long as Python 2 and 3 are both supported, since the change in Unicode support breaks save-compatibility between the Python versions. Protobuf has been stuck at its latest alpha for quite a while, and I personally don't like relying on Google's code - they have scrapped or "reimagined" other software before, no matter how many people relied on it. (Of course, we could stick to the latest working version in that case, and even modify it ourselves: https://github.com/google/protobuf ) So while I don't know if we need a third way of saving data, we could really use one that is actually reliable and works for all purposes we can come up with. (And then hopefully get rid of the other formats.)

I think hdf5 should be optional, like gdal: it is there and useful when the right libs are in place. :)
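If we go that way, a rough sketch of the usual optional-dependency pattern (function name made up), so h5py is only required when the HDF5 format is actually requested:

def save_world_to_hdf5(world, filename):
    try:
        import h5py
    except ImportError:
        raise Exception("HDF5 output requires the h5py library; install it "
                        "with 'pip install h5py' or choose another format.")
    with h5py.File(filename, 'w') as f:
        # The real function would write every map here, as this PR does.
        f.create_dataset('elevation', data=world.elevation['data'])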
rebased
worldengine/hdf5_serialization.py
Outdated
Just a question since I don't know: Is this "int" the same for Python 2 and 3? (Maybe it doesn't even matter and save-compatibility will exist either way.)
EDIT: I guess the tests will find out once we load a Python 2 HDF5-world into Python 3. So just disregard my comment.
I would hope so but I do not know. Should we use something like uint16?
I don't know. Unless the tests fail, this will be fine. (Can you run the tests on Python 2 and 3?)
I'm glad @tcld found this - yes, we should avoid Python datatypes when working with numpy and use numpy's standard datatypes:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html
>>> numpy.dtype(numpy.uint16)
dtype('uint16')
Fixed. I am now using numpy.uint16, numpy.float and numpy.bool.
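As a small illustration of the dtype point (dataset names made up), fixing the dtype at dataset creation keeps the on-disk layout identical no matter which Python version wrote the file:

import h5py
import numpy

with h5py.File('world.h5', 'w') as f:
    # Fixed-width numpy dtypes instead of the interpreter-dependent Python
    # int/float/bool, so Python 2 and 3 produce the same on-disk layout.
    f.create_dataset('seed', data=numpy.uint16(1234))
    f.create_dataset('elevation', data=numpy.zeros((4, 4), dtype=numpy.float64))
    f.create_dataset('ocean', data=numpy.zeros((4, 4), dtype=numpy.bool_))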
In the meantime Travis complains:

Numpy failed to compile... I'm going to assume that the wheel is not being installed here?

That could be the case, but I do not understand why this patch is affecting numpy.

It's not... it's likely a new release of numpy, and we don't specify a specific version.

Should we specify a min and max release?

The latest version is about a month old. Why would it cause problems if there was a new version of a module? Does it maybe have to be recompiled?

Perhaps it is trying to use a version for which there are no pre-compiled packages. Damn, how much easier these things are on the JVM...
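If we do decide to pin it, it would just be a matter of bounds in setup.py / the requirements files; something like this, with purely illustrative version numbers:

# setup.py fragment; the bounds are placeholders and would have to match
# whatever wheels AppVeyor and Travis can actually install.
from setuptools import setup

setup(
    name='worldengine',
    install_requires=[
        'numpy>=1.9.2,<1.11',
        'h5py',  # only if HDF5 support ends up being mandatory
    ],
)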
rebased after #187 has been merged
Could we maybe think of a way to generalize the saving of all the different maps? Iterate over the world's variables: if there's a plain numpy array, write it; if it's a dict, write its key/value pairs.
It isn't absolutely necessary, but it would be great to have something much more future-proof.
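A hand-wavy sketch of what that could look like (the type checks and attribute handling would need to match the real World class):

import h5py
import numpy

def save_world_generic(world, filename):
    # Walk the world's attributes and write anything that is a numpy array or
    # a dict; everything else would still need dedicated handling, and loading
    # needs the inverse walk.
    with h5py.File(filename, 'w') as f:
        for name, value in vars(world).items():
            if isinstance(value, numpy.ndarray):
                f.create_dataset(name, data=value)
            elif isinstance(value, dict):
                group = f.create_group(name)
                for key, item in value.items():
                    group.create_dataset(str(key), data=numpy.asarray(item))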
I think we should create some intermediate concepts such as "Layer", and perhaps have subclasses like "LayerWithQuantiles" or "LayerWithThresholds". In this way we could streamline many parts of the code, including serialization.
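Roughly something like this, as a sketch only (names and fields are provisional):

import numpy

class Layer(object):
    # A named per-cell map; serialization could live next to the data.
    def __init__(self, name, data):
        self.name = name
        self.data = numpy.asarray(data)

    def to_hdf5(self, group):
        group.create_dataset(self.name, data=self.data)


class LayerWithThresholds(Layer):
    # A layer that also carries named thresholds (e.g. 'low', 'medium', 'high').
    def __init__(self, name, data, thresholds):
        super(LayerWithThresholds, self).__init__(name, data)
        self.thresholds = thresholds

    def to_hdf5(self, group):
        super(LayerWithThresholds, self).to_hdf5(group)
        for key, value in self.thresholds.items():
            group[self.name].attrs[key] = value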
OK, it seems good to go from a testing point of view. Feel free to point out corrections and improvements!
In tox.ini a different numpy version is used, one that looks newer than the highest version allowed here.
This is true. Here the version has been chosen to make AppVeyor happy: version 1.10.1 has no wheel packages available. However, version 1.10.1 was working on Travis. Let me check if another version works on Travis too.
Most comments addressed and all tests passing. Time to merge and move forward with the release?

Do you want to release immediately after this PR? I wouldn't mind adding the tests (although... not too important for a release, I guess) and maybe the frozen oceans (which are done here, I just need to update the data).

About the serialization stuff: like this it is much easier to debug and to implement. When I support a new serialization method I start serializing the different elements one by one, and as I figure things out I add another layer. So for me it is much simpler like this, until we introduce classes for the single layers and reorder the World class. About other features: I guess we will always be working on cool stuff we want, but at a certain point we need to release so that users can benefit from it. I would suggest we do a release very soon, and after that we start doing releases more and more often.

I think the frozen oceans are something that at least provides a visual benefit to users. And that's always good. ;P (Also: it is practically done, but nobody has commented on the latest version yet. I would have to adapt the HDF5 code a bit to make it work, though.) The serialization code should be put into its own function. I think it is very ugly that it appears twice in the same file (and we aren't talking about five lines of code) and even differs slightly due to the personal preference of whoever wrote the respective lines.

Well, there are other tasks to do before releasing the new version. Whatever is merged before we tag release 0.19.0 can go in; otherwise it can end up in 0.19.1. This repository has ~700 commits, and 200 commits have been added since we released version 0.18.0 (and we have not yet merged this PR). It is really time to share it with the world :)

I would really like to see the satellite map as a standard output for the next release, since it is such great work by @pangal. Including polar caps. Of course it's not a necessity, but it might leave a good impression to have some more eye-candy.

I definitely agree with that: I love the satellite view!

I suggest merging this PR and then #185 (which also fixes several bugs in the satellite view - it's actually not just about tests). After that, WorldEngine should be in a good state to release.

Let's start by merging this one, then. At this point we should move to the next item on the schedule for releasing a new version, which is getting the documentation ready. While we work on that we could merge more stuff, but it is important to understand that it is not the end of the world if some feature ends up in the next release instead of this one.
support hdf5 serialization/unserialization of all the world data
Fix #91