Make use of Numpy-Arrays #138

tcld · 2015-10-16T18:49:54Z

Another branch, as promised (#136). I replaced almost all arrays with numpy-arrays. Everything that is not mentioned in the serialization-tests is probably not touched, either. (I had to start somewhere.)

Here some performance-test results:

Parameters:

  width        = 384
  height       = 256
  plates       =   5
  oceansurface = 65% (only effective in the numpy_repl-branch, which is based on ocean_level_fix)
  seed         = 13047

Generation times for every branch:

  master     = 39.57 s
  perf_opt   = 29.22 s
  numpy_repl = 28.40 s
  (numpy_repl + platec1.4.0 = 21.48 s)
  (numpy_repl + platec1.4.0+ small improvements = 20.44 s)

Images (master - perf_opt - numpy_repl - platec1.4.0):

The latest changes were not supposed to speed things up too much, only some of the excessively used functions were replaced. I did what I could while keeping an eye on the target of getting numpy ready. In the future it won't be a pain to get a bit of optimization done anymore. And maybe the memory footprint will have changed a bit, I still have no way of reliably checking that.
I will do some tests with larger maps and see if anything changed there. (And I am really looking forward to the new version of platec. :) )

All tests except two are passed, although I did run one of the regeneration-scripts in the tests-folder to make it happen. At this point, especially due to the change to the seeding, worldengine-data should be updated a bit anyway, I think. (One of the failed tests, maybe even both, can probably be passed with fresh data - maybe it has to be Python 2/3-specific, though.)

EDIT: I took a look at the new platec version and added the time above. In total we are down to roughly 50% of the generation time from a couple of days ago! :)

EDIT2: As far as I see, the failed test comes from the fact that the old world-file was generated without numpy-arrays. I wouldn't know what to do about that - on my system regenerating the test-world was enough, though.

EDIT3: I did a little more profiling this evening. The air gets thin now, there are only three methods left that slightly justify spending time on with optimization (world.is_land()~11% of total time, world.tiles_around()~20%, common.anti_alias_point()~20% - that's roughly 50% in total!). And after looking at them for a while and trying some things I really couldn't come up with any improvement.
A funny sidenote is that the function calls for my testmap went done from more than 300 million to less than 100 million. Half of which is attributed to is_land(). Speed-wise this might be the end of the road, unless a lot more work is invested.

         95833445 function calls (95705636 primitive calls) in 92.104 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  4014080   19.543    0.000   19.543    0.000 worldengine/../worldengine/common.py:97(anti_alias_point)
  1262482   18.308    0.000   27.963    0.000 worldengine/../worldengine/world.py:375(tiles_around)
     1150   14.349    0.012   14.349    0.012 {built-in method step} -- this refers to platecs step()-method
 56451705   10.417    0.000   10.417    0.000 worldengine/../worldengine/world.py:341(is_land)
        1    2.312    2.312    4.155    4.155 worldengine/../worldengine/simulations/erosion.py:136(river_sources)

psi29a · 2015-10-17T07:36:23Z

Doing a rebuilt based on new master.

tcld · 2015-10-17T07:45:11Z

As said here #134 , I think that one of those last two commits (well, I hope it is one of those) might have introduced a problem. Once the results of a currently running generation are done, I will post an image, should anything go wrong. (Or just a message, if everything is ok.)

tcld · 2015-10-17T11:11:12Z

Update on this:
I separated this branch from the ocean_level_fix-branch (#134). This branch passed all minus two tests for me (after running tests/data/data_generator.py as far as I remember) and it still produces exactly the same output as the master-branch. And it only needs about 75% of the time, I have to add. ;)

The ocean_level-branch will have to wait for a bit of discussion with you guys since the output might not be desirable. But that can be fixed, after some input. :)

@psi29a What do I have to do to update whatever tests/data/data_generator.py spit at me? This branch will not pass the tests without it, I think. (There is always complaining about a numpy-method that is missing.)

tcld · 2015-10-17T11:22:35Z

worldengine/simulations/humidity.py

numpy is just awesome. :D

tcld · 2015-10-17T11:47:48Z

I added a ton of comments since these commits change quite a bit. Hopefully I didn't make anything unnecessarily cryptic.

psi29a · 2015-10-18T19:57:52Z

  File "/home/travis/build/Mindwerks/worldengine/worldengine/drawing_functions.py", line 101, in _find_land_borders
    if world.ocean[int(y / factor), int(x / factor)]:
TypeError: list indices must be integers, not tuple

This looks to me that ocean isn't numpy...

psi29a · 2015-10-18T20:00:12Z

  File "/home/travis/build/Mindwerks/worldengine/worldengine/world.py", line 483, in elevation_at
    return self.elevation['data'].T[pos]
AttributeError: 'list' object has no attribute 'T'

Is this a typo?

tcld · 2015-10-18T20:04:25Z

No. "T" is a call to the transpose-function of the numpy array.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.T.html

The error only occurs when an old world is used. However, I can replace the call to T for something more clear, it doesn't gain any performance anyway. (That's why I tried it.)

tcld · 2015-10-18T20:11:40Z

It is "fixed" now. I should have done that earlier.

tcld · 2015-10-18T21:49:05Z

I didn't notice this before:

This looks to me that ocean isn't numpy...

I have not really understood why but that is the same problem as with the missing transpose() method: Somehow when loading an old map the arrays are not (yet?) numpy arrays. Did I overlook a place where the arrays could get into the program without being converted? Is this due to the pickle-format?
After regenerating the test-world this worked on my local system. Not understanding why, though, is a problem, of course.

tcld · 2015-10-18T21:59:50Z

I think the problems come from this:
https://docs.python.org/2/library/pickle.html#pickle.load

I assume the test-world is a pickle-world and things go like this:

The Protobuf-loaders could be manipulated in World.py, but the old pickle-save does not contain numpy-arrays and thus after loading the arrays cannot be used as such.

If this sounds feasible, we should really regenerate that world and see how the two tests do that so far have been failing (for me, at least).

psi29a · 2015-10-19T09:47:45Z

That was my first thought as well... loading old datafiles will have data as array and not numpy array.

This means we'll likely have to regenerate the world files to accept the new format. :)

We'll be breaking compatibility with older world files as a result. This is likely the case for protobuf as well.

This is why we have unit and functional tests, to help us stop and think about the problem before a user is exposed to it. :)

tcld · 2015-10-19T09:59:02Z

All the arrays are converted after being loaded respectively before being saved with protobuf. I expect old protobuf saves to still work; I don't have one to test it, though.

I never made the biome-map use protobuf (but I think that is the only one, it just looked like quite a bit of work). So if the world were to be regenerated...at some point another break of compatibility would have to occur, I fear.

tcld · 2015-10-19T10:03:32Z

I think I might at least put very basic code in there to have the biome-map be a numpy-array (right now), without the usual small optimisations that follow all over the code. That way we don't have to break pickle-compatibility twice. (By the way: If protobuf were to be the favourite serializer, maybe the test-worldfile should be in protobuf-format? That way we wouldn't need two separate files for Python 2 and 3, either.)

Or is this kind of break maybe too much of a problem and the switch to numpy not feasible? I would hate to see that.

psi29a · 2015-10-19T10:19:27Z

I agree totally...

@ftomassetti I think we're getting to that point where we need to pull the trigger on axing pickle. ;)

…rigation-map.

…arrays. Some smaller functions make use of numpy now.

… river-maps with numpy maps.

…n performance.

…I made before.

…esponsible for half of all function calls during world-generation (i.e. tens of millions).

tcld · 2015-10-20T22:04:50Z

I could use some help right now: I generated a full set of data, including that for Python 2. Under Python 3 all tests pass, under Python 2 the ancient-map tests don't pass (that is 2/40). Did I do something wrong while generating? I don't know anything about the ancient maps yet, I thought they are something that is generated after the actual world is already finished and saved. Is that wrong?

(Once this is sorted out, I can push all the changes and this should be done.)

PS: So far I haven't touched the code for the ancient maps. I really don't know what's going on. Is there maybe another rogue RNG seed at work?

tcld · 2015-10-20T22:16:51Z

Here the coloured version - Python 3 first, Python 2 last:

The mountains in the middle are oriented differently. (protobuf)

These two are very different. :( (pickle)

tcld · 2015-10-20T22:21:50Z

I will take another look at this tomorrow morning. Maybe I made some mistake that happens to make Python 2 and 3 create different outputs.

EDIT: Maybe there is a chance that this is a legitimate bug? I don't know how well Python 2/3 compatibility had been tested before.

EDIT2: All other output-images seem to be fine. And since my Python 3 version passed the test, it couldn't be randomness introduced during the ancient-map generation. Are there any preparatory steps during world generation that could influence the outcome?

tcld · 2015-10-21T06:25:47Z

I think I found the problem.

Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
[GCC 4.8.4] on linux
>>> import random
>>> random.seed(0)
>>> random.randint(0, 1000000)
885440

Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
>>> import random
>>> random.seed(0)
>>> random.randint(0, 1000000)
844422

Maybe we need a new random module. And a new noise module? :(

For now I guess I will push all my changes and then we regard this is a problem to be fixed. In principle it concerns all drawing functions that make use of the random module.

ftomassetti · 2015-10-21T06:39:23Z

I am wondering: did you generate the two random numbers on the same machine just using different version of Python? I ask because I read "linux" and "linux2". If the random generator gives different numbers on different versions of Python that is bad :(
I am looking for library to replace it. I know it is frustrating to fix those tests but hopefully they will protect us by unintended changes.

tcld · 2015-10-21T06:43:32Z

Just opened an issue: #145
Yes, same machine. But as can be seen different compilers.

The changes in the Python-module can probably be spotted and compared. I don't know about gcc, though.

Should I go through with the pushes and we sort this out later? I will switch out problematic parts in the binaries for the Python 2 versions so the Travis tests should work out at least.

tcld · 2015-10-21T07:17:37Z

Finally: Here Mindwerks/worldengine-data#2 is the PR for the actual, final, eternal data that makes Python 2 pass all tests. (Python 3 doesn't like the ancient maps, everything else works - 38/40.)

If that PR is merged and the tests are rerun, things should work fine. I really, really hope. :)

tcld · 2015-10-21T07:20:33Z

How do you trigger a re-test?
EDIT: I still don't know but I had to do another push anyway.

tcld · 2015-10-21T07:45:06Z

Yippie! :)

psi29a · 2015-10-21T07:49:19Z

Make it so @ftomassetti ? :) I'm happy with this.

tcld · 2015-10-26T10:50:16Z

What's the situation here?

ftomassetti · 2015-10-26T10:51:44Z

Fine by me: all tests are passing :)
Should we hit the merge button or is there anything else any of you want to check?

tcld · 2015-10-26T11:01:37Z

I didn't want to change anything about this branch anymore, no. The ones after might need some polish, though.

Make use of Numpy-Arrays

ftomassetti · 2015-10-26T11:02:49Z

Merged, now we can see what adaptations the other PRs require

tcld force-pushed the numpy_repl branch 3 times, most recently from 560eee0 to 70fc6cf Compare October 17, 2015 01:10

tcld force-pushed the numpy_repl branch 2 times, most recently from e24735c to 4c5a0d2 Compare October 17, 2015 11:09

tcld reviewed Oct 17, 2015
View reviewed changes

tcld mentioned this pull request Oct 18, 2015

Numpy+images #141

Closed

tcld mentioned this pull request Oct 19, 2015

Error with the Greyscale file generation #55

Closed

tcld added 10 commits October 20, 2015 23:20

Double-checked the switch from Python array to numpy array for the ir…

4716947

…rigation-map.

Replaced ocean-, elevation-, precipitation- and water-map with numpy …

671b01d

…arrays. Some smaller functions make use of numpy now.

Replaced humidity, permeability-, temperature-, sea_depth-, lake- and…

aa5c63e

… river-maps with numpy maps.

Small changes to astar.py.

0d62c77

Tiny changes.

e66b38e

Removed calls to numpy.transpose() where it didn't give any benefit i…

452db57

…n performance.

Replaced the biome-map with a numpy-array and fixed two small errors …

de4ac88

…I made before.

Applied psi29s suggestions.

1f14c41

While at it I replaced calls to is_land() and is_ocean() which were r…

ebb2435

…esponsible for half of all function calls during world-generation (i.e. tens of millions).

Replaced two calls to array_to_matrix() with numpy-methods.

3ae6ea2

tcld mentioned this pull request Oct 21, 2015

random module gives varying output for different Python versions #145

Closed

tcld force-pushed the numpy_repl branch from 10e3e89 to 3ae6ea2 Compare October 21, 2015 07:32

tcld mentioned this pull request Oct 23, 2015

random -> numpy.random #146

Merged

ftomassetti added a commit that referenced this pull request Oct 26, 2015

Merge pull request #138 from tcld/numpy_repl

07e179d

Make use of Numpy-Arrays

ftomassetti merged commit 07e179d into Mindwerks:master Oct 26, 2015

tcld deleted the numpy_repl branch October 26, 2015 11:07

Make use of Numpy-Arrays #138

Make use of Numpy-Arrays #138

Uh oh!

Conversation

tcld commented Oct 16, 2015

Uh oh!

psi29a commented Oct 17, 2015

Uh oh!

tcld commented Oct 17, 2015

Uh oh!

tcld commented Oct 17, 2015

Uh oh!

tcld Oct 17, 2015

Choose a reason for hiding this comment

Uh oh!

psi29a Oct 19, 2015

Choose a reason for hiding this comment

Uh oh!

tcld commented Oct 17, 2015

Uh oh!

psi29a commented Oct 18, 2015

Uh oh!

psi29a commented Oct 18, 2015

Uh oh!

tcld commented Oct 18, 2015

Uh oh!

tcld commented Oct 18, 2015

Uh oh!

tcld commented Oct 18, 2015

Uh oh!

tcld commented Oct 18, 2015

Uh oh!

psi29a commented Oct 19, 2015

Uh oh!

tcld commented Oct 19, 2015

Uh oh!

tcld commented Oct 19, 2015

Uh oh!

psi29a commented Oct 19, 2015

Uh oh!

tcld commented Oct 20, 2015

Uh oh!

tcld commented Oct 20, 2015

Uh oh!

tcld commented Oct 20, 2015

Uh oh!

tcld commented Oct 21, 2015

Uh oh!

ftomassetti commented Oct 21, 2015

Uh oh!

tcld commented Oct 21, 2015

Uh oh!

tcld commented Oct 21, 2015

Uh oh!

tcld commented Oct 21, 2015

Uh oh!

tcld commented Oct 21, 2015

Uh oh!

psi29a commented Oct 21, 2015

Uh oh!

tcld commented Oct 26, 2015

Uh oh!

ftomassetti commented Oct 26, 2015

Uh oh!

tcld commented Oct 26, 2015

Uh oh!

ftomassetti commented Oct 26, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants