This repository was archived by the owner on Feb 4, 2020. It is now read-only.

Conversation

@Jimilian
Contributor

I wrote a small script to compare plain json.dumps against json.dumps with pretty printing:

import json
import timeit

iterations = 10000
data = """{"CacheEntries": 0, "CacheHits": 1, "CacheMisses": 4, "CacheSize": 0, "CallsForExternalDebugInfo": 1, "CallsForLinking": 1, "CallsForPreprocessing": 1,"CallsWithInvalidArgument": 1, "CallsWithMultipleSourceFiles": 1, "CallsWithPch": 1, "CallsWithoutSourceFile": 1, "EvictedMisses": 1, "HeaderChangedMisses": 1,"SourceChangedMisses": 1}"""

print("iterations=" + str(iterations))
print("data=", data)

# Pretty-printed dump: sorted keys, 4-space indent
print("Json pretty:", json.__version__)
print(">>", timeit.timeit("x = json.dumps(" + data + ", sort_keys=True, indent=4)", "import json", number=iterations))

# Plain dump, default formatting
print("Json simple:", json.__version__)
print(">>", timeit.timeit("x = json.dumps(" + data + ")", "import json", number=iterations))

Results:

iterations=10000
data= {"CacheEntries": 0, "CacheHits": 1, "CacheMisses": 4, "CacheSize": 0, "CallsForExternalDebugInfo": 1, "CallsForLinking": 1, "CallsForPreprocessing": 1,"CallsWithInvalidArgument": 1, "CallsWithMultipleSourceFiles": 1, "CallsWithPch": 1, "CallsWithoutSourceFile": 1, "EvictedMisses": 1, "HeaderChangedMisses": 1,"SourceChangedMisses": 1}
Json pretty: 2.0.9
>> 0.3718637569982093
Json simple: 2.0.9
>> 0.10602316400036216

So without pretty printing it is roughly 3.5 times faster.

@codecov-io

codecov-io commented Aug 15, 2016

Current coverage is 78.57% (diff: 100%)

Merging #205 into master will not change coverage

@@             master       #205   diff @@
==========================================
  Files             1          1          
  Lines           929        929          
  Methods           0          0          
  Messages          0          0          
  Branches        155        155          
==========================================
  Hits            730        730          
  Misses          166        166          
  Partials         33         33          

Powered by Codecov. Last update e25b61a...524939f

@webmaster128
Contributor

The benchmark is irrelevant as it does not take the context into account. The serialization cost is negligible compared to opening/writing/closing a file, reading 100–1000 include files, or running a compiler. The absolute values show that as well: 0.37 seconds / 10,000 iterations = 0.000037 seconds = 0.037 ms = 37 µs per dump.

So this change just reduces readability and maintainability.

@Jimilian
Contributor Author

Yes, I see your point. But first of all, it reduces the time spent inside the lock (again, assuming I understood the code correctly, which is sometimes not the case :) ). So 37 µs is not the final value either.
Also, users typically should not read these JSON files manually, only via clcache -s or something like that.
I'm not expecting big improvements (maybe several seconds), but I don't see any reason to keep the pretty printing.
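
For illustration, the pattern I mean looks roughly like this (a simplified sketch of the assumed locking around the stats file, not clcache's actual code):

import json
import threading

# Simplified sketch: while one caller serializes and writes the stats,
# everyone else waits on the lock, so under contention the serialization
# time is paid over and over again.
statsLock = threading.Lock()

def writeStats(stats, path):
    with statsLock:
        with open(path, "w") as f:
            json.dump(stats, f, sort_keys=True, indent=4)  # pretty print inside the lock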

Btw, I hope you remember about opening/writing/closing files in the context of #160, because I expect it could strike back once you need to write 10 stats files instead of one.

@webmaster128
Contributor

it reduces the time spent inside the lock

Right, but it does not help.

I'm not expecting big improvements (maybe several seconds)

I see no improvement.

I don't see any reason to keep the pretty printing.

As a developer I want to be able to inspect the JSON files (especially manifests) quickly and routinely by just double-clicking them. This was needed to identify a bug some time ago, helped in understanding include file paths (where I think a file path bug exists), is now needed to handle #179, and will after that be needed for a structural improvement that can save a lot of compiler calls.

@hubx
Contributor

hubx commented Aug 15, 2016

With these micro-optimizations it's hard to estimate the real implications. What about comparing it in real workloads, such as the benchmark introduced with #197 or #83?

As developer I want to be able to check JSONs (especially manifests)

Understandable, but a common user might prefer faster compilation over readable output. Also, many viewers/editors come with out-of-the-box pretty printing for JSON, e.g. JSONView for Chrome, so we can have both in this case :)

@Jimilian
Contributor Author

Jimilian commented Aug 16, 2016

I see no improvement.

@webmaster128, that was the case for statistics; if we are talking about manifests, the difference is even bigger:

python3 serializators.py 
iterations=10000
data= {
  "includeFiles": [
    "somepath\myinclude.h"
  ],
  "includesContentToObjectMap": {
    "fdde59862785f9f0ad6e661b9b5746b7": "a649723940dc975ebd17167d29a532f8"
  }
}
Json pretty: 2.0.9
>> 0.24787091600592248
Json simple: 2.0.9
>> 0.0632409039826598

And that was a small manifest. You can try with a big one on your local machine if you want.

So, if the idea itself is OK for you, I can introduce a new parameter, CLCACHE_PRETTY_JSON, as sketched below. You would then still be able to use the pretty format, while the rest of the world gets a little bit faster compilation. I could also replace plain json with something faster (marshal/pickle/msgpack), which would save a few milliseconds as well.
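
Such a switch could look roughly like this (a minimal sketch; CLCACHE_PRETTY_JSON is only proposed here, and dumpJson is an illustrative helper rather than existing clcache code):

import json
import os

# Hypothetical: CLCACHE_PRETTY_JSON and dumpJson are illustrative only.
PRETTY_JSON = os.environ.get("CLCACHE_PRETTY_JSON", "0") == "1"

def dumpJson(obj, fileObject):
    if PRETTY_JSON:
        json.dump(obj, fileObject, sort_keys=True, indent=4)  # readable, ~3x slower
    else:
        json.dump(obj, fileObject)  # default formatting, faster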

@Jimilian
Contributor Author

Reality turned out to be far better than my best expectations:

[screenshot: benchmark timings, 2016-08-16]

@webmaster128
Contributor

If this 8% gain for full projects can be reproduced, I am willing to install additional tools for clcache development or to introduce a CLCACHE_MAINTAINER_MODE (a more general version of CLCACHE_PRETTY_JSON).

If someone tries to replace JSON with some arbitrary binary format, I am going to fork this project. I replaced pickle with json precisely for the sake of understanding what the fuck is going on. There are so many structural issues (bugs and missing optimizations) that can save dozens of percent of build time, which you just don't see when some binary magic is going on.

And if someone really needs to chase milliseconds, please consider porting the project to a compiled language instead of Python.

@Jimilian
Contributor Author

@webmaster128, you did great work here, so please don't hate me for some micro-optimizations/suggestions made without a full understanding of the context.

@Un1oR, can you test this PR in your environment? It looks like you also have huge concurrency (clcache + Incredibuild?).

@webmaster128
Contributor

@Jimilian Sorry for being a bit loud with my points. This is not personal at all; in fact, I am very happy to see active corporate users using and contributing to this project.

@webmaster128
Contributor

webmaster128 commented Aug 17, 2016

After running a bunch of tests on a single local 2 core machine, comparing master against #205, I am pretty sure this has no relevant effect.

[chart: clcache_benchmarks]

clcache_benchmarks.pdf
clcache_benchmarks.zip

The underlying benchmark output is documented here: https://gist.github.com/webmaster128/9d279503c8996c5eb5a5d5f1c098fb4e

@Jimilian
Contributor Author

Jimilian commented Aug 18, 2016

What is on the Y axis? Milliseconds? Seconds? Ops/s?

Anyway, we use 16-core nodes with SSDs, and contention on every lock while calculating some unrelated stuff is a problem for us. So 37 µs is not the final value.
I'm pretty sure that clcache would be much better with #160, and after merging all the #160-related work the impact of this change could be zero, but right now this PR can help people who use clcache in the same kind of environment as ours.

Also, do you have an HDD or an SSD in your local machine?

@webmaster128
Contributor

The Y axis is time in seconds; the X axis is the run number (3 runs per setup). Everything runs on an SSD.

@Jimilian
Contributor Author

Jimilian commented Aug 18, 2016

CLCACHE_NODIRECT=1?
Btw, I tested #205 against the master branch as well, not against the last released version.

@frerich
Owner

frerich commented Aug 21, 2016

Looking forward to some results for other representative use cases. I'd just like to make a general note that the whole purpose of the clcache project is to make things go fast, so I have a strong bias towards faster solutions even if they make maintenance a little harder - but only if there is compelling evidence that this tradeoff is necessary at all. I believe that in a lot of cases, it's not an either/or thing.

@Jimilian
Contributor Author

@webmaster128 I found a nice service that can simplify life with unformatted JSON: http://prettyjson.com (the "pretty" button is in the bottom-right corner). So, can we merge this now (after resolving the conflicts)? :) Or would it be better to find somebody who uses non-direct mode with 16 or 32 cores and an SSD to prove the benefits?

@webmaster128
Contributor

@Jimilian I'm fine with my bash alias:
alias pretty_json_sorted='python3 -c "import json, sys; obj=json.load(sys.stdin); print(json.dumps(obj, ensure_ascii=False, indent=2, sort_keys=True))"'
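Usage would be, for example, pretty_json_sorted < manifest.json, which reads the unformatted JSON from stdin and prints it sorted and indented.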

I still hate the idea of this manual step when developing, and I hate the idea that it is getting even harder for other contributors to understand what is going on. There are so many structural issues with manifests that it's not worth the discussion here, it's not worth keeping this open, and it's not worth sending notification emails to people caring about real problems (#160, #215, #212, #208, #155, #196, #210).

As long as there are no isolated, reproducible tests proving the contrary, this is just premature optimization. If such benchmarks exist, I am willing to rent some many-core machine and test it.

And if you really want to do something for your build performance, I think it is better to develop a mode that does not count hits at all (see the sketch below). That saves you a file-open, file-read, file-close, json-decode, json-encode, file-open, file-write, file-close per cache hit.
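
A minimal sketch of such a mode, assuming the per-hit read-modify-write cycle described above (the CLCACHE_NO_STATS variable and the registerHit helper are hypothetical, not existing clcache code):

import json
import os

# Hypothetical opt-out switch; not an existing clcache variable.
NO_STATS = os.environ.get("CLCACHE_NO_STATS") == "1"

def registerHit(statsFilePath):
    if NO_STATS:
        return  # skips the whole read-modify-write cycle per cache hit
    with open(statsFilePath, "r") as f:  # file-open, file-read, file-close
        stats = json.load(f)             # json-decode
    stats["CacheHits"] = stats.get("CacheHits", 0) + 1
    with open(statsFilePath, "w") as f:  # file-open, file-write, file-close
        json.dump(stats, f)              # json-encode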

@Jimilian
Contributor Author

@webmaster128 OK, you are right. Real problems are more important than this one. I just thought I had found a low-hanging fruit.

@Jimilian Jimilian closed this Aug 25, 2016