Runs slow.  Anyone interested in improving performance?

I don't want to take the time right now to submit performance enhancements, but perhaps @moyix or someone else reading this note would like to do the work.
I find that a tremendous amount of time is spent on file reads, string concatenations, and substring operations.  I have seen two ways to speed things up, both of which would be simple to implement:

1. In the `StreamFile` class, cache the stream pages so that each one is read from the file only once.  Or better, if the platform supports `mmap`, just mmap the entire PDB file, create a buffer for it, and take a slice of the buffer for a stream page whenever you need it.  In the non-mmap case, you could add a method to clear the cache, to be called, for instance, after parsing the entire stream.
2. In `StreamFile._read`, work out how many pages the request spans.  Use the above cache / mmap to get slices of the individual pages.  Return the slice, or a concatenation of two slices, or use `cStringIO` to assemble more than two slices.  Using `_read_pages` is inefficient because it reads whole pages and then you have to take a slice of the result.
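
The two suggestions above could be sketched roughly as follows. This is only an illustration, not pdbparse's actual code: `PagedStream`, `open_mapped`, and their parameters are hypothetical names, and the sketch is written for Python 3, where `b"".join` takes the place of `cStringIO` for assembling several slices.

```python
import mmap


class PagedStream:
    """Hypothetical stand-in for a paged stream; NOT pdbparse's StreamFile."""

    def __init__(self, buf, pages, page_size, size):
        self.buf = buf              # mmap object (or bytes) covering the whole file
        self.pages = pages          # file page numbers of this stream, in order
        self.page_size = page_size
        self.size = size            # stream length in bytes

    def read(self, offset, length):
        """Return up to `length` bytes starting at stream offset `offset`."""
        end = min(offset + length, self.size)
        if offset >= end:
            return b""
        ps = self.page_size
        first, last = offset // ps, (end - 1) // ps
        chunks = []
        for idx in range(first, last + 1):
            base = self.pages[idx] * ps                     # page start in the file
            lo = offset - idx * ps if idx == first else 0   # trim the first page
            hi = end - idx * ps if idx == last else ps      # trim the last page
            chunks.append(self.buf[base + lo:base + hi])
        # One page: return the slice directly; otherwise join the slices once.
        return chunks[0] if len(chunks) == 1 else b"".join(chunks)


def open_mapped(path):
    """Map a whole file read-only; the caller closes both returned objects."""
    f = open(path, "rb")
    return f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

Only the bytes actually requested are copied out of the mapping, so no whole-page read followed by a second slice is needed.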

I think this would eliminate most of the time spent in parsing a PDB as a whole.  You could try profiling pdbparse with a large file, such as ntoskrnl.pdb.
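
For anyone who wants to check where the time goes, a small profiling helper along these lines might be enough. It uses only the standard-library `cProfile` and `pstats` modules; `profile_call` is a hypothetical helper name, and the function to profile is passed in by the caller, so nothing here depends on pdbparse's API.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func(*args, **kwargs) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so the expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()
```

Passing in whatever function parses the PDB, together with the path to ntoskrnl.pdb, should show whether file reads and string slicing dominate the report.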


Runs slow. Anyone interested in improving performance? #43
