Runs slow. Anyone interested in improving performance? #43

@mrolle45

I don't want to take the time right now to submit performance enhancements, but perhaps @moyix or some other person reading this note would like to do the work.
I find that a tremendous amount of time is spent on file reads, string concatenations, and substring operations. I have seen two ways to speed things up, both of which would be simple to implement:

  1. In StreamFile class, cache the stream pages, so you only have to read them once from the file. Or better, if the platform supports mmap, just mmap the entire PDB file, create a buffer for it, and take a slice of the buffer for a stream page whenever you need it. In the non-mmap case, you could add a method to clear the cache, to be called, for instance, after parsing the entire stream.
  2. In StreamFile._read, see how many pages are spanned by the request. Use the above cache / mmap to get slices of individual pages. Return the slice, or a concatenation of two slices, or use cStringIO to assemble more than two slices. Using _read_pages is inefficient because then you have to take a slice of the result.
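
To make the two suggestions concrete, here is a rough sketch of what I have in mind. It is not pdbparse's actual code: `PagedStream`, `read_at`, and `map_file` are hypothetical names, and I am assuming a stream is described by a list of page numbers, a page size, and a total size. It is written for Python 3, so `io.BytesIO` stands in for cStringIO.

```python
import io
import mmap


def map_file(path):
    """Map a whole file read-only; the mapping stays valid after the file closes."""
    with open(path, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


class PagedStream:
    """Hypothetical stand-in for StreamFile, backed by one in-memory buffer."""

    def __init__(self, buf, pages, page_size, size):
        self.buf = memoryview(buf)  # works for an mmap object or plain bytes
        self.pages = pages          # page numbers of this stream, in order
        self.page_size = page_size
        self.size = size            # logical stream length in bytes

    def _page_slice(self, index, start, end):
        # Zero-copy view of bytes [start, end) within the index-th stream page.
        base = self.pages[index] * self.page_size
        return self.buf[base + start:base + end]

    def read_at(self, pos, length):
        """Return up to `length` bytes starting at stream offset `pos`."""
        end = min(pos + length, self.size)
        if pos >= end:
            return b""
        first, first_off = divmod(pos, self.page_size)
        last, last_off = divmod(end - 1, self.page_size)
        last_off += 1  # make the end offset exclusive
        if first == last:
            # Request fits in one page: a single slice.
            return self._page_slice(first, first_off, last_off).tobytes()
        if last == first + 1:
            # Two pages: concatenate two slices.
            head = self._page_slice(first, first_off, self.page_size)
            tail = self._page_slice(last, 0, last_off)
            return head.tobytes() + tail.tobytes()
        # More than two pages: assemble in a buffer instead of repeated "+".
        out = io.BytesIO()
        out.write(self._page_slice(first, first_off, self.page_size))
        for i in range(first + 1, last):
            out.write(self._page_slice(i, 0, self.page_size))
        out.write(self._page_slice(last, 0, last_off))
        return out.getvalue()
```

With something like this, a stream would be built as `PagedStream(map_file(path), pages, page_size, size)`. On a platform without mmap, the same class works if `buf` is the file contents read once into bytes, which is the caching variant of suggestion 1.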

I think this would eliminate most of the time spent in parsing a PDB as a whole. You could try profiling pdbparse with a large file, such as ntoskrnl.pdb.
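
For the profiling, something along these lines with the standard cProfile and pstats modules should be enough to see where the time goes. `profile_call` is just a helper name I made up, and the commented usage line assumes `pdbparse.parse` is the entry point you normally call.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile; return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time so expensive call chains show up first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, out.getvalue()


# Assumed usage (needs pdbparse installed and a local PDB file):
#   import pdbparse
#   pdb, report = profile_call(pdbparse.parse, "ntoskrnl.pdb")
#   print(report)
```

If the read and slicing functions dominate the cumulative column, that would confirm what I described above.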
