
repo structure and packaging/distribution #15

@bertsky

I just stumbled over this project. Excellent work!

IMO you should publish this in an academic paper, since many scholars (in DH and beyond) will be interested...

I have a few suggestions on making this more readily available to users. (And I'd be happy to at least partially volunteer, if you are interested.)

requirements.txt

To maximize utility for other users, it's necessary to minimize the assumptions/constraints of your explicit dependencies. Currently, that list requires the exact versions you happened to have installed when you ran pip freeze, but your actual requirements will almost always be broader. Also, it's not necessary to list transitive dependencies there (packages that your direct dependencies already pull in). E.g. instead of ...

h5py==2.8.0
numpy==1.15.1
keras==2.2.4
scipy==1.1.0

...you could just write keras<2.3.

But most of these are already required by Mask-RCNN anyway (see next point), so perhaps this list could be drastically reduced to just:

mask-rcnn>=2.1
pandas
requests
image
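
To illustrate the difference between the pinned list further up and a loosened one, here is a minimal sketch in plain Python. The function name `loosen` and its parameters are made up for this example and are not part of pip or any other tool. It keeps only the packages you declare as direct dependencies, drops the exact `==` pins, and optionally attaches an upper bound.

```python
def loosen(frozen_lines, direct, upper_bounds=None):
    """Turn `pip freeze` style lines into loose requirement lines.

    frozen_lines: iterable of strings such as "keras==2.2.4"
    direct:       set of lowercase names of packages imported directly
    upper_bounds: optional dict mapping lowercase name -> specifier, e.g. "<2.3"
    """
    upper_bounds = upper_bounds or {}
    result = []
    for line in frozen_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name = line.split("==", 1)[0].strip()
        if name.lower() not in direct:
            continue  # transitive dependency: let the direct ones pull it in
        result.append(name + upper_bounds.get(name.lower(), ""))
    return result


frozen = ["h5py==2.8.0", "numpy==1.15.1", "keras==2.2.4", "scipy==1.1.0"]
print(loosen(frozen, direct={"keras"}, upper_bounds={"keras": "<2.3"}))
# prints ['keras<2.3']
```

Which packages count as direct and which bounds are safe is still a judgment call that only the project authors can make; the sketch merely automates the mechanical part.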

Mask-RCNN redistribution

It's not good practice to dump an open source repo's code into your own, for several reasons:

  • you lose its history and source annotation (you have to dig it up externally)
  • you lose upstream changes/fixes (you have to sync them in manually)
  • you lose the possibility to (conveniently) contribute your own changes to upstream

Instead, you should make mask-rcnn an external dependency, like the others. If you do need to make changes of your own, then do that in a GitHub fork, which you can then either integrate here as a git submodule or reference in requirements.txt via the GitHub URL of your fork.

Currently, IIUC, the only changes you made are your custom.py and the Jupyter notebooks, right?

separate code and data

You (thankfully) provided all your raw and intermediate data files along with the converter scripts and tools, so one can reproduce your work. But for re-use it's usually better to strictly separate data and code files: at least in the directory structure, but preferably also in distinct repos (with the code repo referencing the data repo only as a submodule).

distribute on PyPI

The most straightforward and visible distribution option for Python projects is the Python Package Index (PyPI). To publish there, all you have to do is register for an account, bring your repo into shape (following the Python packaging guidelines; most notably, create a setup.py that defines entry_points and reads its dependencies from requirements.txt), then build the distribution files and upload them with twine.
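
As a rough sketch of what such a setup.py could look like: every name below (`my-project`, `my_project`, `my_project.cli:main`, the version) is a placeholder and would need to be replaced with the project's real names. The helper `read_requirements` is likewise hypothetical; it simply reuses the lines of requirements.txt as `install_requires`.

```python
# setup.py (sketch; all project-specific names are placeholders)
import sys
from pathlib import Path


def read_requirements(path="requirements.txt"):
    """Return the non-empty, non-comment lines of a requirements file."""
    file = Path(path)
    if not file.exists():
        return []
    lines = (line.strip() for line in file.read_text().splitlines())
    return [line for line in lines if line and not line.startswith("#")]


SETUP_KWARGS = dict(
    name="my-project",            # placeholder distribution name
    version="0.1.0",              # placeholder version
    packages=["my_project"],      # placeholder package directory
    install_requires=read_requirements(),
    entry_points={
        # installs a `my-project` command that calls my_project.cli.main()
        "console_scripts": ["my-project=my_project.cli:main"],
    },
)

# Only call setuptools when the script is run with a command (as pip and
# build tools do), so the file can also be executed or imported for inspection.
if __name__ == "__main__" and len(sys.argv) > 1:
    from setuptools import setup

    setup(**SETUP_KWARGS)
```

With a file like this in place, the distribution files can be built with `python -m build` (from the separate `build` package) or `python setup.py sdist bdist_wheel`, and then uploaded with `twine upload dist/*`.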
