I just stumbled over this project. Excellent work!
IMO you should publish this in an academic paper, since many scholars (in DH and beyond) will be interested...
I have a few suggestions on making this more readily available to users. (And I'd be happy to at least partially volunteer, if you are interested.)
requirements.txt
To maximize utility for other users, it's necessary to minimize the assumptions/constraints of your explicit dependencies. Currently, that list pins the exact versions you happened to have installed when you ran pip freeze, but your actual requirements will almost always be broader. Also, it's not necessary to list transitive dependencies there (packages that your direct dependencies already pull in). E.g. instead of ...
h5py==2.8.0
numpy==1.15.1
keras==2.2.4
scipy==1.1.0
...you could just write keras<2.3.
But most of these are already required by Mask-RCNN anyway (see next point), so perhaps this list could be drastically reduced to just:
mask-rcnn>=2.1
pandas
requests
image
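As a rough illustration of loosening the pins, the sketch below turns pip freeze style lines into upper-bound constraints. The helper name `loosen_pin` and the "cap below the next minor release" policy are assumptions made up for this example, not something pip or this project prescribes; the right bounds depend on which versions have actually been tested.

```python
def loosen_pin(line):
    """Turn an exact pin such as 'keras==2.2.4' into 'keras<2.3'.

    Hypothetical helper: capping below the next minor release is an
    assumed policy for illustration. Lines that are not exact pins are
    returned unchanged.
    """
    line = line.strip()
    if "==" not in line:
        return line
    name, version = line.split("==", 1)
    parts = version.split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        # Unusual version string: drop the pin rather than guess a bound.
        return name
    major, minor = int(parts[0]), int(parts[1])
    return f"{name}<{major}.{minor + 1}"


# The pins quoted above, as produced by pip freeze.
pinned = ["h5py==2.8.0", "numpy==1.15.1", "keras==2.2.4", "scipy==1.1.0"]
loosened = [loosen_pin(p) for p in pinned]
print(loosened)
```

For the keras line this yields the same `keras<2.3` constraint suggested above; whether the other packages need to be listed at all is a separate question.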
Mask-RCNN redistribution
It's not good practice to dump an open source repo's code into your own, for several reasons:
- you lose its history and source annotation (you have to dig it up externally)
- you lose upstream changes/fixes (you have to sync them in manually)
- you lose the possibility to (conveniently) contribute your own changes to upstream
Instead, you should make mask-rcnn an external dependency, like the others. If you do need to make changes of your own, then do that in a GitHub fork, which you can then either integrate here as a git submodule or reference in requirements.txt via the GH URL of your fork.
Currently, IIUC, the only things you added are your custom.py and the Jupyter notebooks, right?
separate code and data
You (thankfully) provided all your raw and intermediate data files along with the converter scripts and tools, so one can reproduce your work. But for reuse it's usually better to strictly separate data and code files. At least in the directory structure, but preferably also with distinct repos (with the code repo pointing to the data repo only via a submodule).
distribute on PyPI
The most straightforward and visible distribution option for Python projects is the Python Package Index. To publish there, all you have to do is register for an account, bring your repo into shape (following the Python packaging guidelines: most notably, create a setup.py that declares entry_points and takes its dependencies from requirements.txt), then build source and wheel distributions (e.g. with python setup.py sdist bdist_wheel) and use twine to upload them.
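To make the setup.py suggestion a bit more concrete, here is a minimal sketch of the pieces such a file could be built from. All names in it (`example-package`, `example_package`, `example-tool`, `example_package.cli:main`, the version number) are placeholders I made up, not the project's actual layout, and `read_requirements` is a hypothetical helper that only handles plain specifier lines.

```python
from pathlib import Path


def read_requirements(path="requirements.txt"):
    """Return the requirement specifiers listed in a requirements file.

    Simplified on purpose: blank lines, comment lines, and pip option
    lines (such as '-r other.txt') are skipped; everything else is
    passed through unchanged.
    """
    requirements = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements


def setup_kwargs(requirements_path="requirements.txt"):
    """Collect the keyword arguments a setup.py would pass to setup()."""
    return {
        "name": "example-package",        # placeholder distribution name
        "version": "0.1.0",               # placeholder version
        "packages": ["example_package"],  # placeholder import package
        "install_requires": read_requirements(requirements_path),
        "entry_points": {
            # Installs a command-line tool that calls the given function.
            "console_scripts": ["example-tool = example_package.cli:main"],
        },
    }
```

In an actual setup.py these would be followed by `from setuptools import setup` and `setup(**setup_kwargs())`, so that requirements.txt stays the single place where dependencies are declared.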