repo structure and packaging/distribution #15

I just stumbled over this project. Excellent work! 

IMO you should publish this in an academic paper, since many scholars (in DH and beyond) will be interested...

I have a few suggestions on making this more readily available to users. (And I'd be happy to at least partially volunteer, if you are interested.)

#### requirements.txt

To maximize utility for other users, you should keep the constraints on your explicit dependencies as loose as possible. Currently, that list requires the exact versions you happened to have installed when you ran `pip freeze`, but your actual requirements will almost always be broader. Also, there is no need to list indirect dependencies there (packages that your direct dependencies already pull in). E.g. instead of ...

```
h5py==2.8.0
numpy==1.15.1
keras==2.2.4
scipy==1.1.0
```
...you could just write `keras<2.3`.
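To make the difference concrete, here is a deliberately simplified sketch of how pip-style clauses like `==2.2.4` and `<2.3` are evaluated. It only handles purely numeric versions; real tools implement the full PEP 440 rules (e.g. `packaging.specifiers.SpecifierSet` from the `packaging` library). The functions `parse_version` and `satisfies` are defined here for illustration only:

```python
import operator

# Operators supported by this simplified sketch. Two-character operators
# come first so that "<=" is not mistaken for "<".
_OPERATORS = [
    ("==", operator.eq),
    ("<=", operator.le),
    (">=", operator.ge),
    ("<", operator.lt),
    (">", operator.gt),
]


def parse_version(version):
    """Turn a purely numeric version like "2.2.0" into a comparable tuple.

    Trailing zeros are dropped so that "2.2" and "2.2.0" compare equal.
    """
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def satisfies(version, specifier):
    """Check a version against comma-separated clauses such as ">=2.1,<2.3"."""
    candidate = parse_version(version)
    for clause in specifier.split(","):
        clause = clause.strip()
        for symbol, compare in _OPERATORS:
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):].strip())
                if not compare(candidate, bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# An exact pin accepts exactly one release; a range also accepts later bugfixes.
print(satisfies("2.2.5", "==2.2.4"))  # False
print(satisfies("2.2.5", "<2.3"))     # True
```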

But most of these are already required by Mask-RCNN anyway (see next point), so perhaps this list could be drastically reduced to just:

```
mask-rcnn>=2.1
pandas
requests
image
```
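Whether a listed package is redundant can be checked against the metadata of an installed environment. The sketch below (the helper names are hypothetical; it uses `importlib.metadata` from the standard library, Python 3.8+) reports which entries are already declared as a requirement by another entry. It only looks one level deep and only at packages installed in the current environment:

```python
import re
from importlib import metadata


def normalize(requirement):
    """Extract the distribution name from a requirement string and normalize it.

    "Keras_Preprocessing>=1.0.5" -> "keras-preprocessing" (PEP 503 style).
    """
    name = re.split(r"[\s<>=!~;\[(@]", requirement.strip(), maxsplit=1)[0]
    return re.sub(r"[-_.]+", "-", name).lower()


def redundant_requirements(top_level):
    """Return entries of top_level that another listed package already requires.

    Packages that are not installed in the current environment are skipped.
    """
    declared = set()
    for entry in top_level:
        try:
            requires = metadata.requires(normalize(entry)) or []
        except metadata.PackageNotFoundError:
            continue
        for req in requires:
            requirement, _, marker = req.partition(";")
            if "extra" in marker:
                continue  # optional dependency, only installed with an extra
            declared.add(normalize(requirement))
    return [entry for entry in top_level if normalize(entry) in declared]


# e.g. in an environment where mask-rcnn and its dependencies are installed:
# print(redundant_requirements(["mask-rcnn", "keras", "numpy", "pandas"]))
```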

#### Mask-RCNN redistribution

It's not good practice to dump an open source repo's code into your own, for several reasons:
- you lose its history and source annotation (you have to dig it up externally)
- you lose upstream changes/fixes (you have to sync them in manually)
- you lose the possibility to (conveniently) contribute your own changes upstream

Instead, you should make [mask-rcnn](https://github.com/matterport/Mask_RCNN/) an external dependency, like the others. If you do need to make changes of your own, do that in a GitHub fork, which you can then either integrate here as a git submodule or reference in `requirements.txt` via the GitHub URL of your fork.
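For the `requirements.txt` route, a line like the following uses pip's support for PEP 508 direct references to install Mask-RCNN straight from a Git repository. This is a sketch: `YOUR-USER` and `YOUR-TAG-OR-COMMIT` are placeholders for your fork and the revision you want to pin (pinning a tag or commit keeps installs reproducible):

```
# requirements.txt
mask-rcnn @ git+https://github.com/YOUR-USER/Mask_RCNN.git@YOUR-TAG-OR-COMMIT
pandas
requests
image
```

Note that PyPI rejects uploads whose dependencies are given as direct URLs, so for a PyPI release the fork's changes would eventually have to go upstream or be published as a package of their own.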

Currently, IIUC, the only changes you made are adding your `custom.py` and the Jupyter notebooks, right?

#### separate code and data

You (thankfully) provided all your raw and intermediate data files along with the converter scripts and tools, so one can reproduce your work. But for reuse it's usually better to strictly separate data and code: at least in the directory structure, but preferably also in distinct repos (with the code repo referencing the data repo only as a submodule).
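One way to keep the code agnostic of where the data repo is checked out is to resolve every data path through a single helper. A minimal sketch, assuming a hypothetical environment variable `PROJECT_DATA_DIR` and a hypothetical default submodule directory named `data`:

```python
import os
from pathlib import Path

# Hypothetical environment variable pointing at the data repository, so the
# code never hard-codes where that checkout lives.
DATA_DIR_ENV = "PROJECT_DATA_DIR"

# Hypothetical fallback: the data repo checked out as a submodule named "data".
DEFAULT_DATA_DIR = Path("data")


def data_path(*parts, environ=None):
    """Build a path inside the data repository, e.g. data_path("raw", "page1.jpg")."""
    env = os.environ if environ is None else environ
    root = Path(env.get(DATA_DIR_ENV, DEFAULT_DATA_DIR))
    return root.joinpath(*parts)
```

Users who cloned the data repo somewhere else then only need to set the environment variable instead of editing scripts.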

#### distribute on PyPI

The most straightforward and visible distribution option for Python projects is the [Python Package Index](https://pypi.org/). To publish there, all you have to do is register for an account, bring your repo into shape (following the [Python packaging guidelines](https://packaging.python.org/tutorials/installing-packages/); most notably, create a `setup.py` with `entry_points` that takes its `install_requires` from `requirements.txt`), build the distribution files (e.g. with `python setup.py sdist bdist_wheel`), and upload them with `twine upload dist/*`.
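A minimal sketch of the `setup.py` part, assuming the placeholder names `your-project`, `your_package` and a `your_package.cli:main` function (none of which exist in the repo). It reuses `requirements.txt` for `install_requires` and declares a console-script entry point; the actual `setup()` call is shown in the closing comment so the helpers can be imported and tested without setuptools:

```python
import re


def parse_requirements(text):
    """Convert requirements.txt content into a list for install_requires.

    Blank lines, comments and pip-only options such as "-r dev.txt" or "-e ."
    are skipped, since setuptools does not understand them.
    """
    requirements = []
    for line in text.splitlines():
        # "#" starts a comment only at line start or after whitespace, so URL
        # fragments like "...git#egg=name" survive.
        line = re.split(r"(?:^|\s)#", line, maxsplit=1)[0].strip()
        if line and not line.startswith("-"):
            requirements.append(line)
    return requirements


def setup_kwargs(requirements_text):
    """Keyword arguments for setuptools.setup(); all names are placeholders."""
    return {
        "name": "your-project",  # hypothetical PyPI distribution name
        "version": "0.1.0",
        "packages": ["your_package"],  # hypothetical import package
        "install_requires": parse_requirements(requirements_text),
        "entry_points": {
            "console_scripts": [
                # Installs a "your-tool" command that calls your_package.cli.main()
                "your-tool = your_package.cli:main",
            ],
        },
    }


# The last lines of setup.py would then be:
#
#     from pathlib import Path
#     from setuptools import setup
#     setup(**setup_kwargs(Path(__file__).with_name("requirements.txt").read_text()))
```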