This repository contains the data used for "Extraction of UML Class Diagrams from Natural Language Specification" (Yang et al. 2022).
For the implementation, check out https://github.com/songyang-dev/uml-translation-3step.
To get the entire dataset, you must download the release containing dataset.tar.gz.
It is too big to be directly committed to git. Find the most recent version in the Releases section (https://github.com/songyang-dev/uml-classes-and-specs/releases).
dataset.tar.gz: archive that contains all the following files. Available in the Releases section of this repo.
Important parts of the dataset:
- fragments.csv: file that lists UML fragments and their characteristics
- labels.csv: file that contains the labels received in the crowdsourcing effort
- models.csv: file that lists UML class diagrams and their characteristics
- zoo/: folder that contains all the UML data itself, such as pictures and UML encodings. Both labeled and unlabeled data are present. Only 5-10% of the UML data is labeled.
Unzip the tarball first.
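The extraction step can also be scripted. The following is a minimal Python sketch, assuming dataset.tar.gz has already been downloaded from the Releases section. The function name `extract_dataset` and its default arguments are illustrative and not part of this repository.

```python
import tarfile
from pathlib import Path


def extract_dataset(archive="dataset.tar.gz", dest="."):
    """Extract the gzip-compressed tarball into dest; return the member names."""
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Use the safer "data" extraction filter when this Python version has it.
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest_path, filter="data")
        else:
            tar.extractall(dest_path)
    return names
```

Calling `extract_dataset()` from the folder that holds the download does roughly what `tar -xzf dataset.tar.gz` does on the command line.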
Open models.csv to read the list of available models. Copy a model's name and search the zoo/ folder for .png files whose names start with it. For example, the ACME model has an image in the zoo/ folder called ACME.png.
ls zoo/ACME.png
code zoo/ACME.png # or any other image viewer
Fragment files are named in the following pattern.
Class fragments:
(ModelName)_(class)(number).png
Relationship fragments:
(ModelName)_(rel)(number).png
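The naming pattern above can be turned into a small helper. This is a sketch only: it assumes that `class` and `rel` appear literally in the file names (as in CFG_class0.png), and the function names `fragment_path` and `list_fragments` are made up for illustration.

```python
import re
from pathlib import Path


def fragment_path(model_name, kind, number, zoo_dir="zoo"):
    """Build the expected image path of a fragment, e.g. zoo/CFG_class0.png."""
    if kind not in ("class", "rel"):
        raise ValueError("kind must be 'class' or 'rel'")
    return Path(zoo_dir) / f"{model_name}_{kind}{number}.png"


def list_fragments(model_name, zoo_dir="zoo"):
    """Return the fragment images in zoo_dir that belong to one model."""
    # Assumed pattern: <ModelName>_class<number>.png or <ModelName>_rel<number>.png
    pattern = re.compile(rf"{re.escape(model_name)}_(class|rel)\d+\.png")
    return sorted(p for p in Path(zoo_dir).iterdir() if pattern.fullmatch(p.name))
```

For example, `fragment_path("CFG", "class", 0)` yields the path `zoo/CFG_class0.png`, and `list_fragments("CFG")` lists whichever CFG fragment images are actually present in the extracted zoo/ folder.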
Similarly, you can visualize them.
code zoo/CFG_class0.png
To find the fragment that a label belongs to:
1. Browse through labels.csv and find the line that has the label of interest.
2. Every label has a fragment_id, which can be looked up in fragments.csv. Note the fragment_id of the label of interest.
3. Inside fragments.csv, search for the line where the unique_id column equals the fragment_id from Step 2.
4. Proceed as in the previous section.
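The lookup steps above can be sketched in Python with the standard `csv` module. Only the column names `fragment_id` (in labels.csv) and `unique_id` (in fragments.csv) come from this README. Because the name of the column holding the label text is not stated here, the sketch searches every column for the text, and the function names `find_labels` and `find_fragment` are hypothetical.

```python
import csv


def find_labels(labels_csv, text):
    """Return the labels.csv rows in which any column contains the given text."""
    needle = text.lower()
    with open(labels_csv, newline="", encoding="utf-8") as f:
        return [
            row for row in csv.DictReader(f)
            if any(isinstance(v, str) and needle in v.lower() for v in row.values())
        ]


def find_fragment(fragments_csv, fragment_id):
    """Return the fragments.csv row whose unique_id equals fragment_id, or None."""
    with open(fragments_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            if row["unique_id"] == str(fragment_id):
                return row
    return None
```

A typical use would be to call `find_labels("labels.csv", "some phrase")`, take the `fragment_id` of the matching row, and pass it to `find_fragment("fragments.csv", ...)`. The returned row can then be used to locate the image as in the previous section.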