Repository for UML-English data

This repository contains the data used for "Extraction of UML Class Diagrams from Natural Language Specification" (Yang et al. 2022)

For the implementation, check out https://github.com/songyang-dev/uml-translation-3step.

Getting the dataset

To get the entire dataset, you must download the release containing dataset.tar.gz.

It is too big to be directly committed to git. Find the most recent version in the Releases section (https://github.com/songyang-dev/uml-classes-and-specs/releases).

Structure of the dataset

dataset.tar.gz: archive that contains all the following files. Available in the Releases section of this repo.

Important parts of the dataset:

fragments.csv: file that lists UML fragments and their characteristics
labels.csv: file that contains the labels received in the crowdsourcing effort
models.csv: file that lists UML class diagrams and their characteristics
zoo/: folder that contains all the UML data itself, such as pictures and UML encodings. Both labeled and unlabeled data are present. Only 5-10% of the UML are labeled.

Making use of the dataset

Unzip the tarball first.

Opening the image of a certain UML model

Open models.csv to read the list of available models. Copy its name and search in the zoo/ folder for .png files starting with that name. For example, the ACME model has an image in the zoo/ folder called ACME.png.

ls zoo/ACME.png
code zoo/ACME.png # any other image visualizer

Opening the image of a certain fragment

Fragment files are named in the following pattern.

Class fragments:

(ModelName)_(class)(number).png

Relationship fragments:

(ModelName)_(rel)(number).png

Similarly, you can visualize them.

code zoo/CFG_class0.png

Finding the image of a fragment starting from a label

Browse through labels.csv and find the line that has the label of interest.
Every label has a fragment_id, which can be indexed in fragments.csv. Find the ID for the label of interest.
Inside fragments.csv, search for the line where the column value of unique_id equals fragment_id from Step 2.
Proceed like in the previous section.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
fragments.csv		fragments.csv
labels.csv		labels.csv
models.csv		models.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Repository for UML-English data

Getting the dataset

Structure of the dataset

Making use of the dataset

Opening the image of a certain UML model

Opening the image of a certain fragment

Finding the image of a fragment starting from a label

About

Uh oh!

Releases 1

Packages

Uh oh!

License

songyang-dev/uml-classes-and-specs

Folders and files

Latest commit

History

Repository files navigation

Repository for UML-English data

Getting the dataset

Structure of the dataset

Making use of the dataset

Opening the image of a certain UML model

Opening the image of a certain fragment

Finding the image of a fragment starting from a label

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Packages