Wikipedia Glove Experiments

This repository contains the experiments detailed in "Scalable Polyadic Queries" for SISAP2024.

Running Experiments

Generating Near Neighbour Graphs

Navigate and run "src/python/make_sed_nns.py" in the Python folder. This will build the near neighbour table using the SED metric. The path within the program will need to be changed to point to a .MAT version of 100D Wikipedia GloVe. This can be produced by opening the raw txt downloadable at https://nlp.stanford.edu/projects/glove/ in Matlab and saving as a .MAT file.

Running this file will create three files: wiki_glove_sm_t10, wiki_glove_sed_nns_sm_t10_dists.txt and wiki_glove_sed_nns_sm_t10_indices.txt. These should be placed in the data folder. All other necessary files are in this folder. A description of these files and their providence follows:

en_thesaurus.jsonl is the list of Wordnet synonyms as downloaded from https://github.com/zaibacu/thesaurus. This list can be extracted from the 5th entry in the JSONL file. Only words with 3+ synonyms, all of which were in the Oxford 5000 (json file obtained from https://github.com/tyypgzl/Oxford-5000-words), were retained from the Wordnet set. The results of this process are stored in filtered_syns.json.

Running Timing Experiments

To run the timing experiments, run the main method found in uk.ac.standrews.cs.descent.sed.RunMSEDExperiments, ensuring that all data files are in the data folder as described above. This will create a file on the top level called "synonym_results.json", which contains details reported upon in the paper of each experiment.

Our version of the data is contained in synonym_results.json.

Rebuilding Results Graphs

Run "src/python/analyse_results.ipynb". All graphs will be regenerated from the JSON file produced in the above step.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.idea		.idea
data		data
src		src
.gitignore		.gitignore
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Wikipedia Glove Experiments

Running Experiments

Generating Near Neighbour Graphs

Running Timing Experiments

Rebuilding Results Graphs

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

MetricSearch/SISAP2024PolyadicQuery

Folders and files

Latest commit

History

Repository files navigation

Wikipedia Glove Experiments

Running Experiments

Generating Near Neighbour Graphs

Running Timing Experiments

Rebuilding Results Graphs

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages