This repository provides a dataset and a framework to support comparative experimental studies of learning-based automated program repair.

The dataset is based on DeepBugs. The following files in the `create-dataset/` folder, which handle data extraction, are borrowed from DeepBugs and modified for this project:
- extractFromJS.js
- extractorOfBinOps.js
- extractorOfCalls.js
- fileIDs.json
- jsExtractionUtil.js
- Util.py

The dataset comprises three name-based bug patterns:
- Swapped function arguments
- Wrong binary operator
- Wrong operand in binary operation

Framework: a framework in which researchers can incorporate additional context and apply it to the existing dataset.
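To make the three name-based bug patterns listed earlier concrete, the sketch below seeds each kind of bug into a tiny hand-written representation of a JavaScript call or binary expression. The tuple layouts and the helper names (`swap_arguments`, `replace_operator`, `replace_operand`) are hypothetical and chosen for illustration only; they are not the repository's extraction format.

```python
from typing import List, Tuple

# Hypothetical, simplified representations (not the repository's format).
Call = Tuple[str, List[str]]   # (callee, arguments), e.g. setTimeout(callback, delay)
BinOp = Tuple[str, str, str]   # (left operand, operator, right operand), e.g. i < length


def swap_arguments(call: Call) -> Call:
    """Swapped function arguments: exchange the first two arguments."""
    callee, args = call
    if len(args) < 2:
        raise ValueError("need at least two arguments to swap")
    return callee, [args[1], args[0]] + list(args[2:])


def replace_operator(expr: BinOp, wrong_op: str) -> BinOp:
    """Wrong binary operator: keep both operands, change the operator."""
    left, _, right = expr
    return left, wrong_op, right


def replace_operand(expr: BinOp, wrong_right: str) -> BinOp:
    """Wrong operand in binary operation: keep the operator, change the right operand."""
    left, op, _ = expr
    return left, op, wrong_right


def render_call(call: Call) -> str:
    """Render a call tuple as JavaScript-like source text."""
    callee, args = call
    return f"{callee}({', '.join(args)})"


def render_binop(expr: BinOp) -> str:
    """Render a binary expression tuple as JavaScript-like source text."""
    return " ".join(expr)


if __name__ == "__main__":
    call = ("setTimeout", ["callback", "delay"])
    expr = ("i", "<", "length")
    print(render_call(call), "->", render_call(swap_arguments(call)))
    print(render_binop(expr), "->", render_binop(replace_operator(expr, "<=")))
    print(render_binop(expr), "->", render_binop(replace_operand(expr, "height")))
```

Each helper returns the buggy variant and leaves the input untouched, so a (fixed, buggy) pair can be kept side by side.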
Install conda:

    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O ~/miniconda.sh
    bash ~/miniconda.sh -b -p $HOME/miniconda
    conda init
    source ~/miniconda/bin/activate
    source ~/.bash_profile
Set up the environment as follows.

Create and activate the conda environment:

    conda create -n context_ml python=3.6
    conda activate context_ml
    conda install -y python=3.6
Install tensorflow using `pip`:

    pip install tensorflow==1.5
Install tensorflow using `conda`:

    conda install -y -c conda-forge tensorflow=1.5.1
Required `python` packages for embedding generation:

    conda install -c anaconda nltk
    conda install -c anaconda gensim

Detailed: install `nltk`, then download the `punkt` tokenizer from within Python:

    conda install -c anaconda nltk

    import nltk
    nltk.download('punkt')
    [nltk_data] Downloading package punkt to /Users/UserName/nltk_data...
    [nltk_data] Unzipping tokenizers/punkt.zip.
Install `nvm` and `npm`:

    curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.35.3/install.sh | bash
    source ~/.bashrc
    nvm install --lts
    nvm use --lts
Install node dependencies:

    cd src
    npm install
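After the installation steps, a quick check that the expected tools are reachable can save a failed run later. The following is a minimal sketch, not part of the repository: the function names and the choice of tools to check are assumptions, and `nvm` is left out because it is loaded as a shell function rather than an executable on `PATH`.

```python
import shutil
import sys
from typing import Dict, Iterable

# Tools the setup steps are expected to put on PATH (an assumption for this sketch).
REQUIRED_TOOLS = ("conda", "node", "npm")


def find_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, bool]:
    """Map each tool name to True if an executable with that name is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


def python_matches(major: int = 3, minor: int = 6) -> bool:
    """Return True if the running interpreter has the given major.minor version."""
    return sys.version_info[:2] == (major, minor)


if __name__ == "__main__":
    for tool, found in find_tools().items():
        print(f"{tool:6s} {'found' if found else 'MISSING'}")
    running = sys.version.split()[0]
    print("python 3.6:", "yes" if python_matches() else f"no (running {running})")
```

Run it inside the activated `context_ml` environment so that the interpreter check reflects the environment created above.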
- Download dataset

| ID | Buggy | Fixed | Dataset Generation Script |
|---|---|---|---|
| 1a | Word tokenization | Word tokenization | data_word_level_esprima.sh |
| 1b | Word tokenization Enhanced | Word tokenization Enhanced | data_word_level_word_tokenization_enhanced.sh |
| 2 | DeepBugs Representation | DeepBugs Representation | data_word_level_deepbugs.sh |
| 3 | DeepBugs Representation (with Types Incomplete with variable value) | DeepBugs Representation (with Types Incomplete with variable value) | data_word_level_deepbugs_with_type_and_variable.sh |
| 4 | DeepBugs Representation (with Types Incomplete without variable value) | DeepBugs Representation (with Types Incomplete without variable value) | data_word_level_deepbugs_with_type.sh |
| 5 | Code Simplification (Signatures) | Code Simplification (Signatures) | data_word_level_synthesized.sh |
| 6 | Code Simplification (Signatures with position anchor) | Code Simplification (Signatures with position anchor) | data_word_level_synthesized_with_anchor.sh |
| 7 | Code Simplification (Signatures with LIT/ID) | Code Simplification (Signatures with LIT/ID) | data_word_level_synthesized_with_ID_LIT.sh |
| 8 | Code Simplification (Signatures with position anchor and LIT/ID) | Code Simplification (Signatures with position anchor and LIT/ID) | data_word_level_synthesized_with_anchor_with_ID_LIT.sh |
| 9 | AST (of original code) | AST (of original code) | data_word_level_ast.sh |
| 10 | AST (of code simplification -> Type with variable value) | AST (of code simplification -> Type with variable value) | data_word_level_synthesized_with_variable_ast.sh |
| 11 | AST (of code simplification -> Types without variable value) | AST (of code simplification -> Types without variable value) | data_word_level_synthesized_without_variable_ast.sh |
| 12 | Preorder AST (of original code) | Preorder AST (of original code) | prepare_calls_ast_preorder.sh |
| 13 | Abstraction - Tufano | Abstraction - Tufano | prepare_calls_abstraction.sh |

| Buggy | Fixed | Dataset Generation Script |
|---|---|---|
| Code Simplification (function signatures with LIT/ID) | AST (of code simplification -> Types without variable value) | data_word_level_synthesized_with_ID_LIT_to_data_word_level_synthesized_without_variable_ast.sh |
| AST (of code simplification -> Types without variable value) | Code Simplification (function signatures with LIT/ID) | data_word_level_synthesized_without_variable_ast_to_data_word_level_synthesized_with_ID_LIT.sh |
| Word tokenization | AST | data_word_level_esprima_to_data_word_level_ast.sh |
| AST | Word tokenization | data_word_level_ast_to_data_word_level_esprima.sh |

| Buggy | Fixed | Dataset Generation Script |
|---|---|---|
| Abstraction - Tufano | Code Simplification (Signatures with position anchor) | prepare_calls_tufano_abstraction_to_code_simplification_signatures_with_position_anchor.sh |
| Code Simplification (Signatures with position anchor) | Abstraction - Tufano | prepare_code_simplification_signatures_with_position_anchor_to_calls_tufano_abstraction.sh |

| Embedding | Script |
|---|---|
| word2Vec-CBOW | getEmbeddings.sh |
| word2Vec-SkipGram | get-embedding-final-skipgram.sh |
| fastText | get-embedding-fasttext-final.sh |
| GloVe | `cd glove && make && getEmbeddings_glove.sh` |

| Embedding | Script |
|---|---|
| word2Vec-SkipGram | prepare_calls_abstraction_word2vec_skipgram.sh |
| fastText | prepare_calls_abstraction_fasttext.sh |
| GloVe | Run `prepare_calls_abstraction_glove.sh`, then `cd GloVe && make && getEmbeddings_glove.sh`. Finally, run `./train-final-save-log.sh`. |
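The two word2vec variants in the embedding tables differ in what they predict: CBOW predicts a token from its surrounding context, while skip-gram predicts each context token from the centre token. The sketch below builds the training examples each variant would see for one tokenized line. The function names and the sample token sequence are illustrative assumptions, and the repository's scripts may configure their embedding library differently.

```python
from typing import List, Tuple


def context_of(tokens: List[str], i: int, window: int) -> List[str]:
    """Tokens within `window` positions of index i, excluding tokens[i] itself."""
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: one example per token, (context tokens -> centre token)."""
    return [(context_of(tokens, i, window), tok) for i, tok in enumerate(tokens)]


def skipgram_examples(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: one example per (centre token -> single context token) pair."""
    return [
        (tok, ctx)
        for i, tok in enumerate(tokens)
        for ctx in context_of(tokens, i, window)
    ]


if __name__ == "__main__":
    line = ["setTimeout", "(", "callback", ")"]  # hypothetical tokenized call
    for context, centre in cbow_examples(line, window=1):
        print("CBOW     ", context, "->", centre)
    for centre, context_token in skipgram_examples(line, window=1):
        print("skip-gram", centre, "->", context_token)
```

Skip-gram produces more, smaller examples from the same line, which is one reason the two variants can behave differently on rare tokens.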

To calculate accuracy and rank for a model's output:

    python calculate_accuracy_and_rank.py test.correct test.buggy model.output
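As a rough illustration of what accuracy and rank can mean for a repair model, the sketch below scores ranked candidate fixes against reference fixes held in memory. It is a sketch under stated assumptions, not the logic of `calculate_accuracy_and_rank.py`: the function names, the idea of a best-first candidate list per test case, and whitespace-insensitive matching are all assumptions, and the actual script may define the metrics and file formats differently.

```python
from typing import List, Optional, Sequence


def rank_of(correct: str, candidates: Sequence[str]) -> Optional[int]:
    """1-based position of the first candidate matching `correct`, or None.

    Matching ignores differences in whitespace between tokens (an assumption).
    """
    target = correct.split()
    for position, candidate in enumerate(candidates, start=1):
        if candidate.split() == target:
            return position
    return None


def top1_accuracy(corrects: Sequence[str], candidate_lists: Sequence[Sequence[str]]) -> float:
    """Fraction of test cases whose best-ranked candidate equals the reference fix."""
    if len(corrects) != len(candidate_lists):
        raise ValueError("one candidate list is required per reference fix")
    if not corrects:
        return 0.0
    hits = sum(
        1 for correct, candidates in zip(corrects, candidate_lists)
        if rank_of(correct, candidates) == 1
    )
    return hits / len(corrects)


if __name__ == "__main__":
    # Hypothetical reference fixes and ranked model candidates.
    references: List[str] = ["a + b", "f ( x , y )"]
    predictions: List[List[str]] = [
        ["a + b", "a - b"],
        ["f ( y , x )", "f ( x , y )"],
    ]
    print("ranks:", [rank_of(r, p) for r, p in zip(references, predictions)])
    print("top-1 accuracy:", top1_accuracy(references, predictions))
```

Reporting the rank alongside accuracy shows how close a model came when its first candidate was wrong.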