A temporary repository for our KDD 2023 paper, "Learning to Relate to Previous Turns in Conversational Search".
Main packages:
- python 3.8
- torch 1.8.1
- transformers 4.2.0
- numpy 1.22
- faiss-gpu 1.7.2
- pyserini 0.16
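A quick way to compare an environment against the versions listed above is sketched below. It is only an illustration, not part of this repository: the helper name `find_mismatches` is hypothetical, and it assumes the packages are installed under the distribution names `torch`, `transformers`, `numpy`, `faiss-gpu`, and `pyserini`.

```python
from importlib import metadata

# Versions taken from the package list above. The distribution names are
# assumptions about how the packages were installed with pip.
PINNED = {
    "torch": "1.8.1",
    "transformers": "4.2.0",
    "numpy": "1.22",
    "faiss-gpu": "1.7.2",
    "pyserini": "0.16",
}


def find_mismatches(pinned, lookup=metadata.version):
    """Return {name: installed_version_or_None} for packages that differ.

    A pin such as "1.22" accepts "1.22" and any "1.22.x" release.
    Local build suffixes such as "+cu111" are ignored in the comparison.
    """
    problems = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            problems[name] = None  # package is not installed
            continue
        base = found.split("+")[0]
        if not (base == wanted or base.startswith(wanted + ".")):
            problems[name] = found
    return problems


if __name__ == "__main__":
    for name, found in find_mismatches(PINNED).items():
        print(f"{name}: expected {PINNED[name]}, found {found}")
```

Running the script prints one line per package that is missing or has a different version, and prints nothing if the environment matches.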
We take the TopiOCQA dataset as an example. The same steps apply to the remaining datasets.
The four public datasets can be downloaded from QReCC, TopiOCQA, CAsT-19, and CAsT-20. For data preprocessing, refer to the files ending with "_raw" in the "preprocess" folder.
First, generate the qrel file with the function "create_label_rel_turn" in "preprocess_topiocqa.py".
Second, generate the pseudo relevant labels (PRL) by running:
python test_rel_topiocqa.py --config=Config/test_rel_topiocqa.toml
The output file "(train)dev_rel_label.json" contains the PRL for each turn.
Train the selector on the pseudo relevant training data generated in step 2:
python train_selector.py --config=Config/train_selector.toml
The trained selector can then be used to judge turn relevance for an off-the-shelf retriever.
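To illustrate what turn relevance judgment can look like on the retriever side, here is a minimal sketch of one plausible use: the current query is expanded only with the previous turns marked as relevant. This is an assumption for illustration, not the code in this repository. The function name `expand_query`, the 0/1 label format, the `[SEP]` separator, and the ordering of turns are all hypothetical.

```python
from typing import Sequence


def expand_query(
    current: str,
    history: Sequence[str],
    relevant: Sequence[int],
    sep: str = " [SEP] ",
) -> str:
    """Join the current query with the history turns flagged as relevant.

    `relevant[i]` is a hypothetical 0/1 judgment for `history[i]`.
    Turns flagged 0 are dropped before the text reaches the retriever.
    """
    if len(history) != len(relevant):
        raise ValueError("history and relevant must have the same length")
    kept = [turn for turn, flag in zip(history, relevant) if flag]
    return sep.join([current] + kept)


if __name__ == "__main__":
    history = ["who is Marie Curie?", "what is the weather today?"]
    # The second turn is judged irrelevant and is left out of the query.
    print(expand_query("where was she born?", history, [1, 0]))
```

The resulting string would then be encoded by the retriever in place of the full conversation history.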
python gen_tokenized_doc.py --config=Config/gen_tokenized_doc.toml
python gen_doc_embeddings.py --config=Config/gen_doc_embeddings.toml
Download the ANCE model.
Change the config to use ANCE as the backbone and set "use_PRL" to "True".
python test_topiocqa.py --config=Config/test_topiocqa.toml
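Since the evaluation script is run more than once with different "use_PRL" settings, the config change can be scripted instead of edited by hand. The sketch below rewrites a single `key = value` line in TOML text. It is a hedged example: the helper `set_toml_key` is hypothetical, it assumes "use_PRL" sits on its own line in "Config/test_topiocqa.toml", and the `model` key in the sample is invented for the demo. Check how the value is written in the actual config (for example `true` versus `"True"`) before choosing the literal.

```python
import re


def set_toml_key(text: str, key: str, literal: str) -> str:
    """Replace the value of every `key = ...` line with `literal`.

    `literal` is inserted verbatim, so pass valid TOML (e.g. 'true').
    Anything after the '=' on that line, including a trailing comment,
    is overwritten. Raises KeyError if the key is not present.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*).*$", re.MULTILINE
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + literal, text)
    if count == 0:
        raise KeyError(f"{key!r} not found in config text")
    return new_text


if __name__ == "__main__":
    # Hypothetical config content, used only to demonstrate the helper.
    sample = 'model = "ance"\nuse_PRL = false\n'
    print(set_toml_key(sample, "use_PRL", "true"))
```

To apply it to a real file, read the config text, pass it through `set_toml_key`, and write the result back before launching the evaluation.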
Train with both the pseudo relevant training data generated in step 2 and the conversational search data:
python train_selector_ranking.py --config=Config/train_selector_ranking.toml
Change the config to use the S-R model trained in step 6 as the backbone and set "use_PRL" to "False".
python test_topiocqa.py --config=Config/test_topiocqa.toml
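The commands above can also be collected in a small driver script, which is convenient for checking the order before launching long jobs. The sketch below is not part of this repository. The script and config paths are copied from the steps above, while `build_command` and `run_steps` are hypothetical helpers. It does not edit any config, so the manual changes (backbone and "use_PRL") still have to be made between the two evaluation runs.

```python
import subprocess
import sys

# (script, config) pairs in the order given in this README.
STEPS = [
    ("test_rel_topiocqa.py", "Config/test_rel_topiocqa.toml"),
    ("train_selector.py", "Config/train_selector.toml"),
    ("gen_tokenized_doc.py", "Config/gen_tokenized_doc.toml"),
    ("gen_doc_embeddings.py", "Config/gen_doc_embeddings.toml"),
    ("test_topiocqa.py", "Config/test_topiocqa.toml"),
    ("train_selector_ranking.py", "Config/train_selector_ranking.toml"),
    ("test_topiocqa.py", "Config/test_topiocqa.toml"),
]


def build_command(script, config, python=sys.executable):
    """Return the argv list for one step, using the current interpreter."""
    return [python, script, f"--config={config}"]


def run_steps(steps, dry_run=True):
    """Print each command and, unless dry_run is set, execute it.

    Execution stops at the first failing step because check=True makes
    subprocess.run raise CalledProcessError on a non-zero exit code.
    """
    commands = [build_command(script, config) for script, config in steps]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run by default: only prints the commands.
    # Pass a slice such as STEPS[:4] to run the pipeline in stages.
    run_steps(STEPS, dry_run=True)
```

Keeping `dry_run=True` as the default is a deliberate choice here, since the evaluation steps depend on config edits that the script does not perform.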
@inproceedings{mo2023learning,
title={Learning to Relate to Previous Turns in Conversational Search},
author={Mo, Fengran and Nie, Jian-Yun and Huang, Kaiyu and Mao, Kelong and Zhu, Yutao and Li, Peng and Liu, Yang},
booktitle={29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD)},
year={2023}
}