[TMP-059] Grounded dialog models from observation of human-to-human conversation
(Semi)-automatic grounding of dialogue models through context embeddings
This microproject follows up on a previous one, aiming to design grounded dialogue models based on human-to-human conversation examples. Current dialogue models, which rely on fine-tuned large language models, lack grounding or require external, manually designed grounding. This project seeks to (semi)-automatically ground these models by embedding dialogue context into vector spaces through models trained on conversational data. We represent dialogue states as vectors, treating the entire conversation as a trajectory in vector space. By merging and modeling these trajectories, we can create dialogue skeletons, such as finite-state graphs, for tasks like data exploration, content visualization, and topic detection.
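As an illustration only (not the project's actual pipeline), the sketch below embeds dialogue turns with an off-the-shelf sentence encoder, clusters the embeddings into discrete dialogue states, and counts state-to-state transitions to obtain a finite-state dialogue skeleton; the encoder name, the toy dialogues, and the number of clusters are placeholder assumptions.

```python
# Minimal sketch: dialogue turns -> context embeddings -> discrete states -> transition graph.
from collections import Counter
from sentence_transformers import SentenceTransformer  # assumed off-the-shelf encoder
from sklearn.cluster import KMeans

dialogues = [  # toy examples, not project data
    ["i need a cheap hotel", "what area do you prefer?", "the city centre", "it is booked"],
    ["book a table for two", "which restaurant do you like?", "an italian place", "done, enjoy"],
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model
turns = [t for d in dialogues for t in d]
states = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(encoder.encode(turns))

# Re-assemble per-dialogue state trajectories and count transitions between states;
# the resulting edge counts define a finite-state "dialogue skeleton".
transitions, i = Counter(), 0
for d in dialogues:
    trajectory = states[i:i + len(d)]
    transitions.update(zip(trajectory, trajectory[1:]))
    i += len(d)
print(transitions)
```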
This approach aims to streamline the design of reliable conversation models while analyzing how human-to-human dialogues progress, including the negotiation of common ground. The project will explore design choices for dialogue context embeddings, such as their temporal resolution, as well as methods for merging dialogue trajectories. Variational Recurrent Neural Networks with discrete latent variables (Shi et al., NAACL 2019) will be a primary focus, though other architectures will also be considered.
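For the discrete-latent VRNN direction, a minimal single-step sketch is given below, loosely following the idea of Shi et al. (NAACL 2019); the layer sizes, the Gumbel-softmax relaxation, and the use of turn-embedding reconstruction are illustrative assumptions rather than the exact published architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteVRNNCell(nn.Module):
    """One recurrence step of a VRNN with a discrete latent dialogue state
    (relaxed with Gumbel-softmax); all dimensions are placeholder choices."""
    def __init__(self, x_dim=256, h_dim=256, n_states=10):
        super().__init__()
        self.prior = nn.Linear(h_dim, n_states)              # p(z_t | h_{t-1})
        self.posterior = nn.Linear(h_dim + x_dim, n_states)  # q(z_t | h_{t-1}, x_t)
        self.decoder = nn.Linear(h_dim + n_states, x_dim)    # reconstruct the turn embedding
        self.rnn = nn.GRUCell(x_dim + n_states, h_dim)

    def forward(self, x_t, h_prev, tau=1.0):
        prior_logits = self.prior(h_prev)
        post_logits = self.posterior(torch.cat([h_prev, x_t], dim=-1))
        z_t = F.gumbel_softmax(post_logits, tau=tau)          # relaxed one-hot dialogue state
        x_rec = self.decoder(torch.cat([h_prev, z_t], dim=-1))
        h_t = self.rnn(torch.cat([x_t, z_t], dim=-1), h_prev)
        # Per-step ELBO terms: reconstruction + KL(q || p) over the discrete state.
        recon = F.mse_loss(x_rec, x_t)
        kl = torch.sum(
            F.softmax(post_logits, -1)
            * (F.log_softmax(post_logits, -1) - F.log_softmax(prior_logits, -1)),
            dim=-1,
        ).mean()
        return h_t, z_t, recon + kl
```

Hard (one-hot) states taken at inference time can then be mapped onto a finite-state graph in the same way as the clustered states in the previous sketch.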
We will experiment with both text and voice dialogues, using the MultiWOZ corpus (EMNLP 2018) and its DIASER extension (LREC 2022) for text-based data, and the spoken MultiWOZ data and other voice-based corpora for spoken dialogue.
We took part in the JSALT 2023 workshop in Le Mans (https://jsalt2023.univ-lemans.fr/en/index.html), working full-time on task-oriented dialogue structure extraction and on realistic dialogue data and evaluation. The motivation is deploying AI agents in place of human agents in call centers, where calls follow similar patterns that currently go unanalyzed. The results include:
- A private dataset of dialogues that has not leaked into LLM training data
- A toolkit for training dialogue embeddings
- An analysis of multiple algorithms for extracting dialogue structure from data
- Shared audio and text representation models; because the audio representations are aligned to text embeddings, these make it possible to build a voice-based dialogue system implicitly, without a separate speech recognition module (a schematic alignment objective is sketched after this list)
- Long-context speech recognition (using dialogue context) based on LLMs with adapters
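To make the shared audio-text representation item above concrete, the sketch below shows a generic CLIP-style symmetric contrastive objective that pulls paired audio and text embeddings together; the two encoders are assumed to exist elsewhere, and this is not necessarily the exact training loss used in the workshop.

```python
import torch
import torch.nn.functional as F

def audio_text_alignment_loss(audio_emb: torch.Tensor, text_emb: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired (audio, text) embeddings,
    each of shape (batch, dim); the encoders producing them are assumed given."""
    a = F.normalize(audio_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = a @ t.T / temperature                      # pairwise cosine similarities
    targets = torch.arange(a.size(0), device=a.device)  # matching pairs lie on the diagonal
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2
```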
While most of this work has not yet been published, a detailed overview of the results is given in the final presentation video of the workshop.
Tangible Outcomes
- Burdisso et al. (EMNLP 2024 main conference): Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction. https://arxiv.org/abs/2410.18481
- A unified multi-domain dialogue dataset, introduced and released along with the Dialog2Flow paper (Burdisso et al., EMNLP 2024 main conference). HuggingFace page: https://huggingface.co/datasets/sergioburdisso/dialog2flow-dataset
- Source code released along with the Dialog2Flow paper, including tool-like scripts to automatically convert any collection of dialogs into a dialog flow: https://github.com/idiap/dialog2flow (available by Dec 2024)
- The code repository for long-context ASR is public: https://github.com/keya-dialog/LoCo-ASR/tree/main
- A Jupyter notebook tutorial on joint speech-text embeddings for spoken language understanding: https://github.com/keya-dialog/LoCo-ASR/tree/main
- A summer school presentation on dialogue and a tutorial on QLoRA fine-tuning of an LLM for dialogue were produced for the workshop and are available online; an illustrative QLoRA configuration sketch is given after this list.
- presentation slides:
- part 1 (on dialogue modelling): https://raw.githubusercontent.com/keya-dialog/jsalt-dialogue-lab/refs/heads/main/conv_ai_v6.pdf
- part 2 (on LLMs): https://raw.githubusercontent.com/keya-dialog/jsalt-dialogue-lab/refs/heads/main/llms_v6.pdf
- Tutorial code: https://github.com/keya-dialog/jsalt-dialogue-lab
- Summer school presentations at JSALT, and the final JSALT presentation video featuring a detailed description of all results.
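As referenced in the tutorial item above, the following is a hedged sketch of the kind of QLoRA fine-tuning recipe the tutorial covers; the base model name, target modules, and hyperparameters are placeholders rather than the workshop's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base LLM

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb, device_map="auto"
)

# Low-rank adapters on the attention projections; only these parameters are trained.
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# The adapted model can then be fine-tuned on dialogue turns with a standard
# causal-LM training loop (e.g. the transformers Trainer or TRL's SFTTrainer).
```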
Partners
- Brno U, Petr Schwarz
- CUNI, Ondrej Dusek
- Eötvös Loránd University (ELTE), Andras Lorincz
- IDIAP, Petr Motlicek