We propose a module that performs automatic analysis of user input in spoken dialogue systems using machine learning algorithms. The input to the module is material received from the speech recogniser and the dialogue manager of the spoken dialogue system, the output is a four-level pragmatic-semantic representation of the user utterance. Our investigation shows that when the four interpretation levels are combined in a complex machine learning task, the performance of the module is significantly better than the score of an informed baseline strategy. However, via a systematic, automatised search for the optimal subtask combinations we can gain substantial improvement produced by both classifiers for all four interpretation subtasks. A case study is conducted on dialogues between an automatised, experimental system that gives information on the phone about train connections in the Netherlands, and its users who speak in Dutch. We find that drawing on unsophisticated, potentially noisy features that characterise the dialogue situation, and by performing automatic optimisation of the formulated machine learning task it is possible to extract sophisticated information of practical pragmatic-semantic value from spoken user input with robust performance. This means that our module can with a good score interpret whether the user of the system is giving slot-filling information, and for which query slots (e.g., departure station, departure time, etc.), whether the user gave a positive or a negative answer to the system, or whether the user signals that there are problems in the interaction.
|Qualification||Doctor of Philosophy|
|Award date||20 Dec 2004|
|Place of Publication||Enschede|
|Publication status||Published - 2004|