TY - GEN
T1 - Towards language-agnostic alignment of product titles and descriptions
T2 - 2019 World Wide Web Conference, WWW 2019
AU - Stein, Daniel
AU - Shterionov, Dimitar
AU - Way, Andy
PY - 2019/5/13
Y1 - 2019/5/13
AB - The quality of e-commerce services largely depends on the accessibility of product content as well as its completeness and correctness. Nowadays, many sellers target cross-country and cross-lingual markets via active or passive cross-border trade, fostering the desire for seamless user experiences. While machine translation (MT) is very helpful for crossing language barriers, automatically matching existing items for sale (e.g. the smartphone in front of me) to the same product (all smartphones of the same brand/type/colour/condition) can be challenging, especially because the seller's description is often erroneous or incomplete. We refer to this task as item alignment in multilingual e-commerce catalogues. To facilitate this task, we develop a pipeline of tools for item classification based on cross-lingual text similarity, exploiting recurrent neural networks (RNNs) with and without pre-trained word embeddings. Furthermore, we combine our language-agnostic RNN classifiers with an in-domain MT system to further reduce the linguistic and stylistic differences between the investigated data, aiming to boost our performance. The quality of the methods as well as their training speed is compared on an in-domain data set for English-German products.
UR - http://www.scopus.com/inward/record.url?scp=85066881683&partnerID=8YFLogxK
U2 - 10.1145/3308560.3316602
DO - 10.1145/3308560.3316602
M3 - Conference contribution
AN - SCOPUS:85066881683
T3 - The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019
SP - 387
EP - 392
BT - The Web Conference 2019 - Companion of the World Wide Web Conference, WWW 2019
PB - Association for Computing Machinery
Y2 - 13 May 2019 through 17 May 2019
ER -