This paper introduces probabilistic databases with unmerged duplicates (DB^ud), i.e., databases containing probabilistic information about instances found to describe the same real-world objects. We discuss the need for efficiently querying such databases and for supporting practical query scenarios that require analytical or summarized information. We also sketch possible methodologies and techniques for processing queries over such probabilistic databases efficiently, in particular without materializing the (potentially huge) collection of all possible deduplication worlds.
|Title of host publication|Proceedings of the International Conference on Scalable Uncertainty Management (SUM2014)|
|Place of publication|Cham|
|Publication status|Published - 2014|
|Series|Lecture Notes in Computer Science|
Ioannou, E., & Garofalakis, M. N. (2014). Analytics over probabilistic unmerged duplicates. In Proceedings of the International Conference on Scalable Uncertainty Management (SUM2014) (pp. 203-208). (Lecture Notes in Computer Science; Vol. 8720). Springer. https://doi.org/10.1007/978-3-319-11508-5_17