TY - GEN
T1 - Deriving probabilistic databases with inference ensembles
AU - Stoyanovich, Julia
AU - Davidson, Susan
AU - Milo, Tova
AU - Tannen, Val
PY - 2011
Y1 - 2011
N2 - Many real-world applications deal with uncertain or missing data, prompting a surge of activity in the area of probabilistic databases. A shortcoming of prior work is the assumption that an appropriate probabilistic model, along with the necessary probability distributions, is given. We address this shortcoming by presenting a framework for learning a set of inference ensembles, termed meta-rule semi-lattices, or MRSL, from the complete portion of the data. We use the MRSL to infer probability distributions for missing data, and demonstrate experimentally that high accuracy is achieved when a single attribute value is missing per tuple. We next propose an inference algorithm based on Gibbs sampling that accurately predicts the probability distribution for multiple missing values. We also develop an optimization that greatly improves performance of multi-attribute inference for collections of tuples, while maintaining high accuracy. Finally, we develop an experimental framework to evaluate the efficiency and accuracy of our approach.
UR - http://www.scopus.com/inward/record.url?scp=79957874172&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79957874172&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2011.5767854
DO - 10.1109/ICDE.2011.5767854
M3 - Conference contribution
AN - SCOPUS:79957874172
SN - 9781424489589
T3 - Proceedings - International Conference on Data Engineering
SP - 303
EP - 314
BT - 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
T2 - 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
Y2 - 11 April 2011 through 16 April 2011
ER -