TY - GEN
T1 - Herb
T2 - 2022 International Joint Conference on Neural Networks, IJCNN 2022
AU - Liao, Qianying
AU - Cabral, Bruno
AU - Fernandes, João Paulo
AU - Lourenço, Nuno
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Building a Machine Learning model requires the use of large amounts of data. Due to privacy and regulatory concerns, these data might be owned by multiple sites and are often not mutually shareable. Our work deals with private learning and inference for the Weighted Random Forest model when data records are vertically distributed among multiple sites. Previous privacy-preserving vertical tree-based frameworks either adapt Secure Multi-party Computation or share intermediate results, and are hard to generalize or scale. In contrast, our proposal contains efficient collaborative algorithms for computing the Gini Index and Entropy, used to measure the impurity of decision tree nodes, while protecting all intermediate values and disclosing minimal information. We offer a learning protocol based on the Paillier Cryptosystem and Digital Envelope. Also, we provide an inference protocol based on a Look-up Table. Our experiments show that the proposed protocols do not cause predictive performance loss while still establishing and utilizing the model within a reasonable time. The results imply that practitioners can overcome the barrier of data sharing and produce random forest models for data-heavy domains with strict privacy requirements, such as Health Prediction, Fraud Detection, and Risk Evaluation.
KW - decision tree
KW - digital envelope
KW - paillier cryptosystem
KW - privacy-preserving machine learning
KW - vertical paradigm
UR - http://www.scopus.com/inward/record.url?scp=85140742218&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140742218&partnerID=8YFLogxK
U2 - 10.1109/IJCNN55064.2022.9892321
DO - 10.1109/IJCNN55064.2022.9892321
M3 - Conference contribution
AN - SCOPUS:85140742218
T3 - Proceedings of the International Joint Conference on Neural Networks
BT - 2022 International Joint Conference on Neural Networks, IJCNN 2022 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 July 2022 through 23 July 2022
ER -