TY - GEN
T1 - HERB+
T2 - 27th IEEE Pacific Rim International Symposium on Dependable Computing, PRDC 2022
AU - Liao, Qianying
AU - Santos, Alexandre Cortez
AU - Cabral, Bruno
AU - Fernandes, João Paulo
AU - Lourenço, Nuno
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Supervised machine learning cannot proceed without data. However, the required data may be distributed across different locations and cannot be shared under privacy constraints. Methods that circumvent disclosure restrictions in collaborative machine learning are therefore in high demand. We propose HERB+ (Homomorphic Encryption for Random forest and gradient Boosting plus), a confidential learning framework for tree-based models over vertically partitioned data. While previous related work focused on a single algorithm, this work presents privacy-preserving, distributed versions of a wide range of tree-based algorithms (Decision Tree, Random Forest, and Gradient Boosting Decision Trees, for both classification and regression tasks). HERB+ provides the most detailed and general discussion to date of using Fully Homomorphic Encryption to train distributed tree-based algorithms. Our experiments show that, although the learning protocols are not optimally efficient, predictive performance and privacy are preserved. The results imply that practitioners can overcome data-sharing barriers and produce tree-based models for data-heavy domains with strict privacy requirements, such as Health Prediction, Fraud Detection, and Risk Evaluation.
AB - Supervised machine learning cannot proceed without data. However, the required data may be distributed across different locations and cannot be shared under privacy constraints. Methods that circumvent disclosure restrictions in collaborative machine learning are therefore in high demand. We propose HERB+ (Homomorphic Encryption for Random forest and gradient Boosting plus), a confidential learning framework for tree-based models over vertically partitioned data. While previous related work focused on a single algorithm, this work presents privacy-preserving, distributed versions of a wide range of tree-based algorithms (Decision Tree, Random Forest, and Gradient Boosting Decision Trees, for both classification and regression tasks). HERB+ provides the most detailed and general discussion to date of using Fully Homomorphic Encryption to train distributed tree-based algorithms. Our experiments show that, although the learning protocols are not optimally efficient, predictive performance and privacy are preserved. The results imply that practitioners can overcome data-sharing barriers and produce tree-based models for data-heavy domains with strict privacy requirements, such as Health Prediction, Fraud Detection, and Risk Evaluation.
KW - BFV
KW - BGV
KW - CART
KW - CKKS
KW - decision tree
KW - fully homomorphic encryption
KW - gradient boosting
KW - privacy-preserving machine learning
KW - random forest
KW - vertical distribution
UR - http://www.scopus.com/inward/record.url?scp=85147843154&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85147843154&partnerID=8YFLogxK
U2 - 10.1109/PRDC55274.2022.00035
DO - 10.1109/PRDC55274.2022.00035
M3 - Conference contribution
AN - SCOPUS:85147843154
T3 - Proceedings of IEEE Pacific Rim International Symposium on Dependable Computing, PRDC
SP - 212
EP - 223
BT - Proceedings - 2022 IEEE 27th Pacific Rim International Symposium on Dependable Computing, PRDC 2022
PB - IEEE Computer Society
Y2 - 28 November 2022 through 1 December 2022
ER -