TY - GEN
T1 - Algorithms for low-latency remote file synchronization
AU - Yan, Hao
AU - Irmak, Utku
AU - Suel, Torsten
PY - 2008
Y1 - 2008
N2 - The remote file synchronization problem is how to update an outdated version of a file located on one machine to the current version located on another machine with a minimal amount of network communication. It arises in many scenarios including web site mirroring, file system backup and replication, or web access over slow links. A widely used open-source tool called rsync uses a single round of messages to solve this problem (plus an initial round for exchanging meta information). While research has shown that significant additional savings in bandwidth are possible by using multiple rounds, such approaches are often not desirable due to network latencies, increased protocol complexity, and higher I/O and CPU overheads at the endpoints. In this paper, we study single-round synchronization techniques that achieve savings in bandwidth consumption while preserving many of the advantages of the rsync approach. In particular, we propose a new and simple algorithm for file synchronization based on set reconciliation techniques. We then show how to integrate sampling techniques into our approach in order to adaptively select the most suitable algorithm and parameter setting for a given data set. Experimental results on several data sets show that the resulting protocol gives significant benefits over rsync, particularly on data sets with high degrees of redundancy between the versions.
UR - http://www.scopus.com/inward/record.url?scp=51349094370&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=51349094370&partnerID=8YFLogxK
U2 - 10.1109/INFOCOM.2007.40
DO - 10.1109/INFOCOM.2007.40
M3 - Conference contribution
AN - SCOPUS:51349094370
SN - 9781424420261
T3 - Proceedings - IEEE INFOCOM
SP - 655
EP - 663
BT - INFOCOM 2008
T2 - INFOCOM 2008: 27th IEEE Communications Society Conference on Computer Communications
Y2 - 13 April 2008 through 18 April 2008
ER -