Project Summary
The total volume of global digital data created or copied is estimated to double approximately every three years. This rapid growth has created a pressing need for reliable and universal access to data in personal, enterprise, and scientific environments. To meet this need, data synchronization, the process of maintaining consistency between different versions of data stored on separate hosts, has become a crucial aspect of data management. However, state-of-the-art synchronization tools have significant shortcomings and inefficiencies, resulting in increased costs and high-latency access.

This project aims to develop data synchronization algorithms with optimal communication bandwidth, based on error-correcting codes, and to broaden the applicability of synchronization to real-world settings where current tools are inadequate. Beyond its scientific and technological contributions, the project has the potential to facilitate access to distributed storage systems for users with limited broadband Internet access, such as in rural areas; to help reduce the energy consumed by data transmission; and to provide opportunities to engage and train undergraduate researchers.

The goals of the project will be pursued through three research thrusts. The first thrust aims to increase the efficiency of data synchronization by designing low-redundancy systematic edit-correcting codes, along with efficient encoding and decoding algorithms. The second thrust focuses on synchronizing compressed data. Because conventional compression typically destroys the similarity between related files, the project will develop mutually compatible compression and synchronization methods which, given the prevalence of data compression, could significantly expand the use of synchronization for large datasets. The third thrust addresses often-overlooked real-world constraints on synchronization from both theoretical and practical points of view. In particular, bounds on the information exchange will be established for settings in which one party is under communication or computational constraints. Furthermore, incremental and adaptive synchronization protocols will be developed to efficiently synchronize data when the statistics of the stochastic processes governing data updates and modifications are unknown.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
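The basic synchronization task the abstract describes, bringing a remote copy up to date while transmitting far less than the whole file, can be illustrated with a toy delta-transfer sketch. This is a simplified rsync-style block-matching scheme, not the coding-theoretic approach the project proposes; the function names and the 4-byte block size are illustrative assumptions:

```python
import hashlib

BLOCK = 4  # toy block size; real tools use hundreds to thousands of bytes


def block_hashes(data: bytes) -> dict:
    # Map the hash of each fixed-size block of the old file to its offset.
    return {hashlib.sha256(data[i:i + BLOCK]).hexdigest(): i
            for i in range(0, len(data), BLOCK)}


def delta(old: bytes, new: bytes) -> list:
    """Build a list of instructions describing `new` in terms of `old`:
    ('copy', offset) for blocks the receiver already holds,
    ('literal', b) for bytes that must be transmitted."""
    known = block_hashes(old)
    out, i = [], 0
    while i < len(new):
        h = hashlib.sha256(new[i:i + BLOCK]).hexdigest()
        if h in known:
            out.append(('copy', known[h]))
            i += BLOCK
        else:
            out.append(('literal', new[i:i + 1]))
            i += 1
    return out


def apply_delta(old: bytes, instrs: list) -> bytes:
    # Reconstruct the new file from the old copy plus the instructions.
    buf = bytearray()
    for op, arg in instrs:
        buf += old[arg:arg + BLOCK] if op == 'copy' else arg
    return bytes(buf)


old = b"aaaabbbbcccc"
new = b"aaaaXbbbbcccc"  # one inserted byte
instrs = delta(old, new)
assert apply_delta(old, instrs) == new
# Only the single inserted byte travels as a literal; the rest are copies.
```

Note how a one-byte insertion costs only one literal here, because matching is retried at every offset. This alignment-by-hashing is exactly what conventional compression breaks: after compressing two such files, small plaintext edits no longer correspond to localized differences, which motivates the project's second thrust.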
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)