Project Abstract
Supervised machine learning has found widespread application, often achieving state-of-the-art performance. However, these algorithms rely on labeled training instances, which can be challenging to acquire. Labels are typically produced by human annotators and cost time and money to obtain. Active Learning strives to minimize labeling costs by identifying the most informative instances for annotation. While Active Learning techniques have shown promise in producing high-performance models with fewer labels, their applications remain constrained by the need for multiple rounds of interaction with annotators, which can be time-consuming or infeasible. This project aims to advance Active Learning algorithms and the understanding of their fundamental capabilities in scenarios with limited interaction rounds. A broad spectrum of machine learning applications is expected to benefit from the results of this research, which should reduce the time and cost of obtaining sufficient data for training accurate models. Additionally, this project engages underrepresented minority students through hands-on research and learning activities, develops course modules on resource-efficient machine learning, and disseminates its findings to industry and academia via an extensive online Active Learning tutorial.

This project will launch a comprehensive investigation of few-round active learning, where the learner can actively request feedback on specific data points within a limited number of rounds. To achieve this, the project will interleave two algorithmic tasks: robust data utility quantification and planning with limited adaptivity. First, the investigators will explore methods to measure the utility of unlabeled data, taking into account data size, underlying data characteristics, and downstream learning tasks. Subsequently, the team will develop algorithms that optimize the data utility metric while simultaneously improving the metric's quality over time in a few-round active learning setting. The project findings will establish principled approaches for addressing a novel exploration-exploitation dilemma specific to few-round active learning and provide a fundamental understanding of adaptivity's role in budgeted learning. Finally, the project will evaluate the proposed approaches across various high-impact machine learning applications, including autonomous driving, smart buildings, dialog systems, and biochemical engineering.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)