Project Summary
Deep learning has demonstrated unprecedented performance across various domains in engineering and science. However, a theoretical understanding of its success has remained elusive. Very recently, researchers discovered and characterized an elegant mathematical structure, called Neural Collapse, within the learned features and classifiers. This phenomenon persists across a variety of network architectures, datasets, and data domains. This project will leverage the symmetry of Neural Collapse to develop a rigorous mathematical theory explaining when and why it happens, how it can be used to quantify generalization performance, and how it provides guidelines for understanding and improving transferability. By advancing the mathematical foundations of deep learning, this project is expected to influence not only the machine learning community but also related areas such as optimization, signal and image processing, and natural language processing. The project also involves an integrated outreach and education plan, including promoting accessibility and awareness of computing and STEM concepts for K-12 students. This project will expand our understanding of the principles behind the non-convex optimization involved in training deep learning models, and will provide new mathematical insights into their generalization and transferability properties, leading to practical implications.
In particular, the project is focused on the following three overarching research thrusts: (i) provide a unified framework for analyzing convergence guarantees when training deep and overparametrized models with general loss functions to states of neural collapse, first for simplified cases and then for more general deep models that exhibit progressive neural collapse, including multi-label and data-imbalanced settings; (ii) harness the structure of neural collapse to provide tighter generalization bounds for deep models, by characterizing the structure of the resulting classifiers and their mild dependence on the training data, as well as by making natural distributional assumptions; (iii) leverage the generalization of progressive neural collapse to new environments to understand the transferability of deep models to new domains and tasks, and develop principled approaches for improving transferability and efficient fine-tuning. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
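To make the phenomenon concrete, the following is a minimal numerical sketch (not part of the award text) of two standard Neural Collapse diagnostics on synthetic features: NC1, the collapse of within-class variability toward zero, and NC2, the convergence of the centered class means to a simplex equiangular tight frame (ETF), whose pairwise cosine similarities equal -1/(K-1) for K classes. The function name and the toy construction below are illustrative assumptions, not the project's methodology.

```python
import numpy as np

def nc_metrics(H, y):
    """Diagnostics for Neural Collapse, given last-layer features H (n x d)
    and integer class labels y in {0, ..., K-1}."""
    K = int(y.max()) + 1
    means = np.stack([H[y == k].mean(axis=0) for k in range(K)])
    centered = means - H.mean(axis=0)          # recenter by the global mean
    # NC1: average within-class variance relative to between-class spread
    within = np.mean([((H[y == k] - means[k]) ** 2).sum() / len(H[y == k])
                      for k in range(K)])
    between = (centered ** 2).sum() / K
    nc1 = within / between
    # NC2: pairwise cosines between centered class means (off-diagonal only)
    normed = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    cosines = (normed @ normed.T)[~np.eye(K, dtype=bool)]
    return nc1, cosines

# Toy data placed exactly at simplex-ETF vertices, i.e. a fully collapsed state.
K, d, per_class = 3, 5, 4
P = np.eye(K) - np.ones((K, K)) / K            # rows: ETF directions in R^K
V = np.hstack([P, np.zeros((K, d - K))])       # embed the vertices in R^d
H = np.repeat(V, per_class, axis=0)            # every sample sits at its class mean
y = np.repeat(np.arange(K), per_class)

nc1, cosines = nc_metrics(H, y)
print(nc1)       # 0.0: no within-class variability (NC1)
print(cosines)   # all ≈ -0.5 = -1/(K-1) for K = 3 (NC2)
```

On trained deep classifiers, empirical studies observe both quantities approaching these limiting values during the terminal phase of training; the thrusts above aim to explain when and why that happens.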
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)