Research

We are a group of machine learning researchers working on “Foundational AI Research at Scale”. We built GEAR, the first error-recovery-based generative inference framework, which works in a plug-and-play manner. In the efficient fine-tuning space, we developed an adaptive LoRA-freezing method, the current state-of-the-art PEFT method for language models. We built one of the fastest billion-scale graph learning platforms and one of the first foundation models for knowledge graph reasoning. We work on long-form video representation learning using sparse transformers and graphs, producing state-of-the-art results on several benchmarks. We conduct forefront research on efficient multimodal generative AI, including text-to-image diffusion models and image editing with precise 3D control, and advanced research on AI for chip design and foundation model optimization. Finally, we are accelerating and democratizing AI for science, focusing on new benchmarks and tools for materials science and protein synthesis.

Here are some themes and techniques that we currently work on:

Foundation Model Optimization (FoMo).

The foundation model optimization research of AIL focuses on model architecture development, efficient fine-tuning, and latency- and/or throughput-efficient inference methodologies. The broader goal is to support million-token-scale contexts within a limited compute and memory budget. In the model architecture development space, our research primarily focuses on sub-quadratic attention mechanisms. Toward that goal, our work SAL-ViT (ICCV 2023) presents a hybrid architecture that places quadratic attention and sub-quadratic attention at different layers based on their operational performance sensitivity. Refer to the paper.
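To make the layer-mixing idea concrete, here is a minimal sketch (not the SAL-ViT implementation; the module names and the kernelized attention variant are illustrative assumptions): sensitivity-critical layers keep full quadratic attention, while the remaining layers use a sub-quadratic approximation.

```python
import torch
import torch.nn as nn

class LinearAttention(nn.Module):
    """Sub-quadratic attention via a kernel feature map: cost is O(N*D^2), not O(N^2*D)."""
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (batch, tokens, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.softmax(dim=-1)                  # simple kernel feature maps
        k = k.softmax(dim=-2)
        ctx = k.transpose(-2, -1) @ v          # (dim, dim) summary, no N x N matrix
        return self.proj(q @ ctx)

class HybridViT(nn.Module):
    """Quadratic attention only in sensitivity-critical layers; linear elsewhere."""
    def __init__(self, dim, depth, sensitive_layers):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.MultiheadAttention(dim, 8, batch_first=True) if i in sensitive_layers
            else LinearAttention(dim)
            for i in range(depth)
        ])

    def forward(self, x):
        for blk in self.blocks:
            if isinstance(blk, nn.MultiheadAttention):
                x = x + blk(x, x, x, need_weights=False)[0]
            else:
                x = x + blk(x)
        return x
```

In such a scheme, which layers retain quadratic attention would be chosen from a per-layer sensitivity profile, in the spirit of the hybrid design described above.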

  • Second, we have ongoing research efforts to advance foundation model fine-tuning for efficient trainability on downstream tasks with limited data and resources (CVPR 2024, ICLR 2024). A recent work from our lab that holds the current SoTA in PEFT is here; a sketch of the adaptive-freezing idea appears after this list.

  • Third, to advance LLM inference efficiency, we recently collaborated on a multi-institution project, namely project GEAR (generative inference via approximation and error recovery). To learn more, please refer to the project page; a sketch of the error-recovery idea also appears after this list.

  • Other research in this thread includes data-free quantization for privacy-preserving fine-tuning, foundation model applications in RTL design, and robustness vulnerabilities of efficient model deployment.
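For the adaptive LoRA-freezing work mentioned above, here is a minimal sketch of the general idea under our assumptions (the freezing criterion and all names are illustrative, not the paper's exact algorithm): each adapter is a trainable low-rank update on a frozen base weight, and adapters whose updates plateau are frozen to reclaim gradient and optimizer-state memory.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

def freeze_plateaued_adapters(adapters, prev_norms, tol=1e-3):
    """Illustrative criterion: freeze adapters whose update magnitude has stalled."""
    for name, m in adapters.items():
        norm = (m.B @ m.A).norm().item()
        if abs(norm - prev_norms.get(name, 0.0)) < tol:
            m.A.requires_grad_(False)   # frozen adapters stop accumulating
            m.B.requires_grad_(False)   # gradients and optimizer state
        prev_norms[name] = norm
```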
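And for GEAR, a minimal sketch of the approximation-plus-error-recovery idea, assuming a 2D (tokens x dim) KV-cache tensor; the actual recipe on the project page is more elaborate (e.g., outlier handling):

```python
import torch

def compress_kv(kv: torch.Tensor, bits=4, rank=4):
    """Quantize the KV-cache tensor, then approximate the quantization
    error with a low-rank term so the decompressed cache stays accurate."""
    lo, hi = kv.min(), kv.max()
    scale = (hi - lo) / (2**bits - 1)
    q = torch.round((kv - lo) / scale)             # low-bit codes
    residual = kv - (q * scale + lo)               # quantization error
    U, S, V = torch.svd_lowrank(residual, q=rank)  # low-rank error-recovery term
    return q, scale, lo, (U * S, V)

def decompress_kv(q, scale, lo, low_rank):
    US, V = low_rank
    return q * scale + lo + US @ V.T               # dequantize + add recovered error
```

At decode time, attention would run on the decompressed cache; the accuracy recovery comes from adding the low-rank error term back to the dequantized values.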

Long-Form Video Representation Learning.

We push the boundaries of long-form video representation learning, devising architectural motifs that aggregate context over 10X-50X longer temporal support than existing methods. We take a sparse-model approach, either explicitly modeling videos as spatio-temporal graphs or learning sparse video-text transformers. Our methods achieve state-of-the-art results at a fraction of the memory and compute cost of dense transformers. Across a wide range of settings, the longer temporal support enabled by these representations consistently increases accuracy. They deliver strong results on downstream applications and benchmarks including video recognition, video question answering, episodic memory tasks, active speaker detection, action detection, temporal segmentation, multimodal retrieval, and video summarization. Recently, we started exploring video state-space models for aggregating ~20X longer temporal context.
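As an illustration of the sparse graph view of video (a toy sketch, not the API of our released toolbox; the node semantics, tracked-node temporal links, and thresholds are assumptions):

```python
import torch

def build_video_graph(feats, temporal_window=2, sim_thresh=0.8):
    """Sparse spatio-temporal graph over per-frame node features.

    feats: (num_frames, num_nodes, dim), e.g., region features per frame.
    Edges: temporal links within a window + high-similarity in-frame links.
    Returns edge_index in COO format, as used by most GNN libraries.
    """
    T, N, D = feats.shape
    idx = lambda t, n: t * N + n
    edges = []
    # temporal edges: connect each node to its counterpart in nearby frames
    for t in range(T):
        for dt in range(1, temporal_window + 1):
            if t + dt < T:
                edges += [(idx(t, n), idx(t + dt, n)) for n in range(N)]
    # spatial edges: connect similar nodes within a frame (sparse, thresholded)
    x = torch.nn.functional.normalize(feats, dim=-1)
    for t in range(T):
        sim = x[t] @ x[t].T
        src, dst = torch.nonzero(sim > sim_thresh, as_tuple=True)
        edges += [(idx(t, a.item()), idx(t, b.item()))
                  for a, b in zip(src, dst) if a != b]
    return torch.tensor(edges, dtype=torch.long).T  # shape (2, num_edges)
```

A GNN message-passing stack over this graph touches only the sparse edge set, which is what keeps memory and compute far below a dense transformer over all frame pairs.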

Refer to the blog we wrote on structured representation learning for long-term video understanding. We open-sourced the toolbox for graph-based video representation learning.

Refer to the website to learn more about sparse video-text transformers.

Graph Foundation Models.

We developed ULTRA, a foundation model for knowledge graph (KG) reasoning. A single pre-trained ULTRA model performs link prediction on any multi-relational graph with any entity/relation vocabulary. Averaged over 50+ KGs, a single pre-trained ULTRA model in zero-shot inference mode outperforms many SoTA models trained specifically on each graph. Following the pretrain-finetune paradigm of foundation models, you can run a pre-trained ULTRA checkpoint immediately, in a zero-shot manner, on any graph, or fine-tune it further.

ULTRA provides unified, learnable, transferable representations for any KG. Under the hood, ULTRA employs graph neural networks and modified versions of NBFNet. ULTRA does not learn entity or relation embeddings specific to a downstream graph; instead, it obtains relative relation representations based on interactions between relations. Refer to the website to learn more about our graph foundation model.
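A simplified sketch of the relation-interaction graph behind this idea (illustrative code, not the released ULTRA implementation; in the real model, an NBFNet-style GNN runs on top of such a graph):

```python
from collections import defaultdict
from itertools import combinations

def build_relation_graph(triples):
    """Nodes are relation types; edges record how relations interact
    through shared entities (e.g., two relations sharing a head entity).
    The construction depends only on relational structure, not on the
    entity vocabulary, which is what makes it transferable across KGs."""
    heads, tails = defaultdict(set), defaultdict(set)
    for h, r, t in triples:
        heads[r].add(h)
        tails[r].add(t)
    edges = []
    rels = list(heads.keys() | tails.keys())
    for r1, r2 in combinations(rels, 2):
        if heads[r1] & heads[r2]: edges.append((r1, "head-to-head", r2))
        if heads[r1] & tails[r2]: edges.append((r1, "head-to-tail", r2))
        if tails[r1] & heads[r2]: edges.append((r1, "tail-to-head", r2))
        if tails[r1] & tails[r2]: edges.append((r1, "tail-to-tail", r2))
    return edges
```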

Democratizing machine learning on billion-scale graphs is a core focus of Intel AI Lab. Our goal has been to shift this important AI training workload from expensive GPUs to inexpensive CPUs. Our SAR framework allows a seamless transition from training on a single machine to fully distributed training with linear peak-memory scaling guarantees; it set the fastest reported training times on CPUs for billion-scale graph learning.
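The peak-memory argument can be sketched as follows (conceptual code for sequential aggregation, not the SAR API; `fetch_partition` and `adj_blocks` are assumed stand-ins for the distributed machinery):

```python
import torch

def sequential_aggregate(local_feats, fetch_partition, num_partitions, adj_blocks):
    """Instead of materializing all remote neighbor features at once,
    fetch one partition's features at a time, accumulate its message
    contribution, then free it. Peak memory holds one shard rather
    than the full neighborhood, so it scales linearly with graph size
    per worker."""
    agg = torch.zeros_like(local_feats)
    for p in range(num_partitions):
        remote = fetch_partition(p)      # features owned by worker p
        agg += adj_blocks[p] @ remote    # this shard's message contribution
        del remote                       # free before fetching the next shard
    return agg
```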

AI for Science.

We have been leading a variety of research efforts on AI for Science at Intel Labs, along with external collaborations spanning various research institutions, including:

  • The Matter Lab, led by Alán Aspuru-Guzik at the University of Toronto. Overview.
  • MILA, including multiple academic PIs. Overview.
  • The Intel + Merck Group Research Center on AI for Sustainable Semiconductor Manufacturing. Overview.
  • We organized the 1st AI for Accelerated Materials Design (AI4Mat) Workshop at NeurIPS 2022, followed by the next edition at NeurIPS 2023. Overview.

AI for Chip Design.

Learning-based methods for Electronic Design Automation (EDA) are increasingly becoming a focal point of research, holding the promise of order-of-magnitude gains in performance and quality. Our current work centers on the floorplanning problem, where we have developed solutions that outperform classical search in both speed and quality.
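For context, here is a minimal classical-search baseline of the kind our learned floorplanners are compared against (the blocks, coordinates, and net lists are illustrative): simulated annealing over block swaps, minimizing half-perimeter wirelength.

```python
import math
import random

def hpwl(placement, nets):
    """Half-perimeter wirelength: the standard floorplanning cost proxy.
    placement: {block: (x, y)}; nets: lists of block names."""
    total = 0.0
    for net in nets:
        xs = [placement[b][0] for b in net]
        ys = [placement[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def anneal(placement, nets, steps=10_000, t0=1.0, cooling=0.999):
    """Classical baseline: swap two blocks, keep the move if it helps
    (or probabilistically if it hurts, while the temperature is high)."""
    temp, cost = t0, hpwl(placement, nets)
    blocks = list(placement)
    for _ in range(steps):
        a, b = random.sample(blocks, 2)
        placement[a], placement[b] = placement[b], placement[a]
        new_cost = hpwl(placement, nets)
        if new_cost < cost or random.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost
        else:  # revert the swap
            placement[a], placement[b] = placement[b], placement[a]
        temp *= cooling
    return placement, cost
```

Learned approaches aim to beat this kind of search on both wall-clock time (no thousands of cost evaluations per design) and final quality.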