NeurIPS 2023 recap
Recapping the most compelling ideas and emerging research from our time at NeurIPS 2023, the world's premier AI conference.
By Arijit Sehanobish and Shirley Anderson
Welcome to our quick recap of NeurIPS 2023, a yearly conference that stands at the forefront of advancement in artificial intelligence. Last December, we had the incredible opportunity to attend many engaging talks and tutorials.
Overall, attending NeurIPS as an Applied Scientist in NLP (Arijit) and as an AI product designer (Shirley) offers a unique chance to immerse ourselves in the latest advancements, network with peers and industry leaders, and contribute to the ongoing progress of the field.
The benefits of attending NeurIPS include learning about cutting-edge research, keeping up with industry trends, and networking with peers.
Cutting-Edge Research: NeurIPS is one of the premier conferences in machine learning and artificial intelligence, attracting top researchers worldwide. It’s a hub for the latest advancements in NLP, offering insights into state-of-the-art models, techniques, and methodologies.
Networking Opportunities: NeurIPS brings together experts, researchers, and practitioners in the field of NLP. Attending the conference provides us with ample opportunities to network with peers, exchange ideas, and establish connections that could lead to collaborations or future learning opportunities.
Learning from Keynotes and Workshops: NeurIPS features keynote presentations by leading figures in AI and NLP, as well as workshops focusing on specialized topics within the field. These sessions offer valuable insights, perspectives, and practical knowledge that can enhance our understanding and expertise in NLP.
Paper Presentations and Poster Sessions: NeurIPS showcases a wide range of research papers and poster presentations covering various aspects of NLP, including novel algorithms, applications, and theoretical advancements. Attending these sessions allows us to stay updated on the latest developments and gain inspiration for our work.
Industry Trends and Applications: NeurIPS isn’t solely focused on academic research; it also highlights industry trends and applications of NLP.
Specific highlights from Arijit
I will mostly talk about the panel discussion and the keynotes.
First up was Lora Aroyo from Google Research, with a talk titled “Many Faces of Responsible AI.” She highlighted the problem that most data is labeled as binary, i.e., positive and negative examples. That oversimplification is incompatible with real life, where ambiguity in data interpretation and inter-reader variability are the norm. Finally, she presented the ‘Safety with Diversity’ method and a benchmark dataset capturing the variability of LLM safety judgements across demographic groups of raters. The main takeaway from this part of the presentation was that the diversity of raters and data plays a crucial role in evaluating models: failing to acknowledge the wide range of human perspectives and the ambiguity present in content can prevent machine learning performance from aligning with real-world expectations.
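To make the point concrete, here is a toy sketch (all group names and ratings are hypothetical, not from the talk) of why a single aggregate label can hide real disagreement: instead of majority-voting an item to one binary label, compare the unsafe-rate per rater group.

```python
from collections import defaultdict

def group_disagreement(ratings):
    """Fraction of raters in each demographic group who flag an item as unsafe.

    `ratings` is a list of (group, is_unsafe) pairs for a single item.
    A large spread between groups signals that collapsing the labels to
    one binary vote would hide real disagreement between rater groups.
    """
    totals = defaultdict(int)
    unsafe = defaultdict(int)
    for group, is_unsafe in ratings:
        totals[group] += 1
        if is_unsafe:
            unsafe[group] += 1
    return {g: unsafe[g] / totals[g] for g in totals}

# Hypothetical safety ratings for one item from two rater groups
ratings = [("group_a", True), ("group_a", True), ("group_a", False),
           ("group_b", False), ("group_b", False), ("group_b", True)]
print(group_disagreement(ratings))  # group_a ~0.67 unsafe, group_b ~0.33
```

A majority vote over all six raters would call this item borderline-unsafe and discard the fact that the two groups see it very differently, which is exactly the variability the benchmark dataset is meant to surface.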
Next, I saw Professor Christopher Ré from Stanford University speak on “Systems for Foundation Models, and Foundation Models for Systems.” This wide-ranging talk opened with how foundation models can be used for data-cleaning tasks and how effective they are at them. His next theme was efficient computation and architecture: he discussed FlashAttention, now used very widely by researchers and practitioners, and his exciting work on structured matrices, such as Monarch matrices, to lower the compute budget of Transformer training. Finally, he touched on promising new directions in alternative architectures, such as State Space Models and Mamba, which have sub-quadratic time complexity.
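To see what the efficiency work is up against, here is a minimal NumPy sketch of standard softmax attention (this is not FlashAttention itself, just the baseline it optimizes): the full n×n score matrix is materialized, which is the quadratic cost that FlashAttention tiles away and that sub-quadratic architectures avoid entirely.

```python
import numpy as np

def naive_attention(q, k, v):
    """Standard softmax attention over n tokens of dimension d.

    Materializes the full (n, n) score matrix, so memory and compute
    grow quadratically with sequence length n.
    """
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (n, n) -- quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v                               # (n, d)

n, d = 8, 4
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
print(naive_attention(q, k, v).shape)  # (8, 4)
```

FlashAttention computes the same result without ever holding the full (n, n) matrix in slow memory, while State Space Models change the operation itself so the quadratic matrix never arises.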
Finally, I saw the exciting ‘Beyond Scaling’ panel discussion, moderated by Sasha Rush of Cornell and Hugging Face. The participants included Aakanksha Chowdhery (Google DeepMind; core contributor to Gemini, PaLM, and Pathways), Angela Fan (Meta Generative AI; core contributor to Llama and the Meta AI Assistant), Percy Liang (Stanford University; Director of the Center for Research on Foundation Models (CRFM)), and Jie Tang (Tsinghua University; creator of ChatGLM and CogVLM).
The key takeaways from this panel discussion were the following:
Training language models is easy. With the proliferation of well-curated datasets and infrastructure like Hugging Face, even an undergraduate can train an LM on a GPU, or even on multiple GPUs hosted on a single machine. However, the main challenge is scaling this to more GPUs across machines, or even across data centers spread around the world.
With the scarcity of data for model training, researchers are delving into different methodologies. One avenue involves training multimodal models using text, video, images, and audio together, hoping that skills acquired from diverse modalities can transfer to text. There was also a discussion about using synthetic data to address this data challenge.
There was a discussion about whether open foundation models are potentially harmful for AI safety, since they can be exploited by malicious actors. However, Dr. Liang argued that open models also contribute positively to safety: by being accessible, they give more researchers the opportunity to conduct AI safety research and to review the models for potential vulnerabilities.
Moreover, there was a discussion about using autoregressive models for image and video generation, as was done in the Gemini project. There have also been explorations into using diffusion models for text generation, but these have not yet proven effective.
Annotating data now requires notably more domain expertise than it did five years ago. As AI assistants improve, gathering valuable feedback data directly from users may lessen the dependence on extensive annotator labeling.
Beyond these exciting talks, there were many LLM-related papers delving into in-context learning, fairness, reasoning, tool usage, and more. In fact, these sub-areas have generated so much interest that there were separate oral sessions on several of them, including Chain-of-Thought/reasoning, efficient learning, tool usage, LLMs more broadly, and RL.
Specific highlights from Shirley
As an AI product designer, my favorite parts of the conference involve real-world AI and human-AI system design applications. Here are some of my favorite learnings:
On working with real humans in AI systems:
Google DeepMind showed a sample cancer case study caught by all six radiologists but missed by the AI system. Such edge cases suggest complementary roles for the AI system and human readers in reaching accurate conclusions.
There are three levels of human-AI collaboration:
Beginner level: Complementary (AI that: knows when it doesn’t know, can defer to humans)
Intermediate level: Co-operative (AI that: knows what it doesn’t know and can ask for more information)
Advanced level: Fully interactive (AI that: knows when it doesn’t know, can ask for more information, can interact with humans to make the best decision)
Selective prediction, human-AI teaming, and deferral systems each have nuanced design and cost considerations. For deferral systems, the designer must consider what is communicated to the human: the AI output, explanations, and confidence scores.
Advantages of black-box deferral models:
You can work with third-party vendors
You can work under IP restrictions
You can also satisfy regulatory requirements
They are simple to implement and to recalibrate under distribution shift
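The “knows when it doesn’t know, can defer to humans” pattern above can be sketched in a few lines (the threshold value and class labels here are hypothetical illustrations, not from the talk): treat the model as a black box that only exposes class probabilities, and route any low-confidence case to a human.

```python
def predict_or_defer(probs, threshold=0.8):
    """Black-box deferral: `probs` maps class labels to probabilities
    from an opaque model. If the top probability clears the threshold,
    return the prediction; otherwise defer the case to a human reader.
    The threshold is a tuning knob that can be recalibrated without
    touching the underlying model.
    """
    label, confidence = max(probs.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return ("predict", label)
    return ("defer_to_human", None)

print(predict_or_defer({"benign": 0.95, "malignant": 0.05}))  # ('predict', 'benign')
print(predict_or_defer({"benign": 0.55, "malignant": 0.45}))  # ('defer_to_human', None)
```

Because the deferral logic only reads the model’s output probabilities, it works with third-party or IP-restricted models, and recalibrating under distribution shift reduces to adjusting the threshold.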
Given the AI summer we’re experiencing, it’s essential to think about extending the human experience and not just about extending the technology itself.
In design, there’s a concept of Norman doors, which also applies to AI. Have you ever pushed a door when you should have pulled, or vice versa? If so, you’ve experienced a Norman door: a confusing or difficult-to-use door that frustrates everyone. The good news is that product designers can avoid creating Norman doors by prioritizing usability and the human experience in their designs. “Norman AI” is the analogous idea: AI built around the technology itself rather than around how people interact with it.

Ultimately, we want to move from a technology-centric paradigm for AI systems to a human-centric one, to encourage human flourishing and increase the success rate of AI projects. This might involve design thinking, getting to the root cause of business problems, and designing human-AI collaboration systems that prioritize human needs early in development. Are there other human factors to consider? Beyond the visible elements of interfaces and their psychological and cognitive aspects, it is essential to be aware of the trust and reliance users place in the system, its implications in the real world, and the different levels of human-AI collaboration.
Reflecting on everything NeurIPS 2023 threw at us, it’s pretty clear we’re in for an exciting ride in the AI and fintech space. The conference was a goldmine of conversations, from the latest in foundation models to responsible AI. Looking ahead, we both came away with a reinforced desire to solve real problems for people with advanced technology that is approachable. Whether it’s getting humans and AI to work better together or finding new ways to problem-solve with AI, let’s make it happen!