■ Research and innovation
Shaping the future through cutting-edge research and development
Research and experimentation are core to Kensho. We push the boundaries of what’s possible, and our teams are often first to explore, publish, and patent new approaches, advancing research and shaping how AI is built and deployed.
■ Who we are
Research runs through everything we build.
A dedicated R&D team leads our pure research, contributing to the field through peer-reviewed publications and collaborations with academic and industry partners. Applied research extends across Kensho's ML and engineering teams, where we build novel, practical solutions for our customers through rigorous research, feedback, and development.
■ Key focus areas
Current research
- We adapt and develop state-of-the-art methodologies to optimize how agents interact with data retrieval tools and how contextual queries are routed between multiple agents. Our methods, including dynamic prompting, query expansion, self-reflection, and intelligent routing across any number of datasets, enable LLMs to effectively leverage S&P Global's AI-ready data. We also develop agentic architectures built to orchestrate complex workflows such as conducting research and generating long-form content.
- We explore and develop cutting-edge strategies for unstructured and structured data retrieval, enabling agentic workflows atop complex financial data. We push unstructured data retrieval far beyond naive vector-search RAG, leveraging structural information and relationships from our document processing toolkit to enrich data for complex data discovery and question answering at fast inference speeds. For structured and tabular data, we combine LLM-optimized tools with a proprietary Text-to-SQL framework. Across both, we apply strategies such as reinforcement learning, self-consistency, and self-reflection to optimize our agents for financial datasets, delivering sharper customer insights across complex data sources.
- We research and develop multi-tiered evaluation frameworks and benchmarks for finance and business. Our evaluations assess models across a broad spectrum of tasks, from simple data retrieval and long-context QA to advanced multi-step reasoning and program synthesis. These frameworks support complex agentic products spanning tool usage, code and SQL generation, computer use, and report generation. To measure both end-to-end solution quality and individual component performance, we employ diverse strategies, ranging from AST parsing to LLM juries calibrated by domain experts, while rigorously auditing data relevance and output consistency. Beyond applied metrics, we actively research and challenge the fundamentals of evaluation itself. We explore critical questions such as the reliability of LLM-as-a-judge, the impact of high-quality labels, methods for efficiently predicting model performance on unseen evaluations, and the real-world properties that current benchmarks fail to address. Ultimately, our comprehensive approach enables data-driven decisions and continuous model improvement.
- We analyze and improve tokenization methods, which determine how all input data should be parsed and represented to a model. Tokenization naturally impacts a model's ability to process and understand data such as human language text, and remains an under-explored LLM research frontier. We develop fundamental tokenization capabilities we can use to better present data to LLMs, including our rich, domain-specific business and finance data. This work includes drastically improving the speed and efficiency of training and using tokenizers, discovering LLM limitations with processing numeric data in various languages and scripts, moving beyond the restrictions of the ubiquitous BPE tokenizer, and building our own enhanced tokenizer.
- We explore how to best harness the structural and relational information within and across documents, enabling models to move beyond surface-level text and achieve richer document understanding. Our research across document components and their relationships informs ingestion strategies that ensure both structural and semantic coherence and provide better-structured, more usable content downstream. This coherence and richer representation unlocks greater performance and even new capabilities compared with traditional extraction. Models need this context and structure to answer targeted questions, surface broader insights across wider knowledge bases, and ground validated assertions against a document corpus.
- We are on a mission to extract and structure all information contained within documents, from plain text to the most complex visual elements. Our proprietary models extract figure data from document images and graphs, unlocking highly accurate quantitative question-answering capabilities. In our pursuit of comprehensive document AI, we are also building the next generation of table extraction models, including Vision Language Models (VLMs) that parse tables in their natural context within page images.
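The intelligent routing described in the first focus area can be illustrated with a deliberately simplified sketch. Production routers typically score candidate datasets with an LLM or embedding model; here a keyword-overlap score stands in for that step, and the dataset names and profiles are hypothetical, not Kensho's actual catalog.

```python
import re

def route_query(query: str, datasets: dict[str, set[str]]) -> str:
    """Send a query to the dataset whose keyword profile best
    overlaps it. The overlap score is a stand-in for an LLM- or
    embedding-based scorer used in a real router."""
    terms = set(re.findall(r"[a-z0-9-]+", query.lower()))
    return max(datasets, key=lambda name: len(terms & datasets[name]))

# Hypothetical dataset profiles for illustration only:
datasets = {
    "fundamentals": {"revenue", "earnings", "balance", "income"},
    "transcripts": {"call", "ceo", "said", "guidance"},
    "filings": {"10-k", "risk", "filing", "disclosure"},
}
dest = route_query("What revenue guidance did the CEO give on the call?", datasets)
# The transcripts profile matches three query terms, so it wins.
```

A real system would also handle ties and multi-dataset queries, which is where the multi-agent orchestration mentioned above comes in.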
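Self-consistency, mentioned in the retrieval focus area, is straightforward to sketch for Text-to-SQL: sample several candidate queries from the model at nonzero temperature, normalize away superficial differences, and keep the majority answer. This is a generic illustration of the technique, not Kensho's proprietary framework; the sampled SQL strings below are hypothetical.

```python
from collections import Counter
import re

def normalize_sql(sql: str) -> str:
    """Collapse whitespace and lowercase so trivially different
    candidates compare equal."""
    return re.sub(r"\s+", " ", sql.strip()).lower()

def self_consistency_vote(candidates: list[str]) -> str:
    """Majority vote over sampled generations: return the first
    candidate whose normalized form is most frequent."""
    if not candidates:
        raise ValueError("no candidates")
    counts = Counter(normalize_sql(c) for c in candidates)
    winner, _ = counts.most_common(1)[0]
    return next(c for c in candidates if normalize_sql(c) == winner)

# Hypothetical samples from an LLM at nonzero temperature:
samples = [
    "SELECT ticker, revenue FROM financials WHERE year = 2024",
    "select ticker, revenue\nfrom financials where year = 2024",
    "SELECT ticker, net_income FROM financials WHERE year = 2024",
]
best = self_consistency_vote(samples)
# The first two samples agree after normalization, so they win 2-1.
```

Stronger variants vote on execution results rather than query text, which tolerates genuinely different SQL that computes the same answer.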
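Calibrating LLM juries against domain experts, as the evaluation focus area describes, requires a chance-corrected agreement statistic; Cohen's kappa is a standard choice for binary verdicts. The sketch below implements it in plain Python on hypothetical human and judge labels, purely to illustrate the calibration step.

```python
def cohens_kappa(a: list[int], b: list[int]) -> float:
    """Chance-corrected agreement between two binary raters,
    e.g. an LLM judge vs. a human domain expert."""
    assert a and len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement if the two raters were independent.
    pa1, pb1 = sum(a) / n, sum(b) / n
    expected = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical verdicts (1 = answer judged correct) on ten items:
human = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1]
judge = [1, 1, 0, 1, 1, 1, 0, 0, 0, 1]
kappa = cohens_kappa(human, judge)
# Raw agreement is 0.8, but kappa discounts chance agreement.
```

Low kappa on a labeled calibration set is exactly the kind of signal behind the "No Free Labels" finding listed in the publications below: raw judge accuracy can look fine while chance-corrected agreement with humans does not.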
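As background for the tokenization focus area, the ubiquitous BPE algorithm it aims to move beyond fits in a few lines: repeatedly merge the most frequent adjacent symbol pair in the corpus. This is a toy version of classic BPE training on a tiny made-up corpus, not Kensho's enhanced tokenizer.

```python
from collections import Counter

def bpe_train(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn BPE merge rules by repeatedly merging the most
    frequent adjacent symbol pair (toy classic BPE)."""
    # Each word starts as a tuple of single characters.
    words = Counter(tuple(w) for w in corpus)
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_words = Counter()
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])  # apply the merge
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            new_words[tuple(out)] += freq
        words = new_words
    return merges

merges = bpe_train(["lower", "lowest", "low", "low"], num_merges=3)
# On this corpus the first merge joins 'l' and 'o'.
```

One limitation visible even here: merges never cross word boundaries set by pre-tokenization, a restriction that work such as the "Boundless Byte Encoding" paper below targets directly.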
■ Publications
Advancing industry knowledge
Our research regularly appears in top conferences, academic journals, and leading NLP textbooks. These publications reflect a commitment to advancing industry knowledge while grounding innovation in practical applications. Explore our full list of publications.
| Venue / Date | Area | Title | Authors |
| ▪ Apr. 2026 | Tokenization | Faster Superword Tokenization | Craig W. Schmidt, Chris Tanner, Yuval Pinter |
| ▪ Apr. 2026 | Evaluation | Cost-Efficient Estimation of General Abilities Across Benchmarks | Michael Krumdick, Adam Wiemerslage, Seth Ebner, Charles Lovering, Chris Tanner |
| ▪ Apr. 2026 | Evaluation | FrontierFinance: A Long-Horizon Computer-Use Benchmark of Real-World Financial Tasks | Michael Krumdick, Varshini Reddy, Shivani Chaudhary, William Day, Maarij Ahmed, Hayan Haqqi, Muhammad Ahsen Fahim, Hanzallah Amjad, Ahmad Orakzai, Aqsa Gul, Chris Tanner |
| ▪ Apr. 2026 | Evaluation | No Free Labels: Limitations of LLM-as-a-Judge without Human Grounding | Michael Krumdick, Charles Lovering, Varshini Reddy, Seth Ebner, Chris Tanner |
| ▪ ACL 2026 | Evaluation | The Effect of Scripts and Formats on LLM Numeracy | Varshini Reddy, Craig W. Schmidt, Seth Ebner, Adam Wiemerslage, Yuval Pinter, Chris Tanner |
| ▪ ACL 2026 | Document Understanding | On Finding Inconsistencies in Documents | Charles J. Lovering, Seth Ebner, Brandon Smock, Michael Krumdick, Saad Rabbani, Ahmed Muhammad, Varshini Reddy, Chris Tanner |
| ▪ Dec. 2025 | Extraction | PubTables-v2: A new large-scale dataset for full-page and multi-page table extraction | Brandon Smock, Valerie Faucon-Morin, Max Sokolov, Libin Liang, Tayyibah Khanam, Maury Courtland |
| ▪ NeurIPS 2025 | Evaluation | Complexity Scaling Laws for Neural Models using Combinatorial Optimization | Lowell Weissman, Michael Krumdick, A. Lynn Abbott |
| ▪ NeurIPS 2025 | Evaluation | BLEUBERI: BLEU is a surprisingly effective reward for instruction following | Yapei Chang, Yekyung Kim, Michael Krumdick, Amir Zadeh, Chuan Lie, Chris Tanner, Mohit Iyyer |
| ▪ COLM 2025 | Tokenization | Boundless Byte Encoding: Breaking the Pre-Tokenization Barrier | Craig Schmidt, Varshini Reddy, Chris Tanner, Yuval Pinter |
| ▪ UIST 2025 | Evaluation | When Context Grows, So Does the Challenge: Human Oversight in LLM Evaluation of Financial Tables | Arijit Sehanobish, Shirley Anderson, Guillaume Michel, Mike Arov |
| ▪ Interspeech 2025 | NLP | SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription | Raymond Grossman, Taejin Park, Kunal Dhawan, Andrew Titus, Sophia Zhi, Yulia Shchadilova, Weiqing Wang, Jagadeesh Balam, Boris Ginsburg |
| ▪ ICML 2025 | Tokenization | Entropy-Driven Pre-tokenization for Byte Pair Encoding | Yifan Hu, Frank Liang, Dachuan Zhao, Jonathan Geuter, Varshini Reddy, Craig W Schmidt, Chris Tanner |
| ▪ ICML 2025 | Tokenization | How Much is Enough? The Diminishing Returns of Tokenization Training Data | Varshini Reddy, Craig Schmidt, Yuval Pinter, Chris Tanner |
| ▪ ICLR 2025 | GenAI | On-Device Watermarking: A Socio-Technical Imperative For Authenticity In The Age of Generative AI | Houssam Kherraz |
| ▪ IUI/HCI 2025 | GenAI | Generative AI Interface Design Considerations for Private Equity | Shirley Anderson, Yuanfei Zhao |
| ▪ ACL 2025 | Evaluation | Language Probability Models are Not Calibrated in Numerical Contexts | Charles Lovering, Michael Krumdick, Viet Dac Lai, Seth Ebner, Nilesh Kumar, Varshini Reddy, Rik Koncel-Kedziorski, Chris Tanner |
| ▪ EMNLP 2025 | Evaluation | SEC-QA: A Systematic Evaluation Corpus for Financial QA | Viet Dac Lai, Michael Krumdick, Charles Lovering, Varshini Reddy, Craig Schmidt, Chris Tanner |
| ▪ EMNLP 2024 | Tokenization | Tokenization is More Than Compression | Craig W Schmidt, Varshini Reddy, Haoran Zhang, Alec Alameddine, Omri Uzan, Yuval Pinter, Chris Tanner |
| ▪ EMNLP 2024 | Document Understanding | An Analysis of Multilingual FActScore | Vu Trong Kim, Michael Krumdick, Varshini Reddy, Franck Dernoncourt, Viet Dac Lai |
| ▪ ACL 2024 | Tokenization | Greed is All You Need: An Evaluation of Tokenizer Inference Methods | Omri Uzan, Craig W Schmidt, Chris Tanner, Yuval Pinter |
| ▪ ACL 2024 | Evaluation | DocFinQA: A Long-Context Financial Reasoning Dataset | Varshini Reddy, Rik Koncel-Kedziorski, Viet Dac Lai, Michael Krumdick, Charles Lovering, Chris Tanner |
| ▪ ACL 2024 | Evaluation | BizBench: A Quantitative Reasoning Benchmark for Business and Finance | Michael Krumdick, Rik Koncel-Kedziorski, Viet Dac Lai, Varshini Reddy, Charles Lovering, Chris Tanner |
| ▪ NAACL 2024 | NLP | MCECR: A Novel Dataset for Multilingual Cross-Document Event Coreference Resolution | Amir Pouran Ben Veyseh, Viet Dac Lai, Chien Nguyen, Franck Dernoncourt, Thien Nguyen |
| ▪ LREC-Coling 2024 | NLP | CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages | Thuat Nguyen, Chien Van Nguyen, Viet Dac Lai, Hieu Man, Nghia Trung Ngo, Franck Dernoncourt, Ryan A. Rossi, Thien Huu Nguyen |
| ▪ LREC-Coling 2024 | NLP | CAMAL: A Novel Dataset for Multi-label Conversational Argument Move Analysis | Viet Dac Lai, Duy Ngoc Pham, Jonathan Steinberg, Jamie Mikeska, Thien Huu Nguyen |
| ▪ NCME 2024 | ML | Using Machine Learning to Detect Student Learning Levels along a Learning Progression | Duy Pham, Viet Dac Lai |
| ▪ ICLR 2024 | ML | Scalable Neural Network Kernels | Arijit Sehanobish, Krzysztof Choromanski, Yunfan Zhao, Avinava Dubey, Valerii Likhosherstov |
| ▪ ACL 2023 | ML | The economic trade-offs of large language models: A case study | Kristen Howell, Gwen Christian, Pavel Fomitchov, Gitit Kehat, Julianne Marzulla, Leanne Rolston, Jadin Tredup, Ilana Zimmerman, Ethan Selfridge, Joseph Bradley |
| ▪ ACL 2023 | NLP | Learning Answer Generation using Supervision from Automatic Question Answering Evaluators | Matteo Gabburo, Siddhant Garg, Rik Koncel-Kedziorski, Alessandro Moschitti |
| ▪ ICDAR 2023 | Extraction | GriTS: Grid Table Similarity Metric for Table Structure Recognition | Brandon Smock, Rohith Pesala, Robin Abraham |
| ▪ ICDAR 2023 | Extraction | Aligning benchmark datasets for table structure recognition | Brandon Smock, Rohith Pesala, Robin Abraham |
| ▪ ICDAR 2023 | Document Understanding | A Graphical Approach to Document Layout Analysis | Jilin Wang, Michael Krumdick, Baojia Tong, Hamima Halim, Maxim Sokolov, Vadym Barda, Delphine Vendryes, Chris Tanner |
| ▪ EACL 2023 | NLP | What happens before and after: Multi-Event Commonsense in Event Coreference Resolution | Sahithya Ravi, Chris Tanner, Raymond Ng, Vered Shwartz |
| ▪ ICML 2023 | ML | Efficient Graph Field Integrators Meet Point Clouds | Krzysztof Choromanski, Arijit Sehanobish, Han Lin, Yunfan Zhao, Eli Berger, Tetiana Parshakova, Alvin Pan, David Watkins, Tianyi Zhang, Valerii Likhosherstov, Somnath Basu Roy Chowdhury, Avinava Dubey, Deepali Jain, Tamas Sarlos, Snigdha Chaturvedi, Adrian Weller |
| ▪ Interspeech 2023 | ML | Boosting Punctuation Restoration with Data Generation and Reinforcement Learning | Viet Dac Lai, Abel Salinas, Hao Tan, Trung Bui, Quan Tran, Seunghyun Yoon, Hanieh Deilamsalehy, Franck Dernoncourt, Thien Huu Nguyen |
| ▪ ICLR 2023 | ML | Mask Conditional Synthetic Satellite Imagery | Van Anh Le, Varshini Reddy, Zixi Chen, Mengyuan Li, Xinran Tang, Anthony Ortiz, Simone Fobi Nsutezo, Caleb Robinson |
■ Datasets
Better data means better AI
We build high-quality datasets to advance AI and ML. Some are open-sourced to support the research community. Others underpin Kensho's own models and products.
SPGISpeech 2.0
A large-scale dataset containing thousands of hours of professionally transcribed and formatted financial audio for transcription, acoustic modeling, and ASR.
PubTables-v2
A first-of-its-kind dataset developed to advance document AI by enabling end-to-end table extraction, including full-page and multi-page tables.
Finance Fundamentals
A collection of datasets featuring quantitative reasoning benchmarks, financial domain knowledge, and document-based questions to evaluate and train LLMs.
FIND
A document benchmarking dataset enabling models to detect, describe, and provide evidence of inconsistencies in long, technical, and complex documents.
■ Research to product
From research to real-world solutions
Kensho turns AI research into the products S&P Global and its customers rely on every day. What starts in our labs ends up powering decisions across global markets.