Kensho

Extract

Bulk text, table and key-value extraction made easy.

Get Started

About Extract

Save time on document processing through automated text and table extraction

Simple optical character recognition (OCR) tools are outdated.

Whether you want to make your financial documents machine-readable, join table data to your proprietary database, or find specific data points across multiple documents, Kensho Extract can help.


See for yourself

Add structure to unstructured documents.

Kensho Extract is a fundamental machine learning (ML) capability that allows users to access both text and tables in a simple-to-use format for further analysis and action.

Document being processed by Kensho Extract

Labeled elements: document section, subtitle, table title

Key values:
  • Gross revenue Q2 2021: $2,106
  • Net income H1 2021: $1,553

Consolidated Statements of Income (Unaudited) - USD ($)
shares in Millions, $ in Millions

Line item | 3 Months Ended Jun. 30, 2021 | 3 Months Ended Jun. 30, 2020 | 6 Months Ended Jun. 30, 2021 | 6 Months Ended Jun. 30, 2020
Income Statement [Abstract]
Revenue | $ 2,106 | $ 1,943 | $ 4,122 | $ 3,729
Expenses:
Operating-related expenses | 533 | 493 | 1,060 | 1,011
Selling and general expenses | 374 | 295 | 735 | 609
Depreciation | 23 | 19 | 42 | 39
Amortization of intangibles | 22 | 32 | 53 | 61
Total expenses | 952 | 839 | 1,890 | 1,720
Gain on dispositions | | (1) | (2) | (8)
Operating profit | 1,154 | 1,105 | 2,234 | 2,017
Other income, net | (22) | (10) | (29) | (9)
Interest expense, net | 32 | 40 | 63 | 74
Income before taxes on income | 1,144 | 1,075 | 2,200 | 1,952
Provision for taxes on income | 287 | 233 | 534 | 421
Net income | 857 | 842 | 1,666 | 1,531
Less: net income attributable to noncontrolling interests | $ (59) | (50) | (113) | (100)
Net income attributable to S&P Global Inc. | | $ 792 | $ 1,553 | $ 1,431
Net income:
Basic (USD per share) | $ 3.31 | $ 3.29 | $ 6.45 | $ 5.92
Diluted (USD per share) | $ 3.30 | $ 3.28 | $ 6.42 | $ 5.90
Weighted-average number of common shares outstanding:
Basic (shares) | 240.8 | 240.9 | 240.7 | 241.5

Extract can be used independently or in conjunction with other services offered by Kensho. Combining our document layout analysis and table structure recognition models, Extract allows users to:

  • Quickly transform unstructured documents into a machine-readable format that organizes the headers, titles, paragraphs, tables and footers detected within the document in natural reading order.
  • Interpret messy page layouts, structuring text into cohesive paragraphs that can then be effectively analyzed and searched.
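
To make "machine-readable format in natural reading order" more concrete, here is a minimal sketch of how such output could be modeled and consumed downstream. The `Element` class, its field names, and the `searchable_text` helper are hypothetical and chosen for illustration only; they are not the actual Extract response schema. The element kinds are the ones named in the list above.

```python
from dataclasses import dataclass, field
from typing import List

# Element kinds named in the list above. This is an illustrative model,
# not the real Extract output schema.
KINDS = {"header", "title", "paragraph", "table", "footer"}


@dataclass
class Element:
    kind: str                      # one of KINDS
    order: int                     # position in natural reading order
    text: str = ""                 # text content (left empty for tables)
    rows: List[List[str]] = field(default_factory=list)  # table cells, if any

    def __post_init__(self) -> None:
        if self.kind not in KINDS:
            raise ValueError(f"unknown element kind: {self.kind}")


def searchable_text(elements: List[Element]) -> str:
    """Join titles and paragraphs in reading order, skipping headers, footers and tables."""
    body = sorted(
        (e for e in elements if e.kind in {"title", "paragraph"}),
        key=lambda e: e.order,
    )
    return "\n".join(e.text for e in body)


# Elements deliberately listed out of order to show the reading-order sort.
doc = [
    Element("footer", 3, "Page 1"),
    Element("paragraph", 1, "shares in Millions, $ in Millions"),
    Element("title", 0, "Consolidated Statements of Income (Unaudited)"),
    Element("table", 2, rows=[["Revenue", "$ 2,106", "$ 1,943"]]),
]
print(searchable_text(doc))
```

Keeping an explicit `order` on each element means the body text can be rebuilt for search or analysis even if the elements arrive in a different sequence.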

Kensho Extract Features

Text Extraction

Parse your documents into an easy-to-consume, machine-readable format.

Table Extraction

Find and extract the tables you care about for easy analysis or database updates.
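
As a rough sketch of the "database updates" step, the snippet below loads a few rows from the example income statement on this page into an in-memory SQLite table using only the Python standard library. The table name, column names, and the `parse_amount` helper are assumptions made for illustration; the sketch starts from rows that have already been extracted and does not call Extract itself.

```python
import sqlite3


def parse_amount(cell: str) -> float:
    """Convert a financial cell such as '$ 2,106' or '(22)' to a number.

    Parentheses are read as negative values, following the usual
    accounting convention.
    """
    s = cell.replace("$", "").replace(",", "").strip()
    if s.startswith("(") and s.endswith(")"):
        return -float(s[1:-1])
    return float(s)


# Rows copied from the example statement (3 months ended Jun. 30, 2021 and 2020).
rows = [
    ("Revenue", "$ 2,106", "$ 1,943"),
    ("Total expenses", "952", "839"),
    ("Operating profit", "1,154", "1,105"),
    ("Other income, net", "(22)", "(10)"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE income_statement ("
    "line_item TEXT PRIMARY KEY, q2_2021 REAL, q2_2020 REAL)"
)
conn.executemany(
    "INSERT INTO income_statement VALUES (?, ?, ?)",
    [(name, parse_amount(a), parse_amount(b)) for name, a, b in rows],
)
conn.commit()

(revenue_2021,) = conn.execute(
    "SELECT q2_2021 FROM income_statement WHERE line_item = ?", ("Revenue",)
).fetchone()
print(revenue_2021)
```

Normalizing currency symbols, thousands separators, and parenthesized negatives before the insert keeps the stored columns numeric, so they can be joined or aggregated directly.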

Key-Value Extraction

Find specific values in your documents to reduce your manual data operations efforts.
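
To illustrate what a key-value pair looks like once it is pulled out of a document, the sketch below collects "label: $amount" pairs from lines of text. It uses a plain regular expression as a stand-in and is not how Kensho Extract identifies values; the `PAIR` pattern and `find_key_values` function are hypothetical names. The sample lines come from the example document on this page.

```python
import re
from typing import Dict, Iterable

# Matches "<label>: $<amount>", for example "Net income H1 2021: $1,553".
PAIR = re.compile(r"^(?P<key>[^:]+):\s*\$\s*(?P<value>[\d,]+(?:\.\d+)?)$")


def find_key_values(lines: Iterable[str]) -> Dict[str, float]:
    """Return a mapping of label to numeric value for lines that match PAIR."""
    found: Dict[str, float] = {}
    for line in lines:
        m = PAIR.match(line.strip())
        if m:
            found[m.group("key").strip()] = float(m.group("value").replace(",", ""))
    return found


text = [
    "Gross revenue Q2 2021: $2,106",
    "Net income H1 2021: $1,553",
    "Consolidated Statements of Income (Unaudited) - USD ($)",  # no "label: $amount" pair
]
print(find_key_values(text))
```

A fixed pattern like this only works when the label and value share a line in a known format, which is the kind of manual rule-writing that automated key-value extraction is meant to reduce.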

Kensho Extract Use Cases

Translation

Extract text and tables while maintaining page structure for easy translation to other languages.

Annotation

Augment your documents by pairing Kensho Extract with our NERD and LINK services.

Disambiguation

Find specific values in your documents to reduce your manual data operations efforts.

Natural Language Processing (NLP)

Make it easy to run your own NLP models on documents without having to deal with data extraction or structuring yourself.

Kensho Extract can be accessed in two ways:

API

A simple, easy-to-use API for fast, programmatic, high-throughput extraction.

UI

An intuitive UI for your team to review extraction results and make corrections.

For Developers

API Guides & Tutorials

See our developer documentation for information on how to extract text, tables, and key values from your documents with the Extract APIs.

Extract Developer Docs

See how we can help

The fundamental building block for any data-driven initiative is access to clean, structured data.

Unfortunately, the data most companies have is neither structured nor clean. Whether it is hidden in slide decks, PDFs, or a database that has mutated a dozen times since inception, it is frequently all but inaccessible without investing a great deal of valuable expert time in understanding the information and then attempting to structure it through liberal use of Excel spreadsheets.

We feel your pain.

S&P Global employs thousands of trained analysts who process more than 5 million pages of financial content each year. Luckily, all that effort has created one of the largest sets of machine learning training data for corporate financial documents, allowing us to speed up our internal operations by 50% to 100%, depending on the task at hand.


Frequently Asked Questions

We support any type of document that contains readable text, though poorly formatted documents are likely to result in lower extraction quality. Kensho Extract performs best with PDF files.

Yes, we support extraction in any language, although performance will be better for left-to-right languages.

Yes! You can extract tables and text from documents in their correct reading order.

Extract’s ability to automate extraction based on topic really depends on the use case. In some instances and with some training, Kensho Extract will be able to identify and send back just the table(s) or section(s) that interest you, leaving out everything else. Reach out to us for help on your specific needs!