About Extract
Save time on document processing through automated text and table extraction
Simple optical character recognition (OCR) tools are outdated: they read characters but lose the layout, paragraphs, and tables that give a document its meaning.
Whether you want to make your financial documents machine-readable, join table data to your proprietary database, or find specific data points across multiple documents, Kensho Extract can help.
See for yourself
Add structure to unstructured documents.
Kensho Extract is a fundamental machine learning (ML) capability that allows users to access both text and tables in a simple-to-use format for further analysis and action.
Consolidated Statements of Income (Unaudited), USD ($); shares in millions, $ in millions

Line item | 3 Months Ended Jun. 30, 2021 | 3 Months Ended Jun. 30, 2020 | 6 Months Ended Jun. 30, 2021 | 6 Months Ended Jun. 30, 2020
---|---|---|---|---
Revenue | $ 2,106 | $ 1,943 | $ 4,122 | $ 3,729
Expenses: | | | |
Operating-related expenses | 533 | 493 | 1,060 | 1,011
Selling and general expenses | 374 | 295 | 735 | 609
Depreciation | 23 | 19 | 42 | 39
Amortization of intangibles | 22 | 32 | 53 | 61
Total expenses | 952 | 839 | 1,890 | 1,720
Gain on dispositions | | (1) | (2) | (8)
Operating profit | 1,154 | 1,105 | 2,234 | 2,017
Other income, net | (22) | (10) | (29) | (9)
Interest expense, net | 32 | 40 | 63 | 74
Income before taxes on income | 1,144 | 1,075 | 2,200 | 1,952
Provision for taxes on income | 287 | 233 | 534 | 421
Net income | 857 | 842 | 1,666 | 1,531
Less: net income attributable to noncontrolling interests | $ (59) | (50) | (113) | (100)
Net income attributable to S&P Global Inc. | $ 798 | $ 792 | $ 1,553 | $ 1,431
Net income per share: | | | |
Basic (USD per share) | $ 3.31 | $ 3.29 | $ 6.45 | $ 5.92
Diluted (USD per share) | $ 3.30 | $ 3.28 | $ 6.42 | $ 5.90
Weighted-average number of common shares outstanding: | | | |
Basic (shares) | 240.8 | 240.9 | 240.7 | 241.5
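Once a table has been extracted into a structured form, simple consistency checks can catch extraction errors before the data reaches a database. The Python sketch below assumes the table arrives as pipe-delimited text like the income statement above (an assumption; Extract's actual output format may differ). It parses each cell, treating parentheses as negative numbers, and confirms that the four expense lines add up to Total expenses in every column. The helper names `parse_amount`, `parse_pipe_table`, and `check_total_expenses` are hypothetical.

```python
from decimal import Decimal
from typing import Dict, List, Optional

# A trimmed copy of the income statement, in the same pipe-delimited layout.
TABLE = """\
Line item | 3 Months Ended Jun. 30, 2021 | 3 Months Ended Jun. 30, 2020 | 6 Months Ended Jun. 30, 2021 | 6 Months Ended Jun. 30, 2020
---|---|---|---|---
Revenue | $ 2,106 | $ 1,943 | $ 4,122 | $ 3,729
Operating-related expenses | 533 | 493 | 1,060 | 1,011
Selling and general expenses | 374 | 295 | 735 | 609
Depreciation | 23 | 19 | 42 | 39
Amortization of intangibles | 22 | 32 | 53 | 61
Total expenses | 952 | 839 | 1,890 | 1,720
"""

EXPENSE_LINES = [
    "Operating-related expenses",
    "Selling and general expenses",
    "Depreciation",
    "Amortization of intangibles",
]


def parse_amount(cell: str) -> Optional[Decimal]:
    """Convert '$ 2,106' or '(22)' to a Decimal; blank cells become None."""
    text = cell.replace("$", "").replace(",", "").strip()
    if not text:
        return None
    if text.startswith("(") and text.endswith(")"):
        return -Decimal(text[1:-1])  # accounting notation: (22) means -22
    return Decimal(text)


def parse_pipe_table(text: str) -> Dict[str, List[Optional[Decimal]]]:
    """Map each row label to its parsed cells, skipping the header and separator."""
    rows: Dict[str, List[Optional[Decimal]]] = {}
    lines = [line for line in text.splitlines() if line.strip()]
    for line in lines[2:]:
        label, *cells = [c.strip() for c in line.split("|")]
        rows[label] = [parse_amount(c) for c in cells]
    return rows


def check_total_expenses(rows: Dict[str, List[Optional[Decimal]]]) -> List[bool]:
    """For each column, report whether the expense lines sum to Total expenses."""
    return [
        sum(rows[name][col] for name in EXPENSE_LINES) == total
        for col, total in enumerate(rows["Total expenses"])
    ]


print(check_total_expenses(parse_pipe_table(TABLE)))
```

On these figures the check returns `[True, True, True, True]`. Using `Decimal` rather than `float` keeps the comparison exact.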
Extract can be used independently or in conjunction with other services offered by Kensho. Combining our document layout analysis and table structure recognition models, Extract allows users to:
- Quickly transform unstructured documents into a machine-readable format that organizes the headers, titles, paragraphs, tables, and footers detected in the document in natural reading order.
- Interpret messy page layouts, structuring text into cohesive paragraphs that can then be effectively analyzed and searched.
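Because the elements come back in natural reading order, downstream code can rebuild clean body text in a single pass. The sketch below assumes a hypothetical output shape, a list of dictionaries with `type` and `text` keys, and treats `header` and `footer` elements as repeated page furniture to drop; tables are skipped here so they can be handled separately. Extract's real schema may name these fields differently, and `body_text` is a hypothetical helper.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical element shape: {"type": "paragraph", "text": "..."}.
Element = Dict[str, str]


def body_text(
    elements: Iterable[Element],
    drop: FrozenSet[str] = frozenset({"header", "footer", "table"}),
) -> str:
    """Join the text of elements, assumed to be in reading order, skipping dropped types."""
    kept: List[str] = []
    for element in elements:
        if element["type"] in drop:
            continue  # page furniture and tables are handled elsewhere
        kept.append(element["text"].strip())
    return "\n\n".join(kept)  # blank line between blocks


doc = [
    {"type": "header", "text": "Quarterly Report"},
    {"type": "title", "text": "Results of Operations"},
    {"type": "paragraph", "text": "First paragraph. "},
    {"type": "table", "text": "Revenue | 2,106"},
    {"type": "paragraph", "text": "Second paragraph."},
    {"type": "footer", "text": "Page 12"},
]
print(body_text(doc))
```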
Kensho Extract Features
Parse your documents into an easy-to-consume, machine-readable format.
Find and extract the tables you care about for easy analysis or database updates.
Find specific values in your documents to reduce your manual data operations efforts.
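Finding a specific value in an extracted table reduces to matching a row label and a column header. This sketch assumes a table represented as a list of rows, with the header in the first row and labels in the first column (a hypothetical representation, not a documented Extract format). The hypothetical `find_value` helper normalizes whitespace and case so that minor formatting differences do not break the match.

```python
from typing import List, Optional


def _norm(label: str) -> str:
    # Collapse runs of whitespace and ignore case so "Revenue " matches "revenue".
    return " ".join(label.split()).lower()


def find_value(table: List[List[str]], row_label: str, column: str) -> Optional[str]:
    """Return the cell at (row_label, column), or None if either label is missing.

    Assumes table[0] is the header row and column 0 holds row labels.
    """
    header = [_norm(h) for h in table[0]]
    target = _norm(column)
    if target not in header:
        return None
    col = header.index(target)
    for row in table[1:]:
        if row and _norm(row[0]) == _norm(row_label):
            return row[col] if col < len(row) else None
    return None


table = [
    ["Line item", "3 Months Ended Jun. 30, 2021", "3 Months Ended Jun. 30, 2020"],
    ["Revenue", "$ 2,106", "$ 1,943"],
    ["Total expenses", "952", "839"],
]
print(find_value(table, "revenue", "3 Months Ended Jun. 30, 2021"))  # $ 2,106
```

Running the same lookup over the tables from many documents gives a quick way to pull one data point across a whole collection.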
Kensho Extract Use Cases
Extract text and tables while maintaining page structure for easy translation to other languages.
Augment your documents by pairing Kensho Extract with our NERD and LINK services.
Find specific values in your documents to reduce your manual data operations efforts.
Make it easy to run your own NLP models on documents without having to deal with data extraction or structuring yourself.
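Many NLP models accept only a limited amount of input at once, so extracted paragraphs are often packed into size-limited chunks before inference. The sketch below greedily groups consecutive paragraphs up to a character budget, preserving reading order. Counting characters rather than model tokens is a simplifying assumption, and `chunk_paragraphs` is a hypothetical helper.

```python
from typing import Iterable, List


def chunk_paragraphs(paragraphs: Iterable[str], max_chars: int) -> List[str]:
    """Greedily pack consecutive paragraphs into chunks of at most max_chars.

    Paragraphs are joined with a blank line. A single paragraph longer than
    max_chars becomes its own (oversized) chunk; it is not split.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current: List[str] = []
    size = 0
    for paragraph in paragraphs:
        added = len(paragraph) + (2 if current else 0)  # 2 = len("\n\n")
        if current and size + added > max_chars:
            chunks.append("\n\n".join(current))  # flush the full chunk
            current, size = [], 0
            added = len(paragraph)
        current.append(paragraph)
        size += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks


print(chunk_paragraphs(["aaaa", "bbbb", "cccccccccc", "dd"], max_chars=10))
# ['aaaa\n\nbbbb', 'cccccccccc', 'dd']
```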
Kensho Extract can be accessed in two ways:
A simple, easy-to-use API for fast, programmatic, high-throughput extraction
An intuitive UI for your team to review extraction results and make corrections.
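For high-throughput use, files are typically submitted concurrently rather than one at a time. This sketch does not call any real Kensho endpoint: `extract_one` is a hypothetical placeholder for whatever function sends a single file to the Extract API and returns its result. Threads suit this I/O-bound work, and failures are collected per file rather than aborting the batch, so they can be reviewed separately.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple, TypeVar

R = TypeVar("R")


def extract_many(
    paths: Iterable[str],
    extract_one: Callable[[str], R],
    max_workers: int = 8,
) -> Tuple[Dict[str, R], Dict[str, Exception]]:
    """Run extract_one on every path concurrently.

    Returns (results, errors): successful results keyed by path, and the
    exception raised for each path that failed.
    """
    results: Dict[str, R] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the file it is processing.
        futures = {pool.submit(extract_one, path): path for path in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[path] = exc
    return results, errors


def fake_extract(path: str) -> str:
    # Stand-in for a real API call, used only to exercise extract_many.
    if path.endswith(".bad"):
        raise ValueError("unreadable file")
    return path.upper()


ok, failed = extract_many(["a.pdf", "b.pdf", "c.bad"], fake_extract, max_workers=2)
print(sorted(ok), sorted(failed))
```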
For Developers
API Guides & Tutorials
See our developer documentation for information on how to extract text and tables from your files with the Extract APIs.
See how we can help
The fundamental building block for all of these initiatives is access to clean, structured data.
Unfortunately, the data most companies have is neither structured nor clean. Whether it is hidden in slide decks, PDFs, or a database that has mutated a dozen times since its inception, data is often all but inaccessible without investing a great deal of valuable expert time to understand the information and then structure it, usually through liberal use of Excel spreadsheets.
We feel your pain.
S&P Global employs thousands of trained analysts who process more than 5 million pages of financial content each year. Fortunately, all that effort has produced one of the largest machine learning training datasets for corporate financial documents, which has allowed us to speed up our internal operations by 50% to 100%, depending on the task at hand.
Frequently Asked Questions
We support any type of document that contains readable text, though poorly formatted documents are likely to result in lower extraction quality. Kensho Extract performs best with PDF files.
Yes, we support extraction in any language, although performance will be better for left-to-right languages.
Yes! You can extract tables and text from documents in their correct reading order.
Whether Extract can automate extraction based on topic depends on the use case. In some instances, and with some training, Kensho Extract can identify and return just the table(s) or section(s) that interest you, leaving out everything else. Reach out to us for help with your specific needs!