Kensho NERD: Introducing People Linking

NERD's most-requested feature is here: the ability to recognize and disambiguate real people — not just companies — in financial documents.

A Walt Disney press release featuring organization entities in blue and people entities in pink

Until now, Kensho NERD has primarily focused on finding companies and other organizations within a given text. While this is incredibly useful for search and analysis, a natural next step is to look for other entity types, such as people. Financial documents like SEC filings and earnings call transcripts contain important people-centric information: executive changes, board announcements, Q&A between analysts and presenters, and more. By tagging people and linking them to structured data profiles in S&P Global’s Capital IQ database, NERD opens up new ways to discover and connect data.

Personally, my favorite way to think about NERD is to ask what questions it allows me to answer. With NERD for people, we can form novel queries, such as:

  • Which executives get the most speaking time in their earnings calls?

  • Which analysts cover which companies?

  • Which analysts are asking about ESG?

  • Who is the most talked-about tech CEO?

  • What is the sentiment of a particular executive about a concept I’m interested in?
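As an illustration of how one of these questions might be answered once people are linked, the Python sketch below aggregates analyst coverage from earnings call annotations. The record types `PersonMention` and `Transcript`, the `role` label, and all IDs are hypothetical assumptions about the annotation format, not NERD's actual output schema.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PersonMention:
    person_id: str  # hypothetical ID of the linked person profile
    role: str       # hypothetical role label, e.g. "analyst" or "executive"


@dataclass
class Transcript:
    company_id: str  # hypothetical ID of the company hosting the earnings call
    people: list = field(default_factory=list)  # PersonMention objects linked in the call


def analyst_coverage(transcripts):
    """Map each analyst ID to the sorted list of companies whose calls they appear on."""
    coverage = defaultdict(set)
    for transcript in transcripts:
        for mention in transcript.people:
            if mention.role == "analyst":
                coverage[mention.person_id].add(transcript.company_id)
    return {analyst: sorted(companies) for analyst, companies in coverage.items()}


# Toy usage with made-up IDs.
calls = [
    Transcript("apple", [PersonMention("analyst_a", "analyst"),
                         PersonMention("exec_1", "executive")]),
    Transcript("microsoft", [PersonMention("analyst_a", "analyst"),
                             PersonMention("analyst_b", "analyst")]),
]
print(analyst_coverage(calls))
# {'analyst_a': ['apple', 'microsoft'], 'analyst_b': ['microsoft']}
```

The same pattern (group linked people by a document-level attribute) would extend to the other questions, for example summing speaking time per executive, if the annotations carried that information.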

That last question becomes even easier to answer with Kensho Classify, NERD’s sister project that tags user-defined concepts in text.

How we solved people linking

Connecting people’s names, whether in databases or in text, to the actual person has long been a difficult problem in NLP, for two key reasons:

First, names of individuals take on many surface forms. Consider Timothy Donald Cook, the current CEO of Apple. You may know him better as Tim Cook. In earnings calls, he is often referred to simply as Tim or Mr. Cook, while more formal settings such as SEC filings prefer Timothy D. Cook. All of these variations refer to the same person, but linking these disparate names to the Apple chief takes some disambiguation.
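To make the surface-form problem concrete, here is a minimal, rule-based Python sketch that expands a structured name record into several of the variants described above. The function `surface_forms` and its parameters are hypothetical, and this is not how NERD generates aliases; production systems typically rely on curated or learned alias lists rather than a handful of rules.

```python
def surface_forms(first, last, middle=None, nickname=None, honorific=None):
    """Return a set of plausible ways a person's name might appear in text.

    Hypothetical rule-based sketch: combines given names, an optional middle
    name or initial, and an optional honorific with the last name.
    """
    given_names = {first} | ({nickname} if nickname else set())
    forms = set()
    for given in given_names:
        forms.add(given)                           # e.g. "Tim"
        forms.add(f"{given} {last}")               # e.g. "Tim Cook"
    if middle:
        forms.add(f"{first} {middle} {last}")      # e.g. "Timothy Donald Cook"
        forms.add(f"{first} {middle[0]}. {last}")  # e.g. "Timothy D. Cook"
    if honorific:
        forms.add(f"{honorific} {last}")           # e.g. "Mr. Cook"
    return forms


forms = surface_forms("Timothy", "Cook", middle="Donald", nickname="Tim", honorific="Mr.")
print(sorted(forms))
# ['Mr. Cook', 'Tim', 'Tim Cook', 'Timothy', 'Timothy Cook', 'Timothy D. Cook', 'Timothy Donald Cook']
```

Even this toy expansion yields seven strings for one person, which hints at why matching on exact names alone is not enough.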

Second, first and last names overlap heavily. A reference to “Tim” yields 13,000 potential matches in our knowledge base of professionals. The last name “Cook” is rarer, but still matches 3,000 people. Even the full name “Tim Cook” leaves 17 options. We don’t always have both a first and a last name to work with, and even when we do, many candidates remain.
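The overlap problem appears as soon as names are looked up in an alias index. The Python sketch below inverts a tiny, made-up knowledge base into an alias-to-people index; the person IDs and aliases are hypothetical and unrelated to the real counts above. It shows that a first-name lookup returns several people and that even an exact full-name lookup can still be ambiguous.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: person ID -> aliases that person is known by.
TOY_KB = {
    "person_1": {"Tim Cook", "Tim", "Mr. Cook"},
    "person_2": {"Tim Cook", "Tim", "Mr. Cook"},  # a different person with the same name
    "person_3": {"Tim Smith", "Tim", "Mr. Smith"},
    "person_4": {"Ann Cook", "Ann", "Ms. Cook"},
}


def build_alias_index(kb):
    """Invert person -> aliases into alias -> set of person IDs."""
    index = defaultdict(set)
    for person_id, aliases in kb.items():
        for alias in aliases:
            index[alias].add(person_id)
    return index


def candidates(index, mention):
    """Return every person ID whose alias set contains the exact mention string."""
    return set(index.get(mention, ()))


index = build_alias_index(TOY_KB)
print(sorted(candidates(index, "Tim")))       # ['person_1', 'person_2', 'person_3']
print(sorted(candidates(index, "Tim Cook")))  # ['person_1', 'person_2']
```

String lookup alone cannot separate `person_1` from `person_2`; some other signal from the document is needed.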

So, how did we do it? The answer lies in one of the key features behind NERD: coherence. When linking a tricky name like “Mr. Cook,” NERD doesn’t look only at the text of the name; it uses the full context of the document to find the match that makes the most sense given the other entities present. Because NERD is best in class at linking companies and organizations, once it has disambiguated Apple Inc. in a document, the likelihood that “Mr. Cook” refers to Tim Cook rises dramatically. We choose the people entities that cohere best with the company entities. This significantly shrinks the space of people we consider as potential matches and allows NERD to find the right person with very high precision (in other words, with very few false positives).
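The Python sketch below illustrates the general idea of coherence-based reranking under strong simplifying assumptions: each candidate carries a hypothetical set of affiliated company IDs, and the coherence score is simply how many of those companies were linked elsewhere in the same document. It is a toy stand-in, not NERD's actual scoring model. Abstaining when the best score is zero or tied mirrors the precision-first trade-off: it is better to leave a name unlinked than to link it to the wrong person.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    person_id: str
    affiliations: frozenset  # hypothetical: company IDs this person is associated with


def coherence(candidate, doc_companies):
    """Count how many of the candidate's affiliations were linked in the document."""
    return len(candidate.affiliations & doc_companies)


def link_person(candidates, doc_companies):
    """Pick the candidate that best coheres with the document's linked companies.

    Returns None (abstains) when no candidate overlaps or when the top score
    is tied, trading recall for precision.
    """
    scored = sorted(
        ((coherence(c, doc_companies), c.person_id) for c in candidates),
        reverse=True,
    )
    if not scored or scored[0][0] == 0:
        return None  # nothing in the document supports any candidate
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous: two candidates cohere equally well
    return scored[0][1]


# Toy usage: two people named Tim Cook, disambiguated by the companies in the document.
cands = [
    Candidate("tim_cook_apple", frozenset({"apple"})),
    Candidate("tim_cook_other", frozenset({"acme_bank"})),
]
print(link_person(cands, {"apple", "microsoft"}))  # tim_cook_apple
print(link_person(cands, {"exxon"}))               # None
```

In a real system the score would combine many more signals (name similarity, role, position in the text), but the core move is the same: let already-linked companies narrow the people candidates.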

Power to the people

In a recent sample of 5,000 earnings call transcripts, NERD recognized over 22,000 people entities. When scaled up to our full pipeline of financial documents, we expect millions of new people annotations that can be used to enable better search and analysis over documents. NERD for People empowers users with a whole new category of data enrichment, and it’s available right now.
