Unstructured data such as videos, Word documents, PDF files, images, and audio make up a large portion of enterprise data. However, it is difficult to work with unstructured data and gain insights from it, because most AI and ML algorithms are built on structured training data. To process unstructured text documents, AI systems need to perform information extraction, which consists of three main sub-tasks:
- Named entity recognition
- Named entity linking
- Relation extraction
In this research, we focus on entity linking. Entity linking (a.k.a. named entity linking) is a core NLP task that helps address these challenges. It takes the entities that named entity recognition (NER) identifies in a document, assigns each one a unique identity, and links it to its corresponding description in a knowledge base.
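The sketch below is a minimal illustration of that last step, assuming NER has already produced (mention, label) pairs. It links mentions through an exact-match alias index into a toy knowledge base; the entity IDs, descriptions, and aliases are hypothetical, and real entity linkers also have to choose between several candidates for the same name, which plain lookup cannot do.

```python
from typing import Dict, List, Optional, Tuple

# Toy knowledge base: entity ID -> short description.
# The IDs and entries are hypothetical and exist only for this illustration.
KB: Dict[str, str] = {
    "Q_FRANCE": "France, a country in Western Europe",
    "Q_TWITTER": "Twitter, a social media platform",
}

# Alias index: normalized surface form -> entity ID.
ALIASES: Dict[str, str] = {
    "france": "Q_FRANCE",
    "french republic": "Q_FRANCE",
    "twitter": "Q_TWITTER",
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so 'French  Republic' matches 'french republic'."""
    return " ".join(text.lower().split())


def link(mentions: List[Tuple[str, str]]) -> List[Tuple[str, str, Optional[str]]]:
    """Attach a KB entity ID to each (mention, NER label) pair produced by NER.

    A mention with no alias entry gets None (often called a 'NIL' link)."""
    return [(m, label, ALIASES.get(normalize(m))) for m, label in mentions]


ner_output = [("France", "LOC"), ("French Republic", "LOC"), ("Twitter", "ORG"), ("Atlantis", "LOC")]
for mention, label, entity_id in link(ner_output):
    description = KB[entity_id] if entity_id else "not in knowledge base"
    print(f"{mention} [{label}] -> {entity_id}: {description}")
```

Here two different surface forms, “France” and “French Republic”, resolve to the same ID, which is the unique identity entity linking is meant to provide.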
What is the difference between entity linking and named entity recognition?
Both are used to extract information from text documents. NER identifies the named entities in a text and classifies them into predefined categories such as organization, person, and location. For example, NER identifies Alfred Jones as a named entity and assigns it to the “Person” category, but it does not tell us which Alfred Jones the text mentions. Entity linking resolves this: by linking each entity in the text to its description in a knowledge base, it tells us exactly which Alfred Jones is meant.
Semantic entity-based search benefits from entity linking. Unlike traditional keyword-based search, semantic entity-based search takes into consideration the searcher’s intent, the relationships between words, and the query context. For example, a traditional keyword-based search for the word “bass” could return results about bass (a type of fish) or bass (a musical instrument). Semantic entity-based search, by contrast, uses the query context to infer which meaning you intend and returns more relevant results.
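One simple way to use query context is a simplified Lesk-style overlap: score each candidate meaning of “bass” by how many words its description shares with the rest of the query. The candidate IDs, descriptions, and stopword list below are hypothetical, and production search engines rely on much richer signals; this sketch only shows the basic idea.

```python
import re
from typing import Dict, Set

# Hypothetical candidate meanings for the mention "bass", each with a short description.
CANDIDATES: Dict[str, str] = {
    "bass_fish": "bass is a type of fish caught by anglers in lakes and rivers",
    "bass_instrument": "bass is a musical instrument such as a bass guitar played in a band",
}

# Words ignored when comparing contexts (includes the ambiguous word itself).
STOPWORDS = {"a", "an", "the", "is", "of", "in", "and", "by", "for", "to", "as", "such", "bass"}


def tokens(text: str) -> Set[str]:
    """Return the set of lowercase words in `text`, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(query: str) -> str:
    """Pick the candidate whose description shares the most words with the query.

    On a tie, max() keeps the first candidate in dictionary order."""
    context = tokens(query)
    return max(CANDIDATES, key=lambda c: len(context & tokens(CANDIDATES[c])))


print(disambiguate("best bass guitar strings for a rock band"))
print(disambiguate("bass fishing lures for lakes"))
```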
For example, Google seeks to understand users’ intentions from their search queries. It distinguishes between different types of entities, such as people, dates, organizations, and places. A user’s search history and location, along with global search history on a topic, help Google deliver more accurate and relevant results (see Example 1).
To understand users’ intentions, Google uses a knowledge base, an extensive database of public information that establishes connections between entities. However, the dynamic nature of knowledge bases is one of the challenges in information retrieval. When NER processes a new text document, it may extract entities that are not yet in the knowledge base. The knowledge base must therefore be updated so that each entity in the document can be linked to a corresponding entry; if updates are not integrated, those entities will be missing and cannot be linked.
Example 1: Semantic entity-based search
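The update problem can be sketched with a toy in-memory knowledge base. The `KnowledgeBase` class, its ID scheme, the `link_or_add` helper, and the example organization “Acme Robotics” are all hypothetical. When a mention has no match, it is treated as a missing entity and registered, so later documents can link to it.

```python
import itertools
from typing import Dict, Optional


class KnowledgeBase:
    """A toy, in-memory knowledge base keyed by lowercase names (hypothetical design)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.entities: Dict[str, Dict[str, str]] = {}
        self.aliases: Dict[str, str] = {}

    def add_entity(self, name: str, entity_type: str) -> str:
        """Store a new entity and index its name; returns the new ID (E1, E2, ...)."""
        entity_id = f"E{next(self._ids)}"
        self.entities[entity_id] = {"name": name, "type": entity_type}
        self.aliases[name.lower()] = entity_id
        return entity_id

    def link(self, mention: str) -> Optional[str]:
        """Return the entity ID for a mention, or None if the entity is missing."""
        return self.aliases.get(mention.lower())


def link_or_add(kb: KnowledgeBase, mention: str, entity_type: str) -> str:
    """Link a mention; if it is missing from the KB, register it as a new entity."""
    entity_id = kb.link(mention)
    if entity_id is None:
        entity_id = kb.add_entity(mention, entity_type)
    return entity_id


kb = KnowledgeBase()
kb.add_entity("France", "LOC")
print(kb.link("Acme Robotics"))  # None: a missing entity
new_id = link_or_add(kb, "Acme Robotics", "ORG")
print(kb.link("acme robotics") == new_id)  # True: later documents can link to it
```

Adding every unmatched mention automatically is a simplification: misspellings and name variants would create duplicate entries, so a real pipeline would usually check unmatched mentions against existing entities, or have them reviewed, before adding them.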
To recommend new articles that may interest a user, a system analyzes the general content of text documents by categorizing them into topics. Entity linking supports this analysis by identifying exactly which entities a text discusses.
For example, Twitter analyzes your daily activity, your interactions with other users, and your most recent tweets. Based on mutual connections and your topics of interest, it suggests other users to follow and recommends trends to you.
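A minimal sketch of topic-based recommendation built on entity linking, assuming each linked entity ID maps to a topic category in the knowledge base. The entity IDs, topics, article titles, and overlap score are hypothetical choices for illustration; this is not how Twitter or any specific platform ranks content.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical mapping from linked entity IDs to topic categories in a knowledge base.
ENTITY_TOPICS: Dict[str, str] = {
    "E_NASA": "space",
    "E_MARS": "space",
    "E_FIFA": "football",
    "E_NASDAQ": "finance",
}


def topic_profile(linked_entities: List[str]) -> Counter:
    """Count the topics of the entities linked in one document; unknown IDs are skipped."""
    return Counter(ENTITY_TOPICS[e] for e in linked_entities if e in ENTITY_TOPICS)


def recommend(history: List[List[str]], articles: Dict[str, List[str]]) -> List[str]:
    """Rank candidate articles by topic overlap with the documents a user has read.

    `history` holds the linked entity IDs of each document the user read;
    `articles` maps an article title to the linked entity IDs in that article."""
    profile: Counter = Counter()
    for doc in history:
        profile.update(topic_profile(doc))

    def score(title: str) -> int:
        # Multiset intersection: shared topics, counted up to the smaller frequency.
        return sum((profile & topic_profile(articles[title])).values())

    return sorted(articles, key=score, reverse=True)


history = [["E_NASA", "E_MARS"], ["E_NASA"]]
articles = {
    "Cup final preview": ["E_FIFA"],
    "Rover lands on Mars": ["E_MARS", "E_NASA"],
    "Markets close higher": ["E_NASDAQ"],
}
print(recommend(history, articles))
```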
A potential problem in analyzing content for entity linking is name variants. The same person’s name can appear in different forms in a text document, and all of these forms refer to the same person. For example, Adam Jeff Wilson may also be mentioned as “AW” or “Adam Wilson”, and these forms should be treated as equivalent regardless of how they are used. To avoid missed links, it is necessary to account for different name variants and broaden name queries.
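A minimal sketch of handling name variants, assuming names follow a “First [Middle] Last” pattern: generate common variants of each knowledge base name and index them all as aliases of the same entity. The `name_variants` heuristic and the entity ID `P1` are hypothetical.

```python
from typing import Dict, Set


def name_variants(full_name: str) -> Set[str]:
    """Generate common variants of a 'First [Middle ...] Last' name (simple heuristic)."""
    parts = full_name.split()
    if len(parts) < 2:
        return {full_name.lower()}
    first, last, middles = parts[0], parts[-1], parts[1:-1]
    variants = {
        full_name,                     # e.g. Adam Jeff Wilson
        f"{first} {last}",             # e.g. Adam Wilson
        f"{first[0]}. {last}",         # e.g. A. Wilson
        last,                          # e.g. Wilson
        first[0] + last[0],            # e.g. AW
        "".join(p[0] for p in parts),  # e.g. AJW
    }
    if middles:
        middle_initials = " ".join(m[0] + "." for m in middles)
        variants.add(f"{first[0]}. {middle_initials} {last}")  # e.g. A. J. Wilson
    # Lowercase everything so lookups are case-insensitive.
    return {v.lower() for v in variants}


def build_alias_index(entities: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every generated variant to the set of entity IDs it could refer to."""
    index: Dict[str, Set[str]] = {}
    for entity_id, name in entities.items():
        for variant in name_variants(name):
            index.setdefault(variant, set()).add(entity_id)
    return index


index = build_alias_index({"P1": "Adam Jeff Wilson"})
for mention in ["AW", "Adam Wilson", "A. J. Wilson"]:
    print(mention, "->", index.get(mention.lower(), set()))
```

Variants such as a bare surname or two-letter initials broaden recall but are shared by many people, which is why the index maps each variant to a set of entity IDs rather than a single one; the remaining ambiguity still has to be resolved from context.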
Question Answering Systems
Unlike search engines, Q&A systems provide specific answers to users’ questions. For example, to answer the question “When is Morrissey’s birthday?”, a question answering system must first determine which Morrissey the question refers to. Here, the entity is Morrissey, and entity linking retrieves the corresponding entry for Morrissey from the knowledge base so that the system can answer the question as accurately as possible.
One potential problem with Q&A systems is entity ambiguity, which occurs when the same name can refer to multiple entities. For example, “Paris” in a text document could refer to the capital of France or to Paris Hilton.
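One way a Q&A system can narrow down ambiguous candidates is to use the type of answer the question expects: a birthday only makes sense for a person, a population only for a place. The sketch assumes a hypothetical keyword-to-type table and toy knowledge base entries with hypothetical IDs; real systems typically learn the expected answer type rather than hard-coding it.

```python
from typing import Dict, List

# Toy knowledge base with two entities that share the surface form "Paris" (hypothetical IDs).
KB: Dict[str, Dict[str, str]] = {
    "E_PARIS_CITY": {"name": "Paris", "type": "City"},
    "E_PARIS_HILTON": {"name": "Paris Hilton", "type": "Person"},
}
ALIASES: Dict[str, List[str]] = {"paris": ["E_PARIS_CITY", "E_PARIS_HILTON"]}

# Hypothetical table: question keyword -> entity type the answer requires.
EXPECTED_TYPE: Dict[str, str] = {"birthday": "Person", "born": "Person", "population": "City"}


def link_in_question(mention: str, question: str) -> List[str]:
    """Return candidate entity IDs for `mention`, filtered by the question's expected type."""
    candidates = ALIASES.get(mention.lower(), [])
    words = set(question.lower().replace("?", " ").split())
    required = {t for keyword, t in EXPECTED_TYPE.items() if keyword in words}
    if not required:
        return candidates  # no type clue in the question: the ambiguity remains
    return [c for c in candidates if KB[c]["type"] in required]


print(link_in_question("Paris", "When is Paris's birthday?"))
print(link_in_question("Paris", "What is the population of Paris?"))
print(link_in_question("Paris", "Tell me about Paris"))
```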
A related problem arises from how names are stored. Assume you have two names: Alfred Smith Jones and Alfred James Jones. If a knowledge base records person names only by surname and first name, both will be stored as “Jones Alfred” and treated as the same person.
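The collision can be reproduced in a few lines: indexing records by a surname-plus-first-name key merges the two Alfreds, while a key that keeps the middle name separates them. The record fields and the IDs `P1` and `P2` are hypothetical.

```python
from typing import Callable, Dict, List

people: List[Dict[str, str]] = [
    {"id": "P1", "first": "Alfred", "middle": "Smith", "last": "Jones"},
    {"id": "P2", "first": "Alfred", "middle": "James", "last": "Jones"},
]


def index_by(records: List[Dict[str, str]], key: Callable[[Dict[str, str]], str]) -> Dict[str, List[str]]:
    """Group record IDs by the string produced by `key`."""
    index: Dict[str, List[str]] = {}
    for record in records:
        index.setdefault(key(record), []).append(record["id"])
    return index


# Surname + first name only: both Alfreds collapse into one entry.
lossy = index_by(people, lambda r: f"{r['last']} {r['first']}")
# Keeping the middle name separates them.
full = index_by(people, lambda r: f"{r['last']} {r['first']} {r['middle']}")

print(lossy)  # {'Jones Alfred': ['P1', 'P2']}
print(full)
```

Even full names are not guaranteed to be unique, so a more robust design stores each person under a stable ID and treats every name, including the full name, as a non-unique alias.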