Artificial intelligence is disrupting industries through a wide range of use cases, and content automation is one of them. For example, rather than writing thousands of different descriptions for their catalogue, retail and e-commerce companies are relying on Natural Language Generation (NLG) to convert structured data, such as product specs, into descriptions that are easier for humans to consume. NLG is the technology behind text content automation: it can convert data into words, sentences, articles and even film scripts.
In this article, we cover the important aspects of NLG, including why it matters, how it works, its challenges, and its application areas.
What is Natural Language Generation?
Natural Language Generation (NLG), a subcategory of Natural Language Processing (NLP), is a software process that automatically transforms structured data into human-readable text. With NLG, businesses can generate thousands of pages of data-driven narratives in minutes, provided they have the right data in the right format.
NLG is a subcategory of content automation focused on text automation.
Why is Natural Language Generation important?
A 2019 Gartner report predicted that “By 2022, 25% of enterprises will use some form of natural language generation technology.” Though most industry analyst estimates are not accurate, they tend to be directionally right. We also agree that the market is set to expand because:
- the volume of available data is increasing, and text is easier to digest than raw data and can communicate it more effectively
- in the age of digitalization and AI, consumers expect personalization, and NLG can provide it at scale
How does NLG work?
An automated text generation process involves six stages. For the sake of simplicity, we’ll explain each stage using the example of a robot journalist writing a news report on a football match:
Content Determination
The limits of the content should be determined first, because the data often contains more information than necessary. In the football news example, content regarding goals, cards, and penalties will be important for readers.
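As a rough illustration, content determination can be sketched as a filter over the raw match data. The event feed, its field names, and the set of relevant event types below are all assumptions made for this example, not part of any particular NLG product.

```python
# Hypothetical raw match feed; the field names are assumptions for illustration.
raw_events = [
    {"minute": 12, "type": "goal", "team": "X", "player": "A"},
    {"minute": 20, "type": "throw_in", "team": "Y", "player": "B"},
    {"minute": 33, "type": "yellow_card", "team": "Y", "player": "C"},
    {"minute": 44, "type": "penalty", "team": "Y", "player": "Z"},
    {"minute": 51, "type": "corner", "team": "X", "player": "D"},
]

# Event types judged newsworthy for a match report (an editorial choice).
RELEVANT_TYPES = {"goal", "yellow_card", "red_card", "penalty"}


def determine_content(events):
    """Keep only the events a reader is likely to care about."""
    return [event for event in events if event["type"] in RELEVANT_TYPES]


selected = determine_content(raw_events)
print([event["type"] for event in selected])
```

In this sketch, throw-ins and corners are dropped, while goals, cards, and penalties are passed on to the next stage.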
Data interpretation
The analyzed data is interpreted. Thanks to machine learning techniques, patterns can be recognized in the processed data. This is where data is put into context. For instance, information such as the winner of the match, the goal scorers and assisters, and the minutes in which goals were scored is identified in this stage.
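The sketch below shows what this stage might produce for the football example. It uses simple counting rules rather than machine learning, and the goal records and their field names are hypothetical.

```python
from collections import Counter

# Hypothetical goal records; the field names are assumptions for illustration.
goals = [
    {"minute": 12, "team": "X", "scorer": "A", "assist": "B"},
    {"minute": 58, "team": "Y", "scorer": "Z", "assist": None},
    {"minute": 77, "team": "X", "scorer": "A", "assist": "D"},
]


def interpret(goal_records):
    """Derive match-level facts (score, winner, scorers, assisters) from goals."""
    score = Counter(goal["team"] for goal in goal_records)
    ranked = score.most_common()
    # No goals at all, or the top two teams level on goals, means a draw.
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        winner = None
    else:
        winner = ranked[0][0]
    return {
        "score": dict(score),
        "winner": winner,
        "scorers": [(goal["scorer"], goal["minute"]) for goal in goal_records],
        "assisters": [goal["assist"] for goal in goal_records if goal["assist"]],
    }


facts = interpret(goals)
print(facts["winner"], facts["score"])
```

A production system would likely add richer interpretation, such as detecting comebacks or late winners, but the output has the same shape: facts placed in context, ready for planning.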
Document planning
In this stage, the structures in the data are organized with the goal of creating a narrative structure and document plan. Football news generally starts with a paragraph that gives the score of the game, with a comment on how intense and competitive the game was. The writer then reminds readers of the teams’ pre-game standings, describes other highlights of the game in the following paragraphs, and ends with player and coach interviews.
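One simple way to picture a document plan is as an ordered list of sections that is filled with whatever facts are available. The section names and the sample facts below are hypothetical and only mirror the football report structure described above.

```python
# Fixed narrative order for a match report; the section names are assumptions.
DOCUMENT_PLAN = ["score_summary", "pre_game_standings", "highlights", "interviews"]


def plan_document(facts):
    """Return (section, content) pairs in narrative order, skipping empty sections."""
    return [(section, facts[section]) for section in DOCUMENT_PLAN if section in facts]


# Facts may arrive in any order; the plan decides how they are sequenced.
facts = {
    "interviews": ["coach quote"],
    "score_summary": {"X": 2, "Y": 1},
    "highlights": ["penalty overruled by VAR"],
}

for section, content in plan_document(facts):
    print(section, content)
```

Here the pre-game standings are missing from the input, so the plan skips that section instead of leaving a gap in the article.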
Sentence Aggregation
This stage is also called micro planning, and it is about choosing the expressions and words in each sentence for the end-user. In other words, this is where separate sentences are aggregated because they are relevant to each other in context. For example, the first two sentences below convey different information. However, if the second event occurs right before halftime, the two sentences can be aggregated into the third sentence:
“[X team] maintained their lead into halftime.”
“VAR overruled a decision to award [Y team]’s [Football player Z] a penalty after replay showed [Football player T]’s apparent kick didn’t connect.”
“[X team] maintained their lead into halftime after VAR overruled a decision to award [Y team]’s [Football player Z] a penalty after replay showed [Football player T]’s apparent kick didn’t connect.”
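The aggregation rule in this example can be sketched as a small function. The five-minute window before halftime and the choice of "after" as the connective are assumptions made for illustration; real systems typically use richer rules.

```python
HALFTIME_MINUTE = 45


def aggregate(lead_sentence, event_sentence, event_minute, window=5):
    """Merge two sentences with "after" when the event falls just before halftime.

    The event sentence is appended unchanged, so this sketch assumes it can
    follow "after" without re-casing (true for one that starts with "VAR").
    """
    if HALFTIME_MINUTE - window <= event_minute <= HALFTIME_MINUTE:
        # Drop the first sentence's full stop and join the two clauses.
        return lead_sentence.rstrip(".") + " after " + event_sentence
    # Otherwise the events are unrelated in time: keep two separate sentences.
    return lead_sentence + " " + event_sentence


lead = "[X team] maintained their lead into halftime."
event = "VAR overruled a decision to award [Y team] a penalty."
print(aggregate(lead, event, event_minute=44))
print(aggregate(lead, event, event_minute=20))
```

With the event in minute 44 the sentences are merged; with the event in minute 20 they stay separate.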
Grammaticalization
The grammaticalization stage makes sure that the whole report follows correct grammar, spelling, and punctuation. This includes validating the actual text against the rules of syntax, morphology, and orthography. For instance, football match reports are written in the past tense.
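A toy version of this stage might apply past tense, capitalization, and final punctuation to a sentence. The verb lookup table and the `realize` helper below are hypothetical; a real system would rely on a full morphological component rather than a hand-written dictionary.

```python
# Tiny past-tense lookup, including irregular verbs (an illustrative assumption).
PAST_TENSE = {"score": "scored", "win": "won", "beat": "beat", "draw": "drew"}


def realize(subject, verb, rest):
    """Build a past-tense sentence with a capital first letter and a full stop."""
    sentence = f"{subject} {PAST_TENSE[verb]} {rest}".strip()
    sentence = sentence[0].upper() + sentence[1:]
    if not sentence.endswith((".", "!", "?")):
        sentence += "."
    return sentence


print(realize("team X", "win", "the match 2-1"))
print(realize("player A", "score", "twice."))
```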
Language Implementation
This stage involves inputting data into templates and ensuring that the document is output in the right format and according to the preferences of the user.
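As a minimal sketch of this final step, the example below fills a template with match data and emits either plain text or HTML. It uses Python's standard `string.Template` and `html.escape`; the template wording, the data keys, and the `render` helper are assumptions for illustration.

```python
import html
from string import Template

# Hypothetical sentence template; the placeholder names are assumptions.
RESULT_TEMPLATE = Template("$winner beat $loser $winner_goals-$loser_goals.")


def render(data, output_format="text"):
    """Fill the template and wrap the result in the requested output format."""
    sentence = RESULT_TEMPLATE.substitute(data)
    if output_format == "html":
        return f"<p>{html.escape(sentence)}</p>"
    return sentence


match = {"winner": "Team X", "loser": "Team Y", "winner_goals": 2, "loser_goals": 1}
print(render(match))
print(render(match, output_format="html"))
```

Keeping the output format as a parameter lets the same generated sentence be delivered to a website, an email, or a plain-text feed according to the user's preferences.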
What are the application areas of Natural Language Generation?
Since NLG aims to make sense of the data and create human-readable insights, it can be applied to all areas dealing with reporting, content creation, and content personalization.
Retail & Wholesale
NLG solutions can provide product descriptions and categorization for online shopping and e-commerce, and help personalize customer communication via chatbots. Steven Morell, CRO of AX Semantics, explains how an e-commerce site can automate its product description writing process with AX Semantics’ NLG tool.
Banking & Finance
The banking industry relies heavily on data and insights for performance reporting. Additionally, profit and loss reports can be automated using NLG systems. NLG techniques can also support fintech chatbots that interact with customers to give personal financial management advice.
Manufacturing
As IoT applications are implemented more widely in production sites, they generate a significant volume of data useful for performance improvement and maintenance. NLG can automate the communication of important findings such as IoT device status and maintenance reporting so employees can take action faster.
Media
NLG solutions can aid summarization and content creation. Sports and financial news in particular tend to follow similar templates, so text describing such events can be created easily by so-called robot journalists.
For more information on robot journalists and other AI applications in media, feel free to check our related article.
Insurance
NLG solutions can help to improve the communication of personalized plans for customers.
Transportation
Chatbots can deliver alerts about delays and schedules. NLG tools can be used to create personalized, easy-to-read travel plans.
Politics
Probably the most dangerous use case is using NLG solutions to spread personalized propaganda and misinformation. Unfortunately, this risks making the current flow of political disinformation even more dangerous and personalized.
What are real-world examples of content automation with NLG?
Here are some real-world content automation examples using NLG:
- GPT-3 is the most recent news in content automation. Here is an article on “robots come in peace” which was written by GPT-3, OpenAI’s language generator. Though GPT-3 creates well-written narratives, it lacks logical understanding, which makes its articles prone to error.
- In 2019, Springer published its first machine-generated book.
- Gmail’s Smart Compose provides recommendations on what should be typed next in an email. It also learns from your selections to enhance the recommendation algorithm for upcoming emails.
- The paraphrasing tool QuillBot uses NLG.
- All conversational AI/chatbot applications are also examples of NLG.
News
- The Associated Press uses NLG to create corporate earnings reports automatically.
- The Washington Post is using their in-house automated storytelling technology, called Heliograf, to cover all Washington, D.C.-area high school football games every week.
- This showcase is a website covering all football and ice hockey in Sweden. All articles about every game, from kids’ games to the top leagues, are written by Lingmill’s text robot.
What are the challenges of content automation with NLG?
Data availability and quality
Automated content requires high-quality structured data. Therefore, content automation fits well in areas such as finance, sports, or weather, where data providers make sure that data is accurate and reliable.
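One common safeguard is to validate each record before it reaches the text generator. The sketch below is only illustrative: the required fields, their types, and the allowed minute range are assumptions for a football event feed.

```python
# Required fields and their expected types (assumptions for illustration).
REQUIRED_FIELDS = {"minute": int, "type": str, "team": str}


def validate(record):
    """Return a list of data-quality problems; an empty list means the record is usable."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    # Range check, applied only when the minute has the right type.
    if isinstance(record.get("minute"), int) and not 0 <= record["minute"] <= 130:
        problems.append("minute out of range")
    return problems


print(validate({"minute": 12, "type": "goal", "team": "X"}))
print(validate({"minute": "12", "type": "goal"}))
```

Records that fail validation can be held back for review so that errors in the data do not end up in published text.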
Originality & Writing quality
Natural language generation is limited to providing answers to prewritten questions by analyzing the given data. Algorithms cannot ask new questions, detect needs, recognize threats, solve problems, or give their thoughts and interpretation on topics such as social and policy change.
Thanks to machine learning, the quality of NLG content is likely to keep improving. However, auto-generated articles tend to be less original than human-written ones.
Bias
NLG algorithms rely on data and assumptions, both of which may contain biases and errors. As a result, algorithms could produce unintended, prejudiced outcomes that contain errors.
Feel free to check our article if you want to learn more about biases in AI algorithms, including types, examples, best practices, and leading tools to reduce bias.
If you have questions about Natural Language Generation vendors, feel free to check our sortable, regularly updated list of NLG companies or contact us.