Enterprise Generative AI: 10+ Use Cases & Best Practices for Enterprises in 2024
Generative AI, also called GenAI, presents novel opportunities for enterprises compared to middle-market companies or startups, including:
- The opportunity to build your company’s models without exposing private data to 3rd parties
However, generative AI is a new technology with unique challenges for enterprises:
- Valuable proprietary data can be exposed: 36% of enterprises cited this as a reason not to use commercial LLMs.1
- Generative AI tools will create new services and solutions. Technology leaders can leverage them to enter new markets, gaining market share at the expense of incumbents.
- Generative AI models, also called generative models, will bring new automation opportunities with the potential to increase customer satisfaction or reduce costs. Competitors can leverage them to get ahead.
- Reputational or operational risks due to generative models’ bias or hallucinations
Executives wonder how their organizations can reap the benefits of generative AI while overcoming these challenges. Below, we outline:
Generative AI use cases for large companies
Guidelines to leverage the full potential of generative AI solutions, including how to build and manage your company’s generative models.
How should enterprises leverage generative AI?
We charted a detailed path for businesses to leverage generative AI.
While most firms may not need to build their own models, most large enterprises (i.e. Forbes Global 2000) are expected to build or optimize one or more generative AI models specific to their business requirements within the next few years. Fine-tuning can enable businesses to achieve these goals:
- Achieve higher accuracy by customizing model output in detail for their own domain
- Save costs. Customizable models with licenses permitting commercial use have been measured to be almost as accurate as proprietary models at significantly lower cost.2
- Reduce attack surface for their confidential data
Firms like Bloomberg are achieving world-class performance by building their own generative AI tools that leverage internal data.3
What are the guidelines for enterprise AI models?
At a minimum an enterprise generative AI model should be:
Trusted
Consistent
Most current LLMs can provide different outputs for the same input. This limits the reproducibility of testing, which can lead to releasing models that are not sufficiently tested.
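A minimal sketch (using a toy next-token distribution, not any specific vendor’s API) of why identical prompts can yield different outputs, and why greedy decoding at temperature 0 restores reproducibility:

```python
import random

# Toy next-token distribution: probabilities a model might assign to candidates.
NEXT_TOKEN_PROBS = {"profit": 0.5, "revenue": 0.3, "growth": 0.2}

def sample_token(rng: random.Random) -> str:
    """Temperature-style sampling: the same prompt can yield different tokens."""
    tokens, weights = zip(*NEXT_TOKEN_PROBS.items())
    return rng.choices(tokens, weights=weights, k=1)[0]

def greedy_token() -> str:
    """Greedy decoding (temperature 0): always the highest-probability token."""
    return max(NEXT_TOKEN_PROBS, key=NEXT_TOKEN_PROBS.get)

# Greedy decoding is reproducible; sampling is only reproducible with a fixed seed.
assert greedy_token() == "profit"
```

In practice this means test suites should either pin the decoding temperature to 0 or fix the random seed, where the serving stack allows it.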
Controlled
Be hosted in an environment (on-prem or cloud) where the enterprise can control the model at a granular level. The alternative is using online chat interfaces or APIs like OpenAI’s LLM APIs.
The disadvantage of relying on APIs is that the user may need to expose confidential proprietary data to the API owner. This increases the attack surface for proprietary data. Global leaders like Amazon and Samsung experienced data leaks of internal documents and valuable source code when their employees used ChatGPT.4 5
OpenAI later reversed its data retention policies and launched an enterprise offering.6 However, there are still risks to using cloud based GenAI systems. For example the API provider or bad actors working at the API provider may:
Access the enterprise’s confidential data and use it to improve their own solutions
Accidentally leak enterprise data
Explainable
Unfortunately, most generative AI models cannot explain why they produce certain outputs. This limits their use: enterprise users who want to base important decisions on AI-powered assistants need to know which data drove those decisions.
Reliable
Hallucination (i.e. making up falsehoods) is an inherent characteristic of LLMs and is unlikely to be completely resolved. Enterprise GenAI systems require processes and guardrails to ensure that harmful hallucinations are minimized, or detected by humans before they can harm enterprise operations.
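One simple guardrail pattern is a grounding check: score how much of an answer is supported by the source documents it was supposed to draw from. The sketch below uses crude word overlap as the support signal; it is an illustration of the pattern, not a production hallucination detector:

```python
def grounding_score(answer: str, sources: list[str]) -> float:
    """Fraction of answer words appearing in at least one source passage.
    A crude proxy: low scores flag answers that may be hallucinated."""
    answer_words = {w.strip(".,").lower() for w in answer.split()}
    source_words = set()
    for passage in sources:
        source_words.update(w.strip(".,").lower() for w in passage.split())
    if not answer_words:
        return 0.0
    return len(answer_words & source_words) / len(answer_words)

sources = ["Q3 revenue was 4.2 billion dollars, up 8 percent year over year."]
grounded = grounding_score("Q3 revenue was 4.2 billion dollars.", sources)
ungrounded = grounding_score("Q3 revenue tripled to 99 billion euros.", sources)
assert grounded > ungrounded  # the fabricated answer scores lower
```

Production guardrails typically replace word overlap with entailment models or citation checks, and route low-scoring answers to human review.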
Secure
Enterprise-wide models may have interfaces for external users. Bad actors can use techniques like prompt injection to have the model perform unintended actions or share confidential data.
Ethical
Ethically trained
The model should be trained on ethically sourced data where the Intellectual Property (IP) belongs to the enterprise or its supplier and personal data is used with consent.
1- Generative AI IP issues, such as training data that includes copyrighted content where the copyright doesn’t belong to the model owner, can lead to unusable models and legal processes.
2- Use of personal information in training models can lead to compliance issues. For example, OpenAI’s ChatGPT needed to disclose its data collection policies and allow users to remove their data after the Italian Data Protection Authority (Garante) raised concerns.7
Read generative AI copyright issues & best practices to learn more.
Fair
Bias in training data can impact model effectiveness.
Licensed
The enterprise needs to have a commercial license to use the model. For example, models like Meta’s LLaMa have noncommercial licenses, preventing their legal use in most use cases in a for-profit enterprise. Even models like Vicuna that are released under seemingly permissive terms end up with noncommercial restrictions, since they are built on top of the LLaMa model.8 9
Sustainable
Training generative AI models from scratch is expensive and consumes significant amounts of energy, contributing to carbon emissions. Business leaders should be aware of the full cost of generative AI technology and identify ways to minimize its ecological and financial costs.
Enterprises can strive towards most of these guidelines, which exist on a continuum, except for licensing, ethical concerns and control:
- It is clear how to achieve correct licensing and avoid ethical concerns, but these are hard goals to achieve.
- Achieving control requires firms to build their own foundation models, but most businesses are unclear about how to achieve this.
How can enterprises build foundation models?
There are 2 approaches to building your firm’s LLM infrastructure in a controlled environment.
1- Build Your Own Model (BYOM)
Allows world-class performance at a cost of a few million dollars, including computing (1.3M GPU hours on 40GB A100 GPUs in the case of BloombergGPT) and data science team costs.10
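A back-of-envelope estimate puts the compute portion of that figure in context. The per-GPU-hour rate below is an assumption for illustration; actual cloud and on-prem rates vary widely:

```python
# Back-of-envelope compute cost for BloombergGPT-scale training.
gpu_hours = 1_300_000       # 1.3M GPU hours reported for BloombergGPT
assumed_rate_usd = 1.50     # ASSUMED cost per 40GB A100 GPU-hour; rates vary
compute_cost = gpu_hours * assumed_rate_usd
assert compute_cost == 1_950_000  # ~$2M for compute alone, before staff costs
```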
2- Improve an existing model
2.1- Fine-tuning is a cheaper machine learning technique for improving the performance of pre-trained large language models (LLMs) using selected datasets.
Instruction fine-tuning was previously done with large datasets, but it can now be achieved with a small dataset (e.g. 1,000 curated prompts and responses in the case of LIMA).11 The importance of a robust data collection approach that optimizes data quality and quantity is highlighted in early commercial LLM fine-tuning experiments.12
Compute costs in research papers have been as low as $100 while achieving close to world-class performance.13
Model fine-tuning is an emerging domain, with new approaches like Inference-Time Intervention (ITI), an approach to reduce model hallucinations, being published every week.14
2.2- Reinforcement Learning from Human Feedback (RLHF): A fine-tuned model can be further improved by human-in-the-loop assessment.15 16 17
2.3- Retrieval augmented generation (RAG) allows businesses to pass crucial information to models during generation time. Models can use this information to produce more accurate responses.
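The RAG pattern can be sketched in a few lines: retrieve the most relevant internal documents, then prepend them to the prompt as context. The word-overlap retriever below is a stand-in for the vector search used in production systems:

```python
def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (a stand-in for
    production vector search) and return the best matches."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(query_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Prepend retrieved context so the model can ground its answer."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refund policy: customers may return products within 30 days.",
    "Shipping: standard delivery takes 3-5 business days.",
]
prompt = build_rag_prompt("What is the refund policy?", docs)
assert "30 days" in prompt  # the relevant document made it into the prompt
```

Because the crucial information travels inside the prompt at generation time, RAG avoids retraining the model when the underlying documents change.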
Given the high costs involved in BYOM, we recommend that businesses initially use optimized versions of existing models. Language model optimization is an emerging domain with new approaches being developed on a weekly basis. Therefore, businesses should be open to experimentation and ready to change their approach.
Which models should enterprises use to train cost-effective foundation models?
Machine learning platforms have released foundation models with commercial licenses, relying mostly on text from the internet as the primary data source. These models can be used as base models to build enterprise large language models:
– Llama 2 by Meta comes with a commercial use license with some limitations.18
– Falcon LLM, developed by the Technology Innovation Institute (TII) in Abu Dhabi, comes with a commercial license and used to lead Hugging Face’s LLM benchmark among pretrained models as of September 2023.19 20
– Mistral 7B is the first model developed by the European generative AI startup Mistral. With its permissive license (i.e. Apache 2.0) that allows commercial use, it can be attractive for businesses looking for a smaller, easier-to-finetune model.21
– BLOOM by Hugging Face comes with a RAIL license, which only restricts potentially harmful uses.22
– Dolly 2.0 instruction-tuned by Databricks based on EleutherAI’s pythia model family.23
– Open source RWKV-4 “Raven” models24
– Eleuther AI Models25
What is the right tech stack for building large language models?
Generative AI is an artificial intelligence technology, and large businesses have been building AI solutions for the past decade. Experience has shown that leveraging Machine Learning Operations (MLOps) platforms significantly accelerates model development efforts.
In addition to their MLOps platforms, enterprise organizations can rely on a growing list of Large Language Model Operations (LLMOps) tools and frameworks like Langchain, Semantic Kernel or watsonx.ai to customize and build their models, as well as AI risk management tools like NeMo Guardrails.
In the early days of new technologies, we recommend that executives prioritize open platforms to build future-proof systems. In emerging technologies, vendor lock-in is an important risk: businesses can get stuck with outdated systems as rapid and seismic technology changes take place.
Finally, a firm’s data infrastructure is among the most important underlying technologies for generative AI:
- Vast amounts of internal data need to be organized and formatted.
- Data quality and observability efforts should ensure that firms have access to high-quality, unique, easily usable datasets with clear metadata.
- Synthetic data capabilities may be necessary for model training.
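Data quality efforts often start with simple record-level validation before documents reach a training or fine-tuning dataset. The field names and thresholds below are illustrative assumptions, not a standard schema:

```python
def check_record(record: dict) -> list[str]:
    """Flag common quality problems in a candidate training record.
    Field names ('text', 'source') and the length threshold are illustrative."""
    issues = []
    text = record.get("text", "")
    if not text.strip():
        issues.append("empty text")
    if len(text) < 20:
        issues.append("too short")
    if "source" not in record:
        issues.append("missing source metadata")
    return issues

# A record with short text and no provenance fails two checks.
assert check_record({"text": "ok"}) == ["too short", "missing source metadata"]
```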
How to evaluate large models’ performance?
Without measurement of effectiveness, the value of generative AI efforts cannot be quantified. However, LLM evaluation is a difficult problem due to issues in benchmark datasets, benchmarks seeping into training data, inconsistency of human reviews and other factors.26 27
We recommend an iterative approach that increases investment in evaluation as models get closer to being used in production:
– Use benchmark test scores to prepare shortlists. These are available publicly for a large number of open source models.28 29
– Rely on Elo scores,30 used to rank players in zero-sum games like chess, to compare the models to be selected. If there are higher-performing models that are not available to be used (e.g. due to licensing or data security issues), they can be used to compare the responses of different models.31 If such models are not available, domain experts can compare the accuracy of different models.
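The Elo update applied to model comparisons is the same formula used in chess: each pairwise preference judgment (from a stronger model or a domain expert) shifts ratings toward the winner. A minimal sketch:

```python
def elo_update(rating_a: float, rating_b: float, score_a: float,
               k: float = 32) -> tuple[float, float]:
    """Standard Elo update. score_a is 1 if model A's answer is preferred,
    0 if model B's is preferred, 0.5 for a tie."""
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Two models start at 1000; model A wins a head-to-head comparison.
a, b = elo_update(1000, 1000, score_a=1)
assert a == 1016 and b == 984  # equal ratings: winner gains k/2 = 16 points
```

Running many such comparisons over a shared prompt set produces a leaderboard in which rating gaps translate to win probabilities.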
What are the alternatives to controlling models?
Enterprise organizations can leverage pre-trained and fine-tuned models from tech giants or AI companies (e.g. OpenAI) when they are in one of these situations:
- Experimenting with data that doesn’t include sensitive information to prove a hypothesis
- Not concerned about increasing the attack surface of the input data
- Confident that their inputs will not be intercepted by 3rd parties or stored
- Confident that even if their inputs are stored, they are stored for a limited time and will not be leaked while stored
In such cases, technology teams can use APIs to access models at affordable costs per API call. They can use these approaches:
Zero-shot learning, also called prompt engineering, involves structuring the prompt to help improve the LLM output.
Few-shot learning, also called in-context learning, involves adding examples before the prompt to improve response quality.32
This can also include chain-of-thought prompting.33
Retrieval augmented generation (RAG) can also be used with commercial models if the enterprise is content with the data security policies of the foundation model provider.
Fine-tuning is also available to further improve model performance of commercial models offered via APIs.34
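The zero-shot and few-shot approaches above amount to different ways of constructing the prompt string. A minimal sketch, using a hypothetical sentiment-classification task:

```python
def zero_shot_prompt(review: str) -> str:
    """Zero-shot: describe the task directly, with no examples."""
    return ("Classify the sentiment of this review as positive or negative.\n"
            f"Review: {review}\nSentiment:")

def few_shot_prompt(review: str, examples: list[tuple[str, str]]) -> str:
    """Few-shot: prepend labeled examples so the model infers the task in-context."""
    demos = "\n".join(f"Review: {text}\nSentiment: {label}"
                      for text, label in examples)
    return f"{demos}\nReview: {review}\nSentiment:"

examples = [("Great product, works perfectly.", "positive"),
            ("Broke after two days.", "negative")]
prompt = few_shot_prompt("Fast shipping and easy setup.", examples)
assert prompt.count("Review:") == 3  # two demonstrations plus the new input
```

Chain-of-thought prompting extends the same idea by including worked reasoning steps in the demonstrations rather than just input-label pairs.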
What should enterprises do about generative AI before building their foundation models?
Building your enterprise model can take months, since the steps below need to be completed. Each of these steps can take weeks to months, and they cannot be fully parallelized:
- Data collection can take weeks to months. AI data collection services can accelerate this process by helping companies generate balanced, high-quality instruction datasets and other data for building or fine-tuning models. You can also work with data crowdsourcing platforms for more diverse datasets.
- Hiring data scientists with LLM expertise or hiring consultants can take weeks to months.
- Training and deployment
- Integrating models into business processes and systems
Therefore, we recommend that business leaders encourage experimentation. GenAI requires a paradigm shift: our understanding of machines needs to evolve from senseless robots to co-creators. Organizations need to start working with GenAI to begin this mindset shift. They need to educate their employees about the potential of generative AI and empower them to change how they work. As consultants like to say, the most important element in any transformation (including AI transformation) is people.
Teams can leverage existing APIs to automate processes in domains where the value of confidential data is lower and system integration is easier. Example domains where teams can leverage GenAI to improve productivity and increase familiarity with generative AI without building their own models:
New content creation and optimizing generated content for marketing campaigns
Code generation for front-end software
Conversational AI for customer engagement and support
There are dozens of additional generative AI applications.
What are enterprise generative artificial intelligence use cases?
The web is full of B2C use cases, such as writing emails with generative AI support, that don’t require deep integration or specialized models. However, the enterprise value of generative AI comes from the enterprise AI applications listed below:
Common use cases
Enterprise Knowledge Management (EKM): While SMEs and mid-market firms face few challenges in organizing their limited data, Fortune 500 or Forbes Global 2000 companies need enterprise knowledge management tools for numerous use cases, which generative AI can serve. Applications include:
Insight extraction by tagging unstructured data like documents
Summarization of unstructured data
Enterprise search, which goes further than keyword search by taking into account relationships between words
Part of enterprise search includes answering employee questions about:
Company’s practices (e.g. HR policies)
Internal company data like sales forecasts
A combination of internal and external data. For instance: How would potential future sanctions targeting MLOps systems sales to our 3rd largest geographic market affect our corporate performance?
Larger organizations serve global customers, and the machine translation ability of LLMs is valuable in use cases like:
Website localization
Creating documentation like technical manuals at scale for all geographies
Social media listening targeting a global audience
Multilingual sentiment analysis
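Enterprise search that captures relationships between words, rather than exact keyword matches, is typically built on embedding similarity. The sketch below uses hand-picked 3-dimensional vectors as stand-ins for model-generated embeddings, which have hundreds of dimensions in practice:

```python
import math

# Toy "embeddings": related phrases get nearby vectors even with no shared words.
EMBEDDINGS = {
    "vacation policy": [0.9, 0.1, 0.0],
    "paid time off rules": [0.8, 0.2, 0.1],
    "quarterly sales forecast": [0.0, 0.1, 0.9],
}

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def semantic_search(query: str) -> str:
    """Return the stored phrase whose embedding is closest to the query's."""
    return max(EMBEDDINGS,
               key=lambda k: cosine(EMBEDDINGS[query], EMBEDDINGS[k])
               if k != query else -1)

# Keyword search would miss this match: the phrases share no words.
assert semantic_search("vacation policy") == "paid time off rules"
```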
Industry specific applications
Most enterprise value is likely to come from using generative AI technologies for innovation in companies’ specific industries. This could be in the form of new products and services or new ways of working (e.g. process improvement with GenAI). Our lists of generative AI applications can serve as starting points.
FAQ
What is generative AI?
Generative AI includes text, image and audio output of artificial intelligence models, which are also called large language models (LLMs), language models, foundation models or generative AI models.
What are the examples of enterprise generative AI?
McKinsey’s Lilli AI leverages McKinsey’s proprietary data to answer consultants’ questions and cites its sources. McKinsey followed an LLM-agnostic approach and leverages multiple LLMs from Cohere and OpenAI in Lilli.
Walmart developed My Assistant, a generative AI assistant, for its 50,000 non-store employees.
External Links
- 1. “Beyond the Buzz: A Look at Large Language Models in Production“, Predibase. Accessed September 11, 2023
- 2. “Llama 2 is about as factually accurate as GPT-4 for summaries and is 30X cheaper“. Anyscale. August 23, 2023. Retrieved September 3, 2023.
- 3. “Introducing BloombergGPT, Bloomberg’s 50-billion parameter large language model, purpose-built from scratch for finance“. Bloomberg. March 30, 2023. Accessed May 24, 2023
- 4. “Amazon warns employees not to share confidential information with ChatGPT after seeing cases where its answer “closely matches” existing material from inside the company“. Insider. Jan 24, 2023, Accessed May 28, 2023
- 5. “Samsung Bans Staff’s AI Use After Spotting ChatGPT Data Leak“. Bloomberg. May 2, 2023. Accessed May 28, 2023
- 6. “Introducing ChatGPT Enterprise“, OpenAI, Accessed Sep 10, 2023
- 7. “OpenAI: ChatGPT back in Italy after meeting watchdog demands“. AP. April 28, 2023. Accessed May 24, 2023
- 8. “Introducing LLaMA: A foundational, 65-billion-parameter large language model“. Meta. February 24, 2023.
- 9. “Large Language Models for Commercial Use“. Truefoundry. Apr 27, 2023. Accessed May 24, 2023.
- 10. Wu S.; Irsoy O.; Lu S.; Dabravolski V.; Dredze M.; Gehrmann S.; Kambadur P.; Rosenberg D.; Mann G. “BloombergGPT: A Large Language Model for Finance“
- 11. Zhou C.; Liu P.; Xu P.; Iyer S.; Sun J.; Mao Y.; Ma X.; Efrat A.; Yu P.; Yu L.; Zhang S.; Ghosh G.; Lewis M.; Zettlemoyer L.; Levy O. “LIMA: Less Is More for Alignment“
- 12. “Dataset Engineering for LLM finetuning“. Flowrite. Mar 28, 2023. Accessed May 24, 2023
- 13. Dettmers T.; Pagnoni A.; Holtzman A.; Zettlemoyer L. “QLoRA: Efficient Finetuning of Quantized LLMs“.
- 14. Li K.; Patel O.; Viegas F.; Pfister H.; Wattenberg M.,”Inference-Time Intervention: Eliciting Truthful Answers from a Language Model“.
- 15. Ouyang L.; Wu J.; Jiang X.; Almeida D.; Wainwright C. L.; Mishkin P.; Zhang C.; Agarwal S.; Slama K.; Ray A.; Schulman J.; Hilton J.; Kelton F.; Miller L.; Simens M.; Askell A.; Welinder P.; Christiano P.; Leike J.; Lowe R. “Training language models to follow instructions with human feedback“.
- 16. Jesse Mu. “Natural Language Processing with Deep Learning“.
- 17. Chip Huyen (May 2, 2023). “RLHF: Reinforcement Learning from Human Feedback“. Retrieved May 24, 2023.
- 18. “LLAMA 2 Community License Agreement“. Meta. Retrieved September 3, 2023.
- 19. “Open LLM Leaderboard“. Hugging Face. Retrieved September 10, 2023.
- 20. “Introducing Falcon 180B“. TII. Retrieved September 10, 2023.
- 21. “Mistral 7B“. Mistral. September 27, 2023. Retrieved October 1, 2023.
- 22. “bigscience/bloom“. Hugging Face. Retrieved May 24, 2023.
- 23. “Free Dolly“. Databricks. Retrieved May 24, 2023.
- 24. “BlinkDL/rwkv-4-raven“. Hugging Face. Retrieved May 24, 2023.
- 25. “Find all our models, codebases, and datasets“. EleutherAI. Retrieved May 24, 2023.
- 26. T. Liao, R. Taori, I. D. Raji, and L. Schmidt. “Are We Learning Yet? A Meta-Review of Evaluation Failures Across Machine Learning“. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
- 27. Dettmers T.; Pagnoni A.; Holtzman A.; Zettlemoyer L. “QLoRA: Efficient Finetuning of Quantized LLMs“.
- 28. “Open LLM Leaderboard“. Hugging Face. Retrieved May 28, 2023.
- 29. Zheng L.; Sheng Y.; Chiang W. L.; Zhang H.; Gonzalez J. E.; Stoica I. (May 03, 2023). “Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings“
- 30. “Elo Rating System“. Wikipedia. Retrieved May 28, 2023.
- 31. Dettmers T.; Pagnoni A.; Holtzman A.; Zettlemoyer L. “QLoRA: Efficient Finetuning of Quantized LLMs“.
- 32. Brown T. B.; Mann B.; Ryder N.; Subbiah M.; Kaplan J.; Dhariwal P.; Neelakantan A.; Shyam P.; Sastry G.; Askell A.; Agarwal S.; Herbert-Voss A.; Krueger G.; Henighan T.; Child R.; Ramesh A.; Ziegler D. M.; Wu J.; Winter C.; Hesse C.; Chen M.; Sigler E.; Litwin M.; Gray S.; Chess B.; Clark J.; Berner C.; McCandlish S.; Radford A.; Sutskever I.; Amodei D. “Language Models are Few-Shot Learners“
- 33. Wei J.; Wang X.; Schuurmans D.; Bosma M.; Ichter B.; Xia F.; Chi E. H.; Le Q. V.; Zhou D. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models“
- 34. “Fine-tuning“. OpenAI. Retrieved May 24, 2023.
- 35. “The CEO’s Guide to the Generative AI Revolution“. BCG. March 7, 2023. Retrieved June 18, 2023.
Cem has been the principal analyst at AIMultiple since 2017. AIMultiple informs hundreds of thousands of businesses (as per similarWeb) including 60% of Fortune 500 every month.
Cem's work has been cited by leading global publications including Business Insider, Forbes, Washington Post, global firms like Deloitte, HPE, NGOs like World Economic Forum and supranational organizations like European Commission. You can see more reputable companies and media that referenced AIMultiple.
Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised businesses on their enterprise software, automation, cloud, AI / ML and other technology related decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.
He led technology strategy and procurement of a telco while reporting to the CEO. He has also led commercial growth of deep tech company Hypatos that reached a 7 digit annual recurring revenue and a 9 digit valuation from 0 within 2 years. Cem's work in Hypatos was covered by leading technology publications like TechCrunch and Business Insider.
Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.