Generative AI, also called GenAI, presents novel opportunities for enterprises compared to middle-market companies or startups including:
The opportunity to build your company’s models without exposing private data to 3rd parties
However, generative AI is a new technology with unique challenges for enterprises:
- Valuable proprietary data can be exposed which was stated by 36% of enterprises as a reason not to use commercial LLMs1
- Generative AI tools will create new services and solutions. Technology leaders can leverage them to enter new markets gaining market share at the expense of incumbents.
- Generative AI models, also called generative models, will bring new automation opportunities with the potential to increase customer satisfaction or reduce costs. Competitors can leverage them to get ahead.
- Reputational or operational risks due to generative models’ bias or hallucinations
Executives wonder how their organizations can reap the benefits of generative AI while overcoming these challenges. Below, we outline:
Generative AI use cases for large companies
Guidelines to leverage the full potential of generative AI solutions, including how to build and manage your company’s generative models.
How should enterprises leverage generative AI?
We charted a detailed path for businesses to leverage generative AI.
While most firms may not need to build their models, most large enterprises (i.e. Forbes Global 2000) are expected to build or optimize one or more generative AI models specific to their business requirements within the next few years. Finetuning can enable businesses to achieve these goals:
- Achieve higher accuracy by customizing model output in detail for their own domain
- Save costs. Customizable models with licenses permitting commercial use have been measured to be almost as accurate as proprietary models at significantly lower cost.2
- Reduce attack surface for their confidential data
Firms like Bloomberg are generating world-class performance by building their own generative AI tools leveraging internal data. 3
What are the guidelines for enterprise AI models?

At a minimum an enterprise generative AI model should be:
Trusted
Consistent
Most current LLMs can provide different outputs for the same input. This limits the reproducibility of testing which can lead to releasing models that are not sufficiently tested.
Controlled
Be hosted at an environment (on-prem or cloud) where enterprise can control the model at a granular level. The alternative is using online chat interfaces or APIs like OpenAI’s LLM APIs.
The disadvantage of relying on APIs is that the user may need to expose confidential proprietary data to the API owner. This increases the attack surface for proprietary data. Global leaders like Amazon and Samsung experienced data leaks of internal documents and valuable source code when their employees used ChatGPT.4 5
OpenAI later reversed its data retention policies and launched an enterprise offering.6 However, there are still risks to using cloud based GenAI systems. For example the API provider or bad actors working at the API provider may:
Access the enterprise’s confidential data and use it to improve their own solutions
Accidentally leak enterprise data
Explainable
Unfortunately, most generative AI models are not capable of explaining why they provide certain outputs. This limits their use as enterprise users that would like to base important decision making on AI powered assistants would like to know the data that drove such decisions. XAI for LLMs is still an area of research.
Reliable
Hallucination (i.e. making up falsehoods) is a feature of LLMs and it is unlikely to be completely resolved. Enterprise genAI systems require the necessary processes and guardrails to ensure that harmful hallucinations are minimized or detected or identified by humans before they can harm enterprise operations.
Secure
Enterprise-wide models may have interfaces for external users. Bad actors can use techniques like prompt injection to have the model perform unintended actions or share confidential data.
Ethical
Ethically trained
Model should be trained on ethically sourced data where Intellectual Property (IP) belongs to the enterprise or its supplier and personal data is used with consent.
1- Generative AI IP issues, such as training data that includes copyrighted content where the copyright doesn’t belong to the model owner, can lead to unusable models and legal processes.
2- Use of personal information in training models can lead to compliance issues. For example, OpenAI’s ChatGPT needed to be disclose its data collection policies and allow users to remove their data after the Italian Data Protection Authority (Garante)’s concerns.7
Read generative AI copyright issues and solutions to learn more.
Fair
Bias in training data can impact model effectiveness.
Licensed
The enterprise need to have a commercial license to use the model. For example using models like Meta’s LLaMa have noncommercial licenses preventing their legal use in most use cases in a for-profit enterprise. Models with permissive licenses like Vicuna built on top of LLaMa also end up having noncommercial licenses since they leverage the LLaMa model.8 9
Sustainable
Training generative AI models from scratch is expensive and consumes significant amounts of energy, contributing to carbon emissions. Business leaders should be aware of the full cost of generative AI technology and identify ways to minimize its ecological and financial costs.
Enterprises can strive towards most of these guidelines and they exist on a continuum except the issues of licensing, ethical concerns and control.
- It is clear how to achieve correct licensing and to avoid ethical concerns but these are hard goals to achieve
- Achieving control requires firms to build their own foundation models however most businesses are not clear about how to achieve this
How can enterprises build foundation models?
There are 2 approaches to build your firms’ LLM infrastructure on a controlled environment.
1- Build Your Own Model (BYOM)
Allows world-class performance costing a few million $ including computing (1.3M GPU hours on 40GB A100 GPUs in case of BloombergGPT) and data science team costs.10
2- Improve an existing model
2.1- Fine-tuning is a cheaper machine learning technique for improving the performance of pre-trained large language models (LLMs) using selected datasets.
Instruction fine-tuning was previously done with large datasets but now it can be achieved with a small dataset (e.g. 1,000 curated prompts and responses in case of LIMA).11 The importance of a robust data collection approach optimizing data quality and quantity is highlighted in early commercial LLM fine-tuning experiments.12
Compute costs in research papers have been as low $100 while achieving close to world-class performance.13
Model fine-tuning is an emerging with domain with new approaches like Inference-Time Intervention (ITI), an approach to reduce model hallucinations, being published every week.14
2.2- Reinforcement Learning from Human Feedback (RLHF): A fine-tuned model can be further improved by human in the loop assessment. 15 16 17
2.3- Retrieval augmented generation (RAG) allows businesses to pass crucial information to models during generation time. Models can use this information to produce more accurate responses.
Given the high costs involved in BYOM, we recommend businesses to initially use optimized versions of existing models. Language model optimization an emerging domain with new approaches being developed on a weekly basis. Therefore businesses should be open to experimentation and be ready to change their approach.
Which models should enterprises use to train cost-effective foundation models?
Machine learning platforms released foundation models with commercial licenses relying mostly on text on the internet as the primary data source. These models can be used as base models to build enterprise large language models:
– DeepSeek-R1 is the highest performing reasoning model with a permissive MIT license. However, DeepSeek team may have used OpenAI’s models in its training process which, if proven, could require a change in its licensing model in the US.18
– Llama 3 by Meta comes with a commercial use license with some limitations for very large businesses.19
– Mistral 8x22B is the latest open-weights model developed by the European generative AI startup Mistral. With its permissive license (i.e. Apache 2.0) that allows commercial use without specific restrictions for large businesses, it can be attractive for all businesses.20 Mistral also provides models like Mistral Large but that model has more restrictive licensing.21
– IBM’s Granite models are high performing according to code generation benchmarks and are available with the permissive Apache 2.0 license.22
– DBRX is an open-weights model developed by the data platform Databricks. It comes with a commercial license with similar limitations to Meta’s models. Limitations apply to businesses serving more than 700M active users. 23
– X released its 314 billion parameter Grok-1 model with the permissive Apache 2.0 license but did not release Grok-2 model as open source yet.24
What is the right tech stack for building large language models?
Generative AI is an artificial intelligence technology and large businesses have been building AI solutions for the past decade. Experience has shown that leveraging Machine Learning Operations (MLOps) platforms significantly accelerate model development efforts.
In addition to their MLOps platforms, enterprise organizations can rely on a growing list of Large Language Model Operations (LLMOps) tools and frameworks like Langchain, Semantic Kernel or watsonx.ai to customize and build their models, AI risk management tools like Nemo Guardrails.
In early days of new technologies, we recommend executives to prioritize open platforms to build future-proof systems. In emerging technologies, vendor lock-in is an important risk. Businesses can get stuck with outdated systems as rapid and seismic technology changes take place.
Finally, data infrastructure of a firm is among the most important underlying technologies for generative AI:
Vast amounts of internal data need to be organized, formatted.
Data quality and observability efforts should ensure that firms have access to high quality, unique, easily-usable datasets with clear metadata.
Synthetic data capabilities may be necessary for model training
How to evaluate large models’ performance?
Without measurement of effectiveness, the value of generative AI efforts can not be quantified. However, LLM evaluation is a difficult problem due to issues in benchmark datasets, benchmarks seeping into training data, inconsistency of human reviews and other factors.25 26 .
We recommend an iterative approach that increases investment in evaluation as models get closer to be used in production:
– Use benchmark test scores to prepare shortlists. This is available publicly for a large number of open source models.27 28
– Rely on Elo scores,29 used in ranking players in zero-sum games like chess, compare the models to be selected. If there are higher performing models which are not available to be used (e.g. due to licensing or data security issues), they can be used to compare the responses of different models. 30 If such models are not available, domain experts can compare the accuracy of different models.
What are the alternatives to controlling models?
Enterprise organizations can leverage pre-trained and fine-tuned models from tech giants or AI companies (e.g. OpenAI) in cases where they are in one of these situations:
- Experimenting with data that doesn’t include sensitive information to prove a hypothesis
Not concerned about increasing attack surface of the input data
Confident that their inputs will not be intercepted by 3rd parties or stored
Confident that even if their inputs are stored, they are stored for a limited time and will not be leaked while stored
In such cases, technology teams can use APIs to access models at affordable costs per API call. They can use these approaches:
Zero-shot learning, also called prompt engineering, involves structuring the prompt to help improve the LLM output
Few-shot learning, also called in-context learning, involves adding examples before the prompt to improve response quality.31

This can also include chain-of-thought prompting.32
Retrieval augmented generation (RAG) can also be used with commercials models if the enterprise is content with the data security policies of the foundation model provider.
Fine-tuning is also available to further improve model performance of commercial models offered via APIs.33
What should enterprises do about generative AI before building their foundation models?
Building your enterprise model can take months since the steps below need to be completed. Each of these steps can take weeks to months, and they can not be fully parallelized:
- Data collection can take weeks to months. AI data collection services can accelerate this process by helping companies generate balanced, high-quality instruction datasets and other data for building or fine-tuning models. You can also work with data crowdsourcing platforms for more diverse datasets.
Hiring data scientists with LLM expertise or hiring consultants can take weeks to months.
Training and deployment
Integrating models to business processes and systems
We recommend business leaders encourage experimentation with GenAI. It requires a paradigm shift: We must view machines not as senseless robots but as co-creators. Organizations should start using GenAI to foster this mindset shift, educating employees about its potential and empowering them to change how they work. As consultants often say, the key to any transformation—AI transformation included—is people.

Teams can leverage existing APIs to automate processes in domains where value of confidential data is lower and system integration is easier. Example domains where teams can leverage GenAI to improve productivity and increase teams’ familiarity with generative AI without building own models:
New content creation and optimizing generated content for marketing campaigns
Code generation for front-end software
Conversational AI for customer engagement and support
There are tens of more generative AI applications
What are enterprise generative artificial intelligence use cases?
The web is full of B2C use cases such as writing emails with generative AI support that don’t require deep integration or specialized models. However, enterprise value of generative AI comes from enterprise AI applications listed below:
Common use cases
Enterprise Knowledge Management (EKM): While SMEs and mid-market firms do not have challenges in organizing their limited data, Fortune 500 or Global Forbes 2000 need enterprise knowledge management tools for numerous use cases. Generative AI can serve them. Applications include:
Insight extraction by tagging unstructured data like documents
Summarization of unstructured data
Enterprise search which goes further than keyword search taking into account relationships between words
Part of enterprise search includes answering employee questions about:
Company’s practices (e.g. HR policies)
Internal company data like sales forecasts
A combination of internal and external data. For instance: How would potential future sanctions targeting MLOps systems sales to our 3rd largest geographic market affect our corporate performance?
Larger organizations serve global customers and machine translation ability of LLMs are valuable in use cases like:
Website localization
Creating documentation like technical manuals at scale for all geographies
Social media listening targeting a global audience
Multilingual sentiment analysis
Industry specific applications
Most enterprise value is likely to come from using generative AI technologies for innovation in companies’ specific industries: This could be in the form of new products and services or new ways of working (e.g. process improvement with GenAI). Our lists of generative AI applications can serve as starting points:
What is the level of interest in enterprise generative AI?
Though there are many signs that show that enterprise generative AI is booming (e.g. generative AI related revenues of consultants), this has not been reflected in search engine queries yet. However, there is increasing interest in enterprise AI which was likely triggered by the launch of ChatGPT:
FAQ
What is generative AI?
Generative AI includes text, image and audio output of artificial intelligence models which are also called large language models LLMs, language models, foundation models or generative AI models.
What are the examples of enterprise generative AI?
McKinsey’s Lilli AI leverages McKinsey’s proprietary data to answer consultant’s questions and cites its sources. McKinsey followed an LLM-agnostic approach and leverages multiple LLMs from Cohere and OpenAI in Lilli.
Walmart developed My Assistant generative AI assistant for its 50,000 non-store employees.
If you have other questions or need help in finding vendors, we can help:
External Links
- 1. Survey Report: Large Language Models in Production.
- 2. Unpopular opinion: Current AI is mostly engineering without science and… | Cem Dilmegani.
- 3. Introducing BloombergGPT, Bloomberg’s 50-billion parameter large language model, purpose-built from scratch for finance | Press | Bloomberg LP.
- 4. Amazon Warns Staff Not to Share Confidential Information With ChatGPT - Business Insider. Business Insider
- 5. Samsung Bans ChatGPT, Google Bard, Other Generative AI Use by Staff After Leak - Bloomberg. Bloomberg
- 6. Introducing ChatGPT Enterprise | OpenAI.
- 7. OpenAI: ChatGPT back in Italy after meeting watchdog demands | AP News. AP News
- 8. Introducing LLaMA: A foundational, 65-billion-parameter language model.
- 9. Large Language Models for Commercial Use | TrueFoundry. TrueFoundry
- 10. Wu S.; Irsoy O.; Lu S.; Dabravolski V.; Dredze M.; Gehrmann S.; Kambadur P.; Rosenberg D.; Mann G. “BloombergGPT: A Large Language Model for Finance“
- 11. [2305.11206] LIMA: Less Is More for Alignment.
- 12. Flowrite is now a part of MailMaestro.
- 13. [2305.14314] QLoRA: Efficient Finetuning of Quantized LLMs.
- 14. [2306.03341] Inference-Time Intervention: Eliciting Truthful Answers from a Language Model.
- 15. Ouyang L.; Wu J.; Jiang X.; Almeida D.; Wainwright C. L.; Mishkin P.; Zhang C.; Agarwal S.; Slama K.; Ray A.; Schulman J.; Hilton J.; Kelton F.; Miller L.; Simens M.; Askell A.; Welinder P.; Christiano P.; Leike J.; Lowe R. “Training language models to follow instructions with human feedback“.
- 16. Jesse Mu. “Natural Language Processing with Deep Learning“.
- 17. RLHF: Reinforcement Learning from Human Feedback.
- 18. Subscribe to read. Financial Times
- 19. Meta Llama 3 License.
- 20. Cheaper, Better, Faster, Stronger | Mistral AI.
- 21. Au Large | Mistral AI.
- 22. IBM’s Granite code model family is going open source - IBM Research. IBM
- 23. Introducing DBRX: A New State-of-the-Art Open LLM | Databricks Blog.
- 24. GitHub - xai-org/grok-1: Grok open release.
- 25. T. Liao, R. Taori, I. D. Raji, and L. Schmidt. “Are We Learning Yet? A Meta-Review of Evaluation Failures Across Machine Learning“. In Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2), 2021.
- 26. [2305.14314] QLoRA: Efficient Finetuning of Quantized LLMs.
- 27. Open LLM Leaderboard - a Hugging Face Space by open-llm-leaderboard. Open LLM Leaderboard
- 28. Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings | LMSYS Org.
- 29. Elo rating system - Wikipedia. Contributors to Wikimedia projects
- 30. [2305.14314] QLoRA: Efficient Finetuning of Quantized LLMs.
- 31. [2005.14165] Language Models are Few-Shot Learners.
- 32. Wei J.; Wang X.; Schuurmans D.; Bosma M.; Ichter B.; Xia F.; Chi E. H.; Le Q. V.; Zhou D. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models“
- 33. OpenAI Platform.
- 34. The CEO’s Guide to the Generative AI Revolution | BCG. BCG Global
Comments
Your email address will not be published. All fields are required.