Generative AI, also called GenAI, presents novel opportunities for enterprises compared to middle-market companies or startups including:
The opportunity to build your company’s models without exposing private data to 3rd parties
However, generative AI is a new technology with unique challenges for enterprises:
- Valuable proprietary data can be exposed which was stated by 36% of enterprises as a reason not to use commercial LLMs1
- Generative AI tools will create new services and solutions. Technology leaders can leverage them to enter new markets gaining market share at the expense of incumbents.
- Generative AI models, also called generative models, will bring new automation opportunities with the potential to increase customer satisfaction or reduce costs. Competitors can leverage them to get ahead.
- Reputational or operational risks due to generative models’ bias or hallucinations
Executives wonder how their organizations can reap the benefits of generative AI while overcoming these challenges. Below, we outline:
Generative AI use cases for large companies
Guidelines to leverage the full potential of generative AI solutions, including how to build and manage your company’s generative models.
How should enterprises leverage generative AI?
We charted a detailed path for businesses to leverage generative AI.
While most firms may not need to build their models, most large enterprises (i.e. Forbes Global 2000) are expected to build or optimize one or more generative AI models specific to their business requirements within the next few years. Finetuning can enable businesses to achieve these goals:
- Achieve higher accuracy by customizing model output in detail for their own domain
- Save costs. Customizable models with licenses permitting commercial use have been measured to be almost as accurate as proprietary models at significantly lower cost.2
- Reduce attack surface for their confidential data
Firms like Bloomberg are generating world-class performance by building their own generative AI tools leveraging internal data. 3
What are the guidelines for enterprise AI models?

At a minimum an enterprise generative AI model should be:
Trusted
Consistent
Most current LLMs can provide different outputs for the same input. This limits the reproducibility of testing which can lead to releasing models that are not sufficiently tested.
Controlled
Enterprises should host or integrate generative AI in environments where they can manage security and compliance at a granular level (e.g., on-premises or dedicated cloud instances). The alternative is using online chat interfaces or APIs like OpenAI’s LLM APIs.
The disadvantage of relying on APIs is that the user may need to expose confidential proprietary data to the API owner. This increases the attack surface for proprietary data. Global leaders like Amazon and Samsung experienced data leaks of internal documents and valuable source code when their employees used ChatGPT.4 5
Since then, enterprise offerings have matured significantly:
– OpenAI Enterprise (2023) and later ChatGPT Team (2024) introduced zero data retention, SOC 2 compliance, SSO/SAML integration, and admin controls.6
Secure
Enterprise-wide models may have interfaces for external users. Bad actors can use techniques like prompt injection to have the model perform unintended actions or share confidential data.
Ethical
Ethically trained
Model should be trained on ethically sourced data where Intellectual Property (IP) belongs to the enterprise or its supplier and personal data is used with consent.
1- Generative AI IP issues, such as training data that includes copyrighted content where the copyright doesn’t belong to the model owner, can lead to unusable models and legal processes.
2- Use of personal information in training models can lead to compliance issues. For example, OpenAI’s ChatGPT needed to be disclose its data collection policies and allow users to remove their data after the Italian Data Protection Authority (Garante)’s concerns.7
Read generative AI copyright issues and solutions to learn more.
Fair
Bias in training data can impact model effectiveness.
Licensed
The enterprise need to have a commercial license to use the model. For example using models like Meta’s LLaMa have noncommercial licenses preventing their legal use in most use cases in a for-profit enterprise. Models with permissive licenses like Vicuna built on top of LLaMa also end up having noncommercial licenses since they leverage the LLaMa model.8 9
Sustainable
Training generative AI models from scratch is expensive and consumes significant amounts of energy, contributing to carbon emissions. Business leaders should be aware of the full cost of generative AI technology and identify ways to minimize its ecological and financial costs.
Enterprises can strive towards most of these guidelines and they exist on a continuum except the issues of licensing, ethical concerns and control.
- It is clear how to achieve correct licensing and to avoid ethical concerns but these are hard goals to achieve
- Achieving control requires firms to build their own foundation models however most businesses are not clear about how to achieve this
How can enterprises build foundation models?
There are 2 approaches to build your firms’ LLM infrastructure on a controlled environment.
1- Build Your Own Model (BYOM)
This approach allows world-class performance costing a few million $ including computing (1.3M GPU hours on 40GB A100 GPUs in case of BloombergGPT) and data science team costs.10
BYOM is primarily pursued by enterprises in highly regulated sectors (e.g., finance, healthcare, defense) where data sensitivity and compliance requirements outweigh the costs. Some firms follow a hybrid approach by training smaller domain-specific models while leveraging external foundation models for general-purpose reasoning.
2- Improve an existing model
Most enterprises adopt this approach due to its cost efficiency and flexibility. Several methods are available:
2.1- Fine-tuning
It is a cheaper machine learning technique for improving the performance of pre-trained large language models (LLMs) using selected datasets.
Instruction fine-tuning was previously done with large datasets but now it can be achieved with a small dataset (e.g. 1,000 curated prompts and responses in case of LIMA).11 The importance of a robust data collection approach optimizing data quality and quantity is highlighted in early commercial LLM fine-tuning experiments.12
Compute costs in research papers have been as low $100 while achieving close to world-class performance.13
Model fine-tuning is an emerging with domain with new approaches like Inference-Time Intervention (ITI), an approach to reduce model hallucinations, being published every week.14
2.2- Reinforcement Learning from Human Feedback (RLHF)
A fine-tuned model can be further improved by human in the loop assessment. 15 16
2.3- Retrieval augmented generation (RAG)
RAG allows businesses to pass crucial information to models during generation time. Models can use this information to produce more accurate responses.
Contemporary frameworks such as LangChain and LlamaIndex facilitate secure integration of structured and unstructured enterprise data. Advanced RAG methods now include multi-hop retrieval and real-time search integration, further enhancing reliability and factual accuracy.
Enterprises are moving toward auto-grounding, where models connect to live data sources automatically to keep outputs current. Cloud providers like Azure now frame RAG as the core architecture for copilots, knowledge systems, and customer apps, prioritizing scalability and security.17
Given the high costs involved in BYOM, we recommend businesses to initially use optimized versions of existing models. Language model optimization an emerging domain with new approaches being developed on a weekly basis. Therefore businesses should be open to experimentation and be ready to change their approach.
Top cost-effective foundation models for enterprises
Machine learning platforms released foundation models with commercial licenses relying mostly on text on the internet as the primary data source. These models can be used as base models to build enterprise large language models:
– GPT-5 by OpenAI was released in August 2025 and is now the flagship model powering ChatGPT, Copilot, and API, with reasoning, multimodal support, and improved reliability. 18
– GPT-4.1 by OpenAI was released in April 2025 (with mini and nano variants) and replaced GPT-4o and GPT-4.5 in ChatGPT’s lineup. 19
– DeepSeek-V3.1 by DeepSeek (Aug 2025) extends long-context capabilities with an updated tokenizer and open weights. 20
– DeepSeek-R1 is the highest performing reasoning model with a permissive MIT license. However, DeepSeek team may have used OpenAI’s models in its training process which, if proven, could require a change in its licensing model in the US.21
– LaMA 4 by Meta is released as LLaMA 4 Maverick, Scout and a Behemoth preview. These models are natively multimodal (text and vision), support context windows up to 10 million tokens, and remain optimized for efficiency. 22
– Llama 3 by Meta was the former model with a commercial use license with some limitations for very large businesses. 23
– Mistral 8x22B is the latest open-weights model developed by the European generative AI startup Mistral. With its permissive license (i.e. Apache 2.0) that allows commercial use without specific restrictions for large businesses, it can be attractive for all businesses.24 Mistral also provides models like Mistral Large but that model has more restrictive licensing.25
– IBM’s Granite models are high performing according to code generation benchmarks and are available with the permissive Apache 2.0 license.26
– DBRX is an open-weights model developed by the data platform Databricks. It comes with a commercial license with similar limitations to Meta’s models. Limitations apply to businesses serving more than 700M active users. 27
– X released its 314 billion parameter Grok-1 model with the permissive Apache 2.0 license.28
– Grok‑2 by xAI was released under a custom “Grok‑2 Community License” that allows downloads but is not truly open source and comes with usage restrictions.29
– Grok-4 by xAI was released in July 2025 with native tool use, real-time search integration, and a “Heavy” variant for advanced reasoning. 30
– Formerly, Grok-3 by xAI was introduced in February 2025 with the Deep Search feature and served as the direct predecessor to Grok-4.
What is the right tech stack for building large language models?
Generative AI is an artificial intelligence technology and large businesses have been building AI solutions for the past decade. Experience has shown that leveraging Machine Learning Operations (MLOps) platforms significantly accelerate model development efforts.
In addition to their MLOps platforms, enterprise organizations can rely on a growing list of Large Language Model Operations (LLMOps) tools and frameworks like Langchain, Semantic Kernel or watsonx.ai to customize and build their models, AI risk management tools like Nemo Guardrails.
In early days of new technologies, we recommend executives to prioritize open platforms to build future-proof systems. In emerging technologies, vendor lock-in is an important risk. Businesses can get stuck with outdated systems as rapid and seismic technology changes take place.
Finally, data infrastructure of a firm is among the most important underlying technologies for generative AI:
Vast amounts of internal data need to be organized, formatted.
Data quality and observability efforts should ensure that firms have access to high quality, unique, easily-usable datasets with clear metadata.
Synthetic data capabilities may be necessary for model training
How to evaluate large models’ performance?
Without measurement of effectiveness, the value of generative AI efforts can not be quantified. However, LLM evaluation is a difficult problem due to issues in benchmark datasets, benchmarks seeping into training data, inconsistency of human reviews and other factors.31 32 .
We recommend an iterative approach that increases investment in evaluation as models get closer to be used in production:
– Use benchmark test scores to prepare shortlists. This is available publicly for a large number of open source models.33 34
– Rely on Elo scores used in ranking players in zero-sum games like chess, compare the models to be selected. If there are higher performing models which are not available to be used (e.g. due to licensing or data security issues), they can be used to compare the responses of different models. 35
This can also include chain-of-thought prompting.36
Retrieval augmented generation (RAG) can also be used with commercials models if the enterprise is content with the data security policies of the foundation model provider.
Fine-tuning is also available to further improve model performance of commercial models offered via APIs.37
Pre-foundation model steps for enterprises
Building your enterprise model can take months since the steps below need to be completed. Each of these steps can take weeks to months, and they can not be fully parallelized:
- Data collection can take weeks to months. AI data collection services can accelerate this process by helping companies generate balanced, high-quality instruction datasets and other data for building or fine-tuning models. You can also work with data crowdsourcing platforms for more diverse datasets.
Hiring data scientists with LLM expertise or hiring consultants can take weeks to months.
Training and deployment
Integrating models to business processes and systems
We recommend business leaders encourage experimentation with GenAI. It requires a paradigm shift: We must view machines not as senseless robots but as co-creators. Organizations should start using GenAI to foster this mindset shift, educating employees about its potential and empowering them to change how they work. As consultants often say, the key to any transformation, including AI transformation, is people.

Teams can leverage existing APIs to automate processes in domains where value of confidential data is lower and system integration is easier. Example domains where teams can leverage GenAI to improve productivity and increase teams’ familiarity with generative AI without building own models:
New content creation and optimizing generated content for marketing campaigns
Code generation for front-end software
Conversational AI for customer engagement and support
There are tens of more generative AI applications
Sustainability & costs
Generative AI requires significant computing resources, and therefore has both financial and environmental costs. Enterprises should evaluate these trade-offs carefully when deciding whether to build or optimize models.
Key considerations include:
- Lifecycle modeling: Research shows that the carbon footprint of LLMs spans training, inference, and even the hardware itself. Tools such as LLMCarbon provide frameworks to estimate these costs end-to-end.39
- Cloud sustainability controls: Cloud providers (e.g., Google, Microsoft, AWS) now publish data on the carbon intensity of their data centers.40 This underscores the need to right-size models and optimize inference rather than always chasing the largest model available.41
Practical cost-reduction tactics
Enterprises are adopting methods such as:
- Using smaller, specialized models (fine-tuned on internal data) rather than training from scratch.
- Applying efficiency techniques like quantization (compressing models) or request caching.
- Leveraging RAG so models only generate when needed, instead of retraining with every new dataset.
- Tracking not only financial cost but also CO₂ and water usage at the use-case level for transparency.
Recommendation: Business leaders should treat sustainability as both a cost control strategy and a compliance priority. By aligning AI deployment with corporate ESG goals, enterprises can reduce expenses and limit reputational risk.
What are enterprise generative artificial intelligence use cases?
The web is full of B2C use cases such as writing emails with generative AI support that don’t require deep integration or specialized models. However, enterprise value of generative AI comes from enterprise AI applications listed below:
Common use cases
Enterprise Knowledge Management (EKM): While SMEs and mid-market firms do not have challenges in organizing their limited data, Fortune 500 or Global Forbes 2000 need enterprise knowledge management tools for numerous use cases. Generative AI can serve them. Applications include:
Insight extraction by tagging unstructured data like documents
Summarization of unstructured data
Enterprise search which goes further than keyword search taking into account relationships between words
Part of enterprise search includes answering employee questions about:
Company’s practices (e.g. HR policies)
Internal company data like sales forecasts
A combination of internal and external data. For instance: How would potential future sanctions targeting MLOps systems sales to our 3rd largest geographic market affect our corporate performance?
Larger organizations serve global customers and machine translation ability of LLMs are valuable in use cases like:
Website localization
Creating documentation like technical manuals at scale for all geographies
Social media listening targeting a global audience
Multilingual sentiment analysis
Industry specific applications
Most enterprise value is likely to come from using generative AI technologies for innovation in companies’ specific industries: This could be in the form of new products and services or new ways of working (e.g. process improvement with GenAI). Our lists of generative AI applications can serve as starting points:
What is the level of interest in enterprise generative AI?
Though there are many signs that show that enterprise generative AI is booming (e.g. generative AI related revenues of consultants), this has not been reflected in search engine queries yet. However, there is increasing interest in enterprise AI which was likely triggered by the launch of ChatGPT:
Adoption level
Since last year, major advisory houses have updated enterprise GenAI adoption roadmaps to emphasize operating-model change, governance, and value capture over tooling alone:
- 78% of organizations report using AI in at least one function; firms are rewiring workflows, appointing AI governance leads, and formalizing model-risk processes.42
- GenAI moving past the “peak hype,” with roadmap guidance shifting toward governed, productized use cases and platform thinking.43
AI’s productization gap
While model performance improves every few weeks, enterprise products often lag. Many solutions simply add AI into existing workflows (e.g., chat widgets, form fillers) instead of creating AI-first experiences designed from the ground up.
The real opportunity lies in rethinking products so AI becomes the core interaction model, not an add-on.44
FAQ
What is generative AI?
Generative AI includes text, image and audio output of artificial intelligence models which are also called large language models LLMs, language models, foundation models or generative AI models.
What are the examples of enterprise generative AI?
McKinsey’s Lilli AI leverages McKinsey’s proprietary data to answer consultant’s questions and cites its sources. McKinsey followed an LLM-agnostic approach and leverages multiple LLMs from Cohere and OpenAI in Lilli.
Walmart developed My Assistant generative AI assistant for its 50,000 non-store employees.
If you have other questions or need help in finding vendors, we can help:
External Links
- 1. Survey Report: Large Language Models in Production.
- 2. Unpopular opinion: Current AI is mostly engineering without science and can be a net negative for society for years. This is painful to say. I have been fascinated by AI for the last 20 years and… | Cem Dilmegani.
- 3. Introducing BloombergGPT, Bloomberg’s 50-billion parameter large language model, purpose-built from scratch for finance | Press | Bloomberg LP.
- 4. Amazon Warns Staff Not to Share Confidential Information With ChatGPT - Business Insider. Business Insider
- 5. Samsung Bans ChatGPT, Google Bard, Other Generative AI Use by Staff After Leak - Bloomberg. Bloomberg
- 6. https://openai.com/blog/introducing-chatgpt-enterprise[/efn_note]
– Major providers (e.g., Anthropic, Microsoft, Google, Cohere) now advertise customer data opt-outs, meaning user prompts and outputs are not used for model training.
– Providers have also begun aligning with EU AI Act (2024) requirements, which emphasize responsible AI principles like transparency, auditability, and risk management in high-risk AI systems.
Despite these advances, residual risks remain when relying on third-party cloud systems:
- Malicious insiders or compromised providers could still access enterprise data.
- API misconfigurations can expose sensitive data flows.
- Lack of explainability in LLMs continues to challenge compliance teams.
For highly regulated industries, self-hosting or private deployment of foundation models (via open-weight models like LLaMA-4, Mistral, or Granite) remains the most secure approach, though at higher operational cost.
Explainable
Unfortunately, most generative AI models are not capable of explaining why they provide certain outputs. This limits their use as enterprise users that would like to base important decision making on AI powered assistants would like to know the data that drove such decisions. XAI for LLMs is still an area of research.
Reliable
Hallucination (i.e. making up falsehoods) is a feature of LLMs and it is unlikely to be completely resolved. Enterprise genAI systems require the necessary processes and guardrails to ensure that harmful hallucinations are minimized or detected or identified by humans before they can harm enterprise operations.
Enterprises increasingly rely on retrieval-augmented generation (RAG) pipelines to reduce hallucinations by grounding models in trusted data. Yet challenges remain in infrastructure, storage, and security, making RAG not just a fix but a long-term enterprise requirement.6https://www.infinidat.com/en/resource-pdfs/role-storage-ai-applications-and-workloads.pdf
- 7. OpenAI: ChatGPT back in Italy after meeting watchdog demands | AP News. AP News
- 8. Introducing LLaMA: A foundational, 65-billion-parameter language model.
- 9. Large Language Models for Commercial Use | TrueFoundry. TrueFoundry
- 10. https://arxiv.org/pdf/2303.17564
- 11. [2305.11206] LIMA: Less Is More for Alignment.
- 12. Flowrite is now a part of MailMaestro.
- 13. [2305.14314] QLoRA: Efficient Finetuning of Quantized LLMs.
- 14. [2306.03341] Inference-Time Intervention: Eliciting Truthful Answers from a Language Model.
- 15. Ouyang L.; Wu J.; Jiang X.; Almeida D.; Wainwright C. L.; Mishkin P.; Zhang C.; Agarwal S.; Slama K.; Ray A.; Schulman J.; Hilton J.; Kelton F.; Miller L.; Simens M.; Askell A.; Welinder P.; Christiano P.; Leike J.; Lowe R. “Training language models to follow instructions with human feedback“.
- 16. RLHF: Reinforcement Learning from Human Feedback.
- 17. How RAG and auto-grounding transform enterprise applications with Azure | Sphesihle Mhlongo posted on the topic | LinkedIn.
- 18. https://openai.com/index/gpt-5-new-era-of-work/
- 19. https://en.wikipedia.org/wiki/GPT-4.1[/efn_note]
– DeepSeek-V3 by DeepSeek is an MoE model (~671B, MIT-licensed) with strong reasoning and coding performance and has been open-source since March 2025. 20https://api-docs.deepseek.com/news/news250325
- 20. DeepSeek-V3.1 Release | DeepSeek API Docs.
- 21. Subscribe to read. Financial Times
- 22. Unmatched Performance and Efficiency | Llama 4.
- 23. Meta Llama 3 License.
- 24. Cheaper, Better, Faster, Stronger | Mistral AI.
- 25. Au Large | Mistral AI.
- 26. IBM’s Granite code model family is going open source - IBM Research. IBM
- 27. Introducing DBRX: A New State-of-the-Art Open LLM | Databricks Blog.
- 28. GitHub - xai-org/grok-1: Grok open release.
- 29. Grok-2 Beta Release | xAI.
- 30. Grok 4 | xAI.
- 31. https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/757b505cfd34c64c85ca5b5690ee5293-Paper-round2.pdf
- 32. [2305.14314] QLoRA: Efficient Finetuning of Quantized LLMs.
- 33. Open LLM Leaderboard - a Hugging Face Space by open-llm-leaderboard. Open LLM Leaderboard
- 34. Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings | LMSYS Org.
- 35. [2305.14314] QLoRA: Efficient Finetuning of Quantized LLMs.
- 36. https://arxiv.org/pdf/2201.11903
- 37. OpenAI Platform.
- 38. The CEO’s Guide to the Generative AI Revolution | BCG. BCG Global
- 39. https://proceedings.iclr.cc/paper_files/paper/2024/file/6b4ac044095525631df38e20919b45d2-Paper-Conference.pdf
- 40. https://datacenters.google/operating-sustainably[/efn_note]
- Choosing greener regions or low-PUE (power usage effectiveness) facilities can significantly lower emissions.43https://cloud.google.com/sustainability/region-carbon[/efn_note]
- Industry reporting: Independent reports (e.g., Stanford AI Index, MIT Tech Review) highlight that data center emissions are rising, even as efficiency improves.44https://hai.stanford.edu/ai-index/2025-ai-index-report
- 41. https://news.mit.edu/2025/explained-generative-ai-environmental-impact-0117
- 42. https://www.mckinsey.com/~/media/mckinsey/business%20functions/quantumblack/our%20insights/the%20state%20of%20ai/2025/the-state-of-ai-how-organizations-are-rewiring-to-capture-value_final.pdf
- 43. The 2025 Hype Cycle for Artificial Intelligence Goes Beyond GenAI.
- 44. AI product innovation lags behind model advancements | Madhu Gurumurthy posted on the topic | LinkedIn.
Comments
Your email address will not be published. All fields are required.