AI Lifecycle and the Board's Role
The AI lifecycle consists of data collection, training and fine-tuning, and deployment and integration. The AI trinity of data, software, and hardware is crucial for AI development. Boards should focus on data governance, resource allocation, ethical considerations, and strategic alignment when overseeing AI initiatives. A flexible and adaptable AI strategy is key to navigating the rapidly evolving AI landscape.
Boards are expected to make critical strategic decisions about their companies’ approaches to AI. It’s tough to do that without a basic understanding of the lifecycle of AI development and deployment, as well as the drivers of AI systems. I’ve written this post with the goal of helping board members understand the high-level steps of training a model (and what the board’s role is at each step) and how the board should think about data, software, and hardware when developing their organization’s strategic plans.
The Lifecycle of an AI Model
Though the specifics of an AI model’s lifecycle vary by developer, model architecture, and model type, the process generally progresses through the following steps, each of which is essential for developing a high-performing model:
- Data collection
- Training and fine-tuning
- Deployment and integration
Data Collection
The AI model lifecycle begins with data collection, which provides the inputs the model needs to “learn.” The quality of the training data has a significant impact on the model’s performance: as in data science, the old adage of garbage in, garbage out applies. The amount of data required varies with the complexity of the model and the task it is being trained to perform. For example, a model being trained to handle open-ended generative tasks from the public requires a much larger dataset than a model being trained to classify document types for a bank.
The data ethos has historically been more = better; however, the growing number of small language models being released suggests that more is not always better. In addition, a number of existential questions related to copyright, privacy, and fair use surround the data collection process; though I feel strongly about these issues, I won’t cover them in depth in this post.
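To make the governance questions below concrete, here is a minimal sketch of the kind of automated screening a data collection pipeline might apply before training; the 50-word minimum, the duplicate filter, and the PII regex are illustrative assumptions rather than a complete set of controls:

```python
# Minimal sketch of pre-training data quality screening.
# The thresholds and PII pattern are illustrative assumptions.
import re

PII_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # crude US-SSN-style screen

def passes_quality_checks(doc: str, seen_hashes: set[int]) -> bool:
    """Keep documents that are long enough, unseen, and free of obvious PII."""
    if len(doc.split()) < 50:        # too short to provide useful signal
        return False
    digest = hash(doc)
    if digest in seen_hashes:        # exact-duplicate filter
        return False
    if PII_PATTERN.search(doc):     # flag documents with personal data
        return False
    seen_hashes.add(digest)
    return True

seen: set[int] = set()
corpus = ["... raw documents gathered during collection ..."]
clean_corpus = [d for d in corpus if passes_quality_checks(d, seen)]
```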
Board considerations:
- Data governance: Implement a data governance strategy to ensure that data collected within the company is high quality. Do we have a mature data strategy, or do we need to first focus on change management around data practices?
- Data provenance: Assess the provenance of training data. Have we undertaken a sufficient analysis of intellectual property rights? Can we license data that we’re missing?
- Data protection: Establish the relevant frameworks that regulate datasets and ensure that risks are addressed. Does the data contain any personal information? Do we need to carry out a data protection impact assessment (DPIA)?
- Competitive advantage: Establish the reason behind the dataset usage. Do we have proprietary or confidential internal data that we can use for training data, giving us a model that no one else has?
Training and Fine-Tuning
Once the data has been collected, the model developer next selects the type of model to train and fine-tune. The developer will then label the data collected in the prior phase and train the model on it. Labeling and training can be computationally expensive, so they have historically required specialized hardware and labor. Given these costs and the limited availability of specialized hardware, most organizations have accessed AI, such as large language models, through cloud-based services like OpenAI’s API.
The training process is iterative: the model will typically be trained multiple times until it reaches a level of performance that is deemed acceptable. Training is also where the model is fine-tuned for, or aligned with, a specific task.
Most organizations won’t train a foundation model from scratch, but will instead fine-tune an existing model that was developed by someone else. These are frequently open-weight models (e.g., LLaMA, BLOOM, BERT, and Falcon); while proprietary AI model vendors like OpenAI may allow third parties to fine-tune their models, these services are often financially impractical and offer limited flexibility.
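As an illustration of what fine-tuning looks like in practice, here is a minimal sketch using the Hugging Face transformers and datasets libraries, continuing the bank document-classification example from earlier. The distilbert-base-uncased base model, the four-label setup, and the labeled_documents.csv file (with "text" and "label" columns) are all illustrative assumptions:

```python
# Minimal fine-tuning sketch; model ID, labels, and CSV file are assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=4)  # e.g., four document types

# Assumed CSV with "text" and "label" columns from the data collection phase.
dataset = load_dataset("csv", data_files={"train": "labeled_documents.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3),
    train_dataset=dataset["train"],
)
trainer.train()  # iterate until validation performance is deemed acceptable
```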
Board considerations:
- Resource allocation and scalability: Ensure that the current strategy will be able to scale as the organization, the model, and the technology change. How much should be budgeted for computational resources to train a model? Is it worth investing in capital assets, such as hardware, or should computing power be treated as an operating expense?
- Business objectives: Align model training and fine-tuning efforts with the company’s broader AI and business objectives. Is it necessary to train a model from scratch, or is a fine-tuned model sufficient to achieve business objectives?
- Environmental impact: How does training a model impact our ability to achieve Environmental, Social, and Governance (ESG) objectives?
- Ethical guidelines: Establish principles for responsible AI development, including considerations of bias mitigation and fairness in model training.
- Intellectual property strategy: Decide whether to develop proprietary models or leverage open-source alternatives, and how to protect any novel training techniques or model architectures.
- Talent acquisition and retention: Assess whether current staffing is appropriate for model training and fine-tuning. How will we attract and retain skilled data scientists, machine learning engineers, and researchers?
Deployment and Integration
Once the model has been trained and fine-tuned, it can be deployed to production. The deployment phase brings the model into active service, enabling it to deliver value in real-world situations. At this stage, the model is available to users (though, as I’ve counseled before, it can be beneficial to start with a proof-of-concept project before rolling out a model or AI system to the entire organization).
During deployment, the model may also be integrated with other systems it needs to interact with. Integration is typically complex, as it requires ensuring the model’s security and performance not only in isolation, but also in concert with those other programs and systems.
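As a sketch of what a simple integration boundary can look like, the snippet below exposes a fine-tuned classifier behind an internal HTTP endpoint using FastAPI, so downstream systems integrate against a stable JSON contract rather than the model itself; the ./out model directory and the /classify route are illustrative assumptions:

```python
# Minimal deployment sketch; the model path and endpoint are assumptions.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
classifier = pipeline("text-classification", model="./out")  # fine-tuned model

class Document(BaseModel):
    text: str

@app.post("/classify")
def classify(doc: Document):
    result = classifier(doc.text)[0]
    # Callers depend on this JSON shape, so the underlying model can be
    # retrained or swapped without breaking integrated systems.
    return {"label": result["label"], "score": result["score"]}
```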
Board considerations:
- Stakeholder management: Communicate with shareholders, customers, and employees regarding AI integration. How will we address concerns about AI from our workforce and the public?
- Performance monitoring and accountability: Define key performance indicators (KPIs) and reporting mechanisms to measure the success and Return on Investment (ROI) of AI and its impact on business objectives.
- Incident management: Establish board-level reporting mechanisms for AI incidents and performance issues, and develop an AI-specific incident response plan. Do we have an insurance policy that covers AI incidents?
The AI Trinity: Data, Software, and Hardware
Three key assets are involved in the AI lifecycle: data, software, and hardware. Without any one of these, a model cannot exist or function, yet each is subject to its own changing economics and trends.
All three asset types have undergone significant changes in recent years, with growing investment and a rapid pace of innovation accelerating those changes.
Data
AI models need data (and if you ask OpenAI, it needs ALL the data); without data, there can be no model. However, there are challenges here. Most AI models are driven by “publicly available data” (i.e., data available on the internet), which may not be representative of the types of documents and material commonly encountered in enterprise work. Moreover, reliance on “publicly available data” can lead to a host of compliance issues, including lawsuits for copyright or other IP infringement.
The broader environment in which data is situated is constantly shifting due to the evolving volume and variety of data as well as the regulations and lawsuits surrounding it. The release of open source datasets by nonprofits and academic researchers has lowered the cost of data collection, enabling more organizations to train or fine-tune models.
The value of proprietary, domain-specific datasets has been demonstrated by organizations like Bloomberg, which trained a transformer-based model on its own financial data. The model outperformed general models on domain-specific tasks; we found similar results when testing our own model (🍊 KL3M) at 273 Ventures. We trained KL3M on high-quality professional legal and financial content; as a result, our 170M-parameter KL3M model outperformed a legal-domain fine-tuned model ten times its size. So I can speak from personal experience: there can be very real value in training on domain-specific data from the start, as long as it aligns with your strategic vision!
Data, both as training inputs and as the outputs of the models themselves, is becoming more regulated and scrutinized. Governments, plaintiffs, and other private parties are seeking to better understand the data on which models have been trained.
Board considerations:
- Legal exposure: Assess whether the use of existing public datasets is acceptable based on the organization’s risk criteria and business objectives.
- Responsible AI: Establish a set of responsible AI policies, including ones that govern data selection, collection, and use. Does our data introduce bias or errors that would be amplified if a model is trained on it?
Software
It may feel strange to think of an AI model as software, but software is the means by which model architecture is defined and by which data is processed and interpreted. Knowing how to design a model architecture was once a significant barrier to entry, but thanks to the democratization of software via the open source community and the development of affordable alternatives, AI has been opened up to organizations of all stages and sizes.
The release of open source tools and models has made it significantly easier to use, train, and deploy models. In particular, research on small language models and quantization has significantly reduced GPU requirements for organizations. A quantized model stores its weights at lower numerical precision, enabling it to run on consumer-grade hardware (think a gaming computer) rather than specialized enterprise hardware; a minimal sketch of loading such a model appears below. This trend is likely to continue, as the economic incentives for increased affordability are significant in light of the current scarcity of, and long lead times to acquire, enterprise GPU hardware.
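Here is a minimal sketch of loading a 4-bit quantized open-weight model via the Hugging Face transformers integration with bitsandbytes; the Falcon model ID is an illustrative assumption, and the same pattern applies to other open-weight models:

```python
# Minimal quantized-loading sketch; the model ID is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # store weights at 4-bit precision
    bnb_4bit_compute_dtype=torch.float16,  # run computation in half precision
)

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",
    quantization_config=quant_config,
    device_map="auto",  # spread layers across available (consumer) GPU/CPU
)
```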
Alternative model architectures have also recently emerged, suggesting that future generations of architecture may remove many current limitations related to memory requirements and context windows.
Board considerations:
- Technological improvements vs. business objectives: Establish clear business objectives to ensure that internal development is appropriately aligned. Are we chasing the latest technology unnecessarily, exposing ourselves to financial risks?
Hardware
Hardware is the engine that executes the calculations needed to convert data and software instructions into trained models and outputs. Historically, computing power primarily meant specialized Graphics Processing Units (GPUs), often from Nvidia; despite the cost (typically anywhere from $10k to $100k+, depending on the unit), demand for enterprise GPU hardware vastly exceeded supply.
The advent of more specialized computing solutions designed for AI has resulted in an increase in hardware options for organizations, including more edge computing options (i.e., running models locally on machines). These types of hardware development have increased the accessibility of AI models by reducing the capital requirements.
More flexible hardware solutions have also emerged: while Amazon and Microsoft historically had the most GPU availability in theory, nearly all of it was locked up by large AI labs (namely Anthropic and OpenAI). Smaller computing providers have since entered the market, offering a mix of on-demand, short-term, and long-term contracts for computing resources.
Board considerations:
- Capital expenditure vs. operating expense: Ensure that the budget for capital requirements aligns with the technological strategy for AI development.
- Availability vs. flexibility: Based on the strategic vision and operational objectives, determine whether the organization values hardware availability or flexibility more. Do we want guaranteed computing power at all times (i.e., purchase GPUs), or do we prefer the flexibility to scale as needed (i.e., rent GPUs)? What risks does each approach expose us to?
Strategically Addressing the AI Trinity
Given the trends I’ve discussed above, it’s apparent that the technical environment in which AI operates has not only undergone significant shifts in recent years, but will continue to do so. These changes will likely dramatically expand what most organizations can do with AI. But how do you avoid adopting a technical AI strategy that quickly becomes outdated?
One of the best ways to address the changing nature of data, software, and hardware in AI is to craft a strategy that is flexible and adaptable, enabling your organization to take advantage of emerging technology and improvements. This applies not only to internally-developed AI, but to third-party products: ensure that the products and vendors that your organization uses allow for data portability, so that you can switch to a new solution if a better one emerges. In the early AI boom, many AI companies were able to get customers to sign up for subscription terms generally unheard of in SaaS; many of these customers kicked themselves when later entrants to the market introduced superior offerings.
By understanding the interplay between data, software, and hardware in the AI lifecycle, boards are better able to guide their organizations’ strategies, ensure responsible AI adoption, and make informed decisions about resource allocation and risk management.