Since the launch of ChatGPT, generative AI and large language models (LLMs) have taken the world by storm. With capabilities born from the vast amounts of data used to train them, these models are dramatically changing the ways we produce and consume data. As data leaders, we are focused on their implications for data strategy. While the technology has not (yet) broadly upended data practices, we believe LLMs have the potential to make high-quality proprietary data – and the robust privacy and security practices required to protect this data – more important than ever. We encourage leaders to consider the opportunities and the risks these models present, with a focus on three key trends:
Drive step function increases in data literacy and self-service
In recent years, many organizations have invested the time and effort to build a data warehouse and more robust reporting functionality. Despite this investment, growth stage companies still often find a gap between the data and the leaders across the organization who need to use it. It’s no wonder. Data is complex; many people in critical decision-making roles don’t have the technical background required to decipher it; and data scientists who can bridge the gap are in short supply. While asking good questions and making good decisions are likely to require human intervention for some time, LLMs are already quite adept in other areas of the analytical process. For example, these models have an impressive ability to translate business questions into SQL or Python code in order to retrieve data, perform calculations, create charts and graphs, and summarize results in narrative or bullet-point form. We expect that chat interfaces will also help educate workers about data concepts in-context and on-demand. (LLMs will happily explain the difference between mean and median as often as needed!) That said, while the potential for improvements in data literacy and self-service is significant, we recommend that leaders tread carefully in these areas. Models cannot easily distinguish between factual and fictional data; LLMs tend to “hallucinate,” confidently making up answers, and have problems calculating accurately. Use with caution in the near term.
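As a simple illustration, consider the kind of Python an LLM might generate for the business question “What is our typical order value?” (The order values below are invented sample data; the point is the mean-vs-median distinction a chat interface can explain in context.)

```python
import statistics

# Invented order values for illustration -- note the single large outlier
order_values = [25.00, 30.00, 28.50, 32.00, 1200.00]

mean = statistics.mean(order_values)
median = statistics.median(order_values)

# The outlier pulls the mean far above the median, which is why
# "typical" often means median rather than mean in practice.
print(f"mean:   {mean:.2f}")
print(f"median: {median:.2f}")
```

Code like this is easy for a model to draft and for a non-technical leader to run – but, per the caution above, its output still needs a human sanity check.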
Accelerated timeline from question to insight
Many executives can relate to this experience: you ask a seemingly simple question about your business, only to find that it takes your team days, weeks, or longer to get the answer. Data analysis – including locating data sources, stitching them together, ensuring data quality and (finally) performing the analysis – remains, in most organizations, a high-effort, time-consuming task. Technology has improved the process tremendously in recent years, from cloud data warehouses and data pipelines to drag-and-drop business intelligence software. And yet, there is still plenty of room for generative AI to further improve time to insight. We believe this will happen in two primary ways:
- Code creation: AI tools – including Copilot and others with automatic code generation capabilities – are easing and accelerating the process of writing the code needed to move, clean, and analyze data.
- Access to larger data sets: LLMs can be tuned on unstructured data sets, allowing business users to query qualitative customer feedback, conversation histories from corporate instant messaging apps, and other data that may currently live outside “traditional” databases.
The result: a wider range of data will become available for analysis. With timely access to broader types of data, we expect business leaders will ask more questions, keeping data engineers busier than ever.
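To make the “code creation” point concrete, here is a sketch of the sort of move-clean-analyze snippet an AI coding assistant can now draft in seconds. (The CSV export and its figures are invented for illustration.)

```python
import csv
import io

# Invented CSV export: inconsistent whitespace and a missing value,
# the kind of messiness that routinely slows down analysis.
raw = """region,revenue
North, 1200
South,980
East,
West,1100
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# Clean: strip whitespace, drop rows with missing revenue.
clean = [
    {"region": r["region"].strip(), "revenue": float(r["revenue"])}
    for r in rows
    if r["revenue"].strip()
]

total = sum(r["revenue"] for r in clean)
print(f"{len(clean)} regions reporting, total revenue {total:.0f}")
```

Writing boilerplate like this by hand is not difficult, but it is time-consuming – which is precisely where generated code compresses the timeline from question to insight.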
Heightened focus on data ownership, data quality and data privacy
The unprecedented performance of ChatGPT and other LLMs – on everything from writing poetry to passing exams – has captured our attention. At the same time, model accuracy and opacity surrounding sources of training data have raised meaningful ethical and legal concerns. To help improve the accuracy and impact of these models, business leaders need to consider ways to fine-tune (i.e., provide additional training steps) or prompt (i.e., phrase a question in a particular way) models using their own internal data. For example, customer service teams can leverage existing documentation and past support solutions to fine-tune a support chat bot, providing customer service responses that are more specific than those of a general-purpose model. The quality of the data matters; answers can only be as good as the material that goes into the model.
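The prompting approach described above can be sketched in a few lines. In this illustration, the knowledge-base entries are invented, the topic-matching retrieval is deliberately naive (a production system would use embedding search), and the final call to a model is left out since it depends on whichever API a team uses.

```python
# Invented internal documentation for illustration
KNOWLEDGE_BASE = {
    "password reset": "Users can reset passwords from Settings > Security.",
    "billing cycle": "Invoices are issued on the first business day of each month.",
}

def build_support_prompt(question: str) -> str:
    # Naive retrieval: include entries whose topic appears in the question.
    context = [
        text for topic, text in KNOWLEDGE_BASE.items()
        if topic in question.lower()
    ]
    # Grounding instruction keeps the model's answer tied to company data.
    return (
        "Answer using only the documentation below.\n\n"
        "Documentation:\n" + "\n".join(context) +
        "\n\nQuestion: " + question
    )

prompt = build_support_prompt("How does a customer do a password reset?")
print(prompt)
```

The prompt now carries the company’s own material to the model, which is how internal data sharpens a general-purpose model’s answers without retraining it.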
At the same time, we encourage teams to tread very cautiously when deciding which data is used to train a model. Consider who has access to the data and what types of data are shared. Can anyone outside your organization access this information? Are you sharing any trade secrets, personally identifiable information (PII) or other confidential data? This type of information should never be entered into a consumer-level chat bot such as ChatGPT, and leaders need to proactively educate employees about this risk. We expect major cloud players, proprietary solution providers and open-source models alike to increase and improve offerings designed to help companies keep private data private while still taking advantage of the power of generative AI. We also expect regulations such as HIPAA and GDPR, among others, to play a role in risk management. For the moment, generative AI technology continues to evolve so rapidly that it has outpaced the introduction of new regulatory guidelines, and we expect that will continue for some time. In the meantime, we encourage executives to think proactively about data ownership and data quality, as well as the privacy protection practices that may affect their industry.
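One practical guardrail teams can put in place today is a PII screen that runs before any text leaves the organization. The sketch below uses two simplified patterns for illustration – real detection requires far more thorough tooling – and the customer details are invented.

```python
import re

# Simplified illustrative patterns -- not production-grade PII detection.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace anything matching a PII pattern before it is sent externally."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REDACTED]", text)
    return text

message = "Customer jane.doe@example.com (SSN 123-45-6789) reported an outage."
print(redact(message))
```

A screen like this is no substitute for employee education or vendor-level data protections, but it illustrates how a small technical control can back up the policy.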
It can be easy to forget that data is at the heart of LLMs; the models appear to “just work” on demand. However, it’s important to keep in mind that the data underpinning these models is data about real products, processes and people – and that data quality, data ownership and data privacy should continue to be the foundations of any data strategy. Generative AI and large language models are positioned to make data work easier, faster and more accessible to a wider range of people in the future. All of this will certainly change our interactions with – and likely increase our demand for – data and the insights it can provide.
The content herein reflects the views of Summit Partners and is intended for executives and operators considering partnering with Summit Partners. For a complete list of Summit investments, please click here.