In today’s fast-paced business environment, obtaining actionable insights swiftly is crucial. Large Language Model (LLM) chatbots are emerging as powerful tools in Business Intelligence (BI) platforms, offering an intuitive way to interact with complex data. These advanced chatbots leverage the familiarity of conversational interfaces, similar to popular messaging apps like WhatsApp and Slack, to provide straightforward responses to intricate business queries.
Avi Perez, CTO of Pyramid Analytics, explains that the appeal of LLM chatbots lies in their ability to understand and respond in plain, conversational language, making data analysis accessible to non-technical users. This integration is transforming data interrogation, moving away from traditional methods toward more dynamic interactions. Users can now ask questions ranging from simple data retrievals to in-depth analytical inquiries: understanding trends, forecasting outcomes, and identifying actionable insights.
However, incorporating LLM chatbots into BI systems presents challenges, especially concerning data privacy and compliance. To address these concerns, innovative solutions like those implemented by Pyramid Analytics ensure data remains within the secure confines of the organization’s infrastructure. This interview with Avi Perez delves into the advantages of LLM chatbots, privacy risks, compliance challenges, and future trends, offering a comprehensive overview of how these chatbots are revolutionizing BI and shaping the future of data-driven decision-making.
LLM Chatbots in BI
– Can you explain what LLM chatbots are and why they are being integrated into Business Intelligence products?
An LLM chatbot is an interface that is familiar to many users, allowing them to basically interact with a computer through plain language. And if you consider how many people today are so used to using things like WhatsApp or a messaging tool like Teams or Slack, it’s obvious that a chatbot is an interface that they’re familiar with. The difference is, you’re not talking to a person, you’re talking to a piece of software that is going to respond to you.
The power of the large language model engine allows people to talk in very plain, vernacular type language and get a response in the same tone and feeling. And that’s what makes the LLM chatbot so interesting.
The integration into business intelligence, or BI, is then very appropriate because, typically, people have a lot of questions around the data that they’re looking at and would like to get answers about it. Just a simple, “Show me my numbers,” all the way through to the more interesting aspect which is the analysis. “Why is this number what it is? What will it be tomorrow? What can I do about it?” So on and so forth. So it’s a very natural fit between the two different sets of technologies.
I think it’s the next era because, in the end, nobody wants to actually run their business through a pie chart. You actually want to run your business through getting straightforward answers to complicated business questions. The analysis grid is the old way of doing things, where you have to do the interpretation. And the chatbot now takes it to a new level.
Business Value
– What are the primary advantages that LLM chatbots bring to Business Intelligence tools and platforms?
The greatest value is simplifying the interaction between a non-technical user and their data, so that they can ask complicated business questions and get very sophisticated, clean, intelligent answers in response, without being forced to ask that question in a particular way or getting a response that is unintelligible to them. You can calibrate both of those things, both on the way in and on the way out, using the LLM.
It simplifies things dramatically, and that makes it easier to use. If it’s easy to use, people use it more. If people use it more, they’re making more intelligent decisions on a day-to-day basis, and, therefore, we should, in theory, get a better business outcome.
Data Privacy Risks
– How significant are the data privacy risks associated with integrating LLM chatbots into BI systems?
Initially, the way people thought the LLM was going to work is that users would send the data to the chatbot and ask it to do the analysis and then respond with an outcome. And in fact, there are quite a few vendors today that are selling just that kind of interaction.
In that regard, the privacy risks are extreme, in my opinion. You’re effectively taking your top-secret corporate information, which is completely private and, frankly, offline, and sending it to a public service that hosts the chatbot, asking it to analyze it. And that opens the business up to all kinds of issues – anywhere from someone sniffing the question on the receiving end, to the vendor that hosts the LLM capturing that question with the hints of data, or the data sets, inside it, all the way through to questions about the quality of the LLM’s mathematical or analytical responses to data. And on top of that, you have hallucinations.
So there’s a huge set of issues. It’s not just about privacy, it’s also about misleading results. So in that framework, data privacy and the issues associated with it are tremendous, in my opinion. They’re a showstopper.
However, the way we do it at Pyramid is completely different. We do not send the data to the LLM. We do not even ask the LLM to interpret any sets of data or anything like that. The closest we come is allowing the user to ask a question; explaining to the LLM what ingredients, or what data structures, or data types we have in the pantry, so to speak; and then asking the LLM to generate a recipe for how it might respond to that question, given the kinds of ingredients we have. But the LLM doesn’t actually work out or resolve the analysis, or do any kind of mathematical treatment – that is done by Pyramid.
So the LLM generates the recipe, but it does so without ever getting its hands on the data, and without doing mathematical operations. And if you think about it, that eliminates something like 95% of the problem, in terms of data privacy risks.
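To make that architecture concrete, here is a minimal sketch of the metadata-only pattern Perez describes. Every name in it is a hypothetical stand-in: `llm` and `engine` represent whatever model client and query engine an implementation actually uses, and the schema summary and plan format are illustrative, not Pyramid’s actual API.

```python
import json

# Hypothetical sketch of the "recipe" pattern: the LLM sees only metadata
# (table and column names), never the rows themselves.
SCHEMA_SUMMARY = {
    "tables": {
        "sales": ["order_date", "region", "product", "revenue"],
        "targets": ["region", "quarter", "target_revenue"],
    }
}

def build_recipe_prompt(question: str) -> str:
    """Describe the 'pantry' (schema only) and ask the LLM for a query plan."""
    return (
        "You can see these tables and columns, but no data:\n"
        f"{json.dumps(SCHEMA_SUMMARY, indent=2)}\n"
        f"User question: {question}\n"
        "Respond with a JSON query plan: measures, dimensions, filters."
    )

def answer(question: str, llm, engine) -> "ResultSet":
    # 1. The LLM produces the recipe (a query plan) from metadata alone.
    plan = json.loads(llm.complete(build_recipe_prompt(question)))
    # 2. The BI engine -- not the LLM -- runs the plan against the real data,
    #    so raw rows never leave the organization's infrastructure.
    return engine.run(plan)
```

The decisive design choice is in the second step: even a compromised or logging LLM endpoint learns nothing beyond table and column names, because the data itself is only ever touched by the local engine.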
Specific Compliance Challenges
– What are the most pressing compliance challenges companies face when using LLM chatbots in BI, especially in regulated industries?
Regulations generally relate to the issue of sharing data with the LLM and getting a response from the LLM, and that whole loop and the security issues associated with it. So this actually goes very much to the previous question: how can we ensure that the LLM is responding effectively with information and results in a way that does not breach the sharing of data, or breach the analysis of data, or provide some kind of hallucinatory response to the data? And as I said in my previous response, that can be resolved by taking the issue of handing the data to the LLM away entirely.
The best way to describe it is the baking story, the cooking story, that we use at Pyramid. You describe the ingredients that you have in the pantry to the LLM. You tell the LLM, “Bake me a chocolate cake.” The LLM looks at the ingredients you have in the pantry without ever getting its hands on them, and it says, “Okay, based on the ingredients and what you asked for, here’s the recipe for how to make the chocolate cake.” And then it hands the recipe back to the engine – in this case, Pyramid – to go and actually bake the cake for you. And in that regard, the ingredients never make it to the LLM. The LLM is not asked to make the cake and, therefore, a huge part of the problem is eliminated.
There are many issues around compliance that are solved through that, because there’s no data shared. And the risk of hallucinations is reduced because the recipe is enacted on the company’s data, independent of the LLM, and therefore there’s less of a chance for it to make up the numbers.
Risk Mitigation
– What strategies can companies adopt to mitigate the risks of sensitive information leaks through these AI models?
If you never send the data, there is really no leak out to the LLM or to a third-party vendor. There is just that small gap of a user typing a question like, “My profitability is only 13%. Is that a good or a bad number?” By sharing that number in the question, you expose your profitability level to that third party. And I think one of the ways to try and solve that is through user education. I also expect there will be technologies coming along soon that will pre-screen the question in advance.
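What such pre-screening might look like is easy to sketch. The following is an illustrative example, not a description of any shipping product: a simple filter that flags figures and sensitive keywords in an outbound question before it reaches a hosted LLM. The patterns and function names are assumptions made for the sake of the example.

```python
import re

# Illustrative pre-screen: flag snippets in an outbound question that might
# leak business metrics to a third-party LLM endpoint.
SENSITIVE_PATTERNS = [
    re.compile(r"\$\s?\d[\d,\.]*"),                        # dollar amounts, e.g. $1,200,000
    re.compile(r"\b\d+(\.\d+)?\s?%"),                      # percentages, e.g. 13%
    re.compile(r"\b(profit\w*|margin|payroll|salary)\b", re.IGNORECASE),
]

def prescreen(question: str) -> tuple[bool, list[str]]:
    """Return (allowed, matched snippets) for a question about to leave the org."""
    hits = [m.group(0) for p in SENSITIVE_PATTERNS for m in p.finditer(question)]
    return (len(hits) == 0, hits)

ok, hits = prescreen("My profitability is only 13%. Is that good?")
# ok == False, hits == ['13%', 'profitability'] -> warn the user or redact first
```

A production screen would go further, redacting or tokenizing the matched snippets rather than simply blocking the question, but even this crude check catches the “13%” example above.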
But for the most part, even sharing that little snippet is very, very minimal, compared to sharing your entire P&L, all your transactions in your accounting solution, all the detailed information from your HR system around people’s payrolls, or a healthcare plan sharing patients’ HIPAA-sensitive data sets with an LLM.
Technological Safeguards
– Are there specific technological safeguards or innovations that enhance data privacy and compliance when using LLM chatbots in BI?
All of that is gone under the recipe model, whereby you don’t share the data with the LLM.
Another way is to completely change the whole story and to take the LLM offline and run it yourself privately, off the grid, in an environment that you control as the customer. No one else can see it. The questions come, the questions go, and there is no such issue whatsoever.
We allow our customers to talk to offline LLMs. We have a relationship now with IBM’s Watsonx solution, which offers that offline LLM framework. And in that regard, it provides maybe the most hermetically sealed approach to doing things, whereby no one can see the questions coming or going. And, therefore, even that last 5% issue, where a user might inadvertently share a data point in the question itself, even that problem is taken off the table.
If you are running off the grid, if you’re running your own sandbox, it doesn’t mean it has to be running locally. It could still be running in the cloud, but no one else has access to your LLM instance. You really have the greatest level of protection with the whole thing.
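In topology terms the change is small but decisive: the chat client points at an endpoint inside the organization’s own network boundary instead of a public API. The sketch below assumes a generic self-hosted text-generation endpoint; the URL, payload shape, and certificate path are illustrative placeholders, not Watsonx’s actual interface.

```python
import requests

# Illustrative only: the host name, payload shape, and response format are
# assumptions, not a real vendor API. The point is the topology: the model is
# served inside the company's own network boundary, so prompts never cross
# the public internet to a third-party vendor.
PRIVATE_LLM_URL = "https://llm.internal.example.com/v1/generate"  # self-hosted

def ask_private_llm(prompt: str) -> str:
    resp = requests.post(
        PRIVATE_LLM_URL,
        json={"prompt": prompt, "max_tokens": 512},
        timeout=30,
        verify="/etc/ssl/internal-ca.pem",  # trust only the internal CA
    )
    resp.raise_for_status()
    return resp.json()["text"]
```

Whether the instance runs on-premises or in a single-tenant cloud environment, the property that matters is the same: no party outside the organization can observe the questions going in or the recipes coming out.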
Role of Data Governance
– How critical is data governance in the secure and compliant deployment of LLM chatbots within BI products?
So if it’s open season and you can do whatever you want with a chatbot, you have a big headache on data governance. In the “fly by the seat of your pants” approach, people send data, even an Excel spreadsheet, to the LLM; the LLM reads the dataset, does something with it, and comes back with a response. On a governance track, this is a huge headache, because who knows what dataset you’re sending in? Who knows what the LLM will do with that dataset in its response? And, therefore, you could get a very, very garbled misunderstanding by the user, based on the response of the LLM.
You can see immediately how that problem gets completely vacated through the strategy I shared, whereby the LLM is only in charge of generating the recipe. All the analysis, all the work, all the querying on the data is done by the robot.
Because Pyramid is doing the analysis and the mathematical operations, those problems get squashed completely. Better than that, because Pyramid also has a full-blown data security structure built into the platform, it doesn’t matter what question the user asks: Pyramid itself generates the query on behalf of that given user, within the confines of their data access and their functional access. This is all filtered and limited by the overarching security applied to that user in the platform. So in that regard, again, governance is handled far better by a full-blown solution than it would be by an open-ended chatbot where the user can upload their own data.
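Here is a hedged sketch of what that governance step might look like: whatever query plan the LLM proposes, the engine appends the asking user’s mandatory row-level security filters before execution. The user table, filter format, and function names are hypothetical illustrations, not Pyramid’s implementation.

```python
# Hypothetical sketch of governance applied after the LLM step: whatever plan
# the LLM proposes, the engine appends the asking user's mandatory row-level
# security filters before the query ever runs.
USER_SECURITY = {
    "alice": [{"column": "region", "op": "in", "values": ["EMEA"]}],
    "bob":   [{"column": "region", "op": "in", "values": ["APAC"]}],
}

def apply_governance(plan: dict, user: str) -> dict:
    """Constrain an LLM-generated query plan to the user's data access."""
    governed = dict(plan)
    governed["filters"] = plan.get("filters", []) + USER_SECURITY.get(user, [])
    return governed

plan = {"measures": ["revenue"], "dimensions": ["region"], "filters": []}
# Alice's version of the same question can only ever touch EMEA rows:
governed = apply_governance(plan, "alice")
```

Because the filters are enforced by the engine after the LLM step, a cleverly worded question cannot talk the model into returning rows the user was never entitled to see.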
Employee Training and Awareness
– How can companies ensure their employees are well-trained and aware of the risks and best practices for using LLM chatbots in BI tools?
This is a perennial problem with any kind of advanced technology. It’s always a challenge to get people trained and aware. It doesn’t matter how much you train people, there’s always a gap, and it’s always a growing gap. And in fact, it’s a huge problem because people hate to read help resources. People hate to go to training courses. On the other hand, you want them to use the cool new technologies, especially when those technologies can do some very clever things.
So the first thing is to actually train employees in how to ask good questions, and train them to question the result set, because the LLM is still an interpretive layer, and you never know what you’re going to get. But the beauty of the new LLM universe that we live in is that you don’t need to teach them how to ask questions structurally. And that’s to the credit of the LLMs and what I call their interpretive capabilities.
Beyond that, employees need very little training, because for the most part, they don’t need to be taught how to ask the question or use the tool in a specific way. I think the only part that is left, then, is teaching users how to look at the results that come back from the LLM, and to look at them with a degree of skepticism, because it’s interpretive in the end, and people need to know that it’s not necessarily the be-all and end-all response.
Case Studies or Examples
– Can you share any success stories or examples where companies have effectively integrated LLM chatbots into their BI systems while maintaining data privacy and compliance?
We have customers who’ve integrated Pyramid in an embedded scenario, where they take Pyramid’s functionality and drop it into their own third-party applications. The LLM is then baked into that solution too. It’s very, very elegant, because embedded is probably the highest use-case scenario for a chatbot or natural language querying: this is where you have your least technical, least trained, least tethered users logging into a third-party application and wanting to run analysis.
Specific names and companies who’ve implemented this, I cannot share with you, but we have seen this being deployed at the moment in retail for suppliers and distributors – that’s one of the biggest use cases. We’re beginning to see it in finance, in different banking frameworks, where people are asking questions around investments. We’re seeing those use cases pop up a lot. And insurance is going to be a growing space.
Emerging Trends
– What emerging trends do you see in the use of LLM chatbots within the BI sector, particularly concerning data privacy and compliance?
The next big trend is around users being able to ask really specific questions about very granular data points in a dataset. This is the next big thing. And there are inherent issues with getting that to work in a way that is scalable, effective, and performant. It’s very difficult to make that work. And that’s the next trend in the LLM chatbot space.
And that, too, raises questions around data privacy and compliance. I think part of it is solved by the governance framework that we’ve put in place, where you can ask the question, but if you don’t have access to the data, you’re simply not getting a response around it. That’s where tools like Pyramid provide the data security. But, again, if this becomes a broader problem on different tangents to this same headache, then you’re going to see more and more customers demanding private offline LLMs that are not running through the public domain, and certainly not through third-party vendors where they have no control over the use of that stuff.
Regulatory Developments
– How do you anticipate the regulatory landscape will evolve in response to the increasing use of AI and LLM chatbots in business applications?
I don’t see it happening at all, actually. I think there’s a bigger concern around AI in general. Is it biased? Is it giving responses that could incite violence? Things like that. Things that are more generic around generative AI functionality – is the AI model “appropriate”? I’m going to use that word very broadly. Because I think there’s a bigger push on that side from the regulatory aspect of it.
In terms of the business aspect, I don’t think there’s an issue, because the questions you’re asking are super specific. It’s on business data, and the response is business-centric. I think you’re going to see far less of an issue there. There will be a spillover from one to the other, but no one’s really concerned about bias, for example, in these situations, because we’re going to run a query against your data and give you the answer that your data represents.
So I think these two things are being conflated together. I think the regulation landscape is more about the AI model and how it was generated. And it’s not related to the business application side, especially if the business application is about querying business data on specific questions related to the business. That’s my take on it for now. We’ll see what happens.
Executive Advice
– What advice would you offer to other executives considering the integration of LLM chatbots into their BI products, particularly in terms of data privacy and compliance?
A chatbot is only as good as the engine that runs the querying. So going back to my cake scenario: anybody can keep a pantry of ingredients, anyone can describe the ingredients and write what I call the prompts to the chatbot. That’s not so difficult. Getting the chatbot to respond with a good recipe is not easy, but it’s achievable. And so, really, the real magic is in which robot is going to take the ingredients, build the query for you, build an intelligent response to the user’s question, and bring it back as data analysis.
And so, if you really think about it, the majority of the problem, beyond the interpretive layer, which is still the LLM’s domain and where its tremendous magic lives, is in the query engine. And that’s actually where all the focus should be, ultimately – coming up with more and more sophisticated recipes, but then having a query engine that can work out what to do with them. And if the query engine is part of a very smart, broad platform that includes governance and security layers, then your data security issues are heavily mitigated. If the query engine can only respond in the context of the security associated with me as the user, I’m really going to mitigate that problem dramatically. And that’s effectively how to solve it.