Why Data Resilience Matters Even More in the AI Age
Nasuni’s Russ Kennedy explains why data resilience has long been a core tenet of enterprise IT, and how AI and ML solutions raise the stakes.
September 18, 2024 | Russ Kennedy
A friend in the healthcare industry recently told me the sort of story that gives CISOs data resilience nightmares. The hospital she works for was hit by a sophisticated cyberattack that cut off access to critical datasets. To restore operations, IT had to recover old data from backup systems, a process that took more than a month.
The hospital’s IT team naturally feared exposing new data to follow-up attacks, so as the recovery process began, the hospital’s staff was forced to revert to an old form of data collection. In the age of AI, they had to go back to paper and pen.
Today’s cyber intrusions are becoming harder to detect and more precise in their strikes. Cybercriminals are using large language models (LLMs) to craft phishing emails, minimizing the grammatical errors that typically raise red flags for discerning readers. Yet enterprises struggle to bring their cybersecurity measures up to the level of their attackers, especially with regard to their AI systems. A recent report from IBM and AWS revealed that less than a quarter of generative AI projects are being secured.
The Importance of Securing Data
As large organizations integrate advanced machine learning and AI models into essential operational workflows, safeguarding these systems will be critical. But there is another AI-related dimension to cybersecurity that the enterprise continues to overlook: enhancing the protection of both the data these tools need to work effectively and the new data they generate. Today, in the age of AI, securing your organization’s data may be more important than ever.
Data resilience has long been a core tenet of enterprise IT. The difference now is that companies are using data to train and feed their AI and ML solutions. LLMs are so capable because they were trained on a vast corpus of data. After training, when AI models are deployed in the wild, data is equally essential.
These advanced systems will not produce valuable insights if they do not have access to secure, cleansed, organized, relevant data. And, importantly, automated workflows that deliver significant business improvements will crater if the underlying data on which those models operate is compromised.
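To make “compromised” concrete, consider one simple safeguard: verifying dataset integrity before an automated workflow consumes it. The following is a minimal, hypothetical Python sketch, not any particular product’s API; the manifest file, directory layout, and function names are all assumptions chosen for illustration. It checks incoming files against known-good SHA-256 digests and stops the pipeline if anything has changed.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_inputs(data_dir: Path, manifest_path: Path) -> list[str]:
    """Return the files whose current hash no longer matches a
    known-good manifest (a JSON map of filename -> digest)."""
    manifest = json.loads(manifest_path.read_text())
    tampered = []
    for name, expected in manifest.items():
        if sha256_of(data_dir / name) != expected:
            tampered.append(name)
    return tampered

# Halt the automated workflow rather than operate on suspect data.
bad = verify_inputs(Path("incoming"), Path("manifest.json"))
if bad:
    raise SystemExit(f"Data integrity check failed for: {bad}")
```

The design choice matters more than the details: a workflow that stops loudly on a failed check is far safer than one that quietly trains or runs models on tampered inputs.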
Manufacturing
Let’s say you have a manufacturing business and you are operating a number of plants that produce complex widgets. At each step of the production and assembly process, you have scanners capturing high-resolution, perhaps even three-dimensional images of the components.
Manufacturing is already a highly automated industry. AI can extend that automation, helping firms increase yield and quality while producing more efficiently at a lower cost. To improve quality and failure analysis, for example, you could take the images captured as a component moves through the production line and use AI to pick out potential defects or flaws.
Both historical and newly generated data will be essential here. You will need a stored collection of images to build and train the model. Then this trained model will need the new data to actually do its work and identify potential problems in components as they are produced.
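As a purely illustrative sketch of that two-phase pattern, here is a toy Python example using scikit-learn, with synthetic arrays standing in for real scanner output. A production system would use a convolutional network trained on labeled, high-resolution scans, but the train-then-infer structure is the same.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# --- Training phase: historical scanner images ---
# Stand-in data: each "image" is flattened to a feature vector.
# In practice these would be archived high-resolution scans
# labeled pass/fail by quality engineers.
rng = np.random.default_rng(0)
historical_images = rng.random((500, 64 * 64))   # 500 archived scans
labels = rng.integers(0, 2, 500)                 # 0 = good, 1 = defective

model = RandomForestClassifier(n_estimators=100)
model.fit(historical_images, labels)

# --- Inference phase: new scans from the live production line ---
new_scans = rng.random((10, 64 * 64))
flagged = model.predict(new_scans)
for i, is_defect in enumerate(flagged):
    if is_defect:
        print(f"Component {i}: possible defect, route to inspection")
```

The split is the point: a poisoned or encrypted image archive breaks the training phase, while an interrupted feed of new scans breaks inference. Either way, the automation stops delivering value.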
This data needs to be fresh, relevant and usable. If a cyberattack interrupts service or encrypts data with a strain of ransomware, the plant’s workflow will be completely disrupted. The newly automated failure analysis process will be suspended, and you will have to decide whether to shut down or risk producing flawed components. Neither choice is a good one.
Media and Marketing
Another industry that has led the way in terms of the adoption of AI tools is media and marketing. My company works closely with a number of leading global firms, and one of our customers, a global advertising giant, introduced a custom LLM for in-house creative work and client use.
Additionally, these firms have long relied on data from focus groups and studies of consumer behavior, and AI is emerging as a way to generate novel analyses and insights. Again, though, the key variable here is data. If secure, quality data is not available to these models, then they will not produce valuable or reliable results.
Prioritizing Data Resilience
The good news is that this is a solvable problem. Although cyber threats are becoming more sophisticated and the potential for disruption grows as organizations rely more heavily on AI solutions, there are security and data resilience tools, built into modern storage platforms, designed to protect against these threats and, more importantly, to accelerate data recovery should such attacks prove successful.
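To show what “accelerate data recovery” can look like in practice, the sketch below uses Amazon S3 object versioning via boto3 as a generic stand-in for the rollback idea. The bucket name, attack timestamp, and file key are hypothetical, and this is not a description of any specific vendor’s recovery mechanism; it simply promotes the newest file version created before the attack window back to “latest.”

```python
import boto3
from datetime import datetime, timezone

s3 = boto3.client("s3")
BUCKET = "example-bucket"          # assumption: versioning is enabled
ATTACK_TIME = datetime(2024, 9, 1, tzinfo=timezone.utc)

def restore_pre_attack(key: str) -> None:
    """Promote the newest object version created before the attack
    back to 'latest', rolling back the encrypted copy."""
    versions = s3.list_object_versions(Bucket=BUCKET, Prefix=key)
    clean = [
        v for v in versions.get("Versions", [])
        if v["Key"] == key and v["LastModified"] < ATTACK_TIME
    ]
    if not clean:
        print(f"No pre-attack version found for {key}")
        return
    newest_clean = max(clean, key=lambda v: v["LastModified"])
    s3.copy_object(
        Bucket=BUCKET,
        Key=key,
        CopySource={"Bucket": BUCKET, "Key": key,
                    "VersionId": newest_clean["VersionId"]},
    )
    print(f"Restored {key} to version {newest_clean['VersionId']}")

restore_pre_attack("datasets/quality-scans/widget-0001.tiff")
```

Recovery at this granularity, file by file to a known-clean point in time, is what turns a month-long restoration ordeal into something measured in hours.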
The problem is that these tools are not necessarily an executive-level priority right now. This is a mistake. Organizations need to prioritize planning around data resilience and rapid recovery strategies now more than ever. Protecting the data that feeds AI models must be a core part of an organization’s AI playbook; otherwise, you risk reverting to paper and pen.