The New Era of Big Data

June 14, 2023 | Russ Kennedy

We need to reset the way we think about data.

Until recently, enterprise data was siloed within massive, expensive facilities and storage arrays. Data was stuck. Today, the data center is no longer the center of data. The cloud and edge now offer more flexible and viable alternatives for storing data. Data can be made available anywhere at any time to any authorized person, machine or application. In a way, data has become its own entity, and we are only beginning to explore what we can do with it.

Given that we are roughly a dozen years into the era of big data, a post about the coming boom in analytics and intelligence would seem outdated. The field of data analytics has spawned some of the most successful technology companies of the past decade—innovators that help enterprises uncover valuable insights leading to new business and customer initiatives and transforming data into dollars.

As I see it, though, we have barely begun to tap into the potential of data analytics because we have been only marginally focused on the largest segment of enterprise data.

Data comes in two basic forms: structured and unstructured. The leading analytics tools were designed and optimized for structured data, which resides in a standardized format in a traditional or cloud warehouse such as a SQL database. Unstructured data like documents, images, videos, and advanced architectural, engineering and manufacturing models are not so easily crammed into tables.

Files and other unstructured data sets are estimated to make up 80% of enterprise data, yet this data has historically been anchored to hardware. Until recently, big data analytics solutions have forced conformity of unstructured data to fit into a model that facilitates analytics rather than leveraging the variety and flexibility of the data in whatever format makes sense.

These huge stores of unstructured data can now be made available to a wide range of new analytics services. Part of what makes this exciting is that these services can work with the variability of the data. A large enterprise will have many different types of files with different sources and origins. The question is how to start extracting insights and intelligence from all of this data as it exists and delivering value for your business. At a high level, this involves four steps.

1. Understand. Identify a solution that helps you understand what sort of data you have on hand. Unstructured data comes in many forms and can consist of text, images, audio, video, or incredibly complex 3D models of buildings, consumer devices or engines, to name a few.

2. Organize. Find an automated tool or service that tags this data so it can be indexed and organized according to its type, content or format. Basically, you want to make this data referenceable so you can make it easier for people or applications to leverage.

3. Liberate. If some portion of this newly organized and labeled data resides on traditional storage infrastructure or some other medium that limits its accessibility, you need to migrate that data set to the cloud. For data to provide the value and insights you are seeking, it needs to be free and securely accessible rather than tethered to a data center or outdated storage array.

4. Capitalize. Extract value from this newly tagged, accessible, discoverable data. The leading clouds offer machine learning (ML), artificial intelligence (AI) and deep learning (DL) services focused on image recognition, pattern matching, content search, compliance discovery and many other functions.

The tools on the market today are impressive and evolving rapidly. Rather than endorse a specific set of solutions, I’ll share how different industries are using this new approach to deliver real value to their business.
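
To make the first two steps concrete before turning to those examples, here is a minimal, vendor-neutral Python sketch of "Understand" and "Organize." It only guesses a file's type from its extension and groups paths under a coarse tag. The function name `build_index` and the tag scheme are assumptions made for illustration, and a real tagging service would inspect file contents rather than names.

```python
import mimetypes
from collections import defaultdict
from pathlib import Path


def build_index(root):
    """Walk `root` and group relative file paths by a coarse type tag.

    Hypothetical helper for illustration only: the tag is the top-level
    MIME type guessed from the file extension, or "unknown".
    """
    root = Path(root)
    index = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Guess the MIME type from the file name alone (no content inspection).
        mime, _ = mimetypes.guess_type(path.name)
        # Keep only the top-level type, e.g. "image" from "image/png".
        tag = mime.split("/")[0] if mime else "unknown"
        index[tag].append(path.relative_to(root).as_posix())
    return dict(index)
```

Calling `build_index("/some/share")` returns a dictionary such as `{"image": [...], "text": [...]}`, which is the kind of referenceable inventory the first two steps aim for. Extension-based guessing is a deliberate simplification that keeps the sketch free of external dependencies.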

Pipeline operator TC Energy, for example, has to maintain extensive documentation relating to safety, maintenance and compliance. At one point, it had 2.2 million pages of text and diagrams that needed to be stored, labeled and rendered discoverable. In other words, these documents needed to be easy to find.

This sort of work, which probably would have required basements full of filing cabinets in the past, can now be digitized and performed in the cloud. TC Energy leveraged the image processing ML technology Amazon Rekognition from Amazon Web Services (AWS) to classify and label the digitized records and then deployed the AI service Textract to transform the handwritten letters into machine-readable words.
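
As a rough illustration of that two-stage flow, the Python sketch below labels a scanned page and then pulls out its text lines. It is not TC Energy's actual pipeline. The helper names `label_page` and `extract_lines`, the bucket and key parameters, and the confidence threshold are all assumptions. The client objects are passed in and are expected to behave like the boto3 Rekognition and Textract clients, whose `detect_labels` and `detect_document_text` calls are used here.

```python
def label_page(rekognition, bucket, key, min_confidence=80.0):
    """Return label names for an image stored in S3 (hypothetical helper)."""
    response = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=10,
        MinConfidence=min_confidence,
    )
    return [label["Name"] for label in response["Labels"]]


def extract_lines(textract, bucket, key):
    """Return the text lines detected in a document stored in S3 (hypothetical helper)."""
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # Textract returns PAGE, LINE and WORD blocks; keep only whole lines.
    return [
        block["Text"]
        for block in response["Blocks"]
        if block["BlockType"] == "LINE"
    ]
```

With AWS credentials configured, the clients would typically be created with `boto3.client("rekognition")` and `boto3.client("textract")`. Passing them in as arguments keeps the sketch easy to try against stand-in objects before any cloud account is involved.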

We worked with architecture leader Perkins+Will to help it leverage its cloud footprint in order to extract value from a treasure trove of historical images. When a design team begins a new project, it can search more than 500,000 drawings and photos. It can quickly uncover similar projects, or elements within projects, to find inspiration for its current work, or use these past examples as part of its pitch to clients, demonstrating successfully completed work in the same vein. What I find amazing is how the AI and ML tools from the hyperscale cloud providers do this work automatically.

The healthcare industry is overloaded with unstructured data in the form of emails, PDFs, scanned documents and medical images. AWS has designed a road map for healthcare organizations to put a suite of AI and ML services to work on unstructured data. Once PDFs are uploaded to the cloud, various AWS AI services can extract vital data, transform and categorize it for easy search and make it available to interactive analytics platforms. These tools are so new that organizations are still figuring out how best to use them.

We’re truly at the beginning of a new era of analytics, and there are boundless possibilities now that data is its own entity.

Ultimately, it’s simple: Data has evolved into a globally accessible asset that can be utilized in new and creative ways. The era of big data may be more than a decade old, but this new era of insights and intelligence that data delivers could prove to be even more transformative.
