Consider these statistics: 90 percent of the world’s data was created in the last two years and 85 percent of records keepers use only 20 percent of the data they have. Where is all that data coming from and what are we doing, and apparently not doing, with it?
Let’s consider the first question. Data today comes in three forms.
- Structured data is information in its most elemental form. It is discrete pieces of rather self-explanatory values and descriptions such as names, dates, blood type, ethnicity, marital status, sex, income and so forth.
- Unstructured data often contains the same sort of information, but much more training is required to discern it. Examples include languages, pictures, writing, movies, recordings, and X-rays, to name a few. Understanding some of these involves skills acquired very early in life, while entire careers are devoted to others. Either way, a great deal of learning is involved.
- Semi-structured data can be thought of as an amalgam of both. Examples include invoices, patient records, loan applications, and insurance claims, where the same discrete data points are compiled in myriad formats, often with unstructured components that require some skill and experience to interpret.
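To make the three forms concrete, here is a small Python sketch. Every field name and sample value in it is invented for illustration, and the helper `discrete_fields` is hypothetical; it only shows how the structured part of a semi-structured record can be separated from its free-text part.

```python
# Structured: discrete, self-explanatory fields (all values are invented).
structured_record = {
    "name": "Jane Doe",
    "date_of_birth": "1980-04-12",
    "blood_type": "O+",
    "marital_status": "married",
}

# Unstructured: free text whose meaning takes interpretation to extract.
unstructured_note = (
    "Patient reports mild headaches since last week, "
    "worse in the mornings. Advised rest and fluids."
)

# Semi-structured: the same kind of discrete fields plus a free-text part,
# and the layout can differ from one document to the next.
semi_structured_form = {
    "patient": {"name": "Jane Doe", "blood_type": "O+"},
    "visit_date": "2019-03-02",
    "notes": unstructured_note,
}


def discrete_fields(form):
    """Return the leaf fields of a nested form, skipping free text under 'notes'."""
    fields = {}
    for key, value in form.items():
        if key == "notes":
            continue
        if isinstance(value, dict):
            fields.update(discrete_fields(value))
        else:
            fields[key] = value
    return fields


print(discrete_fields(semi_structured_form))
```

The structured values come out ready for analysis, while the note still needs a person (or a trained model) to interpret it.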
Considering what all this different data is used for may help clarify these rather unstructured definitions. Structured data is the go-to resource for the vast majority of fact-finding, analysis, and process improvement. It has been around since long before computers in all manner of different graphics, company ledgers, and standardized forms: think IRS and Census, roll calls and elections.
Unstructured data represents the overwhelming majority of our present-day data. Some folks estimate it will comprise 80 percent or more of all data by 2025. Quite the opposite of structured data, its amorphous nature makes it more difficult to parse for actionable information.
Manhattan’s 33rd and 3rd. Unstructured data is a little hard to make sense of.
The sheer volume of the unstructured data in existence today and the accelerating rate at which it’s being generated have sparked intense focus from software engineers eyeing equally enormous potential from putting it to use: think self-driving cars, early cancer detection, generating marketing content, news writing, and even professional sports.
But what about right now, and those 85 percent making full use of only 20 percent of their data? If you are a records manager in any enterprise you are sitting on a gold mine that is your semi-structured data. The reason so little of this data is utilized lies within its name.
Take, for example, the myriad different kinds of invoices manufacturers process each month. Each contains much of the same information scattered about in different locations within invoices of every shape, size, and configuration.
But rather than just using those invoices to pay bills, imagine instantly analyzing every fact and figure contained in all of them across all kinds of variables, like vendor, date, price, quality control, and delays. Such analysis can yield priceless operational insight, but for those “85 percent” it comes only at the expense of the time required to manually compile those figures from all these different documents.
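As a rough sketch of what that analysis could look like once the figures are pulled out, the Python below maps invoices with different layouts onto one common set of fields and then totals spending by vendor. The invoices, the field labels, and the helper names (`FIELD_ALIASES`, `normalize`, `total_by_vendor`) are all assumptions made up for this example; real invoices would first need their values extracted from the documents themselves.

```python
from collections import defaultdict

# Hypothetical invoices: same facts, different field names per vendor layout.
raw_invoices = [
    {"Vendor": "Acme Steel", "Invoice Date": "2019-01-15", "Total": "1200.50"},
    {"supplier": "Acme Steel", "date": "2019-02-15", "amount_due": "800.00"},
    {"supplier": "Bolt Bros", "date": "2019-02-20", "amount_due": "310.25"},
]

# Map each layout's labels onto one common set of field names.
FIELD_ALIASES = {
    "vendor": ("Vendor", "supplier"),
    "date": ("Invoice Date", "date"),
    "amount": ("Total", "amount_due"),
}


def normalize(invoice):
    """Rewrite one invoice using the common field names."""
    record = {}
    for field, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            if alias in invoice:
                record[field] = invoice[alias]
                break
    record["amount"] = float(record["amount"])
    return record


def total_by_vendor(invoices):
    """Sum invoice amounts per vendor across all layouts."""
    totals = defaultdict(float)
    for invoice in invoices:
        record = normalize(invoice)
        totals[record["vendor"]] += record["amount"]
    return dict(totals)


print(total_by_vendor(raw_invoices))
```

The same normalized records could just as easily be grouped by date or price; the hard part in practice is the extraction step, not the arithmetic.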
Semi-structured data: Doctor’s patient visit form.
Therein lies the rub records managers must resolve: How to turn all that unused structured data within our semi-structured data—those facts and figures—into actionable intelligence; how to build a system to automatically ferret out the information needed to conduct all these analyses from all manner of different documents.
This is where artificial intelligence and machine learning come into play. Just as CIOs the world over are looking for AI systems to make greater use of the mountains of accumulating unstructured data, Laserfiche engineers have designed bots and process automation systems to extract more actionable intelligence from our semi-structured data.
Those systems, along with our other more established products like Quick Fields, give records managers the tools they need to start mining that gold from semi-structured data. It starts with designating metadata fields to create indexing systems that lend more structure to your semi-structured data. The more metadata you create, the more detailed your index and the easier it is to find and compile actionable information buried within your semi-structured database.
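The idea of indexing by metadata can be illustrated with a few lines of Python. This is a generic sketch, not how Laserfiche or Quick Fields work internally: the documents, the metadata fields, and the functions `build_index` and `search` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical documents, each tagged with metadata fields.
documents = {
    "doc-001": {"type": "invoice", "vendor": "Acme Steel", "year": "2019"},
    "doc-002": {"type": "invoice", "vendor": "Bolt Bros", "year": "2019"},
    "doc-003": {"type": "claim", "vendor": "Acme Steel", "year": "2018"},
}


def build_index(docs):
    """Map every (field, value) pair to the set of documents carrying it."""
    index = defaultdict(set)
    for doc_id, metadata in docs.items():
        for field, value in metadata.items():
            index[(field, value)].add(doc_id)
    return index


def search(index, **criteria):
    """Return the IDs of documents matching every field=value criterion."""
    matches = [index.get((field, value), set()) for field, value in criteria.items()]
    return set.intersection(*matches) if matches else set()


index = build_index(documents)
print(sorted(search(index, type="invoice", vendor="Acme Steel")))
```

Each additional metadata field adds another way to narrow a search, which is why a richer set of fields makes buried information easier to find.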
Think of indexing your semi-structured data as the counterpart of the learning process needed to understand unstructured data, the process that is driving the AI industry. The difference is that the tools for indexing exist right now. For example, a Southwest high-rise construction company uses Laserfiche to automate processing and speed the indexing of job site safety incident reporting forms. A Midwest pizza chain is using the same tools to search out inefficiencies and cost savings in the AR and AP records of each of its 200 franchises.
While the IT world is abuzz with the promise of AI and machine learning turning our growing mountains of unstructured data into systems that can drive our cars and detect cancer long before it’s lethal, records managers can start turning our 80 percent of unused data into process improvement and enterprise innovation. Let’s start putting all this data to work.