Big Data and Technology Today: Are We Ready for AI Yet?
The State of Data
Successful AI requires two basic components: 1) massive amounts of data and 2) data science, i.e., algorithms that can extract knowledge from the data. So where do we stand today with these two requirements?
Requirement #1: We Have Massive Data
In contrast to internal or proprietary company data, Open Data initiatives are organized worldwide by private, public, and government entities to make specific datasets available for anyone to use, reuse, or redistribute. The aim of an Open Data project is for developers to build new and valuable products with the data; those products create demand for more data, producing a feedback loop that improves data quantity, quality, and use. Essentially, if you can say it, do it, photograph it, or measure it, someone is likely collecting data about it. There is an enormous amount of knowledge to be gleaned from all the data in the world, so it might be surprising to hear that, on average, only 1% of a company’s data is accessible for use.
Examples of Data Categories
Data Types
Type | Description | Examples |
Structured | Quantitative, tabular data with clearly defined columns and rows. | names, dates, geolocation, credit card numbers, stock information |
Unstructured | Raw data in any format, with no predefined schema. | images, videos, .pdf documents, social media comments, transcriptions |
Semi-structured | Data that is not strictly tabular but still follows some organizing structure, such as tags or key-value pairs. | email, HTML, NoSQL databases, CSV, XML, JSON documents |
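To make the three data types above more concrete, here is a minimal Python sketch using only the standard library. The table name, field names, and sample values are hypothetical and chosen purely for illustration; the point is how much work it takes to get at the content in each case, not any particular tool.

```python
import json
import re
import sqlite3

# Structured: a table with clearly defined columns; every row has the same fields.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, signup_date TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ada", "2021-03-01"), ("Grace", "2021-04-15")],  # hypothetical rows
)
rows = conn.execute(
    "SELECT name, signup_date FROM customers ORDER BY name"
).fetchall()
conn.close()

# Semi-structured: self-describing keys, but fields can be nested or missing.
raw_message = '{"from": "ada@example.com", "subject": "Hello", "tags": ["intro"]}'
message = json.loads(raw_message)
attachments = message.get("attachments", [])  # optional field, absent here

# Unstructured: free text with no schema; any structure has to be extracted.
comment = "Loved the product! Ordered on 2021-05-02, arrived two days later."
dates_found = re.findall(r"\d{4}-\d{2}-\d{2}", comment)

print(rows)
print(message["subject"], attachments)
print(dates_found)
```

The structured query can rely on column names, the semi-structured record needs a default for a field that may not exist, and the unstructured comment yields a date only because a pattern was written to look for one.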
Data collection methods
Batch | Streaming | Real time |
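The difference between batch and streaming collection can be sketched in a few lines of Python. This is an illustrative assumption about how the two methods are typically contrasted, not a description of any specific product: the function names and sensor values are hypothetical, and real-time collection is not modeled separately here.

```python
from typing import Iterable, Iterator, List


def batch_average(readings: List[float]) -> float:
    """Batch: all records are collected first, then processed in one pass."""
    return sum(readings) / len(readings)


def streaming_average(readings: Iterable[float]) -> Iterator[float]:
    """Streaming: each record updates a running result as it arrives."""
    count = 0
    total = 0.0
    for value in readings:
        count += 1
        total += value
        yield total / count


sensor_readings = [20.0, 22.0, 21.0, 25.0]  # hypothetical sample values

batch_result = batch_average(sensor_readings)
running_results = list(streaming_average(iter(sensor_readings)))

print(batch_result)
print(running_results)
```

Both approaches end at the same answer once all the data has been seen; the streaming version simply has a usable intermediate result after every record, at the cost of keeping running state.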
Data collection sources
Requirement #2: We Have Good Data Science
In the 1960s, mathematicians formally recognized the significant value of accessing, understanding, and extracting meaning from data. The field of data science is rooted in this realization, and it began with data analytics. Over the years, analytics has grown and evolved, and AI was born from this evolution. The critical difference between data analytics and AI is that machine learning algorithms can test, iterate, and learn autonomously.
Evolution of Data Analytics:
The data science industry is brimming with thousands of companies that offer enterprise customers AI capabilities, ranging from niche open-source technologies to one-stop AI shops. As a result, AI is the fastest-growing software market in history and will facilitate more than $13 trillion annually, according to McKinsey’s “The State of AI in 2020” report.
These logos are a random selection of only a fraction of the companies offering products and services in the data and AI industry. For a truly overwhelming view of hundreds more, broken down by category and accompanied by analysis, see Matt Turck’s yearly data and AI ecosystem assessment in his “State of the Union” Report.