If you’ve spent any time in the data science and/or IT space in the last few years, you’ve been inundated with the term “big data.” Heck, if you’ve spent any time even remotely associated with data (e.g., marketing, operations, or finance), you’ve probably been overexposed to the term. You’ve also most likely heard quite a few things about big data that have kept your organization from incorporating all it has to offer into your processes and culture.
All of those truths that you’ve heard about big data? They aren’t (usually) truths but myths that somehow keep making the rounds within organizations and around the internet. A few of the key myths that I hear often are:
- Big data is only for big companies
- Big budget, big teams, big platforms
- More data is better
- Machines are better than humans
Let’s take a look at each one of these four myths and see why they still exist – and why you can safely ignore them.
Myth 1: Big data is only for big companies
This myth is similar to Myth #2 (big budget, big teams, big platforms), but the two deserve to be discussed (and busted) separately. Big data initiatives are just as valid for a small company as they are for the world’s largest companies.
As a small organization, your data may not be as large as that of bigger companies, but the approach to analyzing and using it remains the same. Whether your organization has 20, 200, or 2,000 people, you can use the same kinds of processes and platforms that Amazon uses to analyze your data to cut expenses, boost sales, and create new, innovative avenues for growth.
Myth 2: Big budget, big teams, big platforms
As I mentioned above, this myth is usually found mixed in with Myth #1. Over the years, I’ve heard people say they can’t “do big data” because they “just aren’t one of those large companies with huge budgets.” The reality is that it doesn’t take much to get started in this space. I know a company with fewer than 10 people that brings in revenue in the low millions, and it has one of the most sophisticated data science pipelines and processes I’ve ever seen. It is “doing” big data just fine without large teams, large budgets, or large platforms.
The ability to spin up cloud-based systems today makes this easy and cost-effective for organizations of almost any size. This particular team did have to spend a bit of money to get started and to re-skill some of its employees to take full advantage of its data, but the investment wasn’t large and has paid for itself many times over since.
Myth 3: More data is better
When people first start looking at the amount of data they’ve collected, it’s not unusual to think, “Hey! We should use *all* the data!” While I applaud the enthusiasm, just adding more data isn’t always the answer (rarely is it the answer at all).
It’s easy to think that adding more variables to a forecasting project will make it more accurate, but that isn’t always the case. Sometimes higher accuracy on historical data leads to a less trustworthy model, because the model may be over-optimized (overfit) to one specific data set, which can lead to catastrophe if the features of the data change over time.
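To make the over-optimization point concrete, here is a minimal Python sketch using only the standard library. It is illustrative only and not drawn from any real project: the synthetic data (one genuine signal, y = 2x plus noise, padded with purely random “junk” columns) and the helper names (`make_data`, `fit_ols`, `compare`) are hypothetical. It fits ordinary least squares models on the same training rows with progressively more junk columns, then scores each model on held-out test data.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def mse(rows, y, w):
    """Mean squared error of the linear model w on (rows, y)."""
    preds = [sum(wi * xi for wi, xi in zip(w, r)) for r in rows]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)


def make_data(n, n_noise, rng):
    """Hypothetical data: y = 2x + noise, plus n_noise irrelevant random columns."""
    rows, y = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        junk = [rng.gauss(0, 1) for _ in range(n_noise)]
        rows.append([1.0, x] + junk)  # intercept, real signal, junk columns
        y.append(2.0 * x + rng.gauss(0, 1))
    return rows, y


def compare(n_noise_max=20, n_train=30, n_test=1000, seed=0):
    """Return {k: (train MSE, test MSE)} for models using the first k junk columns."""
    rng = random.Random(seed)
    train_rows, train_y = make_data(n_train, n_noise_max, rng)
    test_rows, test_y = make_data(n_test, n_noise_max, rng)
    results = {}
    for k in range(0, n_noise_max + 1, 5):
        cols = 2 + k  # nested models: each adds junk columns to the previous one
        tr = [r[:cols] for r in train_rows]
        te = [r[:cols] for r in test_rows]
        w = fit_ols(tr, train_y)
        results[k] = (mse(tr, train_y, w), mse(te, test_y, w))
    return results


if __name__ == "__main__":
    for k, (train_err, test_err) in compare().items():
        print(f"{k:2d} junk columns: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

In a least-squares fit, adding columns can never raise the training error, so the extra variables always appear to help. The held-out test error, which is what a real forecast faces, typically rises as the junk columns pile up, because the model starts fitting noise in the training set. The exact numbers depend on the random seed.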
More data isn’t always better, but good, clean data is. Rather than continuously looking for more data to add to your systems and processes, start looking for ways to make your existing data processes better. More data can mean more risk, especially if you aren’t constantly reviewing your models, your data governance, and your data processes. In most cases, you’ll find more value from investing in good data management than from investing in more data.
Myth 4: Machines are better than humans
One of the challenges I’ve been seeing lately is that many people think they can simply apply new approaches like machine learning and artificial intelligence to their data and things will automatically get better. Some people wrongly assume that humans can only do so much and machines can do so much more. This is so far from the truth that it’s dangerous. Machines can only do what a human tells them to do. While there is some research out there trying to develop thinking machines, there’s nothing like that available to organizations today.
A machine learning model will only be as good as the data scientist(s) who built it and the data set(s) used to create it. There will always be a need for humans in the equation, at least for the foreseeable future.
The urge for more
When dealing with most things in life, people are usually pretty good at determining what is plausible by using their common sense and experience to gauge whether something they hear or read is correct. This approach works well until it doesn’t. My experience has shown me that it doesn’t work that well when it comes to big data, AI, and machine learning, where most people’s intuition (e.g., “more data must be better”) is wrong. The company trying to sell a multimillion-dollar big data platform for large organizations will always say that you need big budgets, big teams, and big platforms for big data. The company selling machine learning and AI consulting services will always try to sell you on “more” because that’s how it makes money and stays in business.
Don’t get me wrong: making money isn’t a bad thing, and I strongly advocate that companies hire consultants to do things they don’t have the skills to do. But the reality is that you don’t need a lot of data, a huge budget, or large platforms to find value in your data. All you really need to get started is curiosity, some analytical skills, and some good, clean data.