Does Machine Learning Require a Data Lake?

As the marketing machine cranks up for Industry 4.0 and the industrial internet of things (IIoT), leaders in many organizations face a barrage of buzzwords and the high-cost solutions that come with them. For example, I was recently asked whether an organization needs a data lake in order to do machine learning. Machine learning offers many benefits, so let’s explain both concepts and how they relate.

By: Tim White

The term data lake describes a place, usually off premises, used to store raw, uncleansed, or unorganized data; it is a fundamental concept in data management. In contrast, a data warehouse is a repository of structured data that has already been processed and organized. With the continued rise of big data, the use of data lakes has become very common. Although a data lake is often thought of as synonymous with cloud storage, it can also reside on premises in a company’s own data center. What makes a data lake “unorganized” is that data from multiple sources is stored in the same “place”, which makes it easier for end users to access.

"What makes data lakes unorganized is the fact that multiple data sources are all stored in the same “place”, allowing for easier access for the end users."

At a high level, machine learning is a subset of artificial intelligence: a technique for “teaching” a computer to make decisions directly from data rather than from a predetermined calculation or algorithm. It is worth considering whenever you face a complex problem involving large amounts of data but no existing formula or algorithm. The key word here is “data”, and in general the more you have, the better the performance. This is why machine learning is so often linked with an assumed need for a data lake.
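
As a minimal sketch of that idea, the example below fits a model to purely synthetic, sensor-style readings for which no closed-form formula is handed to the computer; the data, the column meanings, and the choice of a random forest are illustrative assumptions, not a prescribed method.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

# Pretend these are historical sensor readings (inputs) and a measured outcome (target)
# for which no closed-form equation is known to the modeler.
X = rng.uniform(0, 10, size=(500, 3))                        # e.g. temperature, vibration, load
y = np.sin(X[:, 0]) * X[:, 1] + 0.1 * rng.normal(size=500)   # hidden relationship plus noise

# The model infers the pattern from examples rather than from a programmed formula.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

print(model.predict([[2.0, 5.0, 1.0]]))                      # decision made directly from data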

You can get real benefit from machine learning today without a data lake. If analytics are performed on a single data set, you probably do not need a data lake for machine learning. A single computer or server with the right resources can store the data and run the algorithms needed to return the information requested. A good example that many organizations could quickly benefit from is using control system data collected by a data historian to estimate the remaining useful life of an asset; we train and test newly developed models with exactly this capability on high-powered desktops (a sketch of such a workflow follows below). There is far more opportunity in the data you are already collecting than you may realize. Starting with a simpler use case like this also helps build interest and enthusiasm in the organization for supporting larger efforts.
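
To make that concrete, here is a rough single-machine sketch of such a remaining-useful-life workflow using scikit-learn. The file name, sensor columns, and label are hypothetical assumptions about what a data historian export might contain, not a specific product interface.

import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical historian export: one row per asset per timestamp.
df = pd.read_csv("historian_export.csv")

feature_cols = ["vibration", "bearing_temp", "motor_current", "run_hours"]  # assumed tag names
X = df[feature_cols]
y = df["remaining_useful_life_hours"]  # assumed label, e.g. derived from failure records

# Hold out a test set to check how well the model generalizes.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingRegressor(random_state=0)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print(f"Mean absolute error: {mean_absolute_error(y_test, predictions):.1f} hours")

Everything in this sketch fits comfortably in memory on a capable desktop, which is the point: the value comes from the data you already collect, not from where it is stored.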

"A single computer or server, with the right resources, would be able to store data and perform the algorithms necessary to return the information requested."

Often, information becomes much more enlightening when data is correlated from multiple applications across an enterprise, and possibly even from the internet. With multiple data platforms and formats in play, a data lake may be needed to store the information. Remember, though, that a “data lake” does not have to mean a large hosted cloud or data center solution unless you are pursuing a large-scale, enterprise-wide initiative.

“Data lake” is ultimately just a term for an approach to data management. Most IT departments already have policies for how data is consumed, stored, and backed up across the organization. Meeting with them to understand those policies, communicate your business requirements, and work out a solution collaboratively will go a long way toward long-term success.

So, do you need a data lake in order to do machine learning? Not always. As described above, you can get started with some very beneficial use cases, such as analyzing equipment health, without a data lake. What is more, starting small is a great way for the organization to learn valuable lessons and build the enthusiasm and support that will make larger implementations, such as creating a data lake, much more successful.

About the Author

Tim White, Senior Manager

In his role as Senior Manager at T. A. Cook, Tim White focuses on delivering services related to Digital Asset Performance Management. He previously worked in the process industry as Global Director of Asset Management, responsible for 83 sites worldwide. He draws on this hands-on experience to support numerous clients with their asset management and maintenance strategies.
