As the marketing machine cranks up for Industry 4.0 and the industrial internet of things (IIoT), leaders in many organizations are faced with a barrage of buzzwords and the associated high-cost solutions. For example, I was asked the other day if an organization needed a data lake in order to have machine learning. Machine learning is a functionality with many benefits, so let’s explain these concepts and how they work.
By: Tim White
The term data lake is used to define a place, usually off premises, that is used to store raw, uncleansed or unorganized data. It is a fundamental concept of data management. In contrast, a data warehouse is a repository of structured data that has already been processed and organized. With the continued rise in big data, the use of data lakes has become very common. While a data lake is normally thought of as being synonymous with cloud storage, it can also reside on premises in a company’s own data center. What makes data lakes unorganized is the fact that multiple data sources are all stored in the same “place”, allowing for easier access for the end users.
Defined at a high level, machine learning is a subset of artificial intelligence. It is a technique used to “teach” a computer to make decisions directly from data without using a predetermined calculation or algorithm. Machine learning should be considered whenever there is a complex problem involving large amounts of data but no existing formula or algorithm to solve it. The key word here is “data”: generally, the more you have, the better the performance. This is why the idea of machine learning is often connected with an assumed need for a data lake.
You can use machine learning to get real benefit now without a data lake. If analytics are being performed on a single data set, then you probably do not need a data lake for machine learning. A single computer or server, with the right resources, would be able to store the data and run the algorithms necessary to return the information requested. A good example that many organizations could quickly benefit from would be using control system data collected by a data historian to determine the remaining useful life of an asset. We can use high-powered desktops to train and test newly developed models with this exact capability. There is far more opportunity in the data you are currently collecting than you realize. Additionally, starting with a simpler use case like this can help to build interest and enthusiasm in the organization for supporting larger efforts.
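To make the single-data-set case concrete, here is a minimal Python sketch of a remaining-useful-life estimate that runs comfortably on one desktop. It is an illustration under stated assumptions, not the method used in any particular product: the vibration readings are synthetic stand-ins for a historian export, the function name and the failure threshold of 8.0 are hypothetical, and a simple straight-line trend fit stands in for whatever model a real project would train.

```python
import numpy as np


def estimate_remaining_useful_life(hours, readings, failure_threshold):
    """Fit a straight-line trend to the readings and extrapolate it
    to the failure threshold (hypothetical helper for illustration)."""
    slope, intercept = np.polyfit(hours, readings, 1)
    if slope <= 0:
        # No upward degradation trend, so no crossing can be predicted.
        return float("inf")
    hours_at_threshold = (failure_threshold - intercept) / slope
    # Remaining life is measured from the most recent sample.
    return max(0.0, hours_at_threshold - hours[-1])


# Synthetic data standing in for a data historian export (assumption):
# one vibration reading every 10 operating hours, drifting slowly upward.
rng = np.random.default_rng(seed=0)
hours = np.arange(0, 1000, 10, dtype=float)
vibration = 2.0 + 0.004 * hours + rng.normal(0.0, 0.05, hours.size)

# The threshold value is an assumed alarm limit, not a real specification.
rul_hours = estimate_remaining_useful_life(hours, vibration, failure_threshold=8.0)
print(f"Estimated remaining useful life: {rul_hours:.0f} hours")
```

Everything here fits in memory on an ordinary machine, which is the point: a single historian data set and a modest model need no data lake. A real asset would likely call for a richer model and validation against known failures.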
Often, information can be much more enlightening when data is correlated from multiple applications across an enterprise and possibly even from the internet. With multiple data platforms and formats, a data lake may be required to store the information. Remember, though, that “data lake” does not necessarily mean a large hosted cloud or data center solution unless we are talking about large-scale, enterprise initiatives.
“Data lake” is only a term that defines an approach to data management. Most IT departments have a policy in place for how data is to be consumed, stored, and backed up for the organization. A meeting with them to understand those policies, communicate your business requirements, and work out a solution collaboratively will go a long way toward realizing long-term success.
So, do you need a data lake in order to have machine learning? Not always. As I described above, you can get started with some very beneficial use cases to analyze your equipment health without a data lake. What is more, starting small is a great way for the organization to learn valuable lessons and build enthusiasm and support that will make larger implementations, such as creating a data lake, much more successful.