Future of Big Data in 2018

The year 2017 was an interesting one in the Big Data world. Though the adoption of Hadoop as the Big Data platform moved only marginally beyond the 50% mark per the Gartner survey report, the adoption focus has gradually shifted from IT-driven to business-driven. This suggests that organizations see real value in investing in Big Data and are moving ahead to realize it.

I have been fortunate to be a part of the Big Data world for the last several years and to observe its journey closely. It’s fantastic to see how a small toy elephant has become one of the most sought-after technologies and has taken several forms. Here is the direction I see the Big Data and Hadoop world taking in 2018.

Think Big Data? Think Cloud!

Companies that haven’t joined the Hadoop bandwagon yet (slightly under 50% in 2017, as per the Gartner survey) will seek Hadoop deployments. However, in all likelihood, these will not be on-premise but on their cloud vendor of choice. Typically, on-premise deployments are preferred only when the cluster size is enormous.

Why do I say so?

“Lowering costs and coping with complexity will be the primary motivating factors for cloud-based Hadoop deployments in 2018.”

We migrated an RDBMS-based IoT application to the cloud in 2014, and going to the cloud was a pretty expensive proposition at the time. When we compare costs for the same cluster configuration now, they are around 30% of what they were! Fascinating, isn’t it? I expect the same trend to continue in 2018.

Getting started with Big Data is quick.

Cloud deployments offer flexibility in gearing up for Hadoop without the time and overheads related to procuring, provisioning, and setting up the infrastructure.

We spoke with several customers in 2017, and a common theme is that they don’t want to wait weeks or months for their infrastructure team to provide the hardware and software. This is especially true of SMEs. For one of our manufacturing clients, we proposed migrating their ERP data to the Google BigQuery platform, and they readily accepted the proposal, thanks to the enormous flexibility and low cost of the platform.

Familiarity with the cloud is increasing.

For a variety of reasons, organizations’ infrastructure teams are now more cloud-aware and understand that Hadoop can be set up on the cloud. This makes a cloud setup particularly attractive to them.

Flexibility

Without the undue pressure of correctly sizing their first Hadoop cluster, organizations can make a beginning with Hadoop, try various use cases, and gradually get to know Hadoop better. Instances can be shut down during non-working hours, which keeps the setup cost-effective.
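To get a rough feel for how much shutting instances down outside working hours can save, here is a back-of-the-envelope sketch. The cluster size, hourly rate, and working schedule are all made-up assumptions for illustration, not real vendor pricing, and the calculation ignores storage and any other charges that may continue while instances are stopped.

```python
# Back-of-the-envelope comparison: always-on cluster vs. working-hours-only.
# All figures below are hypothetical; real pricing varies by vendor and region.

HOURS_PER_WEEK = 24 * 7


def weekly_cost(nodes: int, hourly_rate: float, hours_running: float) -> float:
    """Compute cost of running `nodes` instances for `hours_running` hours in one week."""
    return nodes * hourly_rate * hours_running


def savings_fraction(hours_running: float) -> float:
    """Fraction of the always-on compute cost saved by running fewer hours."""
    return 1 - hours_running / HOURS_PER_WEEK


nodes = 5               # assumed cluster size
hourly_rate = 0.50      # assumed price per node-hour (USD)
working_hours = 10 * 5  # assumed schedule: 10 hours a day, 5 days a week

always_on = weekly_cost(nodes, hourly_rate, HOURS_PER_WEEK)
scheduled = weekly_cost(nodes, hourly_rate, working_hours)

print(f"Always on:          ${always_on:.2f} per week")
print(f"Working hours only: ${scheduled:.2f} per week")
print(f"Compute saving:     {savings_fraction(working_hours):.0%}")
```

With these assumed numbers, the scheduled cluster costs $125 a week against $420 for an always-on one, a compute saving of about 70%. The exact figures matter less than the shape of the calculation: the saving depends only on the share of the week the instances actually run.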

We are working with a client that offers insurance solutions, and the first thing they asked for was a cloud-based setup where their in-house team could experiment and get used to the ecosystem. Very convenient indeed!

What does this mean?

Along with leading cloud platform vendors like AWS, Azure, and Google, other cloud-based vendors who offer cloud deployments of various Hadoop distros like Cloudera, Hortonworks, and MapR will also see lots of traction and demand in 2018. Organizations have their cloud vendor of choice and will prefer to embark upon the Hadoop journey with them.

We are in touch with several organizations still waiting to begin their Hadoop journey.

“Convenience of setting up Hadoop on the cloud will encourage SMEs to start their Hadoop journey in 2018.”

Taking Hadoop to Production? Plan carefully

Why do I say so?

Gartner’s research shows that while investment in big data continues, the move to production has remained flat. Gartner estimates that “roughly 14% of Hadoop deployments are in production.”

Organizations focus heavily on security and governance before taking anything to production, and the offerings from the various Hadoop vendors seem to have several gaps in these areas. As a result, custom third-party solutions are needed.

What does this mean?

Security solutions such as Sentry, Ranger, and Knox, from Hadoop distros like Cloudera and Hortonworks, will see increased interest and adoption. They will continue to mature and offer the security capabilities a production-ready application needs.

“2018 is when organizations will get the increased confidence to move their Hadoop systems to production.”

Organizations will start insisting on security in Hadoop right from the early stages.
For one of our insurance clients, we are implementing security from the POC stage. Clients who have already taken Hadoop to production have also asked us for a roadmap for demonstrating, testing, and implementing security.

“Hadoop will be forced to become magnanimous and accommodate supporting frameworks in 2018.”

Why do I say so?

  1. Performance in some Hadoop use cases that require heavy interactive queries by concurrent users falls short of expectations, limiting Hadoop's suitability for decision-support use cases.
  2. Spark supports stream analytics (Spark Streaming) and interactive querying. Additionally, its support for SQL (Spark SQL) and machine learning (MLlib) has made it even more popular.
  3. For most of the current Hadoop implementations we are doing for our clients, we use Spark as the standard processing engine.

What does this mean?

Though Spark is not an official part of the Hadoop ecosystem, integrating it within a cluster will become a de facto standard.

“In 2018, Spark will continue to be the processing engine of choice for Hadoop systems, replacing even some of the MR jobs in production.”

“In 2018, focus on the third “V” (Variety) of Big Data will get a boost.”

There will be an increasing demand for implementing Data Lakes using Hadoop and deriving helpful business insights. With its inherent support for the storage and processing of all forms of data, Hadoop will remain the favorite for Data Lakes.

Why do I say so?

  1. All these years, analytics was mainly restricted to structured data. Data warehouses and BI work well on structured data but struggle with unstructured data. Because of this, a lot of advanced business insights are simply not possible.
  2. Focus on the third “V” (Variety) of Big Data is still missing to a great extent. While companies successfully process structured data on the Hadoop platform, the same cannot be said about unstructured data. As a result, integrating unstructured data will be one of the biggest reasons for Big Data adoption.

We are seeing some fascinating Use Cases:

  1. We are currently implementing a Hadoop Data Lake for the IT arm of a financial services organization. This involves establishing a direct link between the transaction-level data (for example, customer loan-related entries) and the supporting documents (for example, the loan agreement in PDF format). They see a lot of business value in such analytics.
  2. Another client of ours is planning to derive insights from images generated by body scanners. Apart from the volume of data, the complexity posed by the unstructured data is huge.
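To make the first use case a little more concrete, here is a minimal sketch of what linking transaction-level records to their supporting documents can look like. It is only an illustration in plain Python, not our client implementation: the record types, field names, loan IDs, and file paths are all hypothetical, and a real Data Lake would do this at scale with a distributed engine rather than in-memory dictionaries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LoanEntry:
    """Structured, transaction-level record (hypothetical fields)."""
    loan_id: str
    customer: str
    amount: float


@dataclass(frozen=True)
class Document:
    """Metadata for an unstructured file stored in the lake (hypothetical fields)."""
    loan_id: str
    path: str
    doc_type: str


def link_documents(entries, documents):
    """Pair each loan entry with the documents that share its loan_id."""
    docs_by_loan = defaultdict(list)
    for doc in documents:
        docs_by_loan[doc.loan_id].append(doc)
    # Entries without any supporting document get an empty list.
    return {e.loan_id: (e, docs_by_loan.get(e.loan_id, [])) for e in entries}


# Made-up sample data for illustration.
entries = [
    LoanEntry("L-001", "A. Customer", 250000.0),
    LoanEntry("L-002", "B. Customer", 90000.0),
]
documents = [
    Document("L-001", "/lake/raw/loans/L-001/agreement.pdf", "loan_agreement"),
]

linked = link_documents(entries, documents)
for loan_id, (entry, docs) in linked.items():
    print(loan_id, entry.amount, [d.path for d in docs])
```

The point of the sketch is the shared key: once every document carries the identifier of the transaction it supports, analytics can move freely between the structured entry and the unstructured file, and gaps (such as a loan with no agreement on file) become easy to spot.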

What does this mean?

Organizations will increasingly ask for processing of unstructured data (documents, images). Also, exciting use cases that were impossible earlier will start becoming feasible. As a result,

“Technologies and platforms with demonstrated capability in unstructured data processing and analysis will be in great demand in 2018.”

It will be interesting to see how 2018 turns out for all of us. I plan to revisit the above towards the end of 2018, and hopefully, I’ll be right!

I look forward to your comments and views. Please comment on the above or share your opinions with me at [email protected].