How To Ensure a Successful Big Data Proof of Concept?
“Give me six hours to chop down a tree, and I will spend the first four sharpening the axe.” This is how Abraham Lincoln described his habit of meticulous planning. At Ellicium, we take this quote seriously when planning a Big Data implementation.
We have helped a number of clients move from conceptualization to Big Data Proof of Concept (POC) to production implementation of various Big Data use cases, mainly for streaming and Internet of Things (IoT) data, using Ellicium's Gazelle. A common factor in all successful Big Data Proof of Concept to production journeys has been planning well even before the POC.
This journey is now routine for us, since the solution has matured with each implementation. Learnings from each Big Data Proof of Concept have made the next one more effective, both in the value delivered and in readiness for production implementation. Some of the key lessons we have learned are:
Big Data projects can turn out to be one of the most multi-dimensional endeavors in the organization. In one of our implementations, the client’s legal team was required to comment on privacy issues involved when extracting customer data from the web.
Business users will constantly need to assess whether additional investment in Big Data is delivering a multi-fold increase in decision-making capability. The list can go on and on. All these decisions need to be orchestrated from an organizational perspective by a committee of senior executives.
Given this, having a ‘Big Data Governance Council’ is a must. Based on our experience, we recommend that clients set up such a council so that the Big Data Proof of Concept adds value to the organization's balance sheet and does not remain a one-off technology initiative.
Do you really need a Big Data platform?
We have often come across clients who do not actually have a Big Data scenario but are ‘lured’ by the utopian promises of Big Data. One customer was managing a highly complex business with Excel spreadmarts. They certainly needed a data warehouse, but not a Hadoop-based Big Data system. Evaluate your data volumes, their potential growth, and the ability of your existing technology stack to meet the demand.
Developing and managing a Big Data system is a big task that can consume a lot of your IT bandwidth. This is common knowledge, but we have come across client situations where this basic fact needed to be reiterated.
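One way to frame the "do you really need it" question is a back-of-the-envelope growth projection: does your data outgrow a conventional single-server data warehouse within your planning horizon? The sketch below illustrates the idea; the thresholds and growth figures are illustrative assumptions, not hard criteria.

```python
def needs_big_data_platform(current_gb, monthly_growth_rate,
                            horizon_months=24, single_node_limit_gb=2000):
    """Rough check: will projected data volume outgrow a conventional
    single-server data warehouse within the planning horizon?

    The 2 TB single-node limit and 24-month horizon are illustrative
    assumptions; calibrate them against your own stack and roadmap."""
    projected_gb = current_gb * (1 + monthly_growth_rate) ** horizon_months
    return projected_gb > single_node_limit_gb

# ~50 GB today, growing 10% per month: stays under 2 TB in two years
print(needs_big_data_platform(50, 0.10))    # → False
# 500 GB today, growing 25% per month: far exceeds a single node
print(needs_big_data_platform(500, 0.25))   # → True
```

A model this simple obviously ignores query concurrency, velocity, and data variety, but it is often enough to separate a genuine Big Data scenario from a plain data warehousing need.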
Involve the Big Data distribution vendor at the POC conceptualization stage
Cloudera, DataStax, Hortonworks, IBM, MapR, Google – whichever Hadoop / Big Data distribution you choose for the POC, involving the vendor from the conceptualization stage is a must.
They are a great help in reviewing architecture and hardware sizing, and they share experiences from past implementations. Although our high-velocity data solution has a recommended technology stack, we always consult our Big Data distribution partner on critical decisions. This has averted situations such as under-provisioned hardware and the wrong choice of visualization tool (due to client insistence).
Consider available skills in your organization
Maintaining a Big Data application requires strong Linux skills (HDP from Hortonworks is an exception). Does your organization have the skills to maintain an environment of Linux machines? Do you have system administrators and programmers? One of our clients was a Microsoft shop, and we were implementing a midsized Hadoop cluster for their IoT data. We asked the IT managers to start hiring Linux system administrators and programmers, so that by the time the POC was completed, the client had a team ready to own the production implementation.
Agile, agile, and agile
Big Data projects are risky and uncertain: the technologies involved evolve constantly, user data needs may not be crystal clear, and new findings may change the direction. Going with a ‘waterfall-style’ project plan is the biggest mistake a PM for a Big Data Proof of Concept can commit. Agile helps shift the focus as the need of the hour changes. In one implementation, two weeks into the project we knew performance would be a bottleneck.
Hence, we stopped all further work on data visualization and focused all our energy on tuning hardware and software for higher performance. This averted a ‘functionally correct but non-performing’ product. Sometimes the data being used turns out to be of bad quality, and alternate sources have to be identified. All this requires an agile methodology to survive and succeed.
Consider your visualization needs
Big Data implementation is more than just storing and churning data. Results need to be delivered to business users, and in the most effective manner possible. This is challenging, considering the many additional dimensions of data that come from wearable devices or streaming mobile devices. A couple of our initial implementations suffered on user satisfaction because we used traditional visualization tools and techniques. We then realized that Big Data requires ‘newer and different’ visualization, and started involving a ‘data artist’ from day 1 of the Proof of Concept. It really helped!
Commodity hardware is costly
What everybody has heard about Big Data is that ‘you throw in some commodity hardware and reap the benefits of distributed processing.’ At least, that is the perception among some non-IT business executives. But remember, commodity hardware does not mean cheap, low-capacity hardware. In fact, for processing streaming data (which we do for a living), we recommend a strong cluster of nodes with at least 16-32 GB of RAM each and 1 Gbps network connectivity. Be aware of these requirements before you commit to a Big Data Proof of Concept. One of our clients nearly killed the Big Data project by expecting a scrawny cluster of 4 GB RAM machines to process billions of records per hour.
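The billions-of-records-per-hour example above can be turned into a first-cut sizing estimate. The sketch below is a simplification under stated assumptions: the per-node throughput figure and the 30% headroom factor are illustrative placeholders, since real throughput depends on record size, disk and network speed, and the processing framework.

```python
import math

def estimate_node_count(records_per_hour, per_node_records_per_sec=50_000,
                        headroom_factor=1.3):
    """First-cut node-count estimate for a streaming ingest cluster.

    per_node_records_per_sec (50k) and headroom_factor (30% slack for
    replication, failover, and load spikes) are illustrative assumptions;
    benchmark your own stack before committing to hardware."""
    records_per_sec = records_per_hour / 3600
    raw_nodes = records_per_sec / per_node_records_per_sec
    # Enforce a 3-node floor, a common minimum for distributed availability.
    return max(3, math.ceil(raw_nodes * headroom_factor))

# Two billion records per hour at the assumed per-node throughput:
print(estimate_node_count(2_000_000_000))  # → 15
```

Running the same arithmetic against a cluster of underpowered 4 GB machines quickly shows why the project described above stalled: the per-node throughput assumption collapses long before the node count can compensate.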
Once the Proof of Concept has proven the value of Big Data, your business users will always want a fast transition to production implementation. The points discussed above ensure that you cross that bridge with minimal hassle.