Data Lake For Beverage Company In Canada

About the Client

The client is North America’s most diversified and successful private beverage company, focused on the alcohol beverage sector.

Business Requirement

  • Create a data lake to collect data from a variety of sources and make it available for analysis.
  • Unstructured data contained in PDF, DOC, and DOCX files should be queryable and searchable as well.

Our Solution

  • An ingestion framework was built to import SQL Server source data and PDF/DOC files into HDFS/HBase.
  • Python was used to extract valuable information from unstructured data sources.
  • Data files were transformed to an efficient format using Spark in order to optimize storage.
  • Cloudera Search was enabled on the documents using Apache Solr.
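
To illustrate the idea behind the extraction and search steps above, here is a minimal sketch in plain Python. It is not the project's actual code: the sample documents, the date pattern, and the function names (extract_metadata, build_index, search) are all assumptions made for illustration, and a small in-memory inverted index stands in for Apache Solr, which provides this kind of lookup at scale.

```python
import re
from collections import defaultdict

# Hypothetical sample data: text assumed to be already extracted from PDF/DOC files.
DOCS = {
    "invoice_001.pdf": "Invoice 1001 dated 2020-03-15 for 40 cases of lager.",
    "contract_a.docx": "Supply contract signed 2019-11-02 covering cider and lager.",
}

DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # assumed ISO-style dates
TOKEN_RE = re.compile(r"[a-z0-9]+")


def extract_metadata(text):
    """Pull dates out of free text, as one example of a structured field."""
    return {"dates": DATE_RE.findall(text)}


def build_index(docs):
    """Map each lowercase token to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in TOKEN_RE.findall(text.lower()):
            index[token].add(name)
    return index


def search(index, query):
    """Return the documents that contain every token in the query."""
    tokens = TOKEN_RE.findall(query.lower())
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


index = build_index(DOCS)
print(extract_metadata(DOCS["invoice_001.pdf"]))
print(sorted(search(index, "lager")))
print(sorted(search(index, "lager contract")))
```

The sketch keeps extraction (structured fields pulled from text) separate from indexing (tokens mapped to documents), which mirrors the split between the Python extraction step and the search layer described in the bullets.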

Solution Architecture

[Figure: solution architecture diagram]

Business Outcomes

  • Information from a variety of sources became quick and easy to access.
  • Converting the data to an efficient format improved storage efficiency by 50%.
  • Data capture and search became twice as fast.