Text Analytics: Superpower to harness unstructured Big Data

The first message was sent over the internet on October 29, 1969. It was exchanged between UCLA professor Leonard Kleinrock, his student and programmer Charley Kline, and Bill Duvall at Stanford Research Institute. They intended to send “Login” as the message, but the system crashed after “Lo” had been typed. So “lo” became the first message sent over the internet. By 2019, the number of emails sent over the internet is expected to reach 246 billion per day. It seems we have come a long way!

2.4 million emails are sent every second.

Computers are shrinking, the internet is growing, and data is being generated everywhere. In just one second, 43,281 GB of internet traffic flows across the network. By 2020, the digital universe will hold 44 ZB (44 trillion gigabytes), and up to 90% of that data will be unstructured. Much of it is textual. Sounds exciting? But without analytics, this data is just data. Hence, the onus is on text analytics to extract value from it.

Defining Text Analytics

Text analytics is the semi-automated process of aggregating and exploring textual data to obtain new insights. It combines technology, industry knowledge, and practices that drive business outcomes, and it draws on a set of linguistic, analytical, and predictive techniques.

History, Current State, and Massive Future Opportunities

The first definition of business intelligence (BI), in an October 1958 IBM Journal article by H.P. Luhn, ‘A Business Intelligence System’, focused on text analytics. However, as BI emerged as a practice in the 1980s and 1990s, the focus shifted to numerical data stored in relational databases, because processing the text in documents was hard at the time.

Since it first emerged as “text mining” in the 1990s, text analytics has evolved a great deal, and it continues to evolve.

Though text analytics addresses an old problem, it has only recently gained significance because of the exponential growth in unstructured data. Combined with structured data analysis, text analytics can be a business game changer: it can go beyond the numbers and reveal insights that would otherwise stay hidden.

For example, analysis of email exchanges between customers and the support department can help businesses improve on the service front while also helping to predict customer mood.
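To make the idea of predicting customer mood from support emails concrete, here is a minimal sketch of lexicon-based sentiment scoring in Python. The word lists, the function names (mood_score, mood_label), and the sample emails are all made up for illustration; a production system would rely on a much larger lexicon or a trained model rather than this toy approach.

```python
import re

# Tiny illustrative word lists (assumptions, not a real sentiment lexicon).
POSITIVE = {"thanks", "great", "helpful", "resolved", "happy", "excellent"}
NEGATIVE = {"angry", "broken", "delay", "refund", "terrible", "unacceptable"}


def mood_score(email_text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", email_text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def mood_label(email_text: str) -> str:
    """Map the numeric score to a coarse mood label."""
    score = mood_score(email_text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical support emails.
emails = [
    "Thanks, the issue is resolved and the agent was very helpful.",
    "This delay is unacceptable. I want a refund.",
    "Please send me the invoice for March.",
]

for email in emails:
    print(mood_label(email), "|", email)
```

Even a crude score like this, tracked over time per customer or per product, hints at how text can complement the numbers already sitting in a structured database.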

According to Allied Market Research, the text analytics market is expected to reach $6.5 billion by 2020. Forrester’s recent Big Data Text Analytics Platforms report shows a market dominated by leaders such as IBM, Clarabridge, SAS, HPE, Attivio, Digital Reasoning, and Linguamatics. IBM and Clarabridge lead with a large market share thanks to their broader feature offerings. Going by the same report, major companies are making clear moves that show the importance of text analytics. For example, IBM is pushing its Watson platform hard while making significant acquisitions. At the moment, the market leaders appear to be well ahead of the rest of the market, thanks to their full-featured platforms and their acquisition of rivals. At the same time, some serious open-source tools are making their presence felt.

Big Data Text Analytics Platforms, Q2 ’16

Image source: Forrester Wave™ Big Data Text Analytics Platforms, Q2 2016

During my research, I found that an ideal text analytics platform has the following features:

Scalability (huge amounts of data from multiple sources)
Flexibility (integration of multiple text mining functions)
Efficiency (targeting the relevant data)
Quality (accuracy of results)
Low cost (time, money, and development effort)

From a user perspective, the following are the desired features:

Ease of use
Intuitive workflow
Good interface
Accurate results (customizable tool)
Guidance through prebuilt categories

A text analytics product should offer the following broad capabilities:

Document Classification
Concept Extraction
Information Extraction
Information Retrieval
Web Mining
Sentiment Analysis
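To give a flavor of the first capability in this list, document classification, here is a minimal Python sketch of a bag-of-words naive Bayes classifier built only on the standard library. The class name, the two labels ("lease" and "employment"), and the four training sentences are hypothetical and chosen purely for illustration; this is not how any of the commercial platforms mentioned in this article implement classification.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of training docs
        self.vocab = set()                       # all words seen in training

    def train(self, labeled_docs):
        for text, label in labeled_docs:
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior: share of training documents carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][token] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data: two document categories, two examples each.
training_docs = [
    ("The tenant shall pay rent on the first day of each month", "lease"),
    ("The landlord may terminate the lease after written notice", "lease"),
    ("The employee shall receive a salary and annual leave", "employment"),
    ("The employer may terminate employment for misconduct", "employment"),
]

clf = NaiveBayesClassifier()
clf.train(training_docs)

print(clf.classify("Rent is due to the landlord each month"))
print(clf.classify("Salary and leave for the employee"))
```

With only four training sentences the classifier is a toy, but the same structure scales to thousands of labeled documents, which is where the accuracy and scalability features listed earlier start to matter.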

Text Analytics: Superpower

By 2020, Retail, FMCG, BFSI, and Healthcare will be the major adopters of text analytics platforms. They will use them mainly for Customer Relationship Management, Brand Reputation, and Competitive Intelligence.

As mentioned before, most unstructured data is textual. According to a widely accepted maxim, organizations analyze only about 20% of their data, and that portion is structured. Imagine the possibilities if organizations could analyze the remaining 80%, which is unstructured and largely textual. Text analytics can empower organizations to gain deeper and more significant insights. Whether it is social media listening, understanding customer experience, or extracting cancer-related information from pathology or radiology reports, text analytics can be leveraged to unleash the potential of available data.

Text analytics can be used to learn more about customer behavior, and this intelligence can help significantly improve customer loyalty and increase sales. Recently, Facebook announced the availability of Topic Data, which uses text analytics to reveal what audiences are saying on Facebook about events, brands, subjects, and activities.


At Ellicium, our unstructured data analytics tool ‘Gadfly’ helped a leading LPO automate its document classification work based on text analytics principles. This solution would save 85% of human effort and result in multi-million-dollar cost savings over a three-year period.


Social media is another huge opportunity for businesses to employ text analytics, listening to customers’ conversations in real time and acting on them immediately.
With a variety of tools available in the market, any business, small or big, can start using text analytics and capitalize on it.