Yes, I Can Live With Small Data!

A couple of days ago, I read an interesting TechCrunch article, “Big Data Doesn’t Exist.”

In the article, Slater Victoroff argues that most companies do not have any real “big data.” Even those that do get little use out of it, either because the data is of poor quality or because they lack the processing capacity to make sense of it.

The article is well written, and its argument seems reasonable up to this point. However, it also raises many questions about its information sources and about the assumptions behind the number crunching used to reach certain conclusions.

Despite these open questions, the article did introduce me to two great ideas. The first is “small data.”

A quick Google search turned up a comprehensive definition from the Small Data Group: “Small data connects people with timely, meaningful insights (derived from big data and/or local sources), organized and packaged – often visually – to be accessible, understandable, and actionable for everyday tasks.”

So basically, “small data” is either what we get after processing “big data” or the smaller sets of data that we use in our day-to-day life. The fuel price, the temperature or a currency exchange rate on a particular day are some examples.

The second idea, the one that inspired me to write this post, is to take full advantage of the “small data” we already have instead of spending our limited resources on acquiring, managing and processing “big data.” Slater proposed that by applying “transfer learning” techniques (an idea he borrowed from Lisa Torrey and Jude Shavlik) together with suitable algorithms, companies can use the knowledge gained in one area to learn and better understand related areas.

One fascinating example shared by Slater is the way MetaMind applies artificial intelligence in its Text Classifier product.

This product can help substantially with almost any text classification problem. You can pick an existing classifier from the 200+ created by other users, or train a new classifier on the fly using your own labeled data sets.

Their showcase application, Twitter Sentiment, lets anyone find out the opinions and feelings expressed in the tweets of their friends and favorite personalities, or in tweets about a particular trend. You just need to provide the hashtag, and the application does the rest of the work for you.

How can artificial intelligence be applied to “small data” to solve our day-to-day problems? You can train the classifier on excerpts from your customers’ emails. Then you can sort your daily burst of email as it arrives and pick out the messages that need corrective action in a fraction of the usual time. As a result, emails that do not require any action are filtered out. To improve the accuracy of its predictions, you will need up to 1000 samples per class.
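To make the email idea a bit more concrete, here is a minimal sketch of training a text classifier on a handful of labeled excerpts. It is not MetaMind's method or API: I have swapped in a plain multinomial Naive Bayes classifier written with the Python standard library, and the labels ("action_needed", "no_action") and the sample emails are made up purely for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # label -> number of training texts
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocabulary = set()

    def train(self, labeled_texts):
        """Count words per label from (text, label) pairs."""
        for text, label in labeled_texts:
            self.class_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        """Return the label with the highest log-probability for the text."""
        total_texts = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total_texts)  # class prior
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocabulary:
                    continue  # ignore words never seen during training
                word_count = self.word_counts[label][word]
                score += math.log((word_count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training excerpts; a real setup would use far more per class.
TRAINING_EMAILS = [
    ("My order arrived damaged, please send a replacement", "action_needed"),
    ("I was charged twice, please refund the second payment", "action_needed"),
    ("The app crashes when I open my invoice", "action_needed"),
    ("Thanks, everything works great now", "no_action"),
    ("Just wanted to say the new release looks good", "no_action"),
    ("Thanks for the quick reply, have a nice weekend", "no_action"),
]

classifier = NaiveBayesTextClassifier()
classifier.train(TRAINING_EMAILS)

print(classifier.predict("Please refund my damaged order"))
print(classifier.predict("Thanks, the release works great"))
```

With this toy data, the first message should be flagged for action and the second should not. Six examples are only enough to show the mechanics; the accuracy you get in practice depends on how many labeled samples you can provide per class.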

Obviously, it may not be the first or only implementation of this nature, but it is definitely one of the few out-of-the-box solutions.

The possibilities are endless, and you are free to test your imagination!

The Text Classifier is freely available for experimentation. However, you might consider paying for it if performance becomes a primary consideration at any stage.

The API is also available for integration with third-party applications.

The above image is an adaptation from the original “Data” that is copyright (c) 2012 by CyberHades and made available through Flickr under an Attribution-NonCommercial 2.0 Generic (CC BY-NC 2.0) license.
