The 10 Coolest Big Data Startups Of 2014

Big Data, Cool Companies

Few segments of the IT industry have attracted as many entrepreneurs in recent years as big data has. There has been a surge of new companies developing technologies for collecting, storing, managing and analyzing the terabytes and even petabytes of data being generated today by businesses and consumers.

Here are 10 startups and emerging companies in the big data arena that particularly caught our attention this year.

Aerospike

Founder and CTO: Brian Bulkowski

The battle is on among companies backing competing next-generation databases, including MongoDB, CouchDB, Redis and others. Aerospike, founded in 2009 and based in Mountain View, Calif., develops a real-time, flash-optimized NoSQL database for running high-performance applications.

The in-memory database meets ACID (atomicity, consistency, isolation and durability) requirements for reliable transaction processing. In November Forensiq, which provides a fraud detection service for online advertisers, said it was using the database to process 1 trillion bid requests per month.

Earlier this year Aerospike raised $20 million in Series C funding. In December the company boosted its database's capabilities with performance and storage enhancements, new functions and Hadoop integration.

Altiscale

CEO: Raymie Stata

Hadoop was hot in 2014. But the big data platform is still very complex and difficult to work with. That's why Altiscale and its Hadoop-as-a-Service offering have attracted so much attention. The Altiscale Data Cloud is a purpose-built, petabyte-scale infrastructure that delivers Apache Hadoop as a cloud service, coupled with operational support and expertise.

In October the Palo Alto, Calif.-based company, founded in 2012, began offering a SQL-on-Hadoop service, making it possible to access Hadoop data through a SQL user interface or API.

Altiscale's top leadership, including CEO Raymie Stata and CTO David Chaiken, came from Yahoo, the birthplace of Hadoop. So these guys know what they're doing. Earlier this month the company snagged $30 million in second-round financing.

Databricks

CEO: Ion Stoica

One of the hottest big data technologies in 2014 was Apache Spark, the in-memory data processing engine that turbocharges big data systems like Hadoop. The open-source software grew out of a development project at the University of California, Berkeley.

Databricks offers a Spark-based platform (launched in June) for big data tasks including data transformation, exploration and analytics. But the company, founded in 2013, is much more than just another startup piggybacking on a popular open-source technology. Databricks' founders, including CEO Ion Stoica (a U.C. Berkeley computer science professor) and CTO Matei Zaharia, actually created Spark and are leveraging that expertise to build what they call a one-stop shop for big data software.

DataStax

CEO: Billy Bosworth

DataStax is another leading player in the next-generation database market. The company has thrown its weight behind Apache Cassandra, the highly scalable, fault-tolerant NoSQL database used by Cisco Systems, eBay and Twitter.

DataStax sells DataStax Enterprise, a commercial edition of Cassandra, along with tools and support services for the platform. Earlier this month the company, a "visionary" in Gartner's Magic Quadrant, debuted DSE 4.6 with new Spark Streaming analytics capabilities that it is touting as just the thing for the nascent Internet of Things market.

DataStax, founded in 2010 and based in Santa Clara, Calif., raised an eye-popping $106 million in Series E financing in September, bringing its total funding to $190 million.

DataTorrent

Co-Founder and CEO: Phu Hoang

DataTorrent develops an enterprise-class, real-time streaming analytics platform that provides a way for users to process, monitor, analyze and act on data instantaneously. The company says its Hadoop-based DataTorrent RTS system is capable of processing hundreds of millions -- even 1 billion -- events per second.

For many businesses, one of the biggest challenges with big data is latency, particularly when working with streaming data such as stock trades. Co-founder and CEO Phu Hoang, who helped oversee the development of Hadoop while at Yahoo, saw the potential for a product like DataTorrent RTS. The Santa Clara, Calif.-based startup, launched in 2012, debuted DataTorrent RTS in June and followed that up in October with RTS 2.0.

Qubole

Co-Founder and CEO: Ashish Thusoo

Qubole's Hadoop-based Qubole Data Service is a cloud platform that businesses use to store and manage huge volumes of structured and unstructured data and to tap into that data for analytical chores and other tasks. You might say Qubole is self-obsessed: The company's focus is making its platform as self-managing as possible, with auto-scaling and other built-in management capabilities, and as self-service as possible, with user-friendly interfaces and data integration functions that don't require a data scientist or programmer.

CEO Ashish Thusoo and Qubole India manager Joydeep Sen Sarma founded Qubole in 2012. Previously they worked together at Facebook, running the data infrastructure team that handled the company's massive data processing needs, and they contributed to the development of the Apache Hive data warehouse infrastructure project for Hadoop. Earlier this month the Mountain View, Calif.-based company raised $13 million in Series B financing.

Snowflake Computing

CEO: Bob Muglia

Snowflake Computing came out of stealth mode in October, debuting the cloud-based data warehousing services it's positioning as a more flexible, easier-to-manage alternative to complex and often very expensive on-premise data warehouse systems. Snowflake's Elastic Data Warehouse, currently in beta, also will be competing with the likes of Amazon Web Services' Redshift and Google's BigQuery.

CEO Bob Muglia says the Snowflake Elastic Data Warehouse service can operate at 90 percent lower cost than on-premise data warehouses and is easier to use than its AWS and Google competitors. The San Mateo, Calif.-based startup went so far as to develop its own database system that can handle both structured and semi-structured data.

SumAll

CEO: Dane Atkinson

SumAll operates on the idea that big data analytics isn’t just for big businesses. The New York-based company provides an online analytic platform that collects data from a business's ecommerce, email marketing, social media and online traffic systems, as well as advertising systems like Google AdWords, and displays it in one interactive "visualization" interface.

Because of its emphasis on ease of use, SumAll, founded in 2011, sells to small and midsize businesses as well as big companies. Statistics on the company's website include 233,000 SumAll users, $14 million in venture funding, an average employee age of 32.6 years, and the consumption of 22 kegs of beer per year.

Tamr

Co-Founder and CEO: Andy Palmer

One problem with big data is that, well, there's just so much of it, usually from many different sources. Oh, and it's constantly changing.

Tamr's software uses machine-learning technology to create a single view across all those sources, providing a business with a complete inventory of its data assets and finding connections among disparate data sets. The company's technology grew out of initial research at MIT's Computer Science and Artificial Intelligence Laboratory.

Database luminary Michael Stonebraker, along with industry veterans Andy Palmer and Ihab Ilyas, founded Tamr in 2013 and formally launched the company in May of this year. At the same time the Cambridge, Mass.-based company raised $16 million in venture financing.

WibiData

Co-Founder and CEO: Christophe Bisciglia

Everyone's familiar with Amazon.com's ability to provide a highly personalized experience to online shoppers. San Francisco-based WibiData, founded in 2010, develops real-time applications that help any business do just that.

The company's WibiEnterprise platform uses advanced analytics to provide customers with recommendations, personalized content and relevant search results. The platform is built on a number of open-source Apache technologies including Hadoop, HBase and Cassandra, as well as the open-source Kiji project, a framework for collecting, analyzing and serving data in real time. Earlier this year the startup released WibiRetail, the company's first out-of-the-box application for retailers.

Investors include Google Chairman Eric Schmidt and Cloudera co-founder Mike Olson.