Infochimps has tamed the Big Data elephant, and now it wants to hand the reins to you.
The Austin, Texas company has rolled out its internal Big Data management tools as a product. Infochimps Platform makes it easier to build Big Data stacks made up of Hadoop (known by its elephant symbol) and a variety of database systems, including HBase, Cassandra and MySQL.
Infochimps originally built the tools to manage its own data-set marketplace. Infochimps Data Marketplace offers 15,000 data sets from 200 providers. The data sets range from OkCupid dating service questions and answers to Northern Ireland Neighborhood Information Service info. Infochimps uses the platform to serve terabytes of data.
Building and operating Big Data systems involves coordinating rapidly evolving technologies and getting them to work at large scales. This was a major pain point for Infochimps, says Flip Kromer, the company's CTO. "Big Data operations is just brutal."
The Infochimps Platform pulls data from the web, external databases and the Infochimps Data Marketplace, organizes it in a database management system, and processes it in an on-demand version of Hadoop, the open source data analytics tool. The heart of the platform is Infochimps' Ironfan management system. Ironfan connects the platform's components and makes it easier to schedule and configure workflows.
Big Data applications crunch huge amounts of data, for example from large science experiments or data mining. A Big Data workload typically runs on large server clusters. It can also run on virtual machines in a cloud service such as Amazon Web Services. Ironfan simplifies and speeds "spinning up" and "spinning down" these virtual clusters.
The Infochimps Platform reduces the technical knowledge required to build Big Data systems. The platform provides visual tools to connect and control the pieces of the system. "It's your system diagram come to life," says Kromer. "The thing you would draw on the chalkboard, that's the level of description you need to give."
The platform allows data scientists with minimal programming skills to build the Big Data systems they need on their own, says Kromer. "We built tools that made Hadoop and these things really easy to use, which let us do things like recruit data scientists straight out of college or, in fact, often before they graduated."
Infochimps is aiming the product at small startups and small units within enterprises -- Big Data users who don't have a lot of resources. It's also offering a more hands-on service for enterprises. The Infochimps Platform is available as an on-premises system or as a platform-as-a-service.
Infochimps Platform customer BlackLocus provides a service to e-commerce vendors that monitors the Web for competitors' pricing.
Going forward, Infochimps plans to automate the process of building data flows. Instead of connecting the components of the system, a user would focus on moving data using interactive tools. "Ultimately you're taking a data flow, constructing it with visual tools, and binding it to the stuff that executes it and the resources it runs on," says Kromer.
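To make Kromer's description concrete, here is a minimal Python sketch of the general idea: a data flow is described on its own and only later bound to something that executes it and to the resources it runs on. This is not Infochimps or Ironfan code. Every name in it (Flow, Resources, run_locally, the two step functions and the sample records) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

Record = dict
Step = Callable[[Iterable[Record]], Iterable[Record]]


@dataclass
class Flow:
    """Hypothetical data flow: an ordered list of steps, with no idea where it runs."""
    name: str
    steps: List[Step]


@dataclass
class Resources:
    """Hypothetical description of the machines a flow is bound to."""
    cluster: str
    workers: int


def run_locally(flow: Flow, records: Iterable[Record], resources: Resources) -> List[Record]:
    """Hypothetical executor: applies each step in order inside this process.

    A real executor might hand the same flow to a cluster sized by `resources`;
    this local stand-in accepts the argument but does not use it.
    """
    data = list(records)
    for step in flow.steps:
        data = list(step(data))
    return data


def drop_missing_prices(records: Iterable[Record]) -> Iterable[Record]:
    # Keep only the records that actually carry a price.
    return (r for r in records if r.get("price") is not None)


def add_price_cents(records: Iterable[Record]) -> Iterable[Record]:
    # Add a derived field without modifying the input records.
    return ({**r, "price_cents": round(r["price"] * 100)} for r in records)


# Describe the flow first, then bind it to an executor and resources.
flow = Flow("pricing", [drop_missing_prices, add_price_cents])
resources = Resources(cluster="demo", workers=2)
sample = [{"sku": "a", "price": 1.5}, {"sku": "b", "price": None}]
result = run_locally(flow, sample, resources)
```

Because the flow is defined separately from the executor, the same description could in principle be handed to a different executor or a larger set of resources without rewriting the steps.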