Matt Casters, the lead developer of the open source data integration tool Kettle, announced today that Pentaho is going to open source all Kettle plugins related to big data. You can now access a number of NoSQL databases using Kettle without spending a lot of money or developing a plugin of your own.
The plugins look quite promising because all of them can read, write, or query data in a NoSQL database, which means they can be used to move data from a relational database into a NoSQL database and vice versa. I think this would be an ideal playground for getting started with NoSQL: just load your own data into Hadoop or Cassandra.
The big data plugins support reading and writing data from and to:
- Apache Hadoop (including support for HDFS, Map/Reduce, Hive, Pig)
- MapR (which is Hadoop with some tools to maintain the cluster)
The big data plugins will be part of the standard Kettle release starting with Kettle 4.3, which is planned for release at the end of March. If you want to start playing around with them, just download the Kettle 4.3 preview from the Pentaho wiki: http://wiki.pentaho.com/display/BAD/Downloads
Matt also announced that Kettle 4.3 will be the first release under the Apache License 2.0, which makes it easier to integrate with third-party software (Kettle currently uses the LGPL).