Of particular value to our environment and applications are the following Greenplum capabilities:
- Scalable Massively Parallel Processing (MPP) – Greenplum on the EMC DCA has proven very effective at bringing large amounts of compute to bear on large data sets.
- Fast data loading into Greenplum – We load data into Greenplum at approximately 1 TB per hour without the use of specialized hardware.
- MADlib (madlib.net) – We rely on a number of the statistical and analytical functions available within MADlib, among them linear regression, logistic regression, Apriori, k-means, and principal component analysis.
- User Defined Functions in Python (UDFs in PL/Python) – Where MADlib does not directly solve an application problem, the ability to quickly prototype and deploy user-defined functions in Python has been effective.
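To make the PL/Python point concrete, here is a minimal sketch of the prototyping workflow: write and test the logic as ordinary Python first, then paste the body into a CREATE FUNCTION statement. The function name `zscores`, its signature, and the choice of z-score normalization are hypothetical illustrations, not something from our actual workload, and the SQL wrapper in the comment assumes a Greenplum release where the language is registered as `plpythonu`.

```python
import math

# Hypothetical SQL wrapper (an assumption, not taken from a real deployment):
#
#   CREATE OR REPLACE FUNCTION zscores(vals float8[])
#   RETURNS float8[] AS $$
#       <body of zscores() below, with `import math` at the top>
#   $$ LANGUAGE plpythonu;
#
# The function below is plain Python so it can be tested locally first.


def zscores(vals):
    """Return the z-score of each value using the population standard deviation."""
    n = len(vals)
    if n == 0:
        return []
    mean = sum(vals) / n
    variance = sum((v - mean) ** 2 for v in vals) / n
    std = math.sqrt(variance)
    if std == 0:
        # All values identical: define every z-score as 0 to avoid dividing by zero.
        return [0.0] * n
    return [(v - mean) / std for v in vals]


if __name__ == "__main__":
    print(zscores([1.0, 2.0, 3.0]))
```

Keeping the logic in a plain function like this means it can be unit tested outside the database before it is deployed as a UDF.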
I found your comment about clustering interesting, because it echoes my own experience of RabbitMQ clusters in production (with mirrored / HA queues). What problems did you have, specifically?