Software Engineer at an aerospace/defense firm with 1,001-5,000 employees
31 December 16
It depends... what is your endgame?
Hadoop these days mostly serves as a distributed, clustered file system that specializes in storing very large files. If you are merely interested in writing software for distributed processing, Apache Spark or NVIDIA CUDA is a much better choice. If you are interested in the distributed processing of large amounts of data, the common practice is to use Apache Spark to write the code that processes the data, and Hadoop for persistent file system storage.
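To make that split concrete, here is a minimal PySpark sketch, assuming a cluster where Spark can reach HDFS. The NameNode host, port, and paths are hypothetical placeholders, and `hdfs_uri` and `word_counts` are illustrative helpers, not part of either project. Spark does the processing; HDFS holds the input and the results.

```python
def hdfs_uri(namenode: str, port: int, path: str) -> str:
    """Build an hdfs:// URI from a NameNode host, port, and absolute path."""
    return f"hdfs://{namenode}:{port}/{path.lstrip('/')}"


def word_counts(spark, input_uri: str, output_uri: str) -> None:
    """Read text from HDFS, count words with Spark, write results back to HDFS."""
    # Imported here so the URI helper above works without PySpark installed.
    from pyspark.sql.functions import col, explode, split

    lines = spark.read.text(input_uri)  # one row per line, in column "value"
    words = lines.select(explode(split(col("value"), r"\s+")).alias("word"))
    counts = words.where(col("word") != "").groupBy("word").count()
    counts.write.mode("overwrite").parquet(output_uri)  # persisted on HDFS


if __name__ == "__main__":
    from pyspark.sql import SparkSession

    # Hypothetical host, port, and paths; replace with your cluster's values.
    spark = SparkSession.builder.appName("hdfs-word-count").getOrCreate()
    word_counts(
        spark,
        hdfs_uri("namenode.example.com", 8020, "/data/logs/*.txt"),
        hdfs_uri("namenode.example.com", 8020, "/data/word_counts"),
    )
    spark.stop()
```

A script like this would typically be launched with `spark-submit`. The same code can read from a local path instead of HDFS just by changing the URI.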
Hi, I am analyzing big data architecture. I would like to get a comparison of the BigInsights and Cloudera versions of Hadoop.
1. I am looking at the pros and cons of BigInsights compared to Cloudera.
2. I would also like to know the pros and cons of BigInsights compared to Hortonworks.
Your help is greatly appreciated.
Business Unit technical Lead at a tech services company with 1,001-5,000 employees
15 September 15
I am currently a Netezza DBA. I am in the middle of working with a group on a BigInsights (BI) move to production. I have done a lot of integration testing between Netezza and BI.
1. BI (only tried V3 enterprise and VM trial)
PRO - The BIGSQL implementation is ANSI-compliant SQL! That is a giant PRO in my mind.
PRO - GPFS seems like a decent improvement over HDFS. It is very fault tolerant and allows for data updates as well.
PRO - Integration with Netezza is excellent for Fluid Query read
PRO - Console has the ability to create linked tasks
CON - Enterprise console: some applications for data movement didn't work. I have a ticket open.
CON - The Enterprise Netezza data mover proc seems broken when using GPFS. I have a ticket open for this.
I have used Cloudera and Hortonworks a little
PRO - I really like Ambari! I think it is much easier to use than the BI console interface.
CON - The SQL implementation is not 100% ANSI compliant.