Examples of the 84,000+ reviews on PeerSpot:

Atal Upadhyay - PeerSpot reviewer
AVP at MIDDAY INFOMEDIA LIMITED
Real User
Top 5
Allows us to consume data from any data source and has a remarkable processing power
Pros and Cons
  • "With Spark, we parallelize our operations, efficiently accessing both historical and real-time data."
  • "It would be beneficial to enhance Spark's capabilities by incorporating models that utilize features not traditionally present in its framework."

What is our primary use case?

We pull data from various sources and employ a buzzword to process it for reporting purposes, utilizing a prominent visual analytics tool.

How has it helped my organization?

Using Spark for machine learning and big data analytics allows us to consume data from any data source, including freely available data. Spark's processing power is remarkable, making it our top choice for file-processing tasks.

Utilizing Apache Spark's in-memory processing capabilities significantly enhances our computational efficiency. Unlike with Oracle, where customization is limited, we can tailor Spark to our needs. This allows us to pull data, perform tests, and save processing power. We maintain a historical record by loading intermediate results and retrieving data from previous iterations, ensuring our applications operate seamlessly. With Spark, we parallelize our operations, efficiently accessing both historical and real-time data.

We utilize Apache Spark for our data analysis tasks. Our data processing pipeline starts with receiving data in raw format. We employ a data factory to create pipelines for data processing. This ensures that the data is prepared and made ready for various purposes, such as supporting applications or analysis.

There are instances where we perform data cleansing operations and manage the database, including indexing. We've implemented automated tasks to analyze data and optimize performance, focusing specifically on database operations. These efforts are independent of the Spark platform but contribute to enhancing overall performance.

What needs improvement?

It would be beneficial to enhance Spark's capabilities by incorporating models that utilize features not traditionally present in its framework.

For how long have I used the solution?

I've been engaged with Apache Spark for about a year now, but my company has been utilizing it for over a decade.

What do I think about the stability of the solution?

It offers a high level of stability. I would rate it nine out of ten.

What do I think about the scalability of the solution?

It serves as a data node, making it highly scalable. It caters to a user base of around five thousand or so.

How was the initial setup?

The initial setup isn't complicated, but it varies from person to person. For me, it wasn't particularly complex; it was straightforward to use.

What about the implementation team?

Once the solution is prepared, we deploy it onto both the staging server and the production server. Previously, we had a dedicated individual responsible for deploying the solution across multiple machines. We manage three environments: development, staging, and production. The deployment process varies, sometimes utilizing a tenant model and other times employing blue-green deployment, depending on the situation. This ensures the seamless setup of servers and facilitates smooth operations.
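
As a rough illustration of the blue-green deployment mentioned above, here is a minimal Python sketch. It assumes two identical environments, with the new version installed on the idle one and traffic switched only after a health check passes. The `BlueGreenRouter` class, the version labels, and the health-check callback are hypothetical names chosen for illustration and do not describe the reviewer's actual setup.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class BlueGreenRouter:
    """Tracks which of two identical environments receives live traffic."""
    versions: Dict[str, str] = field(
        default_factory=lambda: {"blue": "v1", "green": "v1"}
    )
    live: str = "blue"

    @property
    def idle(self) -> str:
        # The environment that is not currently serving traffic.
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str, health_check: Callable[[str], bool]) -> bool:
        """Install `version` on the idle environment; switch traffic only if healthy."""
        target = self.idle
        self.versions[target] = version
        if not health_check(target):
            return False  # live environment is untouched, so no rollback is needed
        self.live = target
        return True


router = BlueGreenRouter()
switched = router.deploy("v2", health_check=lambda env: True)
print(router.live, router.versions[router.live], switched)
```

The point of the design is that the previously live environment keeps its old version, so switching back is just a matter of pointing traffic at it again.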

What other advice do I have?

Given our extensive experience with it and its ability to meet all our requirements over time, I highly recommend it. Overall, I would rate it nine out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Shahan Rehman - PeerSpot reviewer
Senior Business Development Manager at BBI Consultancy
Real User
Can host multiple technologies and help businesses with their AI initiatives
Pros and Cons
  • "The tool can be deployed using different container technologies, which makes it very scalable."
  • "The tool's ability to be deployed on a cloud model is an area of concern where improvements are required."

What is our primary use case?

The tool is used by our company's customers who need big data management. When they want to build a big data management platform on-premises, they choose Cloudera over the other options in the market because it is best suited to that model. For customers who want a cloud-based solution, other tools in the market suit their requirements better. For an on-premises big data management platform, Cloudera is the best choice.

What is most valuable?

The best part of the tool is that it can expand horizontally and vertically when a customer wants to grow the business. The tool can be deployed using different container technologies, which makes it very scalable.

What needs improvement?

The tool's ability to be deployed on a cloud model is an area of concern where improvements are required. The tool works very well when deployed on an on-premises model. The deployment on a cloud platform is where Cloudera needs to work more. There are competitors who are way ahead of Cloudera.

For how long have I used the solution?

I have been using Cloudera Distribution for Hadoop for five years. My company has a partnership with Cloudera.

What do I think about the stability of the solution?

It is a very stable solution. Stability-wise, I rate the solution a nine out of ten.

What do I think about the scalability of the solution?

It is a scalable solution. Scalability-wise, I rate the solution a nine out of ten. Scalability depends on the environment, but it can scale up in an on-premises environment. There are challenges with its scalability on the cloud.

My company deals with around seven customers who use the product.

How are customer service and support?

The technical team in my company deals with the product support team.

How was the initial setup?

The ease or difficulty of setting up the product depends on the customer's environment where the tool is deployed. If a banking, industrial, or retail firm is taken into consideration, the deployment process can range from simple to very complex, depending on the architecture, the size of the database maintained, and the applications to be hosted.

For Cloudera Distribution for Hadoop, one has to go through the usual deployment process, like for any software product. You have to have different environments before going into production, like pre-production environments, test and dev environments. You install and configure all the components in the test environment and then test them on the pre-production environment. Once UAT is done, you move them to the production environment. In general, it's a critical product deployed in a company.
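
The staged rollout described above (test, then pre-production with UAT, then production) can be sketched as a simple promotion gate. This is only an illustration under stated assumptions: the `promote` function, the stage names, and the check callbacks are hypothetical and are not part of Cloudera's tooling.

```python
from typing import Callable, Dict, List

# Hypothetical stage order, following the sequence described in the review.
STAGES: List[str] = ["test", "pre-production", "production"]


def promote(build: str, checks: Dict[str, Callable[[str], bool]]) -> List[str]:
    """Return the stages the build reached, stopping at the first failed gate.

    A build is deployed to a stage, the gate for that stage is run (for
    example component tests in "test" or UAT sign-off in "pre-production"),
    and the build moves on only if the gate passes.
    """
    reached: List[str] = []
    for stage in STAGES:
        reached.append(stage)
        gate = checks.get(stage)
        if gate is not None and not gate(build):
            break
    return reached


# Example: component tests pass, but UAT in pre-production fails,
# so the build never reaches production.
checks = {
    "test": lambda build: True,
    "pre-production": lambda build: False,
}
print(promote("build-42", checks))
```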

What's my experience with pricing, setup cost, and licensing?

The tool is expensive. It is not a cheap software tool, which is why only large enterprises that are mature enough and have a sufficiently complex architecture opt for Cloudera, as its ROI makes sense for such businesses. For the SMB market or customers whose environments are not that complex and do not have multiple systems running, Cloudera might not be a good option.

What other advice do I have?

Speaking about the security features of the tool, I feel that it is a very secure system, but I cannot comment more on it since I don't have a technical background. The product follows international security guidelines for handling PII and other kinds of regulated data for its end customers.

I recommend that those planning to use the solution examine their environment and its complexities. There are cheaper tools in the market, and not everybody is well suited to using Cloudera Distribution for Hadoop. Large enterprises with an on-premises architecture definitely need to have the tool. As most of our company's customers are now moving to the cloud, Cloudera's role in their environments has been reduced.

The benefits of the solution stem from the fact that it is a tool for big data management that can host multiple technologies. The other benefit of Cloudera is that you can use it to support your AI initiatives, since the tool can host different data warehouses or data lakes, which gives you the flexibility of hosting an AI solution on top of it. Customers can leverage the Cloudera platform for their AI initiatives. There has been an increase in hardware utilization over the years, so the servers, hardware, memory, IOPS, and CPU required need to be much more efficient than in the past.

I rate the tool a nine out of ten.

Disclosure: My company has a business relationship with this vendor other than being a customer: Reseller