
What Data Science Platform is best suited to a large-scale enterprise?

Rony_Sklar - PeerSpot reviewer
Community Manager at PeerSpot (formerly IT Central Station)

Hello community members,

There are many Data Science Platforms available. Which platform would you recommend that can handle large amounts of data? Why?


Ziad Chaudhry - PeerSpot reviewer

Dataiku is a great general-purpose data science platform for both supervised and unsupervised learning. It handles Big Data very well.

Rony_Sklar - PeerSpot reviewer
Community Manager

@Ziad Chaudhry thanks for your input :)

Anastasia Ant - PeerSpot reviewer

@Ziad Chaudhry I'd also vote for Dataiku. Take a look at their cases.

AaronCooke - PeerSpot reviewer
Real User

SparkCognition's Darwin product can handle very large data sets.

Russell Rothstein - PeerSpot reviewer
Community Manager

@AaronCooke Did you compare it with any other solutions? What are the alternatives for large data sets? BTW, thank you for sharing your review of Darwin with the community!

Rony_Sklar - PeerSpot reviewer
Community Manager

Thanks for your input @AaronCooke :)

Djalma Gomes, Pmp, Mba - PeerSpot reviewer
Top 5, Vendor

"Data science platform" is a vague term.

It all depends on what you wish to accomplish. Are you talking about fast databases, ETL, a machine learning tool, integration with R or Python, a self-service data visualization tool, or collaboration? No one size fits all.

Jinhyung Cho - PeerSpot reviewer

Dataiku, Domino, and RapidMiner are notable candidates for your purpose, I presume.

It has been two years since I evaluated several vendors and shortlisted these as candidates. They all support large-scale data manipulation for data analysis and machine learning development, and each works as a platform that many people can use collaboratively.

Laurence Moseley - PeerSpot reviewer
Top 5 Leaderboard, Real User

I suspect that I cannot answer this. I have used Knime and RapidMiner with data sets of up to about 80,000 rows and 1,500 columns, and both have performed well. However, I doubt that the questioner would classify my usage as "large amounts of data". If their usage is like mine, then both packages can be recommended.

Both Knime and RapidMiner can link with Python or R, and those languages have modules or methods that offer better performance on large data sets (multi-processing, GPUs, etc.), so those combinations might serve the purpose. For example, one might use Knime for ease of use and R for the extra power, or RapidMiner together with Python.
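
To illustrate the multi-processing point, here is a minimal Python sketch using only the standard library. It is not anything that Knime or RapidMiner ships. It assumes the data is already in memory as a plain list of numeric rows, and the helper names (`split_rows`, `chunk_column_sums`, `merge_sums`, `parallel_column_sums`) are made up for this example. The idea is simply to split the rows into chunks, summarise each chunk in a separate process, and then merge the partial results.

```python
from multiprocessing import Pool


def split_rows(rows, n_chunks):
    """Split a list of rows into at most n_chunks contiguous slices."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def chunk_column_sums(chunk):
    """Sum each column of one chunk; this runs inside a worker process."""
    return [sum(col) for col in zip(*chunk)]


def merge_sums(partials):
    """Add the per-chunk column sums together into one list."""
    return [sum(vals) for vals in zip(*partials)]


def parallel_column_sums(rows, workers=4):
    """Compute column sums by farming chunks out to a process pool."""
    chunks = split_rows(rows, workers)
    with Pool(processes=workers) as pool:
        partials = pool.map(chunk_column_sums, chunks)
    return merge_sums(partials)


if __name__ == "__main__":
    # Hypothetical toy data: 1,000 rows by 5 columns.
    rows = [[r + c for c in range(5)] for r in range(1000)]
    print(parallel_column_sums(rows))
```

For a task this small the cost of starting processes outweighs the gain; the pattern only pays off when the per-chunk work is heavy. When the data does not fit in memory at all, a distributed engine is the more usual route.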

Hyundong Lee - PeerSpot reviewer

If you want to handle computer vision data, I recommend the Superb AI Suite.

Yogesh PARTE - PeerSpot reviewer

The question also needs to specify the domain, the kind of data, and whether public or private platforms are in scope.

For structured/tabular data, Driverless AI / Sparkling Water is my preferred platform.

Rony_Sklar - PeerSpot reviewer
Community Manager

@Yogesh PARTE Good point - this is a more general question, but I do agree that it's easier to make recommendations with more details. Would you mind sharing more about why Sparkling Water is your preferred choice in this instance?

LaurenceMoseley - PeerSpot reviewer
Top 5, Real User

My experience has not been on large-scale systems, not even multi-terabyte ones. My multi-megabyte data sets would not help. Sorry!

EzzAbdelfattah - PeerSpot reviewer
Top 5 Leaderboard, Real User

IBM SPSS Modeler

Russell Rothstein - PeerSpot reviewer
Community Manager

@EzzAbdelfattah Why do you recommend IBM SPSS Modeler?

WalisonAbreu - PeerSpot reviewer
Real User

@EzzAbdelfattah IMHO it's quite limited and too outdated to handle the features of the latest frameworks.
