2018-08-14T07:42:00Z
it_user434868 - PeerSpot reviewer
Senior Director of Delivery at a tech services company with 51-200 employees

What needs improvement with Apache Hadoop?

Please share with the community what you think needs improvement with Apache Hadoop.

What are its weaknesses? What would you like to see changed in a future version?

18 Answers
AM
Credit & Fraud Risk Analyst at a financial services firm with 10,001+ employees
Real User
Top 20
Sep 29, 2022

In terms of processing speed, I believe that this software, as well as the Hadoop-linked software, could be better. When analyzing massive amounts of data, you also want it to happen quickly, so faster processing is definitely an area for improvement. I am not sure about the cloud's technical aspects, or whether something in the cloud architecture makes it a little slow, but speed is one area. Second, the Hadoop-linked programs and software that are available could do much more, and do it much better, in terms of UI and UX. This is probably the one feature that most needs improvement, because the terminal and coding screen on Hadoop is a little outdated; it looks like an old C++ or BIOS screen. If the UI and UX were improved even slightly, I believe it would go a long way toward increasing adoption and effectiveness.

YT
Business data analyst at RBSG Internet operations
Real User
Sep 5, 2022

We plan to increase usage, and that is where we've realized that, with all these clusters running queries and analysis, we are facing some latency issues. I think more of the solution needs to focus on parallel processing and data retrieval.

MB
IT Expert at a comms service provider with 1,001-5,000 employees
Real User
Top 20
Jul 21, 2022

The price could be better. I think we would use it more, but the company didn't want to pay for it. Hortonworks doesn't exist anymore, and Cloudera killed the free version of its Hadoop distribution.

Juliet Hoimonthi - PeerSpot reviewer
Manager at Robi Axiata Limited
Real User
Top 5
Jul 5, 2022

What could be improved in Apache Hadoop is its user-friendliness. It's not very user-friendly, though maybe that's because I'm new to it. Sometimes it feels very tough to use, and that could be for one of two reasons: my own inexperience (for example, I don't know all the features of Apache Hadoop) or the limitations of the platform. For example, my team maintains the business glossary in Apache Atlas, but if you want to change any settings at the GUI level, advanced coding or programming has to be done in the back end, so it's not user-friendly.

DulalMali - PeerSpot reviewer
Data Analytics Practice head at bse
Real User
Top 20
Apr 27, 2022

Integrating Apache Hadoop with the many different technologies used within your business can be a challenge.

Donghan Kim - PeerSpot reviewer
e-Business Department Professor at MANU MEDITEC
Real User
Top 10
Jan 14, 2022

Apache Hadoop's real-time data processing is weak and not sufficient to satisfy our customers, so we may have to pick other products. We are continuously researching other solutions and vendors. Another technical weak point of this solution is that it's very difficult to run and to implement smoothly. Preparation and integration are important. In the next release, I would like to see better integration with other data-related products and solutions, as well as additional functions such as API connectivity.

Learn what your peers think about Apache Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: November 2022.
655,711 professionals have used our research since 2012.
DD
Partner at a tech services company with 11-50 employees
Real User
Top 20
Oct 5, 2021

Hadoop's security could be better.

GA
Founder & CTO at a tech services company with 1-10 employees
Real User
Top 20
Dec 8, 2020

I don't have any concerns, because each part of Hadoop has its use cases. To date, I haven't implemented a huge product or project using Hadoop, but at the level of POCs, it's fine. The Hadoop community is now a cluster, and I think there is room for improvement in the ecosystem. From the Apache or open-source community perspective, they need to add more capabilities to make life easier from a configuration and deployment standpoint.

SS
Technical Lead at a government with 201-500 employees
Real User
Oct 19, 2020

We use Apache Hadoop with our visualization tools, and it is very slow. It lacks a good query language, so we have to use Apache Linux. Even so, the query language still has limitations, there is only a little documentation, and many of the visualization tools do not have direct connectivity. It needs something like BigQuery, which is very fast. We need these capabilities to be available in the cloud and to be scalable. The solution needs to be powerful and offer better availability for gathering queries. The solution is also very expensive.

JP
Vice President - Finance & IT at a consumer goods company with 1-10 employees
Real User
Jul 14, 2020

I'm not sure I have any ideas on how to improve the product. Every year, the solution comes out with new features; Spark is one example. If they continue to release helpful new features, the value of the solution will keep increasing. Performance could always improve. This is a consistent requirement: whenever you run it, there is room for improvement in terms of performance. The solution also needs a better tutorial. Currently, there is only documentation, along with a lot of YouTube videos, but we didn't have much success learning that way. There needs to be better self-paced learning. We would prefer that users not just be pushed toward certification-based learning, as certifications are expensive; perhaps the certification could be offered at a lower cost. It currently costs around $2,500.

Abhik Ray - PeerSpot reviewer
Co-Founder at Quantic
Real User
Top 5
Feb 7, 2020

It would be helpful to have more information on how best to apply this solution in smaller organizations with less data, and how to grow the data lake.

it_user1093134 - PeerSpot reviewer
Technical Architect at RBSG Internet Operations
Real User
Dec 16, 2019

We're finding vulnerabilities when running it 24/7, and we're experiencing some downtime that affects the data. It would be good to have more advanced analytics tools.

it_user1208307 - PeerSpot reviewer
Practice Lead (BI/ Data Science) at a tech services company with 11-50 employees
Real User
Dec 16, 2019

We find that the solution runs slowly, possibly because it is open source and therefore not funded like products from bigger companies. The solution isn't as mature as SQL or Oracle and therefore lacks many features. The solution could use a better user interface; it needs a more effective GUI to create a better user environment.

YM
CEO at AM-BITS LLC
Real User
Nov 27, 2019

What needs improvement depends on the customer and the use case. For example, we consider classical Hadoop an old variant; most customers now work with flash data. The solution has a very wide range of applications, but in enterprise companies that work with classical BI systems, it would be good to include an additional presentation layer for BI solutions. There is a lack of virtualization and presentation layers, so you can't just take it and implement it as a ready-made solution.

LD
Data Scientist at a tech vendor with 501-1,000 employees
Real User
Sep 29, 2019

Hadoop itself is quite complex, especially if you want it running on a single machine, so getting it set up is a big mission. It seems that Hadoop is on its way out and Spark is the way to go; you can run Spark on a single machine, and it's easier to set up. In the next release, I would like to see Hive be more responsive for smaller queries, with reduced latency. I don't know whether this is viable, but if it is possible, I'd like lower latency on smaller queries for analysis and analytics. I would also like a smaller version that can be run on a local machine. There are installations that do that, but they are quite difficult, so a smaller version that is easy to install and explore would be an improvement.

MB
IT Expert at a comms service provider with 1,001-5,000 employees
Real User
Top 20
Jul 28, 2019

We are using HDTM circuit boards, and I worry about the future of this product and its compatibility with future releases. It's a concern because, for now, we do not have a clear upgrade path. Hadoop is now at version three, and we'd like to upgrade to it, but as far as I know, it's not a simple thing. Many features of this product are open source, so if something isn't included in the distribution, we are not limited: we can take things from the internet and integrate them. For example, we are using Presto, which isn't included in HDP (Hortonworks Data Platform), and it works fine. Not everything has to be included in the release. If something is outside of HDP and it works, that is good enough for me; we have the flexibility to incorporate it ourselves.

Real User
Jul 16, 2019

We would like more flexibility in merging this machine data with other internal data to derive more meaning from it.

Samuel  Feinberg - PeerSpot reviewer
Analytics Platform Manager at a consultancy with 10,001+ employees
Real User
Aug 14, 2018

In general, Hadoop has a lot of different component parts to the platform, such as Hive and HBase, and they all evolve somewhat independently and somewhat in parallel. As you look to platforms in the cloud or to walled-garden offerings like Cloudera or Azure, you see that a third party can make sure all the components work together before they are used for business purposes. That removes a layer of administration, configuration, and technical support. I would like to see more direct integration of visualization applications.

Related Questions
Tomasz Rabong - PeerSpot reviewer
Client Engagement Leader at Sanmargar Team
Apr 20, 2022
Hello peers, I am looking for a data catalog, from a vendor or open source, that supports the following DB data sources (Teradata, MS SQL, HANA, Hadoop/Hive) and BI data sources (SAP BO 4.0, Tableau Server 2022.1). Can you please advise? I appreciate the help.
Evgeny Belenky - PeerSpot reviewer
Director of Community at PeerSpot (formerly IT Central Station)
Apr 18, 2022
Hi @Delmar Assis, @Angel Pineda, @reviewer1318779, @George McGeachie, @Carel Van Der Merwe and @Moorthy Natarajan, can you please assist @Tomasz Rabong with their question?
Leandro Sodré - PeerSpot reviewer
Data Governance Specialist at Keyrus
Apr 18, 2022
Hi Tomasz Rabong, I believe that if you have a developer team, it would be possible with Amundsen. Alternatively, you can look at Informatica EDC or at Data Virtualization Data Catalog (from Denodo).
it_user1272297 - PeerSpot reviewer
Special Adviser Strategy at a university with 501-1,000 employees
Apr 19, 2020
I am currently working as a Special Strategic Adviser, involved in strategic risk management analysis and mitigation actions. We are currently evaluating SQream Technologies' SQream DB. Does anybody have experience with it and can attest to it being the best RDBMS vendor for big data of 30TB+? Are there any other RDBMS solutions for big data that I should be evaluating? Thanks! I ap...
Russell Rothstein - PeerSpot reviewer
CEO at PeerSpot (formerly IT Central Station)
Jan 27, 2020
Morten, the most popular comparisons of SQream can be found here: https://www.itcentralstation.com/products/sqream-db-alternatives-and-competitors The top ones include Cassandra, MemSQL, MongoDB, and Vertica.
CD
Data Architect at a tech services company with 201-500 employees
Jan 27, 2020
I haven't used SQream personally. However, if you are only considering GPU-based RDBMSs, please check the following: https://hackernoon.com/which-gpu-database-is-right-for-me-6ceef6a17505
Related Articles
Netanya Carmi - PeerSpot reviewer
Content Manager at PeerSpot (formerly IT Central Station)
Apr 26, 2022
Top 5 Data Warehouse Tools 2022
PeerSpot’s crowdsourced user review platform helps technology decision-makers around the world to better connect with peers and other independent experts who provide advice without vendor bias. Our users have ranked these solutions according to their valuable features, and discuss which features they like most and why. You can read user reviews for the Top 5 Data Warehouse Tools to help you d...