
Microsoft Parallel Data Warehouse Overview

Microsoft Parallel Data Warehouse is the #12-ranked solution among top Data Warehouse tools. PeerSpot users give Microsoft Parallel Data Warehouse an average rating of 8 out of 10. It is most commonly compared with Microsoft Azure Synapse Analytics. Microsoft Parallel Data Warehouse is popular among the large enterprise segment, which accounts for 70% of users researching this solution on PeerSpot. The top industry researching this solution is computer software, accounting for 26% of all views.
Buyer's Guide

Download the Data Warehouse Buyer's Guide including reviews and more. Updated: June 2022

What is Microsoft Parallel Data Warehouse?

The traditional structured relational data warehouse was never designed to handle the volume of exponential data growth, the variety of semi-structured and unstructured data types, or the velocity of real time data processing. Microsoft's SQL Server data warehouse solution integrates your traditional data warehouse with non-relational data and it can handle data of all sizes and types, with real-time performance.

Microsoft Parallel Data Warehouse was previously known as Microsoft PDW, SQL Server Data Warehouse, Microsoft SQL Server Parallel Data Warehouse, MS Parallel Data Warehouse.

Microsoft Parallel Data Warehouse Customers

Auckland Transport, Erste Bank Group, Urban Software Institute, NJVC, Sheraton Hotels and Resorts, Tata Steel Europe

Archived Microsoft Parallel Data Warehouse Reviews (more than two years old)

Mohammed Sleem - PeerSpot reviewer
EDRMS Practice Lead at a tech services company with 51-200 employees
Real User
Collects data through SSIS packages from different sources and puts them all in one data repository
Pros and Cons
  • "Collecting the data through SSIS packages from different sources and putting them all in one data repository is the most powerful thing. While others have this feature, they don't have the simplicity or ease of use when getting a resource and knowing everything about it."
  • "I would like the tool to support different operating systems."

What is our primary use case?

We provide this solution as a service to customers. Sometimes, it is used simply as a data warehouse. However, a couple of our customers use the data warehouse to collect information from everywhere, e.g., from their subsidiaries, such as electric or petrol companies. We are also developing some dashboards using the data warehouse. Sometimes we use Microsoft BI, but most of the time we use Qlik. It depends on the customer's needs at the end of the day.

Until now, we have been limited to on-premises deployments. There is some sensitive information involved, like the billing system for an entire country's electricity. However, because of the current circumstances with COVID-19, we are discussing with them how to move it to the cloud. Because of the regulation that government data cannot be hosted on a public cloud, we need to host it on a local cloud at the moment; we cannot use AWS or similar services as of now. While we do have local cloud vendors that we are working with, this is a work in progress.

What is most valuable?

The most valuable aspects are:
  • Availability of resources everywhere
  • The cost of resources
  • The information and knowledge are out there
  • It is easy to use; you can do what you want with the tool
  • It integrates well with SQL Server

Collecting the data through SSIS packages from different sources and putting them all in one data repository is the most powerful thing. While others have this feature, they don't have the simplicity or ease of use when getting a resource and knowing everything about it. You can collect data from Oracle databases, SQL databases, or file systems and save it in one location, building cubes easily with SSIS packages.
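The consolidation pattern the reviewer describes, pulling rows from an Oracle database, a SQL database, and flat files into one repository, can be sketched in plain Python. This is a minimal illustration only; the source names and columns are hypothetical stand-ins for real connections:

```python
import csv
import io

# Hypothetical in-memory stand-ins for an Oracle extract, a SQL Server
# extract, and a flat file -- the three source types the reviewer mentions.
oracle_rows = [{"id": 1, "amount": 120.0}]
sql_rows = [{"id": 2, "amount": 75.5}]
flat_file = io.StringIO("id,amount\n3,42.25\n")

def load_flat_file(handle):
    """Parse a CSV handle into dicts with typed columns."""
    return [{"id": int(r["id"]), "amount": float(r["amount"])}
            for r in csv.DictReader(handle)]

def consolidate(*sources):
    """Merge rows from every source into one repository, tagging origin."""
    repository = []
    for name, rows in sources:
        for row in rows:
            repository.append({**row, "source": name})
    return repository

repo = consolidate(("oracle", oracle_rows),
                   ("sqlserver", sql_rows),
                   ("file", load_flat_file(flat_file)))
print(len(repo))  # 3 rows in one repository
```

In the real product, the equivalent work is done by SSIS data-flow tasks rather than hand-written code; the sketch only shows the shape of the consolidation.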

What needs improvement?

The setup has room for improvement. Some customers want to go to non-Windows servers, but having the entire SQL Server stack run there is an obstacle for us, as the configuration of Microsoft BI with cubes is a bit complicated. I would like the integration to be offered in a simpler way, specifically the integration between Microsoft SQL cubes and Parallel Data Warehouse and other members of the Microsoft family, such as Microsoft Power BI and SharePoint.

This tool needs a lot of memory for processing. As it does a lot of jobs, it may need this memory while other tools don't need that much. It would help if they improved the memory and hardware usage of the tool in order to support query performance.

I would like the tool to support different operating systems, and I want a dashboard.

For how long have I used the solution?

We have been using it for five to six years.

What do I think about the stability of the solution?

It is stable. We have migrated millions to billions of data points in cubes from system to system while running data cleansing conditions. Applying constraints can be time consuming, which is why people may want to go to another server. When you inject equations and conditions, it consumes a lot of time, though I have never compared this time consumption to other tools.
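The "cleansing conditions injected during a migration" idea can be pictured as a chain of row-level rules that each record must pass, which is also why adding more conditions slows a large load down. The rules, column names, and rows below are hypothetical:

```python
# Hypothetical cleansing rules of the kind injected during a migration;
# every rule must hold for a row to survive the load.
rules = [
    lambda r: r["amount"] is not None,   # reject nulls
    lambda r: r["amount"] >= 0,          # reject negative amounts
    lambda r: len(r["account"]) == 10,   # enforce a fixed account format
]

rows = [
    {"account": "ACCT000001", "amount": 10.0},
    {"account": "ACCT000002", "amount": None},   # fails the null check
    {"account": "BAD", "amount": 5.0},           # fails the format check
]

# all() short-circuits, so later rules never see rows an earlier rule rejected.
clean = [r for r in rows if all(rule(r) for rule in rules)]
print(len(clean))  # 1 row survives cleansing
```

Each extra rule is one more pass of per-row work, which matches the reviewer's observation that injected conditions make the migration noticeably slower.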

What do I think about the scalability of the solution?

I have no concern about the scalability. We have never faced any problems regarding the scalability. If we wanted to extend this, it is extendable.

How are customer service and support?

The online help is good enough along with our technical expertise. 

How was the initial setup?

If everything is available, such as information knowledge sources and connections, I can have the solution deployed in five days.

What about the implementation team?

I recommend getting assistance from an integrator or consultant with the integration.

What's my experience with pricing, setup cost, and licensing?

Microsoft has an agreement with the government in our country, so our customers get their licensing costs from the Ministry. Whenever we work with any government company or government institute, which is mainly what we are doing, the license comes directly from the Ministry of Technology and Information. We actually do not know that much about the price of Microsoft tools. Even if I ask for the price, I'm asking as a partner.

What other advice do I have?

We have had a good experience working with this tool. If you don't have a problem with memory and have good processors, then you won't have an issue. However, depending on your needs, you may want to go check out other tools. I would rate the tool as an eight out of 10.

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner.
PALAKSURI - PeerSpot reviewer
Software Engineer at Tech Mahindra Limited
Real User
Easy to set up and deploy, but it can be difficult to clean large amounts of data
Pros and Cons
  • "We are able to monitor daily jobs, so if there is anything that needs to be done then we can do it."
  • "If the database is large with a lot of columns then it is difficult to clean the data."

What is our primary use case?

This is a solution that we implement for our clients. We provide end-to-end development, maintenance, enhancement, testing, and support. Everything is done by us.

How has it helped my organization?

We are able to monitor daily jobs, so if there is anything that needs to be done then we can do it. If there are discussions with our clients about changes that need to be made then it is us who implements them.

What is most valuable?

The most valuable feature is that we can deploy directly to the production environment, rather than having to go through a development environment first.

What needs improvement?

If the database is large with a lot of columns then it is difficult to clean the data.

For how long have I used the solution?

I have been using the Microsoft Parallel Data Warehouse for two years.

What do I think about the stability of the solution?

The Microsoft Parallel Data Warehouse is a stable solution.

What do I think about the scalability of the solution?

We have not had problems with scalability.

How was the initial setup?

The initial setup is not difficult. It is easy and we are fine with that.

What about the implementation team?

We deploy this solution for our clients.

What other advice do I have?

I am not comfortable with the Microsoft Parallel Data Warehouse at the moment.

I would rate this solution a six out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
AnupNair - PeerSpot reviewer
Senior Principal Consultant at a tech services company with 1,001-5,000 employees
Consultant
A good interface and BI functionality, but scalability is an area that needs improvement
Pros and Cons
  • "The most valuable feature is the business intelligence (BI) part of it."
  • "This solution would be improved with an option for in-memory data analysis."

What is our primary use case?

I am a consultant and I assist in implementing this solution for our clients.

The projects that we are working on right now are related to finance, banking, and government.

What is most valuable?

The most valuable feature is the business intelligence (BI) part of it. BI includes decision trees and similar functionality. This happens post-building of the data warehouse when you start using the analysis features.

The user interface is definitely good.

What needs improvement?

This solution would be improved with an option for in-memory data analysis.

Scalability can be improved because there are other tools that perform better with very large datasets.

For how long have I used the solution?

We have been working with Microsoft Parallel Data Warehouse for more than five years.

What do I think about the stability of the solution?

This is a stable solution and we haven't had any issues. Even if we encounter issues, there is a lot of help available. If issues are identified then we will have a quick turnaround time for fixing them.

What do I think about the scalability of the solution?

There are some tools that are more scalable than this one, and the performance is better as well. This is an area that could use improvement, especially if you have a huge set of data.

We have about 50 analysts and another 30 or 40 regular users who work with this solution every day.

How are customer service and technical support?

We have been in contact with technical support in India and we are satisfied with them.

Which solution did I use previously and why did I switch?

We are using Pentaho Data Warehouse in addition to this solution. We select which one is implemented depending on the customer's request and what they want to do.

How was the initial setup?

The initial setup is straightforward. There is a lot of help available to assist with each step, so it's pretty easy.

The deployment took longer than expected because of continuous improvement and upgrades that were needed for the projects. Once we started and got a handle on how this solution could benefit our projects, we had to incorporate those new ideas. In total, it took between one and a half and two years to deploy.

What about the implementation team?

We implemented this solution with our in-house team.

What other advice do I have?

This is a solution that has good performance and I recommend it. The support from Microsoft is also another thing that makes the Parallel Data Warehouse a good option.

The biggest lesson that we have learned from using this solution is that customers are most interested in a quick project turnaround time, which is something that Parallel Data Warehouse provides.

This is a good solution but there is always room for improvement.

I would rate this solution a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Vice President at a tech consulting company with 201-500 employees
Real User
Very scalable, good stability, and a straightforward initial setup

What is our primary use case?

We primarily use the solution to build up our server analytics.

What is most valuable?

The data transmissions between the data models is the most valuable feature.

The database is enormous, so much so that we haven't exploited it completely.

What needs improvement?

I don't have any recommendations for new features at this time.

For how long have I used the solution?

I've been using the solution for about one year.

What do I think about the stability of the solution?

The stability of the solution is good. We haven't had any issues with it.

What do I think about the scalability of the solution?

The solution is very scalable.

How are customer service and technical support?

My understanding is that the team's experience with technical support has been good.

Which solution did I use previously and why did I switch?

We didn't previously use a different solution. Our customers chose this solution.

How was the initial setup?

The initial setup was straightforward. We had a knowledgeable team working on the implementation. Deployment took about six to eight weeks.

What other advice do I have?

We use the on-premises deployment model.

I'd rate the solution eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Team Lead at a computer software company with 10,001+ employees
Real User
Good reporting functionality for historical and real-time data, but the initial setup is complex
Pros and Cons
  • "The most valuable feature of this solution is performance."
  • "The reporting for certain types of data needs to be improved."

What is our primary use case?

I am using the Microsoft Parallel Data Warehouse to work with information services and datasets and to create reports. It is used primarily for our historical data, but we do some in real time as well. We can receive data weekly, nightly, or even hourly.

What is most valuable?

The most valuable feature of this solution is performance. It reduces the time it takes to process data.

What needs improvement?

The initial setup of this solution is complex and can be improved.

Sometimes you need to re-run the reports because they don't work.

The reporting for certain types of data needs to be improved.

For how long have I used the solution?

I have been working with these kinds of services for the past five to ten years.

What do I think about the stability of the solution?

The stability depends on your data and whether it contains nulls. If the data is set up properly, then it will be more stable.

What do I think about the scalability of the solution?

It is definitely scalable. We have more than 220 users in multiple cities. Most of the people, perhaps 70%, are information technology users. The remaining 30% are business analysts who are concerned only with reports.

We do plan on increasing our usage. We recently converted all of our Tableau data, and now we are working on permissions and security.

How are customer service and technical support?

The technical support from Microsoft is good.

How was the initial setup?

The initial setup is definitely complex.

We are not always aware of where the data comes from because it is distributed. Sometimes the data is fine and sometimes it needs to be set up. Once it is configured then it runs fine in parallel.

The length of time for deployment depends on the size of the data and whether we were also implementing a data warehouse or a data mart. When I was implementing this solution and copying data, it took between six and eight months. That is end-to-end with everything finished.

If it is a small data mart then it may only take a month.

What about the implementation team?

Our in-house team, including our DBA, handled the deployment.

What other advice do I have?

My advice for anybody who is implementing Microsoft Parallel Data Warehouse is to be aware of how big the data is. You have to be sure of how to secure and take care of the data. You have to think about scalability.

I would rate this solution a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Mukul-Jain - PeerSpot reviewer
BI Developer at a healthcare company with 51-200 employees
Real User
Easy to import data from various formats and retrieve it when required
Pros and Cons
  • "One of the most important features is the ease of using MS SQL."
  • "I think that the error messages need to be made more specific."

What is our primary use case?

I have mostly worked with analytics, so I normally use SQL to put in data for analysis. I have worked on building dashboards as well.

What is most valuable?

One of the most important features is the ease of using MS SQL. It has been easy to do exactly what I wanted. For example, it was very easy to pull the data from, and it is also very easy to import data from CSV or Excel. If I needed a specific set of data from the master then it was easy to get it.
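The ease of importing a CSV extract and pulling a specific subset from the master, as described above, can be sketched in a few lines of Python. The file contents and column names here are hypothetical:

```python
import csv
import io

# Hypothetical master extract; in practice this would be a CSV or Excel
# export rather than an inline string.
master_csv = io.StringIO(
    "patient_id,region,visits\n"
    "101,north,4\n"
    "102,south,2\n"
    "103,north,7\n"
)

rows = list(csv.DictReader(master_csv))

# Pull a specific subset from the master, as the reviewer describes.
north = [r for r in rows if r["region"] == "north"]
total_visits = sum(int(r["visits"]) for r in north)
print(len(north), total_visits)  # 2 rows, 11 visits
```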

What needs improvement?

I think that the error messages need to be made more specific. Recently, I was working with a large amount of data, and I received a runtime error. I had to solve it by running the query several times, removing data each time. If the error messages were less ambiguous then it would have helped.

For how long have I used the solution?

I have been using this solution for about one year.

What do I think about the stability of the solution?

The stability is good.

What do I think about the scalability of the solution?

The scalability is really good. I know that we don't have to worry about scalability because one time at my former company, we almost doubled our size. In fact, the number of staff doubled and we were able to migrate very easily. The additional data was very easy to accommodate.

In my previous company, my team had 35 users. At one point, during our migration, we had about 50 users who were working extensively with the solution.

How are customer service and technical support?

Funnily enough, neither myself nor anybody in my team has had to contact technical support. Whenever we have had a problem, we have always found a solution on Google or in a YouTube video. When we get stuck and search with Google, we usually find ten similar problems where people have commented on how to fix them.

How was the initial setup?

I was not part of the initial setup, although I assisted with linking our ERP systems to SQL and that is pretty straightforward. There was not much complication in terms of linking the data and how it was being stored.

What other advice do I have?

My advice for anybody who is implementing this solution is not to overload your server with all of the data that you don't use on a day-to-day basis. It is very easy to pull the data that you need whenever it is required. You will have to decide how to store the data and how easy it is for your client to send new data. Neither of these things should be difficult to do, even for newcomers to SQL. Sometimes, you will have to have a senior person demonstrate something basic, like how to pull data, and then people usually pick it up.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Product Director at a insurance company with 10,001+ employees
Real User
Good querying capabilities and it is quite easy to scale

What is our primary use case?

The primary use cases for this solution are for recording and transactions.

What is most valuable?

The most valuable feature for me is querying.

What needs improvement?

I would like to see better visualization features.

A stateless update functionality for the forms may help. Without this, you have to perform updates manually using the drop-down menu.

The user interface should be more user-friendly.

For how long have I used the solution?

I have been using the SQL Data Warehouse for fifteen years.

What do I think about the stability of the solution?

The stability of this solution depends on the setup done by the outsourcing team. There are a lot of things to consider, including the connection.

It is constantly being used with transactions occurring every minute.

What do I think about the scalability of the solution?

It is quite easy to scale this solution. The whole company of approximately 2,500 people uses it.

Which solution did I use previously and why did I switch?

Prior to using this solution, we used Oracle. We switched because my daily requirements are on Microsoft SQL. 

How was the initial setup?

I find the setup of this solution to be simple, but that is based on fifteen years of working with it.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Elham-Gharegozloo - PeerSpot reviewer
Senior System Analyst at a tech services company with 1,001-5,000 employees
Real User
Top 10
Good performance and usability with a simple interface
Pros and Cons
  • "The most valuable features are the performance and usability."
  • "More tools to help designers should be included."

What is our primary use case?

We use this solution for keeping track of sales, goods, times of shipping, and other information. It is used for our KPIs.

How has it helped my organization?

This solution helps the higher levels of the organization because they have better visibility of the whole company. This helps with decision making in terms of what should be improved or implemented.

What is most valuable?

The most valuable features are the performance and usability.

Microsoft Parallel Data Warehouse is simple to use, and the user interface is intuitive.

What needs improvement?

More tools to help designers should be included.

For how long have I used the solution?

I have been using this solution for about two years.

What do I think about the stability of the solution?

This is a stable solution. Once your data is stable, the data warehouse is stable too. If the structure of the data changes then you can't change the warehouse to add or delete fields or columns.

What do I think about the scalability of the solution?

Scalability depends on how you design the warehouse. It can be scalable, but it depends on how much data you have to put into it.

We have about six hundred users.

How are customer service and technical support?

Unfortunately, we are in a country that has a limitation that means we cannot contact Microsoft directly. Most of the time we use Google and I can help myself to solve problems.

Which solution did I use previously and why did I switch?

Prior to Microsoft Parallel Data Warehouse, we used a standard relational SQL database. We switched because of the performance and because the data and KPIs changed; it would have been difficult to continue with a relational database, and a data warehouse solution performed acceptably.

How was the initial setup?

The initial installation and setup were easy and we did not have any issues. The deployment was completed within hours.

What about the implementation team?

We deployed using our in-house people.

What other advice do I have?

My advice to anybody who is implementing this solution is to design the databases as well as they can because it is difficult to make changes in the future. It is also important to have a time field in your data in case you want to use it in the future as a reference.

This is a good solution but all software can be improved and made better.

I would rate this solution an eight out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Microsoft Dynamics Specialist at a computer software company with 51-200 employees
Real User
Top 20
It's well-priced, extremely stable and the technical support is very good
Pros and Cons
  • "I am very satisfied with the customer service/technical support."
  • "In the future I would love to see a slightly better automation engine, just for the data integration layer, to make it slightly easier for end-users or junior developers to get involved in incremental updating."

What is our primary use case?

I work for a Microsoft partner. We're one of the Gold partners, so we implement on their databases. We are also Dynamics 365 specialists, and I'm a Business Intelligence consultant, so I do SQL, Power BI, Azure SQL, SQL Data Warehouses, and a few others.

What is most valuable?

What I like most about this solution is that you can add capacity behind your data modeling and speed up the process before it goes into the cube holding section. That's a great integration layer. You can collect all your data, and then that becomes a staging database for other models, where you can report directly with Power BI, Excel, or other applications, and, more specifically, data cubes with Microsoft Analysis Services.

What needs improvement?

Something that needs to improve is the integration layer itself when connecting to other, non-Microsoft layers. But I don't know if that can be improved, due to the complexity of the data being connected to. I think they could also look at a way to do incremental updates, as it works slightly differently.

For how long have I used the solution?

I have been using this program for 21 years in total.

What do I think about the stability of the solution?

The program is very stable - even during power outages it only takes a few minutes to be up and running again.

What do I think about the scalability of the solution?

I think Microsoft is the most scalable vendor in terms of data warehousing, elastic pools, and elastic servers, even with Power BI connecting to parallel data warehouses. Those can scale up pretty large, and if you really want to, you can move into data lakes.

How are customer service and technical support?

I am very satisfied with the customer service/technical support. We work for a Microsoft company, so we've got a direct line to Microsoft, and because I am also an ambassador for one of the other larger event companies, I have great connectivity - I speak to some of the black belts at Microsoft!

How was the initial setup?

The initial setup was quite complex. Deployment time depends on all the components specified. We've had deployments that took a couple of weeks, and we've had deployments that were spread out over multiple years, because we cover 60 countries on six continents.

What's my experience with pricing, setup cost, and licensing?

I think the program is well-priced compared to the other offerings that are out in the market.

What other advice do I have?

On a scale from one to 10, I rate this solution a nine. In the future I would love to see a slightly better automation engine, just for the data integration layer, to make it slightly easier for end-users or junior developers to get involved in incremental updating.

The biggest part that we battle with, in terms of costing and explaining to people why it takes so long to develop some of those things, is just getting the data into the actual data warehouse and automating that. It's purely an integration layer to actually get the data into the data warehouse.

People need to do their research very well to understand the terminology and the technology when they speak to people that are technically inclined, because there's a lot of miscommunication in terms of what they expect from the program and what's delivered at the end of the day.

The biggest lessons I've learned through the years are that Microsoft is probably the largest research company there is. So people should stick to people that know what they're doing, and Microsoft definitely has some very, very capable people designing these products.

And that's probably why I've stayed with Microsoft so long. I've actually tried out a few other suppliers, but I always go back to Microsoft.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
BI Business Analyst at a transportation company with 1,001-5,000 employees
Vendor
It handles high volumes of data very well, though it needs more compatibility with common BI tools

What is our primary use case?

Analysing large volumes of data collected from auto ticket barriers at railway stations.

How has it helped my organization?

It has allowed fast daily loads and analysis of millions of rows of data, which eventually moved to near real-time.

What is most valuable?

It handles high volumes of data very well.

What needs improvement?

It needs more compatibility with common BI tools. 

It does not work well with normal ETL tools. Some functions do not work.

For how long have I used the solution?

One to three years.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Teradata DBA / Parallel datawarehouse DBA at a tech services company with 10,001+ employees
Real User
Concurrency issues forced the customer to use the raw DB as a secondary solution

What is our primary use case?

We are using PDW as an EDW solution.

How has it helped my organization?

It helped initially as a replacement for our DW database, but we later faced concurrency issues, which forced the customer to use the DB as a secondary solution.

What is most valuable?

Nothing specific; it is comparable to other solutions.

What needs improvement?

Concurrent queries are limited to 32, making it more of a data storage mechanism instead of an active DWH solution.
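The 32-query ceiling behaves like a fixed pool of concurrency slots: queries beyond the limit queue until a slot frees up. A toy Python model of that behavior (thread counts and timings are arbitrary, only the cap is taken from the review):

```python
import threading
import time

MAX_CONCURRENT = 32  # the concurrent-query ceiling the reviewer describes
slots = threading.BoundedSemaphore(MAX_CONCURRENT)
lock = threading.Lock()
active = 0
peak = 0

def run_query(query_id):
    """Acquire a slot, 'execute', release; excess queries block and wait."""
    global active, peak
    with slots:  # queries beyond 32 wait here for a free slot
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # stand-in for query execution
        with lock:
            active -= 1

threads = [threading.Thread(target=run_query, args=(i,)) for i in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak)  # never exceeds 32
```

This is why the reviewer calls it "more of a data storage mechanism": a highly concurrent reporting workload serializes behind the slot pool instead of running freely.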

They also need to retain the captured metadata for a longer duration.

For how long have I used the solution?

One to three years.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user694689 - PeerSpot reviewer
Business Intelligence evangelist at a hospitality company with 10,001+ employees
Vendor
Gives us the ability to distribute large data sets across nodes.

What is most valuable?

MPP processing gives us the ability to distribute large data sets across nodes.
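The MPP distribution the reviewer values works by hashing each row's distribution column to pick an owning compute node, so a large table is spread roughly evenly. A minimal sketch, with a hypothetical node count and table:

```python
import hashlib

NODES = 8  # hypothetical compute-node count

def node_for(distribution_key):
    """Hash-distribute a row by its distribution column, as an MPP
    appliance does, so each node holds roughly 1/NODES of the data."""
    digest = hashlib.md5(str(distribution_key).encode()).hexdigest()
    return int(digest, 16) % NODES

rows = [{"order_id": i, "amount": i * 1.5} for i in range(10_000)]
shards = {n: [] for n in range(NODES)}
for row in rows:
    shards[node_for(row["order_id"])].append(row)

sizes = [len(s) for s in shards.values()]
print(min(sizes), max(sizes))  # shards come out roughly balanced
```

Because the hash is deterministic, every row has exactly one home node, and a scan or aggregate can run on all nodes in parallel over their local shards.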

How has it helped my organization?

We delivered a data warehouse for Contactless and Oyster at TFL.

What needs improvement?

Improve the speed of processing replicated tables.

For how long have I used the solution?

We have been using this solution for years.

What do I think about the stability of the solution?

There were stability issues when the product was in beta.

What do I think about the scalability of the solution?

There were scalability issues in that there is a limit to 32 concurrent queries.

How are customer service and technical support?

Technical support is good.

Which solution did I use previously and why did I switch?

We switched from standard SQL Server 2014. We use PDW to improve overall ETL and reporting performance.

How was the initial setup?

The initial setup was complex. It was fairly challenging to migrate from SQL Server to PDW.

What other advice do I have?

Make sure your data volumes are very large: At least 60 million rows per table.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user347586 - PeerSpot reviewer
Solution Architect at a comms service provider with 11-50 employees
Real User
It has inherent Hadoop integration that can reference HDFS by means of external tables, but there's a bottleneck if there are many partitions in the baseline table.

What is most valuable?

The bulk data-load feature with fast response is one of the most intriguing features. In addition, the clustered columnstore index boosts OLAP query performance significantly.

How has it helped my organization?

We have been using this product for two years for large-scale big data analytics. It has inherent Hadoop integration that can reference HDFS by means of external tables. Large-scale historic data retention for business function improvement is therefore quite easy, which boosts customer confidence.

What needs improvement?

It supports partitioning to improve query performance. However, this becomes a bottleneck if there are many partitions in the baseline table, and the underlying query performance degrades significantly.

For how long have I used the solution?

I've used it for two years.

What do I think about the stability of the solution?

We have not faced any major issues with this product.

What do I think about the scalability of the solution?

Unlike other MPPs, such as Netezza, this requires excellent expertise in SQL to reap the benefits of using it.

How are customer service and technical support?

Customer Service:

7/10

Technical Support:

7/10

Which solution did I use previously and why did I switch?

I have used Netezza as an MPP solution for around eight years, and from my personal experience, I believe Netezza has an edge over PDW due to user-friendliness.

How was the initial setup?

The initial setup was straightforward, as it is an appliance. The Microsoft support team worked closely with us, and the project team is involved during the initial setup process, so an optimal installation is ensured.

What about the implementation team?

We always do it in collaboration with the vendor team, and thus ensure we get the best out of our investment.

What was our ROI?

We are happy with the ROI. There is significant scope for improvement in this area.

What's my experience with pricing, setup cost, and licensing?

Pricing and licensing is competitive.

Which other solutions did I evaluate?

Any potential customer should explore PDW along with other MPP solutions before making a final decision on their OLAP analytics. We evaluated Teradata, but as we had more expertise in SQL Server, we were intrigued by PDW, and it finally emerged as the winner.

What other advice do I have?

This is a great product for big data analytics as it can challenge other MPPs quite well.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Senior Data Architect at a pharma/biotech company with 1,001-5,000 employees
Vendor
Microsoft PDW History

Originally published at https://www.linkedin.com/pulse/microsoft-pdw-history-datallegro-stephen-c-folkerts

Microsoft SQL Server Parallel Data Warehouse (PDW) is the result of the DATAllegro acquisition in 2008 for roughly $238M. DATAllegro was the invention of Stuart Frost to compete with Netezza, which is now IBM PureData System for Analytics. Stuart Frost founded DATAllegro in 2003, was CEO of the company from the beginning, and specified the architecture of the product.

Netezza came to market with a compelling value proposition. It leveraged an open source Postgres DBMS. It used an appliance business model to create a tightly integrated software and hardware stack, removing a significant area of complexity for DBAs and other system staff. It shifted to sequential I/O from the more typical random I/O in SMP architectures. This allowed the use of much larger and cheaper SATA disk drives and led to a highly competitive price/performance ratio. However, there was a significant flaw in Netezza's strategy. They created a highly proprietary hardware platform and, effectively, a proprietary software platform, with little of Postgres remaining.

Netezza secured its first few customers around the time DATAllegro was being founded. Looking at the Netezza architecture, Stuart Frost realized that there was an opportunity to create a similar value proposition while using a completely non-proprietary platform. Frost’s vision was to create a massively parallel DW appliance with an embedded, off-the-shelf open source Ingres DBMS running on Linux and using completely standard servers, networking and storage from major vendors.

Each server in DATAllegro ran a highly tuned copy of the Ingres DBMS and custom Java on SuSe Linux. These separate database servers were turned into a massively parallel, shared nothing database system that offered incredibly good performance, especially under complex mixed workloads.

Once Microsoft acquired DATAllegro in 2008, the first obvious task was to port the appliance over to the Microsoft SQL Server Windows stack. Microsoft internally went to work on this migration between the 2008 and 2010 period of time. It was known then as project ‘Madison’. In 2010, IBM ponied up $1.8 billion for DATAllegro's biggest competitor, Netezza.

Microsoft Parallel Data Warehouse (PDW)

See my article Microsoft Parallel Data Warehouse (PDW) for a more in-depth look at Microsoft SQL Server PDW.

Microsoft Analytics Platform System (APS)

See my article Microsoft Analytics Platform System (APS) for a more in-depth look at Microsoft APS.

These views are my own and may not necessarily reflect those of my current or previous employers.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Senior Data Architect at a pharma/biotech company with 1,001-5,000 employees
Vendor
It's a scale out, MPP shared nothing architecture where there are multiple physical nodes.

Originally published at https://www.linkedin.com/pulse/microsoft-parallel-data-warehouse-pdw-stephen-c-folkerts

What’s the Difference Between Microsoft’s Parallel Data Warehouse (PDW) and SQL Server?

In this post, I’ll provide an in-depth view of Microsoft SQL Server Parallel Data Warehouse (PDW) and differentiate PDW with SQL Server SMP.

SQL Server is a scale-up, Symmetric Multi-Processing (SMP) architecture. It doesn’t scale out to a Massively Parallel Processing (MPP) design. SQL Server SMP runs queries sequentially on a shared-everything architecture. This means everything is processed on a single server that shares CPU, memory, and disk. To get more horsepower out of your SMP box as your data grows, you need to buy a brand-new, larger, more expensive server that has faster processors, more memory, and more storage, and find some other use for the old machine.

SQL Server PDW is designed for parallel processing. PDW is a scale-out, MPP, shared-nothing architecture with multiple physical nodes. MPP architectures allow multiple servers to cooperate as one, enabling distributed and parallel processing of large scan-based queries against high-volume tables. Almost all data warehouse workload-centric appliances leverage MPP architecture in some form or fashion. Each PDW node runs its own instance of SQL Server with dedicated CPU, memory, and storage. As queries go through the system, they are broken up to run simultaneously over each physical node. The primary benefits are the breathtaking query performance gains MPP provides, and the ability to add additional hardware to your deployment to scale out linearly to petabytes of data, without the diminishing returns of an SMP architecture.
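The scan-then-merge idea behind MPP can be sketched in a few lines. This toy Python example (sequential, with made-up data) partitions a "fact table", scans each slice independently the way Compute Nodes do, and merges the partial results the way a Control Node would:

```python
# Toy sketch of MPP query processing: split a "fact table" into partitions,
# scan each partition independently (one per Compute Node), then merge the
# partial results. Runs sequentially here; on the appliance the per-node
# scans happen simultaneously.
def scan_partition(rows):
    # a simple scan-based aggregate: sum of the even values
    return sum(r for r in rows if r % 2 == 0)

data = list(range(1_000_000))            # made-up fact table
nodes = 4
chunk = len(data) // nodes
partitions = [data[i * chunk:(i + 1) * chunk] for i in range(nodes)]

partials = [scan_partition(p) for p in partitions]  # one scan per "node"
total = sum(partials)                               # merged centrally

assert total == scan_partition(data)  # same answer as one big SMP scan
```

The point of the sketch: the per-partition scans are independent, so adding nodes shortens each slice rather than the whole job.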


The Grey Zone, When to use SQL Server PDW & What About Netezza, Teradata, Exasol, & a Hundred Others?

Often we’re not dealing with petabyte-scale data. Not even close; it’s just terabytes, and we’re in a ‘grey zone’ where SQL Server SMP overlaps with PDW. Or the data is all relational and well structured, and there’s no Big Data business need like social media analysis or fraud detection, and no need to combine structured and unstructured data from internal and external sources.

The capabilities of xVelocity Columnstore indexes and other in-memory capabilities and performance enhancements in SQL Server SMP should first be explored before recommending a PDW appliance. A combination of technologies, all native to SQL Server SMP may be the answer if you’re just dealing with relational data problems. The distinction between these sister products will always be blurry. And the underlying question of when to use SQL Server SMP versus PDW will persist, especially since SQL Server capabilities will keep clawing up into PDW territory, while PDW keeps growing. It is wise to understand the important differences for decision making.

Organizations demand results in near real time, and they expect their internal systems to match the speed of an Internet search engine to analyze virtually all data, regardless of its size or type. Once non-relational, unstructured or semi-structured data is thrown into the mix, you suddenly have a very different story. Business analysts struggle to figure out how to add the value of non-relational Hadoop data into their analysis. As a result, they’re held back from making faster more accurate data-driven decisions that are needed to compete in the marketplace. This is the current data challenge.

If you want to get into Netezza and Teradata, see my article Should I Choose Netezza, Teradata or Something Else?

Microsoft Parallel Data Warehouse (PDW)

Microsoft PDW is the rebuilt DATAllegro appliance with hardware and software designed together to achieve maximum performance and scalability. If you’d like to know more, see my article Microsoft PDW History, DATAllegro.

Query processing in PDW is highly parallelized. Data is distributed across processing and storage units called Compute Nodes. Each Compute Node has its own direct attached storage, processors, and memory that run as an independent processing unit. The Control Node is the brains of PDW and figures out how to run each T-SQL query in parallel across all of the Compute Nodes. As a result, queries run fast!

Microsoft SQL Server is foundational to PDW and runs on each Compute Node. PDW uses updateable in-memory clustered columnstore indexes for high compression rates and fast performance on the individual Compute Nodes.

The first rack of the PDW appliance is called the base rack. Every appliance has at least one base rack with 2 or 3 SQL Server Compute Nodes, depending on vendor hardware. As your business requirements change, you can expand PDW by adding scale units to the base rack. When the base rack is full, PDW expands by adding additional racks, called expansion racks, and adding scale units to them.

The base rack has two InfiniBand and two Ethernet switches for redundant network connectivity. A dedicated server runs the Control Node and the Management Node. A spare server ships in the rack for failover clustering. Optionally, you can add a second spare server.

With PDW’s MPP design, you don’t need to buy a new system to add capacity. Instead, PDW grows by adding to the existing system. PDW is designed to expand processing, memory, and storage by adding scale units consisting of SQL Server Compute nodes. By scaling out, you can easily expand capacity to handle a few terabytes to over 6 petabytes in a single appliance.

You don’t need to over-buy and waste storage that you don’t need, and if you under-buy you can quickly add more capacity if your data growth is faster than projected. When one rack is full, you can purchase another rack and start filling it with Compute nodes.

You also don’t need to migrate your data to a new system in order to add capacity. You can scale out without having to redesign your application or re-engineer the distribution mechanism. There is no need to migrate your data to a new system, and no need to restructure your database files to accommodate more nodes. PDW takes care of redistributing your data across more Compute nodes.

In-Memory xVelocity Clustered Columnstore Indexes Improve Query Performance

If MPP provides the computing power for high-end data warehousing, columnar has emerged as one of the most powerful architectures. For certain kinds of applications, columnar provides both accelerated performance and much better compressibility. Teradata, picked up columnar capabilities with its acquisition of Aster Data. HP acquired Vertica, which gave it a columnar MPP database.

PDW uses in-memory clustered columnstore indexes to improve query performance and to store data more efficiently. These indexes are updateable, and are applied to the data after it is distributed. A clustered columnstore index stores, retrieves and manages data by using a columnar data format, called a columnstore. The data is compressed, stored, and managed as a collection of partial columns, called column segments. Columns often have similar data, which results in high compression rates. In turn, higher compression rates further improve query performance because SQL Server PDW can perform more query and data operations in-memory. Most queries select only a few columns from a table, which reduces total I/O to and from the physical media. The I/O is reduced because columnstore tables are stored and retrieved by column segments, and not by B-tree pages.
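As a rough intuition for why similar column values compress so well, here is a toy Python sketch in which run-length encoding stands in for the real (proprietary) columnstore compression algorithms:

```python
# Toy illustration of columnar compression: run-length encoding is only a
# stand-in for the actual columnstore compression used by SQL Server PDW.
def rle_encode(values):
    """Collapse runs of repeated values into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

# A single column often holds long runs of similar data (e.g. a status flag),
# so storing data column-by-column exposes far more redundancy than row storage.
column = ["active"] * 500 + ["closed"] * 300 + ["active"] * 200
segments = rle_encode(column)

print(len(column))    # 1000 raw values
print(len(segments))  # 3 runs to store and scan instead
```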

SQL Server PDW provides xVelocity columnstores that are both clustered and updateable which saves roughly 70% on overall storage use by eliminating the row store copy of the data entirely. The hundreds or thousands of terabytes of information in your EDW can be built entirely on xVelocity columnstores. Updates and direct bulk load are fully supported on xVelocity columnstores, simplifying and speeding up data loading, and enabling real-time data warehousing and trickle loading; all while maintaining interactive query responsiveness.

Combining xVelocity and PDW integrates fast, in-memory technology on top of a massively parallel processing (MPP) architecture. xVelocity technology was originally introduced with PowerPivot for Excel. The core technology provides an in-memory columnar storage engine designed for analytics. Storing data in xVelocity provides extremely high compression ratios and enables in-memory query processing. The combination yields query performance orders of magnitude faster than conventional database engines. Both SQL Server SMP and PDW provide xVelocity columnstores.

Fast Parallel Data Loads

Loads are significantly faster with PDW than SMP because the data is loaded, in parallel, into multiple instances of SQL Server. For example, if you have 10 Compute Nodes and you load 1 terabyte of data, you will have 10 independent SQL Server databases that are each compressing and bulk inserting 100 GB of data at the same time. This is 10 times faster than loading 1 TB into one instance of SQL Server.
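The arithmetic in that example can be written out directly; the per-node rate below is an illustrative assumption, not a measured figure:

```python
# Back-of-the-envelope version of the parallel-load example above.
# 100 GB/hour per node is purely illustrative, not a benchmark.
def parallel_load_time(total_gb, nodes, gb_per_hour_per_node):
    """Wall-clock hours when each node bulk-inserts its own equal slice."""
    return (total_gb / nodes) / gb_per_hour_per_node

smp_hours = parallel_load_time(1000, 1, 100)   # one instance: 10.0 hours
pdw_hours = parallel_load_time(1000, 10, 100)  # 10 nodes: 1.0 hour
speedup = smp_hours / pdw_hours                # 10x, as described above
```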

PDW uses a data loading tool called dwloader, which is the fastest way to load data into PDW and performs in-database, set-based transformation of data. You can also use SQL Server Integration Services (SSIS) to bulk load data into PDW. The data is loaded from an off-appliance staging server into the PDW Compute Nodes. Informatica also works with PDW.

Scalable, Fast, and Reliable

With PDW’s Massively Parallel Processing (MPP) design, queries run in minutes instead of hours, and in seconds instead of minutes in comparison to Symmetric Multi-Processing (SMP) databases. PDW is not only fast and scalable, it is designed with high redundancy and high availability, making it a reliable platform you can trust with your most business critical data. PDW is designed for simplicity which makes it easy to learn and to manage. PDW’s PolyBase technology for analyzing Hadoop HDInsight data, and its deep integration with Business Intelligence tools make it a comprehensive platform for building end-to-end solutions.

Fast & Expected Query Performance Gains

With PDW, complex queries can complete 5-100 times faster than data warehouses built on symmetric multi-processing (SMP) systems. 50 times faster means that queries complete in minutes instead of hours, or seconds instead of minutes. With this performance, your business analysts can perform ad-hoc queries or drill down into the details faster. As a result, your business can make better decisions, faster.

Why Queries Run Fast in PDW

Queries run in PDW on distributed and highly parallelized data. To support parallel query processing, PDW distributes fact table rows across the Compute Nodes and stores the table as many smaller physical tables. Within each SQL Server Compute Node, the distributed data is stored in 8 physical tables that are each stored on independent disk-pairs. Each independent storage location is called a distribution. PDW runs queries in parallel on each distribution. Since every Compute Node has 8 distributions, the degree of parallelism for a query is determined by the number of Compute Nodes. For example, if your appliance has 8 Compute Nodes, your queries will run in parallel on 64 distributions across the appliance.

When PDW distributes a fact table, it uses one of the columns as the key for determining the distribution to which the row belongs. A hash function assigns each row to a distribution according to the key value in the distribution column. Every row in a table belongs to one and only one distribution. If you don’t choose the best distribution column, it’s easy to re-create the table using a different distribution column.
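The distribution mechanics described above can be sketched as follows; md5 is only a stand-in for PDW's internal hash function, and the key values are hypothetical:

```python
import hashlib

DISTRIBUTIONS_PER_NODE = 8  # each Compute Node holds 8 distributions

def degree_of_parallelism(compute_nodes):
    """How many distributions a query fans out across."""
    return compute_nodes * DISTRIBUTIONS_PER_NODE

def assign_distribution(key_value, n_distributions):
    """Map a distribution-column value to exactly one distribution.

    PDW's actual hash function is internal; md5 is just a stand-in to show
    the property that matters: the same key always lands in the same place.
    """
    digest = hashlib.md5(str(key_value).encode()).hexdigest()
    return int(digest, 16) % n_distributions

n = degree_of_parallelism(8)          # 8 Compute Nodes -> 64 distributions
first = assign_distribution(1001, n)
again = assign_distribution(1001, n)  # re-hashing the key is deterministic
assert first == again and 0 <= first < n
```

Because the mapping is deterministic, every row belongs to one and only one distribution, which is exactly what makes re-creating a table with a different distribution column straightforward.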

PDW doesn’t require that all tables get distributed. Small dimension tables are usually replicated to each SQL Server Compute Node. Replicating small tables speeds query processing since the data is always available on each Compute Node and there is no need to waste time transferring the data among the SQL Server Compute Nodes in order to satisfy a query.

PDW’s Cost-Based Query Optimizer

PDW’s cost-based query optimizer is the brain that makes parallel queries run fast and return accurate results. A result of Microsoft’s extensive research and development efforts, the query optimizer uses proprietary algorithms to choose a high-performing parallel query plan. The parallel query plan contains all of the operations necessary to run the query in parallel. As a result, PDW handles all the complexity that comes with parallel processing and processes the query seamlessly in parallel behind the scenes. The results are streamed back to the client as though only one instance of SQL Server ran the query.

PDW Query Processing

Here’s a look into how PDW query processing works ‘under the covers’. First, a query client submits a T-SQL query to the Control Node, which coordinates the parallel query process. After receiving a query, PDW’s cost-based parallel query optimizer uses statistics to choose a query plan, from many options, for running the user query in parallel across the Compute Nodes. The Control Node sends the parallel query plan, called the dsql plan, to the Compute Nodes, and the Compute Nodes run the parallel query plan on their portion of the data.

The Compute Nodes each use SQL Server to run their portion of the query. When the Compute nodes finish, the results are quickly streamed back to the client through the Control node. All of this occurs quickly without landing data on the Control node, and the data does not bottleneck at the Control node.

PDW relies on co-located data, which means the right data must be on the right Compute Node at the right time before running a query. When two tables use the same distribution column they can be joined without moving data. Data movement is necessary though, when a distributed table is joined to another distributed table and the two tables are not distributed on the same column.

PDW Data Movement Service (DMS) Transfers Data Fast

PDW uses the Data Movement Service (DMS) to efficiently move and transfer data among the SQL Server Compute Nodes, as necessary, for parallel query processing. Data movement is necessary when tables are joined and the data isn’t co-located. DMS only moves the minimum amount of data necessary to satisfy the query. Since data movement takes time, the query optimizer considers the cost of moving the data when it chooses a query plan.
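The co-location rule that drives DMS can be captured in a small predicate; this is a simplified sketch with hypothetical column names (replicated tables, which never move, are ignored):

```python
# Simplified model of the co-location rule: a join needs no data movement
# when both distributed tables are distributed on the column being joined.
# Column names below are hypothetical.
def join_is_colocated(dist_col_a, dist_col_b, join_col):
    return dist_col_a == join_col and dist_col_b == join_col

# Both tables distributed on customer_id and joined on it: co-located.
assert join_is_colocated("customer_id", "customer_id", "customer_id")
# Same tables joined on order_id instead: DMS must move rows first.
assert not join_is_colocated("customer_id", "customer_id", "order_id")
```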

Microsoft Analytics Platform System (APS)

See my article Microsoft Analytics Platform System (APS) for a more in-depth look at Microsoft APS.

These views are my own and may not necessarily reflect those of my current or previous employers.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
PeerSpot user
Developer at a tech consulting company with 51-200 employees
Consultant
It offers high performance & low cost Data Warehousing over industry latest hardware

Valuable Features:

Microsoft SQL Server Parallel Data Warehouse provides a significant performance boost, from 10 to 100 times faster, for data loading operations. It minimizes the cost of design with lower infrastructure requirements, and it is designed to provide optimized performance on the latest industry hardware.

Room for Improvement:

It requires significant infrastructure expertise to implement, and there is a possibility of over-specified storage or under-specified CPU (proper planning is needed before starting implementation). It also requires significant SQL Server expertise. Microsoft SQL Server Parallel Data Warehouse is purpose-built to provide high-performance data warehousing. By adopting industry-standard hardware, it avoids vendor lock-in. It is best suited for mid-level to large-scale organizations.

Other Advice:

Both hardware and software support is provided by Microsoft.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user7554 - PeerSpot reviewer
Consultant at a tech consulting company with 51-200 employees
Consultant
Parallel Data Warehouse (PDW) POC – lessons learned

The first version of Microsoft’s Parallel Data Warehouse has been out for a while, and now I’ve had the chance to get my hands on it during a customer POC. Because PDW is an appliance solution, the software is hardware-bound. You can’t download and install PDW on a normal server; you need the right hardware, which needs to be MPP capable. Currently there are only two vendors providing PDW hardware: HP and Dell.

For all of you who need a further introduction to PDW, I can recommend my past blog posts:

Customer Requirements

So let’s go back to the customer POC. My customer currently has a Data Warehouse solution built on SQL Server. The following list provides some high-level information about the environment:

  • Data Volume: 15 TB
  • Number of users: 200
  • Biggest Dimension: 20 million records
  • Biggest Fact Table: 750 million records
  • Number of relational Data Warehouse layers: 3
  • Number of data sources: 15
  • Number of SSIS packages: 400
  • Daily processed data: 50 million records

Since their current hardware is not able to scale beyond this size, the customer is looking for a new solution. Together with HP and Microsoft, we started a 3-week POC to check whether PDW could meet the customer's needs. Without going too much into detail, let me quickly highlight the current customer pain points.

ETL Performance: Currently, all data loads are implemented with SQL Server Integration Services. Altogether there are about 400 SSIS packages that are responsible for source system connectivity and for data loading across 3 data warehouse layers. In the current environment, data loads run daily within a predefined execution window. The customer requirement was to get at least the same SSIS performance as on the current production system, and to be able to scale SSIS when additional load processes are added to the daily load, without breaking the current loading window.

Relational Query Performance: Today, nearly 80% of all customer analysis is done based on an Analysis Services cube. The cube gives us the ability to present the complex data model and business logic in a simplified form that end users understand and can work with. But there is also a small number of technically savvy power users who want to analyze data directly on the relational data model, so the demand for a scalable relational reporting solution is growing.

Scalability: As mentioned before, my customer is looking for a scalable Data Warehouse solution that is able to handle up to 50 TB of relational data (the estimated growth over the next 2 years).

Administration: This area is not a really critical success criterion, but the customer is also interested in administration effort and IT operations processes like backup/restore, database administration, database reorg, etc.

Test cases

As you can imagine, we can’t test all of this within 3 weeks, and there are also points that don’t make sense to test on your own, like scalability. I think, based on the given MPP architecture, it’s totally clear that the system is able to scale for data volume. If you need more space, you buy an additional rack; this is how it works.

So clearly we had to focus on what we could really test within 3 weeks, and on the most important things the customer wanted to see working in order to decide whether or not to go ahead with PDW. Together with the team, we agreed to take a deeper look at ETL performance and relational query performance.

ETL Performance

We decided to take a representative ETL process and run it against PDW. This process includes:

  • 5 SSIS packages
  • 33 tables
  • 3 Stored Procedures (SP)
  • a User Defined Function (UDF)

The first question that came up was: where do we execute the SSIS packages?

As we all know from Best Practices it is recommended to have a dedicated environment for ETL and one for the relational engine. But what about PDW?

PDW is a relational SQL Server engine solution based on an MPP architecture, so the best practice of a dedicated ETL server is still valid. The good news is that the PDW architecture also includes a small ETL server.

Why small? The server itself is not a high-end, scalable ETL server (it only has 6 cores and 24 GB RAM, connected via InfiniBand to the other servers). And because PDW is an appliance, there is no option to add more cores or memory. For the POC this server was good enough, but for a real-life project scenario an additional ETL server with enough hardware resources would be needed.

Relational Query Performance

To measure relational query performance the customer provided us 3 representative relational SQL queries with corresponding runtimes on the current environment. So we took these and executed them against PDW.

Next Steps

In Part 2 of this PDW POC series I will dig into more detail about our test cases, the migration, the issues we faced, and of course the results and lessons learned. So stay tuned…

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user6414 - PeerSpot reviewer
CEO at a tech company with 51-200 employees
Consultant
We examined SAP Data Warehouse and IBM Data Warehouse but SQL Data Warehouse provided a more extensive way of integrating data.

Valuable Features:

  • SQL Data Warehouse provides our organization with a way to access all corporate data in a uniform way.
  • SQL Data Warehouse allows us to manage our database using network or hierarchical database systems rather than solely relying on relational DBMSs.
  • It supports the bottom-up process of data warehouse creation, where several data marts can be joined to form one big warehouse. I found this very handy, especially where a few teams are involved.

Room for Improvement:

  • Data cleaning involves the use of special software to help the cleaning team uniformly match attributes from the source data to its destination.
  • It is expensive. SQL Data Warehouse is not supported by all versions of SQL Server, such as the Express version, and therefore becomes unavailable to many, limiting its acquisition and use.
  • It uses a different approach to databases than that used in relational DBMSs, posing a challenge to novices.

Other Advice:

  • It comes with a special tool for cleaning data through the data transformation process, where data is extracted from many sources before being assembled and consolidated into a single database. This is the main factor that led my company to choose SQL Data Warehouse as the tool to consolidate relational databases. We closely examined SAP Data Warehouse and IBM Data Warehouse, but SQL Data Warehouse provided a more extensive way of integrating data.
  • I personally value the role SQL Data Warehouse has played in achieving the goals of business intelligence. Users who need to generate timely reports from huge databases, as well as those cleaning databases, should embrace this product.
Disclosure: I am a real user, and this review is based on my own experience and opinions.