
Microsoft Analytics Platform System Overview

Buyer's Guide

Download the Data Warehouse Buyer's Guide including reviews and more. Updated: September 2022

What is Microsoft Analytics Platform System?
The Microsoft Analytics Platform System can meet the demands of your evolving data warehouse environment with its scale-out, massively parallel processing integrated system supporting hybrid data warehouse scenarios. It provides the ability to query across relational and non-relational data by leveraging Microsoft PolyBase and industry-leading big data technologies. It offers the lowest price per terabyte for large data warehouse workloads.

Microsoft Analytics Platform System was previously known as Microsoft APS, MS Analytics Platform System.

Microsoft Analytics Platform System Customers
Transport for London, E-Plus Mobilfunk GmbH & Co. KG, Prometeia, Tangerine, SSM Health Care, Service Corporation International

Archived Microsoft Analytics Platform System Reviews (more than two years old)

Data Solution Architect at a government with 10,001+ employees
Real User
This suite of products performs many different and important tasks as a well-integrated system
Pros and Cons
  • "This is a well-integrated solution and that integration empowers results."
  • "Releases of new products and functionality are never accompanied by associated documentation, training, and resources that adequately explain the release."

What is our primary use case?

I am a freelancer in this area. Microsoft Analytics Platform System is a suite of many different products. I have been busy using Microsoft Analytics Platform for the last 10 years and it is not really one single product. There are many different tools and many different technologies used both on cloud and on-premise. It is maybe a hundred products. There is not one single component that I would call Microsoft Analytics Platform System because it is just that: a system.  

Because that is the case, it is very difficult to pick one thing as most important or that we use it for most because it is so versatile and serves very different needs for different users. Sometimes a part of the suite is used by just a couple of people and sometimes by a complete team or even groups of teams. It really depends on the situation and the solution the product provides for a given project. No part is the primary part. I could only say that the use case is to work with a unified system to enhance collaboration, analysis, and productivity, and only that.  

Personally, I get assignments from companies and I then implement those assignments for those companies. I, myself, do not use those products at all. So I am not going to use more or less of a particular part or service. I go to the assignment and I implement the solution in Microsoft Analytics Platform System for them. I do all kinds of different projects, from small to medium to very big. My personal use case is to do the implementations for projects for those companies.  

What is most valuable?

The most valuable part of the product is that it is a system. It has different tools for different services for different kinds of scenarios. It is a very rich tool and an integrated technology-rich platform. The total integration with the rest of Microsoft products is probably the most valuable piece that creates flexibility and compatibility and makes the tool a very useful one.  

What needs improvement?

In general, I am not really very satisfied with the tutorials that are out there. When Microsoft releases a new tool or technology, it is often not easy to get your hands on insightful information: documentation, training courses, and other training materials. If you can find them, they may not explain what you need to know in a clear way. Oftentimes they are a little bit fragmented. These user-oriented guides should be better and should be released along with the products they are supposed to support.  

For example, we have services in Azure such as Azure Data Factory, which I work with quite a lot. When a new feature or new release happens, finding the right documentation or resources that explain these features and how to work with them is a little bit more difficult than it should be, in my opinion.  

There are probably a lot of extra features that might be considered to add to the scope of this solution. However, adding ports for different types of users may be one of the best. Certain users are advanced users and they can find their way around. But sometimes non-technical users or those that do not have a lot of technical background can find the complexity a little bit difficult to work with. Better handling of user gateways and privileges would be a benefit.  

For how long have I used the solution?

We have been using this solution for the last three or four years.  


What do I think about the stability of the solution?

I do not personally see many issues with the stability as long as everything is configured correctly.  

What do I think about the scalability of the solution?

Essentially, the scalability is sometimes a little bit difficult depending on how it needs to be applied in some scenarios. I have been working for very different companies, from medium-sized to quite large with a few thousand users, though oftentimes only for groups of 50-plus users. Scalability is inherent within that scope and it can be done. The specific reasons for and application of scaling may make it more or less challenging, but it can be done.  

How are customer service and support?

I would say that the technical support is satisfactory. It is neither really good nor really bad.  

Which solution did I use previously and why did I switch?

I have used other products like SAP or other third-party tools here and there. Most of my experience is with Microsoft Azure and I have not really considered working with other tools, platforms and solutions too much simply because Microsoft is best at integrating with their own products.  

How was the initial setup?

For me, the setup is quite straightforward.  

The deployment can take just a few weeks in some cases. In other cases, it is a month or even years because of the scope of the rollout. So it really depends on the project.  

What about the implementation team?

As I do the installations, I do not need to use outside services.  

What other advice do I have?

I would recommend and do recommend using this product for others who need it.  

On a scale from one to ten where one is the worst and ten is the best, I would rate this product an eight out of ten.  

Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
reviewer1188402 - PeerSpot reviewer
Manager - Infrastructure at a government with 1,001-5,000 employees
MSP
Integrated with other Azure products, but real-time application monitoring needs to be improved
Pros and Cons
  • "I like that it's integrated with other Azure products."
  • "I think the biggest problem with the product is that it uses a data-ingest pricing model, which is very expensive."

What is our primary use case?

Our primary use case is for monitoring our servers. We have a virtual fleet within Azure and we monitor that. 

What is most valuable?

Because we use Azure, it's built into Azure. We probably haven't been using it properly; there are lots of bits and pieces to Microsoft Analytics. To be quite honest, we've never sat down and formulated a plan on how to use it one hundred percent. We have probably only used a bit of it so far.

I like that it's integrated with other Azure products. 

What needs improvement?

I think the biggest problem with the product is that it uses a data-ingest pricing model, which is very expensive.

For how long have I used the solution?

I have been using Microsoft Analytics Platform System for two to three years. 

What do I think about the stability of the solution?

We haven't had any crashes or issues with it. It's pretty stable.

It requires two to three staff members for the maintenance. We use it occasionally, not on a daily basis. 

How are customer service and technical support?

We rarely contact their technical support. 

How was the initial setup?

Most of the setup is pretty easy. It's just out of the box. It's already there. We set it up by ourselves. 

What other advice do I have?

If you're going to do it, do it seriously. Implement it straight away. Don't do it piecemeal like we've done. Get the best out of it that you can. The trouble is Microsoft has changed their licensing. It's an extremely expensive product now. We were paying $200,000 a year for all the different parts of the suite, but under the ingest model Microsoft now charges us $700,000 a year.

They changed the whole product and the way licensing used to be done. You would buy a license per server, and now it's an ingest model based on data.

I would rate it a four out of ten. To earn a higher score, they should have real-time application monitoring. It's not that good. You have to drill back, and it's not very interactive. 

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Delivery Lead at a tech consulting company with 1-10 employees
Real User
Has unique functionality for advanced analytics but could use enhancement in machine learning and AI features
Pros and Cons
  • "The Cube Solution is quite different when compared to the rest of the competition and has unique functionality for advanced analytics."
  • "Functionality needs to be more up-to-date with competing products."
  • "The pricing model needs to be improved."
  • "Machine learning and artificial intelligence capabilities need to be more friendly for beginning users."

What is our primary use case?

My primary use for this product would be as a data warehouse and to do business analysis.  

What is most valuable?

I think each component of the product has its own advantages, but I do not think I should explain every component; instead I will focus on one that stands out. One thing I have concluded through use is that the Cube Solution is quite different when compared to the rest of the competition and has unique functionality for advanced analytics. Also, the variety of charts available for BI is a nice, functional addition. The rest is probably almost the same as other products in the category.  

I think Microsoft Stack offers the end-to-end solution I need. If I go for other products, they may have the end-to-end functionality but not all of the tools I already have in Microsoft Stack. Some of the other products just cover the ETL (Extract, Transform, and Load) part, and some of them just cover the visualization. I think that the Microsoft solution does better in truly covering warehousing end-to-end.  

What needs improvement?

With the release of the 2019 version, Microsoft Stack has machine learning capability and, in fact, can now live in a Linux environment. I have dealt exclusively with Microsoft technology for about 17 years of working with the product, and I personally have not deployed that capability, but I think that one thing is a big improvement in potential flexibility. If they can continue to deliver one important thing with each release, like what Oracle is already doing, I think that is good.  

For example, if you do select star (*) from one table, Oracle returns the first 50 results. Microsoft will return all the results regardless of the number of rows in a table. I think these key features and functionality are something that Microsoft should improve because it makes sense how Oracle treats the customer queries. There are a few other improvements that can be made, but I can see key limitations in what Microsoft has in comparison to Oracle. They should concentrate on the most important features and add them.  
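
The difference the reviewer describes can be made explicit in the query itself. This is only an illustrative sketch; the `Sales` table and `SaleDate` column are assumed names, not from the source:

```sql
-- T-SQL (SQL Server / APS): a plain SELECT returns every row in the table.
SELECT * FROM Sales;

-- Limiting the result set must be requested explicitly:
SELECT TOP (50) * FROM Sales;              -- SQL Server / APS syntax

SELECT * FROM Sales
ORDER BY SaleDate
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;     -- ANSI-style syntax (SQL Server 2012+, Oracle 12c+)
```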

In my opinion, the standard technical support at Microsoft should be improved as well. It is not really helping a product to be noticed in the market if they allow support to just remain at the market standard. So I think it should be improved for clients who do not choose to pay for premium services.  

What I would like to see in upcoming releases is improvement in the machine learning and the AI to make it much easier for people to jumpstart their efforts. The foundation for this is probably already there in the data platform, but enabling people to build machine learning and artificial intelligence solutions very fast would help them have success and become more involved in learning the technology. So, I would say make that learning experience as short as possible and provide useful examples. That will help.  

For how long have I used the solution?

I have been using the Microsoft Analytics Platform System since 2008, so twelve years.  

What do I think about the stability of the solution?

I have worked with Microsoft products before as an engineer for data platforms, so I do not see many issues with stability. For people who do not have that much knowledge about the technology and architecture, I think performance is something they might have problems with if they do not design and configure the product properly.  

What do I think about the scalability of the solution?

In my case, the scalability is okay because I know how to work with the architecture and the design. I do not think many people would know that. If someone is coming from a wider experience base and thinks that just because they have worked with other solutions this will work easily, they may end up building something that is not scalable. So the issue of scalability is not really dependent on the product but rather the fault of the design engineer and their knowledge.  

How are customer service and technical support?

I have definitely been in touch with Microsoft's technical support because I have worked for Microsoft before. Because of that, I have got a lot of experience with Microsoft support directly.  

I have worked with Microsoft support in the capacity of premier services. When they provide services to premier customers they definitely need to serve at the highest standard possible. From the escalation standpoint, sometimes users find it very disappointing because it is difficult to get through the initial support level. But when it comes to customer satisfaction overall, I think their services are above average compared with other similar product providers. But, of course, customers need to pay a premium price to get that kind of attention in support in the first place.  

Which solution did I use previously and why did I switch?

In comparing the Microsoft and Oracle products I think the main difference comes down to ease-of-use. I think the Oracle product track and the architecture is designed for people with less depth-of-knowledge about the product. If you do not have knowledge about the Oracle products, generally the product can be maintained and useful because it is designed to work that way. But for Microsoft, if you do not have much knowledge to maintain the database and if you have a very high workload, you will end up having technology that is much more difficult to maintain.  

I think Oracle's trade secret is really incorporating a lot of features inside that were designed for less maintenance and administrative attention. For example, Oracle has something called a Materialized View. It is kind of like a local duplication of physical tables. In Microsoft, there is no feature like the Materialized View. From a performance perspective, it definitely has an advantage in performance using local data and fields. The way Oracle displays query results is also a performance advantage. But with Oracle, even if people lack knowledge about writing more complicated PL/SQL scripts, they will find it easier to use. With Microsoft, if you do not know how to write a good script, then the experience will not be as easy or as good.  

I think the ease-of-use is why Oracle is much more expensive than Microsoft Stack. But if you are going to be using SQL and scripts on a larger scale in Microsoft, you can end up with quite expensive investment anyway.  

Microsoft needs to change the license structure, in my opinion. This is because I think Oracle, when it comes to virtualization, has an advantage in terms of the total cost of ownership. Microsoft does not have virtualization between virtual SQL and physical SQL, so customers end up paying more if they have multiple virtual SQL services.  

How was the initial setup?

Because I am so used to Microsoft technology, I do not find much complexity in the initial setup of these products. I think the setup for Microsoft Stack is quite straightforward. But if somebody does not have much knowledge about the technology and Microsoft, they might try to take more advanced steps. If their configuration is not designed properly, they will end up with a platform that is not able to scale according to their workload. I think that is a common pitfall in Microsoft technology: people think it is easy because of its friendly interface, but without understanding the product you cannot use it to its full capability.  

If you do not have considerable experience, it is better to install it with the help of a consultant or integrator. Otherwise, you need to have somebody on your team who is really good on the backend who has the technical knowledge to do it correctly rather than treat it as a simple solution.  

What's my experience with pricing, setup cost, and licensing?

Besides the standard licensing users have to pay additional fees for technical support. The default support I think is just the same as with other products and it has become industry standard to be average. But if you pay the additional premium price for the above-average standard of service, you do experience an enhanced support experience.  

Which other solutions did I evaluate?

I have experience working with business intelligence solutions and data science platforms. The majority of that experience is in working with Microsoft Azure Stack and Oracle. Really, my experience is with the whole Microsoft Technology Stack. I tried to do some research to figure out the best tool that I can use to cater to both worlds of data warehousing. The reason for the research came about because of a potential opportunity with a customer that is at the stage of doing the initial build of its data warehouse. It is an initial build, but at the same time they want that solution to be able to drive them to the future of big data analytics.  

So, while I have experience with Stack already and know what it can do, I was comparing newer products which I think are potentially the best to see which is the optimal solution because there are new solutions and technologies on the market. It would be to help achieve an end-to-end data warehouse that is best from data loading to extractions through transformation as well as the visualization using a product that still has strong prospects for future development.  

What other advice do I have?

My advice to people considering this solution is that as a user and administrator you need to know the internal workings of the product. We can downplay the software by simply saying that it is just a database engine like all the others without finding out its real capabilities. You need to know the capabilities in depth to know what sets the product apart from other products and whether its features and capabilities are the ones you need.  

On a scale from one to ten where one is the worst and ten is the best, I would rate this product overall as a five. This is probably because there is still a lot of room for improvement, there are features that other products have that are missing, and there are a lot of open-source technologies nowadays that are very good and that people can use instead. I still think five says it is average compared to modern technologies and advancements.  

Which deployment model are you using for this solution?

On-premises
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
SenkoAltic - PeerSpot reviewer
Solution Manager at Erste group
Real User
Easy to install, good connectivity with databases, and good reporting

What is our primary use case?

We are integrating services to collect data and this solution is used to analyze data, prepare results, and then generate reports.

What is most valuable?

The most valuable feature is database connectivity. This solution will connect to any database, you can combine databases, and you can create a cube or tabular model. This includes, for example, a relational model.

What needs improvement?

The flexibility of this solution needs to be improved because you cannot make changes at every one of the different steps.

For how long have I used the solution?

I have been using this solution for five years.

What do I think about the stability of the solution?

This is a stable solution.

What do I think about the scalability of the solution?

We have approximately five hundred users and I expect our usage to increase in two years' time.

How was the initial setup?

The initial setup of this solution is straightforward and we had no issues.

What about the implementation team?

We performed the implementation ourselves.

What other advice do I have?

One needs to continually practice with this solution to keep improving. Every day, there are new challenges.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user813411 - PeerSpot reviewer
CTO with 11-50 employees
Reseller
Helps our customers to discover trends, which provides useful information based on their business

What is our primary use case?

We are VAR/integration/development partner for our customers. Multi-dimensional analysis of financial and services data for financial institutions (including banks), telecom, and healthcare. For some financial institutions, we do some mining and machine learning scenarios.

We also combine cube manipulations on mobile devices, like tablet and smartphone, using Power BI.

We have worked with Microsoft Analysis Services since SQL Server 7.0 and with the Analytics Platform System from the beginning. 

How has it helped my organization?

We help customers in many ways, from customized analysis for detection of anomalies in tax, operations, customer relationships, and marketing campaigns, etc. We also use mining and ML to help them discover trends, which provides useful information based on their business.

What is most valuable?

It is closely integrated with other products in the MS portfolio.

What needs improvement?

Hybrid environments are complex to manage. We need to support customers frequently, even when they have done many training classes to ensure technology transfer.

Disclosure: My company has a business relationship with this vendor other than being a customer: Reseller.
Senior Data Architect at a pharma/biotech company with 1,001-5,000 employees
Vendor
APS combines both Microsoft PDW and unstructured Hadoop analytical capabilities in a single, easy-to-manage EDW appliance.

Originally published at https://www.linkedin.com/pulse/microsoft-analytics...

In April 2014, Microsoft announced the future vision for their data and analytics platforms. Microsoft Parallel Data Warehouse (PDW) was rebranded as Microsoft Analytics Platform System (APS) with additional appliance component offerings. APS combines an MPP SQL Server data warehouse with HDInsight, Microsoft’s 100% Apache Hadoop distribution, directly in the appliance. APS is a big data analytics appliance capable of analyzing data of any type, structured or unstructured, and of any size.

Microsoft APS integrates data from SQL Server PDW with unstructured big data from Hadoop through the PolyBase data querying technology. PolyBase gives APS a huge advantage over the competition because it allows you to talk with big data in the regular T-SQL language you already use and understand.

With APS, NoSQL doesn’t replace relational databases. Structured and unstructured data technologies complement one another, and queries can be executed across both universes.

By integrating Hadoop into the same rack as the relational data warehouse, organizations can save on consulting, development and configuration costs for Hadoop with an integrated appliance.

Why Microsoft APS is an EDW Game Changer

While APS provides many innovations and improvements, four stand out as strategic game changers CIOs should consider when evaluating data warehouse and analytics strategies: SQL Server PDW, HDInsight, PolyBase, and xVelocity Columnstores as a platform for data mining and analysis.

By combining both Microsoft Parallel Data Warehouse (PDW) and unstructured Hadoop analytical capabilities in a single, easy-to-manage EDW appliance, Microsoft APS is well positioned to help organizations use information to enhance their competitive position.

Microsoft Parallel Data Warehouse (PDW)

Microsoft SQL Server Parallel Data Warehouse (PDW) and xVelocity Columnstores are covered in my article Microsoft Parallel Data Warehouse (PDW).

HDInsight & Hortonworks

HDInsight is Microsoft’s 100% Apache Hadoop distribution based on Hortonworks Data Platform. HDInsight is the phoenix that emerged from the ashes of Dryad. Dryad was Microsoft’s own proprietary and competing version of Hadoop that Microsoft tinkered with for 5 years before abandoning it.

PolyBase & Big Data Hadoop Integration

It’s not enough to store data in Hadoop. Businesses today need to figure out how they can analyze Hadoop data fast and seamlessly in order to make more informed business decisions. Unstructured and high volume data are the two fastest growing types of enterprise data.

Organizations are using Apache Hadoop to store and process non-relational data from sources like blogs, clickstream data that is generated at a rapid rate, social sentiment data with varying schemas, customer feedback, sensor data, and telemetry data feeds. Most of this data is not suitable for relational database management systems and often ends up isolated from business users because it is not integrated with data in the traditional data warehouse.

Technologies like Hadoop are generally used for this, but integrating Hadoop with traditional data warehouse and business intelligence platforms poses new challenges. Hadoop is open source, Java-based, and manages non-relational data across many nodes. It is easy to add data to Hadoop, but not so quick to extract and analyze it. The idea is that, if the data is there, it may take a while to retrieve it, but at least it is stored somewhere in the system. (MapReduce jobs do not have to be implemented in Java, however.)

Big Data is not only about figuring out how to store, manage, and analyze data from non-relational sources, but also about mashing together various non-relational data with an organization’s relational data to gain business insight. See my articles on Big Data Data Lakes & Don’t Drown in the Data and 360 Degree View & Unifying Enterprise Data in a Sea.

PolyBase is the Microsoft APS query tool that enables you to easily query PDW and HDInsight data using T-SQL, without investing in Hadoop-based skills or training.

Microsoft PolyBase is a fundamental breakthrough on the data processing engine which enables integrated query across Hadoop and relational data. PolyBase opens up a whole new world of data analysis and integration possibilities. This integration allows organizations to merge large volumes of non-relational data stored within Hadoop with their traditional enterprise data. Customers can continue to use their existing analytics tool set to analyze their organization’s big data.

Without manual intervention, PolyBase Query Processor can accept a standard SQL query and join tables from a relational source with tables from a Hadoop source to return a combined result seamlessly to the user. Queries that run too slow in Hadoop can now run quickly in PDW, data mining queries can combine Hadoop and PDW data, Hadoop data can be stored as relational data in PDW, and query results can be stored back to Hadoop.

By using the power of Microsoft APS to run queries on Hadoop data in HDInsight, it is now possible to do more in-depth data mining, reporting, and analysis without acquiring the skills to run MapReduce queries in Hadoop. PolyBase gives you the flexibility to structure the Hadoop data you need, when you need it, as it’s brought into PDW for fast analysis. You can seamlessly select from both Hadoop data in HDInsight and PDW data in the same query, and join data from both data sources. To satisfy a query, PolyBase transfers data quickly and directly between PDW’s Compute Nodes and Hadoop’s Data Nodes.

APS uses external tables to point to data stored in text files on a Hadoop HDFS cluster. Once an external table is created, the table can be used in a select statement in the same manner as a PDW table. PolyBase uses a single Transact-SQL query interface to leverage PDW and Hadoop, so you don’t need to learn a host of new skills to run MapReduce queries in Hadoop. PolyBase hides all the complexity of using Hadoop so most business users do not need to know anything about Hadoop.
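
A minimal sketch of this pattern follows. The data source, file format, and table names (`HadoopCluster`, `TextFileFormat`, `ClickStream`, `Customers`) are illustrative assumptions, not taken from the source:

```sql
-- Define an external table over delimited files in HDFS (illustrative names).
CREATE EXTERNAL TABLE dbo.ClickStream (
    UserId    INT,
    Url       VARCHAR(200),
    EventTime DATETIME2
)
WITH (
    LOCATION = '/logs/clickstream/',   -- HDFS directory on the Hadoop cluster
    DATA_SOURCE = HadoopCluster,       -- previously created EXTERNAL DATA SOURCE
    FILE_FORMAT = TextFileFormat       -- previously created EXTERNAL FILE FORMAT
);

-- Join Hadoop data with a relational PDW table in a single T-SQL query.
SELECT c.CustomerName, COUNT(*) AS PageViews
FROM dbo.Customers AS c
JOIN dbo.ClickStream AS s ON s.UserId = c.CustomerId
GROUP BY c.CustomerName;
```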

PolyBase uses ‘predicate pushdown’ to Hadoop: when beneficial, it generates MapReduce jobs behind the scenes to do the work on the Hadoop side instead of moving data for a distributed query.

With PolyBase, organizations can take advantage of flexible hybrid Hadoop solutions and query across Hortonworks, Cloudera, and even into the cloud with Microsoft Azure HDInsight. PolyBase is only available in Microsoft APS and is not available in SQL Server SMP at this time.

Integration with Business Intelligence Tools

APS has deep integration with Microsoft’s BI tools and other leading non-Microsoft tools, making it simple to use the BI tools you are familiar with to perform analysis. APS’s deep integration with Business Intelligence (BI) tools makes APS a comprehensive platform for building end-to-end data mining and analysis solutions. APS integrates with the Microsoft BI Stack including Reporting Services, Analysis Services, PowerPivot for Excel, and PowerView. But, APS also integrates with a growing list of leading non-Microsoft BI platforms, such as Business Objects, Cognos, SAP Data Integrator, Tableau, MicroStrategy, QlikView, Oracle Business Intelligence, and TIBCO Spotfire.

Easy to Use & Manage

APS is designed for simplicity. The complexity is already engineered into the appliance so that you don’t have to handle the details. The appliance arrives with the hardware and software already configured and installed. PDW handles all the plug and play details of distributing the data across the appliance nodes, performs all the extra steps required to process queries in parallel, and manages the low-level hardware and software configuration settings. No tuning is required because the appliance is already built and tuned to balance CPU, memory, I/O, storage, network, and other resources.

Minimal Learning Curve

APS has a minimal learning curve. There’s no need to hire new talent in order to move from SQL Server SMP to SQL Server PDW and APS. DBAs who already know T-SQL can easily transfer their SQL Server SMP knowledge to PDW; some T-SQL statements are added or extended to accommodate the MPP architecture. There’s also less DBA maintenance: you don’t need to create any indexes besides a clustered columnstore index, so DBAs can spend more of their time as architects and less as babysitters. In my opinion, the alignment of APS with existing IT skills may be its biggest competitive advantage.
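As an example of how small the gap is, one of the few MPP-specific extensions a SQL Server DBA has to pick up is the table distribution clause in the DDL. A sketch, using a hypothetical fact table:

```sql
-- Standard-looking T-SQL, plus a PDW-style WITH clause that
-- distributes rows across appliance nodes and stores them in
-- a clustered columnstore index (table and columns hypothetical).
CREATE TABLE dbo.FactSales
(   SaleDate  DATE          NOT NULL,
    StoreId   INT           NOT NULL,
    ProductId INT           NOT NULL,
    Amount    DECIMAL(18,2) NOT NULL
)
WITH (DISTRIBUTION = HASH(ProductId),
      CLUSTERED COLUMNSTORE INDEX);
```

Everything else about the statement reads exactly like SMP SQL Server, which is the point: the appliance handles data placement once the distribution column is chosen.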

The appliance model is key to getting great performance. Tuning a large database using traditional approaches is extremely difficult and requires highly skilled DBAs. One of the main problems with the SMP model is the difficulty of understanding and tuning the interface between the DBMS software and the underlying OS and hardware platform; with SMP, there is a plethora of tuning parameters and options for the DBA and OS administrator to set up. In the appliance model, the entire software and hardware stack from SQL to storage is controlled automatically, so virtually all of that complexity is removed.

Manageable Costs

Microsoft APS has manageable costs. APS undercuts competitors on price per terabyte by a significant margin, roughly half the cost of Teradata, Oracle, Greenplum, and others. It’s worth noting that Microsoft’s offering is cheaper than the competition not because of lower quality or missing capabilities, but because of a different business strategy: commoditize markets, then sell higher volumes to make up for lower margins. Given that SQL Server is one of the most popular enterprise databases on the planet, and APS falls under the SQL umbrella, it has enough of a relative advantage that it could easily become the biggest Big Data appliance player of all.

Microsoft APS & Hub and Spoke Architecture

See my article Microsoft APS & Hub and Spoke Architecture about using Microsoft APS to Build a Hub and Spoke EDW Architecture.

These views are my own and may not necessarily reflect those of my current or previous employers.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Senior Data Architect at a pharma/biotech company with 1,001-5,000 employees
Vendor
It combines the best features of both EDW and decentralized data marts.

Originally published at https://www.linkedin.com/pulse/microsoft-aps-hub-spoke-architecture-stephen-c-folkerts

Using Microsoft APS to Build a Hub and Spoke EDW Architecture

Scalable, cost-effective Enterprise Data Warehouse (EDW) implementation is an elusive goal for many large organizations. Three common approaches are the centralized, ‘monolithic’ EDW; decentralized collections of data marts; and Hub and Spoke architectures that attempt to combine the two. Microsoft APS combines the best features of both the EDW and decentralized data marts.

Centralized EDW

The monolithic approach has become more common in large EDW installations. However, centralized EDW tends to be extremely expensive and very inflexible. As a result, business units become frustrated at the inability of the EDW to meet their needs within a reasonable cost and timeframe.

Decentralized EDW & Data Marts

A divide-and-conquer strategy is a natural and effective approach to addressing large-scale problems such as creating an integrated, enterprise-wide data warehouse. Decentralized EDW architectures align well with this approach and fit the many compartmentalized demands presented by large organizations. Decentralized data marts are more responsive to business unit needs, but often result in many versions of the same data that are very difficult to keep consistent across the enterprise. Each approach, centralized or decentralized, tends to evolve or degenerate into the other, but neither is a tenable long-term solution for most large organizations.

Microsoft APS & Hub and Spoke Architecture

A Microsoft APS appliance enables a true ‘hub and spoke’ architecture, where the centrally managed ‘hub’ contains detailed enterprise data, and departments or business units use ‘spokes’ to exchange data with the hub according to their unique schemas. This architecture can exist across many connected SQL Server MPP databases.

Microsoft expands the Hub and Spoke concept to include not only MPP appliances but also standard symmetric multi-processing (SMP) instances of SQL Server and SQL Server Analysis Services (SSAS), allowing either to be viewed as nodes within a grid. The result is a highly flexible, affordable, and scalable platform that makes large-scale Hub and Spoke EDW architectures a practical reality. It combines the benefits of central control and governance with the agility of decentralized data marts, but without the inherent delivery pains, headaches, risks, and costs associated with previous strategies.

Hub and Spoke Architectures

Hub and Spoke architectures match the structure of most large enterprises by offering a combination of a centralized EDW and a set of dependent data marts. The EDW hub allows the entire enterprise to set and enforce common standards while answering questions that cut across business units. The data mart spokes allow business units to meet their own needs quickly and at relatively low cost while still conforming to the needs of the overall enterprise.

The Hub and Spoke architecture allows business units to set their own budgets and priorities, while contributing as necessary to the central EDW. This close fit between the architecture of the business and the architecture of the DW platform means Hub and Spoke systems are widely regarded as the best overall approach. In practice, however, Hub and Spoke systems have been notoriously difficult to implement.

Distributing data from a centralized EDW reliably and quickly enough to meet the needs of the business units is a big challenge in the face of growing data volumes. To compensate, complex and cumbersome ETL processes are developed to transfer data between the hub and spokes, resulting in high maintenance costs and an inability to change with the business. In general, efforts to build a Hub and Spoke architecture have quickly degenerated into a set of siloed data marts after being torn apart by conflicting business units and requirements.

One response to the difficulties of building a Hub and Spoke architecture has been to simply centralize everything onto one monolithic EDW. A centralized EDW platform quickly becomes overloaded with conflicting use cases. Solving any one problem requires evaluation of all existing dependencies, which drives rigid change control processes and ultimately impacts cost and time-to-delivery for projects. And if virtualized data marts are used, all the queries and I/O execute physically in the hub. Business units become frustrated by the inability of IT to quickly meet new requirements with the central EDW and start building their own independent physical data marts as a result.

Decentralized EDW

With a decentralized approach, which the other two tend to degenerate into anyway, business units simply build their own independent data marts. Although such an approach is obviously responsive to business needs, it doesn’t allow management to answer cross-enterprise questions easily or quickly. Keeping all copies of data across a decentralized infrastructure current and accurate can become overwhelming. The problem becomes worse as relatively low bandwidth data movement options drive complex data transformations that scale poorly. And it’s very difficult to apply any real measure of enterprise-wide standards, controls or regulatory compliance.

Microsoft’s EDW Platform

A Microsoft APS appliance can be viewed as a highly-specialized grid of servers being pulled together to collectively form an EDW platform. Taking this view, it is a small step to think of PDW as both a grid of appliances and a grid of nodes. Moving data across this grid of appliances is incredibly efficient, since data can be moved directly from node to node within the grid. This maximizes parallelism across the environment and minimizes the conversion overhead associated with export and load operations. Such a grid of appliances can be used to implement a data warehousing Hub and Spoke architecture.

Microsoft expands the Hub and Spoke solution to include not only MPP appliances but also standard SMP instances of SQL Server and SSAS to be viewed as nodes within a grid. A grid of SMP databases and MPP appliances can be used as the basis for any large-scale data warehouse environment or architecture. However, it is particularly suitable for a Hub and Spoke architecture.

MPP for Hub and Spoke

Microsoft PDW with high-speed parallel database copy is fundamental to solving one of the most intractable problems in large-scale data warehousing: building an effective, scalable, and affordable Hub and Spoke solution. The basic idea is to take a divide-and-conquer approach to building an EDW. This avoids performance problems caused by conflicts between queries from different business units, provides a dedicated, high-speed network interconnecting all hub and spoke databases, and lets business analysts view the appliance as a set of separate data marts while still drilling into detailed data on the hub where required.

The Microsoft Hub and Spoke Solution

Imagine a fairly large MPP appliance acting as the hub for a set of MPP appliance and SMP database data marts. The hub holds detailed data, probably in a normalized schema, for a number of business units or the entire enterprise. The hub is loaded in near real time or in daily batches from source systems leveraging a preferred ETL solution. Data is then transformed or restructured to a denormalized structure (star, cube, etc.), as needed, and transferred to the appropriate data mart(s) via the high speed grid for consumption by end users. If a data mart requires data from sources that are not covered by the hub, this data is loaded independently using standard ETL tools. However, most of the data required (both fact and dimensions) comes from the hub.
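A hub-to-spoke transfer of this kind could be expressed with PDW’s remote table copy. The following is a hedged sketch, and every table name, server address, and credential in it is hypothetical:

```sql
-- Build a denormalized star-schema extract on the hub and push it
-- to an SMP spoke in one statement, using PDW's remote table copy
-- over the appliance's high-speed network.
CREATE REMOTE TABLE SalesMart.dbo.FactSalesByMonth
AT ('Data Source = smp-spoke-01, 1433; User ID = LoaderUser; Password = ********;')
AS
SELECT  d.CalendarMonth,
        p.Category,
        SUM(f.Amount) AS TotalAmount
FROM    dbo.FactSales  AS f
JOIN    dbo.DimDate    AS d ON d.DateKey   = f.SaleDate
JOIN    dbo.DimProduct AS p ON p.ProductId = f.ProductId
GROUP BY d.CalendarMonth, p.Category;
```

The publish-subscribe flavor of the architecture falls out of statements like this: the hub owns the detailed, normalized data, and each spoke subscribes to the restructured slices it needs.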

Users connect to the independent data mart appliances as usual for running queries. This allows each data mart to be tuned for the needs of a particular set of users and sized to handle the required level of performance and concurrency. While the data marts can be independently designed to meet the needs of each business, it will be possible to leverage existing data mart applications such as Microsoft Analysis Services, Reporting Services, Excel, or other BI vendor products.

Bandwidth within the grid is large enough to enable the direct copy of detailed fact data or entire data marts. This can greatly simplify the data mart creation and update process by using a publish-subscribe model as opposed to complex transformation logic that, coupled with expensive export and load scenarios, creates significant challenges for traditional federated approaches. The end result is an EDW platform that can handle a very complex workload while being extremely scalable at a sensible cost.

Disaster Recovery and High Availability

The Microsoft EDW platform provides the capability to set alternate database systems within the dedicated high-speed network as failover targets. As an example, a user attempting to connect to a spoke that is currently unavailable would automatically be redirected to an alternate spoke specified within the standard connection protocol. This simple approach becomes very powerful when combined with the Hub and Spoke architecture. The speed and bandwidth of the grid copy facility allow full copies of end-user data marts to be moved to multiple spokes. This effectively recreates the end-user view of the data on multiple spoke systems, each a valid failover option for the other in an outage scenario.

This concept can also be leveraged across multiple data centers to provide an effective disaster recovery architecture. Individual appliances can be replicated on a second site and automatically kept up-to-date. Note that not all of the appliances on a grid would need to be replicated. In most scenarios only the hubs need to be replicated, as spokes can be recreated from the hubs. This provides the flexibility for each business unit to decide whether or not to provide a disaster recovery capability, based on their own service-level agreements (SLAs).

Microsoft’s Grid-Enablement Strategy

This approach offers customers an attractive alternative to centralized, monolithic approaches. Data marts can be tailored to meet the individual needs of business units (both in terms of capacity and performance). Furthermore, customers can buy into the Microsoft EDW approach with the deployment of a few stand-alone data marts on standard SQL Server SMP reference architectures. From this relatively low-cost start point, you can scale into the hundreds of terabytes while delivering manageable flexibility without sacrificing cost and performance.

Microsoft Analytics Platform System (APS)

See my article Microsoft Analytics Platform System (APS) for a more in-depth look at Microsoft APS.

These views are my own and may not necessarily reflect those of my current or previous employers.

Disclosure: I am a real user, and this review is based on my own experience and opinions.