Wednesday 20 July 2011

BI on SQL Azure


Introduction

A newly emerged concept ‘cloud computing’ is a phenomenon which is quickly embedding into all sectors of IT industry. Data Warehouse (DW) and Business Intelligence (BI) is no exception and are already touching it. It’s right time to identify the potential of existing DW/BI applications post migration to Cloud. Below we will try to cover DW/BI’s compatibility with cloud. ETL, OLAP cube, physical architecture and reporting tools are the major DW/BI processes to be considered and kept in mind before migration. Below we are trying to map each of these processes with cloud respectively.
·         ETL (Extract, Transform and Load) is mostly used process in DW. Here we generally receive large files in week-ends or out of business hours to be processed and loaded. It may be better to use the resources according to demand/requirement for the ETL instead of fixed capacity.
·         OLAP (Online Analytical Processing) cubes are highly intensive to compute and require stronger processing. High Performance Computing and scalability is the main feature of cloud and it is aligned to OLAP cubes requirements.
·         Massively Parallel Processing, back up valuable data, mirrored data and high availability are  the architecture needs in DW/BI environment.SQL Azure platform have built in  provisions for a architecture, with no physical administrator, high availability and built in fault tolerance.
·         There are lots of reporting products/tools used in DW/BI environment. There isn’t a lot of existing support for all tools in cloud but Microsoft is continuously enhancing and improving SQL Azure as a platform as a service to support reporting and all other BI needs.

What can be migrated to cloud?

Now the question comes what can be migrated to cloud? The existing STAR/Snowflake schema design can be migrated as it is to the cloud. Storage allocations could be a challenge. In order to overcome this challenge, existing database’s space usage should be calculated and same amount of space should be requested on the cloud. The amount of data that can be stored in SQL Azure is unlimited but since there is a limit of 50 GB for size of business database, in-order to migrate data which is more than 50 GB, we need to break it in parts and keep it across multiple databases each of 50 GB. Now if you are thinking how this data which is kept across multiple databases can be accessed. The solution lies in skills to generate queries. Parallel queries can be used to access this data.
Once a highly scalable database infrastructure is setup on SQL Azure platform, we can use traditional bcp tool or bcp utility of SQL 2008 R2 to move data from the existing on-premise (headquarters) DW to SQL Azure. SQL Server 2008 R2 Import and Export Data Wizard can also be used. SQL Server Migration Assistant (SSMA for Access v4.2) supports migrating your schema and data from Microsoft Access to SQL Azure.
Once we load DW data on cloud then we need a continuous process to load day to day operational data from various sources on cloud. We could build an ETL framework between office (on premise) and cloud. On premise ETL framework (SSIS packages) should be used to sync data as cloud doesn’t support Business Intelligence Development Studio for now. There could be various choices to sync data and one with minimum cost should always be preferred.

·         SQL Azure to SQL Azure synchronization
·         Office Head Quarter (on-premise) to cloud
·         Cloud to Office Head Quarter (on-premise)
·         Bi-directional

Alternatively, Windows Azure Appfabric Integration can be used as an ETL platform. Of course it is not a good choice but alternate option to use Appfabric integration instead of SSIS packages.

At this time there is no direct support for OLAP and CUBE processing on SQL Azure; alternatively with the HPC (High Performance Computing) attributes using multiple Worker roles, manually aggregation of the data can be achieved. It is interesting to see how OLAP cube processing can fit on cloud? Grid computing attribute of cloud can be used in processing of OLAP cubes.
Grid computing is a type of parallel and distributed system that enables the sharing, selection, and aggregation of geographically distributed "autonomous" resources dynamically at run time depending on the work load and availability requirements. We could implement Grid Computing in a cloud computing environment and it is a valid to say that "every cloud is a grid". With appropriate support from Cloud Computing Platform and with work load management, a Cloud platform can be effectively used as a Grid to effectively process the OLAP. It is not so worthy as lots of task similar to BI engine needs to be created. It is just an idea that HPC is an important feature of cloud and same can be utilized to process OLAP cube.
SQL Server Reporting Services tool is supported by cloud but it is in early stage. Limited CTP of the new SQL Azure Reporting is kicked off late last year and it is getting a good response. We can build as well as use these reports in cloud with a small limitation that SQL Azure Reporting currently can connect only to SQL Azure database. I believe SQL Server Reporting Services tool should have a great feature on SQL Azure. However, SQL Azure is supporting Excel and Power pivot which makes it very useful and powerful.



Suitability

SQL Azure is suitable for small and medium size (SMB) companies. It can also be suitable for other applications/scenarios which are listed below:
·         It can be used for Proof of concepts (POCs) for simple applications.
·         New Start up firms can use SQL Azure to avoid initial capital investments on hardware and software licenses.
·         Educational and Training institute can use SQL Azure to reduce their infrastructure cost.
·         Generally in DW data comes from multiple sources and there might be a need to share some data across customer, partners, vendors and employee. SQL Azure suits best here. The consolidated data should be pushed in SQL Azure and then share across customer, partners, vendors and employee.
·         It can also used by the organizations which use their application on quarterly/half yearly/yearly interval. CBSE/ICSE and state boards declare their result annually, IT computation and declaration are done once in a year. Here SQL Azure should be the best choice.

Limitation

A limitation of SQL Azure Database that cannot be missed to mention is that it doesn’t support all of the features and data types that are found in SQL Server. Analysis Services, Replication, and Service Broker, of Business Intelligence Development Studio (BIDS) are the services that are not currently provided on the Windows Azure platform.

Conclusion

·         On Premises ETL solution can be used on SQL Azure.
·         There is very limited support for OLAP as of now. All BI components of Business Intelligence Development Studio (BIDS) should be supported by cloud including usage based aggregations, portioning etc. These value added features are desired and we expect Microsoft should implement them.
·         SQL Azure should support all features which are available in normal database version.
·         Elasticity, Scalability and Reliability supports physical architecture required for DW/BI.
·         SSRS, Excel and Power Pivot reports are supported on SQL Azure.
·         SQL Azure can be used by small and medium level organization.
·         It can also be used for high volume of data by overcoming the size limitation of SQL Azure database.
·         As per current scenario Microsoft Cloud Platform seems to be positioned as one of the leading platforms for DW/BI because it demonstrates great potential in early stage. We expect a great future of SQL Azure for DW/BI if Microsoft promises to enhance it continuously to overcome its limitation.