Skip to main content

Query Excel file data using SQL Server PolyBase

What is PolyBase?

PloyBase is the technology used to connect external data sources to the SQL server. It can be used to query tables or files in an external data source. Initially, this was introduced with SQL Server 2016. At that time, it was limited to a few external data sources such as Big data clusters and Azure Blog Storage data. However, in SQL Server 2019 version it was expanded to add more data sources.

The following data sources can connect using SQL Server 2019 instance.

  • Big data clusters
  • Azure Blob storage
  • Relational Database (ex: SQL Server, Oracle, Teradata)
  • NoSQL (ex: MongoDB)
  • ODBC (ex: Excel)

 

1 Polybase basic hierarchy
                                                                         Polybase basic hierarchy

 

How to configure PolyBase in SQL Server?

You can install PolyBase with a new SQL Server instance or you can add PolyBase feature to an existing SQL Server instance. In both scenarios, you need to select PolyBase Query Service for External Data and Java connector for the HDFS data source (Not mandatory).

 

2 Add Polybase to SQL Server
                                                                       Add Polybase to SQL Server

 

 

After installation complete, you need to go to the Services and check the following new services you started. If not running, you need to start these two services manually.

 

3 PolyBase services
                                                                               PolyBase services

 

To confirm that the PolyBase installation is successful, you can run the following command in SQL Server Management Studio new query window.

SELECT SERVERPROPERTY ('IsPolyBaseInstalled') AS IsSuccessfullyInstalled;

If the installation is successful, it will display ‘1’ in the results window.

 

Query excel file using PolyBase

SQL Server 2019 introduced a facility to connect Excel file to SQL Server Using PolyBase via ODBC driver. Before starting, you need to make sure that the correct ODBC driver is installed on your computer. To check that you need to open the ODBC Data Source Administrator application. In the ODBC Data source driver list, the following driver should be installed. If not, you can download it from the here.

 

4 ODBC driver installation
                                                              ODBC driver installation

 

To read excel file data using SQL Server PolyBase, you need to follow the below steps.

  • First, create a new SQL Server Database.
  • Then, generate the Master Key for above created database using the following command.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<Enter new Password>;

GO
  • Next step is to create a new excel file or download a sample excel file from the web (For this example, download the excel file from here)

 

  • After downloading the excel file, create a new External data source for it using the following command. You should give the correct excel file path for CONNECTION_OPTIONS. For the password, you can use the previously created password for the master key.

 

 

OPEN MASTER KEY DECRYPTION BY PASSWORD = '<Enter created master key Password >';
GO
CREATE EXTERNAL DATA SOURCE FinancialSample WITH
(
    LOCATION = 'odbc://noplace',

  CONNECTION_OPTIONS = 'Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)}; DBQ=D:\PolyBase\FinancialSample.xlsx'
 );

 

  • If this command is successfully executed, it should create a new Data Source in External Data Source section in the created database.

 

5 Created external data source
     Created external data source

 

  • Next, you need to generate an external table for the excel file. For that, you can use the following command. Excel sheet column names and Table column names should match each other, data types should match each other as well. For the Location, you have to use the name of the excel sheet.

 

  CREATE EXTERNAL TABLE dbo.FinancialSample

    (

        Segment NVARCHAR(255),
        Country NVARCHAR(255),
        Product NVARCHAR(255),
        DiscountBand NVARCHAR(255),
        UnitsSold FLOAT(53),
        ManufacturingPrice FLOAT(53),
        SalePrice FLOAT(53),
        GrossSales FLOAT(53),     
        COGS FLOAT(53),
        Profit FLOAT(53),
        MonthNumber FLOAT(53),
        MonthName NVARCHAR(255),
        Year NVARCHAR(255)
    )
    WITH
    (
        LOCATION = '[Sheet1$]',
        DATA_SOURCE = FinancialSample
    );

 

  • When this is successfully executed, it should display the new External Table in your created database.

 

6 Created external table
             Created external table

 

 

  • Now you can Query Excel file data using the SQL Server. To do that you can run the following select command for the created external table. If this is executed successfully, the result section should display excel file data.

 

SELECT  [Segment]

      ,[Country]
      ,[Product]
      ,[DiscountBand]
      ,[UnitsSold]
      ,[ManufacturingPrice]
      ,[SalePrice]
      ,[GrossSales]
      ,[COGS]
      ,[Profit]
      ,[MonthNumber]
      ,[MonthName]
      ,[Year]

  FROM [PolyBaseDemo].[dbo].[FinancialSample]

 

7 Query Results window
                                                                                           Query Results window

 

  • If you change data in your excel file and rerun the above select query, those changes will appear in the results window. There’s no need to run any other Import query.

 

Importance

You are reading these data from the originally located excel file. You’re not importing data to SQL Server Database using any other external method. Therefore, you can save the data importing time. Also, you can query the real time data in the file. So, this is very important when working with big data.

 

 


LEAVE A COMMENT







POST COMMENTS


© 2020 Creative Software. All Rights Reserved | Privacy | Terms of Use