Data Virtualization in SQL Server 2019

SQL Server 2019 CTP 2.2 brings a lot of enhancements, in particular around Data Warehouse (DW) and Big Data management. This post is not a “what’s new in…” post; after some brief explanations we’re going to create “virtual tables” in SQL Server that represent a MongoDB collection, so we can query data stored in MongoDB directly from SQL Server.

In short, a DW is a historical, detailed data repository of the business that holds many subject areas for relational queries and analysis.

So a DW is, in a sense, a central store of combined data collected from various sources.

Let me put it this way: data subsets located on disparate database servers may be stored in different kinds of sources, relational or non-relational, and may have different schemas or formats. As a result, we need to transfer that data into the DW database for analysis.

In the traditional approach, we build an ETL (Extract, Transform, Load) process to move data into the Data Warehouse to meet business requirements. This is usually a heavy, costly, time-consuming and complex process. Note that this describes the best-known “batch processing” style of ETL; there are more modern approaches to moving data, but this post is not really about ETL.

Data Virtualization eliminates the ETL process: there is no need to extract, move or transform the data at all.

In this post, I’m gonna show you how to virtualize data stored in MongoDB and publish it in SQL Server 2019.

First off, get a MongoDB server ready and create some non-relational data. If you install MongoDB from scratch, you must configure TLS to provide a secure connection to SQL Server. In my environment, MongoDB runs on Ubuntu 18.04.

#cd /etc/ssl/
#openssl req -newkey rsa:2048 -new -x509 -days 365 -nodes -out mongodb-cert.crt -keyout mongodb-cert.key
#cat mongodb-cert.key mongodb-cert.crt > mongodb.pem

That creates a self-signed SSL certificate; now run MongoDB with it:

#mongod --port 27017 --bind_ip localhost,172.19.5.205,live.mongo.local --dbpath /var/lib/mongodb --sslMode requireSSL --sslPEMKeyFile /etc/ssl/mongodb.pem
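
To verify that TLS works, you can connect with the mongo shell over SSL (a quick check, not part of the original post; certificate validation is skipped here because the certificate is self-signed):

#mongo --host 172.19.5.205 --port 27017 --ssl --sslAllowInvalidCertificates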

The MongoDB server is now running on port 27017. Next, create an admin user in MongoDB that will be used in the connection settings on the SQL Server side:
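
The user creation itself was shown as a screenshot in the original post; here is a minimal sketch in the mongo shell, assuming the user name “mongoadmin” and the placeholder password “passwordX” that the SQL Server credential below uses, with the built-in readWriteAnyDatabase role:

use admin
db.createUser(
{
  user: "mongoadmin",
  pwd: "passwordX",
  roles: [ { role: "readWriteAnyDatabase", db: "admin" } ]
}
)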

Optionally you can create a root user as well:

db.createUser(
{
  user: "mongo-root",
  pwd: "Mongo",
  roles: [ { role: "root", db: "admin" } ]
}
)

Next, create a database and a collection (table), then insert some data. For those who don’t like the command prompt, Robo 3T is a nice GUI for managing a MongoDB server. My database name is “Partner_Company” and the collection name is “Partner_Employees”. Let’s insert some data:
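
The insert was shown as a screenshot in the original post; here is a minimal sketch in the mongo shell, with hypothetical field names (name, title, salary) and sample values for the employees:

use Partner_Company
db.Partner_Employees.insertMany([
  { name: "John Doe", title: "DBA", salary: 5000 },
  { name: "Jane Smith", title: "Developer", salary: 5500 },
  { name: "Bob Brown", title: "Analyst", salary: 4750 }
])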

You can check the inserted data with db.Partner_Employees.find() in the shell, or browse it in Robo 3T.

Now we can start configuring SQL Server to access this data. On the SQL Server side, make sure the region settings of the Windows Server OS are set to English and the SQL Server PolyBase services are in the running state.

--Check PolyBase:
exec sp_configure 'polybase enabled'

--Enable the PolyBase feature:
exec sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;

Note that the SQL Server service must then be restarted.
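
After the restart, you can verify that PolyBase is installed and the setting took effect (a quick sanity check, not part of the original post):

--Returns 1 if the PolyBase feature is installed on this instance
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

--The run_value for 'polybase enabled' should now be 1
exec sp_configure 'polybase enabled';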

--Credential secret:
USE Company
go
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'passWord1';

--MongoDB credential:
USE Company
go
CREATE DATABASE SCOPED CREDENTIAL MongoDB
WITH IDENTITY = 'mongoadmin', Secret = 'passwordX';  --user credentials we created in MongoDB earlier.

--MongoDB data source:
USE Company
go
CREATE EXTERNAL DATA SOURCE MongoDBDS
WITH (
    LOCATION = 'mongodb://172.19.5.205:27017',
    -- PUSHDOWN = ON | OFF,
    CREDENTIAL = MongoDB
);
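
A quick way to confirm the data source was created (not shown in the original post) is to query the catalog view:

--List external data sources defined in the current database
SELECT name, location FROM sys.external_data_sources;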

Before creating the virtual table, take a look at the field types in the MongoDB collection, since the external table’s columns must map to them.

Lastly, create the virtual (external) table with columns that correspond to the collection in MongoDB:
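
The table definition was a screenshot in the original post; here is a minimal sketch of the DDL, assuming hypothetical column names and types that match the sample documents above. For a MongoDB external table, LOCATION is given as 'database.collection':

USE Company
go
CREATE EXTERNAL TABLE Partner_Employees
(
    [_id] NVARCHAR(24) NOT NULL,   --MongoDB ObjectId exposed as a string
    [name] NVARCHAR(100),          --hypothetical columns matching the sample documents
    [title] NVARCHAR(100),
    [salary] INT
)
WITH (
    LOCATION = 'Partner_Company.Partner_Employees',
    DATA_SOURCE = MongoDBDS
);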

That’s it! The data in MongoDB is now virtualized in SQL Server, and we can query the Partner_Employees table through SQL Server.
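
For example, a simple query against the external table (filtering on one of the hypothetical columns defined above):

USE Company
go
--The query runs in SQL Server, but the rows are read from MongoDB at query time
SELECT [name], [title], [salary]
FROM dbo.Partner_Employees
WHERE [salary] > 5000;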

As a matter of fact, we have combined two different platforms into a unified data platform.

As well as MongoDB, SQL Server 2019 PolyBase external tables provide access to data from Spark, Oracle, Teradata and any ODBC data source.