What are the prerequisites for connecting to Azure Data Lake Storage using TIBCO Data Virtualization's inbuilt adapter?

Article ID: KB0073143

Products: TIBCO Data Virtualization
Versions: 8.4.x

Description

The following fields are required when creating a new data source with the Azure Data Lake Storage adapter:

1) URI
2) Application (Client) ID
3) Client Secret
4) Refresh Url 
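As a quick reference, the four values can be gathered into a single mapping before configuring the data source. A minimal sketch; every value below is a hypothetical placeholder, to be replaced with the values collected in the Resolution section:

```python
# Hypothetical placeholder values; substitute the real ones collected
# from the Azure portal as described in the Resolution section.
adls_datasource_params = {
    "URI": "abfss://gen2container@adlstest1.dfs.core.windows.net",
    "Application (Client) ID": "00000000-0000-0000-0000-000000000000",
    "Client Secret": "<client-secret-value>",
    "Refresh Url": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Sanity check: no required field is missing or empty.
missing = [k for k, v in adls_datasource_params.items() if not v]
print(missing)  # → []
```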

Issue/Introduction

This article explains how to obtain all the parameters necessary to connect to Azure Data Lake Storage using the adapter newly available in TIBCO Data Virtualization 8.4, and walks through the collection of each required parameter step by step.

Environment

All supported environments

Resolution

Follow the steps below to obtain the values for the required fields:

1) Begin by creating a storage account; an ADLS Gen2 account can be used for this. Give it a name, e.g. ADLS_Test1.

2) Create a container under ADLS_Test1 with the name 'gen2container' (the name can be anything) and set its public access level to 'Container' (anonymous read access for containers and blobs).
3) Under the 'ADLS_Test1' storage account the newly created container can be seen.
4) Go to 'ADLS_Test1' >> 'Access control (IAM)' >> 'Add role assignment' and select 'Storage Blob Data Reader'. Assign access to 'User assigned managed identity', set the subscription to 'Azure subscription 1', and save.

5) Repeat the steps above to add a few more roles to this user, such as 'Storage Blob Data Owner', 'Storage Blob Data Contributor', 'Owner', and lastly 'Storage Queue Data Contributor'.
6) Create a new App registration by clicking 'New registration' and give it a name, e.g. AppReg_Test1.

7) Note the 'Application (Client) ID', which is required for the data source connection.

8) Go to 'Certificates & secrets', add a client secret (e.g. Testclientsecret), and copy its value; this is the 'Client Secret' for the data source.

9) For the 'Refresh Url', go to Azure Active Directory >> App registrations >> AppReg_Test1 >> Endpoints and copy the 'OAuth 2.0 token endpoint (v1)'.

All three parameters have now been collected: Application (Client) ID, Client Secret, and Refresh Url.
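With these three values in hand, they can be sanity-checked outside of TDV by requesting a token from the OAuth 2.0 (v1) endpoint with the client-credentials grant. The sketch below only builds the request payload; the endpoint URL and identifiers are hypothetical placeholders:

```python
# Build a client-credentials token request for the OAuth 2.0 (v1) endpoint.
def build_token_request(client_id: str, client_secret: str, refresh_url: str) -> dict:
    """Return the POST target and form body for a v1 client-credentials grant."""
    return {
        "url": refresh_url,
        "data": {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            # v1 endpoints take a 'resource' parameter (v2 uses 'scope' instead).
            "resource": "https://storage.azure.com/",
        },
    }

# All identifiers below are placeholders, not real credentials.
req = build_token_request(
    "00000000-0000-0000-0000-000000000000",
    "<client-secret-value>",
    "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
)
print(req["data"]["grant_type"])  # → client_credentials
```

Posting this payload with any HTTP client (e.g. `requests.post(req["url"], data=req["data"])`) should return a JSON body containing an `access_token` field if the App registration and role assignments are set up correctly.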

For the last parameter, the URI:

The URI syntax for Data Lake Storage Gen2 depends on whether your storage account is set up with Data Lake Storage Gen2 as the default file system.
If the Data Lake Storage Gen2-capable account you wish to address was not set as the default file system during account creation, the shorthand URI syntax is:

abfs[s]://<file_system>@<account_name>.dfs.core.windows.net

1) Scheme identifier: The abfs protocol is used as the scheme identifier. You can connect with or without a Transport Layer Security (TLS) connection (TLS was previously known as Secure Sockets Layer, SSL). Use abfss to connect with TLS.
2) File system: The parent location that holds the files and folders. This is the same as Containers in the Azure Storage Blobs service.
3) Account name: The name given to your storage account during creation.
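The shorthand syntax above can be assembled mechanically from the container and account names. A minimal sketch; the helper name and sample values are illustrative, not part of the product:

```python
def build_abfs_uri(file_system: str, account_name: str, use_tls: bool = True) -> str:
    """Assemble a shorthand ABFS URI for a Data Lake Storage Gen2 account.

    file_system  -- the container name (the parent location for files/folders)
    account_name -- the storage account name chosen at creation
    use_tls      -- abfss (TLS) when True, plain abfs when False
    """
    scheme = "abfss" if use_tls else "abfs"
    return f"{scheme}://{file_system}@{account_name}.dfs.core.windows.net"

print(build_abfs_uri("gen2container", "adlstest1"))
# → abfss://gen2container@adlstest1.dfs.core.windows.net
```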

Note: 
1) Connecting to Azure Data Lake Storage is only possible in TIBCO Data Virtualization 8.4.x, which includes an inbuilt adapter for it; the adapter is not available in prior versions.
2) A Microsoft Azure account is required.
3) TIBCO Data Virtualization does not support BLOB storage as a storage account.

Additional Information

1) https://docs.cloudera.com/documentation/enterprise/5-12-x/topics/admin_adls_config.html
2) https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction-abfs-uri