Data Blocks Layer




        Introduction

        The Assette Data Blocks Layer (DBL) is designed to retrieve data from different sources and expose it for consumption by the Assette Data Objects Layer (DOL), as well as by certain application functionality (via special data blocks called system data blocks). The DBL can pull data from a variety of sources, such as APIs and databases (Snowflake, SQL Server, etc.). This data can then be transformed using a data processing pipeline and exposed to the DOL, which uses it to construct presentation-oriented (and marketer-friendly) datasets called data objects.

        Assette DBL data pipelines are developed using the pipe-and-filter architecture style. A pipeline always starts with an interface block that is responsible for bringing data into the pipeline. Any number of other blocks can then transform this data before it is made available for consumption by data objects. It is also possible to branch a pipeline into a tree structure, so that the final output is generated by combining the outputs of more than one branch.
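
        As a rough illustration of the pipe-and-filter idea (a minimal sketch only, not the Assette runtime; all block names below are hypothetical), a pipeline starts with an interface block, passes its output through transformation blocks, and can combine branches into a single final output:

```python
# Hypothetical sketch of a branching data pipeline in plain Python.

def interface_block():
    # An interface block brings data into the pipeline (here, a static list of records).
    return [{"Account": "A1", "Return": 0.052}, {"Account": "A2", "Return": 0.047}]

def to_percent(rows):
    # A transformation block reshapes the output of the previous block.
    return [{**r, "Return": round(r["Return"] * 100, 1)} for r in rows]

def top_account(rows):
    # A second branch derives a different result from the same upstream data.
    return max(rows, key=lambda r: r["Return"])

def combine(table, highlight):
    # The final output merges the outputs of both branches (the tree structure).
    return {"table": table, "highlight": highlight}

source = interface_block()          # root of the pipeline
branch_a = to_percent(source)       # branch 1: the full table
branch_b = top_account(branch_a)    # branch 2: a summary derived from branch 1
print(combine(branch_a, branch_b))
```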

        In addition to the main blocks that make up the data pipeline, there is another type of block, called a helper block, that lives outside of the data pipeline. These blocks are called helper blocks because their primary purpose is to assist other blocks (interface blocks, transformation blocks, etc.). Helper blocks can be shared among many data blocks.

        Fig. 01: Assette Data Block Layer Data Pipelines

        Assette Data Block Platform

        The Data Blocks Platform includes the following:

        • A specific version of the Python interpreter

        • Specific versions of Python packages (open source, commercial, or Assette-provided) selected by Assette to be supported in the DBL

        • Platform features such as dependency management, block execution, metadata management, diagnostics, and monitoring

        Two versions of the Data Blocks Platform will be active at any given time. Data block developers can develop and test their data blocks against either of these platform versions. However, only one of them is considered the main platform version and is used for all execution purposes.

        When a new version is released, Assette will retire the older version. The retirement date will be communicated to you in advance. If you have any data blocks based on the older (to-be-retired) version, these blocks must be updated to the newer version before the retirement date. This is similar to upgrading your Python code when a new interpreter version becomes available.

        Data Block Classes

        All data blocks in the Data Blocks Layer can be classified into three classes.

        • Assette Data Blocks 

        These are data blocks developed and maintained by Assette. Typically, these blocks cover common industry display rules and calculations, so developers can avoid spending time writing code for common industry needs and instead focus on the client's specific content needs. Use of these blocks is optional. You may even copy these blocks, customize them, and create your own (i.e., create Client Written Data Blocks). Any updates Assette makes to these blocks are automatically deployed to all clients and versioned; it is up to the client to adopt a new version. All versions are always available.

        • System Data Blocks 

        These data blocks are required for the Assette Application to work. Some of these system blocks are mandatory, while others are required based on the application modules used by a client.

        Clients must map certain system data blocks to their own inputs (data fields). Assette will never introduce breaking changes to system data blocks. However, new parameters or columns may be introduced that the client can take advantage of. Any such change is released as a new version, and all versions are always available.

        • Client Written Data Blocks 

        These are data blocks written and maintained by the client or a third-party vendor.

        Data Block Types

        The data block type defines what kind of work a block does, while the data block definition defines exactly how the block does that work. A data block definition can be either Python code or a JSON document that defines a set of tasks.
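
        Purely as a hypothetical illustration (neither snippet reflects Assette's actual definition or task format), the contrast between the two definition styles might look like this:

```python
# Hypothetical illustration only: two styles of data block definition.

# A Python-code definition: arbitrary logic applied to upstream records.
python_definition = """
def run(rows):
    return [r for r in rows if r["Weight"] > 0]
"""

# A JSON-style definition: the same intent expressed as a set of declarative tasks.
json_definition = {
    "tasks": [
        {"op": "filter", "column": "Weight", "greaterThan": 0},
    ]
}
```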

        Both Assette Data Blocks and Client Written Data Blocks fall into these types.

        There are five data block types:

        • Interfaces

        API Call – Retrieve data from an API.

        Mongo Database Call – Retrieve data from MongoDB.

        Snowflake Database Call – Retrieve data from Snowflake.

        SQL Server Database Call – Retrieve data from a SQL Server database.

        Local Database – Retrieve data from the local (client-specific) database.

        Constant – Define the block output as a static JSON document.
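
        As a rough sketch of two of the interface styles above (not Assette's block API; the URL, query, and connection arguments are placeholders that would typically be supplied through a Settings block, and the requests and snowflake-connector-python packages are assumed to be in the platform's supported package list):

```python
import requests
import snowflake.connector

# API Call-style retrieval (placeholder URL and parameters).
def fetch_from_api(url, params=None):
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()  # expected to be a list of records

# Snowflake Database Call-style retrieval (placeholder credentials).
def fetch_from_snowflake(query, **connection_args):
    conn = snowflake.connector.connect(**connection_args)
    try:
        cursor = conn.cursor(snowflake.connector.DictCursor)
        cursor.execute(query)
        return cursor.fetchall()  # list of records keyed by column name
    finally:
        conn.close()

# Constant-style block: the output is simply a static JSON document.
CONSTANT_OUTPUT = [{"Region": "North America", "DisplayOrder": 1}]
```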

        • Transformations

        Python – Python blocks transform the outputs of one or more other blocks and generate their own output.

        Mapping – Mapping blocks support simple data transformations without having to write Python code.

        Caching – Caching blocks support caching data for a fixed period, after which the data expires. These blocks are used to locally cache the outputs of interface or calculation blocks to improve performance.

        Text Template – These blocks generate text outputs from the outputs of other data blocks using the standard Jinja text template engine.

        OpenAI – These blocks can be used to interface with OpenAI for any AI-driven tasks.

        Investment Concept – This is a special type of Python block that returns "investment facts" about common concepts such as attribution, purchases/sales, etc., or about any custom concept. The investment facts are returned as a set of key-value pairs. The main difference with this type is that these blocks are augmented with additional metadata, so their output can be used to generate language with Language Fragment blocks.

        Language Fragment – These blocks are responsible for generating language using OpenAI. They are a special type of data block maintained through the Language Fragments UI in the Authoring Center and are therefore read-only in the data block editor.
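
        A minimal sketch of a Python transformation and a Jinja-based text template, assuming upstream block outputs arrive as lists of records (the function names and sample data are hypothetical, not Assette's block signature):

```python
from jinja2 import Template

# Hypothetical upstream output: a performance interface block returning a data table.
performance_rows = [
    {"Period": "1Y", "Portfolio": 8.4, "Benchmark": 7.9},
    {"Period": "3Y", "Portfolio": 6.1, "Benchmark": 5.8},
]

# Python-style transformation: derive excess return from the upstream records.
def add_excess_return(rows):
    return [{**r, "Excess": round(r["Portfolio"] - r["Benchmark"], 2)} for r in rows]

transformed = add_excess_return(performance_rows)

# Text Template-style transformation: render a sentence with the standard Jinja engine.
template = Template(
    "Over the {{ rows[0]['Period'] }} period the portfolio returned "
    "{{ rows[0]['Portfolio'] }}% versus {{ rows[0]['Benchmark'] }}% for the benchmark."
)
print(template.render(rows=transformed))
```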

        • Decorators 

        Decorator blocks are used to apply reusable data transformations to many blocks. These blocks are created once and can be used many times to transform the output of any other block. Assette Data Blocks that cover features such as major and minor ticks in charts, dropping duplicate periods, etc. are implemented as decorator blocks.
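
        A minimal sketch of the decorator idea, assuming block outputs are lists of records with a "Period" column (the drop_duplicate_periods name is hypothetical, chosen to mirror the duplicate-period example above):

```python
# Hypothetical decorator-style block: defined once, applied to the output of any block.
def drop_duplicate_periods(rows):
    seen, result = set(), []
    for row in rows:
        if row["Period"] not in seen:
            seen.add(row["Period"])
            result.append(row)
    return result

# Applied to the output of another block:
performance = [
    {"Period": "2023", "Return": 7.2},
    {"Period": "2023", "Return": 7.2},  # duplicate period removed by the decorator
    {"Period": "2024", "Return": 9.1},
]
print(drop_duplicate_periods(performance))
```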

        • Specialized

        Attribute Master – This is a specialized block type (in the System Data Block class) designed to create a single system data block that reads configurations from the Attribute Master table in the client database (in the Assette application) and exposes attribute values. Since this functionality cannot be implemented using any of the generic block types above, a special block type has been introduced.

        • Helpers 

        Settings – Settings blocks store the settings of other data blocks, such as URLs, database names, and credentials. Since these blocks contain sensitive data, their contents are encrypted before being stored in the database. Settings blocks are useful for configuring interface blocks, OpenAI blocks, etc.

        Text Settings – Text settings blocks store text files that can be used to configure other blocks. These blocks are usually used to store PEM certificates to securely access Snowflake accounts. 

        Python Environment – These are special configuration blocks that store the list of Python packages to be imported before a Python block executes. This list must always be a subset of the full list of Python packages supported by the current version of the Data Blocks Platform.

        Client Python Library – These blocks enable clients to write reusable Python functions and classes. When these blocks are added as dependencies of other Python blocks, all functions and classes defined in them are automatically imported into those blocks.
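
        To illustrate the Client Python Library and Python Environment ideas (the names, functions, and list format below are hypothetical, not Assette's configuration format):

```python
# Client Python Library-style block: reusable helpers shared by many Python blocks.
def annualize(cumulative_return, years):
    """Convert a cumulative return into an annualized return."""
    return (1 + cumulative_return) ** (1 / years) - 1

def to_basis_points(value):
    """Express a decimal return in basis points."""
    return round(value * 10_000)

# Python Environment-style block: the packages a Python block may import,
# which must be a subset of the packages supported by the current platform version.
PYTHON_ENVIRONMENT = ["pandas", "numpy", "jinja2", "requests"]

# A Python block that declares the library block as a dependency could then call
# the helpers directly, for example:
print(to_basis_points(annualize(0.216, 3)))  # 21.6% over 3 years -> roughly 674 bps
```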

        Data Block Output Types

        Data blocks can be classified by output type as follows:

        Data Table – Return the output in table form (as a list of records). This is the most widely used data block output type.

        Text – Return the output in text form. Language Fragments are a specialized form of data block that returns OpenAI-generated language. In addition, data blocks that return the account name, product name, as-of date, etc. fall under this type.

        Values – Return the output as a set of key-value pairs. Investment Concepts are a special type of data block that returns this kind of output. Data blocks that return answers to a set of DDQ questions also take this form.

        Binary – Return the output as a binary file (a byte array and its MIME type). Data blocks that fetch pre-created images (e.g., photos, investment process visuals) from an external CMS or DAM system are examples of this type.

        None – All helper blocks that live outside of the data pipeline are of this type.

        In addition to the main output, data blocks that output data tables, text, or binary data can return additional information as a set of key-value pairs. These are called output variables. For example, a data block that performs calculations based on account benchmarks can return those benchmark names as output variables, and a data block that calculates performance can return the inception date as an output variable.
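
        As a sketch of what such an output might look like (the field and variable names are hypothetical), a data-table block could return its records together with output variables:

```python
# Hypothetical shape of a data-table output accompanied by output variables.
block_output = {
    "data": [
        {"Period": "1Y", "Portfolio": 8.4, "Benchmark": 7.9},
        {"Period": "Since Inception", "Portfolio": 6.7, "Benchmark": 6.1},
    ],
    "output_variables": {
        "BenchmarkName": "Russell 1000",
        "InceptionDate": "2015-04-01",
    },
}
```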