In this age of data, it is growing prominently with each passing day. As the data volume increases, there is a need for something that can store and process such an amount of data more efficiently. The most popular choice to cater to this data need is Snowflake.
Fosfor Spectra is a modern enterprise DataOps platform covering major data connectors and a wide range of transformation capabilities. Spectra’s push-down optimization is the best solutions and services when the source and target of the data is Snowflake. Snowflake is a cloud-based data warehouse or Relational Database Management System (RDBMS).
In general, all RDBMS provides SQL functions to cover common operations like concatenation, count etc. Such functions are known as in-built or system-defined functions.
User-defined function
RDBMS allows its users to perform operations that are not available through the system-defined functions by allowing them to create user-defined functions. User-defined functions are usually written for the requirement of their users. Any complex logic can be implemented through user-defined functions. In other words, user-defined functions provide a mechanism for enhancing the RDBMS server by adding more functions to it.
Expression
Expressions are formulas written within SQL queries to perform some arithmetic, logical or string operations. Simple expressions can be columns, values, or functions. Complex expressions can be formed by joining two or more simple expressions with the help of arithmetic or logical operators.
Importance of UDF (User-defined function) in data integration pipeline
With UDF, Data scientists can:
- Carry out complex transformation logic as per users’ requirements.
- Create a wrapper for the system-defined functions to avoid syntax-related issues during data migration.
- Parameterize UDFs to use the interface at multiple places with different inputs.
Importance of expressions in data integration pipeline
UDFs and expressions are two different things. UDFs can contain multiple statements, but an expression is a single statement. When it is evaluated, it generates a single value.
Expressions in data integrations ensure that software users can:
- Create complex formulas with a name.
- Save the cost of creating UDFs in target environments.
- Use portable expressions.
- Use portable expressions.
Snowflake user-defined functions and expressions
Snowflake currently supports the following languages for creating UDFs:
- SQL
- JavaScript
- Java
These may return scalar or tabular results. For example, one can use Snowflake’s UDF to check if a given year is a leap year:
create or replace function IS_LEAP_YEAR(year INTEGER)
returns BOOLEAN
as
$$
select year % 4 == 0 AND year % 400 != 0 AND year % 100 != 0; $$ ;
Snowflake expressions:
Snowflake SQL follows ANSI SQL standard. Snowflake SQL expressions are a combination of:
- Columns
- Values
- System-defined functions
- User-defined functions
- Operators adhering to the rules provided by ANSI SQL
Some examples of expressions include:
- Expressions to check whether year is leap or not
year % 4 == 0 AND year % 400 != 0 AND year % 100 != 0 - Expression to calculate gross salary:
(basic_salary + hra + (earned_points * bonus)) – income_tax
Snowflake UDFs in Fosfor Spectra:
Spectra provides some predefined Snowflake UDFs that can be created during the Snowflake warehouse or spark cluster initialization in Spectra. Some UDFs are mentioned below.
# | UDF | Description | Run Engine |
1 | SOUNDEX(V String) | Implements Soundex Algorithm for lookup operations | Snowflake, SPARK |
2 | LEVENSHTEIN(STR1 String, STR2 String, FLAG Boolean) | Implements Levenshtein Algorithm for lookup operations | Snowflake, SPARK |
3 | CONVERT(COL float, FROM_DEMO STRING, TO_DEMO STRING) | Covert column value to kilometre, centimetre etc. | Snowflake, SPARK |
4 | UNZIP(binaryArray) | Unzip binary content | SPARK |
5 | DECODE(binaryArray, charset) | Decode content | SPARK |