Navigating Feature Management with Snowflake and Refract, the Fosfor Decision Cloud’s Insight Designer

8 min read


Introduction

At the heart of data science and analytics lie features, the cornerstone of precise decision-making and predictive modeling. A feature is an individual, measurable property or characteristic of the data that is used as input to a machine learning model. Despite organizations’ earnest efforts to unlock the latent potential of their raw data, results often fall short of expectations. In this blog, we explore the capabilities of Snowflake and Refract, the Fosfor Decision Cloud’s Insight Designer, and how combining them untangles these complexities and elevates data and feature management.

Challenges in feature management

Building and managing features in an ever-evolving landscape of data-driven decision-making is demanding. Although features provide valuable insights, organizations often run into obstacles when trying to use them to their full potential. These obstacles test data teams and highlight the need for intelligent solutions. The most common problems organizations face when trying to unleash the power of their data are:

Limited data lineage tracking – Data lineage is the record of where your data comes from, how it is transformed, and where it goes. It is especially important in feature management, where the origin and evolution of each feature directly affect data-driven initiatives. Without adequate lineage tracking, teams lose transparency and accountability, and it becomes harder to make informed decisions.
Lack of version control – Version control acts as a safety net, ensuring that every iteration, modification, or enhancement of a feature is systematically documented and accessible. Without it, organizations risk versioning chaos: it becomes difficult to track changes, reproduce successful models, or maintain consistency across the stages of feature development.
Collaboration bottlenecks – Bottlenecks emerge when data engineers supply inconsistent or low-quality data for feature creation, making it difficult for data scientists and ML engineers to build reliable models. The problem is compounded when data scientists and ML engineers work in isolation during feature engineering, which leads to duplicated effort, inconsistent feature extraction, and delays in model development.
Data quality and consistency – Features are only as accurate as the data they are derived from, and errors in derivation compromise insights and decisions. Comprehensive documentation and metadata describing the data used to create each feature foster transparency and shared understanding within the team. These practices are essential for organizations that want reliable feature management and robust data-driven initiatives.
Difficulty in model deployment – Moving features and models into production is complex, particularly when development and deployment environments differ. Deploying models at scale raises resource and scalability concerns that make it harder to optimize feature-driven models across diverse use cases. Real-time updates add further challenges, because changes must be reflected promptly without disrupting ongoing processes.
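To make the lineage and version-control challenges concrete, here is a minimal Python sketch built around a hypothetical in-memory registry. The `FeatureDefinition` and `FeatureRegistry` names are illustrative and are not part of Refract or Snowflake. Each definition records its source table and transformation for lineage. A content hash of the definition serves as its version, so any change to how a feature is derived produces a new, traceable version.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FeatureDefinition:
    """Hypothetical feature definition carrying the metadata needed for lineage."""
    name: str
    source_table: str        # origin of the raw data
    transformation: str      # how the feature is derived, e.g. a SQL expression
    upstream: tuple = ()     # names of features this one is built from

    def version(self) -> str:
        """Content hash: any change to the definition yields a new version id."""
        payload = json.dumps(
            {
                "name": self.name,
                "source_table": self.source_table,
                "transformation": self.transformation,
                "upstream": list(self.upstream),
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


@dataclass
class FeatureRegistry:
    """Keeps every version of every feature so earlier models can be reproduced."""
    history: dict = field(default_factory=dict)  # name -> [(version, definition), ...]

    def register(self, feature: FeatureDefinition) -> str:
        versions = self.history.setdefault(feature.name, [])
        vid = feature.version()
        if not versions or versions[-1][0] != vid:
            versions.append((vid, feature))  # record only genuine changes
        return vid

    def lineage(self, name: str) -> list:
        """Walk upstream dependencies of the latest version back to source tables."""
        _, feature = self.history[name][-1]
        trail = [f"{feature.name} <- {feature.source_table}: {feature.transformation}"]
        for parent in feature.upstream:
            trail.extend(self.lineage(parent))
        return trail


if __name__ == "__main__":
    registry = FeatureRegistry()
    registry.register(FeatureDefinition("total_spend", "RAW.ORDERS", "SUM(amount)"))
    registry.register(
        FeatureDefinition("avg_order_value", "RAW.ORDERS",
                          "total_spend / order_count", upstream=("total_spend",))
    )
    for step in registry.lineage("avg_order_value"):
        print(step)
```

A real feature store would persist this metadata, for example in a Git-backed repository, instead of keeping it in memory.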

Unlocking new possibilities in feature management with Snowflake and Refract, the Fosfor Decision Cloud’s Insight Designer

These challenges show that integrating a data platform with purpose-built tooling can help overcome hurdles, streamline processes, and improve efficiency. Snowpark, Snowflake’s developer framework, integrates with Refract, a purpose-built ML platform. Together they form a powerful alliance that opens new possibilities in feature management, data processing, and insight generation.

Snowflake’s Snowpark

For feature management and database operations, Snowpark is a significant addition to the Snowflake cloud data platform. It lets developers write custom code in languages such as Python, Java, and Scala and run it directly inside Snowflake, close to the data. This goes beyond traditional data processing and helps users address specific feature management and database tasks.

Refract, the Fosfor Decision Cloud’s Insight Designer

Refract, an enterprise ML platform, consolidates top ML frameworks and templates to facilitate the preparation, construction, training, and deployment of high-quality Machine Learning (ML) models. This platform ensures a smooth and personalized “Build-to-Run” transition in AI workflows, reducing user effort by up to 70%. It expedites data science, AI, and ML life cycles through no-code automated features, significantly cutting down time and effort on various pre- and post-model development steps, including data provisioning, preparation, managing features, model deployment, governance, monitoring, and more.

Exploring the impact of the Refract and Snowflake Snowpark integration

Step 1 Enable a suitable role for creating a feature store project

Refract is committed to a cohesive, unified approach that promotes seamless collaboration among data scientists, data engineers, and machine learning engineers within our feature store framework. This approach has two goals: comprehensive end-to-end traceability, and fast, agile iteration across the stages of machine learning development.

Screen 1

Step 2 Establish a specific use-case context

In this example, we create a use case to predict customer lifetime value (CLV). CLV estimates the total value a customer is expected to bring to a business over the entire duration of their relationship. It is a strategic metric for understanding and maximizing the long-term revenue potential of each customer.

Screen 2
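For reference, one simple and widely used heuristic estimates CLV as average order value × purchase frequency × expected customer lifespan. The Python sketch below implements that heuristic. The `CustomerHistory` fields and the lifespan input are illustrative assumptions. The walkthrough itself predicts CLV with a trained model (Step 8), not with this formula.

```python
from dataclasses import dataclass


@dataclass
class CustomerHistory:
    """Illustrative summary of one customer's observed purchases."""
    total_revenue: float   # revenue from the customer during the observation window
    num_orders: int        # number of orders during the window
    window_years: float    # length of the observation window, in years


def simple_clv(history: CustomerHistory, expected_lifespan_years: float) -> float:
    """Heuristic CLV = average order value x purchase frequency x expected lifespan."""
    if history.num_orders == 0 or history.window_years <= 0:
        return 0.0  # no purchase history to extrapolate from
    avg_order_value = history.total_revenue / history.num_orders
    purchase_frequency = history.num_orders / history.window_years  # orders per year
    return avg_order_value * purchase_frequency * expected_lifespan_years


if __name__ == "__main__":
    # 12 orders worth 1,200 in total over 2 years, expected to stay 3 years.
    print(simple_clv(CustomerHistory(1200.0, 12, 2.0), expected_lifespan_years=3.0))
```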

Step 3 Integrate your feature repository with Refract for centralized management

Refract seamlessly integrates with GitHub, providing a familiar environment for managing and versioning your feature repository. This integration ensures a smooth transition and enhances collaboration across teams.

Screen 3

Step 4 Connect to Snowflake to create an offline/online store

Ensure you have a Snowflake account and the necessary credentials to access your Snowflake instance. If you don’t have an account, sign up for Snowflake and create the required user roles and permissions.

Screen 4
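A common practice is to read these credentials from the environment instead of hard-coding them. The minimal sketch below assumes hypothetical `SNOWFLAKE_*` environment variable names and only assembles and validates the settings. The resulting dictionary uses the keyword names accepted by `snowflake.connector.connect()` in the Snowflake Connector for Python. It does not describe how Refract itself stores connection details.

```python
import os

# Hypothetical variable names; map them to however your team stores secrets.
REQUIRED_KEYS = ("account", "user", "password", "warehouse", "database", "schema")
OPTIONAL_KEYS = ("role",)


def snowflake_params_from_env(env=None):
    """Build keyword arguments for snowflake.connector.connect() from SNOWFLAKE_* variables."""
    env = os.environ if env is None else env
    params, missing = {}, []
    for key in REQUIRED_KEYS + OPTIONAL_KEYS:
        var = f"SNOWFLAKE_{key.upper()}"
        value = env.get(var)
        if value:
            params[key] = value
        elif key in REQUIRED_KEYS:
            missing.append(var)
    if missing:
        # Fail fast with a clear message instead of a later authentication error.
        raise ValueError("missing Snowflake settings: " + ", ".join(missing))
    return params


# Usage (requires the snowflake-connector-python package and real credentials):
#   import snowflake.connector
#   conn = snowflake.connector.connect(**snowflake_params_from_env())
```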

Step 5 Configure your feature repository

Click on “Configure feature store” to get started.

Screen 5.a

Enter the basic details and select the file.

Screen 5.b

After configuring the feature store repository in Refract, select the offline and online store configurations. Make sure you add the Snowflake connection details you created earlier.

Screen 5.c

Next, configure the materialization frequency, which determines how often feature values are loaded from the offline store into the online store.

Screen 5.d
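Conceptually, each materialization run copies the most recent feature values for each entity from the offline store into the online store. The configured frequency determines the time windows those runs cover. The Python sketch below illustrates the idea, with plain lists and dictionaries standing in for the two stores. The function names, the `customer_id` key, and the row layout are assumptions, not Refract's implementation.

```python
from datetime import datetime, timedelta


def materialize(offline_rows, online_store, start, end, entity_key="customer_id"):
    """Copy the latest feature row per entity with start <= timestamp < end into online_store."""
    latest = {}
    for row in offline_rows:
        ts = row["event_timestamp"]
        if start <= ts < end:
            key = row[entity_key]
            if key not in latest or ts > latest[key]["event_timestamp"]:
                latest[key] = row

    for key, row in latest.items():
        current = online_store.get(key)
        # Never overwrite a newer online value with an older one.
        if current is None or row["event_timestamp"] >= current["event_timestamp"]:
            online_store[key] = row
    return len(latest)  # number of entities touched in this run


def materialization_windows(first_start, frequency, now):
    """Split [first_start, now) into consecutive full windows of length `frequency`."""
    windows = []
    start = first_start
    while start + frequency <= now:
        windows.append((start, start + frequency))
        start += frequency
    return windows


if __name__ == "__main__":
    offline = [
        {"customer_id": "c1", "event_timestamp": datetime(2024, 1, 1, 3), "total_spend": 10.0},
        {"customer_id": "c1", "event_timestamp": datetime(2024, 1, 1, 9), "total_spend": 25.0},
        {"customer_id": "c2", "event_timestamp": datetime(2024, 1, 1, 4), "total_spend": 7.5},
    ]
    online = {}
    six_hours = timedelta(hours=6)
    for start, end in materialization_windows(datetime(2024, 1, 1), six_hours, datetime(2024, 1, 1, 12)):
        materialize(offline, online, start, end)
    print(online["c1"]["total_spend"], online["c2"]["total_spend"])
```

The online store keeps only the latest value per entity for low-latency serving, while the offline store retains full history for training.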

Your feature store is now ready. The project consists of feature views, feature services, entities, and jobs.

Screen 5.e
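The hypothetical Python data model below shows how these pieces relate. The classes are illustrative stand-ins rather than Refract's API. The concepts follow common feature-store terminology, such as that used by the open-source Feast project:

- An entity supplies the join key.
- A feature view groups features that come from one source.
- A feature service bundles the views that a model consumes.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Entity:
    name: str        # the real-world object, e.g. "customer"
    join_key: str    # column used to look up its features, e.g. "customer_id"


@dataclass(frozen=True)
class FeatureView:
    name: str
    entities: tuple  # Entity objects this view is keyed by
    features: tuple  # feature (column) names
    source: str      # offline table the features are read from
    ttl: timedelta   # how long a feature value stays valid


@dataclass(frozen=True)
class FeatureService:
    name: str
    views: tuple     # FeatureView objects a model consumes

    def feature_refs(self) -> list:
        """Fully qualified 'view:feature' references exposed by the service."""
        return [f"{v.name}:{f}" for v in self.views for f in v.features]

    def join_keys(self) -> set:
        """Entity keys a caller must supply to retrieve the service's features."""
        return {e.join_key for v in self.views for e in v.entities}


if __name__ == "__main__":
    customer = Entity(name="customer", join_key="customer_id")
    orders_view = FeatureView(
        name="customer_orders",
        entities=(customer,),
        features=("total_spend", "order_count"),
        source="ANALYTICS.CUSTOMER_ORDERS",  # hypothetical Snowflake table
        ttl=timedelta(days=30),
    )
    clv_service = FeatureService(name="clv_service", views=(orders_view,))
    print(clv_service.feature_refs())
    print(clv_service.join_keys())
```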

Step 6 Overview of feature view

A feature view is a logical group of related features derived from a data source. The feature view page provides a comprehensive overview of the features stored within the feature store.

Screen 6.a

Screen 6.b

Entities are the real-world objects your features describe, such as customers or products, and each feature is a characteristic or property of an entity. Entities organize and structure features. When you retrieve features from a feature store, you are retrieving information about specific entities in your data, which lets you understand and analyze their attributes.

Screen 6.c

This page shows the type of database used to create the feature store (in this example, Snowflake). This database is a central hub for storing and retrieving features used in machine learning and data science applications. It typically provides version control, metadata storage, and efficient retrieval of features for analysis or model training.

Screen 6.d

Step 7 Overview of feature service

A feature service is the mechanism through which users, applications, or machine learning models retrieve features from the feature store. It groups the feature views that a given model or application needs.

Screen 7.a

Screen 7.b

Screen 7.c

This screen lists all the feature views that are part of the feature service.

Screen 7.d

Step 8 Spin up any notebook

Refract lets users create custom coding environments by creating new templates.

Screen 8.a

In this step, users launch an empty template to set up a notebook environment.

Screen 8.b

Users can now load the feature store project into the new notebook to fetch historical data.

Screen 8.c

Users can also fetch historical data by creating an entity dataframe, which lists the entity keys and the timestamps for which feature values are required.

Screen 8.d

Next, fetch the feature service from the feature store and use it, together with the entity dataframe, to retrieve historical feature values.

Screen 8.e
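Retrieving historical features for an entity dataframe is typically done with a point-in-time join. For each entity row, the feature store returns the latest feature values recorded at or before that row's timestamp, so training data never includes information from the future. The standard-library-only Python sketch below illustrates the technique on lists of dictionaries. The column names and the optional `ttl` staleness cutoff are assumptions. A real feature store would run this as a query against the offline store (here, Snowflake).

```python
from datetime import datetime, timedelta


def point_in_time_join(entity_rows, feature_rows, features, key="customer_id", ttl=None):
    """Attach to each entity row the latest feature values at or before its timestamp."""
    # Group feature rows by entity key and sort each group chronologically.
    by_key = {}
    for row in feature_rows:
        by_key.setdefault(row[key], []).append(row)
    for rows in by_key.values():
        rows.sort(key=lambda r: r["event_timestamp"])

    result = []
    for ent in entity_rows:
        as_of = ent["event_timestamp"]
        match = None
        for row in by_key.get(ent[key], []):
            if row["event_timestamp"] > as_of:
                break  # anything later would leak future information into training data
            match = row
        if match is not None and ttl is not None and as_of - match["event_timestamp"] > ttl:
            match = None  # the most recent value is too stale to use
        joined = dict(ent)
        for name in features:
            joined[name] = match[name] if match is not None else None
        result.append(joined)
    return result


if __name__ == "__main__":
    feature_rows = [
        {"customer_id": "c1", "event_timestamp": datetime(2024, 1, 1), "total_spend": 100.0},
        {"customer_id": "c1", "event_timestamp": datetime(2024, 2, 1), "total_spend": 250.0},
    ]
    entity_df = [{"customer_id": "c1", "event_timestamp": datetime(2024, 1, 20)}]
    print(point_in_time_join(entity_df, feature_rows, ["total_spend"], ttl=timedelta(days=30)))
```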

This step is feature engineering: creating new features or transforming existing ones to improve model performance.

Screen 8.f
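The sketch below illustrates this step for the CLV use case. It derives recency, frequency, and monetary (RFM) features from raw orders after first applying a simple quality gate, which ties back to the data quality concerns discussed earlier. The field names, the rejection rules, and the choice of RFM features are assumptions for illustration. The article does not specify which features were engineered.

```python
import math
from collections import defaultdict
from datetime import date


def engineer_features(transactions, as_of):
    """Return ({customer_id: features}, rejected_row_count) computed from raw orders."""
    grouped = defaultdict(list)
    rejected = 0
    for t in transactions:
        # Quality gate: skip rows with no customer, a missing or negative amount,
        # or an order date after the as-of date.
        amount = t.get("amount")
        if not t.get("customer_id") or amount is None or amount < 0 or t["order_date"] > as_of:
            rejected += 1
            continue
        grouped[t["customer_id"]].append(t)

    features = {}
    for customer_id, orders in grouped.items():
        total = sum(o["amount"] for o in orders)
        last_order = max(o["order_date"] for o in orders)
        features[customer_id] = {
            "recency_days": (as_of - last_order).days,  # days since last purchase
            "frequency": len(orders),                    # number of purchases
            "monetary_total": total,                     # total spend
            "avg_order_value": total / len(orders),
            "log_monetary_total": math.log1p(total),     # compresses heavy-tailed spend
        }
    return features, rejected


if __name__ == "__main__":
    orders = [
        {"customer_id": "c1", "amount": 50.0, "order_date": date(2024, 1, 5)},
        {"customer_id": "c1", "amount": 150.0, "order_date": date(2024, 1, 25)},
        {"customer_id": "c2", "amount": -10.0, "order_date": date(2024, 1, 10)},
    ]
    feats, bad_rows = engineer_features(orders, as_of=date(2024, 1, 31))
    print(feats, bad_rows)
```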

Here, we train a random forest model to predict customer lifetime value.

Screen 8.g

Finally, we register the model in Refract, as shown below.

Screen 8.h

Step 9 Deploy the model

Next, we deploy the model in Refract. Deployment moves the machine learning model into a production environment, making it operational for real-world use so that it can generate predictions or insights as intended.

Screen 9.a

Once the model is deployed, users can monitor its performance metrics.

Screen 9.b
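CLV prediction is a regression problem. Typical metrics for a deployed model are therefore mean absolute error (MAE), root mean squared error (RMSE), and the coefficient of determination (R²). The sketch below computes these metrics from actual and predicted values in plain Python. The article does not list which metrics Refract displays, so treat this as a general reference rather than Refract's own calculation.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE, and R^2 for paired actual and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean = sum(y_true) / n
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")  # undefined for constant targets
    return {"mae": mae, "rmse": rmse, "r2": r2}


if __name__ == "__main__":
    actual_clv = [100.0, 200.0, 300.0]
    predicted_clv = [110.0, 190.0, 300.0]
    print(regression_metrics(actual_clv, predicted_clv))
```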

Conclusion

This blog explored the critical aspects of feature management in dynamic, data-driven environments. Together, Snowflake and Refract, the Fosfor Decision Cloud’s Insight Designer, offer a powerful solution to the challenges of feature engineering, storage, and use. By combining Snowflake’s robust data platform with Refract’s feature management capabilities, organizations can build a seamless and efficient workflow. The integration provides a centralized repository for features, enabling end-to-end traceability, rapid iteration, and a cohesive strategy across the entire data science and machine learning lifecycle.

Author

Ram Sagar Moghalashetty

Specialist - Data Sciences - Fosfor Decision Cloud

Ram is an experienced Data Scientist and MLOps Engineer with over 9 years of experience. He has consistently demonstrated his ability to foster collaboration across diverse teams, which has resulted in the successful delivery of impactful AI solutions. His expertise in fine-tuning models and implementing robust MLOps pipelines has allowed him to transform raw data into actionable insights, making Ram a pivotal force in the field of data-driven decision-making.
