Supporting data sovereignty on AI platforms: A primer

Introduction

One of the critical challenges organizations face when building a new AI platform is determining whether the platform meets the data sovereignty and data protection regulations of each country in which it operates. This matters more than ever, as most organizations are in the middle of their cloud adoption journeys.

In this blog, I will explain:

  • What is data sovereignty?
  • Why is data sovereignty essential?
  • What are the business and architectural challenges in meeting data sovereignty regulations?
  • How does Refract, the Insight Designer module of the Fosfor Decision Cloud, help clients build AI platforms in line with data sovereignty requirements?

What is data sovereignty?

Data sovereignty is the principle that a nation or legal jurisdiction has the right to govern data that originates within its borders. Under this principle, the government has the authority to regulate how such data is collected, stored, processed, and shared.

For example, a multinational corporation typically operates across countries with varying levels of data regulation, and some of those countries even require compute to be executed locally within the country.

Let’s assume a client has operations across Europe, where all the Schengen countries except one (say, country 3) can share the same infrastructure, storage, and compute. Because country 3 has stricter data sovereignty regulations, data collected, stored, or processed there cannot be accessed from outside country 3.

Suppose an enterprise has operations in three countries, where country 1 and country 2 are Schengen countries governed by the same data regulations, and country 3 has stricter laws that mandate that all data storage, compute, and consumption happen locally. The enterprise will then have to adopt a distributed architecture. Let’s assume we have three users: users 1 and 2 belong to countries 1 and 2, respectively, while user 3 belongs to country 3. As illustrated in Figure 1 below, the workloads of users 1 and 2 can be pushed to a shared infrastructure, while user 3’s workload must be pushed to a dedicated infrastructure.


Figure 1: Shared & dedicated infrastructure architecture
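
The placement rule in this example can be sketched in a few lines of Python. This is only an illustration: the country identifiers and cluster names are hypothetical placeholders, and the choice to reject unknown countries is an assumption, not something a particular regulation or product prescribes.

```python
# Hypothetical placement rule for the three-country example.
# All names below are illustrative assumptions.
SHARED_CLUSTER = "shared-cluster"

# Countries governed by the same regulations may share infrastructure.
SHARED_COUNTRIES = {"country-1", "country-2"}

# Countries that require local storage and compute get a dedicated cluster.
DEDICATED_CLUSTERS = {"country-3": "country-3-cluster"}


def cluster_for(country: str) -> str:
    """Return the cluster that should run workloads for a user's country."""
    if country in DEDICATED_CLUSTERS:
        return DEDICATED_CLUSTERS[country]
    if country in SHARED_COUNTRIES:
        return SHARED_CLUSTER
    # Fail closed: a country without an explicit rule is never placed silently.
    raise ValueError(f"no placement rule for {country!r}")


users = {"user-1": "country-1", "user-2": "country-2", "user-3": "country-3"}
placement = {user: cluster_for(country) for user, country in users.items()}
print(placement)
```

Raising an error for a country with no rule means a newly added market cannot land on the shared cluster by accident before its regulations have been reviewed.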

Why is data sovereignty essential?

As enterprises generate vast volumes of data through channels such as e-commerce, mobile devices, and social media, they carry a considerable responsibility for safeguarding it. As laws and regulations evolve across nations, data sovereignty helps ensure that sensitive data, such as personal information or trade secrets, is not easily abused by cybercriminals.

Data sovereignty also gives companies that comply with local regulations a competitive advantage over their peers. Compliance demonstrates a commitment to protecting customer data, which builds trust with customers and provides an edge over those who disregard data security.

It is essential to note that enterprises must meet the data sovereignty requirements of the countries they operate in, or else they risk heavy penalties and reputational damage.

What are the business and architectural challenges in meeting data sovereignty regulations?

Meeting data sovereignty regulations poses significant challenges for businesses operating in an era where data has become a global currency. These regulations, which often require data to be stored and processed within specific geographic boundaries, have far-reaching implications for organizations of all sizes and industries, which must navigate a complex web of legal, operational, and compliance issues. This section outlines those business challenges and the architectural approaches that can address them.

The following are some of the challenges associated with meeting data sovereignty regulations:

  • Data localization: Regulations often require data to be stored within specific geographic boundaries. This can be costly as it may necessitate setting up local data centers or using cloud providers with data centers in the relevant region.
  • Data management: Managing data in compliance with various regulations can be complex and resource-intensive. Businesses must implement robust data governance, encryption, and access control mechanisms.
  • Compliance costs: Achieving compliance often involves substantial financial investments in technology, legal counsel, and compliance audits, which can strain a company’s budget.
  • Legal and regulatory complexities: Data sovereignty laws and regulations can vary widely from one jurisdiction to another. Understanding and navigating this legal landscape can be daunting, especially for businesses with an international presence.
  • Business disruption: Complying with data sovereignty regulations can lead to disruptions, including service downtime or changes in data processing practices, which may impact customer experience and revenue.
  • Data transfer restrictions: Regulations can limit the cross-border transfer of data, which can hinder global business operations and disrupt supply chains.
  • Data security: Businesses must implement robust security measures to protect data within specific regions, as breaches can result in severe penalties and reputation damage.
  • Vendor selection: Choosing data storage and processing vendors that comply with local regulations can be challenging, as not every cloud service provider has a presence in every region.
  • Privacy concerns: Meeting data sovereignty requirements often involves addressing privacy concerns and ensuring customer data is handled per local privacy laws.
  • Data portability: Regulations may require businesses to enable data portability, allowing individuals to move their data between service providers, which can be technically challenging.
  • Contractual obligations: Businesses may need to renegotiate contracts with vendors and customers to ensure compliance with data sovereignty laws, which can be time-consuming and costly.
  • Risk management: Companies must develop risk mitigation strategies to address the potential legal and financial risks associated with non-compliance.
  • International expansion challenges: Expanding into new markets means dealing with additional data sovereignty regulations, creating complexities for global expansion strategies.
  • Data residency and backup: Ensuring the data is always accessible and recoverable, even when subjected to local regulations, can be a considerable technical challenge.
  • Monitoring and reporting: Meeting compliance often requires continuous monitoring and reporting on data handling practices, which can be resource-intensive.
  • Employee training: Businesses must ensure that employees are aware of and trained in compliance with data sovereignty regulations, which may require ongoing education programs.

Navigating these challenges is essential for businesses to thrive in a data-driven world while complying with the complex and ever-evolving landscape of data sovereignty regulations.
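
Several of these challenges (data localization, data transfer restrictions, monitoring) come down to checking where a dataset lives and where it is being accessed from. A minimal Python sketch of such a check follows. The policy table, country identifiers, and field names are hypothetical and do not represent any real regulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    origin_country: str  # where the data was collected
    stored_in: str       # where the data is physically stored


# Hypothetical policy table: countries that require data to stay local.
LOCAL_ONLY = {"country-3"}


def violations(dataset: Dataset, accessed_from: str) -> list[str]:
    """Return human-readable policy violations for one access attempt."""
    found = []
    if dataset.origin_country in LOCAL_ONLY:
        if dataset.stored_in != dataset.origin_country:
            found.append(f"{dataset.name}: stored outside {dataset.origin_country}")
        if accessed_from != dataset.origin_country:
            found.append(f"{dataset.name}: accessed from {accessed_from}")
    return found


sales = Dataset("sales", origin_country="country-3", stored_in="country-3")
print(violations(sales, accessed_from="country-1"))
```

A check like this could run both at access time and as part of periodic compliance reporting. Real policies are usually far more granular than a single local-only flag.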

Here are some architectural approaches that can help companies to mitigate these challenges:

Option 1: A separate cluster for each country.

Option 2: A common cluster for a group of countries and a separate cluster for countries with stricter regulations, with a separate domain name and separate metadata.

Option 3: A common cluster for a group of countries and a separate cluster for countries with stricter regulations, with a common domain name and common metadata.

The following are the pros and cons of each approach:

Option 1: A separate cluster for each country

Pros

  • Stricter data isolation for each country, as each country will have its own database for managing metadata.

Cons

  • An expensive solution as enterprises need to procure more VMs and clusters.
  • Maintenance overhead as more clusters need to be maintained.
  • No central discoverability of assets across the enterprise.
  • No common domain name, and hence, user experience might be slightly different for users from different countries.

Option 2: A common cluster for a group of countries and a separate cluster for countries with stricter regulations, using a separate domain name and separate metadata

Pros

  • Less expensive compared to option 1 as less hardware is required.
  • Less maintenance compared to option 1 as there are fewer clusters.

Cons

  • No central discoverability of assets across the enterprise.
  • No common domain name, and hence, user experience might be slightly different for users from different countries.

Option 3: A common cluster for a group of countries and a separate cluster for countries with stricter regulations, using a common domain name and common metadata

Pros

  • Less expensive compared to option 1 as less hardware is required.
  • Less maintenance compared to option 1 as there are fewer clusters.
  • Central discoverability of models and other assets as we maintain common metadata.
  • Common domain name, and hence, the user experience will be the same for all users.

Cons

  • None of significance.

As you can see, option 3 offers the greatest advantage to enterprises operating across multiple countries with varying data sovereignty regulations.
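
To make the comparison concrete, the following Python sketch counts the clusters each option needs for a given set of countries and records whether the option offers a common domain name and common metadata. The `Option` type, the function name, and the country identifiers are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    clusters: int
    common_domain: bool
    common_metadata: bool


def compare_options(countries: list[str], strict: set[str]) -> list[Option]:
    """Summarize the three options for a given set of countries."""
    strict_count = sum(1 for c in countries if c in strict)
    # One shared cluster is needed only if at least one country is not strict.
    shared_count = 1 if strict_count < len(countries) else 0
    grouped = strict_count + shared_count
    return [
        Option("Option 1", len(countries), False, False),
        Option("Option 2", grouped, False, False),
        Option("Option 3", grouped, True, True),
    ]


options = compare_options(["country-1", "country-2", "country-3"], {"country-3"})
for option in options:
    print(option)
```

With the three countries from the earlier example, option 1 needs three clusters while options 2 and 3 need two. The gap widens as more countries with shared regulations are added.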

How does Refract help clients build AI platforms in line with data sovereignty requirements?

Refract, the Insight Designer module of the Fosfor Decision Cloud, is an enterprise-grade AI platform that can manage the complete lifecycle of an AI project, from data discovery to data extraction, model deployment, and model monitoring.

Since Refract is an AI platform built using microservices, the platform can be hosted on-premises or on any cloud platform.

The following are some of the key features of Refract:

    • Data extraction: It has a massive collection of connectors and a built-in SDK called Refractio, which can be used to extract data from various data sources.
    • Data profiling: Out-of-the-box data profiling capabilities like completeness, accuracy, basic statistics, missing values, etc.
    • Data preparation (feature engineering): It has 100+ out-of-the-box functions for data preparation and feature engineering.
    • Model development: It supports multiple development environments like JupyterLab, VS Code, Spark, R, Python, etc.
    • Model registration: Built-in SDK for model registration.
    • Model deployment: One-click deployment of models and applications.
    • Model consumption: Models can be consumed via API or a Streamlit application.
    • Model monitoring: Out-of-the-box capabilities for model monitoring.

Why is Refract perfect for building AI platforms in line with data sovereignty requirements?

Refract has a microservices architecture with common application metadata, so even if multiple instances of the application are running, assets remain discoverable from a single place across the platform.

Refract also leverages options like Azure Front Door, which ensures that all users share a common domain name even when multiple instances of the application are running.

How does the solution work?

Refract uses the architecture shown below (Figure 2) to ensure that workloads for all Schengen countries except country 3 (as in the earlier example) are scheduled on the common shared infrastructure, storage, and compute. Country 3’s workloads are scheduled on separate, dedicated infrastructure, storage, and compute.


Figure 2: Shared & dedicated infrastructure architecture that Refract implements.

Note: In the above image, we have referenced Azure as an example, but the solution can be implemented with any other cloud provider as well.

The following is the sequence of events when a user tries to log in:

    1. When a user logs in to the portal, the domain name is the same whether the user is from Country A (the country with the stricter regulations in this example) or from any other country.
    2. The application identifies the origin of the traffic. If the traffic originates from Country A, the workload is pushed onto the nodes located in Country A. If the traffic originates from other countries, the workload is pushed to the shared infrastructure.
    3. The Fosfor Decision Cloud maintains application metadata in a common DB, so all the models are visible in a central location. If the client needs separate metadata, the Fosfor Decision Cloud’s architecture is flexible enough to maintain separate metadata for individual countries.
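
The sequence above can be sketched in Python as follows. This is not Refract's implementation: the node pool names, the `MetadataCatalog` class, and the model names are hypothetical, and the origin country is passed in directly because the text does not say how the origin is detected.

```python
from dataclasses import dataclass, field

# Hypothetical node pools; "country-a" stands for the country with stricter rules.
DEDICATED_POOLS = {"country-a": "nodes-country-a"}
SHARED_POOL = "nodes-shared"


@dataclass
class MetadataCatalog:
    """Common application metadata: records where assets live, not the data itself."""
    entries: dict[str, str] = field(default_factory=dict)

    def register(self, model_name: str, pool: str) -> None:
        self.entries[model_name] = pool

    def discover(self) -> list[str]:
        # Every model is visible centrally, whichever pool it runs on.
        return sorted(self.entries)


def schedule(origin_country: str) -> str:
    """Pick the node pool from the traffic origin; all other origins share a pool."""
    return DEDICATED_POOLS.get(origin_country, SHARED_POOL)


catalog = MetadataCatalog()
for model, origin in [("churn-model", "country-a"), ("fraud-model", "country-b")]:
    catalog.register(model, schedule(origin))

print(catalog.discover())
print(catalog.entries)
```

Only the metadata is shared in this sketch. The workloads themselves stay on the pool chosen for their origin, which is what keeps a central catalog compatible with local processing.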

Conclusion

In conclusion, the Fosfor Decision Cloud emerges as a pivotal partner for global enterprises navigating the intricate terrain of AI implementation amidst the stringent demands of data sovereignty regulations.

Its commitment to ensuring data compliance without sacrificing a seamless user experience underscores its significance in the AI landscape. By providing centralized asset discoverability, the Fosfor Decision Cloud transcends geographic boundaries and regulatory complexities, uniting disparate assets under a unified platform. Moreover, its adaptable architectural design facilitates the management of separate metadata for countries where strict data sovereignty rules prevail, offering a tailored solution when business requirements necessitate it.

Ultimately, the Fosfor Decision Cloud empowers businesses to concentrate on AI model development and management, alleviating concerns surrounding data sovereignty and thus facilitating the uninterrupted pursuit of innovation and excellence.

Go to fosfor.com to learn more.

Author

Ravikumar S Haligode

Senior Specialist – Data Science, Fosfor

With over 15 years of IT experience, Ravikumar has worked closely with senior stakeholders from business, operations, and system owners to identify opportunities for cost reduction, revenue enhancement, and customer experience using a data-driven approach. He has worked on multiple AI/ML projects, with extensive experience in building and evaluating models, tuning hyperparameters for optimum performance, and retraining models.
