Data management solution using Spectra

4 min read

Reading Time: 4 minutes

Zhamak Dehghani coined the term data mesh in 2019. Since this data transformation, the internet started flooding with information about its advantages and all the problems it would solve. Many articles compare data lake with data mesh.

This article will not explain, in great detail, what a data mesh or data lake is, but, instead, how you can adapt to these systems. We’ll also dive into why organizations must put some abstraction on top of these data management systems to make the right decisions based on their organization-specific scenarios.

Just so you have it, if you are not already aware of these terms, you can refer below articles before joining me back here.

1. Data Mesh Principles and Logical Architecture

(https://martinfowler.com/articles/data-mesh-principles.html)

2. Data Lake Principles and Logical Architecture

( https://www.guru99.com/data-lake-architecture.html )

3. Data Mesh is not a Data Lake

(https://www.linkedin.com/pulse/data-mesh-lake-jeffrey-t-pollock/)

If you have skipped the above three articles, then here is a high-level summary.

What is a data lake?

Data lakes are systems used for organizing and managing data centrally. This data can be in any form, i.e., it can be structured, semi-structured or unstructured. Data lakes typically follow a schema-on-read approach. If we go a little back in history, data lake came into the picture because of the rise of big data. This was triggered by the advent of new data generating sources and more users getting acquainted with the internet.

What is a data mesh?

Data mesh is a system that focuses on decentralizing data and data ownership in an organization. This means anyone in an organization can use and manage data until they have proper credentials for more access. Data mesh advocates that data can stay where it is regardless of whether it lies in different databases. It focuses more on serving this data as a data product that can be made accessible to all the authorized stakeholders. Historically, data mesh came into the picture to overcome challenges people faced with centralized data management systems.

Why organizations should have abstraction

Ever-changing design patterns

If we look from a 30,000 feet view, data lakes and data mesh are design patterns. That is, they are a way to organize and implement data to make easy to access, manage and maintain more efficiently. They are not bringing in any computation revolution. These patterns focus primarily on the ownership of data, its maintenance and distribution to the topmost in a hierarchy.

Now, the moment we consider this as a design pattern, it becomes clear that there is no thumb rule to it, and we cannot say one will fit all forever. These design patterns are bound to evolve. As with every new implementation in a production environment, fresh sets of challenges lead to further brainstorming and give birth to some new design patterns.

The challenge to catching up with these design patterns is the threat of obsoletion. By the time you analyze it, alter it for your organization, migrate your data, and overcome all the challenges, there comes a new approach to doing it more efficiently.

Another challenge is functionality. With new designs in the management systems, you don’t know whether they will be more efficient for you than your existing design. If you search around the internet, some of the early adopters of data mesh have already started posting articles advising enterprises on how to avoid common pitfalls of data mesh setup.

This ultimately becomes our first reason to abstract this entire process. We cannot always spend a lot of development effort to come to a single conclusion. Instead, we can think of a system that can help us give a quick leap into the future and help us make informed decisions. The abstraction should be such that you can easily change your source and destination without any engineering efforts to your core processing logic.

Changes in ownership of data

When learning about data lake and data mesh, one of the crucial topics is data ownership. Thankfully, data mesh brings a perspective that the maintenance and ownership of data should be with the team that understands that data more appropriately. Data owners should be responsible for serving that data in a consumable manner to end-users in the organization.

This ownership distribution brings up the need for the data owners to capture and maintain feedback from the end-user. If the feedbacks contain some enhancement, they need to take care of it and again expose this new enhanced data to the end-user.

If the individual owners start tracking this independently, there is a lot of duplication of efforts that these owners will spend in brainstorming and coming up with some automation. The story does not end here. What if tomorrow, some new design patterns come up with a unique view on data ownership?

In that case, we should abstract the access control part and develop data management systems that will facilitate data access management, data discovery, data tagging, and feedback capture. This can help organizations better organize their data access hierarchy and create a sophisticated, efficient way to adapt new design patterns.

Shifts in the thinking process

If we try to make sense of current design patterns, there is a constant effort to move data closer to the end-user so that the end-user can question the data as needed. Again, let’s take the example of the latest data approach as a product mentioned in the data mesh. It underlines that this thought process is more inclined to abstract the technical nitty-gritty of data transformations and serve consumable data.

However, what if we have an abstracted data management solution systems where domain experts can create formulas/expressions based on their expertise, and engineers can serve the data by applying those formulas to the underlying data structure? It will make more sense if these expressions are maintained at some central portal so that anyone unaware of these expressions can study these and apply them to his own data. For example, suppose someone is not aware of how to calculate simple interest in a banking domain. In that case, such people can simply access the already created expressions and use these expressions on their data.

There isn’t a one-size-fits-all solution

Both data lake and data mesh have their importance. One cannot conclude that only one way to organize and transform data will solve all the problems. There could be scenarios where data lake is the apt choice, while data mesh can be a more sensible choice in other cases. There could even be a third scenario where both can co-exist.

Organizing data centrally in a data lake makes sense when you need more performant queries, as we can avoid latency caused by connecting siloed data. At the same time, a data mesh is suitable for organizations storing their data across multiple databases and does not want a dependency on the central team.

As the internet community grows, data generation speed is growing multi-fold, thereby introducing new forms of data. At present, data is categorized into structured, semi-structured, and unstructured. One never knows when something new can pop up.

Moreover, organizing data is a skill. No matter whatever groundwork you do, there are chances you will learn it through iterations because data and use cases will always differ from organization to organization.

Hence, we need some abstract data management systems to help data scientists quickly run over these iterations and make them less cumbersome for you, the enterprise owner. This will help us make informed decisions based on our specific data and circumstances.

The future of data mesh and data lake

In the ever-changing space of technology and design patterns, organizations should invest in creating tools that can enable them to migrate and implement design patterns as required. As the world evolves, enterprises should not be binding themselves to use only one design pattern decided by someone ages ago. Because it is challenging to keep pace with updated versions and new technologies, organizations should evaluate their needs and put a conscious effort into the build vs. buy methodology.

We at Spectra are already helping organizations overcome these challenges by starting a client with smaller problem resolutions tools. If that makes sense and adds value to the organization, they can then adapt this tool to have smooth sailing over different architectures and technology upgrades.

Spectra is a comprehensive enterprise DataOps platform (data ingestion, transformation, and preparation) platform built to manage complex, varied data pipelines using a low-code user interface. Its domain-specific features deliver remarkable data solutions at speed and scale. Maximize your ROI with faster time-to-market, time-to-value and reduced cost of ownership.

The advantage of employing an enterprise data management solution like Spectra is that you get the best of the learnings across implementations, reducing your optimization iterations. Moreover, if any new connector comes in, you don’t have to take the pain of analyzing and developing it. With Spectra, you can directly use it without writing a single line of code.

Author

Mahesh Jadhav

Associate Principal, Product Engineering, LTIMindtree

Mahesh is an Oracle-certified Java professional with over a decade of experience. He has hands-on experience in Snowflake, DBT, Apache Spark, Kubernetes, Bigdata, Spring boot, and application development. Mahesh is actively involved in building LTIMindtree’s new generation decision cloud, the Fosfor Decision Cloud.

More on the topic

Read more thought leadership from our team of experts

How Spectra and Snowflake can illuminate your data journey

Considering that more people are becoming tech-savvy, enterprises need to focus on data and derive actionable insights as quickly as possible.

How modern data platforms are enabling enterprise DataOps

With the ever-increasing volume and complexity of data, extracting value from it has become more challenging. Organizations are facing challenges in providing the right set of data to the right team while maintaining data security at the same time. This presents an opportunity for organizations to become agile with their data management solution, provide the right data set to the correct domain team, and ensure collaboration among teams to get the maximum out of their data.

The top 7 features of modern data pipelines

As the amount of data generated by industries increases daily, the importance of reevaluating the data stack has never been greater. Studies predict that by 2025, the world will create approximately 463 exabytes of data each day. Given this vast amount of data and its value, a modern data infrastructure is needed to store, transform, and extract meaningful insights.

Privacy & Cookie policy

Privacy & Cookies policy

Cookie name	Active
sess_map

What is a cookie?

A cookie is a small piece of data that a website asks your browser to store on your computer or mobile device. The cookie allows the website to “remember” your actions or preferences over time. On future visits, this data is then returned to that website to help identify you and your site preferences. Our websites and mobile sites use cookies to give you the best online experience. Most Internet browsers support cookies; however, users can set their browsers to decline certain types of cookies or specific cookies. Further, users can delete cookies at any time.

Why do we use cookies?

We use cookies to learn how you interact with our content and to improve your experience when visiting our website(s). For example, some cookies remember your language or preferences so that you do not have to repeatedly make these choices when you visit one of our websites.

What kind of cookies do we use?

We use the following categories of cookie:

Category 1: Strictly Necessary Cookies

Strictly necessary cookies are those that are essential for our sites to work in the way you have requested. Although many of our sites are open, that is, they do not require registration; we may use strictly necessary cookies to control access to some of our community sites, whitepapers or online events such as webinars; as well as to maintain your session during a single visit. These cookies will need to reset on your browser each time you register or log in to a gated area. If you block these cookies entirely, you may not be able to access gated areas. We may also offer you the choice of a persistent cookie to recognize you as you return to one of our gated sites. If you choose not to use this “remember me” function, you will simply need to log in each time you return.

Cookie Name	Domain / Associated Domain / Third-Party Service	Description	Retention period
__cfduid	Cloudflare	Cookie associated with sites using CloudFlare, used to speed up page load times	1 Year
lidc	linkedin.com	his is a Microsoft MSN 1^st party cookie that ensures the proper functioning of this website.	1 Day
PHPSESSID	ltimindtree.com	Cookies named PHPSESSID only contain a reference to a session stored on the web server	When the browsing session ends
catAccCookies	ltimindtree.com	Cookie set by the UK cookie consent plugin to record that you accept the fact that the site uses cookies.	29 Days
AWSELB		Used to distribute traffic to the website on several servers in order to optimise response times.	2437 Days
JSESSIONID	linkedin.com	Preserves users states across page requests.	334,416 Days
checkForPermission	bidr.io	Determines whether the visitor has accepted the cookie consent box.	1 Day
VISITOR_INFO1_LIVE		Tries to estimate users bandwidth on the pages with integrated YouTube videos.	179 Days

Category 2: Performance Cookies

Performance cookies, often called analytics cookies, collect data from visitors to our sites on a unique, but anonymous basis. The results are reported to us as aggregate numbers and trends. LTI allows third-parties to set performance cookies. We rely on reports to understand our audiences, and improve how our websites work. We use Google Analytics, a web analytics service provided by Google, Inc. (“Google”), which in turn uses performance cookies. Information generated by the cookies about your use of our website will be transmitted to and stored by Google on servers Worldwide. The IP-address, which your browser conveys within the scope of Google Analytics, will not be associated with any other data held by Google. You may refuse the use of cookies by selecting the appropriate settings on your browser. However, you have to note that if you do this, you may not be able to use the full functionality of our website. You can also opt-out from being tracked by Google Analytics from any future instances, by downloading and installing Google Analytics Opt-out Browser Add-on for your current web browser: https://tools.google.com/dlpage/gaoptout & cookiechoices.org and privacy.google.com/businesses

Cookie Name	Domain / Associated Domain / Third-Party Service	Description	Retention period
_ga	ltimindtree.com	Used to identify unique users. Registers a unique ID that is used to generate statistical data on how the visitor uses the web site.	2 years
_gid	ltimindtree.com	This cookie name is asssociated with Google Universal Analytics. This appears to be a new cookie and as of Spring 2017 no information is available from Google. It appears to store and update a unique value for each page visited.	1 day
_gat	ltimindtree.com	Used by Google Analytics to throttle request rate	1 Day

Category 3: Functionality Cookies

We may use site performance cookies to remember your preferences for operational settings on our websites, so as to save you the trouble to reset the preferences every time you visit. For example, the cookie may recognize optimum video streaming speeds, or volume settings, or the order in which you look at comments to a posting on one of our forums. These cookies do not identify you as an individual and we don’t associate the resulting information with a cookie that does.

Cookie Name	Domain / Associated Domain / Third-Party Service	Description	Retention period
lang	ads.linkedin.com	Set by LinkedIn when a webpage contains an embedded “Follow us” panel. Preference cookies enable a website to remember information that changes the way the website behaves or looks, like your preferred language or the region that you are in.	When the browsing session ends
lang	linkedin.com	In most cases it will likely be used to store language preferences, potentially to serve up content in the stored language.	When the browsing session ends
YSC		Registers a unique ID to keep statistics of what videos from Youtube the user has seen.	2,488,902 Days

Category 4: Social Media Cookies

If you use social media or other third-party credentials to log in to our sites, then that other organization may set a cookie that allows that company to recognize you. The social media organization may use that cookie for its own purposes. The Social Media Organization may also show you ads and content from us when you visit its websites.

Ref links:

LinkedIn – https://www.linkedin.com/legal/privacy-policy Twitter – https://gdpr.twitter.com/en.html & https://twitter.com/en/privacy & https://help.twitter.com/en/rules-and-policies/twitter-cookies Facebook – https://www.facebook.com/business/gdpr Also, if you use a social media-sharing button or widget on one of our sites, the social network that created the button will record your action for its own purposes. Please read through each social media organization’s privacy and data protection policy to understand its use of its cookies and the tracking from our sites, and also how to control such cookies and buttons.

Category 5: Targeting/Advertising Cookies

We use tracking and targeting cookies, or ask other companies to do so on our behalf, to send you emails and show you online advertising, which meet your business and professional interests. If you have registered on our websites, we may send you emails, tailored to reflect the interests you have shown during your visits. We ask third-party advertising platforms and technology companies to show you our ads after you leave our sites (retargeting technology). This technology allows us to make our website services more interesting for you. Retargeting cookies are used to record anonymized movement patterns on a website. These patterns are used to tailor banner advertisements to your interests. The data used for retargeting is completely anonymous, and is only used for statistical analysis. No personal data is stored, and the use of the retargeting technology is subject to the applicable statutory data protection regulations. We also work with companies to reach people who have not visited our sites. These companies do not identify you as an individual, instead rely on a variety of other data to show you advertisements, for example, behavior across websites, information about individual devices, and, in some cases, IP addresses. Please refer below table to understand how these third-party websites collect and use information on our behalf and read more about their opt out options.

Cookie Name	Domain / Associated Domain / Third-Party Service	Description	Retention period
BizoID	ads.linkedin.com	These cookies are used to deliver adverts more relevant to you and your interests	183 days
iuuid	demandbase.com	Used to measure the performance and optimization of Demandbase data and reporting	2 years
IDE	doubleclick.net	This cookie carries out information about how the end user uses the website and any advertising that the end user may have seen before visiting the said website.	2,903,481 Days
UserMatchHistory	linkedin.com	This cookie is used to track visitors so that more relevant ads can be presented based on the visitor’s preferences.	60,345 Days
bcookie	linkedin.com	This is a Microsoft MSN 1st party cookie for sharing the content of the website via social media.	2 years
__asc	ltimindtree.com	This cookie is used to collect information on consumer behavior, which is sent to Alexa Analytics.	1 Day
__auc	ltimindtree.com	This cookie is used to collect information on consumer behavior, which is sent to Alexa Analytics.	1 Year
_gcl_au	ltimindtree.com	Used by Google AdSense for experimenting with advertisement efficiency across websites using their services.	3 Months
bscookie	linkedin.com	Used by the social networking service, LinkedIn, for tracking the use of embedded services.	2 years
tempToken	app.mirabelsmarketingmanager.com		When the browsing session ends
ELOQUA	eloqua.com	Registers a unique ID that identifies the user’s device upon return visits. Used for auto -populating forms and to validate if a certain contact is registered to an email group .	2 Years
ELQSTATUS	eloqua.com	Used to auto -populate forms and validate if a given contact has subscribed to an email group. The cookies only set if the user allows tracking .	2 Years
IDE	doubleclick.net	Used by Google Double Click to register and report the website user’s actions after viewing clicking one of the advertiser’s ads with the purpose of measuring the efficiency of an ad and to present targeted ads to the user.	1 Year
NID	google.com	Registers a unique ID that identifies a returning user’s device. The ID is used for targeted ads.	6 Months
PREF	youtube.com	Registers a unique ID that is used by Google to keep statistics of how the visitor uses YouTube videos across different web sites.	8 months
test_cookie	doubleclick.net	This cookie is set by DoubleClick (which is owned by Google) to determine if the website visitor’s browser supports cookies.	1,073,201 Days
UserMatchHistory	linkedin.com	Used to track visitors on multiple websites, in order to present relevant advertisement based on the visitor’s preferences.	29 days
VISITOR_INFO1_LIVE	youtube.com		179 days

Third party companies	Purpose	Applicable Privacy/Cookie Policy Link
Alexa	Show targeted, relevant advertisements	https://www.oracle.com/legal/privacy/marketing-cloud-data-cloud-privacy-policy.html To opt out: http://www.bluekai.com/consumers.php#optout
Eloqua	Personalized email based interactions	https://www.oracle.com/legal/privacy/marketing-cloud-data-cloud-privacy-policy.html To opt out: https://www.oracle.com/marketingcloud/opt-status.html
CrazyEgg	CrazyEgg provides visualization of visits to website.	https://help.crazyegg.com/article/165-crazy-eggs-gdpr-readiness Opt Out: DAA: https://www.crazyegg.com/opt-out
DemandBase	Show targeted, relevant advertisements	https://www.demandbase.com/privacy-policy/ Opt out: DAA: http://www.aboutads.info/choices/
LinkedIn	Show targeted, relevant advertisements and re-targeted advertisements to visitors of LTI websites	https://www.linkedin.com/legal/privacy-policy Opt-out: https://www.linkedin.com/help/linkedin/answer/62931/manage-advertising-preferences
Google	Show targeted, relevant advertisements and re-targeted advertisements to visitors of LTI websites	https://policies.google.com/privacy Opt Out: https://adssettings.google.com/ NAI: http://optout.networkadvertising.org/ DAA: http://optout.aboutads.info/
Facebook	Show targeted, relevant advertisements	https://www.facebook.com/privacy/explanation Opt Out: https://www.facebook.com/help/568137493302217
Youtube	Show targeted, relevant advertisements. Show embedded videos on LTI websites	https://policies.google.com/privacy Opt Out: https://adssettings.google.com/ NAI: http://optout.networkadvertising.org/ DAA: http://optout.aboutads.info/
Twitter	Show targeted, relevant advertisements and re-targeted advertisements to visitors of LTI websites	https://twitter.com/en/privacy Opt out: https://twitter.com/personalization DAA: http://optout.aboutads.info/

Save settings

Overview

Partners

What’s hot

Industries

Roles

Knowledge hub

About Fosfor

The Fosfor Decision Cloud