Production ML (Machine Learning) is more engineering than machine learning. Building a prototype has become simple, thanks to open-source projects like scikit-learn, TensorFlow, and Keras. But operationalizing that model, so its insights can drive day-to-day business decisions, is challenging and demands more engineering knowledge than data science knowledge.
That’s where Fosfor Refract can help: it gives users standard processes and tools that avoid the common engineering pitfalls around machine learning and takes them through a guided approach to building a best-in-breed production ML engine.
Here is a closer look at some of the challenges teams face regularly and how Refract solves them:
- Decoding Coding: We have heard many developers say, “this code was working on my laptop; I’m not sure why it’s failing in production.” The primary reason is that in a typical ML project, user code makes up roughly 1% of the overall package, while third-party packages and OS-level dependencies make up the other 99%. It is therefore essential that the full environment used during development be preserved and shipped to production. Refract does exactly this by giving data scientists a centralized, configurable environment in which to write their model code. When the model is operationalized, all of its dependencies are shipped along with it, making the journey seamless.
- Overcoming Repetitive Patterns: In a large organization, similar projects are common, and so is duplicated effort. With a siloed development approach, this is unavoidable. Refract offers a collaborative environment across the enterprise to avoid such scenarios. Any data scientist can browse the full catalog of previously created models and use cases and adopt one in their current project, or enhance it to suit their needs, rather than starting from scratch.
- Leaving behind Legacy Models: Maintaining the reproducibility of a model is critical for debugging and compliance reasons. But it is difficult, because in an ML project the code is not the only thing that changes over time; the data and the model parameters also change the model’s behavior. Refract simplifies this orchestration and stores the hyperparameters, model code, and data used for every model run. You can always look back at that record and reproduce the complete run with minimal effort.
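The “Decoding Coding” point above, that user code is a thin layer over many third-party and OS-level dependencies, can be illustrated with a minimal plain-Python sketch (this is not Refract’s actual API): capture the exact interpreter and package versions at development time so the same environment can be recreated in production instead of guessed at.

```python
import sys
from importlib import metadata

def snapshot_environment() -> dict:
    """Capture the interpreter version and every installed package version."""
    return {
        "python": sys.version.split()[0],
        "packages": {
            dist.metadata["Name"]: dist.version
            for dist in metadata.distributions()
        },
    }

# Writing this snapshot alongside the model code lets production
# rebuild the same environment rather than relying on "it worked on my laptop".
snap = snapshot_environment()
```

A platform like Refract automates this kind of capture, but even this hypothetical snippet shows why it matters: the dictionary of pinned versions is the 99% of the project that never appears in the data scientist’s notebook.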
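The reproducibility point can likewise be sketched in a few lines. The snippet below is an illustrative, hypothetical record of one training run, pairing hyperparameters with a code version and a fingerprint of the data; the function name and fields are assumptions for the sketch, not Refract’s API.

```python
import hashlib
import time

def record_run(hyperparams: dict, code_version: str,
               data_path: str, data_bytes: bytes) -> dict:
    """Persist everything needed to reproduce a single training run."""
    return {
        "timestamp": time.time(),
        "hyperparams": hyperparams,           # e.g. learning rate, epochs
        "code_version": code_version,         # e.g. a git commit hash
        "data_path": data_path,
        # Hash the training data so a later run can verify it uses
        # byte-identical input, not a silently drifted dataset.
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
    }

run = record_run({"lr": 0.01, "epochs": 10},
                 "abc123", "train.csv", b"col1,col2\n1,2\n")
```

Storing such a record per run is what makes “look back and reproduce” possible: if any of the three inputs (code, data, hyperparameters) changes, the record shows exactly which one.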