
Are You Ready for the Future of Big Data?

Team Zaloni, September 5th, 2018

A few weeks ago, I went to the IBM World of Watson conference in Las Vegas, NV. As one of the (roughly) 500 IBM Champions worldwide, I have made this my yearly migration, my regular “blue shot”, my commitment to the program. You get the gist. But who knows… What happens in Vegas stays in Vegas.

This article is one part of a three-part summary of what I learned from World of Watson (WoW). The articles are independent, and you do not have to read them in any specific order. In this part, I cover concepts that explore the future of big data, data lakes, governance, and data wrangling. A more personal summary can be read on my blog (JGP.net), with some restaurant suggestions (just a few). The second part focuses on Informix, its usage, and how active it is in the IoT world.

Governance is Everywhere

You could argue that this isn’t breaking news, but before this conference, I had the feeling that governance was wishful thinking: a separate project that was not integrated into business processes. As an example, this is exactly what Vanguard did when they started, more than 10 years ago, with their own DG (data governance) 1.0 tool.

Backfill Process

Vanguard is a pioneer in the whole governance process. They started, like everyone, with Excel and other such office tools, before switching to a fully integrated process using IBM InfoSphere Governance Catalog (IGC).

They now use Open IGC for its extensibility and, thanks to the journey they pioneered, they have serious experience in metadata management and governance. That reassures me as one of their 401(k) customers.

Thinking of metadata management, the “M” word is no longer a mystery. Proper governance requires cleansed and reliable metadata. For this reason, a lot of the tools integrate more and more of those features as part of their standard operating procedures.

And it’s a good feeling to see that I am not the only person working on the subject.

iData nervous system with integration & Governance

More than a long-term vision, IBM demonstrated some of those concepts in their new Data Connect product.

Architecture and Methodology overview

Open IGC supports extensibility (hence its “Open” prefix). If you look closely you won’t see Zaloni’s data lake management platform (more on that in a future post).

The industry increasingly needs this kind of integration, and automatic metadata management is becoming key to governance.

Data Lenses and the Future of Machine Learning

Last week, Jane came to Carrboro High School wearing a green top with a pink skirt. Paul, who likes Jane, complimented her on her choice of clothes, but Julie, who witnessed the scene, thought he was making fun of her bestie, who’s going out with Philipp. So Julie, who has a little crush on Paul (who only has eyes for Jane), reported the incident to Philipp. Of course, Philipp did not like that, and he and Paul went behind the gym to solve the issue.

So, from a student’s point of view, this is a normal high school drama, while for the admin staff it was bullying, even if Philipp and Paul solved their issues with a battle of “Magic: The Gathering”.

This was the theme of the example that MIT Media Lab director Joichi Ito used to explain the concept of data lenses. Of course, this noble institution is not spending all this energy to solve high school dramas, but rather to enhance the output of analytics and, more precisely, to bring a “job experience” feeling to traditional Machine Learning techniques.

So what’s a typical use case? Imagine an experienced cop (not starting any debate here). He has knowledge based on his experience: he sees clues where we would not see anything, and he knows where to look for evidence. The idea of data lenses is to build this prism through which the machine will see data differently.

Future of Machine Learning

Joi Ito reminds us that we do not all have a PhD.

What does it mean concretely? You tint your Machine Learning model with the experience of the professional.
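To make the idea a bit more tangible, here is a toy sketch in Python. It is my own illustration, not something shown at the conference or taken from the MIT Media Lab: the `Lens` class, the feature names, the weights, and the threshold are all hypothetical. It only shows how the same event can be read differently depending on the experience encoded in the lens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lens:
    """Hypothetical 'data lens': per-feature weights encoding one viewer's experience."""
    name: str
    weights: dict[str, float]
    threshold: float

    def score(self, event: dict[str, float]) -> float:
        # Weighted sum of the features this lens cares about; missing features count as 0.
        return sum(w * event.get(feature, 0.0) for feature, w in self.weights.items())

    def flags(self, event: dict[str, float]) -> bool:
        # The lens "raises a flag" when the weighted score reaches its threshold.
        return self.score(event) >= self.threshold


# The same incident, described once (feature values are made up).
incident = {"teasing_remark": 1.0, "confrontation": 1.0, "physical_harm": 0.0}

# Two lenses over the same data: students shrug it off, admin staff do not.
student_lens = Lens(
    "student",
    {"teasing_remark": 0.1, "confrontation": 0.2, "physical_harm": 1.0},
    threshold=0.5,
)
admin_lens = Lens(
    "admin",
    {"teasing_remark": 0.4, "confrontation": 0.4, "physical_harm": 1.0},
    threshold=0.5,
)

for lens in (student_lens, admin_lens):
    print(f"{lens.name}: flagged={lens.flags(incident)}")
```

With these made-up weights, the student lens leaves the incident unflagged while the admin lens flags it. A real data lens would be learned or elicited from professionals rather than hard-coded like this.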

Cloud, Cognitive, and Analytics

These are the 3 keywords you should remember from World of Watson 2016.

IBM believes in Cloud, which some might say is not really surprising. I strongly believed in Cloud even before it was called Cloud. And really, as keynote speaker and three-time Pulitzer Prize winner Tom Friedman put it: “This ain’t no cloud, folks. This is a technological supernova, the explosion of a star. And we know what happens with the explosion of a star — it’s the center of everything”.

Tom Friedman- IBM

It sure is a high-level view and it needs to be drilled down into concrete implementations, but everybody is working in some kind of cloud. My biggest belief is that hybrid clouds will be the predominant architecture for the next 5 years. This means that your software needs to be aware of this and take advantage of it. Not going for a shameless Zaloni promotion here, but this is exactly the idea behind DLM (Data Lifecycle Management), where the platform can archive data in the cloud when it’s cold.
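As a rough illustration of what “archive data in the cloud when it’s cold” can mean, here is a minimal Python sketch of a tiering rule. It is not Zaloni’s API or implementation: the `Dataset` class, the tier names, the sample catalog, and the 90-day threshold are assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Dataset:
    """Hypothetical catalog entry; field and tier names are illustrative only."""
    name: str
    last_accessed: datetime
    tier: str = "on_prem"


def plan_archival(datasets, now, cold_after=timedelta(days=90)):
    """Return the names of on-prem datasets not accessed within `cold_after`."""
    return [
        d.name
        for d in datasets
        if d.tier == "on_prem" and now - d.last_accessed > cold_after
    ]


now = datetime(2018, 9, 5)
catalog = [
    Dataset("clickstream_2016", datetime(2017, 1, 10)),               # cold, still on-prem
    Dataset("orders_current", datetime(2018, 9, 1)),                  # hot, stays put
    Dataset("logs_2015", datetime(2016, 3, 2), tier="cloud_archive"), # already archived
]

to_archive = plan_archival(catalog, now)
print(to_archive)
```

Only the dataset that is both on-premises and untouched for more than the threshold is selected. In a hybrid setup, the interesting design choice is where this rule lives: the software has to know about both tiers to apply it.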

Cognitive is just about AI. But AI stands less and less for artificial intelligence and more and more for augmented intelligence. Just as augmented reality displays additional information on your screen, augmented (aka extended) intelligence will help you make better decisions.

Augmented Reality- Pokemon
An example of augmented reality in Pokémon Go: a female Nidoran is walking on my desk as I work on metadata architecture.

Thanks to smarter applications that can pre-analyze your data, your analytics will get smarter, more impactful. This brings me to what was the biggest insight of the conference: Just as IBM did with Linux a few years ago – phasing out all their operating systems in favor of Linux – IBM now defines Spark as an Analytics Operating System.

IBM- Spark

Rob D. Thomas, VP Product Development IBM Analytics, and Adam Kocoloski, CTO for Data Services, co-founder of Cloudant, on Spark as an Analytics Operating System.

For me, this is a huge step forward and confirms that our choice of using Apache Spark as our underlying transformation engine for the Zaloni Arena DataOps platform is the way to go. I look forward to embracing even more Spark features in our products (but I can’t share more for now).

Stay tuned. Not exactly everything that happens in Vegas stays in Vegas.

About the Author

This team of authors from Team Zaloni provides expertise, best practices, tips and tricks, and use cases across varied topics, including data governance, data catalog, DataOps, observability, and much more.