Mobile Monitoring Solutions


Podcast: Obeya Rooms for Transparency, Collaboration and Communication

MMS Founder
MMS Carol McEwan Greg Woods

Article originally posted on InfoQ. Visit InfoQ

Transcript

Introductions [00:20]

Shane Hastie: Good day folks. This is Shane Hastie for the InfoQ Engineering Culture podcast. Today I'm sitting down with Carol McEwan and Greg Woods from iObeya. First of all, welcome Carol and Greg. Thanks for taking the time to talk to us.

Carol McEwan: Thanks for having us.

Greg Woods: Yeah, thanks for having us.

Shane Hastie: What’s this Obeya thing? It’s a term we’re starting to see bandied around a bit. But for the benefit of our audience who might not have come across it, or who are just seeing the word: what does it mean? Where is it from?

What is an obeya space? [00:53]

Greg Woods: Obeya comes from the Japanese term for large room. The large room concept was developed by Toyota in the nineties as they were developing the Prius. They were looking for a new way of visualizing work and communication, understanding that they had to come up with something very innovative in a shorter time span and under a budget crunch as well. They were looking for a new way of working, and so they developed the Obeya space, which folks sometimes refer to as the war room. It’s a way of visualizing work so that people can stand in front of the wall or the board and have a conversation about what’s happening there. So it’s a way of improving and accelerating dialogue about what needs to get done.

Shane Hastie: How does that differ from something that many of our audience will be familiar with: the Scrum board or the Kanban board?

Carol McEwan: Yeah, so it’s not a whole lot different, it’s just walls across an enterprise. It takes it up a notch, right? We know the Scrum board or the Kanban board at the team level. But think about, as you scale, how do you know what work dependencies you have across a team or a team of teams? How can you start to visualize that work to know, am I taking on too much or too little, or, oh, I didn’t know that team over there was working on X when I’m actually working on it as well, so maybe we can work together on that? Think about scaling your practices beyond just the board for your team.

Greg Woods: The Obeya may include other walls, such as, if you’re looking at the enterprise level, a visualization of the strategic plan and some of the KPIs associated with it, and then, as Carol was talking about, more detail on the execution of work. Kanban also comes from Japanese terminology for a card. It’s a trigger, a visual cue to know essentially when to move work from one place to another, or, in the case of physical space, when a bin is empty and needs to be refilled or a bin is full and needs to be moved to the next workstation. So whether you’re working in the physical space with a card system like that, or virtually with a Kanban board or a Scrum board, it’s that visual cue to make progress, to continue executing the work that needs to get done.

Shane Hastie: Greg, you mentioned in the conversation we had before, that your background in particular is lean. So how does this Obeya support lean thinking and lean ways of working in organizations?

The application of lean ideas through obeya [03:43]

Greg Woods: That’s a great question. Lean, at the heart of it, is ensuring the customer gets what they want, when they want it, and at the price that they want it. So it’s all about connecting the dots and creating a seamless workflow from your suppliers to your customers. It’s your classic SIPOC diagram: suppliers, inputs, processes, outputs, and customers. And this is all connected through visual cues. As I was mentioning with Kanban, these are the triggers of how we make sure work progresses as efficiently and effectively as possible.

Shane Hastie: And Carol, we’ve known each other for a while, and we’ve both come from a very strong agile background. Is there anything specific from an agile perspective that would add to that explanation?

Obeya supports agile approaches [04:31]

Carol McEwan: Absolutely. I’m learning a whole lot more from Greg since I joined iObeya, and then from my knowledge of the agile space, I think what we’re finding is this difference between business agility, or doing agile versus being agile, and also what practices you are using as a practitioner in the work that you’re doing. Those lines between agile and lean are getting a little fuzzier as we go along. Kanban is a great example, because some people in the agile space don’t know that Kanban came from lean, right? There are a lot of practices we’re doing in the agile space that came across from lean. And I see the same thing happening with Obeya rooms: being able to tie that understanding together for the individual, to see how the work that I am doing adds value to the organization’s expectations or strategy.

If I make this decision down here, I know, based on what I’m seeing in the Obeya room, that that decision will trickle all the way up and add value, or not. Sometimes decisions change, and in agility we want to be agile, right? We want to be able to make decisions fast and change appropriately as the need arises, and an Obeya room is an opportunity to do that. What excited me so much about joining this organization was not looking at lean or at agile so much, but looking at how we can be effective as an entire enterprise and have transparency from top to bottom and bottom to top.

Shane Hastie: I can see a room full of various boards helping, but what does it take from a cultural perspective to make that happen?

The culture changes needed to make this effective [06:22]

Greg Woods: That’s a great question. I think Carol hit on it just a moment ago, and it’s the notion of transparency: once you stick something up on a wall, it’s there for people to see, whether it’s in the physical space or in the digital space. So from a cultural perspective, certainly leaders need to be comfortable with the fact that they’re sharing information that maybe they weren’t sharing before. And by sharing some of that information, they may get challenged on it, or people may ask questions about decisions being made. So there’s a degree of comfort or confidence, or both, that leadership needs to have when they start to share information more openly like that. And it brings a greater degree of accountability, not only for the person, say, who is part of the project team trying to advance a task, or a developer writing a piece of code, but it’s a two-way street.

So leaders are going to be held accountable for the things on their boards as well, and those are visible and out there. From a cultural perspective, it’s really about having the comfort and confidence to stand in the space of, we’re putting it all out there, it’s transparent, and we need to be able to have open and honest conversations about those things. So key skills are active listening, being able to give effective feedback, being able to receive feedback, and all of those pieces play into the kind of culture that transforms an organization when you make things transparent throughout.

Carol McEwan: Yeah, and I couldn’t agree more. And I also think the other big thing is trust, right?

Greg Woods: Absolutely.

Leadership buy-in and support is not enough – active leadership engagement is crucial [07:59]

Carol McEwan: Trust is huge. And I like to go back to the State of Agile report, right? If you take a look at the State of Agile report, culture is always a big thing. Culture is king when you’re trying to do this kind of stuff, but in the 14th State of Agile report a new finding this year was that there’s not enough leadership participation to help with this. So I think you don’t just need leadership buy-in and support, you need leadership participation and engagement. And I was so excited when I saw that report, because I know it’s true. We can’t say do as I say, it’s do as I do. So if you’re going to have good, transparent walls to walk, everybody has to participate in that. And it does open up that transparency, and it brings to light the decisions that are being made throughout the organization. So I think a mindset change is what’s going to help us get there too.

Greg Woods: And I think we need to remember that culture is an outcome. It’s a collective outcome of the behaviors of the people in the organization. I always wonder about initiatives that are out there to “change the culture”, when the initiative is really about changing behavior. However you change the behavior, whatever behaviors are enforced or reinforced or encouraged, that’s the culture that you’re going to get. Culture is an outcome. It’s not just a magic kind of state of being. It’s what you do day in, day out. It’s how you talk to your people day in, day out. It’s how leaders treat their people day in, day out. That’s what culture is. And that’s where it comes from.

Shane Hastie: Johanna Rothman has said to me on more than one occasion that culture is the lowest common denominator of the behaviors that you accept in the organization. What are the behaviors that we need to bring in to start to embody this transparency, to make something like an Obeya effective as a communication tool?

Behaviours that reflect and influence culture [10:02]

Greg Woods: I think one of the classics you go back to is the Stephen Covey-ism of “seek first to understand”. Instead of judging what you see on the wall, seek to understand why it’s on the wall: help me understand what that card is telling me. If a card is still in the to-do column instead of the in-progress column, or if it has data that says it’s “late”, rather than jumping to the conclusion that it’s late and reaching for a punishment mindset, shift to asking the question first. Seek first to understand: why is that card late? Why hasn’t that card progressed? And more specifically from a leadership perspective, it’s asking: if something’s not going like it’s expected to, what can I do to help? What support can I provide to help my team progress? Moving from sitting in judgment of why something hasn’t happened to a place of support, to help make sure that it does happen.

Carol McEwan: Couldn’t agree more, Greg. I like to tell a quick story about a BVIR, a big visual information radiator, which is really the agile space’s closest analogue to a visual management system or an Obeya room. I was part of an opportunity where we were taking agility into a school and teaching children and teachers. This was a school where the kids needed a little more support than in a mainstream school. And they thought they’d put the marks up on the wall. Well, let me tell you, that didn’t go down very well. However, they stopped and thought about it, they iterated, and they inspected and adapted to the situation. What they did for the next semester was put up a BVIR with all the marks on the wall, but they started every single child at exceeds expectations.

They all started at a hundred percent. So now a child thinks, hey, I’m at a hundred percent and I’m doing great, and I don’t want that to slip, and I don’t want anybody else to see it slip. Some children did slip, obviously, but rather than the other kids berating them for having a bad grade, they said, gosh, what happened? How can we help? Hey, I’m good at math, can I help you? Hey, I’m good at something else, can I help you? It became a team effort and they swarmed to help each other. And I see the same thing happening in teams in this situation too. Information is not there to berate people. It’s there to figure out where we can help best. So make sure we’re providing the right information. It’s not all the information, it’s the right information, so that we can make the right decisions to support the right people at the right time to get the right value at the end of the day.

Greg Woods: I love that story. It makes me think of one of the lean truisms, or phrases, that we often hear: you attack the problem or the process, not the person. You go after the piece that is objective. I have objective data about how my process is performing, or marks in this case, and you’re not attacking the person; you go after the process that’s been created, because it’s producing an outcome that isn’t the outcome you want. So let’s go look at that. Let’s not look at Joe the operator or Sally the student; let’s figure out what we can really go after, what we can attack, what we can change.

Carol McEwan: Right.

Shane Hastie: Building on that, there’s something intriguing there in the way that information was communicated. There was an interpretation layer on top of the raw data. How, as leaders, do we figure out the right interpretation for that?

The importance of representing the data in a way that is useful and meaningful [13:44]

Carol McEwan: I think Greg hit on it a little bit ago. We can all jump to conclusions, but we have to seek to understand. We have to learn. We all have our own personal biases, and when we see one thing, we might jump to a conclusion about something else. But when the information is presented, it’s only an opportunity to have a conversation. You cannot say, based on that data, this is it. You have to use these Obeya rooms as an open opportunity for conversation, and that’s the power of Obeya rooms: opening up that level of communication. Because if you go back to the State of Agile report, one of the other issues is communication between leadership and the people doing the work. When you are trying to align strategy with execution, how are you doing that? How are you communicating those things? The data is there to open up a line of communication.

Greg Woods: And I think, from a visual management perspective, that communication is when you take complex data or complex information and create a simple visualization out of it. I like to say that the graph is the start of the conversation. When you look at the trend line and it’s going up, and up is good, then the conversation starts with: what are we doing well that’s got our trend line going in the right direction? Or conversely, if the trend line is going down and that’s not what we’re looking to achieve: what’s happening here? What’s happened over time? Let’s go investigate what’s driving performance in the wrong direction. So, building on what Carol was saying, communication can start with that visual representation, and that visual representation is the trigger for the conversation that needs to be had. How do we keep getting better? Or how do we course correct and get back to where we need to be?

Shane Hastie: So we’ve been talking about the physical Obeya spaces. iObeya is a virtual representation of that. How does this translate into the virtual space and how do we get the same feeling of absolute transparency that you get from being able to walk into a room and look around the walls? How does this translate into the virtual world that so many of us work in today?

Working with virtual Obeya spaces [16:07]

Greg Woods: That’s a great question, and it really strikes at the heart of how the company got started. A major auto manufacturer came to us and said, take our flip charts and our sticky notes and put them into a digital environment. The premise of how iObeya is structured is basing the digital interaction on rooms; we call them rooms and boards. You can step into a room, or you can have permission to access a particular room. And then the boards are much like what you’d see on a wall, or the flip charts that are hung in the physical space. We have dividers, so you can envision a corner of the room in a different color. Your left-hand wall is on the left-hand side of your room, and that’s where you’ve got your strategy.

And then there’s a divider, which could mimic the physical corner of the room, and then you go into the next section of the room. So we’ve tried to create a structured space, just like the physical space. It’s much, much easier to adapt into that digital space because the essential look and feel is like the physical space.

Carol McEwan: And one thing that I’d love to explain too is the advantages of having a digital room over a physical room, right? Sticky notes don’t fall off the wall, and you don’t have to roll up your program board because it got too big and you have to move it to another room; you can just expand it. And you don’t even have to rewrite a card, because you can synchronize cards and share them across rooms. So beyond the value that this virtual room brings, it’s fun to work in too. It’s unique, it’s flexible, and your imagination is sometimes the only thing that limits you. So it brings an added element to the opportunity.

Asynchronous collaboration over time and distance [18:02]

Greg Woods: I think another couple of key points there, to build on what Carol is saying, is that when you’re in that space, it’s the ability to collaborate, and not only at the same time. Carol and I could be working in the same digital space at the same time, or, since we’re located in different time zones, at different times. I’ve got a client I’m working with that has a facility on the West Coast of the USA, a facility in South Korea, and a facility in Singapore, obviously different time zones, but they’re all in the same digital space so they can work on boards, what we like to call working asynchronously.

So if the folks on the west coast make a change to the board, when the folks wake up the next day in Singapore, they can see the changes and they can continue work. So it’s a fantastic way of being able to engage your top talent in the organization, no matter where they are on the face of the earth, they can log in from New York. They can log in from Denver. They can log in from Singapore. They can log in from Paris. And if you can manage all those time zones to get folks together at the same time, you can do that. Or you can literally pass the work around the world and people can be in the same space, working on whatever needs to get done. And quite frankly, the work never has to stop because it’s always there. It’s always present. It’s always live and you can always interact with it.

Shane Hastie: If we can take this to an example that will hopefully resonate with our audience. If we think of an organization that’s got a number of software engineering teams working either in parallel on single products or on complementary products, streams of work, what would an Obeya space look like for them? What are some of the things that would be up on those walls, whether they’re virtual or physical?

What an Obeya space for a software organisation could look like [19:45]

Greg Woods: The basic foundational piece is the board. In the physical world we might equate that to a flip chart or a whiteboard; in our space, it’s a digital board. Then we have your basic sticky notes, or what we also like to call cards, that carry more detailed information: a description of the task, an owner, a due date, maybe a checklist of things that need to get accomplished. And not only do we have the capability of putting these sticky notes onto a blank board, we also have what we call a static background, where you can have different images. Just as an example, the first one that pops to mind is if you want to do an empathy map with your team, we can provide the static background; we have a background catalog that includes an empathy map.

And so, boom, right there you’ve got your map to work with, and then you can operate on top of it. As we talked about before, the same sort of process applies to Kanban boards. We have Kanban boards that are dynamic, so we can actually count the number of cards in a particular column, whether it’s your to do, in progress, or done, however you structure your Kanban, and set a particular WIP (work in progress) limit on each of these columns as well. So if you only want five tasks in your to-do column and you’ve got six tasks listed, that WIP box, that WIP visualization, turns red. You’ve got a number of different ways that you can interact with and design your workflows and how you’re interacting with your information. And you can even create custom backgrounds as well, so it’s not only what’s available in the existing catalog; if you have a particular way that you like to manage your work or your workflows, those can be customized for yourself.
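(Editor’s aside: for readers who want to see the WIP-limit idea Greg describes in concrete terms, here is a minimal, hypothetical sketch in Python. The column name, limit, and cards are invented for illustration; this is not iObeya’s implementation.)

```python
# Minimal illustration of a Kanban WIP-limit check: a column "turns red"
# when the number of cards in it exceeds its work-in-progress limit.
# The column name, limit, and cards below are made up for the example.
from dataclasses import dataclass, field

@dataclass
class Column:
    name: str
    wip_limit: int
    cards: list = field(default_factory=list)

    def over_limit(self) -> bool:
        # True when the card count exceeds the WIP limit for this column.
        return len(self.cards) > self.wip_limit

todo = Column(name="To Do", wip_limit=5, cards=["A", "B", "C", "D", "E", "F"])
print(f"{todo.name}: {len(todo.cards)}/{todo.wip_limit}",
      "RED" if todo.over_limit() else "OK")   # -> To Do: 6/5 RED
```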

So there’s a whole host of different ways that you can manage your teams, your projects, your work. We like to say that we’re a framework. We don’t say that you have to do lean a particular way and you have to use these boards in this order and do it this way. That’s not our design. Our design is a framework with a powerful toolkit. Then you’re able to manage the work, how you want to. So what would something look like? Imagine how you’re doing it today, just doing it in a digital space.

Carol McEwan: Or your retrospectives or problem solving workshops. Also the voting; the voting is really cool. I’ve seen different ways of doing it anonymously. I remember doing some problem solving workshops where people were always afraid to vote on something if their leader had already gone up and voted. So the opportunity to vote anonymously and then see all the votes collected at once makes it that much more powerful. But, like Greg said too, the way the company began was to say, let’s take everything that was analog and make it digital first, and then make it better from there. Those are some of the things that I think make it special.

Using virtual spaces helps encourage diversity and inclusion [22:38]

Greg Woods: I appreciate what Carol’s saying there too, from a voting perspective: what that means to the group dynamic and how you bring out different ideas. I think we’re just touching the tip of the iceberg here when it comes to how we more effectively facilitate conversation and how we bring more diversity and inclusion into the digital space. If you’re not seeing what the other person looks like or what their position is in the organization, you can certainly be much, much more inclusive by saying, we’re all contributing here. It doesn’t matter where you’re from, what language you speak, what you look like, however you look to slice it. I think there’s an element of being able to create a more powerful environment that really contributes to diversity and inclusion as we move forward.

Carol McEwan: And the one other thing I’d like to mention is that you really get to see the incremental improvement you’ve made along the way, because now you have a record, a digital source of truth, that you can keep, and you can see the improvements over time. Whereas before you just kind of knew, oh yeah, remember how it used to be back in the day, now you can actually look at it and say, look how far we’ve come: last year we were here and now we’re here. That’s pretty powerful for a team, to celebrate the successes they’ve made. And we all know that’s what makes people want to get up and go to work every day: to feel like they’re part of the team and part of the success and the value they’re bringing to the organization.

Greg Woods: I really appreciate what you’re saying there, because that brings us back full circle to the culture conversation. Celebration and recognition are key to creating the type of positive culture and environment that organizations need and want, to make sure that they’re engaging their folks and getting the best out of them.

Shane Hastie: Thank you so much. Some really interesting points and some good advice in there. Carol, Greg, if people want to continue the conversation, where do they find you?

Carol McEwan: That’s a great question. They can find cmcewan@iObeya.com.

Greg Woods: And I’m gwoods@iObeya.com, and certainly folks are welcome to check out our website, which is of course just www.iobeya.com.

Shane Hastie: Thanks so much.

Carol McEwan: Thank You.

Greg Woods: Thanks for having us, Shane.




Which Enterprise Software Stock is a Better Buy? – Street Register

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Intuit vs. MongoDB: Which Enterprise Software Stock is a Better Buy?

Prominent companies in the enterprise software space are thriving thanks to continued digital transformation, the adoption of hybrid lifestyles, and a growing market. Intuit (NASDAQ: INTU) and MongoDB (NASDAQ: MDB) should therefore see high demand for their solutions. But which stock is the better buy right now? Intuit Inc. provides financial management and compliance products and services worldwide for small and medium-sized businesses and individuals. The Mountain View, Calif., company operates in four segments: Small Business & Self-Employed; Consumer; Credit Karma; and ProConnect. MongoDB, Inc., based in New York City, provides a general-purpose database platform worldwide. MongoDB Enterprise Advanced, MongoDB Atlas, and Community Server are some of the products offered by the company. It also offers professional services such as consulting and training.

Threats related to data security, especially on cloud-based platforms, continue to hamper the enterprise software market’s growth. However, the enterprise software market still has a strong chance of growing quickly in the months ahead due to increased demand from nearly every industry as part of the wider digital transformation effort. A resurgence in COVID-19 cases is causing a revival in hybrid working arrangements, which is proving to be a benefit for enterprise software. A Statista report states that the global enterprise software market will grow at an 8.74% CAGR between 2021 and 2026. Therefore, both INTU and MDB should benefit.

INTU’s shares have gained 15.8% in price over the past month, while MDB has returned 0.5%. Also, INTU’s 65.6% gains over the past nine months are significantly higher than MDB’s 27.4% returns. Furthermore, INTU is the clear winner with 80.1% gains versus MDB’s 41.3% returns in terms of their year-to-date performance.

Continue reading on StockNews

Disclaimer: Fusion Media would like to remind you that the data contained in this site is not necessarily real-time nor accurate. Prices for CFDs (stocks, indexes, futures) and Forex are not supplied by exchanges but rather by market makers, so prices may not reflect actual market values and could be incorrect. Fusion Media does not accept any liability for trading losses you may incur due to the use of this data.

Fusion Media and anyone associated with it will not assume any responsibility for losses or damages arising from the use of this information, including data such as charts and buy/sell signals. You should be aware of all the potential risks and costs associated with trading in the financial markets; it is among the riskiest forms of investment.

Article originally posted on mongodb google news. Visit mongodb google news



Building Large-Scale iOS Apps at Airbnb

MMS Founder
MMS Sergio De Simone

Article originally posted on InfoQ. Visit InfoQ

The Airbnb iOS team addressed the challenge of its growing mobile app codebase and complexity by adopting new tools and processes, including a modern build system, module types, and dev apps.

The first hurdle the Airbnb team had to circumvent was Xcode slowness, especially when indexing files and building code, and its “unfriendliness” to code versioning systems:

Not only are Xcode project files challenging to review in pull requests, but the incidence of merge conflicts and race conditions in these project files increased with a larger team of engineers moving at a high velocity.

Xcode is very powerful and provides an unmatched level of integration with iOS and other tools by Apple, but it lacks a few features that were key to Airbnb engineers, such as a network cache for build artifacts, a query interface for the build graph, and a way to add custom steps as dependencies.

To deliver all these features to developers, Airbnb chose Facebook's Buck, a build system based on the same Starlark language used by Google's Bazel. Buck allows engineers to generate Xcode workspaces from a declarative build graph, thus ensuring optimal integration with the iOS development ecosystem. Instead of using Xcode's native build system, which is not able to take advantage of artifact caching, Airbnb extended Buck so it generates Xcode projects that seamlessly invoke Buck for the build. This makes the build step up to 5-6 times faster when Buck's HTTP cache is enabled.

Besides improving their infrastructure, Airbnb engineers designed a discovery-oriented organizational structure for their codebase by organizing modules into groups called module types. Each module type has a set of visibility rules that define which dependencies are allowed between modules of that type.

For example, one module type is a feature, which is a UIViewController that may include child UIViewControllers but not other features. Features can only communicate with one another through a feature interface, which is another module type with broader visibility.

According to Airbnb engineers, module types act as a table of contents for the entire codebase and immediately convey their meaning.
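To make the idea of visibility rules more concrete, here is a minimal, hypothetical sketch in Python of how allowed dependencies between module types could be expressed and checked. It is not Airbnb's or Buck's actual API, and the module-type names are invented for illustration.

```python
# Hypothetical sketch: each module type declares which other module types it
# may depend on; a dependency edge is rejected if the rules do not allow it.
ALLOWED_DEPS = {
    "feature": {"feature_interface", "ui_component", "service_interface"},
    "feature_interface": {"model"},
    "service": {"service_interface", "model"},
}

def check_dependency(from_type: str, to_type: str) -> bool:
    """Return True if a module of from_type may depend on a module of to_type."""
    return to_type in ALLOWED_DEPS.get(from_type, set())

# Example: a feature may depend on another feature only via its interface.
assert check_dependency("feature", "feature_interface")
assert not check_dependency("feature", "feature")
```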

A third innovation in Airbnb's iOS infrastructure is Dev Apps, which are on-demand, ephemeral Xcode workspaces for a single module and its dependencies.

The popularity and success of both Android and iOS Dev Apps derive from a simple axiom: minimizing your IDE scope to only the files that you are editing tightens the development loop.

Dev Apps are generated using a command line tool which uses Buck to find out which files are required to build a requested module. The tool also generates a container app that hosts the feature and makes it runnable. Most Dev Apps can be built in less than two minutes, say Airbnb engineers.

This overall approach, which led to defining nearly 1,500 modules, also made it possible to effectively enforce code ownership and improve test coverage.

Airbnb's approach to building iOS apps at scale involves much more than can be covered here, so do not miss the original write-up if you are interested.



Intuit vs. MongoDB: Which Enterprise Software Stock is a Better Buy? – Entrepreneur

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Because the enterprise software market is booming with continued digital transformation and adoption of hybrid lifestyles, prominent companies in this space, Intuit (INTU) and MongoDB (MDB), should witness increasing demand for their solutions. But which of these stocks is a better buy now? Read more to find out.

shutterstock.com – StockNews

Intuit Inc. (INTU) provides financial management and compliance products and services for consumers, small businesses, self-employed, and accounting professionals worldwide. The Mountain View, Calif., company operates in four segments: Small Business & Self-Employed; Consumer; Credit Karma; and ProConnect. In comparison, New York City-based MongoDB, Inc. (MDB) provides a general-purpose database platform worldwide. The company offers MongoDB Enterprise Advanced, MongoDB Atlas, and Community Server. It also provides professional services, including consulting and training.

Threats related to data security, especially on cloud-based platforms, continue to hamper the enterprise software market’s growth. Nevertheless, the enterprise software market is still expected to grow rapidly in the coming months on increasing demand from almost every industry as part of widespread digital transformation efforts. Furthermore, the resurgence of COVID-19 cases is leading to a resurgence in hybrid working arrangements, which is benefitting the enterprise software industry. According to a Statista report, the worldwide enterprise software market is expected to grow at an 8.74% CAGR between 2021 and 2026. So, both INTU and MDB should benefit.

INTU’s shares have gained 15.8% in price over the past month, while MDB has returned 0.5%. Also, INTU’s 65.6% gains over the past nine months are significantly higher than MDB’s 27.4% returns. Furthermore, INTU is the clear winner with 80.1% gains versus MDB’s 41.3% returns in terms of their year-to-date performance.

Click here to check out our Software Industry Report for 2021

But which of these two stocks is a better buy now? Let’s find out.

Latest Developments

On November 1, 2021, INTU completed its acquisition of Mailchimp, a global customer engagement and marketing platform for growing small and mid-market businesses. Sasan Goodarzi, CEO of INTU, said, “We’ll expand our AI-driven expert platform by integrating Mailchimp and QuickBooks in smart ways that will help businesses from start-up to scale-up grow and run with confidence.”

On November 4, MDB announced the company’s certification for compliance with ISO 27017:2015 and ISO 27018:2019 on top of its existing ISO 27001:2013 certification, and the completion of its Cloud Security Alliance Security Trust and Risk Level 2 certification. These certifications demonstrate the maturity of MDB’s processes and programs against the highest international standards and further underscore its commitment to ensuring customers’ security and privacy.

Recent Financial Results

INTU’s revenue increased 52% year-over-year to $2.01 billion for its fiscal first quarter, ended October 31, 2021. The company’s non-GAAP operating income grew 66% year-over-year to $555 million, while its non-GAAP net income came in at $423 million, representing a 69.2% year-over-year increase. Also, its non-GAAP EPS was $1.53, up 63% year-over-year.

MDB’s net revenue increased 44% year-over-year to $199 million for its fiscal second quarter, ended July 31, 2021. However, the company’s operating loss grew 45.6% year-over-year to $72.50 million, while its net loss came in at $77.10 million, representing a 19.5% year-over-year increase. Also, its loss per share was  $1.22, up 10.9% year-over-year.

Past and Expected Financial Performance

INTU’s revenue and total assets have grown at CAGRs of 18.9% and 44.5%, respectively, over the past three years. Analysts expect INTU’s revenue to increase 26.2% in its fiscal year 2022 and 14.6% in its fiscal 2023. The company’s EPS is expected to grow 164.7% for the quarter ending January 31, 2022, and 19.4% in fiscal 2022.

In comparison, MDB’s revenue and total assets have grown at CAGRs of 50.1% and 50.9%, respectively, over the past three years. The company’s revenue is expected to increase 37.7% in its fiscal year 2022 and 31.3% in fiscal 2023. However, its EPS is expected to decline 3% for the quarter ending January 31, 2022, and 14.1% in its fiscal year 2022.

Profitability

INTU’s $10.32 billion trailing-12-month revenue is significantly higher than MDB’s $702.17 million. INTU is also more profitable with EBITDA and net income margins of 28.01% and 20.28%, respectively, compared to MDB’s negative values.

Furthermore, INTU’s 27.93%, 12.80%, and 15.64% respective ROE, ROA, and ROTC, compare with MDB’s negative values.

Valuation

In terms of trailing-12-month EV/S, MDB is currently trading at 46.99x, which is 151.3% higher than INTU’s 18.70x. And  MDB’s 49.78x trailing-12-month P/B ratio is 159.3% higher than INTU’s 19.20x.

So, INTU is relatively affordable here.

POWR Ratings

INTU has an overall B rating, which equates to a Buy in our proprietary POWR Ratings system. In contrast,  MDB has an overall rating of D, which translates to Sell. The POWR Ratings are calculated by considering 118 distinct factors, with each factor weighted to an optimal degree.

INTU has an A  grade for Sentiment, which is consistent with analysts’ expectations that its EPS will increase significantly in the coming months. In comparison,  MDB has a C grade for Sentiment, which is consistent with analysts’ expectations that its EPS will decline in the near term.

Moreover, INTU has an A grade for Quality. This is justified given INTU’s 0.84% trailing-12-month asset turnover ratio, which is 31.8% higher than the 0.64% industry average. MDB has a Quality grade of C, which is in sync with its 0.39% trailing-12-month asset turnover ratio, which is 39.4% lower than the 0.64% industry average.

Of the 168 stocks in the Software – Application industry, INTU is ranked #23, while MDB is ranked #134.

Beyond what I have stated above, we have also rated stocks for Stability, Momentum, Growth, and Value. Click here to view all the INTU ratings. Also, get all the MDB ratings here.

The Winner

Rapid digital transformation and increasing cloud demand are expected to drive the enterprise software market’s growth for an extended period. So, both INTU and MDB are expected to benefit. However, we think it is better to bet on INTU now because of its superior financials, lower valuation, and higher profitability.

Our research shows that odds of success increase when one invests in stocks with an Overall Rating of Strong Buy or Buy. View all the other top-rated stocks in the Software – Application industry here.

Click here to check out our Software Industry Report for 2021


INTU shares were trading at $688.43 per share on Friday morning, up $4.43 (+0.65%). Year-to-date, INTU has gained 82.46%, versus a 24.99% rise in the benchmark S&P 500 index during the same period.


About the Author: Nimesh Jaiswal

Nimesh Jaiswal’s fervent interest in analyzing and interpreting financial data led him to a career as a financial analyst and journalist. The importance of financial statements in driving a stock’s price is the key approach that he follows while advising investors in his articles.


The post Intuit vs. MongoDB: Which Enterprise Software Stock is a Better Buy? appeared first on StockNews.com

Article originally posted on mongodb google news. Visit mongodb google news



Podcast: Meenakshi Kaushik and Neelima Mukiri on Responsible AI and Machine Learning Algorithm Fairness

MMS Founder
MMS Meenakshi Kaushik Neelima Mukiri

Article originally posted on InfoQ. Visit InfoQ

Transcript

Introductions 

Srini Penchikala: Hi everyone. My name is Srini Penchikala. I am the lead editor for the AI/ML and data engineering community at InfoQ. Thank you for tuning into this podcast. In today’s podcast, I will be speaking with Meenakshi Kaushik and Neelima Mukiri, both from the Cisco team. We will be talking about machine learning algorithm bias, and how to make machine learning models fair and unbiased.

Let me first introduce our guests. Meenakshi Kaushik currently works in the product management team at Cisco, where she leads Kubernetes and AI/ML product offerings. Meenakshi is interested in the AI/ML space and is excited about how the technology can enhance human wellbeing and productivity. Thanks, Meenakshi, for joining me today. Neelima Mukiri is a principal engineer at Cisco, currently working on Cisco’s on-premise and software service container platforms. Thank you both for joining me today on this podcast. Before we get started, do you have any additional comments about your research and the projects you’ve been working on that you would like to share with our readers?

Meenakshi Kaushik: Hi everyone. My name is Meenakshi. Thank you, Srini, for inviting Neelima and me to the podcast. And thank you for that great introduction. Other than what you have mentioned, I just want to say that it is exciting to see how machine learning is getting more and more mainstream. And we see that in our customers. So, this topic and this conversation is super important.

Neelima Mukiri: Thank you, Srini, for that introduction, and thank you, Meenakshi. Definitely we are very excited about the evolution of AI/ML, and especially in the Kubernetes community, how Kubernetes is making it easier to handle MLOps. I’m excited to be here to talk about fairness and reducing bias in machine learning pipelines; as ML becomes more pervasive in society, that’s a very critical topic for us to focus on.

Srini Penchikala: Thank you. Definitely I’m excited to discuss this with you today. So, let’s get started. I think the first question, Meenakshi, maybe you can start us off with this. So, how did you get interested in machine learning, and what would you like to accomplish by working on machine learning projects and initiatives?

Meenakshi Kaushik: Machine learning has been around for a while. What got me excited is when I started seeing real-world use cases getting deployed more and more. For example, I remember a big change when I saw Amazon Rekognition, which could recognize facial expressions and tell you your mood. What I took away from that is, “Oh, isn’t that helpful? You can change somebody’s mood by making them aware that today you’re not looking so happy.” So, that was pretty groundbreaking. And then more and more applications came along, especially in image recognition, where you could tell things about a patient’s health, and that became more and more real. And, as Neelima pointed out, that became mainstream even with our customers, with the evolution of Kubernetes and Kubeflow. So, both these spaces together, where it became easier and easier to enable data scientists and even ordinary folks to apply machine learning day to day, really got me excited. And this evolution is progressing, so I feel very happy about that.

Srini Penchikala: How about you, Neelima, what made you get interested in working on machine learning projects, and what are your goals in this space?

Neelima Mukiri: I’ve always been interested in AI/ML and the possibilities that it opens up. In recent years, advances in ML have been so marked compared to 10 years before. There’s so much improvement in what you can do, and it’s so much more accessible to every domain that we are involved in. The possibilities of self-driving cars, robotics, healthcare, all of these are real-world applications that have a chance to affect our day-to-day lives. In addition to just how exciting the field is, being involved in Kubernetes and in Kubeflow as part of Cisco’s container platforms, we’ve seen our customers be very interested in using Kubeflow to make ML more accessible. And as part of that, we’ve started working on AI/ML in the context of Kubernetes.

Define AI fairness and examples of fair AI solutions

Srini Penchikala: Yeah. Definitely, Kubernetes brings that additional dimension to machine learning projects, to make them more cloud native, elastic, and performant, right? So, thank you. Before we jump into machine learning bias and fairness, which is kind of the main focus of your discussion here, can you define AI fairness? What do we mean by AI fairness? And talk about a couple of examples of fair AI solutions, and an example of an ML solution where it hasn’t been fair.

Meenakshi Kaushik: Fairness comes into the picture when you start subdividing the population. For example, and this is an example we gave in our KubeCon presentation, let’s say a bank is looking at giving loans to the entire population, and it decides that 80% of the time it is accurate. So, overall in the population, things behave normally. But when you start looking at subsections of the population, you want to see whether those subsections are equally represented in the overall decision making.

So, let’s say 80% of the time the loan application gets accepted. If you start slicing and dicing at a broad level between, let’s say, male and female applicants, do they get accepted 80% of the time equally? Or within the population of people who had previous loans but defaulted, are they equally represented or not? So, fairness is about looking at a broad solution and then slicing and dicing into specific subgroups, whether based on gender, racial, or age differences. For example, if a COVID vaccine was tested only on adults, it may not work on children that well, so it’s not fair to just push your conclusions onto children until you have looked at that population and confirmed it’s fair to that population. So, fairness is about equity, and it’s really in the context of the stakeholder. A stakeholder decides at what level they want to define fairness and across which groups they want to check whether it is fair or not.
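(Editor’s aside: as a concrete illustration of the per-subgroup acceptance-rate check Meenakshi describes, here is a minimal sketch in Python with pandas; the column names and data are made up for the example.)

```python
# Minimal sketch: compare loan acceptance rates overall and across subgroups.
# Column names ("approved", "gender", "prior_default") are invented for the example.
import pandas as pd

decisions = pd.DataFrame({
    "gender":        ["F", "M", "F", "M", "F", "M", "F", "M"],
    "prior_default": [0,    0,   1,   0,   0,   1,   0,   0],
    "approved":      [1,    1,   0,   1,   1,   0,   1,   1],
})

# Overall acceptance rate vs. acceptance rate per subgroup.
print("overall:", decisions["approved"].mean())
print(decisions.groupby("gender")["approved"].mean())
print(decisions.groupby("prior_default")["approved"].mean())
```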

Three sources of unfair algorithm bias

Srini Penchikala: That’s a good background on what is fair and what is not, right? So, maybe Neelima, you can respond to this. In your presentation at the KubeCon conference last month, you both mentioned that the sources of unfair algorithm bias are data, user interactions, and the AI/ML pipeline. Can you discuss more about these three sources and the importance of each of them in contributing to, and controlling, unfair bias?

Neelima Mukiri: So, data is the first step, where you are bringing real-world information into a machine learning pipeline. Let’s say you take the same example of deciding whether a person is creditworthy for a bank to give a loan or not. You populate the machine learning pipeline with data from previous customers and the decisions made for those customers, whether to give them a loan or not. So, the data comes in from the real world, and it is filled with the bias that is present in our real world, because the world is evolving and what we considered fair a few years back may not be fair today. And we are all prone to prejudices and biases that we put into our decision making process. So, the first step in the machine learning pipeline is the collection and processing of the data, where we have the chance to evaluate and see what biases are present.

For example, are there large subsets of the population that are missing in the dataset we have collected? Or is the data skewed towards being more positive for some subset of the population? So, that’s the first step where we want to evaluate, understand and, when possible, correct the bias that is present.

The next step is in the machine learning pipeline itself: as we build the model and as we serve the model, at every step of the pipeline we want to make sure we are considering and evaluating the decisions that are made, the models that are built, and the inference that is provided, to see whether it is fair across the population set you’re covering. And when you bring that decision to a user and present it, the user acting on it can in turn reinforce bias.

So, let’s say you’re looking at policing and you give a wrong prediction that somebody is prone to commit a crime. It’s possible that the police will do more enforcement in that region, and that ends up labeling more people in that region as likely to commit a crime, and then that feeds into your data and the cycle is reinforced. So every step in the pipeline, right from where you collect your data, to building models, providing inference, and then seeing how people act on those inferences, is prone to bias, and we need to evaluate and correct for fairness where possible.

AI/ML fairness toolkits

Srini Penchikala: Maybe, Meenakshi, you can respond to this one, right? So, can you talk about some of the AI/ML fairness toolkits that you talked about in the presentation, and why you both chose Kubeflow for your project?

Meenakshi Kaushik: As we were saying at the beginning of the conversation, we work in the Kubernetes domain. And although there are many machine learning lifecycle management toolkits available on top of Kubernetes, Kubeflow has gained a lot of traction, and it’s used by many of our customers. It’s also pretty easy to use. So, we chose Kubeflow since it is one of the popular open source machine learning lifecycle management toolkits. And really, what it allows you to do is build all the way from your exploration phase into the production pipeline. It allows you to do everything: you can bring up a notebook and run your machine learning models, and then chain them together in a workflow and deploy it in production. So, that’s why we used Kubeflow.
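(Editor’s aside: to make the "chain them together in a workflow" idea concrete, here is a minimal sketch assuming the Kubeflow Pipelines SDK v2; the component, parameter, and file names are hypothetical and only illustrate how a fairness check could become one step in a pipeline.)

```python
# Minimal sketch of chaining a hypothetical fairness gate into a Kubeflow
# pipeline (assumes kfp >= 2.0). Names and thresholds are made up.
from kfp import dsl, compiler

@dsl.component(base_image="python:3.11")
def check_fairness(disparity: float, threshold: float) -> bool:
    # Pass the gate only if the disparity between subgroups stays below the threshold.
    return disparity <= threshold

@dsl.pipeline(name="fairness-gate-demo")
def training_pipeline(disparity: float = 0.05, threshold: float = 0.1):
    # In a real pipeline, upstream steps would train a model and compute the
    # disparity metric; here we just wire the fairness check as one step.
    check_fairness(disparity=disparity, threshold=threshold)

if __name__ == "__main__":
    compiler.Compiler().compile(
        pipeline_func=training_pipeline,
        package_path="fairness_gate_demo.yaml",
    )
```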

And then on the machine learning fairness toolkits: when we started this journey, we started looking at open source toolkits. Fairness is an upcoming topic, so there is a lot of interest and there are a lot of toolkits available. We picked four popular ones because they had a wide portfolio of fairness features available. The good thing is that they had many commonalities, but also interesting variations, so together they give you a large variety of capabilities. So, let me quickly talk about the four toolkits. We started by looking at Aequitas. The Aequitas fairness toolkit, I would say, is the simplest toolkit to get into. You just give it your predictions and it will tell you about fairness; it will give you an entire fairness report. You provide your predictions, your data, and which population, the protected group, you want to look at for fairness, and it will just give you the results. So it offers you insight as a black box, which is pretty nice.

But what if you want to go a level deeper, or you want to do interactive analysis? In that case, what I found was that Google’s What-If Tool was pretty nice, in the sense that it is a graphical user interface and you can make interactive changes to your data to see when it is fair: “Can I get a counterfactual? Can I change the threshold to see if it changes the bias in this subpopulation?” And you can see how it impacts other things; for example, it might impact your accuracy if you try to change your bias threshold. So, the What-If Tool is pretty good from that perspective: it is interactive and it will help you with that. Obviously, because it’s an interactive toolkit, if you have billions and billions of data points, you won’t be able to pull all of them into the graphical user interface. But there is real strength in having a graphical toolkit.

Then the other toolkits we looked at are AI Fairness 360 from IBM and Microsoft’s Fairlearn. These toolkits are awesome. They don’t have the interactive capability or the white box capability of Aequitas, but they have very simple libraries that you can pick up and put into any of your machine learning workflows, on, I guess, any notebook. In the case of Kubeflow, it’s a Jupyter notebook, but you could definitely run it on Colab. And as you are experimenting, you can see graphically, using those libraries, where your fairness criteria lie.

So, those are the four toolkits, and all of them are strongest at binary classification, because that’s where machine learning fairness toolkits started. For other areas like natural language processing and computer vision, things are evolving, and these toolkits are adding more and more functionality. So, that’s an overview of the landscape we looked at.
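(Editor’s aside: as one concrete example of the library style Meenakshi describes, here is a minimal sketch using Fairlearn’s metrics API; the toy labels, predictions, and sensitive feature are made up, and the AIF360 and Aequitas workflows are analogous.)

```python
# Minimal sketch of assessing group fairness with Fairlearn's metrics API.
# The toy labels, predictions, and sensitive feature below are invented.
from fairlearn.metrics import MetricFrame, selection_rate, demographic_parity_difference
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
gender = ["F", "F", "F", "F", "M", "M", "M", "M"]

# Accuracy and selection rate broken down per group.
frame = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=gender,
)
print(frame.by_group)

# Single-number summary: the gap in selection rates between groups.
print(demographic_parity_difference(y_true, y_pred, sensitive_features=gender))
```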

Srini Penchikala: Neelima, do you have any additional comments on that same topic? Any other criteria you considered for the different frameworks?

Neelima Mukiri: In terms of the toolkits, Meenakshi covered what we looked at primarily. In terms of the criteria, one of the primary things we were looking for was how easy it is to run these on your on-prem system versus having to put your data in a cloud, given that a lot of our customers are running their workloads on-prem and have data locality restrictions. That was one key thing that was important for us to understand, and we were able to run all the toolkits on-prem in Kubeflow. Some of them, especially What-If, are easier to run directly: you go to the website and run it in a browser, but you have to upload your data there. The other part we looked at is programmability, or how easy it is to bundle this into a pipeline. That’s where, I think, both Fairlearn and IBM AI Fairness 360 are easier to plug into, as well as a bunch of the TensorFlow libraries that are available for bias detection and reduction.

Yeah. So, the two axes we were coming from were: how easy is it to plug your data into it, and where can you run it, how easy is it to run it in your machine learning pipeline versus having to have a separate process for it?

Srini Penchikala: So, once you chose the technology, Kubeflow, and you have the model defined and the requirements finalized, the most important thing next is the data pipeline development itself, right? Can you discuss the details of the data pipelines you are proposing as part of the solution to detect bias and improve the fairness of these programs? There are a few different steps you talked about in the presentation, such as pre-processing, in-processing, and post-processing. Can you provide more details on these steps? And, more importantly, how do you ensure fairness at every step in the data pipeline?

Neelima Mukiri: Basically, we divided the machine learning pipeline into three phases: pre-processing, in-processing, and post-processing. Pre-processing is everything that comes before you start building your model. In-processing is what happens while you’re building your model. And post-processing is when you’ve built your model and you’re ready to serve: is there something you can do at that point? The first part, pre-processing, is where you look at your data, analyze it, and try to remove any biases that are present. The types of bias that are better handled at this stage are cases where you have a large skew in the data available for different subgroups. The example we gave in the presentation was: let’s say you’re trying to build a dog classifier and you train it on one breed of dogs. It’s not going to perform very well when you give it a different dog breed, right?

So, where you’re coming in with a large skew in the data available per subgroup, you try to remove it in the pre-processing phase itself. The types of bias that are easier to remove, or better served by removing, in the model building phase are more the quality-of-service improvements. Let’s say you’re trying to train a medical algorithm to see what type of medicine or treatment regimen works best for a subset of the population. You don’t really want to give everyone the same medication; you want to give them what best serves their use case, what works well for that subset. So you actually want to better fit the data.

And that's where doing the bias reduction during the model training phase, the in-processing step, works better. There are a bunch of techniques available to reduce bias in the model training stage that we talk about in the presentation, like adding an adversarial training step, where you're trying to optimize for accuracy as well as for reducing the bias measure that you specify.
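
To make the in-processing idea concrete, here is a hedged sketch of one such technique: a constraint-based reduction (Fairlearn's ExponentiatedGradient) that trains a model while bounding the selection-rate gap between groups. This is not the adversarial-training variant mentioned in the talk, just a different technique with the same goal; the toy data below is purely illustrative.

```python
# Sketch of an in-processing mitigation: a reductions approach that trains the
# model subject to a demographic-parity constraint, i.e. optimizing accuracy
# while bounding the selection-rate gap between groups. Synthetic toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from fairlearn.reductions import ExponentiatedGradient, DemographicParity

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))                       # hypothetical features
group = rng.choice(["A", "B"], size=400)            # sensitive feature
y = (X[:, 0] + (group == "A") * 0.8 + rng.normal(scale=0.5, size=400) > 0).astype(int)

mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(),
    constraints=DemographicParity(),   # bound the selection-rate gap between groups
    eps=0.02,                          # allowed constraint violation (tunable)
)
mitigator.fit(X, y, sensitive_features=group)
fair_pred = mitigator.predict(X)
```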

Now, once we have trained the model and decided on the model that we are going to use, we can still evaluate for bias and improve fairness in the post-processing step. The type of data that is really well suited for that is where you have existing bias in your data. Take the loan processing example, where, let's say, a subgroup of the population has traditionally been denied loans even though they are very good at paying them back. There you can actually go and say, "Hey, maybe their income is less than this threshold, but this population has traditionally been better at paying back loans than we've predicted, so let's increase the threshold for them." You're not changing the model, you're not changing the data, you're just increasing the threshold because you know that your prediction has traditionally been wrong.

So, that's the post-processing step, where you can better remove that kind of bias. At each step of the pipeline, I think it's important to first evaluate, and then try to remove, the bias, and also to try different mechanisms, because each one works better in different scenarios.
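
A minimal sketch of the post-processing idea from the loan example, in plain NumPy: the model and the data stay unchanged, but the decision cutoff is relaxed for a group that historic outcomes show has been under-approved. Group labels and threshold values are hypothetical.

```python
# Post-processing sketch: apply a different decision threshold per group when
# historic outcomes show the default threshold has been systematically wrong.
import numpy as np

def decide(scores: np.ndarray, groups: np.ndarray, thresholds: dict, default: float = 0.5):
    """Approve (1) when the model score clears the group's threshold."""
    cutoffs = np.array([thresholds.get(g, default) for g in groups])
    return (scores >= cutoffs).astype(int)

scores = np.array([0.42, 0.55, 0.61, 0.47])
groups = np.array(["A", "B", "A", "B"])
# Group B has historically repaid more often than predicted, so its cutoff is relaxed.
decisions = decide(scores, groups, thresholds={"A": 0.50, "B": 0.45})
print(decisions)   # [0 1 1 1]
```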

Srini Penchikala: Meenakshi, do you have any additional comments on the data pipeline that you both worked on?

Meenakshi Kaushik: Yeah. What matters even before we get to pre-processing, in-processing or post-processing is: what do we have at hand? For example, sometimes we didn't build the model; we are just consumers of the model. In that case, there isn't much you can do other than post-processing: can we massage the output of the model to make it fair? So, in that case, post-processing is the option, and it actually works very well in many scenarios.

So, it's not that all is lost there; you can still make the outcome more fair just by doing that. Now, sometimes you have access to the data but not to the model. So, in addition to what Neelima is saying about going through the different phases of the pipeline, do not be afraid even if you have a limited view of your infrastructure or of how you are serving your customers. There is still an opportunity to massage the data, for example at the pre-processing layer.

If you don't have access to the model but you do have the ability to feed data to the model, that's good. You still have the ability to make changes at the pre-processing level to influence the decision. But it's important to look at what really works. The way I sometimes look at it is that, ideally, it's like security: you shift left and try to make changes as early in the pipeline as possible. But sometimes influencing the earlier stages may not give you the best result, even though ideally that's what you want to do.

First, you want to fix the world so that you get perfect data. If you cannot get perfect data, can you massage it so that it becomes close to perfect? If that's not possible, then you go further down the pipeline and ask, "Okay, can I change my model?" At times, changing the model may not be possible. Then, even at the last stage, as we've seen in a variety of examples, it's often good enough that, although the model may not be fair, you massage the actual result you give out by changing some simple thresholds, and make your pipeline fair.

Data pipelines to detect machine learning bias and improve the fairness of ML programs 

Srini Penchikala: Very interesting. I still have a question on fairness quality assurance. Neelima, going back to your example of the loan threshold, you would probably increase it because the previous criteria have traditionally been wrong. How do you decide that, and how do you ensure that that decision is also fair?

Neelima Mukiri: In examples like the bank loan, typically the way to evaluate fairness is that you have one set of data, say from your bank, and the decisions that you have made. But let's say you've denied a loan to a person, and that person has gone and taken a loan with another bank. You actually have real-world data about how they performed with that loan: did they pay it back on time or not? So, you can come back and say, "Hey, that was a false negative: I said I didn't want to give a loan to a person who is actually paying back on time."

So, you can take the historic data and see how it performed versus your prediction, and you can actually evaluate the real fairness alongside the accuracy. You can easily look at fairness across subpopulations by looking at the positive rates per population. But as a business, you want to optimize for value. So, it's critical to know that you've actually made mistakes, both in terms of accuracy, and that there is bias there.

The bias is what has induced the errors in accuracy. First of all, getting that historic data, and then getting a summary of how the model has performed across these different dimensions, is the way for you to see what bias exists today, and whether improving it is actually going to improve your accuracy as well, and your goal of maximizing profit or whatever your goal is. Tools like Aequitas and What-If give you a very nice summary of the different dimensions: how accuracy changes as you change fairness, and how it changes when you try to fit the data better or change thresholds.

So, I would say: evaluate this, run it through the system, see the data that it generates, and then decide what sort of bias reduction makes sense for you. Because, really, it doesn't make sense to say, "Give it to everyone," since you have a business to run at the end of the day, right? So, evaluate, see the data, and then act on the data.
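
A hedged sketch of that evaluation step: join past decisions with observed outcomes (for example, whether the applicant actually repaid elsewhere) and summarize accuracy, positive rate, and false negatives per subgroup. Column names and values are hypothetical; tools such as Aequitas or What-If produce much richer versions of this kind of summary.

```python
# Compare past decisions with observed outcomes and summarize per group.
import pandas as pd

df = pd.DataFrame({
    "group":    ["A", "A", "B", "B", "B", "A"],
    "approved": [1,   0,   0,   1,   0,   1],   # past decision (prediction)
    "repaid":   [1,   1,   1,   1,   0,   0],   # observed real-world outcome
})

def summarize(g: pd.DataFrame) -> pd.Series:
    return pd.Series({
        "n": len(g),
        "positive_rate": g["approved"].mean(),
        "accuracy": (g["approved"] == g["repaid"]).mean(),
        # False negatives: creditworthy applicants we turned away.
        "false_negative_rate": ((g["approved"] == 0) & (g["repaid"] == 1)).mean(),
    })

print(df.groupby("group").apply(summarize))
```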

Standards and community guidelines to implement responsible AI

Srini Penchikala: In that example, financial organizations definitely want to predict accurately to minimize risk, but they also need to be responsible and unbiased when it comes to customers' rights. Okay, thank you. We can now switch gears a little bit and talk about the current state of standards. Can you both talk about the current situation in terms of standards or community guidelines? Because responsible AI is still an emerging topic. What are some standards in this space that the developer community can adopt in their own organizations to be consistent about fairness? We don't want fairness to mean something different in every organization. How can we provide a consistent standard or guideline to all the developers in their organizations?

Meenakshi Kaushik: Let me just start by saying that, as you mentioned, fairness is still in its infancy, so everybody's trying to figure it out. The good thing is that it's easier to evaluate fairness, because you can look at outcomes for a subpopulation and see whether the model is doing the same thing as it does for the population as a whole. Given that, the easiest thing you can do for now, which is commonly done in most of our software and even in hardware, is to have a specification. It tells you, "This is the performance; it will only accept this many packets per second; these are the number of connections it will take." Things like that: the bounding limits under which you get the expected performance.

Models now have something called model cards, where you can give a similar specification: how was the model built? What assumptions did it make? This is the data it took, and these are the bounding limits under which it works. For example, if a model was doing some kind of medical analysis and was trained on a population from, let's say, India, then it has the view of just that specific population. If somebody is trying to use it in a more general setting, a model card that tells you about that means that I, as a consumer, can be aware of it and say, "Aha, okay, I should expect some kind of discrepancy." Currently, those things are not readily available when you take a model from open source, or from anywhere for that matter. So, that's the first easy step that I think can be done in the near term.
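
There is no single mandated schema for model cards yet; the sketch below is a plain-Python rendering of the kind of fields such a card might record, loosely in the spirit of the published Model Cards proposal. Every field name and value is a hypothetical example.

```python
# A plain-Python sketch of the kind of information a model card records.
# There is no single standard schema yet; every field below is hypothetical.
model_card = {
    "model_details": {
        "name": "treatment-response-classifier",
        "version": "0.3.1",
        "owners": ["ml-platform-team@example.com"],
    },
    "intended_use": "Ranking treatment options for adult patients; not for pediatric use.",
    "training_data": {
        "source": "hospital-network-IN-2019",      # the population the model actually saw
        "population": "patients from a single region of India",
        "known_skews": ["age 30-60 over-represented", "few records for rare conditions"],
    },
    "evaluation": {
        "metrics": {"auc": 0.87},
        "per_group_metrics": {"female": {"auc": 0.85}, "male": {"auc": 0.88}},
    },
    "limitations": "Expect degraded performance outside the source population.",
}
```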

In the long term, there have to be more guidelines. Today there are different ways of mitigating bias; there is no one step that fits all. However, what needs to be added to the pipelines is not standardized. What should be standardized is the set of checks your model should run against, right?

So, if your model is making predictions across all age groups, then some of the protected groups should be predefined: "I should look at children versus pre-teens versus adults, and see if it is performing in the same way." If there is some other kind of disparity, there has to be a common standard that an organization defines, depending on the space they are in. For a bank, for example, it would be based on gender, on the zones or zip codes people live in, on ethnicity, for example. In the case of medical, of course, the history is larger. Those are the near-term standards. The broader standards, I think, will take a longer time. Even within machine learning, there are no standard ways to produce predictions; you can bring your own algorithms and your own approaches. So, I think we're a little far away on that.

Neelima Mukiri: Yeah, I would echo what Meenakshi said. We were surprised by the lack of any standards. The field is in its very infancy, right? It's evolving very rapidly, and there's a lot of research going on. We are still at the phase where we are trying to define what is required, rather than at a state where we are able to set standards. That said, there are a lot of existing legal and societal requirements that specify, in different settings, what level of disparity you can have across different populations. But again, that's limited to certain domains and use cases, things like applying for jobs or housing, or giving out loans. So, there are fields where legal guidelines are already in place; in terms of what the acceptable bias across different subgroups is, that's where we have some existing standards.

But when you bring it to machine learning and AI, there are really no standards there. When we looked at all these different frameworks that are available for reducing bias, one interesting thing is that even the definition of what bias or parity is differs across each of these frameworks. Broadly, they fall into either allocation bias or quality-of-service (QoS) improvement. But again, each framework comes and says, "This is the bias that I'm trying to reduce, or these are the set of biases that I allow you to optimize." So, it makes sense at this stage to look at it from multiple angles and try out what works in a specific sub-domain. But as a society, we have a lot of progress to make and a ways to go before we can define standards and say, "This is the allowed disparity in these domains."

Future innovations in addressing machine learning biases

Srini Penchikala: Right. Yeah. Definitely, the bias is contextual and situational, and relative, right? So, we have to take the business context into consideration to see what exactly is bias, right? Can you guys talk about what’s the future? You already mentioned a couple of gaps. So, what kind of innovations do you see happening in this space or you would like to see happen in this space?

Meenakshi Kaushik: As Neelima pointed out, we were happy to see that fairness can at least be evaluated, even if it's hard to define, because the decisions are model-generated rather than coming from a human in the loop, where you can't really evaluate them. So, that's a good thing. What I am excited to see is fairness work continuing across different domains of machine learning: it started, as I said, with classification problems, but it is now moving more and more toward the kinds of problems that are being increasingly deployed, anything to do with image recognition or computer vision, for example. And it touches broad areas, from medical to, as Neelima was pointing out, the autonomous driving field. So, that I'm really excited to see.

The second thing is that, hopefully, model cards increasingly become the way of the future: every model comes with what was used to generate it and what the expected output should be, so that we can all see how it was built. Even for the advertisements that are served to me, knowing exactly how the model was defined is useful information to have. So, I'm excited to see that.

And the toolkits that are developing are also very good. But right now, these toolkits are one-off toolkits. When Neelima and I started looking not only at Kubeflow, but also researching what we wanted to demonstrate at KubeCon, we were looking for a way of automating this in our machine learning pipeline. Similar to how we do automated hyperparameter tuning, we wanted to automatically adjust our machine learning model so that fairness criteria are built in.

Currently those things are not totally automated, but I think we're very close. We could just modify some of our routines, similar to hyperparameter tuning. Then there would be machine learning fairness tuning: you tune your model so that you achieve fairness as well as your business objectives, and trading off accuracy versus fairness is easily done. That's the other area I'm excited to see us achieve, so that fairness tuning becomes built in for the model, just like hyperparameter tuning.
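
In that spirit, here is a hedged sketch of "fairness tuning" done like a hyperparameter sweep: train a grid of candidate models that trade accuracy against demographic parity, then pick the candidate that best meets the business objective under a chosen fairness cap. Fairlearn's GridSearch is only one possible way to do this, and the data is synthetic; it is not the automation the speakers built.

```python
# "Fairness tuning" sketch: sweep a grid of accuracy/parity trade-offs and pick
# the most accurate candidate whose parity gap stays under a chosen cap.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairlearn.reductions import GridSearch, DemographicParity
from fairlearn.metrics import demographic_parity_difference

rng = np.random.default_rng(7)
X = rng.normal(size=(500, 3))
group = rng.choice(["A", "B"], size=500)
y = (X[:, 0] + (group == "A") * 0.6 + rng.normal(scale=0.5, size=500) > 0).astype(int)

sweep = GridSearch(LogisticRegression(), constraints=DemographicParity(), grid_size=15)
sweep.fit(X, y, sensitive_features=group)

candidates = []
for predictor in sweep.predictors_:              # one fitted model per grid point
    pred = predictor.predict(X)
    candidates.append({
        "accuracy": accuracy_score(y, pred),
        "dp_gap": demographic_parity_difference(y, pred, sensitive_features=group),
    })

best = max((c for c in candidates if c["dp_gap"] <= 0.05),
           key=lambda c: c["accuracy"], default=None)
print(best)
```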

Neelima Mukiri: Yeah, to echo what Meenakshi said, we really need more standards that are defined and that we can use across different types of problems. We also want to see standardization in defining and evaluating fairness. And there's a lot of improvement to be made in making it easy to integrate fairness into the pipeline itself. There's work ongoing in Kubeflow, for example, to integrate fairness evaluation into the inference side of the pipeline, the post-processing. We need to be able to build explainable, interpretable models and make it easy for people to build fairness into their pipelines, so that it's not an afterthought, not something done only by someone who happens to be interested in making your model more fair, but part of the pipeline. Just as you do training, testing, and cross validation, you also need to do fairness as part of the pipeline, as part of the standard development process.

Final thoughts and wrap-up

Srini Penchikala: Yeah, I definitely agree with both of you. If there is one area where we can introduce fairness as another dimension and build solutions that are fair out of the box, right from the beginning, that area would be machine learning, right? So, thanks, Neelima. Thanks, Meenakshi.

Neelima Mukiri: Thank you for this opportunity to talk to you and talk to your readers on this very exciting topic.

Meenakshi Kaushik: Thank you so much for the opportunity. It was fun chatting with you, Srini. Thank you.

Srini Penchikala: Thank you very much for joining this podcast. Again, it's been great to discuss this emerging and very important topic in the machine learning space: how to make programs more fair and unbiased. As we use more and more machine learning programs in our applications, and as we depend on machines to make decisions in different situations, it's very important to minimize unfairness to one demographic group or another as much as possible. To our listeners, thank you for listening to this podcast. If you would like to learn more about machine learning and deep learning topics, check out the AI/ML and data engineering community page on infoq.com. I encourage you to listen to the recent podcasts and check out the articles and news items my team has posted on the website. Thank you.




Presentation: Data Mesh: An Architectural Deep Dive

MMS Founder
MMS Zhamak Dehghani

Article originally posted on InfoQ. Visit InfoQ

Transcript

Dehghani: My name is Zhamak. I head the emerging technologies at ThoughtWorks in North America. About three years ago, my colleagues and I came up with a new approach in managing analytical data. It’s called data mesh. Data mesh was born out of sheer frustration for not getting results from building yet another data platform, yet another data lake, yet another data warehouse. It’s been three years since then, we have learned quite a bit. What I want to share with you is a deep dive into an architectural aspect of data mesh. If you’re getting started, hopefully you take some lessons away, how to think about architecture, how to break up that architecture. I’m going to get started in terms of the technology that you need to deploy.

Analytical Data

Data mesh is a paradigm shift in managing and accessing analytical data at scale. Some of the words I highlighted here are really important, first of all, is the shift. I will justify why that’s the case. Second is an analytical data solution. The word scale really matters here. What do we mean by analytical data? Analytical data is an aggregation of the data that gets generated running the business. It’s the data that fuels our machine learning models. It’s the data that fuels our reports, and the data that gives us an historical perspective. We can look backward and see how our business or services or products have been performing, and then be able to look forward and be able to predict, what is the next thing that a customer wants? Make recommendations and personalizations. All of those machine learning models can be fueled by analytical data.

Today’s Great Divide of Data

What does it look like? Today we are in this world with a great divide of data. The operational data is the data that sits in the databases of your applications, your legacy systems, microservices, and they keep the current state. They keep the current state of your stateful workloads. Analytical data sits in your data plane, it sits in your data warehouse, data lake. It gives you a historical perspective. It gives you an aggregated perspective. You can correlate all sorts of data from across your business, and getting a historical perspective, and be able to train your machine learning. That’s just the definition and the current state of the universe.

Scale

What do we mean by scale? I think this really matters because this is where we're really challenged. If you deal with an organization that's constantly changing, and I argue that any organization today is constantly changing, your data landscape is constantly changing. The number of data sources pouring data into this analytical view of the world is constantly increasing, and so is their diversity: every new touchpoint with your customers generates data, and every new touchpoint with your partners generates data. You have ambitious plans and a proliferation of use cases for that data, so you have scale on the consumer side as well. There's a diversity of transformation: there aren't just one or two or three pipelines, there are a ton of transformations that need to happen to satisfy very diverse use cases. And the speed of responding to that change needs to increase. Data mesh tries to address these concerns.

Paradigm Shift

Why is this a paradigm shift? Because it challenges assumptions that we have made for over half a century. It challenges the assumption that the architecture of analytical data, like the ones you saw with the lake and the warehouse, or the lakehouse, or any of those existing architectures, has to be monolithic. It challenges the assumption that data has to be centralized under the control of a centralized team so that we can get meaningful use out of it. It challenges the assumption that if you're struggling with scale, the only way forward is dividing your architecture based on technical partitioning: one team dealing with pipelines, another dealing with ingestion, another dealing with the technologies that serve the APIs and data services at the other end. These are all technical decompositions. It challenges the fact that we have designed our organizations around these technical tasks, with a separate data platform team or data lake team or data warehouse team, away from the business domains where the data gets used or consumed. The reason is scale: the problem of scale will not be addressed by a centralized solution as the number of sources and the number of use cases grow. As Thomas Kuhn, who coined the term paradigm shift, says, we can't just reject one paradigm without coming up with another one.

The Four Principles

Here's the solution around data mesh. At a 50,000-foot view, this is a new way of thinking about how to decouple your architecture, how you decouple your data, and how you decouple your data ownership around domains. For folks who are familiar with domain-driven design, this should feel fairly natural and organic, because you have probably already broken down your microservices around domains, along with the ownership of those services and of the technology, mapping them to the domains of your business. Why can't we just do that for analytical data? Why can't we break down the boundaries of that monolithic lake based on the domains? The moment you do that, you'll find yourself in a lot of trouble. The very first thing is silos of data: now we have 50 different lakes instead of one, all siloed from each other, not talking to each other. So one of the architectural principles of data mesh is that we now look at data through a very different lens; we call it data products. Data shall be served as a product, which means it has some form of baseline usability built into it: the behavior that makes the data usable and interoperable with the rest of the data products and domains, and makes it natively accessible to the user. If a user comes and finds your domain, whether that user is an analyst or a data scientist, they have a very different access mode, and their access mode will be satisfied right then and there, accessing the data in the domain.

The other problem you find yourself with is that the cost of infrastructure suddenly grows. Every team needs to maintain its own very complex infrastructure to run those data pipelines and to serve those data products. We need to think a bit differently about the data infrastructure, in a self-serve way. We need to create new logical layers of infrastructure that make it easy to build, serve, and discover these data products. Finally, in a distributed architecture, we need to think about how we are pragmatically going to make these data products interoperable and secure without compromising privacy. How can we do all of those things now that we've decoupled these data products across the enterprise and smeared them all over the place?

This federated computational governance is the fourth pillar. Each of these pillars has an architectural implication. At a very high level, the way you can imagine this architecture across an enterprise landscape is that you still have an operational data plane and an analytical data plane, which store and provide access to very different kinds of data. One is the data on the inside: the data that is stored in the microservices databases. The other is the data on the outside: data products that expose your data and your historical data over time, an infinite log of your data, an infinite temporal series of your snapshots, however you want to expose them. Both are now combined together across your domains. One domain team, and I'm going to use the example of digital media streaming, your podcast release team, your artist payment team, or your artist management team, will take care not only of the applications, but also of the analytical data.

Data Mesh

To do that we have this multi-plane analytical data platform. I'll go through some of the capabilities to give a feel for the new platforms we have to build to self-serve these domains and build these data products. We will have a set of global governance and policies: how are we going to control access to each individual data product in every domain? How are we going to make the schema language consistent across these domains? How are we going to make the IDs, the policies, and the data that go across these domains consistent so we can join them? There are a bunch of global policies that we're going to automate and embed into each of these data products, so that, practically, we can still have an ecosystem whose parts play nicely with each other.

Data mesh, at a very high level, is an organizational, architectural, and really a technology shift. Let's focus on the architecture. The moment we distribute our data across the boundaries of domains and apply the strategic design patterns that Eric Evans introduced right at the end of his seminal book, we find ourselves with a very different view of the world. We start pulling that big monolithic data apart around the functions of our business, not the technical functions of our monolithic data warehouse or data lake. Then every domain, every bounded context, with its team and its systems, has two lollipops. I use these lollipops as an indication of interfaces to that domain: capabilities and APIs, or whatever mode of interface you want to provide to the rest of the organization. Now we have two interfaces, two sets of capabilities. One, which I draw as a lollipop with an O in it, is your operational capabilities, your APIs: release a podcast. Then you have this D lollipop, which gives an interface into your domain, an access point, an endpoint in your domain, for sharing the analytical data from that domain.

Let's look at an example. You have various domains here. You have your user management domain, where people build the microservices or applications that register our users, the subscribers to our podcasts and media streaming playlists. That domain will have an API or a system that registers users. Now we'll have something else: a log of user updates, forever and ever. Maybe we want to publish these user updates on a daily batch basis, or on an event basis. Those are all choices that make sense for your organization and that domain. Let's look at podcasts. We have a podcast domain, which probably has a podcast player in it, the applications that people actually run, a podcast microservice, whatever applications you have. You have the operations of create podcast and release podcast episodes. In addition to that, having access to podcast listener demographics is pretty good analytical data. To get those demographics, you will probably have to use data from your users. So there's a flow of data within the domain, and serving and consuming of data between domains. Or top podcasts daily: what are my top 10 daily podcasts? These are some of the analytical data that the domain now provides.

A New Architectural Quantum

If you think about data as a product, now we have this hexagon that I casually drop all over my diagrams: the data product as a new architectural quantum, a new unit of architecture. Architectural quantum is a phrase that evolutionary architecture essentially introduced. It's the smallest unit of your architecture that has all of the structural components to do its job. In this case, I need all of the structural components, the code, the storage, the services, the transformation logic, everything, so that I can provide that top daily podcast data to the rest of the organization. It's the smallest unit of your architecture. What does it look like? If you zoom into that domain, you've got your microservices or legacy application, the box that implements your O lollipop, your operational APIs. Adjacent to it, you have one or many data products within that domain that have two very key interfaces. One interface is the input data ports. A port is a set of APIs and mechanisms by which you will be receiving or pulling data, however you have implemented that API, to get data into that data product. Then the data product operates on that data and provides it as analytical data on its one or many output data ports. That essentially becomes the interface for serving that data and its metadata.

Let's zoom into that previous example. The user domain has a user registration service, and right next to it, perhaps, a user profile data product. This user profile data product just receives data from the user registration service, but it serves two different data product output ports. It gives you your near real-time user profile updates and user registration updates as events. And because of the consumers and the needs that the organization had, it also provides monthly dumps of the user profiles that it has: what are the user profiles we have this month versus last month versus the month before? Let's zoom in to that podcast listeners' demographic data product. It has two input data ports. It gets the podcast player information to say who's listening to the podcast, and it gets the demographic information, the age, the geographical location, all that demographic information, from the user profiles, then transforms them and provides them as podcast listener demographics. Marketing would love this data; they can do so many things knowing the demographics of people listening to different podcasts.

What is this hexagon? What is this data product? If you look at it outside in, before jumping into the mechanical bits inside it, it's a thing that needs to have certain characteristics to be a product. It has to be easily and independently discoverable, easily understandable, and well documented. It has to be independently addressable, interoperable with the other data products, and trustworthy. It has to have SLOs, so it can tell us at what intervals the data gets produced, what the variance is between when the data was captured and when it was processed, and what its statistical shape is: all the SLOs it needs to guarantee for somebody to actually trust this data. It has to provide the data in a polyglot form, because it has to natively provide access for a very large spectrum of access modes. Data analysts use spreadsheets, so they want to hook their spreadsheets or data warehouse or reporting system into a SQL engine, or some port that can run SQL queries locally within this data product. Data scientists, on the other hand, might be totally happy with semi-structured JSON files or Parquet files dumped on some Blob Storage, as long as they can traverse that storage across time. They have a very different mode of access, so it's very important that the same data is presented and accessed in a polyglot format.

Data Product’s Structural Elements

Then this data product, this quantum, has its structural elements: it definitely needs code in it. It needs code for the transformation. It needs code for serving the data. It needs code for embedding the policies. Who can access this data product? What anonymization do we need? What encryption do we need? There are a ton of other transformations that make sense for analytical use cases and not so much for microservices APIs. Do we need to apply some differential privacy? Do we need to provide some form of synthetic data that represents the statistical nature of this data, but not the data itself? Can I see the forest but not the trees? These are policies that can be embedded and executed by the data product, locally. Of course, we need the data itself. What's in that data? We have the polyglot data access, but we also have a bunch of metadata.

We have the SLOs. We have the schema. We have the syntax and semantics. We have the documentation. Hopefully, you have computational documentation. We have the policy configurations. We need to depend on an independent segmentation of our infrastructure. If each of these data products autonomously can be deployed, and serve their data, then the physical infrastructure needs to give a way of providing a logical decomposition of infrastructure. They don’t own the infrastructure necessarily, but they are able to utilize a sharded, separated infrastructure for just that data product without impacting others.

Multi-Plane Data Platform

Then, if you think about all of the infrastructure and the platform needed to enable building these hexagons, serving them, and discovering them, what does that look like? Our thinking so far is that we need three different categories, I'll call them planes, different classes of programmatically served capabilities at the infrastructure level. Down at the bottom of the infrastructure, you have this already: what I call a utility plane. You have a way of orchestrating and provisioning your Spark job, if you're using Spark for transformation. You have a way of getting a Kafka topic. You have a way of provisioning service accounts and storage accounts. These utility-layer capabilities already exist, along with their APIs, and hopefully you have written your infrastructure as code so that someone can call an API or a command, or send a spec down to an API, to provision them. That's not enough. Think about the journey of the data product developer, from the moment they say, "I think I need an emerging artist data product, because I want to know what the emerging artists listen to." Or, "I want to run a machine learning model across a bunch of data from different sources, from the social platforms to my players, to see who the emerging artists are." They go through this value stream. They go from mocking their data product, maybe working with synthetic data first, to trying to source and connect to upstream data products, or microservices, or whatever they need to get their data in. They need to explore that. Once they think they have a case, they've got to bootstrap building this data product, really, and deploy it, then maintain it over time, and hopefully, one day, retire it.

To do that without friction in this decentralized model, so that anybody, every generalist developer, can do it, we have to create this new plane, a new set of APIs that allows provisioning and managing the lifecycle of data products: declaratively create them, read them, version them. Now we're talking about this hexagon, this unit of architecture, being declared and provisioned independently as a unit, rather than as all of the bits and pieces it needs. That's great.

Once we've done that, there are a ton of capabilities that only make sense to be provided to the users of this interconnected mesh of data products at the global level, without centralization. When you think about your architecture, every time you find yourself building something that totally centralizes, and without which you are not able to use the data products, something has gone wrong with the architecture; we're back to centralization again. I'll give an example here. Knowledge graphs are the new hot thing. What are they? A way of semantically connecting these different data points together and being able to traverse this graph of data relationships. An emerging artist is an artist that gets paid this amount, and these are the artists that emerged last month: this graph of relationships. How do we do that if all of this polyglot data is separated? We need a schema system in which each local data product defines a schema with semantic links to the other data products it relates to. Once that mesh of linked schemas emerges, you need a tool at the global level to browse and search those things. That's the mesh experience plane that you've got to build.

When you think about that federation, the governance, what happens to these policies over time? You probably started with just access control, and over time, you will keep adding to them. First of all, they need to be programmatically built, automated, built into the platform and then embedded as a sidecar, as a library, whatever makes sense to your architecture, and accessed by every data product. An analogous situation of this is the sidecars that we use in service mesh in the operational world, that embed routing policies and failover policies, and all those discoverability policies, all those policies that make sense in microservices APIs. We have an analogous situation here that we need a way of injecting these policies and executing them at the level of granularity of every single data product.

Data Product Container

Then, with that hope, what other architectural components get packed within this logical, autonomous unit, the data product? Input and output ports need, first of all, a standard. You have to have a set of standards so that you can then connect these things together. You might say, for my organization, I will provide SQL access, and this is the API for running your SQL queries on my data product, or this is the interface for getting access to the Blob Storage.

These are the only two types that I provide. Or, this is the way to get to the events. You need standard APIs for exposing your data and also for consuming data. Then, we are introducing a new port, a new set of APIs, which I call control ports: a way of configuring policies. Whether you are configuring these policies centrally and pushing them down, caching them in every data product, or configuring and authoring them locally within those domains, you need a way of receiving and changing these policies and then executing them. You have a policy engine. You have a data product container that contains this new logical thing, with all of those components in it: your code, your storage, and the APIs for accessing it.

Let’s bring it all together. We’ve got domains now. Each domain has one or many applications, and there might be services, there might be apps, and one or many data products that are running on your infrastructure. We need some form of a container. I’ll talk to how we are implementing that. Each of those data products have input ports and output ports. The data is the only thing that flows between the two, no computation happens in between. The computation happens within the data products. In that computation, it could be the transformation. You might want to have a pipeline in there, that’s your choice, or you want to have actually a microservice in there to do the transformation. It’s your choice, however you implement it. You have a lot of other things. You’ve got the data itself. You’ve got the schema. You’ve got the documentation. You’ve got your other services probably running in there.

When you think about your mesh, there are three layers of interconnectivity on the mesh. The first layer is the flow of the data between input ports and output ports; it's your lineage, essentially. The second layer is the links between your schemas, the semantic links between the schemas so that you can join the data. You need to use a schema language that allows a standard way of semantically linking these fields together. The third layer is the linking of the data itself: this listener is this user, connecting the listener domain and the user domain. You also have a different port, which I'll call the discoverability or discovery port, which is essentially your root API. If I access the pink data product here, the root API, the root aggregate for it, should give me all the information I need to discover what is in it. What are the output ports? What are the SLOs? Where is the documentation? Where do I actually get the data? How do I authorize myself? And then there is the control port, which is essentially the set of APIs for configuring the policies, plus maybe some highly privileged APIs that you want to embed there that only governance folks can execute.

What is the platform? We talked about three planes. The plane that I think is really important to focus on, if you're getting started, is the data product experience plane. You might have an endpoint, your lollipops, your APIs for data product deployment, through which you declaratively deploy your data products. That plane then uses the utility plane to provision the Blob Storage, because my output port happened to use Blobs. On the mesh experience plane at the top, you have mesh-level functions, search and browse; those are the APIs you probably want to expose up there.

Blueprint Approach

How do we go about architecting this thing? Here, I want to share three tips with you. Start with your data product. Start with designing what this data product is. I have given you some pointers on how to think about it: what its components and elements are. Once you start from there, you can figure out, if I want to provide all those usability characteristics and delight my data analysts and any other kind of data user, what are the APIs my platform needs to provide? Then you figure out the implementation. We've been around this block three times already; we've built three or four different versions of this experience plane. We started very primitively with a bunch of scripts and templates, and now we're quite sophisticated. I'll show you a spec file: we pass a spec file to an API or a command. Work from an inverted hierarchy of purpose: start from your data products, and then design down from there.

Learn one or two lessons from complex systems. Think about how birds fly. The birds don't have an orchestrator or a central system that manages them; they just have local rules: follow the leader and don't run into the other birds. That's all each bird needs to know. From the interaction of all the birds following those local rules emerges this beautifully complex flock that flies long distances. Apply the same complex adaptive systems thinking. You don't need a highly elaborate schema that covers all of the domains; you need a standard way of modeling the data globally, with schemas defined locally, and then you connect them.

Affordances

Think about the affordances. What are the capabilities a single data product needs to provide? It needs to serve data. It's secured locally; it's a decentralized model. These capabilities, these affordances, should be available to the users locally. You manage the lifecycle: I need to be able to independently deploy this thing without breaking all the other data products, without upsetting their users, over time. We talked about how I need to pass the code into this lifecycle management. I also need to pass a model spec that says what this data product is, because we want to abstract away the complexity of how to configure it; we want a declarative way. What you would see is a few lines of a much longer model spec for a data product that says what its input ports and output ports are, and what some of its policies are. Check this code into the repo of that data product. Pass it to your experience plane API, on your platform of choice. I've borrowed this diagram from Eric's team, which uses Azure for the implementation. That then passes it to the utility plane to configure all of the bits and pieces you need. It's going to be a little bit ugly at the beginning. It already is, because we don't have that beautiful, simple, autonomous unit that you can deploy and that does all of the things I talked about.
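
The actual spec shown in the talk is not reproduced here; the following is a hypothetical Python rendering of the kind of declarative fields such a spec might carry, using the podcast listeners' demographic example. Every field name and value is illustrative only.

```python
# Hypothetical rendering of a declarative data product spec that could be checked
# into the product's repo and passed to the experience-plane API. Illustrative only.
data_product_spec = {
    "name": "podcast-listeners-demographic",
    "domain": "podcast",
    "owner": "podcast-team@example.com",
    "input_ports": [
        {"name": "play-events", "source": "podcast-player", "mode": "event-stream"},
        {"name": "user-profiles", "source": "user-profile", "mode": "daily-snapshot"},
    ],
    "output_ports": [
        {"name": "demographics-sql", "type": "sql", "retention": "2y"},
        {"name": "demographics-files", "type": "parquet", "partitioning": "by-day"},
    ],
    "slos": {"freshness": "24h", "completeness": ">= 0.99"},
    "policies": {"access": "analysts-and-data-scientists", "pii": "anonymize-before-serve"},
}
```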

When we started with microservices, we had one thing: a Unix process, this beautiful Unix process. Then we had Docker containers, or other forms of containers, that encapsulated our autonomous units of architecture. We don't have that in the big data world. Our autonomous unit of architecture, the data product, looks a little bit like a Frankenstein creation, because you have to stitch together a lot of different technologies that weren't meant to be decentralized. It is possible, though, and I'm hopeful that the technology will catch up.

Let's look at one of those other affordances. Let's say you're designing your output ports; you're designing how the data is served. To me, these characteristics of the analytical data being served are non-negotiable. Your data needs to have multimodal access; if not, all of your data gets dumped into some warehouse somewhere else and we're back to centralization. I highly recommend designing your APIs to be read-only and temporal, time-variant. If they're not, then somebody else has to turn the data into temporal, time-series data, and you want that to happen locally here. Most importantly, immutable: the last thing you want is for data for a snapshot at a point in time to change, with all the cascading effects that follow. The read function of this output port is always a function of time: from what time to what time. We can talk about how to deal with time windowing and temporal information; I think we're lacking standardization in this field.

Once you decide what data you want, for what time and what duration, then you have a choice of saying: actually, I want this data in the form of graphs, or semi-structured files, or events. That is its spatial topology. Polyglot data means that you might have many different spatial topologies of that time-variant data. An app developer looking at the same data, like media streaming usage or play events, may want Pub/Sub access to an infinite log of events. Your report folks probably don't want that; they want SQL access so they can run their queries locally on the data product. Your data scientists want something else, to do visual design; they want columnar file access. The definition and design of these output ports need to satisfy these affordances: serving temporal, time-variant, polyglot data for the same core semantics of information.
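
As a rough illustration of those non-negotiables, here is a hypothetical sketch (not from the talk) of a read-only, time-variant output port interface: every read is a function of time, the snapshots behind it are immutable, and the same semantic data could be requested in more than one topology. All names are made up for illustration.

```python
# Hypothetical sketch of a read-only, time-variant, polyglot output port.
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, Literal

Topology = Literal["events", "rows", "parquet"]   # possible serving topologies

@dataclass(frozen=True)
class Snapshot:
    captured_at: datetime
    payload: bytes          # immutable once written

class OutputPort:
    def __init__(self, snapshots: list[Snapshot]):
        self._snapshots = sorted(snapshots, key=lambda s: s.captured_at)

    def read(self, start: datetime, end: datetime, topology: Topology = "rows") -> Iterator[Snapshot]:
        """Return the immutable snapshots captured in [start, end); no writes, no updates."""
        for snap in self._snapshots:
            if start <= snap.captured_at < end:
                yield snap
```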

Resources

If you want to learn more, I'll leave a few references here. My semi-recent article on Martin Fowler's site gives you a flavor of the architecture discussion we've had here. There is a wonderful community, a grassroots movement, that has started around data mesh learning; please join. I will also have deep-dive, certified multi-day tutorials, hosted by DDD Europe, that go deeper into the design of each of these architectural aspects.

When Not To Use Data Mesh

Nardon: When is it not a good idea to use data mesh?

Dehghani: Data mesh is a distributed, decentralized system, and it comes with complexity: managing these independent units and keeping them in harmony. If you don't have the pain points that I mentioned, the scale, the diversity of the sources and their origins, the very diverse domains and use cases, the multiple teams that need to be autonomously responsible, then building a data mesh seems like a hell of an over-engineering exercise. Really, the motivation for data mesh was organizations that are growing fast; they have many domains, many touchpoints, and great aspirations. Hopefully, your organization gets there someday. If you're not there yet, I don't think it makes sense.

It's also a matter of when. Right now we're really at the innovator and early adopter end of the curve for adopting something new. The technology that you can just buy off the shelf or use from open source is limited; you still have to do quite a bit of engineering to stitch what exists today into this distributed mesh. If you don't have the engineering capacity or bandwidth, or your organization just likes to buy things and integrate them, perhaps right now is not the right time, even though you might have the scale.

See more presentations with transcripts



PyTorch 1.10 Release Includes CUDA Graphs APIs, Compiler Improvements, and Android NNAPI Support

MMS Founder
MMS Anthony Alford

Article originally posted on InfoQ. Visit InfoQ

PyTorch, Facebook’s open-source deep-learning framework, announced the release of version 1.10, which includes an integration with CUDA Graphs APIs and JIT compiler updates to increase CPU performance, as well as beta support for the Android Neural Networks API (NNAPI). New versions of the domain-specific libraries TorchVision and TorchAudio were also released.

The PyTorch team highlighted the major features of the release in a recent blog post. The new release moves several distributed training features from beta status to stable, as well as the FX module and the torch.special module. The release includes several updates for improved CPU performance, including an integration with CUDA Graphs APIs to reduce CPU overheads and an LLVM-based JIT compiler that can fuse multiple operations. Support for Android NNAPI has moved from prototype to beta, including the ability to run models on a host for test purposes. The release also includes TorchX, a new SDK for faster production deployment of deep-learning applications. Overall, version 1.10 contains more than 3,400 commits from 426 contributors since the 1.9 release.

Prototype support for Android NNAPI, which allows Android apps to use hardware accelerators such as GPUs and Neural Processing Units (NPUs), was added last year. The new release moves the feature to beta and adds several capabilities, including coverage of more operations, load-time flexible tensor shapes, and the ability to test a model on a mobile host. Another beta feature in the release is the CUDA Graphs APIs integration. CUDA Graphs improves runtime performance for CPU-bound workloads by capturing and replaying a stream of work sent to a GPU; this trades off the flexibility of dynamic execution in exchange for skipping setup and dispatch of work, reducing the overhead on the CPU.
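
The snippet below is a minimal sketch of the beta CUDA Graphs capture-and-replay pattern as documented for PyTorch 1.10; it assumes a CUDA-capable GPU, and the model and tensor shapes are placeholders rather than anything from the release notes.

```python
# Minimal sketch of CUDA Graphs capture and replay in PyTorch 1.10 (beta API).
import torch

model = torch.nn.Linear(128, 64).cuda()
static_input = torch.randn(32, 128, device="cuda")

# Warm up on a side stream so one-time initialization work is not captured.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture one forward pass into a graph, then replay it with fresh data copied
# into the same static tensor; replay skips per-op dispatch overhead on the CPU.
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):
    static_output = model(static_input)

static_input.copy_(torch.randn(32, 128, device="cuda"))
g.replay()                     # re-runs the captured kernels; result lands in static_output
print(static_output.shape)
```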

This release moves several features of distributed training from beta to stable, including: the Remote module, which provides transparent RPC for subclasses of nn.Module; DDP Communication Hook, which allows overriding how distributed data parallel communicates gradients across processes; and ZeroRedundancyOptimizer, which reduces the amount of memory required during training. The torch.special module, which provides APIs and functions similar to the SciPy special module, also moves to stable.

The FX module, a “Pythonic platform for transforming and lowering PyTorch programs,” has moved from beta to stable. Its three main components are a symbolic tracer, an intermediate representation, and a Python code generator. These components allow developers to convert a Module subclass to a Graph representation, modify the Graph in code, then convert the new Graph to Python source code that is automatically compatible with the existing PyTorch eager-execution system. The goal of FX is to allow developers to write custom transformations of their own code; for example, to perform operator fusion or to insert instrumentation.
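
For a concrete feel of those three components, here is a small toy illustration (not from the release notes) of tracing a module and inspecting its Graph and regenerated source; the module itself is a placeholder.

```python
# Toy illustration of torch.fx: symbolic tracer, Graph IR, and code generation.
import torch
from torch import fx

class AddRelu(torch.nn.Module):
    def forward(self, x, y):
        return torch.relu(x + y)

traced = fx.symbolic_trace(AddRelu())   # trace the Module into a GraphModule
print(traced.graph)                     # inspect the intermediate representation
print(traced.code)                      # regenerated Python source, runs eagerly
print(traced(torch.randn(2), torch.randn(2)))
```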

In a discussion about the release on Hacker News, one user speculated that features such as FX indicate that PyTorch is becoming more like JAX. Horace He, a PyTorch developer, replied:

FX is more of a toolkit for writing transforms over your FX modules than “moving in JAX’s direction” (although there are certainly some similarities!) It’s not totally clear what “JAX’s direction” means to you, but I’d consider its defining characteristics as 1. composable transformations, and 2. a functional way of programming (related to its function transformations). I’d say that Pytorch is moving towards the first but not the second.

JAX was open-sourced by Google in 2019 and is described as a library for “high-performance machine learning research.” Google’s main deep-learning framework, TensorFlow, is the chief rival of PyTorch, and released version 2.6 earlier this year.

The PyTorch 1.10 release notes and code are available on GitHub.



Got $3,000? 3 Growth Stocks to Buy That Could Skyrocket | The Motley Fool

MMS Founder
MMS RSS

Posted on mongodb google news. Visit mongodb google news

Buy low, sell high. It takes money to make money. If you don’t have a lot of cash, these adages could convince you not to invest right now with stock market valuations at a premium.

However, there are still stocks that have tremendous growth potential. And you don’t need a huge amount of money to buy them. If you’ve got $3,000, here are three growth stocks to buy that could skyrocket.


1. Fiverr

COVID-19 opened the door for many individuals and businesses to rethink how they work. This seems likely to accelerate the already growing increase of freelancing. Few companies are as well-positioned to profit from this trend as Fiverr (NYSE:FVRR).

Fiverr’s online platform matches freelancers with buyers of digital services. The company charges a 5.5% service fee to buyers and takes one-fifth of the transaction amount charged by the freelancer. 

This business model has worked really well for Fiverr. The company’s revenue has soared more than 260% over the past five years. Fiverr delivered exceptionally strong third-quarter results with revenue jumping 42% year over year. 

Even better, Fiverr still has a massive growth opportunity ahead. The company estimates that its addressable market stands at close to $115 billion annually. Fiverr currently claims only a fraction of a percent of this market. This stock could easily be a five-bagger or more by the end of the decade.

2. MongoDB

Every time you do anything on the internet, it creates more data. And that data has to be stored somewhere. Increasingly more of the data is stored in the cloud. MongoDB (NASDAQ:MDB) offers what is arguably the best cloud-based database platform around.

Sales for that platform — Atlas — skyrocketed 83% year over year in the second quarter. MongoDB continues to roll out new features that attract more customers. The company’s CEO, Dev Ittycheria, wasn’t exaggerating when he said in September, “It is becoming increasingly clear that the fastest and most compelling way to build modern applications is to use MongoDB.” 

The main knock against MongoDB is that its shares are expensive, trading at nearly 50 times sales. By comparison, none of the so-called FAANG stocks have price-to-sales multiples above 11, with most of them in the single digits. 

MongoDB has much better growth prospects than the FAANG stocks, though. The cloud database market is projected to increase by a compound annual growth rate of 14.8% through 2028. But MongoDB is growing much faster than its rivals with no end in sight to its opportunities.

3. Trupanion

For many Americans, their pets are a part of their family. But veterinary costs continue to rise. That’s where Trupanion (NASDAQ:TRUP) offers great value with its medical insurance for cats and dogs.

Trupanion ranks as the leader in the pet medical insurance market. It had over 1.1 million enrolled pets as of Sept. 30, 2021. The company delivered 40% year-over-year revenue growth in the third quarter. 

Probably the biggest competitive advantage for Trupanion is its relationships with veterinarians. It’s the only company that provides software that allows veterinarians to receive payment for services within minutes after checkout.

Trupanion has solid near-term growth drivers, especially with Aflac offering its pet medical insurance to employers in 2022. Its long-term growth opportunity is even greater. Only 1% and 2% of pets in the U.S. and Canada, respectively, are covered by insurance. Trupanion should be a big winner as it penetrates more of this market.

This article represents the opinion of the writer, who may disagree with the “official” recommendation position of a Motley Fool premium advisory service. We’re motley! Questioning an investing thesis — even one of our own — helps us all think critically about investing and make decisions that help us become smarter, happier, and richer.

Article originally posted on mongodb google news. Visit mongodb google news


NoSQL Database Standards Market Size Estimation – 2021, By Economic Growth Factors …

MMS Founder
MMS RSS

Posted on nosqlgooglealerts. Visit nosqlgooglealerts

NoSQL Database Standards Market Overview

The global NoSQL Database market report is a comprehensive analysis of the industry, the market, and the key players. The report covers the market from both the demand and supply side, broken down by segment. The global NoSQL Database report also provides trends by market segment, technology, and investment, along with a competitive landscape.

NoSQL Database Standards Market definition:

Maximize Market Research provides a brief description of the NoSQL Database Standards market as a current snapshot, so its key features can be understood quickly. Our summary reports profile the key criteria of the NoSQL Database Standards market to support further marketing activities. The report provides a market overview organized by standard topics, highlighted with customized data as required. This overview helps in deciding how to approach the market and in understanding the context of the industry.

NoSQL Database Standards Market Scope:

Global NoSQL Database Market, by Region

• North America
• Europe
• Asia Pacific
• Middle East & Africa
• South America

Market Trends:

The report covers the trends that are most dominant in the NoSQL Database Standards market and how these trends will influence new business investments and development overall. Market trends have a direct impact on the dynamics of the industry, including new technology, foreign entry, new regulations, government spending, new applications, etc. A detailed analysis of these trends is provided in this report to support your business decisions in the NoSQL Database Standards market.

Request for Sample:https://www.maximizemarketresearch.com/request-sample/97836

NoSQL Database Standards Market Dynamics:

The report covers the drivers, restraints, opportunities, and challenges in the NoSQL Database Standards market. It helps in identifying the drivers of market growth and in finding ways to turn them into strengths. Restraints are the characteristics holding the NoSQL Database Standards market back; identifying them early makes it possible to minimize or address them before they become a problem. Opportunities are created by external factors such as changes in the market and new consumer trends, and the report helps you understand the factors that will influence your ability to take advantage of them. Challenges can create barriers for your business; the report helps to identify them and ways to counter them depending on the market scenario.

Porter's Five Forces Model:

The report provided by Maximize Market Research applies Porter's Five Forces model, which helps in designing business strategies. It helps you identify how many rivals you have to compete with, who they are, and how their product quality compares to yours in the NoSQL Database Standards market. It also helps in determining the number of potential suppliers, how unique their products are, and how expensive it is to switch from one to another.

The report gives you insight into how many buyers you have, how easily they could switch away from your product, and whether buyers hold enough power to dictate terms in the NoSQL Database Standards market. Analyzing substitutes is also crucial, because substitutes can weaken your position and threaten your profitability; our report assists in identifying these challenges so they can be tackled swiftly. Finally, the report analyzes whether the NoSQL Database Standards market is easy for a new player to enter, whether players enter and exit the market regularly, and whether the market is dominated by a few players.

COVID-19:

COVID-19 is a global pandemic and public health emergency that has affected almost every industry, and its effects are expected to influence industry growth throughout the forecast period. Our research framework has been extended to ensure that it covers the underlying COVID-19 issues and potential paths forward. The report provides insights on COVID-19, taking into account changes in consumer demand, purchasing behavior, supply chain transformation, current market power dynamics, and significant government intervention. The updated study provides insights, analysis, estimates, and forecasts that take into account the impact of COVID-19 on the NoSQL Database Standards market.

NoSQL Database Standards Market Segmentation:

Since big data is largely unstructured, managing and evaluating complex queries is more difficult than with relational databases, which could stifle market development. On the other hand, future technological advances addressing testing and complexity issues could alter market dynamics during the forecast period. Meanwhile, the growing use of NoSQL databases in social networking and online gaming applications provides lucrative opportunities.
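
To illustrate the point about unstructured data, here is a minimal sketch of storing and querying schemaless documents, assuming a local MongoDB instance and the pymongo driver; the database, collection, and field names are illustrative only.

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    events = client["demo"]["events"]

    # Documents in the same collection need not share a schema.
    events.insert_one({"user": "alice", "action": "login", "device": {"os": "ios"}})
    events.insert_one({"user": "bob", "action": "purchase", "items": ["book", "pen"]})

    # Query on a nested field without defining any table structure up front.
    for doc in events.find({"device.os": "ios"}):
        print(doc["user"], doc["action"])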

Get more Report Details:https://www.maximizemarketresearch.com/market-report/global-NoSQL Database-standards-market/97836/

NoSQL Database Standards Market Key players:
• DynamoDB
• ObjectLabs Corporation
• Skyll
• InfiniteGraph
• Oracle
• MapR Technologies, Inc.
• The Apache Software Foundation
• Couchbase
• Basho Technologies
• Aerospike Inc.
• IBM Corporation
• MarkLogic Corporation
• Neo technology Inc.

Regional:

The regional insights provided in our reports ensure that you are better informed about the NoSQL Database Standards market region by region. Understanding local economic, political, energy, and geographic conditions is crucial for any realistic analysis of viable policy options related to global change in the NoSQL Database Standards market. Local development plans are often affected by the performance of the NoSQL Database Standards market. Our report produces regional analyses while ensuring that impacts from the NoSQL Database Standards market and local development in other regions are adequately represented. We conduct a wide range of high-resolution studies at both the region and country level.

About Us:

MAXIMIZE MARKET RESEARCH PVT. LTD.

3rd Floor, Navale IT Park Phase 2,

Pune Bangalore Highway,

Narhe, Pune, Maharashtra 411041, India.

Email:[email protected]

Phone No.: +91 20 6630 3320

Website:www.maximizemarketresearch.com

Related Report Links:

https://www.marketwatch.com/press-release/contact-center-software-market-growth-worldwide-business-overview-by-top-manufacturers-and-sales-revenue-forecast-to-2026-2021-11-23?tesla=y

https://www.marketwatch.com/press-release/bitcoin-technology-market-industry-analysis-size-share-revenue-prominent-players-developing-technologies-tendencies-and-forecast-to-2026-with-dominant-sectors-2021-11-23?tesla=y

https://www.marketwatch.com/press-release/mobile-learning-market-overview-opportunities-analysis-of-features-benefits-manufacturing-cost-and-forecast-2026-2021-11-23?tesla=y 


7 Innovative Examples of XR Technologies in the Healthcare Industry

MMS Founder
MMS RSS

Article originally posted on Data Science Central. Visit Data Science Central

VR & AR technologies will transform health care in the future, given their rapid and relevant development. They make the clinical experience of patients more immersive.

Artificial intelligence has matured into a fundamental technology used in a variety of fields such as robotics, computer vision, and natural language processing. It is natural that this progress would extend to the health care sector, which needs constant improvement. The combination of AI and XR therefore enables extensive biotechnology applications and digital interaction with the physical environment in a multi-dimensional way.

Near-Infrared Vein Finder

This vein visualizer allows nurses to look at a patient’s arm and identify the paths of veins using a phone camera and a dedicated app, e.g., AccuVein. Some people have thick skin, and making the veins visible makes it easier for specialists to give injections or draw blood.

Virtual Relaxation

Guided meditation, yoga, and other types of breath, mind, and body relaxation are gaining momentum. Lots of people are stressed, and VR can take them into a different world ─ free from triggers, worries, and irritants. 

Medical Diagnosis

Augmented and virtual reality are now used to diagnose medical conditions. For instance, VR enables healthcare specialists to screen for vision impairment. It can also assist in diagnosing certain mental health conditions.

Cancer Diagnosis

Powerful tools are being developed with the potential to redefine cancer detection, diagnosis, and treatment. Detecting cancer through image recognition can help doctors diagnose patients with real-time screening analysis. For example, an AR microscope with real-time AI integration overlays AI-based information onto the current view of the sample as the pathologist examines it.
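
As a rough sketch of the overlay idea (not the actual AR microscope pipeline), the snippet below draws a model's label onto a live camera frame, assuming OpenCV is installed; classify_patch is a hypothetical stand-in for whatever model such a system would run.

    import cv2

    def classify_patch(frame):
        # Hypothetical placeholder: a real system would run a trained model here.
        return "no anomaly detected", 0.97

    cap = cv2.VideoCapture(0)  # webcam standing in for the microscope feed
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        label, score = classify_patch(frame)
        # Overlay the prediction on the current view, as the AR microscope does optically.
        cv2.putText(frame, "%s (%.0f%%)" % (label, score * 100), (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
        cv2.imshow("AR overlay sketch", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()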

Virtual Therapy

We can now use VR and AR for therapy as well. For example, post-traumatic stress disorder (PTSD) can be treated by exposing patients to situations that would be complicated to recreate in a clinical setting: the doctor takes the patient into a specific virtual environment and treats them there. Another example is enhancing cognitive behavioral therapy.

Pre-Surgical Data

When there are organs or bones to operate on, XR technology provides holographic visualization before the doctor makes any incisions. In addition, surgeons widely use tiny cameras to operate with more accurate and careful penetration.

Bionic Vision

Surgeons use this technology for more focused AR visualization and computer-assisted surgery, including procedures such as joint replacements, ACL reconstruction, and spinal fusion.
