Data-driven grocery shopping

Article originally posted on Data Science Central.

Recently, my better half commented on my deficient ability to spot good value whilst shopping for groceries. She was right: As I was only looking at the product and largely ignoring price, I was spending too much. The Albert Heijn in Prinsenland is not quite as upscale as (say) Whole Foods, but I can only assume the manager was still laughing all the way to the bank every time I visited. Time to go for the other extreme, armed with data science.

As online shopping has become ubiquitous, there should be sufficient data to contrast and compare across supermarkets. However, as this information is available yet not provided on a silver platter anywhere, we need to create the required transparency and comparison platform ourselves. The goal here is to optimise personal grocery expenditures at e.g. Albert Heijn so we can save more money… to buy shares of Albert Heijn for our retirement accounts.

Now, we also don’t want to lock ourselves into any particular vendor, so we need to compare with other relevant local grocery chains – in this case, Jumbo and Hoogvliet. The Lidl supermarket unfortunately does not have the required e-commerce front-end. And ideally we don’t even have to visit the shop anymore, except perhaps to pick up the resulting basket of products. So what we need to do is: 1) get the data, 2) clean it up and make it comparable, 3) store it, 4) dashboard a relevant selection, 5) place the order, and 6) block the order slot(s) in our calendar.
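
To make step 2 a little more concrete, here is a minimal Python sketch of one way to make offers comparable: converting each pack price to a price per kilogram. The Offer class, its field names, and the conversion table are assumptions made for illustration, not the actual transformation code (which, as described further on, runs in R).

```python
from dataclasses import dataclass

# Hypothetical conversion factors to kilograms; extend as needed.
UNIT_TO_KG = {"g": 0.001, "kg": 1.0}


@dataclass
class Offer:
    store: str
    product: str
    price: float      # shelf price of one pack
    pack_size: float  # numeric pack size as listed by the store
    pack_unit: str    # "g" or "kg"


def price_per_kg(offer: Offer) -> float:
    """Convert a pack price to a price per kilogram, rounded to two decimals."""
    kg = offer.pack_size * UNIT_TO_KG[offer.pack_unit]
    if kg <= 0:
        raise ValueError("pack size must be positive")
    return round(offer.price / kg, 2)


# Illustrative (made-up) offer: a 500 g pack priced at 2.50.
example = Offer("Some store", "Some biscuit", 2.50, 500, "g")
print(price_per_kg(example))
```

Once every offer is expressed per kilogram, products with different pack sizes can be ranked directly across stores.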

The below is a demonstration of how this data-driven culinary delivery system works from a technical point of view. I’ll briefly discuss what is going on to make this possible – but as much as this is crucial in creating the value (a wallet that is less empty towards the end of the month) I’d like to focus more on outcomes, concepts, and inferred lessons rather than technique.

So, behind the scenes, some choices were made. Data scrapers are running as cron tasks on an AWS free tier Debian instance. Python is used to scrape with beautifulsoup4. I’m actually not a fan of this technique: it’s too fragile and too time-intensive to set up. Data transformation then happens with R. R needs more love; it’s still a great tool. When data transformation is complete, R proceeds to upload the data to a cloud repository in the form of BigQuery (mysteriously, successfully completed jobs don’t show up in any GCP logs). The data is presented and queried via Google Apps Script in Sheets. And when all the preceding has proven to be stable and predictable a few more times, I’ll set up one-click ordering and scheduling. But enough about technology – how does this composite add value?
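
As a rough illustration of the scraping step, the sketch below pulls product names and prices out of an HTML fragment. It uses Python’s standard-library html.parser as a dependency-free stand-in for beautifulsoup4, and the markup, the class names (product-name, product-price), and the sample values are all hypothetical, not taken from any supermarket’s actual pages.

```python
from html.parser import HTMLParser


class PriceParser(HTMLParser):
    """Collect (name, price) pairs from hypothetical product markup."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._field = None  # which field the next text node belongs to
        self._name = None   # last product name seen, awaiting its price

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if cls in ("product-name", "product-price"):
            self._field = cls

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "product-name":
            self._name = text
        elif self._name is not None:
            # Assumes prices are written with a decimal comma, e.g. "1,99".
            self.products.append((self._name, float(text.replace(",", "."))))
            self._name = None
        self._field = None


def parse_products(html: str):
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.products


# Made-up sample fragment, for illustration only.
SAMPLE = """
<div class="product"><span class="product-name">Speculoos</span>
<span class="product-price">1,99</span></div>
"""

print(parse_products(SAMPLE))
```

The fragility mentioned above shows up right here: the parser depends entirely on class names that a site can change without notice.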

Well, for instance now we can see how prices evolve. We don’t quite have enough data to build predictive pricing models, and for this exercise we’re mostly interested in the here and now anyway, but it’s nice to finally see an AAPL that is safe to short.

Perhaps more importantly we now have the ability to compare and contrast prices, as per the below:

When you’re drudgingly waddling around the supermarket dead tired after a long day of work, you don’t care as much about the penny differences between similar products as you possibly should. Rigorously and consistently making rational decisions whilst filling up the shopping cart takes serious cognitive effort, and you can only really see what’s in front of you. Supermarkets know this and make it as easy as possible for you to gaze at their attractively labelled high-margin selection.

Have a look though at how seemingly small price differences add up: At Jumbo, the original version of Speculoos is the cheapest and comes in at 4.57 per kg. This very same product is priced at 4.88 per kg at Hoogvliet, and 5.07 per kg at Albert Heijn. Between extremes there’s a whopping 11% difference! And that’s just this random example.
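
The arithmetic behind that figure is simple enough to sketch in a few lines of Python. The per-kg prices are the ones quoted above; the function name and the choice to express the gap relative to the cheapest store are assumptions for illustration.

```python
# Per-kg prices for the same product, as quoted in the text.
prices_per_kg = {"Jumbo": 4.57, "Hoogvliet": 4.88, "Albert Heijn": 5.07}


def spread(prices):
    """Return the cheapest store, the priciest store, and the gap in percent
    (relative to the cheapest price, rounded to one decimal)."""
    cheapest = min(prices, key=prices.get)
    priciest = max(prices, key=prices.get)
    pct = (prices[priciest] - prices[cheapest]) / prices[cheapest] * 100
    return cheapest, priciest, round(pct, 1)


print(spread(prices_per_kg))  # ('Jumbo', 'Albert Heijn', 10.9)
```

So the gap is 0.50 on 4.57, or roughly 11% more for the identical product at the most expensive store.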

On to our dashboard. In this case Google Sheets was used, as I try to do as much in the cloud as possible. Three separate hard disk crashes taught me that (I’m stubborn). Sheets is an ok choice for this purpose as it’s easy to set up and configure, although I find it a bit unpredictable and mysterious in its functioning at times. As said, a few dashboard iterations from now I’ll add a button to send out orders automatically.

So what could we learn from the preceding narrative, drawing lessons beyond the confines of the world of online grocery shopping, this high-frequency supermarket moneyball?

Moving towards data-driven processes is of course by itself a process that has been going on for quite some time – take for instance this quote about a data-driven 2002 Major League Baseball initiative, demonstrating an unwillingness to accept that data-based methods could outperform practices based on human expertise:

You don’t put a team together with a computer … Baseball isn’t just numbers. It’s not science. If it was anybody could do what we do, but they can’t cause they don’t know what we know. They don’t have our experience and they don’t have our intuition. … there are intangibles that only baseball people understand

As you probably know from the book or film, the baseball team went on to break record after record, and did so in an unprecedentedly cost-effective manner. Following the demonstration of the success of this data-driven methodology, other teams stopped laughing and started copying.

You may also have heard about the digital transformation of stock markets, which I have published on. One of my interviewees at NYSE told me in 2008, reflecting on how algorithms had rapidly and aggressively taken over the floor:

I don’t trust computers. Human communication is more flexible, safer, secure. We’re more reliable, we deliver service, we have integrity. I can’t help my customer if they use electronic systems. And I can’t compete with electronic trading, the system is going too fast, it’s too much to handle. Algorithms never get tired and don’t take breaks. You can’t compete with that. We provide a much better service, computers are just dumb rule-based systems.

In a way, he was right, but for the wrong reasons. A lot of trades now occur outside the exchange and on the phone without intermediaries, but only because the intermediary algorithms are so good at their jobs that customers often feel taken advantage of.

But what about fashion? As a very seasoned manager told me:

Maybe in other industries you can do data science. But not in fashion. Our sales people really understand customers in a way that data can not represent. We have deep customer experience and instinct, we build on years of knowledge.

I don’t know enough about this specific industry to comment with authority, but I doubt that this person is right. Developments in deep learning now allow for visual interpretation and generation of fashion styles. To further drive the point home on how AI and data are transforming work across the board, a friend of mine in insurance made similar statements about how he relies on decades of knowledge on risk – knowledge which an algorithm could not possibly fathom. Yet, InsurTech is growing rapidly. And of course, in aviation data-driven methods and AI-based automation have become commonplace. So-called fly-by-wire systems, under development since the 1980s, have transformed pilots from rockstar experts into autopilot monitors. Try not to think about it. Flights have never been statistically safer nor more economical, after all.

Going back to our grocery shopping example, this vignette shows how data science (as it is now commonly called) can aid in making rational, data-driven decisions. Gut feelings and experience are good, but data is better, especially when a process is repetitive. The data can demonstrate which variables matter and where to get the best bang for your buck, and it’s easy to build a system around the decision models to further enhance convenience. What’s more, if I don’t actually visit the shop, I can’t be lured into buying that extra item that I initially didn’t actually want or need (here’s looking at you, cookie dough ice cream).

In summary, I believe we are seeing our new data and AI driven world develop and unfold before our eyes: Data is now easy to process, cheap to store, and undemanding to generate, and we have battle-tested analytical techniques and visualization tools to work with it comfortably. AI is thus encroaching on cognitive processes that were until recently exclusively in the human domain. And it’s faster, more consistent, cheaper, and overall more effective at some of these cognitive processes than we are. Resistance is futile?

About the author

Dr Roger van Daalen is an energetic, entrepreneurial, and highly driven strategic executive with extensive international management experience. He is an expert on industry digital transformations through over a decade of experience with data and artificial intelligence on capital markets. He is currently working on transitions to cloud data architectures and applications of machine learning and artificial intelligence more broadly, to pioneer innovative products and services in finance and other industries so as to aggressively capture market share and drive revenue growth. Feel free to reach out to him for executive opportunities in the Benelux.

This article was originally posted on LinkedIn
