We might not know or accept it, but every one of us is something of a data scientist. Every day of our lives we solve problems by making sense of the inputs (a.k.a. data) around us. Sometimes we do this without being conscious of the fact. Walking. Breathing. Homeostasis. None of these processes involves deliberate thinking, yet each depends on complex data crunching, predicting the impact of an action, and classifying how safe it is. In this article, though, I am focusing on deliberate problem solving, specifically in data science projects.
I think we can all agree that data science projects usually involve solving complicated problems. Much of this complexity arises not only from the technical side of coding the algorithms but from the softer side as well. In a previous post I discussed the importance of domain knowledge. Without understanding what you are trying to solve for or where to begin looking, you will soon find that spinning up a Jupyter Notebook and importing Python libraries is the least of your challenges. But that’s the exciting part! Data science projects stretch your thinking and your understanding of how things work. Take a question like “What do you project our stock price to be next year?” To come up with the right dataset for this project, you need to understand costing, revenue-generating levers, market share, seasonality, human behaviour and so on. At times, though, it is not direct knowledge of the project at hand that will take your solution to the next level. Data scientists, like every other professional, are constantly challenged to think outside the box. Here are some examples of outside-the-box thinking that fascinated me. I hope they will get your creativity going and help you look for some outside inspiration:
- Continuing with the example of predicting the stock price, the obvious process would be to look at features such as daily buy and sell prices, revenues, etc., but one problem is the amount of data you can generate here. If you look at daily trades, the past 10 years of a company’s stock will give you only about 2,500 observations, because markets trade on roughly 250 days a year. Depending on your features, this dataset might not be enough to come up with a valid model. Some data scientists have started thinking outside the box by processing images of parking lots to figure out how busy a retail store is. The hypothesis is that if a retail store’s parking lot is full, the store is getting a lot of customers, which translates to more sales, higher revenue, happy shareholders, and a higher stock price.
- Other alternative data sources for this are the jobs being advertised by the company and how long they stay open. The thinking behind this is that if the company starts aggressively advertising jobs in a new area of business, it is expanding, and if those jobs are filled quickly, the company is attractive to employees and is doing well.
- From the store’s point of view, one problem they face is predicting foot traffic. Pick n Pay can use previous data to do this by looking at seasonality, location, population, discount offers, and demographic information. Have you seen Starbucks offering free wifi? That’s an alternative data source for them, because they are using it to enrich the dataset they use to predict how many customers they will get.
- Car manufacturers and investors need to know how well their cars are selling and performing. The problem is that this information is usually shared only once a quarter, at earnings presentations. Some data scientists have started tapping into data held by insurance companies to track how certain cars are being insured and what patterns show up in the claims made on those policies. This is giving investors insights into how the cars are performing before the quarterly results come out.
- Satellite images are being used by governments to predict crop yields, in an effort to anticipate any food supply risks they might face and mitigate the adverse effects of events like droughts. Normally a data scientist would look at precipitation levels, weather patterns, nutritional requirements of the crop, location and so on, but satellite or drone imagery is providing “out of the box”, real-time data points to enrich what they already have.
- The supply of oil to financial markets can be predicted using a time series of, for example, the number of barrels supplied in previous periods. These days, some data scientists are using drones and satellite images to analyse outdoor oil storage facilities and using that to predict the short-term supply of crude oil.
- Some companies are not just waiting for competitors to announce what products they have built. They are scraping social media posts on Twitter and running sentiment analysis to look for opportunities, and analysing job postings or patent applications to predict what new products or business units competitors are building.
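To make the first example a little more concrete, here is a minimal Python sketch of two ideas from the list: how few daily observations a decade of prices really gives you, and how an alternative signal can be attached to those observations by date. Everything in it is an illustrative assumption on my part. The function names, the `lot_occupancy` field, and the sample numbers are hypothetical, and weekends are the only non-trading days it removes (public holidays would shrink the count further).

```python
from datetime import date, timedelta


def weekdays_between(start: date, end: date) -> int:
    """Count Monday-to-Friday dates in [start, end]; public holidays are ignored."""
    total_days = (end - start).days + 1
    return sum(
        1
        for offset in range(total_days)
        if (start + timedelta(days=offset)).weekday() < 5  # 0-4 are Mon-Fri
    )


def enrich_with_alt_data(prices: dict, lot_occupancy: dict) -> list:
    """Attach a hypothetical parking-lot occupancy reading to each daily close.

    Days without a reading keep the row and get None, so no price data is lost.
    """
    return [
        {
            "date": day,
            "close": prices[day],
            "lot_occupancy": lot_occupancy.get(day),
        }
        for day in sorted(prices)
    ]


if __name__ == "__main__":
    # Upper bound on daily observations in a ten-year window.
    print(weekdays_between(date(2014, 1, 1), date(2023, 12, 31)))

    # Made-up sample values, purely for illustration.
    prices = {date(2024, 1, 2): 101.5, date(2024, 1, 3): 99.8}
    occupancy = {date(2024, 1, 3): 0.82}  # share of parking bays in use
    for row in enrich_with_alt_data(prices, occupancy):
        print(row)
```

The join keeps every price row and leaves gaps as `None` rather than dropping days, because with so few observations you usually cannot afford to throw any away. How you fill those gaps afterwards is a modelling decision of its own.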
There are many examples of how data scientists are thinking outside the box. The point is that when you are doing a project, sometimes you need to be creative about ways to take it to the next level. It might not just be the number of transactions, their nature and their value that will help you predict which of your clients will book a travel package. Maybe it is SpaceX’s new rocket announcements!
It’s a brave new world!