Tue. Feb 11th, 2025

    From traffic control to social media feeds, machine learning powered solutions are increasingly being used in a wide range of applications.

    In fact, I used a Google Chrome plugin to create this first sentence. The plugin uses Natural Language Processing to suggest edits or paraphrases. Quite useful when one is writing a paper or a report. The initial one I had was “Machine Learning powered solutions are being leveraged in every aspect of our lives from traffic control to social media feeds.” Very impressive, right? The plugin is called Wordtune – AI-powered Writing Companion.

    Machine Learning algorithms are trained on past data to predict future behaviour, be it engagement with a news article or which political candidate one is likely to vote for. When developing and using these solutions, diligence is essential because one can unintentionally introduce errors that compromise the solution or its use case. Some of these errors stem from a training process or dataset that is not inclusive or representative enough (a Training Problem): for example, if you train a self-driving car only in the summer, it will struggle to manoeuvre in the rain. Many training problems are easy to notice because technical metrics during training, testing and implementation will show the solution failing in a paradigm different from the one it was trained on, and, where possible, one can course-correct. Other problems are harder to solve because they only reveal themselves once we start using the algorithm, even one that was trained well with acceptable technical metrics (a Usage / Ethical Problem).
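
    To make the training problem concrete, here is a minimal, hypothetical sketch using scikit-learn and entirely synthetic data (the features, the nonlinear ground-truth rule and the "shift" are illustrative assumptions, not any real system): the same accuracy metric that looks healthy on data resembling the training set can collapse on a shifted distribution, which is exactly the signal to watch for during testing and implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)

def make_data(n, centre=0.0):
    """Synthetic stand-in for operating conditions: features drawn around
    `centre`, labelled by a nonlinear rule the model never sees directly."""
    X = rng.normal(loc=centre, scale=1.0, size=(n, 2))
    y = (np.sin(X[:, 0]) + 0.3 * X[:, 1] > 0).astype(int)
    return X, y

# Train and validate only on "summer" conditions (centre = 0).
X_train, y_train = make_data(5000, centre=0.0)
X_val, y_val = make_data(1000, centre=0.0)

model = LogisticRegression().fit(X_train, y_train)
print("in-distribution accuracy:", accuracy_score(y_val, model.predict(X_val)))

# The same metric on shifted "rainy" conditions (centre = 3) exposes the
# training problem: the score drops sharply even though training looked fine.
X_shift, y_shift = make_data(1000, centre=3.0)
print("shifted-distribution accuracy:", accuracy_score(y_shift, model.predict(X_shift)))
```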

    Although many of us trust that Data Scientists are building working Machine Learning models, I believe it is not enough to put the onus of auditing A.I. systems solely on the people who develop them. Users also need to be able to understand and question the outputs: “Does the result from this A.I. system make sense, and should I use it?” Users have domain knowledge that Data Scientists might lack, which is important for figuring out whether the solution has any hidden holes and how best to use it. Depending on the use case, the effects of these errors might be minimal, e.g. Snapchat’s filters not working on darker faces, but as A.I. penetrates almost all aspects of our lives, all of us, across the board, need to take care. In this article I will briefly touch on four very cool A.I. solutions and highlight potential negative implications these systems might have even when they seem technically well-trained. Follow me for more examples in upcoming articles.

    • Targeted ads – There is value for marketing teams in using targeted ads. Many of these algorithms do not suffer from a training problem but from a usage problem. They are built on a person’s previous purchasing behaviour to suggest other products the individual might need, and they do quite well; too well, some might say. This sounds great, but the risk comes in when the individual ends up buying not what they need but what they have been convinced they need. One predatory example is diet or health supplements: targeted ads have the potential to prey on an individual’s vulnerability, compounding the very problem that created their purchase history in the first place. The algorithm is doing very well – accurately targeting an individual who has been searching for similar products – but is it being used correctly? Some targeted ads do not take context into account; there is a socio-economic reason why individuals from deprived communities tend to buy more lottery tickets than individuals from higher LSMs. The A.I. algorithm can pick up this trend and suggest that lottery companies target lower LSMs, but when considering the usage / ethical problem, should we? If so, how do we do it in ways that will not compound an already existing problem? A great research paper published in Nature discusses these negative psychological effects of targeted ads.
    • Customer segmentation – This is a use case almost every company with many different customers implements at some stage. The old way of segmenting was based on demographic information such as age, location, and gender, which was neither very effective nor personalized enough. With the rise of powerful Machine Learning models, segmentation is becoming much finer-grained, using multiple behavioural features (see the sketch after this list). While it is helpful to know the groupings in your customer base so you can serve them appropriately, potential usage problems arise because of the fine line between segmenting and stereotyping customers. Care needs to be taken when using the A.I. solution to avoid the latter. Users need to question the results and justify their use cases: for example, when the best-performing products are offered only to customers in one group because the algorithm clustered them together with others who have strong purchasing power, and the same products are not offered to the other groups. This has the potential to increase social inequality and to breach regulations around treating customers fairly.
    • Medical diagnosis prediction – The medical field is very attractive for A.I. and Robotics professionals. Research is ongoing into how A.I. algorithms can run on the tiny chips found in edge devices such as cellphones, which could help deliver medical services to remote areas, for example. The problem is that much of our historical medical data was gathered from a non-inclusive demographic, particularly white adult males based in the Western world. This lack of representation creates a training problem because it makes it difficult to train a comprehensive A.I. solution that can be applied across demographics. The result can be misdiagnosis of patients who do not fit the data the models were trained on. If the dataset is not the secret sauce of a solution, should we start sharing it along with the solution so users can audit its applicability?
    • Detecting fake news – Social media platforms are struggling with the scourge of fake news and are investing large sums and considerable talent into combating it. I published an article on Investec Focus discussing this issue. The problem is that these platforms are, by their nature, US- and left-centric. There is fake news on both sides of the spectrum, but the datasets being flagged as fake news disproportionately disfavour one narrative over another. Some journalists have complained that their articles were flagged on Facebook as unreliable simply because they do not work for the major news corporations, even though their information was not fake. This is an example of what I am calling a training problem, and care needs to be taken with the composition of the dataset itself.
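
    Below is a minimal, hypothetical sketch of the customer segmentation idea above, using scikit-learn’s KMeans on entirely synthetic behavioural features (the column names, number of clusters and income-band proxy are illustrative assumptions, not a real dataset or any company’s method). The final cross-tabulation is one simple way a user could audit whether the segments quietly align with a sensitive proxy before deciding how to act on them.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
n = 2000

# Hypothetical behavioural features for each customer.
customers = pd.DataFrame({
    "monthly_spend": rng.gamma(shape=2.0, scale=400.0, size=n),
    "purchase_frequency": rng.poisson(lam=6, size=n),
    "avg_basket_size": rng.gamma(shape=2.5, scale=30.0, size=n),
})

# A proxy attribute we do NOT cluster on, used only to audit the segments.
customers["income_band"] = pd.qcut(
    customers["monthly_spend"] + rng.normal(0, 200, n), q=3,
    labels=["low", "mid", "high"],
)

features = ["monthly_spend", "purchase_frequency", "avg_basket_size"]
X = StandardScaler().fit_transform(customers[features])
customers["segment"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# If a segment is dominated by one income band, offering the "best" products
# only to that segment starts to look like stereotyping rather than service.
print(pd.crosstab(customers["segment"], customers["income_band"], normalize="index"))
```

    If one segment turns out to be dominated by a single income band, that is exactly the point at which the usage question above (serve or stereotype?) needs a human answer rather than a purely technical one.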

    As we increasingly use A.I. in varied solutions, it is not enough to audit only the technical training metrics of the algorithms. I view A.I. as a tool, a means to an end such as solving a business problem. It is crucial that we also evaluate the usage problems we might inadvertently introduce into our Machine Learning-powered solutions.

    It’s a brave new world!
