All digital tools, including AI, are susceptible to cyber-attacks. This is because every system, without exception, is vulnerable. So, either as a consequence of a software defect or as a result of social engineering, companies are permanently under threat from actors that try to exploit said vulnerabilities, usually for financial gain. As long as the digital world remains the main realm in which commercial (and frankly human) activity takes place, cyber security will continue to be paramount.
One of the avenues that functions as a way to combat cyberattacks is the concept of red-teaming. With terminology borrowed from WWII drill exercises, Red Teaming entails an effort by offensive security professionals responsible for “simulating real-world attacks on an organization’s systems and networks.” While many companies decide to have their own red team division, others put out “bounties” for freelance engineers to claim by ethically Hacking and stress testing the companies systems. Finding vulnerabilities and reporting them to a tech corporation usually leads to further collaboration, and it sometimes functions as an avenue for employment in the company’s security sector.
In a nutshell, Red Teaming is a crowd-funded effort that aligns incentives to attack a platform or algorithm, finding vulnerabilities and reporting them to get rewards. This has cemented itself as one of the most common practices in cybersecurity today. Red Teaming has been implemented in a wide array of cyber security areas, including AI, where it is used to test vulnerabilities in models. However, two other ways of implementing this approach with AI algorithms exist.
One of them is becoming more widely known as Red Teaming through Prompt Hacking; this means engineers try “coaxing the models into behaving badly.” This can entail getting them to state an erroneous answer to a math problem confidently or inventing an ID number for a person who doesn’t exist. Prompt HackingHacking is done to find vulnerabilities and correct them to strengthen the guardrails of AI-powered tools, mainly large language models or LLMs. This is a fundamentally different approach to red teaming because it involves no actual hacking.
However, the non-technical approach towards prompt hacking also works as a clue to solving a severe AI problem: unobservable biases. As has been heavily reported, there are several stages in which an AI tool can introduce or reproduce biases. Through the adoption of red teaming practices, AI algorithms can be “audited” by different sectors of society to shine a light on any possible biases that are both intentionally and unintentionally present in an algorithm.
There are two clear benefits of adopting this approach towards mitigating bias in AI: the first one is recruiting people knowledgeable in areas not traditionally involved with setting up these tools, which means disciplines like demographics, statistics, public policy analysis, and even cultural studies. Since the approach does not have to be as technical as traditional Red Teaming, a multidisciplinary effort can begin to look for biases and make the companies aware of them, retaining the incentive of gaining a “bounty” as well.
The second reason that calls for a more democratic approach to bias evaluation is that it is important to move past the stigma that is forming around AI and its “inclinations.” This is an inescapable element of Artificial Intelligence systems, but vilifying these tools because of their potential to replicate biases is not a correct path toward innovation. Companies often choose not to disclose their systems’ bias because it represents a considerable risk factor. The creators of widely used algorithms usually do not have the technical knowledge or topical sensitivity required to search and identify these issues.
This crowdfunded approach to monitoring AI bias can create a system of incentives that reduces the harms of unobservable biases while de-stigmatizing the issue as well. These kinds of algorithms have groundbreaking efficiency and need to be implemented sensibly in order to be used to their full potential. By involving more areas of knowledge in this debate, more instances can be identified and corrected, which will always be beneficial for the industry as a whole.
* Guillermo Alfaro studied International Relations and Political Science at ITAM, where he specialized in research on AI and global tech governance. He was a member of the first cohort of the Cyber Policy Dialog for the Americas, organized in conjunction with Stanford University (2024), and has fostered debate around these topics by organizing academic events in Mexico. Guillermo is deeply committed to positioning the Global South at the forefront of discussions regarding AI and technology.
Source: We Are Innovation