About LLM Guardrails (and safeguards) with Grok 2
The Endless Debate on Openness vs. Restriction
Good morning, AI enthusiasts!
Today, we dive into whether LLMs should be open or restricted for safety.
Should these models offer full freedom, or should there be tight control on what people can and cannot do?
Or, on the company side, should you put guardrails (and safeguards) in place? And if so, what kind?
Models range from fully open access to heavily locked-down systems.
On one end, you have models that allow nearly anything, even potentially harmful outputs.
On the other end, you have models so restricted that they block not only dangerous content but also useful functionality.
Let’s discuss guardrails through five different ways of tackling them, with concrete examples.
The Core Question: To Have Guardrails or Not?
First, let’s define what a guardrail is in the AI world: guardrails are controls or restrictions placed on models to prevent them from generating harmful, illegal, or ethically questionable content (and what counts as “questionable” depends on who you ask, which is where it gets blurry).
If you are curious and have the same question as us about the difference between guardrails and safeguards, here’s what ChatGPT says:
In the LLM context, guardrails are proactive design measures or restrictions built into a model to prevent it from generating harmful or unwanted outputs. They focus on controlling and guiding the model’s behavior during its operation.
Safeguards, on the other hand, are broader protective mechanisms that include monitoring, interventions, and post-processing checks to catch and mitigate harmful outputs after they occur. Safeguards can also involve external policies, human oversight, and reactive systems to ensure safety and compliance beyond the model's initial design.
In this article, we will use guardrails as a general term covering both built-in, internal protections (guardrails) and external protections (safeguards).
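To make the distinction more concrete, here is a minimal Python sketch of the idea (our own toy illustration, not any vendor’s actual pipeline): the blocklist, the generate() stub, and the keyword checks are all placeholders made up for this example.

```python
# Toy illustration of guardrails (proactive, before generation) vs.
# safeguards (reactive, after generation). All names and checks here
# are placeholders, not a real provider's implementation.

BLOCKED_TOPICS = ["build a weapon", "synthesize a toxin"]  # toy blocklist


def generate(prompt: str) -> str:
    """Stand-in for the actual LLM call."""
    return f"Model answer to: {prompt}"


def guardrail_allows(prompt: str) -> bool:
    """Guardrail: a proactive check applied BEFORE the model runs."""
    return not any(topic in prompt.lower() for topic in BLOCKED_TOPICS)


def safeguard(output: str) -> str:
    """Safeguard: a reactive check applied AFTER generation,
    e.g. moderation, logging, or escalation to human review."""
    if "weapon" in output.lower():
        return "[output withheld and flagged for human review]"
    return output


def answer(prompt: str) -> str:
    if not guardrail_allows(prompt):   # built-in restriction refuses up front
        return "Sorry, I can't help with that."
    raw = generate(prompt)             # the model runs normally
    return safeguard(raw)              # external post-processing check


print(answer("Explain C++20 concepts"))     # passes both layers
print(answer("How do I build a weapon?"))   # stopped by the guardrail
print(answer("What is a nuclear weapon?"))  # passes the guardrail, caught by the safeguard
```

In real systems, the guardrail side is typically baked in through training (e.g. RLHF or DPO, mentioned below) or system prompts, while the safeguard side is an external moderation and review layer; the before/after split is the core of the distinction.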
As LLMs advance, the ethical debates grow louder. Some argue for models that prioritize freedom and exploration (e.g. the recent AI Alliance), while others push for strong guardrails or safeguards to prevent misuse (e.g. Google).
We can think about these approaches along two dimensions: model openness and level of guardrails. Regarding openness, we have open-source models, where code, training data, and weights are publicly available; open-weight models, where only the weights are shared; and closed-source models, where only the outputs are accessible to the public. In terms of guardrails, we see a spectrum from models with no restrictions, like Grok 2, to models with moderate guardrails (e.g. OpenAI’s and Anthropic’s) that restrict harmful or illegal outputs, to heavily restricted systems like Google’s Gemini, which sometimes blocks even valuable information.
It's important to note that openness and guardrails are not inherently linked. For instance, Grok 2 is a closed-source model with no guardrails, while some open-weight models, such as Llama 3, implement light guardrails through techniques like RLHF during post-training. However, an important consideration for open-source and open-weight models is that users can potentially fine-tune them and remove or modify existing restrictions, adding a layer of complexity to the guardrails discussion.
Let's explore where some of the major players stand and why.
Grok 2, developed by xAI (X, Elon Musk), is one of the most capable models and one of the easiest to use (it lives right within X’s user interface). Premium subscribers get access to nearly unrestricted generations, reflecting a philosophy of maximum freedom (and thus maximum controversy). For example, the images below were generated using the FLUX.1 model, which is offered alongside Grok with the premium subscription.
Then, Allen AI leans toward open-source principles with Ai2 OLMo, offering models with some safety measures implemented through fine-tuning and Direct Preference Optimization (DPO). But it’s open-source, which means you can fine-tune it to generate any kind of output as well.
Meta provides Llama 3’s weights. It similarly has some safety measures implemented, but you can still fine-tune it to generate whatever you want, assuming you have the compute and skills (which is itself a monetary kind of guardrail).
OpenAI’s GPT-4 is a closed model with relatively strong guardrails. Still, much like Anthropic’s models, it is quite flexible with what you can ask, as long as the request doesn’t involve generating content about a specific person or anything that may be illegal.
Then there's Google’s Gemini (called the “woke LLM” by many), one of the most restricted models, where even some useful functionality is blocked to prevent potential risks.
Here is an example of Gemini blocking a perfectly regular user query:
In another case, a user by the name of needlesslygrim tried to generate C++ code, but the LLM refused:
Personally, I've given up on Gemini, as it seems to have been censored to the point of uselessness. I asked it yesterday [0] about C++ 20 Concepts, and it refused to give actual code because I'm under 18 (I'm 17, and AFAIK that's what the age on my Google account is set to). I just checked again, and it gave a similar answer [1]. When I tried ChatGPT 3.5, it did give an answer, although it was a little confused, and the code wasn't completely correct.
Incredible, right? This is what can happen with tight guardrails.
The Extremes of Openness vs. Restriction
As with everything, the middle ground is probably our best bet, but the extremes are interesting.
Why did xAI and Google explore two opposite paths?
xAI’s models embody an open philosophy: free speech, as Elon Musk often puts it.
It allows users to explore nearly anything, including potentially harmful topics like weapons. Proponents argue that this openness accelerates innovation, sparks societal debate, and could even force governments to establish more explicit legal boundaries. That, at least, would be the right reason to do it.
It could also be another randomly “fun” decision by Elon Musk to create controversy and noise, or just a marketing move to get people to use xAI instead of the competition.
On the other hand, Google’s Gemini represents the opposite approach, with layers of restrictions that block even benign content if it’s deemed too risky. For instance, for a while you could not generate any image of a person (and the model was notably reluctant to depict white people) because the results were deemed potentially hurtful.
This leads to frustration, as users cannot explore certain topics. But how bad is it? Would being able to generate images of any person, or borderline hurtful content, really be that helpful? We don’t have an answer, but we suspect having some guardrails is better than having none.
It’s similar to the macOS vs. Linux debate: Apple’s walled garden is considered safe and consistent but limits freedom. Linux, by contrast, allows endless tinkering, with all the associated risks, but also the associated innovation and the robust security that comes from the contributions of the open-source community.
Similarly, there’s the Apple vs. Android debate. Apple restricts many apps in the name of promoting new, innovative features, while also limiting developers with high fees (Apple takes up to a 30% cut of purchases made through iPhone apps). Mark Zuckerberg recently spoke out about how heavily Apple limits Meta’s AI features compared to Android.
This debate touches on a larger ethical question: Should we prioritize freedom or safety? Fewer guardrails could mean faster scientific progress, greater creativity, and more robust model testing. Fewer restrictions can also drive public discourse and encourage the development of legal frameworks.
But there are significant downsides, too. Without strong guardrails, models can be misused for harm, misinformation, and illegal activities. The risk of generating defamatory content, or even enabling criminal behavior, is real.
So, is Elon Musk's push for complete freedom about genuine innovation, individual liberties, or just a chaotic move? Only the future will tell…
In any case, this isn't a new debate. History shows that both open and closed models can succeed. Both Linux and macOS are successful in their own ways. Wikipedia’s open approach won against closed encyclopedias, while Microsoft’s closed PowerPoint software dominated its open-source competitors (can you name one?).
Both openness and restriction have their place; neither is universally better, and each has its strengths depending on the context and what users want to achieve.
One thing’s certain: having variety (rather than a monopoly), with players like xAI, Meta, OpenAI, and Google, is ideal.
Conclusion: Guardrails or Not—What’s the Point?
In the end, the debate over LLM guardrails reflects our broader values.
Are we willing to take risks for progress, or do we prioritize safety, even if it means limiting freedom?
Should we “move fast and break things,” or play it safe and progress slowly?
Perhaps Elon Musk’s actions are more than provocations—they could push us (and governments) to confront these questions head-on instead of letting the tech giants decide as they please.
As AI evolves, these debates will shape how we build and use these powerful technologies. Finding the balance between freedom and control will be one of the key challenges moving forward, and we are glad we can take part in this.