Guardrails can be configured at the platform level or at the use case level. It is up to the use case admin to decide whether to follow the platform’s recommendation or to apply an individual setting. A system admin sets guardrails via the admin panel; platform admins can additionally add guardrails to the chat.

What Are AI Guardrails?

AI guardrails are like safety rules and boundaries for Artificial Intelligence. They help guide the AI’s behavior, ensuring it provides helpful and safe information, and avoids creating content that could be harmful or inappropriate. Think of them as the protective fences on a highway, keeping traffic (the AI’s output) on the right path.

Why You Might Want To Use Guardrails

We use guardrails primarily for:
  • Safety: To prevent the AI from generating dangerous, offensive, or biased content.
  • Quality & Focus: To keep the AI relevant to your request and maintain high-quality responses.
  • Compliance: To ensure the AI follows specific company policies, legal rules, or ethical standards.
You’ll find guardrails most useful in situations where AI interacts with the public, handles sensitive topics (like health or finance), or needs to adhere strictly to specific guidelines.

Limitations: Why They’re Not Perfect

It’s important to know that AI models, especially Large Language Models (LLMs), are statistical systems. This means they learn by finding patterns in vast amounts of data and then predict the most likely next word or phrase. They don’t “understand” things like humans do. Because of this statistical nature:
  • They can sometimes be bypassed: Clever or unusual prompts might occasionally find a way around the guardrails.
  • Nuance can be tricky: Human language is complex. Sometimes, guardrails might accidentally block harmless content or miss very subtle inappropriate meanings.

Implementation in the Genow Platform

Content Filter

The content filters check prompts against a list of safety attributes, which include “harmful categories” and topics that may be considered sensitive. Every filter can be set between 0% and 99%: 0% means the filter is off, and 99% is the strongest setting. For an optimal user experience, we recommend setting filters to no more than 30%. A content filter can check for the following categories:
  • Toxic: Content that is rude, disrespectful, or unreasonable.
  • Derogatory: Negative or harmful comments targeting identity and/or protected attributes.
  • Violent: Describes scenarios depicting violence against an individual or group, or general descriptions of gore.
  • Sexual: Contains references to sexual acts or other lewd content.
  • Insult: Insulting, inflammatory, or negative comments towards a person or a group of people.
  • Profanity: Obscene or vulgar language such as cursing.
  • Death, Harm & Tragedy: Human deaths, tragedies, accidents, disasters, and self-harm.
  • Firearms & Weapons: Content that mentions knives, guns, personal weapons, and accessories such as ammunition, holsters, etc.
  • Public Safety: Services and organizations that provide relief and ensure public safety.
  • Health: Human health, including health conditions, diseases, and disorders; medical therapies, medication, vaccination, and medical practices; and resources for healing, including support groups.
  • Religion & Belief: Belief systems that deal with the possibility of supernatural laws and beings; religion, faith, belief, spiritual practice, churches, and places of worship. Includes astrology and the occult.
  • Illicit Drugs: Recreational and illicit drugs; drug paraphernalia and cultivation, headshops, etc. Includes medicinal use of drugs typically used recreationally (e.g. marijuana).
  • War & Conflict: War, military conflicts, and major physical conflicts involving large numbers of people. Includes discussion of military services, even if not directly related to a war or conflict.
  • Finance: Consumer and business financial services, such as banking, loans, credit, investing, and insurance.
  • Politics: Political news and media; discussions of social, governmental, and public policy.
  • Legal: Law-related content, including law firms, legal information, primary legal materials, paralegal services, legal publications and technology, expert witnesses, litigation consultants, and other legal service providers.
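The threshold mechanics described above can be sketched in a few lines of Python. This is an illustrative model only: the category scores, the helper name `is_blocked`, and the example values are assumptions for demonstration, not the platform’s actual API.

```python
# Illustrative sketch of threshold-based content filtering.
# Category names mirror the list above; the scores and helper
# names are assumptions, not the Genow platform's real interface.

def is_blocked(scores: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return the categories whose confidence score (0.0-1.0) meets or
    exceeds the configured threshold (0-99%). A threshold of 0 means
    that filter is off."""
    blocked = []
    for category, threshold in thresholds.items():
        if threshold == 0:  # 0% means the filter is off
            continue
        if scores.get(category, 0.0) * 100 >= threshold:
            blocked.append(category)
    return blocked

# Recommended: keep thresholds at 30% or below for a good user experience.
thresholds = {"Toxic": 30, "Profanity": 30, "Politics": 0}  # Politics filter off
scores = {"Toxic": 0.45, "Profanity": 0.10, "Politics": 0.80}
print(is_blocked(scores, thresholds))  # ['Toxic']
```

Note that the Politics score of 0.80 does not trigger a block here, because setting that filter to 0% disables it entirely.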

Jailbreak Filter

Jailbreaking occurs when a malicious actor attempts to bypass the model’s security controls. This can lead to the AI ignoring its usual instructions, circumventing its safety measures, or performing actions for which it was not intended. The jailbreak filter can be set to Off, Low, Medium, or High.
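One common way such level settings work is to map each level to a detection-confidence threshold. The sketch below assumes that design; the level names come from the text, but the numeric thresholds and the `is_jailbreak` helper are illustrative assumptions, not the platform’s documented behavior.

```python
# Illustrative sketch: mapping jailbreak-filter levels to detection
# thresholds. The Off/Low/Medium/High names come from the docs; the
# numeric values are assumed for demonstration only.
from enum import Enum

class JailbreakFilter(Enum):
    OFF = None     # no filtering
    LOW = 0.9      # block only high-confidence jailbreak attempts
    MEDIUM = 0.7
    HIGH = 0.5     # most aggressive: block at lower confidence

def is_jailbreak(score: float, level: JailbreakFilter) -> bool:
    """Return True if a prompt's jailbreak score (0.0-1.0) trips the filter."""
    if level is JailbreakFilter.OFF:
        return False
    return score >= level.value

print(is_jailbreak(0.8, JailbreakFilter.MEDIUM))  # True
print(is_jailbreak(0.8, JailbreakFilter.LOW))     # False
```

Under this scheme, a stricter level simply lowers the confidence required to block a prompt, trading a higher false-positive rate for stronger protection.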