Knowledge Base: Tips and best practices from moinAI

About this guide

A well-structured knowledge base is crucial to the effective performance of an AI chatbot within a company. In this guide, moinAI explains which types of content should be included in the knowledge base and how the RAG process works in knowledge management. What are the key considerations when it comes to setting up, quality assurance and ongoing maintenance? We explain the most important settings, including best practices and common pitfalls, and how to avoid them!

What is a Knowledge Base?

An AI knowledge base is the central hub where the resources an AI system relies on are organised, stored and made available. This information may include FAQs, guides or product documentation, and is usually categorised by topic. The AI accesses this content using machine learning and natural language processing (NLP).

The moinAI Knowledge Base: Knowledge and AI put to targeted use

The knowledge base is at the heart of our chatbot, because it’s important to realise that AI likes to draw on a great deal of knowledge and is capable of storing a lot of information, but AI that is kept under control is the best kind of AI!

– Johannes Hehr, moinAI 

The Knowledge Base therefore serves as the central knowledge repository for the moinAI chatbot. It is the heart of the moinAI solution: this is where resources for AI agents are added and managed. The AI agents then use these resources to provide appropriate, case-specific responses in customer communications.

Screenshot of the “moin AI HUB” dashboard user interface for “Knowledge Base Management”. The interface is set to German. At the top, a RAG (Retrieval Augmented Generation) pipeline is shown with five consecutive steps: knowledge retrieval (Wissensabruf), knowledge check (Wissensprüfung), instructions (Instruktionen), response generation (Antwortgenerierung, with the LLM “GPT-4o mini” selected) and data extraction (Datenextraktion). Several areas are highlighted with red rectangles: the “Knowledge Base” navigation item in the left-hand menu bar, the “RAG” tab at the top right, and two buttons labelled “+ NEUE RESSOURCE” (“+ NEW RESOURCE”), one in the first step of the RAG process and another on the right in the “Alle Inhalte” (“All content”) area.

Why is the knowledge base so important?

The knowledge base is key to optimised chatbot performance, because without a verified database, artificial intelligence can ‘hallucinate’ or provide outdated information. We explain what ‘hallucinations’ mean in the context of AI in more detail in a separate article. Incorrect outputs have a negative impact on the automation rate and customer satisfaction within the company. Unlike general AI applications, which are often perceived as a ‘black box’, moinAI’s AI works only with explicitly stored knowledge. The specific advantages of a knowledge base as the backbone are as follows:

  1. Up-to-date and controllable: Knowledge can be updated or removed at any time, independently of the AI model itself.
  2. Higher automation rate: Accurate answers reduce the need to escalate queries to human agents, known as ‘human takeover’.
  3. Consistency: All users receive a company-approved answer, at all times and regardless of the number of incoming queries. 
  4. Brand-consistent communication: The tone and language of the outputs reflect the corporate identity.
  5. Scalability: The knowledge base can quickly cover new topics or products through simple expansion; no new training is required.
  6. Measurability: Potential knowledge gaps become visible and can thus be optimised in a targeted manner.

In short: A well-maintained knowledge base forms the most important foundation for a reliable AI chatbot in a business setting. It ensures quality and trustworthiness in automated customer communication!

What types of content are stored in the Knowledge Base?

In the Knowledge Base, content types such as PDFs, web pages and CSV files can be easily uploaded so that the AI agent can provide the correct information as output. The formats and their uses are summarised here:

| Format | When is it suitable? | Example |
|---|---|---|
| PDFs | Manuals, product sheets | Technical data sheet |
| Websites (URL) | Existing FAQ pages | Embedded support page |
| Documents | Structured guides, policies | Troubleshooting guide |
| Q&A pairs | Frequently asked standard questions | “How do I reset my password?” |
| CSV files | Structured data sets | Product catalogue with prices |

Screenshot showing two search modes in the “Alle Inhalte” (“All content”) area of the Knowledge Base interface. At the top is the standard search: the “Name” field is outlined in red, and the search bar shows the placeholder text “Suche nach Titel / URL / Name” (“Search by title / URL / name”). At the bottom, content search is active: here the “Inhalt” (“Content”) field is outlined in red. The search bar now has a purple border with a sparkle icon and the placeholder text “Abfrage oder Inhalt, z.B. ‚Wie viel kostet es?‘” (“Query or content, e.g. ‘How much does it cost?’”). Directly below, a yellow notice bar appears with the message “Auch 1 nicht verwendete Ressource gefunden ANZEIGEN” (“Also found 1 unused resource SHOW”), to which a red arrow points. Below both search bars are filter buttons for file types such as PDF, website and documents.
Keep track of all content in the moinAI Knowledge Base
Did you know? Resource management and agent control are handled in the moinAI Hub

PDFs are uploaded directly to the Knowledge Base and integrated without the need to transfer content manually. Text-based PDFs are preferable, as PDFs with a strong emphasis on layout — such as those containing columns or tables — can lead to faulty text extraction.

Websites can be integrated via their URL and are parsed automatically. Existing FAQ pages or help centre content can thus be used directly without duplication of effort. Tip: When automatic updates are enabled, the Knowledge Base is re-synchronised with the live site automatically. Pages with extensive navigation, advertising or dynamically loaded JavaScript content are unsuitable, as they often yield unusable scraping results. It is best to integrate simple, text-heavy pages and check the result.

Documents are formatted texts with paragraphs, headings, bullet points and tables. They are created and edited directly in the moinAI Hub and are ideal for content requiring detailed explanations, such as instructions and process descriptions. Please note: Where possible, create only one document per topic, clearly structured and without redundant content.

Summaries for the AI: For the resource types website and document, the AI automatically generates a summary of the content. These summaries serve as the primary basis for the AI agent’s decision-making when retrieving knowledge.

Question-answer pairs link a specific question to a structured answer. The AI recognises this direct relationship and thus provides particularly accurate results. It is crucial not to phrase questions too generally. The closer the stored question is to the users’ actual language, the better the match.

CSV files hold structured datasets such as product lists and price tables, and they make large amounts of structured information available to the AI, which can access them directly without further processing. However, clean data hygiene is a prerequisite here: missing header rows, inconsistent column names or mixed data formats within a column can lead to interpretation errors.
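The data hygiene points above can be checked before a CSV file is uploaded. The following is a minimal sketch in Python, not a moinAI feature: the function names `is_number` and `check_csv` and the specific checks are assumptions chosen for illustration, and the sample data is invented.

```python
import csv
import io


def is_number(value: str) -> bool:
    """Return True if the value parses as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_csv(text: str) -> list[str]:
    """Return a list of human-readable problems found in CSV text."""
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]

    # Header checks: every column needs a non-empty, unique name.
    names = [h.strip() for h in header]
    if any(not n for n in names):
        problems.append("header row has empty column names")
    lowered = [n.lower() for n in names if n]
    if len(set(lowered)) != len(lowered):
        problems.append("header row has duplicate column names")

    # Column checks: flag columns that mix numeric and non-numeric values.
    for i, name in enumerate(names):
        values = [r[i].strip() for r in data if i < len(r) and r[i].strip()]
        kinds = {"number" if is_number(v) else "text" for v in values}
        if len(kinds) > 1:
            problems.append(f"column '{name}' mixes numbers and text")
    return problems


# Illustrative data only.
clean = "name,price\nChair,49.90\nTable,129.00\n"
messy = "name,price,\nChair,49.90,x\nTable,on request,y\n"
print(check_csv(clean))
print(check_csv(messy))
```

A check like this only catches structural problems; whether the values themselves are correct still needs a manual review.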

By the way: external articles can also be added to the knowledge base, though the connection is made via an API!

Infographic by moinAI titled "Die moinAI Knowledge Base Inhaltstypen auf einen Blick" (The moinAI Knowledge Base content types at a glance). A central robot character is surrounded by six different data sources pointing to its knowledge base: Q&A pairs, API (external content), documents, CSV files, PDFs, and websites (URL).

Step by step: populating the knowledge base

1. Review of the current situation

Before adding resources, it is worth taking a look at your own data: what questions do customers ask most frequently, and via which channels? These channels include live chat, email and telephone support. Existing support tickets and chat logs are the most valuable sources for internal analysis when it comes to identifying customer concerns. Your own FAQ page is also an important source of information.

The outcome of the analysis should be a list of priorities; ideally, this will identify the top 20 queries that account for the largest proportion of daily enquiry volume. These form the core of the knowledge base and ensure a high level of automation right from the start. In a second step, less frequent or more complex topics are then added.
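One simple way to arrive at such a priority list is to count how often each query appears in exported chat logs. The sketch below assumes the logs are available as a plain list of query strings. The helper names `normalise` and `top_queries` are hypothetical, and the sample queries are invented.

```python
from collections import Counter


def normalise(query: str) -> str:
    """Lower-case, trim punctuation at the ends and collapse whitespace."""
    return " ".join(query.lower().strip(" ?!.").split())


def top_queries(queries: list[str], n: int = 20) -> list[tuple[str, int, float]]:
    """Return the n most frequent queries as (query, count, share of volume)."""
    counts = Counter(normalise(q) for q in queries)
    total = sum(counts.values())
    return [(q, c, c / total) for q, c in counts.most_common(n)]


# Illustrative data only.
log = [
    "How do I reset my password?",
    "how do i reset my password",
    "Where is my order?",
    "Where is my order",
    "where is my order?!",
    "Do you ship abroad?",
]
for query, count, share in top_queries(log, n=2):
    print(f"{query}: {count} ({share:.0%})")
```

Exact-match counting only groups queries that differ in case, spacing or trailing punctuation. Differently worded versions of the same concern still have to be grouped by hand or with a clustering step.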

2. Preparing content

When it comes to preparing content, there is a rule of thumb: AI is only as good as the content it is fed! This means that the information recorded in the knowledge base should be clearly written and structured. Quality takes precedence over quantity, and particular attention must be paid to the following:

  • Be clear: Ambiguous phrasing confuses the AI. Short, direct sentences containing specific information yield the best results; lengthy product descriptions are problematic.
  • Organise by topic: One document per topic is better than a single long document covering too many topics at once. A summary is automatically generated for each document resource, which the RAG agent then analyses. Based on this, it decides whether the full content should be used to generate a response. Documents that are vaguely worded or thematically overloaded are less likely to be correctly identified.
  • Make targeted use of question-answer pairs: The most precise format for recurring standard questions is question-answer pairs. The closer the stored question is to the users’ actual language, the better the answer will fit. Phrases taken directly from chat logs are best suited here, not technical terms from the specialist field.
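Why phrasing from chat logs works better than specialist terms can be illustrated with a toy similarity measure. The sketch below uses plain word overlap (Jaccard similarity) as a stand-in; how moinAI actually matches queries is not described here, and the function names and sample questions are assumptions for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Split text into a set of lower-case word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(a: str, b: str) -> float:
    """Jaccard similarity: shared words divided by all distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(user_query: str, stored_questions: list[str]) -> str:
    """Return the stored question with the highest word overlap."""
    return max(stored_questions, key=lambda q: overlap(user_query, q))


# Illustrative data only.
stored = ["How do I reset my password?", "Credential recovery procedure"]
print(best_match("how can i reset my password", stored))
```

The stored question written in the users’ own words shares most of its vocabulary with the incoming query, while the version written in technical terms shares none.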

3. Quality assurance prior to publication

Before the AI chatbot goes live, the content stored in the system should be tested under real-world conditions. moinAI provides the AI Playground within the moinAI Hub for this purpose: here, real customer queries can be tested directly against the knowledge base, and the answers and sources used are thoroughly reviewed. Two settings in particular should be configured correctly before the go-live:

  1. Enable knowledge checking: This tightens the criteria for generating responses. The AI agent will only provide a response if the stored sources allow for a clear and verifiable answer. In critical cases, response generation is even deliberately suppressed to prevent hallucinations.
  2. Configure guardrails: Some basic security mechanisms, such as content restrictions, topic restrictions and protection against prompt injections, are enabled by default. In addition, competition protection can be optionally activated. This blocks enquiries about competing products or redirects them to your own offering. For particularly sensitive matters, a compliance check is also available to perform a second explicit review of all enquiries.
Safety first: With the knowledge check, the system only provides an answer if the AI is certain. Whilst this reduces the automation rate, it prevents errors!

4. Ongoing maintenance

To ensure that the Knowledge Base does not contain outdated content or poor-quality articles, it must be kept up to date. However, ongoing maintenance is not solely a manual task; it can be automated using a number of features:

  • Use automatic updates: In the Hub, you can set a fixed update interval; the options are 7, 14 or 30 days. For particularly volatile content such as prices, opening hours or availability, there is also a real-time update option: when set to ‘active’, the system checks with every request whether the stored content is still within the defined validity period. If necessary, the page is scraped again immediately. Please note that shorter intervals temporarily increase response times, but the trade-off is worth it!
  • MCP Server for automated content management: Using the moinAI MCP Server, content can be retrieved, created, edited or deleted via command. This integrates the knowledge base into an existing AI environment – without the need for manual intervention in the hub. This is particularly advantageous when content needs to be synchronised regularly from external systems.
  • Systematically closing knowledge gaps: The Hub features a ‘Missing Knowledge’ section. All enquiries for which the chatbot was unable to provide a suitable response are listed here. This may be due to a lack of resources on a particular topic or because the existing sources were insufficient. Clicking on the speech bubble opens the entire chat history, so each gap can be reviewed in its original context. These customer insights are the most direct and valuable source for the continuous development of the knowledge base.
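The validity check behind the real-time update option can be pictured as follows. This is a minimal sketch under stated assumptions, not moinAI’s implementation: the `WebResource` class, the `is_stale` and `get_content` functions and the `scrape` callable are all hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable


@dataclass
class WebResource:
    url: str
    content: str
    last_scraped: datetime
    validity: timedelta  # e.g. 7, 14 or 30 days


def is_stale(resource: WebResource, now: datetime) -> bool:
    """True once the validity period since the last scrape has expired."""
    return now - resource.last_scraped > resource.validity


def get_content(resource: WebResource, now: datetime,
                scrape: Callable[[str], str]) -> str:
    """Return the content, re-scraping first if it is no longer valid."""
    if is_stale(resource, now):
        resource.content = scrape(resource.url)  # the extra step costs time
        resource.last_scraped = now
    return resource.content


# Illustrative usage with a fake scraper instead of a real HTTP request.
start = datetime(2024, 1, 1)
page = WebResource("https://example.com/prices", "old prices", start,
                   timedelta(days=7))
print(get_content(page, start + timedelta(days=3), lambda url: "new prices"))
print(get_content(page, start + timedelta(days=8), lambda url: "new prices"))
```

The re-scrape inside `get_content` is where the additional response time mentioned above comes from: it only happens on the request that finds the content expired.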

Technical criteria for optimal knowledge base performance

A clear data hierarchy and high-quality input data are essential for optimal knowledge base performance. As described earlier, content must be clearly categorised to avoid inconsistent results. Unstructured or outdated sources increase the error rate. In summary, AI needs the following to perform at its best:

  • Consistency: No conflicting information across different documents.
  • Granularity: It is better to have several specific documents than one long, generic one.
  • Timeliness: Maintenance processes – i.e. a continuous monitoring infrastructure – are essential, as outdated content reduces the hit rate.
  • Language and tone: Content should reflect the language of the users.
  • CSV structure: Header rows are mandatory and clean data hygiene is crucial.
  • PDF quality: Text-based PDFs only (no scanned images without OCR).

With moinAI, changes can be made in real time without the need to retrain the model, ensuring consistently high-quality responses with minimal maintenance.

The RAG agent turns knowledge into answers

‘Retrieval’ means accessing information, ‘augmented’ means I’m expanding my knowledge, and ‘generate’ means I’m generating a response. This means that the AI at moinAI is able to say: I’ll look for additional information that the user has made available to me in the knowledge base, I’ll expand my knowledge with it, and then generate an answer based on the source and my knowledge. This makes the answer more precise, more reliable and much more informative for the user.

– Johannes Hehr, moinAI

RAG is a structured process for working with the knowledge base, and it offers a range of configuration options for controlling the responses of AI agents. When a user asks a question, the AI agent searches the stored sources for relevant content. Sources are selected in two stages: first, the summary of a resource is checked; then, only if the summary is judged relevant, the full content is used to generate the response. No guesswork, just substantiated answers tailored to the query.

Content-related guidelines for the agent’s response generation can be defined via instructions. Tone and communication guidelines, on the other hand, are controlled separately via policies and personas. The large language model can be selected to suit the use case of each AI agent. The options include current models from the GPT series via Microsoft Azure as well as open-weight models.
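The two-stage source selection can be sketched in a few lines of Python. This is an illustration under stated assumptions only: the `Resource` class, the word-overlap `relevance` score and the `threshold` value are hypothetical stand-ins, and moinAI’s actual relevance check is not described in this guide.

```python
import re
from dataclasses import dataclass


@dataclass
class Resource:
    title: str
    summary: str
    content: str


def words(text: str) -> set[str]:
    """Split text into a set of lower-case word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(query: str, text: str) -> float:
    """Share of query words that also occur in the text (toy score)."""
    q = words(query)
    return len(q & words(text)) / len(q) if q else 0.0


def retrieve(query: str, resources: list[Resource],
             threshold: float = 0.5) -> list[str]:
    """Stage 1 screens summaries; stage 2 passes on full content."""
    # Stage 1: only the short summaries are compared with the query.
    shortlisted = [r for r in resources
                   if relevance(query, r.summary) >= threshold]
    # Stage 2: full content is used only for shortlisted resources.
    return [r.content for r in shortlisted]


# Illustrative data only.
kb = [
    Resource("Returns", "How to return an order and get a refund",
             "Returns are accepted within 30 days of delivery."),
    Resource("Shipping", "Shipping costs and delivery times",
             "Standard delivery takes 3 working days."),
]
print(retrieve("return order refund", kb))
```

Because stage 1 only sees the summary, a resource whose summary is vague or covers too many topics may never reach stage 2, which is why one clearly scoped document per topic matters.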

Do you have any questions about the Knowledge Base or knowledge management? Your dedicated Customer Success team at moinAI is always here to help, with a personal contact who knows you and your setup!

AI actions for AI agents are particularly useful for resolving cases: they enable the agent to go beyond simply generating responses and ask follow-up questions automatically or trigger actions in third-party systems. The RAG process at a glance, as shown in the Hub’s pipeline view: knowledge retrieval, knowledge check, instructions, response generation and data extraction.

Safety measures

The reliability of the extracted data is ensured by guardrails: firmly established safety mechanisms that reduce hallucinations and enforce topic restrictions. Guardrails actively block attempts at manipulation, such as the deliberate circumvention of safety rules or identity impersonation. They are always active and enhance security:

  • No harmful or confidential content
  • Topics are restricted to defined areas
  • Hallucinations are reduced through the exclusive use of verified sources
  • Spam or illegible requests are ignored
  • Internal system information (e.g. prompts) cannot be accessed
  • Identity impersonation (e.g. phishing) is prevented
  • Offensive or abusive language is filtered

In addition, the knowledge check can be made more stringent: when enabled, the AI agent will only generate a response if the facts in the knowledge base are clear; in cases of uncertainty, it will leave the query unanswered. When competition protection is enabled, queries regarding rival products are either blocked or automatically converted into a query about the company’s own product. This ensures that the agent remains consistently focused on the company’s own brand.
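The interplay of the stricter knowledge check and competition protection can be pictured as a small decision function. The sketch below is hypothetical throughout: the function `decide`, the numeric `best_source_score` and the `threshold` are assumptions for illustration, and only the blocking variant of competition protection is modelled, not the redirect to the company’s own product.

```python
def decide(query: str, best_source_score: float, competitors: set[str],
           strict_check: bool = True, competition_protection: bool = False,
           threshold: float = 0.8) -> str:
    """Return 'block', 'no_answer' or 'answer' for an incoming query."""
    # Competition protection: stop queries that mention a rival product.
    mentioned = any(name.lower() in query.lower() for name in competitors)
    if competition_protection and mentioned:
        return "block"
    # Stricter knowledge check: answer only if the sources are clear enough.
    if strict_check and best_source_score < threshold:
        return "no_answer"
    return "answer"


# Illustrative usage; "AcmeBot" is an invented competitor name.
print(decide("Is AcmeBot cheaper?", 0.9, {"AcmeBot"},
             competition_protection=True))
print(decide("What are your opening hours?", 0.4, set()))
print(decide("What are your opening hours?", 0.4, set(), strict_check=False))
```

The second and third calls show the trade-off described earlier: with the strict check enabled, an uncertain query stays unanswered, which lowers the automation rate but avoids a potentially wrong answer.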

Common mistakes – and how to avoid them

Common pitfalls with knowledge bases can be specifically avoided by taking the right steps. We’ll show you some typical mistakes and what you can do to ensure your knowledge base is set up correctly:

Checklist: What matters most for a chatbot
  • ⚠️ Documents that are too generic: Content should be broken down into granular, topic-specific units, each covering a clear and distinct question.
  • ⚠️ Contradictory information from different sources: Multiple resources on the same topic with conflicting statements. Regular source audits ensure that outdated or redundant content is removed or merged.
  • ⚠️ Missing guardrail configuration: Competition protection and the compliance check should be intentionally enabled or disabled depending on the company context. Guardrails are always active but should be actively reviewed.
  • ⚠️ PDFs with scanned text: Scanned documents are treated as images and cannot be read. Text-based formats are essential in resource management!
  • ⚠️ Knowledge Base is not being updated: Filled once and forgotten. Regular review and updates of resources are mandatory to maintain the quality of answers over time.

Data Protection & GDPR Compliance

The company retains full control over the data, as moinAI works exclusively with explicitly stored knowledge: there is no uncontrolled learning from external sources, and no sensitive information is passed on to third-party training datasets. All data is stored on servers in Germany and is SSL-encrypted. moinAI does not collect any personal data without the user’s prior consent. Furthermore, conversation data can be deleted manually at any time upon request. Role-based access in the moinAI Hub allows for targeted control of internal data access.

Conclusion 

To ensure the quality of the AI chatbot in use, the knowledge base must be structured as a solid foundation and continuously updated. A well-organised knowledge base improves automation rates and boosts customer satisfaction in the long term. What makes moinAI special: the AI works exclusively with the knowledge that you and your company provide. The processing and use of this data are transparent and GDPR-compliant. After all, both companies and users must be able to trust the AI for its deployment to be successful and for the full potential of automation to be realised.
