What is a Knowledge Base?
An AI knowledge base is the central hub for organising, storing and providing the resources an AI draws on. This content may include FAQs, guides or product documentation, and is usually categorised by topic. The AI accesses this content using machine learning and natural language processing (NLP).
The moinAI Knowledge Base: Knowledge and AI put to targeted use
The knowledge base is at the heart of our chatbot, because it’s important to realise that AI likes to draw on a great deal of knowledge and is capable of storing a lot of information, but AI that is kept under control is the best kind of AI!
– Johannes Hehr, moinAI
The Knowledge Base therefore serves as the central knowledge repository for the moinAI chatbot. As the heart of the moinAI solution, this is where resources for AI agents are added and managed. These resources are then used by AI agents to provide appropriate, case-specific responses in customer communications.

Why is the knowledge base so important?
The knowledge base is key to chatbot performance, because without a verified database, artificial intelligence can ‘hallucinate’ or provide outdated information. We explain what ‘hallucinations’ mean in the context of AI in a separate article. Incorrect outputs lower the automation rate and reduce customer satisfaction. Unlike general AI applications, which are often perceived as a ‘black box’, moinAI’s AI works only with explicitly stored knowledge. The specific advantages of a knowledge base as the backbone are as follows:
- Up-to-date and controllable: Knowledge can be updated or removed at any time, independently of the AI model itself.
- Higher automation rate: Accurate answers reduce the need to escalate queries to human agents, known as ‘human takeover’.
- Consistency: All users receive a company-approved answer, at all times and regardless of the number of incoming queries.
- Brand-consistent communication: The tone and language of the outputs reflect the corporate identity.
- Scalability: The knowledge base can quickly cover new topics or products through simple expansion; no new training is required.
- Measurability: Potential knowledge gaps become visible and can thus be closed in a targeted manner.
In short: A well-maintained knowledge base forms the most important foundation for a reliable AI chatbot in a business setting. It ensures quality and trustworthiness in automated customer communication!
What types of content are stored in the Knowledge Base?
In the Knowledge Base, content types such as PDFs, web pages and CSV files can be easily uploaded so that the AI agent can provide the correct information as output. The formats and their uses are summarised here:

PDFs are uploaded directly to the Knowledge Base and integrated without the need to transfer content manually. Text-based PDFs are preferable, as PDFs with a strong emphasis on layout — such as those containing columns or tables — can lead to faulty text extraction.
Websites can be integrated via their URL and automatically parsed. Existing FAQ pages or help centre content can thus be utilised directly without duplication of effort. Tip: When automatic updates are enabled, the Knowledge Base remains synchronised with the live site at all times. Unsuitable pages are those with extensive navigation, advertising or dynamically loaded JavaScript content. They often yield unusable scraping results. It is best to integrate simple, text-heavy pages and check the result.
Documents are formatted texts with paragraphs, headings, bullet points and tables. They are created and edited directly in the moinAI Hub and are ideal for content requiring detailed explanations, such as instructions and process descriptions. Please note: Where possible, create only one document per topic, clearly structured and without redundant content.
Question-answer pairs link a specific question to a structured answer. The AI recognises this direct relationship and thus provides particularly accurate results. It is crucial not to phrase questions too generally. The closer the stored question is to the users’ actual language, the better the match.
CSV files hold structured datasets such as product lists and price tables, and they make large amounts of structured information available to the AI. The AI can access them directly, without any further processing. However, clean data hygiene is a prerequisite here: missing header rows, inconsistent column names or mixed data formats within a column can lead to interpretation errors.
By the way: external articles can also be added to the knowledge base, though the connection is made via an API!
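The CSV hygiene points above (header rows, consistent columns, no mixed formats within a column) can be checked before upload. The following is a minimal sketch in Python using only the standard library; the function names `check_csv` and `_is_number` and the sample data are hypothetical and not part of moinAI, and the checks are an assumption about what "clean" means here rather than the rules the Hub actually applies.

```python
import csv
import io


def _is_number(value: str) -> bool:
    """Return True if the value parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def check_csv(text: str) -> list[str]:
    """Return a list of human-readable problems found in CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]

    problems = []
    header, data = rows[0], rows[1:]

    # Header checks: every column needs a non-empty, unique name.
    names = [h.strip() for h in header]
    if any(not n for n in names):
        problems.append("header row has an empty column name")
    if len({n.lower() for n in names}) != len(names):
        problems.append("header row has duplicate column names")

    # Shape check: every data row should have as many fields as the header.
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )

    # Format check: a column should not mix numeric and non-numeric values.
    for col, name in enumerate(names):
        values = [r[col].strip() for r in data if col < len(r) and r[col].strip()]
        numeric = [_is_number(v) for v in values]
        if any(numeric) and not all(numeric):
            problems.append(f"column '{name}': mixes numeric and non-numeric values")

    return problems


# Hypothetical sample data for illustration only.
clean = "product,price\nMug,9.90\nCap,14.50\n"
dirty = "product,price\nMug,9.90\nCap,n/a\n"
print(check_csv(clean))
print(check_csv(dirty))
```

An empty result means none of the three checks fired; it does not guarantee that the data is correct in substance.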

Step by step: populating the knowledge base
1. Review of the current situation
Before adding resources, it is worth taking a look at your own data: what questions do customers ask most frequently, and via which channels? Channels include live chat, email and telephone support. Existing support tickets and chat logs are the most valuable sources for identifying customer concerns. Your own FAQ page is also an important source of information.
The outcome of the analysis should be a list of priorities; ideally, this will identify the top 20 queries that account for the largest proportion of daily enquiry volume. These form the core of the knowledge base and ensure a high level of automation right from the start. In a second step, less frequent or more complex topics are then added.
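One way to arrive at such a priority list is to count how often each topic occurs in exported tickets and look at the cumulative share. The sketch below assumes the tickets have already been labelled with a topic (by hand or by another tool); the function name `top_queries` and the sample labels are hypothetical.

```python
from collections import Counter


def top_queries(topics: list[str], limit: int = 20) -> list[tuple[str, int, float]]:
    """Rank topics by frequency.

    Returns (topic, count, cumulative share of all enquiries) for the
    `limit` most frequent topics.
    """
    counts = Counter(topics)
    total = sum(counts.values())
    ranked = []
    running = 0
    for topic, count in counts.most_common(limit):
        running += count
        ranked.append((topic, count, running / total))
    return ranked


# Hypothetical topic labels taken from ten support tickets.
tickets = ["delivery"] * 5 + ["returns"] * 3 + ["invoice"] * 2
for topic, count, share in top_queries(tickets, limit=2):
    print(f"{topic}: {count} tickets, {share:.0%} cumulative")
```

The cumulative share shows how much of the daily volume the first batch of knowledge base entries would cover, which helps decide where to stop in the first round.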
2. Preparing content
When it comes to preparing content, there is a rule of thumb: AI is only as good as the content it is fed! This means that the information recorded in the knowledge base should be clearly written and structured. Quality takes precedence over quantity, and particular attention must be paid to the following:
- Be clear: Ambiguous phrasing confuses the AI. Short, direct sentences containing specific information yield the best results; lengthy product descriptions are problematic.
- Organise by topic: One document per topic is better than a single long document covering too many topics at once. A summary is automatically generated for each document resource, which the RAG agent then analyses. Based on this, it decides whether the full content should be used to generate a response. Documents that are vaguely worded or thematically overloaded are less likely to be correctly identified.
- Make targeted use of question-answer pairs: The most precise format for recurring standard questions is question-answer pairs. The closer the stored question is to the users’ actual language, the better the answer will fit. Phrases taken directly from chat logs are best suited here, not technical terms from the specialist field.
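Why wording taken from chat logs matches better can be illustrated with a simple word-overlap score. This is a deliberately crude sketch: how moinAI actually matches questions is not described in this article and is probably more sophisticated, and the helper names `tokens`, `overlap` and `best_match` as well as the sample questions are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(a: str, b: str) -> float:
    """Jaccard similarity of the word sets of two strings (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(user_query: str, stored_questions: list[str]) -> str:
    """Return the stored question that shares the most words with the query."""
    return max(stored_questions, key=lambda q: overlap(user_query, q))


# Hypothetical stored questions: one in specialist wording, one in customer wording.
stored = ["Shipment tracking status", "Where is my order?"]
print(best_match("where is my order", stored))
```

Under this simplified measure, the specialist phrasing shares no words with the customer's query, while the phrasing taken from everyday language matches completely.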
3. Quality assurance prior to publication
Before the AI chatbot goes live, the content stored in the system should be tested under real-world conditions. moinAI provides the AI Playground within the moinAI Hub for this purpose: here, real customer queries can be tested directly against the knowledge base, and the answers and the sources used can be reviewed in detail. Two settings in particular should be configured correctly before the go-live:
- Enable knowledge checking: This tightens the criteria for generating responses. The AI agent will only provide a response if the stored sources allow for a clear and verifiable answer. In critical cases, response generation is deliberately suppressed to prevent hallucinations.
- Configure guardrails: Some basic security mechanisms, such as content restrictions, topic restrictions and protection against prompt injections, are enabled by default. In addition, competition protection can be optionally activated. This blocks enquiries about competing products or redirects them to your own offering. For particularly sensitive matters, a compliance check is also available to perform a second explicit review of all enquiries.
4. Ongoing maintenance
To ensure that the Knowledge Base does not contain outdated content or poor-quality articles, it must be kept up to date. However, ongoing maintenance is not solely a manual task; it can be automated using a number of features:
- Use automatic updates: In the Hub, you can set a fixed update interval; the options are 7, 14 or 30 days. For particularly volatile content such as prices, opening hours or availability, there is also a real-time update option: when set to ‘active’, the system checks with every request whether the stored content is still within the defined validity period. If necessary, the page is scraped again immediately. Please note that shorter intervals temporarily increase response times, but the trade-off is worth it!
- MCP Server for automated content management: Using the moinAI MCP Server, content can be retrieved, created, edited or deleted via command. This integrates the knowledge base into an existing AI environment – without the need for manual intervention in the hub. This is particularly advantageous when content needs to be synchronised regularly from external systems.
- Systematically closing knowledge gaps: The Hub features a ‘Missing Knowledge’ section. All enquiries for which the chatbot was unable to provide a suitable response are listed here. This may be due to a lack of resources on a particular topic or because the existing sources were insufficient. Clicking on the speech bubble opens the full chat history for an enquiry, so each gap can be reviewed in context. These customer insights are the most direct and valuable source for the continuous development of the knowledge base.
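The validity check behind automatic and real-time updates comes down to comparing the age of the stored copy with the configured interval. The sketch below is an assumption about that logic, not moinAI's implementation; the function name `needs_refresh` and its parameters are hypothetical, and only the intervals 7, 14 and 30 days are taken from the text above.

```python
from datetime import datetime, timedelta

ALLOWED_INTERVALS = (7, 14, 30)  # update intervals in days, as listed above


def needs_refresh(last_scraped: datetime, interval_days: int, now: datetime) -> bool:
    """Return True if the stored copy is older than the configured interval."""
    if interval_days not in ALLOWED_INTERVALS:
        raise ValueError("interval must be 7, 14 or 30 days")
    return now - last_scraped > timedelta(days=interval_days)


# With real-time updates active, a check like this would run on every request.
now = datetime(2024, 1, 31)
print(needs_refresh(datetime(2024, 1, 1), 7, now))   # scraped 30 days ago
print(needs_refresh(datetime(2024, 1, 28), 7, now))  # scraped 3 days ago
```

If the check returns True, the page would be scraped again before the answer is generated, which is where the additional response time comes from.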
Technical criteria for optimal knowledge base performance
A clear data hierarchy and high-quality input data are essential for optimal knowledge base performance. As described earlier, content must be clearly categorised to avoid inconsistent results. Unstructured or outdated sources increase the error rate. In summary, AI needs the following to perform at its best:
- Consistency: No conflicting information across different documents
- Granularity: It is better to have several specific documents than one long, generic one
- Timeliness: Maintenance processes – i.e. a continuous monitoring infrastructure – are essential, as outdated content reduces the hit rate
- Language and tone: Content should reflect the language of the users.
- CSV structure: Header rows are mandatory and clean data hygiene is crucial.
- PDF quality: Text-based PDFs only (no scanned images without OCR).
With moinAI, changes can be made in real time without the need to retrain the model, ensuring consistently high-quality responses with minimal maintenance.
The RAG agent turns knowledge into answers
‘Retrieval’ means accessing information, ‘augmented’ means I’m expanding my knowledge, and ‘generate’ means I’m generating a response. This means that the AI at moinAI is able to say: I’ll look for additional information that the user has made available to me in the knowledge base, I’ll expand my knowledge with it, and then generate an answer based on the source and my knowledge. This makes the answer more precise, more reliable and much more informative for the user.
– Johannes Hehr, moinAI
RAG (retrieval-augmented generation) is a structured process that draws on the knowledge base and offers a range of configuration options to control the responses of AI agents or generative AI. When a user asks a question, the AI agent searches the stored sources specifically for relevant content. The selection of sources takes place in two stages: first, the summary of a resource is checked; then, only if the summary is found to be relevant, the full content is used to generate the response. No guesswork, just substantiated answers tailored to the query.
Content-related guidelines for the agent’s response generation can be defined via instructions. Tone and communication guidelines, on the other hand, are controlled separately via policies and personas.
The large language model is selected to suit the specific use case of each AI agent. The options include current models from the GPT series via Microsoft Azure as well as open-weight models.
AI actions for AI agents are particularly useful for resolving cases: they enable the agent to go beyond simply generating responses and ask follow-up questions automatically or trigger actions in third-party systems. Here is the RAG process at a glance:
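The two-stage source selection described in this section can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the `Resource` class, the function names and the keyword-based relevance check are hypothetical stand-ins (a real system would use a language model or semantic search for stage 1), and the sample resources are invented.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    title: str
    summary: str   # short summary, checked first
    content: str   # full text, used only if the summary is relevant


def _words(text: str) -> set[str]:
    """Lowercase word tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def is_relevant(question: str, summary: str) -> bool:
    """Stage 1 (placeholder): crude keyword check against the summary only."""
    keywords = {w for w in _words(question) if len(w) > 3}
    return bool(keywords & _words(summary))


def select_context(question: str, resources: list[Resource]) -> list[str]:
    """Stage 2: return full content only for resources whose summary passed."""
    return [r.content for r in resources if is_relevant(question, r.summary)]


def build_prompt(question: str, resources: list[Resource]) -> Optional[str]:
    """Assemble the prompt for the language model, or None if nothing fits."""
    context = select_context(question, resources)
    if not context:
        return None  # no relevant source: leave unanswered rather than guess
    joined = "\n---\n".join(context)
    return f"Answer using only these sources:\n{joined}\n\nQuestion: {question}"


# Invented sample resources for illustration only.
resources = [
    Resource("Returns", "how to return an item and refund rules",
             "Items can be returned within 30 days."),
    Resource("Shipping", "delivery times and shipping costs",
             "Standard delivery takes 3 days."),
]
print(select_context("What are the shipping costs?", resources))
print(build_prompt("Do you sell gift cards?", resources))
```

Checking the short summary first keeps unrelated documents out of the prompt, which is also why one clearly scoped document per topic is easier to select than a long, mixed one.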

Safety measures
The reliability of the extracted data is ensured by guardrails: firmly established safety mechanisms that reduce hallucinations and enforce topic restrictions. Guardrails actively block attempts at manipulation, such as the deliberate circumvention of safety rules or identity impersonation. They are always active and enhance security:
- No harmful or confidential content
- Topics are restricted to defined areas
- Hallucinations are reduced through the exclusive use of verified sources
- Spam or illegible requests are ignored
- Internal system information (e.g. prompts) cannot be accessed
- Identity impersonation (e.g. phishing) is prevented
- Offensive or abusive language is filtered
In addition, the knowledge check can be made more stringent: when enabled, the AI agent will only generate a response if the facts in the knowledge base are clear; in cases of uncertainty, it will leave the query unanswered. When competition protection is enabled, queries regarding rival products are either blocked or automatically converted into a query about the company’s own product. This ensures that the agent remains consistently focused on the company’s own brand.
Common mistakes – and how to avoid them
Common pitfalls with knowledge bases can be specifically avoided by taking the right steps. We’ll show you some typical mistakes and what you can do to ensure your knowledge base is set up correctly:
Data Protection & GDPR Compliance
The company retains full control over the data, as moinAI works exclusively with explicitly stored knowledge: there is no uncontrolled learning from external sources, and no sensitive information is passed on to third-party training datasets. All data is stored on servers in Germany and is SSL-encrypted. moinAI does not collect any personal data without the user’s prior consent. Furthermore, conversation data can be deleted manually upon request at any time. Role-based access in the moinAI Hub allows for targeted control of internal data access.
Conclusion
To ensure the quality of the AI chatbot in use, the knowledge base must be structured as a solid foundation and continuously updated. A well-organised knowledge base improves automation rates and boosts customer satisfaction in the long term. What makes moinAI special: the AI works exclusively with the knowledge that you and your company provide. The processing and use of this data are transparent and GDPR-compliant. After all, both companies and users must be able to trust the AI for its deployment to be successful and for the full potential of automation to be realised.