Securing RAG Endpoints with JWT Authentication in ASP.NET Core

Because I will be deploying my RAG application along with my website, I decided to secure my embedding and chat endpoints. Yes, it is selfish, but I am writing all of this for myself first :). To keep things simple and local, I chose JWT tokens for authentication. My approach uses in-memory token generation and validation, with no external dependencies or persistent storage required. This is a solid starting point, and you can always enhance it later as your needs grow. ...
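As a rough, language-agnostic illustration of that in-memory approach, here is a minimal Python sketch of HS256 token issue and validation using only the standard library. The secret, claim names, and TTL are hypothetical placeholders, and the post itself implements this with ASP.NET Core rather than hand-rolled signing:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"change-me"  # in-memory signing key (placeholder value)

def b64url(data):
    # JWT uses base64url without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def b64url_decode(s):
    # Restore the stripped padding before decoding.
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))

def issue_token(subject, ttl_seconds=3600):
    # header.payload.signature, all base64url-encoded.
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "exp": int(time.time()) + ttl_seconds}
    signing_input = (
        b64url(json.dumps(header).encode())
        + "."
        + b64url(json.dumps(payload).encode())
    )
    sig = hmac.new(SECRET, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + b64url(sig)

def validate_token(token):
    # Returns the payload on success, None on any failure.
    try:
        head, body, sig = token.split(".")
    except ValueError:
        return None
    expected = hmac.new(SECRET, (head + "." + body).encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig)):
        return None
    payload = json.loads(b64url_decode(body))
    if payload.get("exp", 0) < time.time():
        return None
    return payload
```

Swapping the signature on a tampered payload fails validation, which is the whole point of signing the token.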

May 16, 2025 · 3 min · Taner

Building a Local RAG System: My Journey and How You Can Too

My Story: Why I Built a Local RAG System

A few months ago, I found myself frustrated with the limitations and privacy concerns of cloud-based AI tools. I wanted to experiment with Retrieval-Augmented Generation (RAG) on my own terms—locally, with full control over my data and the ability to tinker under the hood. As a developer who loves open source, C#, and learning by doing, I decided to build my own local RAG system from scratch. ...

May 5, 2025 · 4 min · Taner

Choosing the Right Embedding Model for Your RAG Application

When building a Retrieval-Augmented Generation (RAG) application, selecting the right embedding model is crucial. After researching various models, I’ve summarized the key differences and use cases for two popular options: nomic-embed-text and all-minilm. Let’s dive in!

Key Differences Between Nomic-embed-text and All-MiniLM

1. Architecture
Nomic-embed-text: Optimized for handling large token context windows, making it suitable for both short and long text embeddings.
All-MiniLM: Based on the MiniLM architecture, designed for sentence-level embeddings using self-supervised contrastive learning.

2. Performance
Nomic-embed-text: Excels in semantic similarity tasks and produces high-quality embeddings for detailed documents.
All-MiniLM: Offers faster inference speeds and is lightweight, making it ideal for real-time applications.

3. Use Cases
Nomic-embed-text: Versatile and handles diverse text lengths, making it suitable for tasks like semantic search and clustering.
All-MiniLM: Best for sentence-level tasks, such as paraphrase detection and short text similarity.

Nomic-embed-text Use Cases

Since nomic-embed-text is optimized for long text inputs and broad context windows, it’s ideal for applications requiring deep contextual understanding: ...
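Whichever model you pick, retrieval ultimately compares the embedding vectors, most commonly by cosine similarity. A minimal, model-agnostic sketch in plain Python (the vectors here are toy values, not real model output):

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors:
    # 1.0 = same direction, 0.0 = orthogonal.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

The comparison step is identical for both models; only the vector dimensionality and quality of the embeddings differ.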

May 5, 2025 · 3 min · tc

Designing an Event-Driven RAG Application

I’ve been exploring ways to integrate Retrieval-Augmented Generation (RAG) into my website, and I wanted to design a small, event-driven app to learn more about it. My goal was to create something simple yet practical for my site. Below is a Mermaid diagram I created to model the flow of my application. It highlights the main components of an event-driven RAG system, covering content updates, user queries, and background indexing: ...
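The flow in that diagram can be approximated in a few lines. In this sketch a hypothetical in-process queue stands in for the event bus, a background indexer consumes content-update events, and user queries read from the index; all names are illustrative, not taken from the actual app:

```python
import queue
from dataclasses import dataclass

@dataclass
class ContentUpdated:
    # Event emitted when site content changes.
    doc_id: str
    text: str

events = queue.Queue()  # stand-in for the message bus
index = {}              # the background indexer's store (toy)

def publish(event):
    events.put(event)

def run_indexer_once():
    # Background indexing: drain pending content updates into the index.
    while not events.empty():
        event = events.get()
        if isinstance(event, ContentUpdated):
            index[event.doc_id] = event.text

def handle_query(question):
    # User query path: naive substring retrieval over indexed content.
    return [doc_id for doc_id, text in index.items() if question.lower() in text.lower()]
```

The decoupling is the point: content updates and query handling never talk to each other directly, only through the queue and the index.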

May 5, 2025 · 3 min · TC

Exploring Quantization Techniques for RAG Applications

Quantization has been on my mind lately as I explore ways to optimize my RAG (Retrieval-Augmented Generation) application. With so many options available, I wanted to break down the main techniques and share my thoughts on their strengths, trade-offs, and where they might fit best. Let’s dive in!

1. Scalar Quantization

What It Is: Scalar quantization simplifies things by treating each component of a vector independently. For example, a 32-bit floating-point value can be mapped down to an 8-bit integer using a defined range and step width. ...
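That float-to-int mapping can be sketched directly. In this illustration I assume an unsigned 8-bit code range (0..255) and clamp out-of-range values; those are my choices, not requirements of the technique:

```python
def scalar_quantize(vector, lo, hi):
    # Map each float in [lo, hi] onto an 8-bit code using a
    # fixed step width derived from the range.
    step = (hi - lo) / 255
    return [min(255, max(0, round((x - lo) / step))) for x in vector]

def scalar_dequantize(codes, lo, hi):
    # Invert the mapping; each value is recovered to within one step width.
    step = (hi - lo) / 255
    return [lo + c * step for c in codes]
```

A 768-dimensional float32 embedding stored this way shrinks from 3072 bytes to 768, at the cost of a bounded rounding error per component.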

May 5, 2025 · 4 min · tc

Fine-Tuning vs Retrieval-Augmented Generation (RAG)

When I first started exploring AI, I was eager to use my own data with large language models (LLMs). However, I faced a dilemma: should I fine-tune a model with my data or use Retrieval-Augmented Generation (RAG)? After diving into research, I discovered the strengths and challenges of each approach. Here’s what I learned:

Fine-Tuning

Fine-tuning involves retraining a pre-trained model on a specific dataset to adapt it to a particular domain or task. For example: ...

May 5, 2025 · 2 min · tc

Quantization in RAG Applications

I have been working on my RAG application, and when it came to using my data (small as it is), I started thinking about performance and size, so I decided to look into quantization. Quantization can be a useful technique in my RAG (Retrieval-Augmented Generation) workflow, especially when dealing with high-dimensional embeddings. It reduces the precision of embeddings, compressing them so that the memory footprint is lower and similarity searches can be faster, all while preserving most of the semantic information. Let’s break down the concept and how I might integrate it into the app: ...

May 5, 2025 · 3 min · tc

What is Retrieval-Augmented Generation (RAG)?

As part of a small AI project, I wanted to dive deeper into Retrieval-Augmented Generation (RAG) to understand its potential. Below is a summary of what I learned and why I chose to use it for my website.

What is RAG?

RAG stands for Retrieval-Augmented Generation. It’s a method used in AI to enhance the way large language models generate responses by incorporating external information. Here’s how it works in simple terms: ...
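Those simple terms translate into surprisingly little code: retrieve relevant documents, then augment the prompt with them before generation. Below is a toy Python sketch where word-overlap scoring stands in for a real embedding search and the generation call is left to whatever LLM you use; the documents and names are made up for illustration:

```python
DOCUMENTS = {
    "about": "This site is written by a C# developer experimenting with local RAG.",
    "rag": "RAG retrieves relevant documents and adds them to the model's prompt.",
}

def retrieve(question, k=1):
    # Toy retrieval: rank documents by words shared with the question.
    # A real system would rank by embedding similarity instead.
    q_words = set(question.lower().split())
    scored = sorted(
        DOCUMENTS.values(),
        key=lambda text: len(q_words & set(text.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(question):
    # Augmentation: prepend retrieved context to the user's question.
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

The augmented prompt is then sent to the language model, which grounds its answer in the retrieved context rather than only in its training data.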

May 5, 2025 · 2 min · TC

AI-102 Study Series Exercise 4: Retrieval Augmented Generation (RAG) Chat App

Overview

This exercise demonstrates how to build a Retrieval Augmented Generation (RAG) chat application that integrates custom data sources into prompts for a generative AI model. The app is developed using Azure AI Foundry and deployed with Azure OpenAI and Azure AI Search.

Steps & Configuration Details

1. Create an Azure AI Foundry Hub and Project

Open the Azure AI Foundry portal (https://ai.azure.com) and create a hub project.

Configuration Items:
Subscription: Your Azure subscription.
Resource Group: Select or create a resource group.
Hub Name: A valid name.
Location: Example values → East US 2, Sweden Central (quota limits may require a different region).

2. Deploy Models

Two models are required: ...

June 3, 2025 · 2 min · Taner

AI-102 Study Series Exercise 9: Retrieval Augmented Generation (RAG) with Azure OpenAI

Overview

This exercise demonstrates how to implement Retrieval Augmented Generation (RAG) using Azure OpenAI Service and Azure AI Search. The goal is to enhance AI-generated responses by grounding them in custom data sources.

Steps & Configuration Details

1. Provision Azure Resources

To complete this exercise, you need:
Azure OpenAI resource
Azure AI Search resource
Azure Storage Account resource

Configuration Items:

Azure OpenAI Resource:
Subscription: Select an approved Azure subscription.
Resource Group: Choose or create a resource group.
Region: Choose from: East US, East US 2, North Central US, South Central US, Sweden Central, West US, West US 3
Name: A unique name.
Pricing Tier: Standard S0

Azure AI Search Resource: ...

June 7, 2025 · 3 min · Taner