01 Mar 2026

Planet Grep

Sven Vermeulen: Embeddings in RAG

When I started looking into the architecture of Large Language Models (LLMs), I got confused when I encountered Retrieval Augmented Generation (RAG). Both LLMs themselves and RAG use embeddings (numerical vector representations of tokens), and because of this shared terminology I wrongly assumed that the embeddings in both are strongly related. The reality is much simpler: while both use embeddings, the two are unrelated to each other.

Note: I'm still dipping my toes into the world of LLMs (and other generative AI, like diffusion models for image generation), so my posts might be inaccurate. I welcome any feedback or comments on this.

Embeddings in a large language model

LLMs are trained to predict text given a certain input. The predicted text comes in so-called tokens, small text snippets. Each predicted token is added to the input text, and the LLM again predicts the next token, moving forward until it predicts a special token that indicates the end of the text sequence.

Simple view of LLM

Suppose the text at that point is the following:

Two roads diverged in a yellow wood,
and sorry I

You might know this as the start of "The Road Not Taken", a poem by Robert Frost. If the LLM was trained on this poem, it might be able to predict the next tokens. When I ran this as input through Qwen3-VL 8B, one of the more recent open-weights models released by the Qwen team at Alibaba Cloud, it was able to continue the poem for a while, but eventually strayed off course.

Two roads diverged in a yellow wood,
and sorry I couldn't travel both
and be one traveler, long I stood.
and looked down one as far as I could
to where it bent in the undergrowth;
Then took the other, as just as fair,
and having perhaps a better claim,
because it was grassy and wanted wear;
though as for that the passing there
had worn them really about the same,

And both that morning equally lay
in leaves no step had trodden black.
And both … The question is - which way does he take? It's not clear. He says "I took the other", 
but then says "the passing there had worn them really about the same". So why did he choose one 
over the other? Is it a matter of chance? Or is there something more symbolic going on?

Note: there is some randomness involved here; other iterations with the same model and input did result in the poem being quoted correctly, followed by an analysis of the poem.

While generating output, the model produces one piece of text at a time. Such a piece is called a token, and the LLM has a built-in tokenizer that converts text into tokens, and tokens back into text. For the Qwen3 models, the Qwen tokenizer is used. If I understand its vocabulary correctly, the text "couldn't travel" would be tokenized into:

[ "couldn", "'t", " ", "travel" ]
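To make the idea concrete, here is a toy sketch of a greedy longest-match tokenizer. The vocabulary below is made up for this example; real tokenizers such as Qwen's use learned subword vocabularies (byte-pair encoding) with vastly more entries and subtler merge rules:

```python
# Toy greedy longest-match tokenizer. The vocabulary is invented for this
# illustration; real BPE tokenizers work differently under the hood.
TOY_VOCAB = ["couldn", "'t", " ", "travel", "could", "n't"]

def toy_tokenize(text, vocab=TOY_VOCAB):
    tokens = []
    i = 0
    while i < len(text):
        # Pick the longest vocabulary entry that matches at position i.
        match = max((v for v in vocab if text.startswith(v, i)),
                    key=len, default=None)
        if match is None:
            raise ValueError(f"no token matches at position {i}")
        tokens.append(match)
        i += len(match)
    return tokens

print(toy_tokenize("couldn't travel"))  # → ['couldn', "'t", ' ', 'travel']
```

Note how "couldn" wins over "could" at the start because it is the longer match; real tokenizers make similar, but statistically learned, segmentation choices.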

Different LLMs can use different tokenization methods, but there is a lot of reuse here: different LLM models can share the same tokenizer.

These tokens are converted into embeddings, which form the foundational representation used in LLMs. An embedding is a numerical vector that represents a text token. LLMs work with these numerical vectors: LLMs (and AI systems in general) are software that performs heavy computational work, carrying out many matrix operations where each matrix is a massive set of numbers. The text itself, then, is represented as one huge matrix.

Embeddings are not just a simple index, but pretrained values. These values enable tokens to be mapped based on semantic similarity. If the training material often combines "corona" and "COVID", then these two will get embeddings that place the terms close to each other. The same holds if there is material combining "corona" and "beer". So the embedding that represents "corona" (assuming it is a single token) would carry the semantics of both corona as a viral disease (related to COVID-19) and corona as an alcoholic beverage.
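A minimal sketch of this "closeness": the vectors and their values below are entirely made up, but they show how one embedding can sit near two different senses at once, measured with cosine similarity:

```python
import math

# Made-up 4-dimensional embeddings for illustration; real models learn
# thousands of dimensions during pretraining.
EMB = {
    "corona": [0.9, 0.8, 0.1, 0.0],
    "covid":  [0.8, 0.1, 0.9, 0.0],
    "beer":   [0.7, 0.9, 0.0, 0.1],
}

def cosine(a, b):
    # Cosine similarity: 1.0 means identical direction, 0.0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# "corona" is close to both "covid" and "beer": a single vector can carry
# multiple senses at once.
print(round(cosine(EMB["corona"], EMB["covid"]), 3))
print(round(cosine(EMB["corona"], EMB["beer"]), 3))
```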

Unlike tokenizers, which can be reused across different LLM models, embeddings are unique to each model. Sure, within the same family (e.g. Qwen3) there can be reuse as well, but such reuse is much less common across different families.

The phrase "Two roads" would consist of three tokens ("Two", " ", "roads"), each of which is converted into a corresponding 4096-dimensional embedding vector during processing. The dimension is fixed for a particular LLM: Qwen3 8B, for instance, uses embeddings of 4096 numbers. So that start would be a matrix with dimensions 3x4096. The entire text is thus represented by a very large matrix, with one dimension being the embedding size (4096 in my case) and the other being the number of tokens used so far (both input and generated output).
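A sketch of that shape, with a stand-in embedding table: the vectors below are random numbers, not trained values; only the dimensions match the text:

```python
import random

DIM = 4096  # embedding size of Qwen3 8B, per the text

# Stand-in embedding table: one DIM-long vector of made-up numbers per token.
# A real model looks these rows up in a trained weight matrix instead.
def embed(token):
    rng = random.Random(token)  # deterministic per token, for illustration only
    return [rng.uniform(-1, 1) for _ in range(DIM)]

tokens = ["Two", " ", "roads"]
matrix = [embed(t) for t in tokens]  # one row per token

print(len(matrix), len(matrix[0]))  # 3 rows of 4096 numbers: a 3x4096 matrix
```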

These matrices are then used as input within the LLM, which starts doing magic with them (well, not really magic, it's rather maths: multiplying the matrix with other matrices stored inside the LLM, iterating over multiple blocks of matrix operations, etc.) to eventually output a (sequence of) embedding(s), which is appended to the input matrix so that the entire process can iterate over and over again.
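The append-and-repeat structure of that loop can be sketched like this. The "model" below is a stand-in lookup table, not a real transformer; the point is only the shape of the iteration:

```python
# Toy autoregressive loop: predict a token, append it, repeat until an
# end-of-sequence token appears. TOY_MODEL is a fake stand-in for an LLM.
EOS = "<eos>"
TOY_MODEL = {
    ("Two", "roads"): "diverged",
    ("roads", "diverged"): "in",
    ("diverged", "in"): EOS,
}

def generate(tokens, model=TOY_MODEL, max_new=10):
    tokens = list(tokens)
    for _ in range(max_new):
        # "Predict" the next token from the current context (last 2 tokens here).
        nxt = model.get(tuple(tokens[-2:]), EOS)
        if nxt == EOS:
            break
        tokens.append(nxt)  # append the prediction and iterate again
    return tokens

print(generate(["Two", "roads"]))  # → ['Two', 'roads', 'diverged', 'in']
```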

Embedding-based view of LLM

The maximum number of tokens that a model can handle is also predefined, although there are methods to extend this. For Qwen3 8B, this is 32768 natively, and 131072 with an extension method called YaRN. So, for the native implementation, the maximum text size would be represented as a matrix of dimensions 32768x4096.

Retrieval Augmented Generation

LLMs are trained on a certain set of data, so once training is finished, they do not have the ability to learn more. To make an LLM more useful, you want it to have access to recent insights. Nowadays, the hype is all about MCP (Model Context Protocol), where LLMs are trained to understand that they have tools at their disposal, and to know how to call these tools (well, in reality, they are trained to generate output that the software executing the LLM detects as a tool call; that software runs the tool and adds its outcome back to the text already generated, allowing the LLM to continue).

Before MCP the world was (and still is) using Retrieval Augmented Generation (RAG). The idea behind RAG is that, before the LLM responds to a user's query (prompt), it also receives new information from external data sources. With both the user query and the information from those sources, the LLM is able to generate more useful output.

When I looked at RAG, I noticed it uses embeddings as well, prior to the actual retrieval, so I wrongly thought those were the same embeddings, and that the outcome of the RAG would be an embedding matrix as well, which the LLM then receives and processes further...

Incorrect RAG view

I was misled by documentation on RAG that stated things like "the data to be referenced is converted into LLM embeddings", and by the fact that the technology used for RAG retrieval is vector databases specialized in embedding-based operations. Many online resources also presented RAG as a complete, singular solution with multiple components. So I jumped to the conclusion that these are the same embeddings. But then, that would mean the RAG solution would be tailored to the LLM being used, because other LLM models (like Llama3 or Mistral) use different embedding vocabularies.

Instead, what RAG does is take the same prompt, convert it into tokens and embeddings (using its own tokenizer and embedding vocabulary), and then use those to perform a search operation against the data that was added to the RAG database. This data (the recent insights or other documents you want your LLM to know about) is also tokenized and converted into embeddings, but it is not those embeddings that are brought back to the main LLM: it is the plain-text result (or other media types that your LLM understands, such as images).
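A minimal sketch of that retrieval step, with made-up two-dimensional embeddings and invented document texts: the store keeps both text and vectors, but only text is ever handed back:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical document store: each entry keeps BOTH the original text and
# its embedding. Embeddings and texts here are illustrative only.
DOCS = [
    ("Corona is a Mexican pale lager beer.", [0.9, 0.1]),
    ("COVID-19 is caused by a coronavirus.", [0.1, 0.9]),
]

def retrieve(query_embedding, docs=DOCS, top_k=1):
    ranked = sorted(docs, key=lambda d: cosine(query_embedding, d[1]),
                    reverse=True)
    # Return plain text, not embeddings: that is what goes back to the LLM.
    return [text for text, _ in ranked[:top_k]]

# A made-up query embedding that leans toward the beer sense of "corona".
print(retrieve([0.8, 0.2]))
```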

Why does RAG then use embeddings? Wouldn't a simple search engine be sufficient? Well, RAG's primary advantage is its ability to locate relevant information more effectively through embeddings. Thanks to the embedding representation, RAG can find information related to the user query without relying on keyword matches. You could effectively replace the RAG engine with a simple search, and many LLM-powered software applications do support this. For instance, Koboldcpp, which I use to run LLMs locally, supports a simple DuckDuckGo-based web search as well.

RAG view

The use of embeddings for search operations (again, completely independent of the LLM) allows for contextual understanding. When a user prompts "What are the ingredients for Corona", a simple keyword-based search might incorrectly return results about COVID-19, whereas in this case the query is about Corona beer.

These improved search operations are often called "semantic search", as they have a better understanding of the semantics and meanings of text (through the embeddings), resulting in more contextually relevant insights.

When is it "RAG" and when is it semantic search?

Retrieval Augmented Generation is the process of taking the user query, performing a semantic search against the knowledge base, and appending the best results (e.g. the top-3 hits in the knowledge base) to the user input text. This completed input text thus contains both the user query and the pieces of insight obtained from the semantic search. The LLM uses this additional information to generate better outcomes. This entire pipeline (retrieving context, augmenting the prompt, and then generating output) is what defines "RAG".
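The augmentation step of that pipeline can be sketched as follows; the wrapper text and the context snippet are illustrative, not a fixed format used by any particular software:

```python
# Minimal sketch of prompt augmentation: take the top hits from the semantic
# search (plain text by this point) and prepend them to the user query.
# The retrieval itself is assumed to have happened already.
def augment_prompt(user_query, retrieved_snippets):
    context = "\n".join(f"- {s}" for s in retrieved_snippets)
    return (
        "Use the following context to answer the question.\n"
        f"Context:\n{context}\n"
        f"Question: {user_query}"
    )

prompt = augment_prompt(
    "What are the ingredients for Corona?",
    # Illustrative snippet, as a curated knowledge base might store it.
    ["Corona is brewed from water, barley malt, corn, hops, and yeast."],
)
print(prompt)
```

This augmented prompt is ordinary text; only when the LLM tokenizes it does it get converted into the LLM's own embeddings.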

Technology-wise, I personally see RAG as being very similar to a regular search: replace the semantic search with a search engine (which underneath could also use semantic search anyway) and the outcome is the same. The main difference is intent: RAG is meant for finding accurate, tailored snippets of context information, whereas a search-engine-based retrieval brings back more generic snippets of data.

In the market, RAG also focuses on the management of the semantic search (and vector database), optimizing the data added to the knowledge base to be LLM-friendly (short pieces of accurate data, rather than fully-indexed complete pages, which could easily exceed the maximum context size an LLM can handle). It prioritizes efficient data management and insight lifecycle control.

For LLMs, it also provides a bit more nuance. A web search would be presented to the LLM as "The following information can be useful to answer the question", whereas RAG results would be presented as actual insights/context. LLMs might be trained to deal differently with that distinction.

Understanding that the semantic search is independent of the LLM of course makes much more sense. It allows companies or organizations to build up a knowledge base and maintain it independently of the LLMs. Multiple different LLMs can then use RAG to obtain the latest information from this knowledge base - or you can use the engine for semantic searches alone; you do not need LLMs to benefit from semantic search. Many popular web search engines use semantic search under the hood (i.e. when they index pages, they also generate embeddings from them and store those in their own vector databases to improve search results).

When new embedding algorithms emerge that you want to use, you must regenerate the embeddings for the entire knowledge base. But that will most likely happen much, much less frequently than switching to new LLM models (given the rapid evolution there).

Conclusion

RAG is a feature of the software that runs the LLM, allowing contextual information to be retrieved from a curated knowledge base. RAG's use of embeddings belongs to its semantic search; they are not the same embeddings as those used by the LLM. The contextual information is added to the user prompt as text, and only then 'converted' into the embeddings used by the LLM itself.

Feedback? Comments? Don't hesitate to get in touch on Mastodon.

Images are created in Inkscape, using icons from Streamline (GitHub), released under the CC BY 4.0 license, indexed at OpenSVG.

01 Mar 2026 4:07pm GMT

Frederic Descamps: A Friendly Reset: Understanding the MariaDB Foundation’s Role

Over the last few days, I've seen more than a few comments and questions about what the MariaDB Foundation is supposed to do - and what it should do when something in the wider MariaDB ecosystem changes. I am referring here to what I call the "Galera Case". Instead of replying to every thread with […]

01 Mar 2026 4:07pm GMT

Frank Goossens: Is programmeren echt zo moeilijk dan? (Is programming really that hard?)

Since the beginning of this year I have been trying to help out as a newbie coach at CoderDojo Genk. I have to be honest: I don't really know much about Scratch yet, and even less about micro:bit, but I assume that will come. Anyway, last weekend the mother of two participants told me that programming seems so difficult to her. In the chaos of the moment I barely responded to that…

Source

01 Mar 2026 4:07pm GMT

Planet Lisp

Paolo Amoroso: Rearranging the File Browser menu for Insphex

Insphex adds the Hexdump item to the File Browser menu to view the hex dump of the selected files. The initial implementation called the public API for adding commands at the top level of the menu.

To later move the item to the See submenu that groups various file viewing commands, I resorted to list surgery, as the API doesn't support submenus. The problem is that internal system details can and do change, which happened to the File Browser menu and led to an Insphex load error.

I fixed the issue by reverting the public API call and now the item is back at the top level of the menu.

Insphex is a hex dump tool similar to the Linux command hexdump. I wrote it in Common Lisp on Medley Interlisp.

#insphex #CommonLisp #Interlisp #Lisp

Discuss... Email | Reply @amoroso@oldbytes.space

01 Mar 2026 9:16am GMT

Planet Debian

Junichi Uekawa: The next Debconf happens in Japan.

The next Debconf happens in Japan. Great news. Feels like we came a long way, but I didn't personally do much, I just made the first moves.

01 Mar 2026 4:16am GMT

28 Feb 2026

Planet Debian

Mike Gabriel: Debian Lomiri Tablets 2025-2027 - Project Report (Q3/2025)

Debian Lomiri for Debian 13 (previous project)

In our previous project around Debian and Lomiri (lasting until July 2025), we managed to get Lomiri 0.5.0 (and with it another 130 packages) into Debian (with two minor exceptions [1]) just in time for the Debian 13 release in August 2025.

Debian Lomiri for Debian 14

At DebConf in Brest, a follow-up project was designed between the project sponsor and Fre(i)e Software GmbH [2]. The new project (on paper) started on 1st August 2025, and the project duration was agreed to be 2 years, allowing our company to work with the equivalent of ~5 FTE on Lomiri, targeting the Debian 14 release some time in the second half of 2027 (an assumed date, let's see what happens).

Ongoing work would be covered from day one of the new project, and once all contract details had been properly put on paper at the end of September, Fre(i)e Software GmbH started hiring a new team of software developers and (future) Debian maintainers. (More on that new team in our next Q4/2025 report.)

The ongoing work of Q3/2025 was basically Guido Berhörster and myself working on Morph Browser Qt6 (mostly Guido together with Bhushan from MiraLab [3]) and package maintenance in Debian (mostly me).

Morph Browser Qt6

The first milestone of the Qt6 porting of Morph Browser [4] and related components (LUITK aka lomiri-ui-toolkit (big chunk! [5]), lomiri-content-hub, lomiri-download-manager and a few other components) was reached on 21st Sep 2025 with an upload of Morph Browser 1.2.0~git20250813.1ca2aa7+dfsg-1~exp1 to Debian experimental and the Lomiri PPA [6].

Preparation of Debian 13 Updates (still pending)

In the background, various Lomiri updates for Debian 13 have been prepared during Q3/2025 (with a huge patchset), but publishing them to Debian 13 is still pending, as the tests are not yet satisfactory.

[1] lomiri-push-service and nuntium
[2] https://freiesoftware.gmbh
[3] https://miralab.one/
[4] https://gitlab.com/ubports/development/core/morph-browser/-/merge_reques... et al.
[5] https://gitlab.com/ubports/development/core/lomiri-ui-toolkit/-/merge_re... et al.
[6] https://launchpad.net/~lomiri

28 Feb 2026 6:36pm GMT

Planet Lisp

Neil Munro: Ningle Tutorial 15: Pagination, Part 2

Contents

Introduction

Welcome back! We will be revisiting the pagination from last time; however, we are going to try to make this easier on ourselves. I built a package for pagination, mito-pager; the idea is that much of what we looked at in the last lesson was very boilerplate and repetitive, so we should look at removing this.

I will say, mito-pager can do a little more than just what I show here; it has two modes. You can use paginate-dao (named this way so that it is familiar to mito users) to paginate over simple models; however, if you need to perform complex queries, there is a macro with-pager that you can use to paginate. It is this second form we will use in this tutorial.

There is one thing to bear in mind: when using mito-pager, you must implement your data retrieval functions in such a way that they return multiple values (via values), as mito-pager relies on this to work.

I encourage you to try the library out in other use-cases and, of course, if you have ideas, please let me know.

Changes

Most of our changes are quite limited in scope, really it's just our controllers and models that need most of the edits.

ningle-tutorial-project.asd

We need to add the mito-pager package to our project asd file.

- :ningle-auth)
+ :ningle-auth
+ :mito-pager)

src/controllers.lisp

Here is the real payoff! I had almost dreaded the sheer volume of the change, but then realised it's quite simple: we only need to change our index function, and it may be easier to delete it all and write our new simplified version.

(defun index (params)
  (let* ((user (gethash :user ningle:*session*))
         (req-page (or (parse-integer (or (ingle:get-param "page" params) "1") :junk-allowed t) 1))
         (req-limit (or (parse-integer (or (ingle:get-param "limit" params) "50") :junk-allowed t) 50)))
    (flet ((get-posts (limit offset) (ningle-tutorial-project/models:posts user :offset offset :limit limit)))
      (mito-pager:with-pager ((posts pager #'get-posts :page req-page :limit req-limit))
        (djula:render-template* "main/index.html" nil :title "Home" :user user :posts posts :pager pager)))))

This is much nicer, and in my opinion, the controller should be this simple.

src/main.lisp

We need to ensure we include the templates from mito-pager, this is a simple one line change.

 (defun start (&key (server :woo) (address "127.0.0.1") (port 8000))
    (djula:add-template-directory (asdf:system-relative-pathname :ningle-tutorial-project "src/templates/"))
+   (djula:add-template-directory (asdf:system-relative-pathname :mito-pager "src/templates/"))

src/models.lisp

As mentioned at the top of this tutorial, we have to implement our data retrieval functions in a certain way. While there are some changes here, we ultimately end up with less code.

We can start by removing the count parameter; we won't be needing it in this implementation, and since the count parameter is gone, the :around method can go too!

- (defgeneric posts (user &key offset limit count)
+ (defgeneric posts (user &key offset limit)
-
- (defmethod posts :around (user &key (offset 0) (limit 50) &allow-other-keys)
-   (let ((count (mito:count-dao 'post))
-         (offset (max 0 offset))
-         (limit (max 1 limit)))
-     (if (and (> count 0) (>= offset count))
-       (let* ((page-count (max 1 (ceiling count limit)))
-              (corrected-offset (* (1- page-count) limit)))
-         (posts user :offset corrected-offset :limit limit))
-       (call-next-method user :offset offset :limit limit :count count))))

There's two methods to look at, the first is when the type of user is user:

-
- (defmethod posts ((user user) &key offset limit count)
+ (defmethod posts ((user user) &key offset limit)
...
      (values
-         (mito:retrieve-by-sql sql :binds params)
-         count
-         offset)))
+         (mito:retrieve-by-sql sql :binds params)
+         (mito:count-dao 'post))))

The second is when the type of user is null:

-
- (defmethod posts ((user null) &key offset limit count)
+ (defmethod posts ((user null) &key offset limit)
...
    (values
-       (mito:retrieve-by-sql sql)
-       count
-       offset)))
+       (mito:retrieve-by-sql sql)
+       (mito:count-dao 'post))))

As you can see, all we are really doing is relying on mito to do the lion's share of the work, right down to the count.

src/templates/main/index.html

The change here is quite simple: all we need to do is change the path of the included partial so that it points to the partial provided by mito-pager.


- {% include "partials/pager.html" with url="/" title="Posts" %}
+ {% include "mito-pager/partials/pager.html" with url="/" title="Posts" %}

src/templates/partials/pagination.html

This one is easy, we can delete it! mito-pager provides its own template, and while you can override it (if you so wish), in this tutorial we do not need it anymore.

Conclusion

I hope you will agree that this time, using a prebuilt package takes a lot of the pain out of pagination. I don't like to dictate what developers should or shouldn't use; that's why last time you were given the same information I had. So if you wish to build your own library, you can, and if you want to focus on getting things done, you are more than welcome to use mine. And of course, if you find issues, please do let me know!

Learning Outcomes

Level Learning Outcome
Understand Understand how third-party pagination libraries like mito-pager abstract boilerplate pagination logic, and how with-pager expects a fetch function returning (values items count) to handle page clamping, offset calculation, and boundary correction automatically.
Apply Apply flet to define a local adapter function that bridges the project's posts generic function with mito-pager's expected (lambda (limit offset) ...) interface, and use with-pager to reduce controller complexity to its essential logic.
Analyse Analyse what responsibilities were transferred from the manual pagination implementation to mito-pager - count caching, boundary checking, offset calculation, page correction, and range generation - contrasting the complexity of both approaches.
Create Refactor a manual pagination implementation to use mito-pager by simplifying model methods to return (values items count), replacing complex multi-step controller calculations with with-pager, and delegating the pagination template partial to the library.

Github

Common Lisp HyperSpec

Symbol Type Why it appears in this lesson CLHS
defpackage Macro Define project packages like ningle-tutorial-project/models, /forms, /controllers. http://www.lispworks.com/documentation/HyperSpec/Body/m_defpac.htm
in-package Macro Enter each package before defining models, controllers, and functions. http://www.lispworks.com/documentation/HyperSpec/Body/m_in_pkg.htm
defgeneric Macro Define the simplified generic posts function signature with keyword parameters offset and limit (the count parameter is removed). http://www.lispworks.com/documentation/HyperSpec/Body/m_defgen.htm
defmethod Macro Implement the simplified posts methods for user and null types (the :around validation method is removed). http://www.lispworks.com/documentation/HyperSpec/Body/m_defmet.htm
flet Special Operator Define the local get-posts adapter function that wraps posts to match mito-pager's expected (lambda (limit offset) ...) interface. http://www.lispworks.com/documentation/HyperSpec/Body/s_flet_.htm
let* Special Operator Sequentially bind user, req-page, and req-limit in the controller where each value is used in subsequent bindings. http://www.lispworks.com/documentation/HyperSpec/Body/s_let_l.htm
or Macro Provide fallback values when parsing page and limit parameters, defaulting to 1 and 50 respectively. http://www.lispworks.com/documentation/HyperSpec/Body/m_or.htm
multiple-value-bind Macro Capture the SQL string and bind parameters returned by sxql:yield in the model methods. http://www.lispworks.com/documentation/HyperSpec/Body/m_multip.htm
values Function Return two values from posts methods - the list of results and the total count - as required by mito-pager:with-pager. http://www.lispworks.com/documentation/HyperSpec/Body/a_values.htm
parse-integer Function Convert string query parameters ("1", "50") to integers, with :junk-allowed t for safe parsing. http://www.lispworks.com/documentation/HyperSpec/Body/f_parse_.htm

28 Feb 2026 8:00am GMT

Planet Debian

Daniel Baumann: Debian Fast Forward: An alternative backports repository

The Debian project releases a new stable version of its Linux distribution approximately every two years. During its lifetime, a stable release usually gets security updates only, but in general no feature updates.

For some packages it is desirable to get feature updates earlier than with the next stable release. Some new packages included in Debian after the initial release of a stable distribution are desirable for stable too.

Both use-cases can be solved by recompiling the newer version of a package from testing/unstable on stable (aka backporting). Packages are backported together with only the minimal amount of required build-depends or depends not already fulfilled in stable (if any), and without any changes unless required to fix building on stable (if needed).

There are official Debian Backports available, as well as several well-known unofficial backports repositories. I have been involved in one of these unofficial repositories since 2005, which in 2010 subsequently turned into its own Debian derivative, mixing both backports and modified packages in one repository for simplicity.

Starting with the Debian 13 (trixie) release, the (otherwise unmodified) backports of this derivative have been split out from the derivative distribution into a separate repository. This way the backports are more accessible and useful for all interested Debian users too.

TL;DR: Debian Fast Forward - https://fastforward.debian.net

  • is an alternative Debian repository containing complementary backports from testing/unstable to stable

  • with packages organized in an opinionated, self-contained selection of coherent sets

  • supporting amd64, i386, and arm64 architectures

  • containing around 400 packages in trixie-fastforward-backports

  • with 1'800 uploads since July 2025

End user documentation about how to enable Debian Fast Forward is available.

Have fun!

28 Feb 2026 12:00am GMT

16 Feb 2026

Planet Lisp

Joe Marshall: binary-compose-left and binary-compose-right

If you have a unary function F, you can compose it with function G, H = F ∘ G, which means H(x) = F(G(x)). Instead of running x through F directly, you run it through G first and then run the output of G through F.

If F is a binary function, then you either compose it with a unary function G on the left input: H = F ∘left G, which means H(x, y) = F(G(x), y) or you compose it with a unary function G on the right input: H = F ∘right G, which means H(x, y) = F(x, G(y)).

(binary-compose-left f g)  = (λ (x y) (f (g x) y))
(binary-compose-right f g) = (λ (x y) (f x (g y)))

We could extend this to trinary functions and beyond, but it is less common to want to compose functions with more than two inputs.

binary-compose-right comes in handy when combined with fold-left. This identity holds

 (fold-left (binary-compose-right f g) acc lst) <=>
   (fold-left f acc (map g lst))

but the right-hand side is less efficient because it requires an extra pass through the list to map g over it before folding. The left-hand side is more efficient because it composes g with f on the fly as it folds, so it only requires one pass through the list.
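The post's definitions are in Lisp; as an illustrative translation, the same identity can be checked in Python, where fold-left corresponds to functools.reduce with an explicit initial accumulator (all names below are this translation's, not the post's):

```python
from functools import reduce

def binary_compose_left(f, g):
    # H(x, y) = F(G(x), y)
    return lambda x, y: f(g(x), y)

def binary_compose_right(f, g):
    # H(x, y) = F(x, G(y))
    return lambda x, y: f(x, g(y))

def fold_left(f, acc, lst):
    # fold-left with an explicit initial accumulator.
    return reduce(f, lst, acc)

f = lambda acc, item: acc + item   # the folding function
g = lambda item: item * item       # the per-item transformation

lst = [1, 2, 3, 4]
lhs = fold_left(binary_compose_right(f, g), 0, lst)  # single pass
rhs = fold_left(f, 0, list(map(g, lst)))             # extra mapping pass

print(lhs, rhs)  # both sum the squares: 30 30
```

Both sides compute the sum of squares, but the left-hand side transforms each item on the fly as it folds, avoiding the intermediate mapped list.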

16 Feb 2026 9:35pm GMT

29 Jan 2026

FOSDEM 2026

Join the FOSDEM Treasure Hunt!

Are you ready for another challenge? We're excited to host the second yearly edition of our treasure hunt at FOSDEM! Participants must solve five sequential challenges to uncover the final answer. Update: the treasure hunt has been successfully solved by multiple participants, and the main prizes have now been claimed. But the fun doesn't stop here. If you still manage to find the correct final answer and go to Infodesk K, you will receive a small consolation prize as a reward for your effort. If you're still looking for a challenge, the 2025 treasure hunt is still unsolved, so…

29 Jan 2026 11:00pm GMT

26 Jan 2026

FOSDEM 2026

Guided sightseeing tours

If your non-geek partner and/or kids are joining you at FOSDEM, they may be interested in spending some time exploring Brussels while you attend the conference. Like previous years, FOSDEM is organising sightseeing tours.

26 Jan 2026 11:00pm GMT

Call for volunteers

With FOSDEM just a few days away, it is time for us to enlist your help. Every year, an enthusiastic band of volunteers make FOSDEM happen and make it a fun and safe place for all our attendees. We could not do this without you. This year we again need as many hands as possible, especially for heralding during the conference, during the buildup (starting Friday at noon) and teardown (Sunday evening). No need to worry about missing lunch at the weekend, food will be provided. Would you like to be part of the team that makes FOSDEM tick?…

26 Jan 2026 11:00pm GMT