A land too poor to tax
Hello all.
This is a public notebook where I put my ideas, explanations, illustrations and uncategorised things I do in the subjects I find interesting. It is named after the fictional city where the story of Lanark, written by Alasdair Gray, is set. The name also means "a land too poor to tax", a slogan I find very appealing. It is not that I expect this to be a barren place, but rather a place that is already given away to the reader, from which the author does not expect any taxable income. My previous experience of blogging warns me against anxiously keeping and growing an audience, but I will still try to keep everything readable and reasonably engaging, out of respect for anybody popping by. My approach to this site will be to keep things I do not want to forget, as well as things people might conceivably find useful or interesting.
I talk about Glasgow, migration, community organising, food growing, climate catastrophe. On my sister site UNTHINK I will share my mathematical musings and bits and pieces of my ongoing projects.
Subspace lattices and representing text
The everyday concept of relevance to a query (never mind the technical one logicians use) is a complex one. An interesting way to arrive at a mathematically tractable definition of relevance is to state that when a document is relevant to a query, then [the document implies the query]. Strict logical rules have proven very restrictive for retrieval (think of AND, OR and NOT in early internet search engines), so probability has been introduced as a way of making this work. Uncertain implication is the concept coined for this approach to Information Retrieval: finding a way of quantifying the probability that a document implies a query.
Natural language, however, is vast, subtle and complex, which means that what is learned from a small (or even large) sample of what has been written can be difficult to generalise to the vast universe of what could be written. The most successful answer to that has been to create vector representations of words and documents, using techniques from linear algebra to distill the valuable information by reducing dimensionality in clever ways. This is called word embedding: representing word occurrences, events living in a high-dimensional space, in a lower-dimensional space. Current techniques are based on an assortment of similarity measures between vectors and the odd linear (or non-linear) transformation, but little more. There is more to vector spaces than that. Vector spaces can accommodate, as Quantum Theory shows us, logics. Just as Boolean logics can arise from sets, a different kind of logic arises from vector spaces.
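As a rough illustration of what a similarity measure between word vectors looks like in practice, here is a minimal cosine-similarity sketch; the three-dimensional vectors and the word choices are invented purely for the example.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: close to 1 means similar direction."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hand-made toy vectors, invented purely for illustration.
vectors = {
    "glasgow":   np.array([0.9, 0.1, 0.2]),
    "edinburgh": np.array([0.8, 0.2, 0.3]),
    "lattice":   np.array([0.1, 0.9, 0.7]),
}

print(cosine_similarity(vectors["glasgow"], vectors["edinburgh"]))  # high
print(cosine_similarity(vectors["glasgow"], vectors["lattice"]))    # much lower
```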
During my PhD studies I tried to develop a way of doing this slightly outside the classic set-theory-based probability-space framework that usually connects logics to probabilities, going for linear spaces instead of sets. This is the basic trick that allows probabilities into the slightly twisted logic of Quantum measurements. So I defined operations on text that resemble Quantum measurements in the right (mathematical) aspects: Selective Erasers (SEs).
When we are looking for something, we focus our attention on particular features, and when we detect them, the area around them becomes clearer, at the expense of the rest. The Selective Eraser SE(t,w) finds a central term t, keeps a window of w terms to the left and right of it, and erases everything else.
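Here is a minimal sketch of how SE(t, w) could act on a tokenised text, keeping erased positions as placeholders so that positions survive; the function name and the "_" placeholder are my own choices, not notation from the original work.

```python
def selective_eraser(tokens, t, w, blank="_"):
    """SE(t, w): keep a window of w tokens on each side of every occurrence
    of the central term t, and erase (blank out) everything else."""
    keep = set()
    for i, token in enumerate(tokens):
        if token == t:
            # Mark the window [i - w, i + w] around this occurrence as kept.
            for j in range(max(0, i - w), min(len(tokens), i + w + 1)):
                keep.add(j)
    return [tok if i in keep else blank for i, tok in enumerate(tokens)]

text = "the cat sat on the mat".split()
print(selective_eraser(text, "sat", 1))  # ['_', 'cat', 'sat', 'on', '_', '_']
```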
This operation has two important properties. First, it is idempotent: applying it twice, or any number of times, is equivalent to applying it once. Second, sometimes, for a particular piece of text, one SE implies another: it keeps everything the other keeps. This implication is not quite in the same realm as a document-implies-query implication, but if the success of language models is anything to go by, these kinds of relations can be taken from words to concepts effectively. A particular text, in fact, will have a particular structure of implications between its SEs. These define a partial order, so the SEs acting on a text form a partially ordered set, or poset. Grouping SEs into equivalence classes of those preserving the same portions of the text, we get an even more interesting structure: a lattice (a poset with a supremum, consisting of operations preserving all the text, and an infimum, consisting of operations erasing all the text). Erasers of width 0, preserving only a central term present in the text, will appear in the lattice as the atoms, the elements right above the infimum.
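Continuing the sketch above, both idempotence and the per-text implication between a wider and a narrower eraser can be checked directly (the kept_positions helper is again my own naming):

```python
def kept_positions(result, blank="_"):
    """Positions whose original token survived the eraser."""
    return {i for i, tok in enumerate(result) if tok != blank}

text = "the cat sat on the mat".split()

# Idempotence: applying SE('sat', 1) twice is the same as applying it once.
once = selective_eraser(text, "sat", 1)
twice = selective_eraser(once, "sat", 1)
assert once == twice

# Implication on this particular text: SE('sat', 2) keeps everything SE('sat', 1) keeps.
assert kept_positions(selective_eraser(text, "sat", 2)) >= kept_positions(once)
```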
All possible deletions of terms also generate a similar lattice, with many more nodes and edges. Here, as an illustration, are the lattices for all deletions and for SEs only, in a text with 4 terms called, for generality, "ABCD":
Lattices carry information about the sequence: with only the atoms and the shape of the lattice, it is possible to infer the sequence. In this case, for example, the distance between atoms in the graph reflects the distance between occurrences of terms in the text: terms that appear together will be directly connected to a common element above, while terms that do not appear together will be connected to a common element above through intermediate steps (as many as there are terms between them). When a term occurs several times in the text, the lattice will have even fewer order relations:
In a particular document, SEs centred on a word that is not in the document will be mapped to the infimum (the lowest element), while sufficiently wide erasers preserving the whole document will be mapped to the supremum (the highest element).
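A rough sketch, reusing the selective_eraser and kept_positions helpers above, of how the equivalence classes and their ordering could be enumerated for the four-term text; representing each class by the set of positions it keeps is my own shortcut.

```python
from itertools import product

text = "A B C D".split()
terms = set(text) | {"E"}        # 'E' is a term absent from the text
widths = range(len(text))

# Group erasers into equivalence classes by the set of positions they preserve.
classes = {}
for t, w in product(terms, widths):
    kept = frozenset(kept_positions(selective_eraser(text, t, w)))
    classes.setdefault(kept, []).append((t, w))

# Order the classes by inclusion of what they keep: the empty set (any eraser
# centred on 'E') is the infimum, the full text is the supremum, and the
# width-0 erasers on present terms give the atoms just above the infimum.
for kept, erasers in sorted(classes.items(), key=lambda kv: len(kv[0])):
    print(sorted(kept), erasers)
```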
We can also take a set of documents, and map SEs as they behave on the whole set, with a stringent criterion: only relations that hold for all the documents in the set are counted in the set's SE lattice. For example, let's take the following set of sequences:
the lattices for the whole set of documents will look as follows:
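Computationally, the stringent set-level criterion amounts to checking a candidate implication against every document; here is a minimal sketch, again reusing the helpers above, with invented example documents.

```python
def holds_on_all(docs, eraser_a, eraser_b):
    """True when, on every document, eraser_a keeps at least what eraser_b keeps.

    Each eraser is a (term, width) pair; docs is a list of token lists.
    """
    for tokens in docs:
        kept_a = kept_positions(selective_eraser(tokens, *eraser_a))
        kept_b = kept_positions(selective_eraser(tokens, *eraser_b))
        if not kept_a >= kept_b:
            return False
    return True

docs = ["A B C D".split(), "B A D C".split()]    # invented example documents
print(holds_on_all(docs, ("A", 2), ("A", 0)))    # True: wider window, same centre
print(holds_on_all(docs, ("A", 1), ("C", 0)))    # False: fails on the first document
```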
Now, how can all this be used for retrieval, or for other natural language applications? The answer is embedding. Embedding is the representation of words, or of any feature that would otherwise require a very high-dimensional space, in a much smaller space. A representation of a document as a word vector, for example, would take a space of hundreds of thousands of dimensions, since that is roughly the number of words in a given language. Many techniques distill the meaningful contents of this high-dimensional space into lower-dimensional spaces, often highlighting significant relations between words.
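One standard linear-algebra distillation of this kind is a truncated singular value decomposition of a term-document count matrix; the toy matrix and term labels below are invented purely for illustration.

```python
import numpy as np

# Rows are terms, columns are documents; entries are raw counts (toy data).
counts = np.array([
    [2, 0, 1],   # "glasgow"
    [1, 0, 1],   # "migration"
    [0, 3, 1],   # "lattice"
    [0, 2, 0],   # "subspace"
], dtype=float)

# Truncated SVD: keep only the k strongest directions as a low-dimensional embedding.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
term_embeddings = U[:, :k] * S[:k]   # each row is now a k-dimensional term vector
print(term_embeddings.round(2))
```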
Word embeddings usually map words to vectors in a vector space, but here I intend to introduce a new and potentially richer idea: map relations between words into relations between subspaces of a vector space.
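To make "relations between subspaces" concrete, here is a small numpy sketch that tests whether one spanned subspace is contained in another by comparing ranks; the example vectors are arbitrary.

```python
import numpy as np

def is_subspace_of(A, B, tol=1e-10):
    """True when span(columns of A) is contained in span(columns of B).

    Containment holds exactly when stacking A's columns next to B's
    does not increase the rank.
    """
    rank_b = np.linalg.matrix_rank(B, tol=tol)
    rank_ba = np.linalg.matrix_rank(np.hstack([B, A]), tol=tol)
    return rank_ba == rank_b

# Arbitrary example vectors in R^3, stored as columns.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])      # spans the xy-plane
A = np.array([[2.0],
              [3.0],
              [0.0]])           # spans a line inside that plane

print(is_subspace_of(A, B))     # True
print(is_subspace_of(B, A))     # False: the plane is not inside the line
```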
A subspace spanned by vectors
A set of n vectors {v_i} is said to span the vector space that contains all possible linear combinations of them.
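In the usual notation (taking real coefficients, which is the typical choice for these representations), that span is the set of all linear combinations:

```latex
\operatorname{span}\{v_1, \dots, v_n\}
  \;=\;
  \Bigl\{\, \sum_{i=1}^{n} a_i\, v_i \;:\; a_1, \dots, a_n \in \mathbb{R} \,\Bigr\}
```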