Semantic Folding – From Natural Language Processing to Language Intelligence
Semantic Folding Theory is an attempt to develop an alternative computational approach to the processing of language data. Nearly all current methods of natural language understanding use, in some form or other, statistical models to assess the meaning of text and rely on “brute force” over large quantities of sample data. In contrast, Semantic Folding uses a neuroscience-rooted mechanism of distributional semantics that addresses both the “Representational Problem” and the “Semantic Grounding Problem”, challenges well known to AI researchers since the 1980s.
Francisco De Sousa Webber, co-founder of Cortical.io, has developed the theory of Semantic Folding, which is presented in a recently published white paper. It builds on the Hierarchical Temporal Memory (HTM) theory by Jeff Hawkins and describes the encoding mechanism that converts semantic input data into a valid Sparse Distributed Representation (SDR) format.
Douglas R. Hofstadter’s Analogy as the Core of Cognition also inspired the Semantic Folding approach, which uses similarity as a foundation for intelligence. Hofstadter hypothesizes that the brain makes sense of the world by building, identifying and applying analogies. In order to be compared, all input data must be presented to the neocortex in a representation that is suited to the application of a distance measure. Semantic Folding applies this assumption to the computation of natural language: by converting words, sentences and whole texts into the Sparse Distributed Representation (SDR) format, their semantic meaning can be directly inferred from their relative distances in the applied semantic space.
After capturing the semantic universe of a reference set of documents by means of a fully unsupervised mechanism, the resulting semantic space is folded into every word-representation vector. These word vectors, called semantic fingerprints, are large, sparsely filled binary vectors. Every feature bit in such a vector corresponds directly to a specific semantic feature of the folded-in semantic space, and it is this correspondence that provides semantic grounding.
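As a minimal sketch of this idea (with an invented four-snippet corpus; real reference corpora and vector sizes are far larger, and this is not Cortical.io’s actual encoder), each context of the reference set can be treated as one bit position, so that every active bit of a word’s fingerprint is grounded in a concrete, observable context:

```python
# Toy reference corpus: each snippet becomes one bit position of the
# semantic space. Corpus and sizes are invented for illustration.
corpus = [
    "the bank approved the loan",
    "interest rates at the bank rose",
    "the river bank was muddy",
    "fish swim near the river bank",
]

def fingerprint(word, contexts):
    """A word's semantic fingerprint as a sparse binary vector,
    stored as the set of its active bit positions: bit i is set
    iff the word occurs in context i."""
    return {i for i, ctx in enumerate(contexts) if word in ctx.split()}

fp_bank = fingerprint("bank", corpus)    # active in all four contexts
fp_loan = fingerprint("loan", corpus)    # active only in context 0
fp_river = fingerprint("river", corpus)  # active in contexts 2 and 3
```

Because each bit position is literally one context of the reference set, every active bit can be inspected and explained, which is the grounding property the text describes.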
The main advantage of using the SDR format is that it allows any two data items to be directly compared. In fact, it turns out that by applying Boolean operators and a similarity function, even complex Natural Language Processing operations can be implemented in a very simple and efficient way: each operation is executed in a single step and takes a constant amount of time. Because of their small size, semantic fingerprints require only 1/10th of the memory usually required to perform complex NLP operations, which means that execution on modern superscalar CPUs can be orders of magnitude faster. Word-SDRs also offer an elegant way to feed natural language into HTM networks and to build on their predictive modeling capacity to develop truly intelligent applications for sentiment analysis, semantic search or conversational dialogue systems.
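When fingerprints are stored as sets of active bit positions, the Boolean operators and similarity function mentioned above reduce to plain set operations. A minimal sketch, with fingerprints invented for illustration:

```python
def overlap(a, b):
    """Similarity function: the number of shared active bits."""
    return len(a & b)

# Invented fingerprints (sets of active bit positions).
fp_dog = {3, 17, 42, 99, 256}
fp_cat = {3, 17, 55, 99, 300}
fp_car = {7, 120, 256, 512}

# Boolean operations implement NLP primitives in a single step each:
shared = fp_dog & fp_cat    # semantic context common to "dog" and "cat"
contrast = fp_dog - fp_car  # meaning of "dog" with "car"-related bits removed

# "dog" is semantically closer to "cat" than to "car":
assert overlap(fp_dog, fp_cat) > overlap(fp_dog, fp_car)
```

Each comparison is a single pass over a small sparse structure, which is the source of the constant-time, low-memory behaviour claimed above.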
Because of the unique attributes of its underlying technology, Semantic Folding solves a number of well-known NLP challenges:
- Vocabulary mismatch: text comparisons are inherently semantic, based on a topological representation over 16,000 semantic features.
- Language ambiguity: the meaning of text is implicitly disambiguated during the aggregation of its constituent word-fingerprints.
- Time to market: Semantic Folding is accessible through the Retina API, which offers atomic building blocks for a wide range of NLP solutions. The unsupervised training process enables easy adaptation to specific tasks and domains.
- Black box effects: with Semantic Fingerprints, every single feature has concrete observable semantics. This unique characteristic enables interactive “debugging” of semantic solutions.
- Solution scalability: use-case-specific semantic spaces enable scaling a solution across customers and domains with minimum effort. Because the representation of meaning in semantic fingerprints is stable across languages, texts in different languages can be compared directly, without translation.
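The disambiguation-by-aggregation behaviour described above can be sketched with the same set-based fingerprints: summing the word fingerprints of a text bit-wise and keeping only the best-supported bits suppresses the sense bits that the surrounding context does not reinforce. The bit labels and cut-off below are invented for illustration:

```python
from collections import Counter

def aggregate(fingerprints, keep=3):
    """Sum word fingerprints bit-wise and keep the most supported bits.

    Bits reinforced by several words of the text survive the sparsification;
    sense bits unsupported by the context drop out, which implicitly
    disambiguates the text.
    """
    counts = Counter(bit for fp in fingerprints for bit in fp)
    return {bit for bit, _ in counts.most_common(keep)}

# "bank" is ambiguous: its fingerprint carries finance and river bits.
bank = {1, 2, 9}      # bits 1, 2: finance; bit 9: river (invented labels)
loan = {1, 2, 5}
interest = {1, 5, 6}

text_fp = aggregate([bank, loan, interest], keep=3)
# The river bit (9) is not reinforced by "loan" or "interest" and drops out.
```

The aggregated fingerprint keeps only the finance-related bits, so the financial sense of “bank” emerges without any explicit word-sense disambiguation step.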