The Xapagy architecture has been developed from scratch, rather than as an evolution of an existing model from artificial intelligence or cognitive science (while naturally building on the experience of these systems). Starting from scratch was motivated partly by the fact that the targeted behavior only partially overlaps with that of current cognitive architectures. The main reason, however, is that Xapagy starts from five differentiating design decisions (DD-1..DD-5). In the following, we list these design decisions and their implications for the architecture of the system. Not all of the decisions are unique to the Xapagy system; all of them, however, are controversial in the context of current best practices in cognitive systems. Thus, we will also state the possible controversies each decision can raise, as well as the intuitive justification which led us to take it despite its controversial nature. Naturally, the ultimate validation of a design decision is the performance of the implemented system.

DD-1: The autobiographical narrative is the only source of knowledge

 

Implications: This design decision states that an agent’s knowledge is exclusively acquired through its autobiography, the sequence of events which the agent directly or indirectly experienced. A natural way to provide this knowledge is to have the agent experience a synthetic autobiography: a sequence of stories in which the agent participates as a witness, an actor, or a reader / listener. We allow only a minimal level of knowledge to be hardwired (such as basic rules of topology, space, temporal succession and grouping). Every concept and verb the agent knows, however, will be acquired through stories in which these abstractions appear. These stories include first person experience, but also indirect narratives, such as hearing or reading about specific events. There is no requirement for the synthetic autobiography to be a sequence of carefully crafted learning experiences of increasing difficulty. Just as a human child is exposed to a view of the adult world which he or she only partially understands at first, the synthetic autobiography has relatively few constraints on the sequence of stories it presents, provided that they are representative of the future world in which the agent will operate and that they provide sufficient coverage of the concepts the agent needs to learn. The agent is the anchor point of the synthetic autobiography: a significant number of stories must be experienced directly (as an actor or an observer), and even indirect narratives are presented through the perception of the specific agent.

 

Controversies: Depending on how we define the term “autobiography”, this design decision can range from trivial to highly controversial. If we accept training sets as autobiographies, any supervised or unsupervised learning system satisfies DD-1. Similarly, the ABox and TBox of an ontology can be defined as part of the autobiography, thus allowing any ontological reasoning system to satisfy DD-1.

The way we interpret this definition, however, is narrower. The autobiography of a human child is a series of stories in which the child is an observer of and participant in the ordinary business of life, through its natural events. It is true that a human education contains elements similar to supervised learning (e.g., learning the times table) or ontologies (e.g., learning the classification of animals in biology class). These, however, are statistically insignificant parts of the human experience, confined to very specific knowledge. Under this interpretation, no current cognitive system uses a synthetic autobiography as the learning method of the agent (a possible exception being the current direction of the Genesis system). Reinforcement learning systems, however, do satisfy DD-1, although they are normally positioned as a way to acquire optimal behavior, not as a way to acquire commonsense knowledge.

 

Intuitive justification: We believe that the commonsense knowledge of a human is acquired through his or her childhood experiences. Providing a similar experience to the agent, in the form of a synthetic autobiography, should in principle provide it with equivalent commonsense knowledge. DD-1 does not exclude the possibility that the same knowledge can be acquired in a different way, for instance by knowledge engineering the rules [Lenat-1990-Cyc], parsing Wikipedia [Gabrilovich-2009-Wikipedia], or building a corpus from selected high-confidence sources on the web [Ferrucci-2010-BuildingWatson].

DD-2: The autobiographical memory is the only memory model

 

Implications: This assumption states that the agent’s knowledge is stored in the form of a raw recording of the autobiographical information. The content of autobiographic memory is never extracted into general purpose rules: there is no learning, only a recording of the experiences. This also means that the system has no procedural or skill memory, no rule or production memory, and no separate concept memory (such as a concept hierarchy).

While DD-1 assumed that a synthetic autobiography is a possible way to provide the agent with its required knowledge, for a system which subscribes to DD-2 it is the only way. The lack of other memory mechanisms implies that we cannot take a shortcut by entering knowledge in the form of general rules. Commonsense knowledge expressed in the form of commonsense rules or definitions (e.g., the form in which it is represented in Cyc [Lenat-1990-Cyc]) is not directly usable by an agent subscribing to DD-2. We can, however, use such a knowledge base to generate representative stories – this is not dissimilar to the way in which human instructors illustrate general concepts with examples.

While a system subscribing to DD-2 will not have a procedural memory, humans exhibit all the signs of having one. Steve Wozniak said that he would consider a robot to be intelligent when it can “go into an average American house and figure out how to make coffee, including identifying the coffee machine, figuring out what the buttons do, finding the coffee in the cabinet, etc.” (as cited in [Goertzel-Web-PartialProgress]). Despite not having an explicit procedural memory, the agent must be able to (a) perform the procedure when confronted with the situation and (b) describe the procedure if required. We argue that these abilities do not require a procedural memory: both activities can be performed through on-demand reasoning. Similar considerations apply to rule and concept memory.

 

Controversies: To the best of our knowledge, Xapagy is the only cognitive system which adopts this design decision. In fact, the evolution of many cognitive architectures involves the gradual addition of new memory systems as the system is deployed over a wider range of applications. Autobiographical memory is typically one of the later additions to these systems (see, e.g., Nuxoll et al. [Nuxoll-2007-EpisodicMemory] for Soar, and Stracuzzi et al. [Stracuzzi-2009-IcarusReasoningOverTime] for ICARUS).

It is fitting to mention here that, when referring to the Xapagy system, we prefer the term “autobiographical memory” over “episodic memory”. The latter is strongly associated with the work of Tulving [Tulving-1972-EpisodicAndSemantic]. In Tulving’s view, however, episodic memory is a “recently evolved, late-developing […] past-oriented memory system” whose “operations require, but go beyond, the semantic memory system” [Tulving-2002-FromMindToBrain]. The autobiographical memory referred to by DD-2 is at odds with this definition: it is not the culmination of, but the foundation for, all other memory-like behaviors.

Another observation is that DD-1 does not imply DD-2: a system which subscribes to DD-1, assuming that the knowledge acquired is in the form of a narrative, can still process the acquired knowledge and store it into conceptual, procedural or factual memory.

Through personal communication, we found that many researchers agree in principle with DD-1, but not with DD-2. The controversy can be somewhat diminished if we position DD-2 as shifting the knowledge processing (learning) from the moment the knowledge is acquired to the real-time moment when it is used. In this light, DD-2 remains merely a counter-intuitive efficiency decision: it appears easier to put some effort into extracting a simple rule, which can then be used many times, than to reason every time from concrete examples.

 

Intuitive justification: The primary intuitive justification of DD-2 is that humans have difficulty introspecting on their procedural, concept and rule systems. The abilities to explain procedures, set up rules and define concepts are skills acquired through education, and they appear to be secondary, rather than primary, features of human cognition.

Let us now consider the efficiency of a system subscribing to DD-2. While it might appear that storing rules and procedures is more efficient than storing a complete autobiography, we find that the size of the human experience is not particularly large compared to the current size of computer storage. Let us take a four-year-old child as a reference point, and assume that she witnesses 1 event / second for 16 hours / day (in our opinion, a high estimate). The child has then experienced roughly 84 million events, many of them repetitive. This amount of data is computationally quite manageable, as the autobiographic memory, being an unprocessed recording of events, does not suffer from combinatorial explosion. One can, of course, argue that extracting a more compact set of rules from this data would further reduce the amount of necessary storage, but this has not been the experience of cognitive systems: knowledge bases of tens of thousands or even millions of rules or productions have frequently been used in cognitive architectures.
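
A back-of-the-envelope version of this estimate (the 1 event / second rate is the assumption stated above):

    1 event/s × 3,600 s/h × 16 h/day            =  57,600 events/day
    57,600 events/day × 365 days/year × 4 years ≈  8.4 × 10^7 events

Even at, say, a kilobyte of storage per recorded event, this corresponds to less than 100 GB.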

DD-3: No retrieval from long term into working memory

 

Implications: This design decision states that the content of the working memory (the focus, in Xapagy) can be moved into the long-term (autobiographical) memory, but not the other way around. The agent cannot reload a previous experience, nor parts of it. The long-term memory can, however, influence the behavior of the agent through the memory use model; this influence takes the form of neither an address-based nor a straightforward content-based recall. The memory use model of the Xapagy system maintains weighted collections of in-memory entities, associated either with entities in the focus (“shadows”) or with entities which were not witnessed but are inferred or expected (“headless shadows”). The maintenance of the shadows and headless shadows is done through the parallel operation of several complex, dynamic processes, influenced by similarity, relationships, temporal evolution and resource constraints.
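
To make the shape of this mechanism concrete, the following is a minimal sketch; the class and method names are our own illustration, not the actual Xapagy implementation, which is considerably more complex:

    # Minimal sketch of shadows and headless shadows (hypothetical names).
    class Shadow:
        """A weighted collection of in-memory entities, associated with one
        entity currently in the focus."""
        def __init__(self, focus_entity):
            self.focus_entity = focus_entity
            self.weights = {}  # in-memory entity -> participation weight

        def adjust(self, memory_entity, delta):
            # similarity, relationships and temporal evolution push weights
            # up; resource constraints (decay, competition) push them down
            w = self.weights.get(memory_entity, 0.0) + delta
            self.weights[memory_entity] = max(0.0, w)

    class HeadlessShadow(Shadow):
        """The same weighted structure, anchored to no witnessed entity:
        it stands for an inferred or expected one."""
        def __init__(self):
            super().__init__(focus_entity=None)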

Of course, humans exhibit the memory-like behavior we colloquially call “recalling a story”. DD-3 implies that this behavior is not memory retrieval but the creation of a new story. When trying to recall a specific story, the agent tries to mirror a dominant story line while discarding the influence of foreign story lines which compete with it in the shadows. The resulting memory-like behavior spans the continuum from perfect recall to free-form confabulation. In general, a perfect recall unaffected by any foreign story lines requires purity conditions which are practically unattainable for any agent with significant life experience.

 

Controversies: The basic assumption of any von Neumann computer is that its memory can store arbitrary content and that this content can be retrieved at will. Naturally, the computer system on which the cognitive system is implemented provides this functionality: why should we forfeit its use?

In general, cognitive architectures assume that the contents of the procedural or fact memory can be recalled, although the recall mechanism (such as the cue-based recall in Soar and systems inspired by it) can be quite complex. Some episodic memory implementations, such as the one described in [Laird-2008-ExtendingSoar, Nuxoll-2007-EpisodicMemory], retrieve a complete snapshot of the episode. We are not aware of any cognitive architecture which universally forfeits the use of memory retrieval, as required by DD-3.

 

Intuitive justification: It is well known that human memory exhibits biases such as suggestibility, false memory, cryptomnesia, the telescoping effect and many others. We believe that these biases do not indicate a perfect recall distorted by foreign factors, but rather an architecture in which a perfect match between the recalled story and a previous story is just an accident of particular circumstances.

The memory use model of Xapagy has been designed to closely mimic human behavior, without using exact recall as a baseline.

DD-4: Common serial processing of acting, witnessing, story following, recall and confabulation

 

Implications: This design decision states that the same system component handles the following situations:

  • the agent is acting in an environment.
  • the agent is witnessing through its sensors the real-time actions of other agents and environmental events.
  • the agent is following a story either through reading, listening to a narration, or watching a play or a movie.
  • the agent recalls a series of events forming a story.
  • the agent confabulates a new story.

Naturally, an embodied system will need other system components, for instance to physically enact actions or to process and interpret visual input. However, in a system subscribing to DD-4, all these situations will at some moment be represented through the general concept of the story, and processed by a common serial mechanism, which we will call the story bottleneck. A strict adherence to the literal meaning of DD-4 triggers several secondary implications.
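
As an illustration of what the story bottleneck entails for an implementation (the names below are our own, not the Xapagy API), every source of events funnels into the same serial step:

    # Illustrative sketch of the story bottleneck (hypothetical names).
    class StoryBottleneck:
        def __init__(self):
            self.focus = []   # currently active events (working memory)
            self.memory = []  # append-only autobiographical record (DD-2)

        def process(self, event, source):
            """One event at a time, regardless of whether its source is
            'acted', 'witnessed', 'followed', 'recalled' or 'confabulated'."""
            self.focus.append(event)
            self.memory.append(event)  # recorded without a source tag

    bottleneck = StoryBottleneck()
    bottleneck.process(["Hector", "hits", "Achilles"], source="witnessed")
    bottleneck.process(["Hector", "hits", "Achilles"], source="recalled")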

The first is the undifferentiated representation of direct and indirect experiences. The stories exiting the story bottleneck are recorded together in the autobiographical memory, with no fundamental distinguishing feature. Naturally, different modes of experience can create different levels of detail: an agent seeing an action will retain visual elements which are absent from a sparse narration. Reading a certain book in a certain place creates memories of the reading act itself, in addition to the flow of events in the book. Nevertheless, these distinctions are ones of degree and context, not fundamental ones. The undifferentiated representation opens the door to the common use of first-hand, indirect and book knowledge (as well as to potential confusion between them).

The second implication is the unremarkable self. The Xapagy agent maintains an internal representation of its cognition (the real-time self), in the form of an instance labeled “Me”. However, this instance is not fundamentally different from the other instances representing other entities in the stories and in the autobiographical memory. The “Me” instance in the focus can have shadows containing instances which did not represent the self in the autobiography. This allows perspective taking in recall; however, it also opens the door to confusions between first person and third person memories.

DD-3 together with DD-4 yields another implication: the fragmentation of the self. As the entity of the self cannot be retrieved from memory, only recreated, an agent remembering its own stories will simultaneously have several representations of itself, only one of them marked as its real-time self.

Finally, an implication of DD-3 together with DD-4 is that every recall of a story creates a new story. In a system which subscribes to DD-3, the recall of a story is the creation of a new story, while DD-4 states that this story will be processed and stored in the same way as the original. An agent does not have a single representation of Little Red Riding Hood, but as many as the number of times it has recalled or re-narrated the story (including partial recalls during quiet thinking, like the one triggered by this paragraph).
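
A sketch of this last implication, reusing the illustrative StoryBottleneck above (propose_next stands in for the shadow machinery and is equally hypothetical):

    # Recall never moves events out of the autobiographical memory (DD-3);
    # it creates new events, pushed through the same serial path (DD-4).
    def recall(bottleneck, propose_next):
        story = []
        while (candidate := propose_next()) is not None:
            event = list(candidate)  # a newly created event, not a pointer
            bottleneck.process(event, source="recalled")
            story.append(event)
        return story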

 

Controversies: In its weaker form, DD-4 is not controversial. The serial nature of cognition is a well-established theory, and most cognitive architectures make this assumption, sometimes as an explicit strategy, as in ACT-R [Anderson-1998-AtomicComponentsOfThought, Anderson-2004-IntegratedTheoryOfMind].

The secondary implications of the strong form, however, are controversial. The assumptions of the unremarkable self and the fragmentation of the self are contrary to our natural intuitions. Yet these assumptions are present in many psychological studies of memory biases, in particular those of false memories. While these might be seen as undesirable faults of human memory, we see them as simple observations of the normal functioning of the system.

The “every recall of a story creates a new story” implication might raise controversy as an unnecessary squandering of resources. While the additional memory requirement certainly exists, it needs to be seen in the context of DD-2: there is sufficient memory to represent the autobiography of the agent, and the internal life of the agent is part of its autobiography. In addition, DD-3 implies that recalls will often not be identical to the original story; thus storing the recalled stories is justified.

 

Intuitive justification: We chose to adopt DD-4 for reasons of both psychological modeling and software engineering. Our focus on narrative reasoning, and our adherence to the strong story hypothesis, naturally leads to the idea of processing the full life experience of the agent as a story. From a software implementation point of view, we found the unremarkable self and the fragmented self to be actually helpful, in that they allow the agent to act based on “book knowledge” and personal remembering alike.

DD-5: An internal language with no local semantics

 

Implications: This design decision states that reasoning in the story bottleneck proceeds through an internal representation which enumerates the events and actions of the story, together with the properties and relations of the entities. While this representation does not have a spoken or written form, it has similarities to the grammatical structure of human languages. For instance, we can identify constructs equivalent to subject-verb-object, subject-verb or subject-isa-attribute type sentences.

The fact that the language has no local semantics means that a given language fragment, for instance a sentence, has no semantics except through its relation to the complete autobiography of the agent. The language is not restricted by a semantic foundation (logical, syntactical or otherwise), and it allows the expression of illogical, impossible, self-contradictory and nonsensical statements.
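
For illustration (the triple notation is our own, not the internal format), nothing in the representation layer rejects such statements:

    # Contradictory or impossible statements are representable; whatever
    # meaning they have emerges only from their relation to the rest of
    # the autobiography, not from a local semantic check.
    story = [
        ["Achilles", "is-a", "alive"],
        ["Achilles", "is-a", "dead"],  # contradicts the previous statement
        ["water",    "is-a", "dry"],   # nonsensical, yet representable
    ]
    # No semantic foundation is consulted; the statements are simply recorded.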

An agent can operate indefinitely without translating this internal language into an external form. Whenever the agent processes natural language, however, the input is translated into the internal language, and the resulting representation is no different from that acquired through direct observation or internal recollection (DD-4).

This design decision opens the possibility of a bottom up approach for natural language processing. Instead of working downwards from a language (as in classical NLP) or staying at the language level (which, with some simplification, is the approach of the statistical NLP systems), we can build a first language directly on top of the internal language and work bottom-up toward natural languages.

The internal language of the Xapagy system is the data structure defined by the succession of “verb instances”. On top of this, Xapagy builds levels of increasingly complex “pidgin” languages. Xapi L0 is a direct mapping of verb instances into a textual form. Xapi L1 (the “explicit” Xapi level) applies a first set of grammatical improvements, making it usable (although clumsy) for communication with humans. Xapi L2 adds a series of macro expansions for the simplified setup of new scenes and for simplified quote references. This adds no new functionality, but it prepares the way for a more efficient translation from natural language.
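
As a rough illustration of the L0 idea (the concrete syntax shown is only indicative; the real Xapi syntax differs in detail):

    # A direct, mechanical mapping from a verb instance to text, in the
    # spirit of Xapi L0.
    def to_l0(verb_instance):
        # verb_instance is e.g. ["Hector", "hits", "Achilles"] (S-V-O)
        # or ["Hector", "runs"] (S-V)
        return " / ".join(verb_instance) + "."

    print(to_l0(["Hector", "hits", "Achilles"]))  # Hector / hits / Achilles.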

 

Controversies: This design decision positions the Xapagy system on a certain side of a long-running dispute regarding the language of thought. The idea is not new, and it has been forcefully expressed in the context of cognitive systems. While not all systems have an internal language, having an internal language as the internal representation of a cognitive system is not unusual.

The choice of a language with no local semantics, however, is controversial. Most computer languages have a semantic foundation. Languages which describe series of actions, such as plans, have a distinguished history going back to the STRIPS language [Fikes-1972-STRIPS], a prime example of a language with well-defined semantics: statements have pre-conditions and post-conditions, and well-specified operators. A strong research area along the line of languages with strong semantics is that of action languages [Gelfond-1998-ActionLanguages]. Languages with strong semantics have proved their utility in many applications.

Second, the bottom-up approach to natural language processing is controversial. We forfeit the use of the corpora of existing languages, and the statistical information that can be gleaned from them. Instead of inferring rules from existing languages, the approach requires us to posit the structure of the internal language from first principles and build upwards. Finally, until the development reaches the level of a natural language, it cannot take advantage of the database of stories available in natural languages.

 

Intuitive justification: In the context of the Xapagy system, this decision is a natural consequence of the previous ones. DD-1 and DD-2 together imply a fully expressive representational format for the autobiographical memory. DD-4 asserts the uniqueness of this representational format. The internal language is simply the format in which stories are represented in the focus and in the autobiographic memory, and the one over which the shadowing mechanism operates.

We chose a language with no local semantics in order to be able to represent the internal flow of consciousness of an agent which sees but does not necessarily understand. We believe that illogical statements and physical, geometrical and topological impossibilities are a frequent occurrence in the brain, and they must be modeled. This does not mean that humans cannot act or reason logically, only that this is an acquired, second-order feature of human behavior.