Understanding Language


The major goal of the Active Structure semantic architecture is to learn—and we mean really learn—human language. Current Deep Learning and LLM methods approximate language understanding and usage through complex statistical models of word propinquity, but there is no true language understanding behind them (and statistical models and LLMs are useless in dynamic environments – tariffs, anyone?). This lack of true understanding limits their application and can lead to costly mistakes that can never be fully eliminated, because their cause lies at the very core of these methods. Our own approach is grounded not in statistics, but in a real understanding of the meaning of words (see footnote) and the way that they relate to one another. But what goes into learning a language? How do humans go about it, and can we implement the same method in a Semantic AI Machine (SAIM)?

We absorb language like a sponge when we are young. Our mother tongue is not explicitly learnt so much as unconsciously extracted. Learning a language when we’re older is difficult—our brains are no longer tuned to pick it up as easily as a child’s, and we may need to be explicitly taught the rules. However, it bears mentioning that an adult learning a new language can at least depend on the presence of familiar grammatical elements and structures. The specific words and the syntactic rules that govern their combination may differ, but the way that they carve up the external world is familiar. Because of our common “hardware”, all humans share similarities in the way we perceive and organize the world. Our language is grounded upon our shared relationship to the world, and its structure and content take for granted a common understanding of it.

You probably see where I’m going with this by now: a machine entirely lacks the shared perspective that humans use as scaffolding for interpreting one another’s language. Teaching a SAIM a human language is fundamentally different from teaching a human a new language, and a different approach will be necessary because of this. Though perhaps a bit unconventional, an alternative starting point for this problem is to look at the methods humans use to try to understand the communications of other species. When I’m not helping develop the SAIM approach, I’m a researcher specializing in animal communication. I go out into the field following monkeys around with recording equipment to capture any vocalizations they may produce, and then back at the lab we try to tease apart what, if anything, these animals may be talking about. We cannot assume anything during our investigations: we lack a shared perspective with the animals we study, just as a SAIM lacks the human perspective.

One of the core problems to solve in my work is the problem of reference. As an example, consider a thought experiment made famous by the philosopher W. V. O. Quine: an anthropologist tries to learn the language of a newly encountered tribe of people purely by observing the way that they use language. While out walking with a member of this tribe, a rabbit hops by and the individual points at it while exclaiming “gavagai”. From this episode, the anthropologist initially concludes that gavagai means rabbit, but upon further reflection, he begins to doubt this conclusion. Gavagai could mean animal (a higher-order classification), it could mean furry (a quality), it could mean food (its function), it could refer to the way that it is moving (an action), or it may not refer to the rabbit at all (“Look at that!”). To disentangle the specific meaning of the word, we need to hear it uttered in many different contexts, noting everything that is occurring at the time and testing different hypotheses about its meaning until a likely one can be determined.
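To make that winnowing concrete, here is a minimal sketch of cross-situational hypothesis elimination in Python. Everything in it (the function, the candidate features, the toy observations) is invented for illustration rather than drawn from any real system: each time the word is heard, the surviving hypotheses are intersected with the features present in that situation.

```python
# Minimal sketch of cross-situational hypothesis elimination.
# The candidate features and observations below are invented.

def infer_meaning(observations):
    """Each observation pairs an uttered word with the set of candidate
    referents/properties present in that situation; the surviving
    hypotheses are the intersection across all situations."""
    candidates = None
    for _word, context in observations:
        candidates = set(context) if candidates is None else candidates & set(context)
    return candidates

# "gavagai" heard in three different situations:
observations = [
    ("gavagai", {"rabbit", "animal", "furry", "food", "hopping"}),
    ("gavagai", {"rabbit", "animal", "furry", "sitting"}),   # not moving, no food
    ("gavagai", {"rabbit", "animal", "running"}),            # not furry
]
print(infer_meaning(observations))  # {'rabbit', 'animal'} (order may vary)
```

Note that even after three situations, rabbit and animal survive together: because every rabbit is also an animal, no amount of co-occurrence alone can separate the two hypotheses, which is exactly the anthropologist’s predicament.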

The anthropologist can at least rely on the fact that humans from a totally different language culture still carve up the world in a way that is largely understandable to other humans. There will be familiar word categories like nouns, verbs, and adjectives, and the things those words refer to will probably be recognizable as relevant features to a foreigner as well. None of this can be taken as a given when we are trying to understand the meaning of animal vocalizations. We have very little idea of how animals carve up the world, so we cannot say ahead of time what they may or may not consider worth communicating about. We cannot rely on a shared way of perceiving the world to guide our interpretations.

Like the anthropologist in the story, when we try to disentangle the meaning of animal vocalizations, we form hypotheses about what certain sounds may mean and test those hypotheses against the contexts in which they occur. When doing a field study on animal communication, we track every contextual variable we can conceive of as relevant—everything from recent social interactions to environmental features and even the lighting conditions at the time. We then use statistical models to extract patterns and test hypotheses linking animal vocalizations to the environmental features that they may refer to.
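As a toy stand-in for those models, the sketch below scores how strongly a call type is associated with a context feature using pointwise mutual information. The call names and field records are invented, and real analyses are far more elaborate, but the underlying question is the same: does this call co-occur with this feature more often than chance predicts?

```python
import math

# Hypothetical field records: (call_type, features present at the time).
records = [
    ("chirp",   {"eagle_overhead", "open_ground"}),
    ("chirp",   {"eagle_overhead", "in_tree"}),
    ("bark",    {"leopard_nearby", "open_ground"}),
    ("chutter", {"snake_seen", "open_ground"}),
    ("chirp",   {"eagle_overhead", "dusk"}),
]

def pmi(call, feature, records):
    """Pointwise mutual information between a call type and a context
    feature: positive if they co-occur more often than chance."""
    n = len(records)
    p_call = sum(c == call for c, _ in records) / n
    p_feat = sum(feature in f for _, f in records) / n
    p_joint = sum(c == call and feature in f for c, f in records) / n
    if p_joint == 0:
        return float("-inf")   # never co-occur
    return math.log2(p_joint / (p_call * p_feat))

print(pmi("chirp", "eagle_overhead", records))  # ~0.74: strong association
print(pmi("chirp", "open_ground", records))     # ~-0.85: no association
```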

In some cases, the answer has come easily—at least at first. Some of the early evidence for referential communication in animals came from vervet monkeys, a primate species native to Africa. Early studies of the species’ communication (notably by Seyfarth, Cheney, and Marler in 1980) revealed that vervets produce distinct calls referring to specific predators. The call they use when they spot an eagle is different from the call they use when encountering a snake or a leopard. Not only that, but the other monkeys in the group respond in a distinct and situationally appropriate way depending on which call they heard, showing that they understand the referent of the call. Though animal alarm calls had been known to humans long before this, it was assumed that they had a more general referent (“something threatening is present”) or were produced in response to an internal state (“I am frightened”). What the study of these alarm calls accomplished was to demonstrate that animals, too, may deploy specific vocalizations for the purpose of informing others of things in their environment. In short, it demonstrated communicative intent and referential signalling, not unlike the way humans converse with one another using words that reference specific features of their environment.

From that starting point, research on animal communication has exploded, covering species from ravens to dolphins, with evidence of referential communication appearing in many unlikely places. Prairie dogs have a predator alarm call system so detailed that they can communicate the color of the shirt a human is wearing. Chimpanzees communicate with each other about the quality of newly discovered food. Elephants even address each other with specific calls—in other words, they may have names for one another. The task of disentangling animal communication is, however, far from over. The best evidence for referential communication still comes from simple alarm systems, and in most cases we’re far from determining what—if anything—animals may be talking about.

At this point, we can confidently say that at least some animals have the capacity to communicate about things relevant to their day-to-day lives, like predators and food. But how flexible is this system? Suppose the environment changes very rapidly and the need to communicate something entirely new arises. What allows human language to adapt to express new ideas is the fact that we combine small chunks of meaning into larger ones: morphemes are combined to form new words with new meanings, and words are combined into sentences to form complex meanings. This ability to combine small chunks of meaning into an unbounded number of unique combinations is what grants human communication its flexibility.
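A tiny sketch makes the point. The morphemes and the meaning representation below are invented for illustration; what they show is how a small inventory of units yields novel meanings by composition rather than by rote memorisation.

```python
# Toy compositional semantics: unit meanings are values or functions,
# and combining units is function application. The representation is
# invented for illustration.

happy = {"state": "happy"}          # stem meaning

def un(meaning):                    # prefix "un-": negation
    return {"not": meaning}

def ness(meaning):                  # suffix "-ness": nominaliser
    return {"quality": meaning}

# "unhappiness" is composed, never memorised as a whole:
print(ness(un(happy)))  # {'quality': {'not': {'state': 'happy'}}}
```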

Though still a fresh avenue of study, early research has demonstrated that combinatory vocalization use is widely present across the animal kingdom. The presence of combinatory usage on its own does not, however, prove that these combined calls have semantic content beyond the basic units composing them. So far, most evidence for semantic content has come from primates, with a variety of monkey species combining vocalizations in a fashion that resembles affixation to convey specific meanings in alarm call contexts. One bird species, the Japanese great tit, has also been shown to encode semantic content using syntactic rules, such that the call sequence ABC-D means something different from D-ABC.
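The great tit finding can be caricatured in a few lines. The glosses and the composition rule are invented stand-ins for the published result; the point is simply that interpretation is a function of sequence order, not of the units alone.

```python
# Order-sensitive interpretation: the same units in a different order
# do not yield the combined meaning. Glosses are invented.

GLOSS = {"ABC": "scan for danger", "D": "approach the caller"}

def interpret(sequence):
    if sequence == ("ABC", "D"):           # attested order composes
        return [GLOSS["ABC"], GLOSS["D"]]
    return ["no combined response"]        # e.g. the artificial D-ABC

print(interpret(("ABC", "D")))  # ['scan for danger', 'approach the caller']
print(interpret(("D", "ABC")))  # ['no combined response']
```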

Perhaps unsurprisingly, the best evidence for semantic content in vocal combinations comes from our closest relatives, the chimpanzees. They have been observed to produce a wide variety of vocal combinations specific to particular contexts, suggesting that these combinations convey semantic content relating to those contexts. Not only that, but the order in which they combine their calls changes the meaning being conveyed, demonstrating the use of syntactic rules. The rules governing appropriate call order have been shown to vary between populations, suggesting that there exist different chimpanzee “dialects”. This final point is especially pertinent to our discussion of the capacity of animal communication to convey new information: it demonstrates that the sequences chimpanzees produce are learnt, not purely instinctual, meaning that the way they combine vocalizations is flexible and can potentially accommodate new meanings. Further support comes from chimpanzees that have been taught to communicate with humans: Washoe the chimpanzee famously used American Sign Language (ASL) to sign “water bird” upon first encountering a swan, demonstrating the ability to flexibly combine previously learnt words to refer to new things.

So what’s all this got to do with teaching language to a SAIM? There are a few lessons to be gleaned from our discussion. For one, the capacity to make flexible use of language by combining previously learnt words in new ways is critical. The world, and language itself, are constantly (and sometimes rapidly) changing, and any intelligence seeking to communicate about a changing world needs a sufficient grasp of language to combine its components flexibly.

The most important takeaway here is that though the referent of a word and its intended significance may be readily evident to us, the SAIM has none of the background understanding that we rely on in our interpretation. Like the researcher trying to decode monkey communication, the SAIM cannot assume anything at the outset. On its own, it would need to weigh all possibilities when trying to make sense of human language, observing a word in many different contexts before it could settle on its meaning. We can aid this process by making every aspect of our language as explicit as possible.

So much of our interpretive gear is housed within the unconscious mind. The SAIM does not share our perspective and can depend on none of the scaffolding we rely on to interpret language. To ease the journey of the language-learning Semantic AI Machine as much as possible, the unconscious needs to be made conscious and the assumed needs to be made explicit. This step is also structurally necessary for the SAIM: unlike the human mind, it will have no comparable unconscious layer. The capacity and processing power of the SAIM can potentially be increased far beyond human limits, which removes the need for conscious-unconscious segmentation. There will certainly be layering, and processes running in the background that rarely surface, but there will need to be the capacity for elaboration or repair at every point (and archiving of what has been changed). This necessitates transparency and surface accessibility, which will be another key difference between the architecture of the SAIM and the human mind.
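What “repair at every point, with archiving of what has been changed” might look like is easiest to show in miniature. The class below is a purely hypothetical illustration, not the actual SAIM architecture: every revision to a node’s meaning is kept, so earlier states stay inspectable and any step can be revisited.

```python
# Hypothetical sketch of a transparent, repairable structure: every
# change is archived rather than overwritten. Not the SAIM itself.

class AuditedNode:
    def __init__(self, name, meaning):
        self.name = name
        self.history = [meaning]   # archive of every meaning, oldest first

    @property
    def meaning(self):
        return self.history[-1]    # current meaning is the latest revision

    def repair(self, new_meaning):
        """Revise the meaning while archiving the old one."""
        self.history.append(new_meaning)

node = AuditedNode("gavagai", {"animal"})
node.repair({"rabbit"})             # refined after more contexts
print(node.meaning, node.history)   # {'rabbit'} [{'animal'}, {'rabbit'}]
```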

The analogy between a machine learning human language and a human decoding animal communication underscores a fundamental asymmetry between researcher and subject. When humans study animal communication, they do so from a position of cognitive advantage, equipped with tools for abstraction, hypothesis testing, and contextual reasoning. For a machine to truly grasp human language in a comparable way, it may need to operate from a similarly elevated position. This means not merely imitating human language use, but understanding it deeply enough to analyze, generalize, and even repair it. In other words, while the SAIM begins as an outsider to the human perspective, its architecture must ultimately support capabilities that allow it to stand above, or at least on a level with, human understanding. This requires an artificial mind with the clarity and transparency to reconstruct meaning from the ground up.

Footnote: when we say “meaning”, we mean that an active structure is created which emulates what the real-world structure would do. If a verb indicates an action (“he ran away”), that action is carried out abstractly; if the verb or adjective indicates a change of state, the object changes its state (“he became angry”). The text becomes a working model, making it much easier to understand complex behaviour that would exceed a human’s ability to follow from text alone – too many moving parts.
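To make the footnote concrete, here is a loose, hypothetical Python illustration of the “working model” idea, emphatically not the actual Active Structure implementation: reading a clause updates a live model of the entities it mentions, so an action verb is executed abstractly and a change-of-state verb changes an object’s state.

```python
# Hypothetical illustration of text-as-working-model; all names and
# verb handling below are invented, not the Active Structure system.

class Entity:
    def __init__(self, name):
        self.name = name
        self.state = {}
        self.location = "here"

def apply_clause(model, subject, verb, complement=None):
    """Carry out a verb's effect abstractly on the model."""
    ent = model.setdefault(subject, Entity(subject))
    if verb == "became":            # change-of-state verb
        ent.state["emotion"] = complement
    elif verb == "ran away":        # action verb, executed abstractly
        ent.location = "away"
    return model

model = {}
apply_clause(model, "he", "ran away")
apply_clause(model, "he", "became", "angry")
print(model["he"].location, model["he"].state)  # away {'emotion': 'angry'}
```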

by Ryan Sigmundson

