Even though they are created by humans, large language models remain fairly mysterious. The high-powered algorithms driving the current artificial intelligence boom have a way of doing things that are not readily explicable to the people observing them. This is why AI is widely known as a “black box”: a phenomenon that is not easily understood from the outside.
Recent research published by Anthropic, one of the top companies in the AI industry, attempts to shed light on some of the more puzzling aspects of AI algorithms’ behavior. On Tuesday, Anthropic published a research paper that aims to explain why its AI chatbot, Claude, chooses to generate content about certain topics and not others.
AI systems are built around layered neural networks that roughly approximate the human brain: they receive and process information, then make “decisions” or predictions based on that information. Such systems are “trained” on large subsets of data, which enables them to make algorithmic connections. However, when an AI system produces an output based on its training, human observers do not always know how the algorithm arrived at that output.
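To make the “layered” idea concrete, here is a minimal sketch of a forward pass through a tiny network. The layer sizes, random weights, and activation function are placeholder assumptions for illustration only, not anything resembling Claude’s actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

# Toy two-layer network: in a real system these weights are learned during training.
W1 = rng.normal(size=(16, 8))   # input layer -> hidden layer
W2 = rng.normal(size=(8, 3))    # hidden layer -> output scores

def forward(x):
    hidden = relu(x @ W1)        # intermediate activations (the "neurons")
    logits = hidden @ W2         # raw scores for each possible output
    probs = np.exp(logits) / np.exp(logits).sum()  # turn scores into a prediction
    return hidden, probs

x = rng.normal(size=16)          # stand-in for some encoded input
hidden, probs = forward(x)
print(probs)                     # the model's "decision", hard to explain from the outside
```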
This mystery gave rise to the field of AI “interpretability,” in which researchers try to trace the path of a machine’s decision-making in order to understand its output. In AI interpretability, “features” refer to patterns of activation among the “neurons” of a neural network; in effect, they are concepts the algorithm can reference. The more features in a neural network researchers can understand, the better they can understand how particular inputs lead the network to produce particular outputs.
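As a rough illustration of what a “feature” means here, one can think of it as a direction in the network’s activation space, and measure how strongly a given set of hidden activations lines up with it. The feature names and vectors below are invented for illustration and are not taken from Anthropic’s work.

```python
import numpy as np

rng = np.random.default_rng(1)
hidden_size = 8

# Hypothetical "features": each is a pattern (direction) over the hidden neurons.
features = {
    "golden_gate_bridge": rng.normal(size=hidden_size),
    "san_francisco":      rng.normal(size=hidden_size),
    "cooking":            rng.normal(size=hidden_size),
}

def feature_activations(hidden):
    """Project the hidden activations onto each feature direction."""
    return {name: float(hidden @ vec) for name, vec in features.items()}

hidden = rng.normal(size=hidden_size)   # activations produced by some input
for name, score in feature_activations(hidden).items():
    print(f"{name}: {score:+.2f}")      # larger scores = the input triggers that feature more
```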
In a memo on their findings, Anthropic’s researchers explained how they used a process called “dictionary learning” to decipher which parts of Claude’s neural network map to specific concepts. The researchers say that using this approach, they were able to “begin to understand model behavior by seeing which features respond to specific inputs, giving us insight into the model’s ‘reasoning’ for how it arrived at a given response.”
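The memo describes dictionary learning at a high level; the sketch below shows one common way such a dictionary can be learned, by training a small sparse autoencoder to reconstruct a model’s internal activations from a sparse set of features. The sizes, sparsity penalty, and training loop here are assumed placeholders, not the actual setup used for Claude.

```python
import torch
import torch.nn as nn

hidden_size, n_features = 64, 512   # assumed sizes; real dictionaries are far larger

class SparseAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(hidden_size, n_features)  # activations -> feature activities
        self.decoder = nn.Linear(n_features, hidden_size)  # feature activities -> reconstruction

    def forward(self, acts):
        feats = torch.relu(self.encoder(acts))  # non-negative, ideally sparse, feature activities
        recon = self.decoder(feats)
        return feats, recon

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_weight = 1e-3                                  # assumed strength of the sparsity penalty

for step in range(1000):
    acts = torch.randn(256, hidden_size)          # stand-in for activations collected from the model
    feats, recon = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_weight * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# After training, each column of the decoder weight is a candidate "feature" direction,
# and `feats` shows which features respond to a given input.
```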
In an interview with Wired’s Steven Levy, the Anthropic research team explained what it was like to decipher how Claude’s “brain” works. Once they figured out how to decode one feature, it led them to others:
One of the features that impressed them most had to do with the Golden Gate Bridge. They mapped a set of neurons that, when firing together, indicated that Claude was “thinking” about the massive structure connecting San Francisco and Marin County. What’s more, when similar sets of neurons fired, they evoked subjects adjacent to the Golden Gate Bridge: Alcatraz, California Governor Gavin Newsom, and Hitchcock’s film Vertigo, which is set in San Francisco. All told, the team identified millions of features, a kind of Rosetta Stone for decoding Claude’s neural network.
It should be noted that Anthropic, like other for-profit companies, may have business-related motivations for writing and publicizing its research in this way. That said, the team’s paper is public, which means you can read it yourself and draw your own conclusions about its findings and methods.