Attention is All You Need

Nail biting through an Adderall shortage while AI models get all the digital dopamine they need.

Photo by ethan on Unsplash

In the two years since my ADHD diagnosis, I’ve become pretty obsessed with attention as a concept. I didn’t realize I had an attention deficit disorder until I was 35 because, if anything, it seemed like I had too much attention, not too little.

It’s hard for me to describe how I process reality. Everyone with ADHD has a different experience. And so I am very cautious about presenting my own as representative of anything. But sometimes people want to know what it’s like. Lately when someone asks, I ask if they’ve seen the movie Arrival.

In the movie, Amy Adams plays a woman who learns an alien language that changes the way her brain processes time. She can experience the past, the present and the future at once. Basically, the language untethers her from linear time. Because of this, she’s able to make connections others might miss. Everything she’s experienced is accessible, all the time.  

Leaving linear time in a world bound by it isn’t without consequences. Making the past accessible in the present makes memory confusing. And she often appears forgetful in the space everyone else understands as right now. The joy she’ll feel in the future is accessible, but so is the sorrow.

Want to listen to me TRULY spin out about how much I love this movie? GREAT! I got to do a whole podcast episode on it. Truly one of my favorite interview experiences of all time. 

I wept for an hour after I saw the movie. I’d never seen the way my brain works represented anywhere. I mean, Adams’s character was a genius. And I am not. And sure, this representation was an extreme, fictionalized, alien-adjacent version of the way I process reality. Adams can also effectively time travel because of her brain’s rewiring. Which is not something I can do, obviously. 

But the way she’s untethered from linear processes and so can see a little differently? I can occasionally feel the joy of seeing a tesseract where some other people might see a line. But there’s also the way her untethering left her feeling disoriented, blurry-eyed and exhausted. I understood that deeply too. 

Of course, when a doctor diagnosed me with ADHD, I learned that my everything, everywhere, all at once existence really is the result of an attention deficit. At least as psychology defines attention. In the 1890s, William James wrote, 

Everyone knows what attention is. It is the taking possession by the mind, in clear and vivid form, of one out of what seem several simultaneously possible objects or trains of thought. Focalization, concentration, of consciousness are of its essence. It implies withdrawal from some things in order to deal effectively with others. 

I am, to put it mildly, not good at withdrawing from some things in order to deal effectively with others. There are the problems that are easy to explain. Like the way it makes everyday tasks weirdly difficult. For example! I cannot keep sequences in my head. Steps in a sequence or a sequence of numbers. 

The sequence of numbers issue is easy to illustrate. Sometimes I’ll get a code in an email that I need to input into a website to verify my identity. Okay, no problem. Except, I forget the code in the time it takes me to move from one tab to another. I can’t think of the four numbers I just saw, I can only think of all the numbers. In that scenario, I can keep two tabs open and I’m fine. But of course, there are lots of things in our lives we need to remember without the help of open tabs.

I also do this thing Riley calls “time-skipping”, like in Animal Crossing when you move the game’s clock forward or backward. Sometimes I talk about a month ago like it was yesterday and yesterday like it was ten years ago, because in my head there’s really little difference. This is actually wildly destructive to my life. Please consider how much of our existence is dependent on doing things “on time.” Now imagine me late or a no-show to all of those things. Yeah. 

But it’s not just about the external stuff. Internally, it can be difficult too. Sometimes, in the past, not being able to withdraw from some things made me think about withdrawing from all things. Which is a hard thing to admit. But it’s true. 

We still don’t know much about ADHD. Scientists know that it’s a neurodevelopmental disorder that affects the way people process reality. It’s very possible that the diagnosis covers a spectrum of conditions we just have not been able to sort out yet. But for now, scientists are pretty sure ADHD is related to faulty dopamine neurotransmission. 

Dopamine is the chemical that helps with cognitive control, motivates learning and helps maintain our working memory.  So far, scientists have discovered 27 genetic markers that make a person more prone to having ADHD. Basically, when I was in utero, something happened that turned on or off a gene and I fell out of sequence. 

It is difficult to write clearly when you have a problem with order. It’s not any great surprise that my writing career really started when I was finally diagnosed with ADHD and started receiving treatment. I take a 15 mg extended release dose of amphetamine and dextroamphetamine. Stimulants stimulate dopamine production. I wouldn’t say that the dopamine helps me get in order. But it does help me…I don’t know….see clearly enough, vividly enough, for just long enough, to put some words in a line to try to represent the circles in my head. 

Many days, even with the medication, I just see through a glass darkly. On those days I write 3,000 words or 300 but publish none of them. But on those days, the medicine does still do enough to keep me from thinking about withdrawing from all things. And so I try to be gentle about the poor thinking and grateful for the still living.  

When I called to fill my Adderall prescription earlier this month, the pharmacist sighed. I’d called in a 30-day supply of medicine. But the US is in the middle of an Adderall shortage. So my pharmacy only had 26 pills left. Did I want the 26? I did. I know I was lucky. I know one woman who hasn’t been able to get her prescription filled for three months. Just this week, the Biden administration announced a return to stricter regulations around how people can be diagnosed with ADHD. 

Like any drug, the stimulants used to treat ADHD can be abused. But not really by people who actually have ADHD. We don’t get high from those stimulants. Being able to finally, really pay attention helps us calm down. I can sleep now that I am on Adderall. As worried as I am about days without the medicine, it’s the nights that concern me the most. I don’t want to go back to not sleeping. It’s perhaps not surprising that treatment is becoming more difficult to access as ADHD diagnosis becomes more prevalent among women. I mean lack of health care access for women is kind of America’s thing at this point.

I was thinking about my dwindling supply of medication as I learned how large language models work. Did you hear about how Bing’s AI chatbot started declaring its love to a NYTimes reporter and arguing for the ascendance of white Christian men? 

Well, that chatbot was built on technology from OpenAI, the company behind ChatGPT, and both run on large language models. Once prompted by a user, an LLM can be very good at predicting what word should come next in the sentence it outputs. So good, it almost feels human. 

An LLM is a machine learning model that gets trained on lots and lots of text scraped from the internet. (Which probably explains some of Bing’s horrifying output. You are what you eat, errrr, scrape.) It learns through an algorithm that mimics the way dopamine helps humans learn and remember. And yeah, when I got to that part of my research I mumbled, “well, fuck.” Because while I am out here nail biting through an Adderall shortage, VC-funded AI models are getting all the digital dopamine they need. 


ChatGPT is particularly effective because of a thing called “self-attention.” It’s the heart of a model design that Google researchers proposed in a 2017 paper called Attention is All You Need. Instead of processing text in a sequence - one word at a time in order - ChatGPT uses self-attention to consider all the words at once in relation to each other. (If I were a tech person, I’d launch into encoder-decoder stuff right now. But this is enough for us for now.) This makes some real sense to me because this is how I think. This is also how I read. I don’t read sentences, I take in pools of text. All the words are water molecules clinging to one another. It’s very, very hard for me to read one word at a time, in a line. Which sounds much cooler than it is. Like. I had to teach myself how to read in a line, aloud to my kids and that was humbling! 
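For anyone who wants to see the "all the words at once" idea in miniature, here is a rough sketch in Python. It is a toy, not how ChatGPT is actually built: the three words and their two-number "embeddings" are made up for illustration, and real models first pass each word through learned projections (usually called queries, keys and values) that this sketch skips by using the word vectors directly. What it does show is the core move: every word scores itself against every other word, and those scores become weights for blending the whole pool together.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Toy self-attention. Assumption: queries, keys and values are the
    input vectors themselves (no learned projections)."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Compare this word against every word in the pool, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Blend all the word vectors together, weighted by relevance.
        blended = [
            sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(dim)
        ]
        outputs.append(blended)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical words and made-up vectors, purely for illustration.
words = ["pools", "of", "text"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

outputs, weights = self_attention(embeddings)
for word, row in zip(words, weights):
    print(word, [round(w, 2) for w in row])
```

Each printed row shows how much one word leans on every word in the pool. No word gets read before or after the others; order never enters into it.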

While jealous of the investment in ChatGPT’s capacity to learn, I felt honestly moved as I began to understand its architecture. I’ve spent most of my life feeling really ashamed of the way I process the world. But here was a virtual brain that works a lot like mine! And Microsoft just invested another 10 billion dollars into seeing it develop! Maybe I have some potential after all! 

Except. Well. The thing is…

I was wrong. 

LLMs only learn text. And the models cannot comprehend that the letters that make up the text are really symbols or that the words themselves have meaning. LLMs can’t comprehend meaning at all. The models aren’t trained on symbolism, senses, smells, experiences, feelings or anything else. The larger the LLM, the better it is at guessing the next word. That’s one reason the Bing chatbot feels so real. It’s been trained on a ton of data. 

But studies have also shown that the bigger an LLM is, the more likely it is to churn out toxicity, conspiracy theories and lies. Every wild Bing output was just automated apophenia. For all their mimicking of human learning, LLMs do not want to learn. They don’t want anything, even though they’re made to pay attention. Of course, it’s possible that it’s an alignment issue. Maybe the AI people don’t know what attention is. 

Mary Oliver wrote that “attention without feeling is…only a report.” Which is the kind of scathing understatement I always enjoy. 

I keep thinking about something else she wrote about attention, 

“I don't know exactly what a prayer is.

I do know how to pay attention, how to fall down

into the grass, how to kneel down in the grass,”

I like her attention, the one adjacent to devotion. Maybe because that kind of attention is the kind I can muster, even when I cannot get my medication, the kind that overwhelms me, that brings me to my knees, with my eyes lifted up. 

Have you heard of the thermal time hypothesis? Proposed by Carlo Rovelli with the mathematician Alain Connes, the hypothesis says that time is an illusion born of entropy, the growing disorder promised by the second law of thermodynamics. We only perceive entropy and its daughter, Time, because we cannot perceive everything. If we could see all of existence as it is - every particle and all their motions - then entropy would cease to exist and time would too. 

I read Rovelli’s book about the illusion of time over and over. I still can’t claim to begin to understand the science behind his proposal. But there’s something there, isn’t there? Some kind of attention. 

An attention that is the opposite of James’s “withdrawal from some things in order to deal effectively with others.” An attention that doesn’t rely on an algorithm. An attention that is both creation and eternity. An attention that keeps everything, while owning nothing. An attention that is like a prayer, but not quite.