AI round-up: Week of Oct. 16, 2023
“Within 10 years, AI will be able to ‘do 80% of 80% of all jobs that we know of today,’” said Vinod Khosla, a tech investor and entrepreneur for more than 40 years, speaking at The Wall Street Journal Tech Conference (10.17.23).
There was a lot said at this conference. (subscription required)
AI agents have learned to play Minecraft, leading researchers to theorize they will eventually replace “office workers.” (subscription required)
Head of the SEC, Gary Gensler, feels AI needs to be “tamed,” or else it could cause a financial crisis within the next 10 years. (subscription required)
(Sorry for all the “subscription required” articles, but more and more mainstream media are writing about AI.)
Rise of the AI factories. Foxconn and Nvidia are teaming up.
But what about the children? AI will shape their future. It’s up to us(!!) to determine if that’s for better or worse.
Do I have diabetes? You can use your smartphone, your voice and AI to find out.
The “what the what?” read of the week:
AI is resurrecting the dead. With or without their permission. Can you believe you may have to take steps to ensure your likeness isn’t “brought back”?
Let's take a look:
I worked with Anthropic’s Claude to come up with a prompt to show off Midjourney. I know not everyone is using it, as you need to be a member of Discord, so take a look at what I found:
Me: I’d like to show off the potential of Midjourney. Let’s come up with [a prompt] to show something that is modern art based.
Claude: Okay, here is a prompt to generate a modern art style AI image with Midjourney:
“A surreal digital painting in the artistic style of modern minimalism and abstract expressionism by famous artists Piet Mondrian and Mark Rothko. The image depicts a series of brightly colored blocks and shapes with sharp lines floating in an abstract dimensional space. The blocks shift from shades of vivid primary colors like red, blue, yellow to neutral black, white and grey. The shapes intersect to create a sense of depth and geometric complexity. Rendered in ultra-high resolution with crisp clean lines, inspired by futurism and cubism.
“For this prompt, I focused on describing some key characteristics of modern art including geometric abstract shapes, bold colors, clean lines and dimensional depth. References to well-known modern artists help set the style. Let me know if you would like me to modify the language or emphasis in any way to better showcase Midjourney’s capabilities for generating modern art styles. I’m happy to refine the prompt further.”
(Put the prompt into Midjourney)
Midjourney: (To note—Midjourney gives you four options of what you asked for. Also to note—Midjourney images are not copyrightable.)
Finally, some perspective from Casey Newton:
This is a little longer read … but may sum up how you feel while trying to process all things AI. (It’s definitely how I feel!)
Note: Sourced from his e-newsletter
Today, let’s talk about a novel way of accessing generative artificial intelligence that, I think, speaks to how all of us are going to be using it in the future—and the feeling of vertigo I get when I think through the implications of this shift for the internet more broadly.
Last month, I was speaking with some people who work on AI safety. We discussed the difference between using OpenAI’s GPT-3.5, which is free to all, and GPT-4, which costs $20 a month.
I had recently subscribed to ChatGPT Plus at the encouragement of a friend who had found it to be an excellent tutor in biology. A few days later, I found myself embarrassed: What I thought I knew about the state of the art had essentially been frozen a year ago when ChatGPT was first released. Only by using the updated model did I see how much better it performed at tasks involving reasoning and explanation.
I told the researcher I was surprised by how quickly my knowledge had gone out of date. Now that I had the more powerful model, the disruptive potential of large language models seemed much more tangible to me.
The researcher nodded. “You can fast-forward through time by spending money,” she said.
In 2020, a pair of former Facebook software engineers named Thomas Paul Mann and Petr Nikolaev founded Raycast. The company’s app, which is currently only available for Mac, is a launcher: an app you use to do things with other apps.
If you have used Spotlight on your Mac or more sophisticated tools like Alfred or the late lamented Quicksilver, you are familiar with the basics of what Raycast can do. You type a universal hotkey, such as ⌘+space, and a window pops up on your screen. Type a few letters (“c-h-r”), and Raycast will guess what you are trying to do: in this case, open Chrome.
Raycast can also take other actions, such as looking up words, performing calculations or tracking flights. There is little it does that you could not also do on the web. What makes Raycast appealing, at least to a particular sort of nerd, is the way that it makes these actions instant. You have a thought, you type a few characters, you hit enter, and boom: There’s your answer, and now you’re back to work.
The free version of Raycast is quite useful. But in July, Raycast introduced GPT-4 as a paid add-on to its premium product. Last month, I installed it as part of my resolution to engage with AI tools more often, to prevent my knowledge from getting outdated quite so quickly.
The interesting thing about GPT-4 in Raycast is the speed. You hit ⌘+space, you prompt the model, and you hit tab. In a second or two, the model returns the results.
In my experience, this mode of using AI leads you to treat it more like a search engine than you might otherwise. I ask it about historical events, I ask it about musical artists, I ask it about video games.
I don’t use it to find information for this column—the model might be hallucinating, and it will take me longer to use the AI and then fact-check it than it would to just seek out a vetted source of information myself. But for the relatively broad category of searches for which I want information that is basically or mostly true, GPT-4 works surprisingly well. And making it accessible via a hotkey means that I am performing searches on traditional search engines less than I did before.
I had spent money to fast-forward through time.
I began feeling that increasingly familiar sensation of AI vertigo.
Most people will never use Raycast. But I imagine most people eventually will have an experience like the one I am having now, of typing in a text box looking for some output they used to get from Googling and having it delivered to them directly.
This will not completely replace Google search, even as it changes substantially what Google will be asked to search for.
Google has Assistant; it has Android; it has Chrome; it has ChromeOS. It has—for now—its position on iOS. Hundreds of millions of people will still begin their queries in the little boxes they find there.
But there will be other little boxes. Apple is reportedly building a ChatGPT competitor, which it could bake into its operating systems. Amazon’s devices will soon begin speaking in natural language. The upstarts, from OpenAI to Humane, are working on their own hardware.
In the meantime, if you subscribe to ChatGPT Plus and have a new iPhone, you can assign the voice version of the chatbot to the action button on your device. I did this when I got my new phone; I now long-press the button, wait for ChatGPT to connect, and then ask my query that way.
There’s a bit too much latency for the experience to be truly great, and unless I’m walking around town, I don’t usually want a spoken answer to my query.
For the times I do though, it works well. It’s another step outside of the web as Google built it.
Fast-forwarding through time.
You can get carried away with this kind of thing, of course. You can succumb to hype. AI people only discuss their work in the most grandiose terms. Some days it sweeps me along.
In May I wrote here about AI’s missing interface. The problem with an increasingly powerful blank box is that its powers are all invisible. Open an Excel spreadsheet, Benedict Evans notes in a recent post, and it will show you some templates to give you an idea of what it’s capable of. Chatbots might suggest a handful of prompts, but for the most part it’s up to you to discover what to do with them.
Excel isn’t just giving suggestions—those tiles are documents, and documents are the start of a process, not an answer. You can see what you’ve built and what it’s doing and how far you’ve got. The same sense of creation as process applies to Photoshop, Ableton or PowerPoint, or even a simple text editor. The operative word is editor—you can edit!
Conversely, using an LLM to do anything specific is a series of questions and answers, and trial and error, not a process. You don’t work on something that evolves under your hands. You create an input, which might be five words or 50, or you might attach a CSV or an image, and you press GO, and your prompt goes into a black box and something comes back. If that wasn’t what you wanted, you go back to the prompt and try again, or tell the black box to do something to the third paragraph or change the tree in the image, press GO, and see what happens now. This can feel like Battleship as a user interface—you plug stuff into the prompt and wait to find out what you hit.
There is a kind of joy in this wait-to-find-out-what-you-hit interface: it offers the addictive intermittent rewards of a Skinner box.
If you are inclined to believe that AI is overhyped, this seems to me to be a good place to build your argument. When you Google, you usually know what you’re going to get back. When you ChatGPT, there’s a bit more chance involved.
For what it’s worth, though, lately when I use ChatGPT, I get the thing I expected.
In the end, though, I feel AI vertigo comes with a sense that the ground underneath our feet is changing.
The web is created by people. To the extent that people are paid to create the web, it is largely because of ads that run on websites. People visit most websites because they are searching for something, and mostly they search on Google.
To use Raycast is to get a glimpse of life after the web, or at least the web as we know it. It offers the answer you were looking for without you having to so much as open a browser. You summon the collective knowledge of the world—collective knowledge that was often obtained by these chatbot makers under dubious pretenses—and you return to your work.
It is very difficult for me to think through how we currently fund most journalism—and at the same time look at how AI tools are developing—and believe that the thousands of lost jobs we have seen in digital media this year are not about to accelerate.
There are places on the web that the chatbots can’t yet touch, of course. Their training data ends around 2021, making them useless for current events. You can’t grab a reservation on OpenTable or search for concerts in your area or ask them about the weather. To the extent they discuss sports, it is only as historians.
But it is in the nature of technology development to abstract away whatever it can. There are few technical barriers to bringing those kinds of data into a chatbot; if there were, publishers wouldn’t be busy asking chatbot developers not to crawl and scrape them. There are likely legal and regulatory battles to be fought, though, and if the pace of development slows down soon, I imagine it will for those reasons.
Sure, old habits die hard, and Googling is a way of life for a generation of people. No one uses Yahoo for search any more, and yet that company still exists. Not everyone will change their behavior overnight.
You can have a vision of that future today, if you want. It costs $16 a month on Raycast.
Depending on your place in the ecosystem, though, I can understand if you would rather look away.