The Information is reporting that Google plans to add “AI Mode” to its search. This is not a surprise if you’ve spent any time with Perplexity or ChatGPT search; they’re both leaps and bounds ahead of traditional Google Search. Moreover, Google’s Gemini is pretty good, and I expect that it could be a real contender if they put it to work with their search engine.
However, the point is that it would be a contender and not the clear market leader. I remember the old days when we had a lot of search engines, and then Google wiped them all out overnight. I don’t think that will be the case with this next arms race of search engines. Google will be one of several good engines from which we’ll be able to choose. Hopefully, this leads to lots of innovation and the end of the search monopoly.
Last Friday, I published my annual post referring to my Yule playlist. Attached to it was a cute picture of Santa Claus playing the saxophone. That image spurred a few questions about whether I used AI for it, and the answer is yes. For two years, I’ve been doing this post with an AI image of Santa playing the sax. Last year, the best I could do was a black-and-white illustration that was acceptable, but not cute.
This year, however, I upped my game. I have a one-month subscription to Magnific for a video I made for the MacSparky Labs. This is, by many accounts, the best AI image generator available. Although my testing and experience with it have been mixed, I must admit that it delivered (and then some) when it came to making a cute image of Santa playing the saxophone. I also note that it looks like Santa has a well-stocked bar in the background. It is remarkable how far this technology has come in just a year.
As an aside, I also gave it another prompt to make a cute image of Santa playing a Yanagisawa tenor sax (I play a Yani). It made a cute image, but it didn’t get the look of a Yanagisawa horn at all. I ended up using the above image instead because it’s so artistic (and shows Santa’s funny booze collection).
As I’ve spent considerable time with Apple’s Image Playground in the recent iOS 18.2 beta, I’m left with more questions than answers about Apple’s approach to AI image generation. The most striking aspect is how deliberately unrealistic the output appears — every image unmistakably reads as AI-generated, which seems to be exactly what Apple intended.
The guardrails are everywhere. Apple has implemented strict boundaries around generating images of real people, and interestingly, even their own intellectual property is off-limits. When I attempted to generate an image of a Mac mini, the system politely declined.
This protective stance extends beyond the obvious restrictions: Try anything remotely offensive or controversial, and Image Playground simply won’t engage.
Apple’s cautious approach makes sense. Apple’s customers expect their products to be safe. Moreover, Apple is not aiming to revolutionize AI image generation; rather, they’re working to provide a safe, controlled creative tool for their users. These limitations, however, can significantly impact practical applications. My simple request to generate an image of a friend holding a Mac mini (a seemingly innocent use case) was rejected outright.
I hope Apple is aware of this tipping point and reconsidering as Image Playground heads toward public launch. At least let it draw your own products, Apple.
Gemini, Google’s flagship AI model, has landed on the iPhone, marking another significant move in the increasingly competitive AI assistant landscape. The app brings the full suite of Gemini’s capabilities to iOS users, including conversational AI similar to ChatGPT, image generation through Imagen 2, and deep integration with Google’s ecosystem of apps and services.
The mobile release is particularly noteworthy given the current tech landscape, where platform exclusivity has become more common. Google’s choice to develop for iOS highlights its determination to compete in the AI space. Google appears keen to establish Gemini as a serious contender against established players like OpenAI’s ChatGPT and Anthropic’s Claude.
The app is free to use and includes access to both Gemini Pro and, for Google One AI Premium subscribers, Gemini Advanced.
This finally gives me the kick I need to spend more time evaluating Gemini.
There are a lot of great time-tracking applications out there, but one of the absolute best for Mac users is Timing. That’s because it is a native app on your Mac with a bunch of built-in automation. You don’t have to worry about pushing buttons to reset timers. The app pays attention to what you’re doing and gives you a report later.
Along those lines, Timing received an update recently that includes AI-generated summaries of your day. It gives you a concise view of what you did throughout the day and is entirely automated. I just started using the feature, so I need to spend a bit more time before I can recommend it. However, I thought the mere inclusion of the feature was noteworthy. If you’re interested in time tracking and haven’t looked at Timing lately, you should.
Last week Ryan Christoffel over at 9to5Mac quoted the latest Mark Gurman report about Apple developing an additional AI personality. Gurman reports that Apple is working on “[…]another human-like interface based on generative AI.” Like Ryan, I am confused by this.
For too long, Apple let Siri linger. It’s been the butt of jokes in tech circles for years. We’re told that this year will be different and Siri will truly get the brain transplant it deserves. But if so, why is Apple working on an entirely different human-like interface? Does this signal that the Siri update isn’t all it should be?
It’s too early for any of us to tell on the outside. There are some Siri updates in 18.1, but they are largely cosmetic. We’re still waiting for the big shoe to drop on Siri updates with later betas.
However, the idea that Apple is already working on the next thing before they fix the current shipping thing does make me a little nervous. I realize that at this point, we’re all just reading tea leaves, and I could be completely off the mark here, but I sincerely hope that the updates to Siri this year get all of the effort that Apple can muster.
My experiments with Perplexity continue. This alternate search app takes a different approach to getting answers from the Internet. Rather than giving you a list of links to read, it reads the Internet and tries to give you an answer with footnotes going back to the links it reads. I think it’s a good idea, and Perplexity was early to this game. Google is now following suit to less effect, but I’m sure they’ll continue to work on it.
I recently got an email from Perplexity about a new feature called Perplexity Pages, where you can give it a prompt, and it will build a web page about a subject of interest to you. Just as an experiment, I had it create a page on woodworking hand planes. I fed it a few headings, and then it generated this page. The page uses the Perplexity method of giving you information with footnotes to the websites it’s reading. I fed it a few additional topics, and it generated more content. Then, I pressed “publish” with no further edits. The whole experiment took me five minutes to create.
The speed at which these web pages can be created is both impressive and, in a way, unsettling. If we can generate web pages this quickly, it’s only a matter of time before we face significant challenges in distinguishing reliable information from the vast sea of content on the Internet. In any case, I invite you to explore my five-minute hand plane website.
I watched the Apple WWDC 2024 keynote again, and one of the sections that went by pretty quickly was the reference to Private Cloud Compute, or PCC. Some of Apple’s AI features will need to send your data to the cloud. The explanation wasn’t clear about which factors determine when that is necessary. Hopefully, they disclose more in the future. Regardless, Apple has built its own server farm using Apple silicon to do that processing. According to Craig Federighi, they will use the data, send back a response, and then cryptographically destroy the data after processing.
Theoretically, Apple will never be able to know what you did or asked for. This sounds like a tremendous amount of work, and I’m unaware of any other AI provider doing it. It’s also exactly the kind of thing I would like to see Apple do. The entire discussion of PCC was rather short in the keynote, but I expect Apple will disclose more as we get closer to seeing the Apple Intelligence betas.
Yesterday, Apple announced its new name for artificial intelligence tools on its platform: Apple Intelligence. If you watched the keynote carefully, it was almost humorous how they danced around the term “artificial intelligence” throughout. At the beginning, Tim made reference to “intelligence” without the word “artificial.” Then, throughout the rest of the keynote, up until the announcement of Apple Intelligence, Apple relied on its old standby, “machine learning.” Nevertheless, they eventually got there with the announcement of Apple Intelligence.
The initial explanation was telling. They stated five principles for Apple Intelligence: powerful, intuitive, integrated, personal, and private. These principles are the foundation of what they’re trying to ship. Also, in Apple fashion, the term Apple Intelligence doesn’t refer to a single product or service, but a group of intelligence-related features:
Table Stakes AI
This is the type of AI that everyone was expecting. It includes things like removing lampposts from picture backgrounds and cleaning up text. We already see multiple implementations throughout the Internet and in many apps already on our Macs. Apple had to do this.
They did, and the implementation makes sense. It’s got a clean user interface and clear options. Moreover, developers can incorporate these tools into their apps with little or no work. It should be universal throughout the operating systems, so learning how the tool works in one place means you can use it everywhere else. For most consumers, this is golden.
Also, it will be private. While I’m a paying customer of Grammarly, I’m aware that everything it checks is going to their servers. That means there are some things that don’t get checked. I’d much prefer to do this work privately on my device.
LLM AI
There have been many rumors about Apple developing its own Large Language Model (LLM), but nobody expected them to have one competitive with the likes of OpenAI and Google. So the question was, is Apple going to ship something inferior, work with one of the big players, or not include LLM as part of this initiative? We got our answer with the partnership with OpenAI, which incorporates OpenAI’s 4o engine into the operating system.
This makes a lot of sense. Since the keynote, Craig Federighi has gone on record saying they also want to make similar partnerships with Google and other LLM providers. While nothing sent to a company like OpenAI is going to be private, Apple is doing what it can to help you out. It doesn’t require an account, and it gives you a warning before it sends data to them. Again, I think this is a rational implementation.
If you already have an OpenAI account, you can even hook it up in the operating system to take advantage of all those additional features.
Private AI
This was the most important component of Apple Intelligence and was underplayed in the keynote. Using the built-in neural engine on Apple silicon combined with Apple Intelligence, Apple intends to give us the ability to take intelligence-based actions that can only be accomplished with knowledge of our data. That bit is essential: Apple Intelligence can see your data, but more powerful LLMs, like ChatGPT, cannot.
That gives Apple Intelligence powers that you won’t get from traditional LLMs. Craig explained it with some example requests:
“Move this note into my Inactive Projects folder,” requiring access to Apple Notes.
“Email this presentation to Zara,” requiring access to Keynote and Apple Mail.
“Play the podcast that my wife sent the other day,” which requires access to data in the Podcasts and Messages apps.
While these commands aren’t as sexy as asking an LLM engine to write your college paper for you, if they work, they’d be damn useful. This is exactly the kind of implementation I was hoping Apple would pursue. Because they control the whole stack and can do the work on device, this feature will also be unique to Apple customers.
“AI for the Rest of Us”
During the WWDC keynote, I only heard the term “Artificial Intelligence” once: at the end, when Craig said, “Apple Intelligence. This is AI for the rest of us.” I think that sentiment summarizes Apple’s entire approach. I agree with the philosophy.
I’m convinced that Apple has considered AI in a way that makes sense to me and that I’d like to use it. The question now is whether Apple can deliver the goods. Apple Intelligence isn’t going to be released for beta testing until the fall, so now we just have promises.
Apple’s challenge is the way Siri lingered so long. You’ll recall that Siri, too, started with a good philosophy and a lot of promises, but Apple didn’t keep up with it, and Siri never fulfilled its potential.
Looking at the Siri example, I should be skeptical of Apple Intelligence and its commitment. Yet, I’m more hopeful than that. The degree of intentionality described yesterday, combined with the extent to which Apple’s stock price is contingent on getting this right, makes me think this time will be different. In the meantime, we wait.
MacWhisper has been updated to version 8 with some new features, including a video player. Multiple apps use the Whisper model to perform transcription. I bought a license for MacWhisper early, and I’ve been using it a lot ever since.
One example: We use a Notion database to manage all the MacSparky content (this blog, the MacSparky Labs and Field Guides, etc.). With the addition of Notion AI, we’ve found value in keeping text transcripts of released content in the database. This allows us to ask questions like, “When is the last time I covered MacWhisper?”
MacWhisper 8 adds new features:
Video Player
A new inline video player has been added that allows transcribing video files. The video player can be popped out into its own window. Subtitles display directly on the video, and translations appear as separate subtitles, too. This will make the above Notion workflow even easier.
WhisperKit Support
You can now choose different Whisper engines like WhisperKit for your transcriptions. WhisperKit offers distilled models for faster transcription speed, and transcriptions stream in real-time. WhisperKit can be enabled in Settings → Advanced.
There are a bunch of other improvements keeping MacWhisper at the top of my list for transcribing audio on my Mac.
I will be curious to see if Apple incorporates the Whisper technology into the Mac operating system at WWDC. It seems like it should be built into the operating system. Moreover, if they incorporated it onto the chip, it could really scream. But it’s too early to tell exactly what Apple’s vision is for incorporating AI into macOS, and this may be a bridge too far. In the meantime, I’m very happy to have MacWhisper around.