Plain text is what you need. But why?
Generative AI is not just about creation, but also the transformation of content, to which easy access through APIs and open formats becomes fundamentally important.
The text you're reading right now isn't just a series of characters. Some parts will be bold, others might redirect you to external sources, like the article The rise of Artificial General Intelligence (AGI), and some will be interspersed with images or animations. Each of these additional elements needs to be somehow marked so that programs can display the content correctly and often also allow editing. From the user's perspective, usually what I'm writing about now doesn't play any role because they only see the successive characters of the text.
Below, I've included the link heyalice.app in Google Docs, so apart from a few interface elements, the screen only displays text directing to the indicated page. The document is straightforward and can be easily worked on further.
When you delve deeper, things get a bit trickier. If you save a file as a .docx, then rename it to .zip and unzip it, you'll discover that the document is made up of several .xml files. This reveals a rather intricate setup, made even more complex by the detailed content within the files. As a result, making any changes isn't straightforward, which is why using an interface of applications like Google Docs or Microsoft Word is almost a necessity.
So far, this situation hasn't been a problem for most of us. Even more so because the popularity of formats like .docx, .xlsx, or .pdf is so high that practically everyone can work with them. The issue is that not everything that worked for us in the past will serve us well in the future, especially since the way we interact with text has changed a lot over the last year.
New possibilities for content transformation
Large Language Models (LLMs) like GPT-4 or Claude 2 are associated with generating new content, for example, for ongoing conversations. However, it turns out that they are also capable of advanced transformation of the submitted text. Just by adding a piece of one of the documents in the instructions (prompt), the model can modify, enrich, or summarize it as desired. What's more, the result can also take various forms and include details we ask for.
An example of such a transformation can be seen in the animation below. I just copy the selected part, which thanks to Alice is translated into English. I can then paste the modified text into my document. Unfortunately, I have to manually fix the formatting and fill in the missing link.
Maintaining formatting in this case is not possible, or at least becomes very complicated, and at this stage, we already know the reason (xmls, right?). However, this is not the only limitation or even a blockade that we won't be able to bypass.
Content transformations can cover both individual fragments and entire documents. In the first case, we can be directly involved in the process, but it quickly turns out that working with longer documents becomes tedious and time-consuming. Even though the GPT-4-Turbo and Claude 2 models allow us to pass longer content sets, it still requires a lot of time to reformat and relink. It would be much easier to simply pass the entire document with a request to make specific changes. Then all that remains is to verify the result, which usually includes individual corrections. However, in the case of closed formats, this is not possible (or so difficult that it can be considered unprofitable).
Transforming a document isn't just about altering its content. It's also useful to describe the content and store it in a new, enhanced format. Take a text with a to-do list, for example. Such content can be converted by an LLM in a way that entries organized by date, projects, and priority appear on our task list.
The animation below shows how sending a simple message to Alice ended up adding three tasks to my to-do list.
This process is much more complex than just a simple correction or text translation. One of its stages included generating a structure that describes the content of the sent message. It took into account individual tasks and additional settings that are significant from the perspective of the logic responsible for actually adding tasks to my list.
It's quite easy to imagine further applications of such transformations, as we're talking about the ability to interact with apps, services, or even devices using natural language and simple commands. Regardless of the scenario, we're talking about working directly with text, to which free access turns out to be critically important.
Open Formats and Free Access to APIs
It's common knowledge that quality data is valuable in the context of AI. It's worth adding that access to this data should be as easy as possible and include reading and writing capabilities. You can easily see the importance of this when you first try to connect a Large Language Model (LLM) with your own knowledge bases (for RAG systems), and even the simple task of gathering your data in one place becomes a challenge.
The same applies to applications and services, where not only a great user interface (UI) is crucial but also the application programming interface (API). Especially since you can use it through no-code tools like make.com. The lack of an API almost completely closes off the space for optimizing your work through automation or integration with AI, beyond the options provided by the software creators.
Free access to data and actions opens up space for significant optimization of your daily routine, which I have already demonstrated to some extent with text transformation examples or simple task adding. However, to make this possible, it's necessary to choose tools that won't limit us. Then we'll need to develop skills and reach for ready-made solutions that will allow us to fully exploit the possibilities offered by Generative AI. For now, let's focus on this initial stage.
Browsing through the tools available on the market, you can notice that the role of APIs is often treated superficially, but thankfully, this isn't the case for all of them. Besides, some tools give us free access to data stored locally on our computer, which we can also use programmatically, although it makes establishing a remote connection more difficult.
Good examples are text editors or broadly defined note-taking tools, which usually offer us either fairly free access through an API or open local file formats. Given that none of the solutions available on the market address all my needs, my current list looks as follows:
I use Obsidian.md as a tool for building my long-term knowledge base
I use Notion for collecting information that I want to interact with remotely via API
I use iA Writer for creating longer content, from emails to articles, to books
Each of these editors has one thing in common — they support Markdown syntax, which is increasingly appearing in popular editors. Its presence is crucial for at least a few reasons:
Allows adding formatting that highlights the most important actions and can be extended if needed
Markdown can be converted into HTML structure and used for web content, blogs, or newsletters (on macOS, just run the command pbpaste | /opt/homebrew/bin/pandoc -f markdown -t html5 to convert markdown from the clipboard to HTML5. The converter Pandoc must be installed on the system)
Conversion works both ways, meaning HTML code can be turned back into markdown syntax (although this is a form of compression and some information will be lost, making it harder to recreate the HTML structure)
Content written in Markdown can be directly processed by Large Language Models. This means that the text can be processed while preserving formatting
The advantages of Markdown syntax are visible in the animation below, which presents exactly the same task we saw at the beginning of the post. However, this time the processed text has retained all its features, including formatting and links.
But that's not the end because once we've got our document prepped, it can be automatically split into smaller chunks and processed by an LLM without us lifting a finger. As a result, we'll get a copy that includes all the edits we need, keeping the content's formatting intact, or on the flip side, in a completely revamped form that fits our expectations.
However, we've got to make sure that the model actually sticks to our instructions, which isn't a given, since the snippet we pass along might sound like a to-do list or be too lengthy, which could make the content "distract the model" from the original command. That's why it's smart to process either single paragraphs or bits that don't exceed a certain number of characters or, even better, tokens. Plus, the instructions need to clearly state what needs to be done, no matter what. You can find an example of such a prompt below (Translation Prompt — Playground)
Translation feature / text correction, you can directly plug into the Alice app and assign it to a keyboard shortcut (in my case, it's ⌘⌥⌃⇧T, where the combination of function keys is assigned to the CapsLock key thanks to the Karabiner-Elements app).
Feel free to modify the instructions and direct them, for example, to text correction or any other transformation that you find useful in the context of your work. However, pay attention to a few elements of it:
The whole prompt is conducted like a conversation, starting with a statement explaining the translation rules. It's like talking to a copywriter who's ready to work, showcasing their capabilities. This form is intentional because the LLMs are designed to generate text by continuing/complementing the existing content. So, we design the instruction in such a way that, when combined with the fragment provided by the user, the whole can be naturally completed with what we expect.
In the instructions, I also included examples that demonstrate the expected behavior. I chose very short messages that, due to their content, could be treated as a command that might disrupt the initial instruction. Simply put, instead of making corrections and translating the text, the model would answer the posed question, and in this situation, we wouldn't want that. Therefore, the role of the examples is to present the expected behavior, which supports the earlier instruction and reduces the risk of improper model behavior. That's why it's so crucial to verify the generated result to make sure that no accidental errors have crept in.
I wrote that instructions like the one above can also be used for automation purposes. In that case, a script or no-code scenario is responsible for processing the documents, which involves reading the content of the document, dividing it into sections, running a prompt for each of these sections, and then combining everything together and saving the result. The JavaScript code below does exactly that task, and it's also available in this repository.
Such a script can successfully be used to translate a long article, the content of which could not be processed by the model in one go. The same scheme applies to any other transformations, including generating tags, analysis, summaries, or any other kind of description. What's more, we're also talking about the possibility of performing tasks in the background, without our active participation (excluding the verification stage).
Regardless of the action taken, the key is just to ensure that the content we're working with is easily accessible, either in the form of a local file saved in an open format (e.g., Markdown), or through an API.
At this stage, you should start to see the possibilities offered by large language models, which we can use directly, but also for automation. That's why it should be clear why it's so important for us to choose tools that won't make it harder for us, or product teams, to interact with our own data. Especially since if we understand the general principles of designing such solutions, we'll be able to freely create new ones that will gradually support various processes. After all, we're talking here not only about processing long documents, but practically any activity related to task management, events, mail, or even entire areas of marketing, sales, product development, or management.
Where to start?
If you want to develop your skills towards advanced use of Generative AI to optimize your daily life, you might be interested in the following steps that I've gone through.
Build a knowledge acquisition system: Review the list of newsletters you're subscribed to, websites you visit, blogs you read, and YouTube channels you watch. Pick those that support your goals and check out the social media profiles of their creators. You'll likely find other places that might interest you among them. Focus on those that are as close to the source as possible, like the creators of selected tools, technologies, or concepts. Add their profiles to Feedly or alternative solutions that will make it easier for you to monitor them. Set aside time in your calendar to check out what's new.
Choose the right tools: Make a list of the tools you use and go through each one, verifying the availability of an API. If it's not available, check out the alternatives, keeping in mind not just the API itself but also the user interface you'll ultimately be using. Here you can find the current list of apps I use myself: brain.overment.com/Tools/Apps
Understand the capabilities and limitations of AI: Using the knowledge acquisition system you've built, familiarize yourself with materials that will help you better understand what AI is and at what stage of development this technology is. Look for unconventional applications that will allow you to start discovering solutions tailored to your needs. Even if you come across examples that you can't directly apply to your daily life, consider whether you can use any of their individual elements in some way. To start, you might want to check out Introduction to Large Language Models
Explore by building: Experiment and build. The examples I discussed today showed you how easily you can create incredibly useful tools that save you time or make various tasks easier. Only by doing this will you encounter problems that will actually teach you how to effectively use Generative AI in your everyday life. If you're a programmer, start by running the document translation script and try to adapt it to your needs.
The potential applications of Large Language Models (LLM) in content processing are extensive. It's wise to choose the ones that best align with our needs, our projects, and our company. An additional incentive is that nearly all AI integrations in business involve data handling. Therefore, it's beneficial to start organizing our knowledge bases and updating our tools sooner rather than later. For bigger organizations, these updates can take a while, so it's smart to begin early and implement changes gradually. No matter our current position, it's often best to initiate the transformation by ourselves.
That wraps up today's post! If you found it helpful, please pass it on.
Meanwhile, I'll be working on creating more content for you.
Have fun,
Adam