Prompt Engineering Is Snake Oil
In the tech industry, we frequently find ourselves caught up in the latest trends and buzzwords. One trend that has recently gained popularity is "prompt engineering": the practice of crafting impeccable prompts to elicit the desired responses from large language models (LLMs) such as GPT-3.5 Turbo and Llama 2. While it may look like a panacea, it is crucial to recognize it for what it truly is: a form of snake oil that diverts attention from the skills genuinely required to leverage the potential of LLMs.
The allure of prompt engineering lies in its apparent simplicity. The concept is straightforward: create a well-structured prompt, and the LLM will generate the output you seek. However, this simplicity conceals a deeper problem. Instead of fostering genuine expertise in working with LLMs, this trend often promotes a superficial approach that is akin to beautifying the exterior of a house while neglecting its structural integrity.
One of the fundamental issues with overemphasizing prompt engineering is that it creates a false sense of competence without necessitating a profound understanding of the technology. Many individuals in the tech realm are dedicating more time to mastering the art of prompts and less time to exploring the intricacies of fine-tuning LLMs. This is detrimental to both emerging AI professionals and the field as a whole.
Consider an example from our own experience at Grandiose, our agency. We recently collaborated with a bootstrapped startup client who had paid a contractor $5,000 to construct a system that, as it turned out, was merely a patchwork of clever prompts strung together using LangChain. The moment the system encountered input its prompts hadn't anticipated, the entire setup unraveled. To compound the issue, the system message alone consumed a whopping 760 tokens!
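To make that failure mode concrete, here's a minimal sketch of the kind of prompt patchwork we mean. The prompts and the two-step structure are invented for illustration (this is not the client's code), and the LangChain classes shown come from the classic LLMChain / SimpleSequentialChain API, which newer releases have since reorganized:

```python
# Illustrative only: a brittle "prompt patchwork" in the classic LangChain style.
# The prompts and business logic are hypothetical stand-ins, not the client's code.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

llm = OpenAI(temperature=0)

# Step 1: classify the request. If the input doesn't match one of the listed
# categories, everything downstream silently degrades.
classify = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["text"],
        template="Classify this request as BILLING, SUPPORT, or SALES:\n{text}",
    ),
)

# Step 2: draft a reply from the classification. A malformed label from step 1
# (e.g. "I think this is billing-related") derails the whole chain.
respond = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        input_variables=["category"],
        template="Write a polite reply for a {category} request.",
    ),
)

chain = SimpleSequentialChain(chains=[classify, respond])
print(chain.run("My invoice is wrong and the app keeps crashing."))
```

Every link in a chain like this is a free-text handoff, so there is no point where a malformed intermediate result gets caught. That is the structural fragility we were hired to fix.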
We intervened to salvage the project. In just five days of thorough data collection and fine-tuning, we not only reduced the system message to under 200 tokens but also turned the LLM into a robust reasoning tool that responds in a custom JSON format. The episode underscores how much proper training data and fine-tuning matter, compared with relying on prompt engineering alone.
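For the curious, here's roughly what that workflow looks like on OpenAI's fine-tuning API. The training examples, the JSON schema, and the filenames below are hypothetical stand-ins; only the chat-format JSONL structure and the files and fine-tuning endpoints are the standard ones:

```python
# A minimal sketch of fine-tuning GPT-3.5 Turbo on OpenAI's API.
# The examples, JSON schema, and filenames are hypothetical;
# the JSONL chat format and API calls are the standard ones.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A short system message: the behavior lives in the training data,
# not in hundreds of tokens of prompt instructions.
SYSTEM = "You are a support triage assistant. Reply only in the agreed JSON format."

# Each example pairs an input with the exact structured output we want,
# e.g. a custom JSON format that makes the model's reasoning machine-readable.
examples = [
    {
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "My invoice is wrong and the app keeps crashing."},
            {
                "role": "assistant",
                "content": json.dumps({
                    "categories": ["billing", "support"],
                    "reasoning": "Mentions both an invoice error and a crash.",
                    "priority": "high",
                }),
            },
        ]
    },
    # ...hundreds more examples, collected and reviewed by hand...
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

uploaded = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=uploaded.id, model="gpt-3.5-turbo")
print(job.id)  # poll this job until it finishes, then call the resulting model
```

The hard part isn't the API calls; it's collecting and reviewing enough good examples that the desired behavior is baked into the weights rather than begged for in the prompt.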
We've all heard complaints that GPT-3.5 Turbo, especially compared with its successor GPT-4, struggles to follow instructions. Guess what? In our experience, this is a non-issue with a properly fine-tuned GPT-3.5 Turbo model. In fact, GPT-4 can serve as the "prompt engineer" that generates the training data.
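Here's a sketch of that idea: the long, carefully engineered prompt runs once per training example at data-generation time, so the fine-tuned model never pays for it at inference time. The generator prompt and the sample inputs are placeholders we've invented for illustration:

```python
# Sketch: use GPT-4 as the "prompt engineer" to generate fine-tuning targets.
# The elaborate instructions run once per training example, not on every request.
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical: the long, carefully engineered prompt lives here, offline.
GENERATOR_PROMPT = (
    "You are labeling data for a triage assistant. For the user's message, "
    "respond with JSON containing 'categories', 'reasoning', and 'priority'."
)

raw_inputs = [
    "My invoice is wrong and the app keeps crashing.",
    "How do I upgrade to the annual plan?",
    # ...real user messages gathered during data collection...
]

with open("train.jsonl", "w") as f:
    for text in raw_inputs:
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": GENERATOR_PROMPT},
                {"role": "user", "content": text},
            ],
        )
        # The training example keeps only a short system message; the heavy
        # prompt above never appears in the fine-tuned model's inputs.
        example = {
            "messages": [
                {"role": "system", "content": "Reply only in the agreed JSON format."},
                {"role": "user", "content": text},
                {"role": "assistant", "content": reply.choices[0].message.content},
            ]
        }
        f.write(json.dumps(example) + "\n")
# Review the generated examples by hand before fine-tuning on them.
```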
In essence, prompt engineering is akin to handing someone a cookbook and teaching them to follow a recipe to create a particular dish without imparting the art and science of cooking. Yes, they can follow the steps and produce a dish, but they won't truly grasp the essence of cooking, nor will they be prepared to adapt to different ingredients or unforeseen challenges in the kitchen.
Another concerning aspect of the prompt engineering trend is that it encourages a "black box" mindset. Users become content with the notion that they can control the model's output through clever phrasing, without questioning or comprehending how the model generates those responses. When applied to real-world scenarios in fields like healthcare, finance, or law, this can have serious consequences.
The tech industry requires professionals who can not only interact with AI models but also comprehend and address their limitations. We need experts who can responsibly harness the power of these models for the benefit of society, rather than those who can craft clever prompts.
Shameless plug: If you are interested in acquiring these skills, want to fine-tune a model for your specific data and structure, or aspire to create a new AI product altogether, please consider exploring our agency, Grandiose. We offer flat rates and guarantee our turnaround times.