Every Alley
I once worked on a project normalizing a massive set of image data.
Terabytes of images, read one by one. Standardize formats. Filter out corrupted files. Attach metadata. Tedious work. Run the pipeline, wait, check the results, tweak the parameters, run it again. Nothing glamorous about it. But without that mud, no analysis or machine learning downstream would have worked.
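In spirit, the loop was no more complicated than this. A minimal sketch, not the project's actual code; the directory names, the Pillow usage, and the metadata fields are all illustrative assumptions.

```python
# Walk a directory of images, skip files that fail to decode, convert
# everything to RGB JPEG, and record basic metadata. A sketch only;
# paths and metadata fields are hypothetical.
import json
from pathlib import Path
from PIL import Image, UnidentifiedImageError

SRC = Path("raw_images")    # hypothetical input directory
DST = Path("normalized")    # hypothetical output directory
DST.mkdir(exist_ok=True)

records = []
for path in SRC.rglob("*"):
    if not path.is_file():
        continue
    try:
        with Image.open(path) as img:
            img = img.convert("RGB")          # standardize the format
            out = DST / (path.stem + ".jpg")
            img.save(out, "JPEG", quality=95)
            records.append({                  # attach metadata
                "source": str(path),
                "output": str(out),
                "width": img.width,
                "height": img.height,
            })
    except (UnidentifiedImageError, OSError):
        continue                              # filter out corrupted files

with open(DST / "metadata.json", "w") as f:
    json.dump(records, f, indent=2)
```

The code is trivial. Running it over terabytes, watching it fail, and rerunning it is the part that takes the weeks.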
Behind every dominant service, the work is mostly mud.
Google Maps pointed a camera down every back alley. They drove cars, pedaled bicycles, strapped trekker rigs onto people's backs and sent them walking where vehicles couldn't go. Satellite photos stitched and corrected by algorithms. Address data cross-referenced. Store listings verified one by one. Behind that map is a staggering amount of physical labor.
X's timeline handles a real-time stream of posts from hundreds of millions of people writing simultaneously. Follow-graph expansion, ranking, filtering. Behind the smoothness of that scroll are algorithms that border on stunt work and designs refined over years.
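The shape of that pipeline, and only the shape, fits in a few lines. X's real system is vastly more elaborate; every name below and the toy recency ranking are assumptions.

```python
# Candidate generation from the follow graph, a toy ranking pass,
# and a filter pass. A sketch of the pattern, not X's architecture.
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    text: str
    timestamp: float
    blocked: bool = False

def build_timeline(user, follows, posts_by_author, now, limit=50):
    # Follow-graph expansion: gather candidates from followed accounts.
    candidates = [p for a in follows.get(user, ())
                  for p in posts_by_author.get(a, ())]
    # Filtering: drop anything the user should not see.
    candidates = [p for p in candidates if not p.blocked]
    # Ranking: newest first here; real rankers weigh hundreds of signals.
    candidates.sort(key=lambda p: now - p.timestamp)
    return candidates[:limit]
```

Anyone can write this version. Making it hold up under hundreds of millions of concurrent writers is the decade of work.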
Machine learning is the same. Running the training itself isn't that hard anymore. Stack GPUs, plug into a framework, tune parameters. But collecting the source data is effort of a different order of magnitude. Labeling, cleaning, checking for bias, clearing rights. Tedious, endless, thankless. Training is the final push. The 99% before it is mud.
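At its smallest scale, that mud looks something like this. The record shape and label set here are assumed for illustration: dedupe by content hash, drop malformed labels, and eyeball the label distribution for skew.

```python
# Dedupe records by content hash, reject invalid labels, and report
# the label distribution so obvious bias jumps out. A sketch under
# assumed field names; real cleaning runs far longer than this.
import hashlib
from collections import Counter

VALID_LABELS = {"cat", "dog", "other"}   # illustrative label set

def clean(records):
    seen, kept = set(), []
    for rec in records:
        digest = hashlib.sha256(rec["text"].encode()).hexdigest()
        if digest in seen:                    # exact-duplicate check
            continue
        if rec["label"] not in VALID_LABELS:  # malformed-label check
            continue
        seen.add(digest)
        kept.append(rec)
    # Crude bias check: a wildly skewed label count is a red flag.
    print(Counter(r["label"] for r in kept))
    return kept
```

None of it is hard. It is just endless, and it decides whether the training run means anything.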
AI made output fast. Write a prompt and code appears. Images appear. Text appears. But the people who built dominant services spent their time photographing alleys one by one, fixing broken HTML one file at a time, labeling data one record at a time. Work that can't be done with a few clicks, sustained for years.
I want to be part of building a dominant service. Instead, I'm here typing prompts.