Size Matters
Sending 100x more satellite data back down to Earth
Dear SoTA,
Before we started The Compression Company, Joe (my cofounder) and I spent a lot of time thinking about computer vision. Specifically, about what Ultralytics and Roboflow had done for it. These two companies took something that previously required a team of ML engineers and collapsed it into a workflow a single factory engineer could operate: upload your images, label your defects, train a detector, deploy it on the line. Thousands of manufacturers now run custom vision models that would have been research projects a decade ago. It is one of the more under-appreciated infrastructure shifts of the last few years.
But every conversation about deploying CV at scale runs into the same wall: labelling. Annotation accounts for over half the cost and timeline of most vision projects. Each new product line, each change in lighting or camera angle, each new defect type requires human annotators who understand the domain. For a factory, that means training people to recognise things only a handful of specialists can identify. It is slow, expensive, and fragile. The model is only as good as its labels, and maintaining label quality over time is its own discipline.
We kept coming back to a question: are there tasks with the same shape (learn a function from data, deploy on edge hardware) but without the labelling requirement? Tasks where the data supervises itself.
Compression is exactly this. When you train a neural network to compress an image, the target is the original image. There are no labels. You give the network data and it learns to reconstruct that data from a smaller representation. The training signal comes entirely from the data itself. What this means in practice is that you can take a customer’s data, from whatever sensor they happen to operate, and train a compression algorithm for it without asking anyone to label anything. The customer provides data and a deployment environment. We train and deliver a codec that fits their needs. That is the basis of The Compression Company.
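To make that concrete, here is a minimal sketch of the idea in PyTorch. The architecture and sizes are illustrative rather than anything we ship, and a production codec would also learn an entropy model and trade reconstruction quality against bitrate; the point is simply that the training target is the input itself.

    import torch
    import torch.nn as nn

    # Toy learned codec: the encoder maps an image to a smaller latent,
    # the decoder reconstructs the image from that latent.
    class TinyCodec(nn.Module):
        def __init__(self, latent_channels=64):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, latent_channels, 3, stride=2, padding=1),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(latent_channels, 32, 4, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))

    model = TinyCodec()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loader = [torch.rand(8, 3, 64, 64) for _ in range(10)]  # stand-in for unlabelled sensor imagery

    for batch in loader:
        recon = model(batch)
        loss = nn.functional.mse_loss(recon, batch)  # the target is the original image; no labels
        opt.zero_grad()
        loss.backward()
        opt.step()

Swap in different data (and adjust the input channel count) and the same loop trains a codec for a multispectral tile or a LiDAR range image; nothing in it ever asks for a label.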
There are many industries where data volume is a bottleneck, but few where it binds more tightly than Earth observation. A 2024 paper from Carnegie Mellon and Microsoft estimated that roughly 2% of satellite-collected data ever reaches the ground. When we started speaking with operators, what we heard was that the real constraint was not the sensors, which have improved enormously, but the bandwidth of the downlink and the capacity of onboard storage. Millions of dollars go into extraordinary cameras in orbit, yet the factor that determines how much value they deliver is how quickly data can move through a radio-frequency link during a ten-minute ground-station pass.
Most of these operators compress with JPEG 2000 (named after the year it came out): a capable codec, but a general-purpose one, designed for consumer photography and barely adapted to the Earth observation data it is actually processing. This is not unique to satellites. Compression has been general-purpose for most of its history, largely because the dominant data type was video and there wasn’t enough pressure to specialise for anything else. That is changing. Earth observation, LiDAR, medical imaging, agricultural and emergency-response drones: there is now enough volume and variety of high-value sensor data to justify building codecs for specific modalities, trained on the data itself, that outperform general-purpose alternatives by a wide margin.
Consider a wildfire response team receiving full-fidelity satellite footage in the field, rather than a downsampled preview that strips out the detail they need to coordinate an effective response. Or a physician sharing a diagnostic MRI with a remote specialist during the consultation itself, rather than waiting for an overnight file transfer that delays treatment by a day. Or a fleet of agricultural drones mapping crop health at centimetre resolution across thousands of hectares, where preserving key detail is the difference between catching blight in one field and missing it until it has spread to twenty. These are real workflows, all constrained today by the compression (or lack thereof) sitting underneath them.
Compression is self-supervised, which means building these codecs can be fully automated. An agent can ingest a customer’s data, characterise it, train a codec, validate it, and deploy. The labelling bottleneck that constrains computer vision does not apply here, which means that in a world of increasingly capable agents, compression is a task that compounds.
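A toy outline of that loop, with every function a stub standing in for real machinery (none of these names are our actual API), just to show that no step consumes a label:

    import numpy as np

    def characterise(data):
        # e.g. bit depth, band count, noise statistics, pulled from the data itself
        return {"bands": data.shape[-1], "dtype": str(data.dtype)}

    def train_codec(data, profile):
        return "codec.pt"          # placeholder for a self-supervised training run like the one above

    def validate(codec, holdout):
        return {"psnr_db": 0.0}    # placeholder rate-distortion report against a baseline codec

    def deploy(codec, target):
        print(f"packaging {codec} for {target}")

    raw = np.random.rand(1000, 64, 64, 8)      # stand-in for a customer's unlabelled imagery
    profile = characterise(raw)
    codec = train_codec(raw[:900], profile)
    report = validate(codec, raw[900:])
    deploy(codec, "edge hardware")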
We ship our first orbital deployment next month. If you work on compression, sensor systems, or anything where data outpaces the ability to move it, we would like to hear from you.
Help us build the next small thing!

