Searching Hubble’s Archive for Hidden Gems

Image: Spiral galaxy NGC 1309 hosted a supernova in 2012. By searching through archival Hubble data from 2005-2006, astronomers found out the type of star that exploded: a remnant white dwarf. [NASA, ESA, The Hubble Heritage Team (STScI/AURA), and A. Riess (JHU/STScI)]


This feature was originally published in the Spring 2020 (vol. 49, no. 2) issue of Mercury magazine, an ASP members-only quarterly publication.

For three decades, a treasure trove of data and discoveries about the cosmos has sat safely in a Maryland facility, stored on hundreds of tapes, laser-written optical disks, magnetic disks, and computer “jukeboxes.” Spurred by requests from around the world, workers in the ’90s and early ’00s would rummage through dense aisles of these data-filled vessels day in and day out to collect, copy, and share the information gathered by one of NASA’s most ambitious projects: the Hubble Space Telescope.

In its first 30 years of life, Hubble has faced peril as well as delights. The telescope’s breathtaking images have taught us to think deeply about our universe and have inspired generations of scientists to learn how to explore it. These data have led to the discovery of dark energy and supermassive black holes at the centers of galaxies, and also characterized alien worlds. But, while direct observations made using the telescope have been incredibly important for the science Hubble has been able to do, securing time on the telescope has continued to be competitive and challenging.

That is where NASA’s jukeboxes of data become essential. The Hubble Legacy Archive is a first-of-its-kind data archive of not only directly observed Hubble data but also high-level science products created by applying new analysis processes to decades-old data sets. While originally stored only on physical disks and tapes in the Maryland facility, this massive database of every Hubble observation has been moved to the cloud in recent years. Organizers have joined that data with the archives of NASA’s K2, the forthcoming James Webb Space Telescope, and other missions in a massive international archive called the Mikulski Archive for Space Telescopes (MAST).

While archives may conjure images of dusty museum basements with miles of drawers filled with ancient beaver pelts, the Hubble archive is one of the project’s most active and creative resources. In 2019 alone, published papers using Hubble observations were based on at least 50 percent archival data. Discoveries made using the Hubble Legacy Archive have not only shaped our understanding of science in the past decades, but the archive has shaped how we do science as well.

From a place to the cloud

Long before the archive was the international superstar it is today, Rick White, the Space Telescope Science Institute Archive branch chief, says it was simply a back room in Maryland filled to the brim with data on optical disks.

“The only way people could get data from the telescope is that it would flow from the telescope, get processed, get sorted and archived, and then they’d retrieve it from the archive — all data went through the archive,” says White. That’s because the raw data had to first be processed before it could be read by scientists.

The Hubble Legacy Archive has been collecting data since the telescope launched in 1990, and for many years it was the first stop before any data reached its intended scientists. But the influx of data was too large for typical storage systems, says White.

“The way it physically worked was that the volume of data coming in from Hubble in 1990 was too large to store on [normal] disks… so there was this big, complex set-up that involved writing the data onto optical disks, which were these big, 12-inch-sized platters,” says White. “There were optical disk ‘jukeboxes’ that had slots to hold something like hundreds of disks.”

These disks used to hold hundreds of gigabytes of data, but today the cloud-based database can hold hundreds of terabytes.

Long before the archive moved to the cloud and could be accessed around the world with a single click, archive staff would fulfill data requests by hand: they located the stored optical disks in their respective jukeboxes, copied the data onto tapes, and mailed the tapes to researchers. Each data request would be answered within 24 hours. While staffers often had to work hard to complete those requests on time, White recalls a particular event that sent requests through the roof, and it came not from the scientific community but from the general public.

In 2013, a UFO conspiracy website discovered an image in the archive that had strange lines and stripes. That website told its fan base how to access the Hubble archive to download the image themselves for proof. “There were several million people who were doing exactly the same thing: they were following the link from this post and they were downloading the image, and it completely saturated our web server,” says White. “Our web server is not set up to handle the interest of millions of people.”

That image was actually a composite of three exposures of Comet C/2012 S1 ISON, and those strange lines and stripes were just a parallax effect. The comet was so much closer to the telescope than the background stars that it moved between the individual exposures, smearing its image. White says the staff quelled the onslaught of requests by posting a short letter on the archive’s main webpage that showed those exposures and explained that there were, in fact, no photos of UFOs hidden in the archive.

Archived data keeps giving

While this influx of public requests created a temporary problem for the archive, Antonella Nota, associate director of the European Space Agency (ESA), says that the wide accessibility of data from the Hubble Legacy Archive has been an extremely important part of its success. It has not only increased the public’s access, but it has also had a powerful effect on how young astrophysicists access this data.

“In typical old-fashioned astronomy, the astronomers go to the telescope, get their own data and put it on magnetic tape, and then they take them home,” says Nota. “But Hubble basically broke that paradigm [because it] offered all the data in the same location and made them available to everybody.”

Before the Hubble Legacy Archive was established in 1990, observational data would be available only to those who conducted the initial observation. The notion that these data should be publicly shared after initial analysis by the observing scientists has created an important shift in how astronomical data is viewed and used, says Nota.

Joshua Peek, a Principal Investigator at MAST, adds that this has been incredibly important for his students. They are able to work with archival data and start getting their names into astronomy journals.

“One way we like to think about the archive is a way for people to get involved with Hubble data without having to go through the permission process of having their proposal approved,” says Peek. “That ends up as a way for them to join the scientific community and the literature, and then that’s a stepping stone into making proposals [for original observations] that are going to pass the committee.”

And even more than simply continuing the work that scientists set out to do in their initial proposals, the “patchwork” of observations collected and stored in the Hubble Legacy Archive allows scientists to do incredibly creative and innovative work. For example, Peek refers to one of his favorite unlikely discoveries: A team of researchers used archival data from Hubble’s fine-guidance sensors to overturn assumptions about the number of rocky objects at the edge of the Kuiper Belt, the band of objects (including Pluto) just beyond Neptune. They found an icy comet-like object just 3,200 feet (975 meters) across as it passed in front of a star. The tiny size suggests Kuiper Belt Objects are being ground down by collisions. “I really like when people use the entirety of the archive in some totally strange way,” Peek adds.

The breadth and history of Hubble’s observations also make the archive an incredibly useful resource. As new researchers return to old data with new computer algorithms, they can process data in ways that weren’t possible a few decades ago and discover new objects previously hidden in the data.

The archive’s legacy

As science missions and projects begin to shift their focus toward deep surveys of the sky, the Hubble Legacy Archive has an important role to play. “The [Rubin Observatory’s] Legacy Survey of Space and Time is going to be doing this enormous survey of the whole sky, many, many, many times over, which is going to give us this incredible time-domain perspective,” says Peek. “People are studying things that change in time, [and] what makes the archive so powerful is that it allows you to go backward in the other direction.”

When reflecting on the history of the Hubble Legacy Archive, as well as its future, it’s important to remember that there could be no archive without the Hubble telescope itself and the high-quality, science-ready data it provides. There is also a different mindset, says Peek, when it comes to studying the archival data. While directly observing photons with Hubble may seem immediate, there is a long wait before you can ever analyze that data. With the archive, getting hold of data is a lot more immediate, for everyone.

It is this shift in the way scientists can do science through these archives that will be the archive’s great legacy. “[A] philosophical change has been provided by the establishment of the archive. These last 30 years have been revolutionary for the way people do science,” says Nota. “It goes from the privilege of one to sharing with the entire world, because the archive is available to everybody — you just need to have an internet connection. This is to me the democratization of science.”

Sarah Wells is a science and technology journalist based in Boston who’s interested in how innovation and research intersect with our daily lives. Her work has been published in Undark, Inverse, and Gizmodo, among others.