Preserving the Internet

Mark Graham, director of the Wayback Machine at the Internet Archive, explains his work preserving digital history in the face of fewer companies preserving their web content and the White House directives to take down government web pages. Plus, Annie Rauwerda, who runs the Depths of Wikipedia social media accounts, weighs in.
Brigid Bergin: It's the Brian Lehrer Show on WNYC. Good morning again, everyone. I'm Brigid Bergin in for Brian today. The internet is a powerful tool that offers us access to a wealth of information at our fingertips, but that information is also subject to change. Some changes are garnering increased attention as the Trump administration deletes government web pages that go against the administration's views. Names, words, whole portions of the nation's history are being removed, seemingly at the press of a button.
There are tools that can prevent this erasure through a digital purge. That's where the Internet Archive comes in, in particular the Wayback Machine, which has been preserving relics of the internet for decades. Their first snapshot of WNYC's website was all the way back in 1998. I cannot imagine what that looks like.
To understand the history of the Wayback Machine and the work that they're doing now, we're joined by Mark Graham, director of the Wayback Machine at the Internet Archive. Mark, welcome to WNYC. It's so great to talk with you.
Mark Graham: Hey, fantastic. I'm really happy to be here. It truly warms my heart that we had the capture, the archive of your website from so long ago. I hope it played back well.
Brigid Bergin: I have to go back and look at it. I have not seen that yet. I am definitely gonna check that out. Now, before we get too into the weeds, let's talk about, very broadly, what is this platform that we call the internet?
Mark Graham: That we call the internet? It's a nervous system, a communications platform for humanity. It's the underlying fabric upon which applications like the web have been built. The web is obviously this environment where, for the last several decades, human beings all over the world have shared their experiences. It has become the communication and publishing platform of our times.
Brigid Bergin: Then how did you personally get involved with the Internet Archive and what is its mission?
Mark Graham: Oh, super briefly. I've been working for more than 40 years with this thing called the Net, which evolved into the internet and then the web, et cetera, at a whole variety of companies. My last job was with NBC News, for example.
About 10 years ago there was an opportunity to join the organization, the Internet Archive, which is a nonprofit digital research library with the audacious mission of universal access to all knowledge. How could anyone miss an opportunity like that?
Brigid Bergin: It sounds fascinating. Let's talk a little bit more about the work that is done there. The Wayback Machine opened to the public back in 2001, but it started caching websites, capturing snapshots of that data, all the way back in 1995, which is pretty early in terms of the internet. How did the idea come about to keep historical tabs on what was then such new technology?
Mark Graham: Sure. The founder of the Internet Archive, Brewster Kahle, who was one of the pioneers of what was then the emerging web, had this idea early on, about 30 years ago, that this new medium that he and others were creating was going to be important. At the same time, it was somewhat inherently ephemeral. That material that was made available via the internet, via websites, was pretty fragile.
Unlike books and other media on paper, like magazines, of which we have lots of copies that are typically spread around, websites tend to be singular entities. If a website went down, if a company went out of business, if a hard drive failed, any one of a number of reasons, that material probably would be lost. What we've learned over the last three decades is that that is entirely true.
About 28 years ago, Brewster basically pressed record on this new medium called the web. Every day since that time, the team at the Internet Archive, we've got about 140 staff members, has worked hard to archive a large amount of the public web. Today, we archive about a billion URLs every single day. We make that material available publicly at web.archive.org.
Brigid Bergin: That's amazing. Listeners, have you used the Wayback Machine before? I know that if there are any campaign consultants out there, you've used the Wayback Machine before. Are you using it now to find web pages that seem to be gone? Is there a particular webpage that you've noticed is missing and you want to talk about, you want to share with us? Maybe you lost some of your own work.
Do you use the Wayback Machine for research or fun? Or is there a particular relic of the internet that you just want to talk about? Give us a call. The number is 212-433-WNYC. That's 212-433-9692. Mark, I mentioned in the introduction that there are some attacks on our digital landscape that are somewhat unprecedented. I'm wondering, has the Internet Archive seen anything like this before?
Mark Graham: We've seen attacks on libraries throughout history. The story of libraries traditionally is attacks by church and state. Most libraries have been destroyed actually by those in power. Attacks on libraries are not anything new. What we're seeing right now, attacks on digital libraries, is also, unfortunately, not without precedent.
Typically in other countries when regimes change, often material from the prior regime is erased. We at the Internet Archive, once again, we're a digital research library. As a library, we support and promote the four digital rights of libraries, the right to collect material, to preserve material, to lend material, and to cooperate with other libraries. We think that if we can strengthen the infrastructure of libraries all over the world, then that will be of supreme benefit for humanity.
Brigid Bergin: Mark, are there any particular web pages that you wish you had been able to save on the Wayback Machine but you weren't able to save?
Mark Graham: Let me put it this way. There have been a number of studies that have looked at the atrophy of the web. There were some studies done at Harvard University's Library Innovation Lab that looked at URLs in Supreme Court opinions and also URLs in New York Times articles.
What they found was that, typically, tens of percent of those web pages are no longer available on the live web after 10, 15 years. There was a Pew Research study last year that looked at a series of URLs that were 10 years old, and they found that 38% of them were no longer available on the public web.
The good news is that when we went and looked at that material and asked how many of those were available in the Wayback Machine, generally speaking, more than half of those URLs had been archived by our service over the decades. The answer to your question is, unfortunately, the 30 or 40% of those URLs that we hadn't archived are certainly ones that I wish we had.
Brigid Bergin: If you're just joining us, I'm Brigid Bergin filling in for Brian. Today, my guest is the director of the Wayback Machine, Mark Graham. Joining us for just a few minutes is Annie Rauwerda, internet personality and journalist covering the internet, best known for the Depths of Wikipedia social media accounts she runs. Annie, welcome to the show.
Annie Rauwerda: Hey, thanks for having me.
Brigid Bergin: Annie, briefly, how do you utilize the Wayback Machine and other tools from the Internet Archive in your online presence?
Annie Rauwerda: I use the Internet Archive all the time. I edit Wikipedia, and Wikipedia has so many citations, like 30 million citations. Like Mark was saying, it's very common that you'll be reading a Wikipedia article, you'll see a citation, you'll click the link, and there will be nothing there. That's why the Internet Archive is so important. There's a partnership between Wikipedia and the Internet Archive to make sure that all of Wikipedia's citations are archived.
Brigid Bergin: Oh, that's so interesting. I know a lot of what you cover is maybe a little bit more lighthearted than some of the challenges, perhaps would be a better way of saying it, facing the Wayback Machine today. Would you make the argument for increasing the awareness of this digital history even when the stakes aren't as high?
Annie Rauwerda: Oh, absolutely. There's so many pages on the internet that are labors of love by interesting people that go away. Maybe they stop paying for their domain, maybe the platform that it's hosted on goes away. I am a big fan of oldmouth.com. Somebody in Montana spent two decades meticulously maintaining a corpus of computer mouse information. Why? Just because they were interested. Then it went offline and so now it's only available via the Wayback Machine.
Brigid Bergin: Mark, I hear you with some reaction to Annie. What are you thinking?
Mark Graham: Oh, I love it. Hey Annie, how's it going?
Annie Rauwerda: Hi, Mark.
Mark Graham: Hey there. We've been working for a long time with the Wikipedia [unintelligible 00:10:37] all over the world and the Wikimedia Foundation here in the United States to archive material that's referenced in Wikipedia articles at nearly the same time as people post it.
Then going back over time, and every single day going through the full corpus of all Wikipedia articles, hundreds of millions of them, looking for broken links and, where we can, fixing them. To date we've been able to rescue more than 23 million URLs in Wikipedia articles that had gone bad, that had returned a 404, and edit them to point to archives of those URLs available from the Wayback Machine.
Brigid Bergin: I want to bring in one of our listeners, Leah in Manhattan. Leah, thanks for joining WNYC. Leah, are you there? I think we don't have Leah on the line. I want to thank Annie Rauwerda of Depths of Wikipedia for your time and for giving us another perspective on this crucial issue. Annie, thanks for making some time to join us.
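The link-repair step Mark describes (find a citation that returns a 404, then point it at an archived copy) can be sketched roughly as follows. This is a minimal illustration and not the Internet Archive's or Wikipedia's actual tooling: the `Citation` record, its field names, and the `repair_citation` helper are hypothetical, and the sketch assumes that playback addresses take the form `web.archive.org/web/<timestamp>/<original URL>`. It makes no network requests.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed playback prefix for archived copies in the Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web"


@dataclass
class Citation:
    """Hypothetical record for one cited link (not a real Wikipedia data model)."""
    url: str
    http_status: int                          # status seen when the live link was checked
    snapshot_timestamp: Optional[str] = None  # e.g. "20150102030405" if a capture is known


def wayback_url(original_url: str, timestamp: str) -> str:
    """Build the address of an archived copy from a capture timestamp."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


def repair_citation(citation: Citation) -> str:
    """Return the link a reader should follow.

    A link that returned a 404 and has a known capture is pointed at the
    archive; everything else is left untouched.
    """
    if citation.http_status == 404 and citation.snapshot_timestamp:
        return wayback_url(citation.url, citation.snapshot_timestamp)
    return citation.url
```

In this sketch a dead link with no known capture is returned unchanged, which matches the "where we can" caveat: a link can only be rescued if it was archived before it disappeared.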
Annie Rauwerda: Oh, of course. Thanks for having me.
Brigid Bergin: I want to continue now with you, Mark, the director of the Wayback Machine at the Internet Archive. As we talked with Annie about, this is open to the public. Does that mean that anyone can save a page to the database if they want to?
Mark Graham: Yes, it does. Anyone can go to web.archive.org/save. That is the entry point for our Save Page Now service. Anyone can enter a URL, hit the return key, and we will do our best job of archiving that. They can then see it and share it with other people. We will maintain that, hopefully forever.
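For listeners who would rather script this than type into the form, a small sketch of building a Save Page Now address might look like the following. The entry point web.archive.org/save comes from the conversation above; appending the target URL to that path is an assumption about how the service accepts requests, and the `save_page_now_url` helper is a hypothetical name. The sketch only builds the address and does not contact the service.

```python
from urllib.parse import urlsplit

# Entry point named in the interview.
SAVE_ENTRY_POINT = "https://web.archive.org/save"


def save_page_now_url(target_url: str) -> str:
    """Build a Save Page Now address for target_url.

    Assumption: the service accepts the page to capture appended to the
    entry point's path. Only absolute http(s) URLs are accepted here.
    """
    parts = urlsplit(target_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"expected an absolute http(s) URL, got {target_url!r}")
    return f"{SAVE_ENTRY_POINT}/{target_url}"
```

Opening the resulting address in a browser should have the same effect as entering the URL on the Save Page Now page, if the path-style assumption holds.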
Brigid Bergin: I think we can go back to Leah in Manhattan now. Leah, do we have you now?
Leah: Hi. Yes, I hear you.
Brigid Bergin: Good. Go ahead.
Leah: Oh, yes. I am an archivist by my skilled profession. I use the Internet Archive to, of course, capture everything that is digitally created for intellectual property. My tradition is for primary source materials. One of the things that I love, I also teach about web archiving online.
I tell people all the time that you can go to the Internet Archive and you can save any page with the Save Now option, and to always do that just in case something is important and you don't want it to get lost. Any recommendations for anyone who wants to just save stuff on the internet as quickly as possible?
Brigid Bergin: Leah, thank you so much for that. Mark, I'm sure a call like that has to be a little heartwarming to hear people talking about the benefits of this tool that you work on. Any reaction to Leah's call?
Mark Graham: It totally is. Thank you, Leah. We like to say, if you see something, save something. I'd like to talk a little bit about what's happened most recently in our government and with this new administration. Every four years, the Internet Archive has gotten together with a number of other organizations to embark on a project that we refer to as the End of Term Archive.
We've been doing this since 2004. What we do is do a deep dive on US Government websites before and after presidential elections. We started this effort again last summer. We did a very comprehensive archiving of material published by the US government, which, by the way, is the world's largest publishing organization.
We did that before the election, we did it again after the election, and then we did it again after the inauguration. We have, as part of this End of Term Archive project, accumulated about 2 petabytes of material from the web, and we make this material available for bulk downloading for researchers and also available through the Wayback Machine.
Brigid Bergin: I think part of what I'm understanding from that is that perhaps some of the websites or webpages that people are not able to find now may have been successfully archived through that effort before the change of administration. Is that correct?
Mark Graham: Correct. Yes. Like I said, this is something that we've done every four years. It wasn't anything new related to this particular administration. However, what we've all noticed is that immediately after the inauguration, certain web materials started to be removed. Most notably, for example, the entire website for the USAID was taken down on Inauguration Day.
Researchers, journalists and others have been combing through this material, through these archives and have identified tens of thousands of web pages that have been removed since the inauguration.
Brigid Bergin: Let's bring in a few more callers. Let's try Jessica in Bayside, Queens. Jessica, thanks for calling WNYC.
Jessica: Hi. Thank you so much. Thank you for taking my call. Thank you to Mark and the team. I'm on one of the community boards in New York City. There's 59 of them, and we're appointed by the borough president and oversee various things. Part of what the community board staff is required to do is publicize that there are upcoming meetings, either full meetings of the board or meetings of the committees two weeks before the meetings.
I've been having some back and forth, push and pull with the overseers of the board, the district manager and the community board president about how that hasn't been done. They're blaming it on the Department of Technology with New York City or else telling me that I'm incorrect and that things were posted, but because of the Internet Archive and the fact that I'm able to save those individual pages, I'm able to show what was actually happening and that they were not posted in time.
Also show previously what the norm was for the other district manager before the current one, because there should be some standard, and it doesn't seem like there is, because they're able to hide behind just whatever they're giving. It's almost like a Freedom of Information Act request without all of the paperwork. That's really helpful.
Brigid Bergin: I love that description. Jessica, thank you so much for that explanation and for essentially, the reporting that you're doing and accountability work within your local community board using this tool. Let's bring in Gregory in-- Oh, do you want to respond to that?
Mark Graham: I just want to comment on that. That's wonderful. The End of Term Archive project is part of something we call Democracy's Library. That's an effort of the Internet Archive to focus on material published by democracies around the world, especially the United States and Canada.
The use of the archives by journalists, by people who are trying to reflect on our current history and past history, is a wonderful use case. There have been more than 10,000 articles that have referenced the Internet Archive or written about us. They're available from archive.org/about/news-stories.
Brigid Bergin: I want to bring in a question from a listener that was texted, before we get to one more caller. The listener texts, "What about the right to be forgotten when it comes to the Wayback Machine?"
Mark Graham: Sure. That's a great question. I'll just say that we respect our rights holders, and that, generally speaking, URLs may be excluded from the Wayback Machine at the request of legitimate rights holders.
Brigid Bergin: Very interesting. There is a right to be forgotten. Gregory in Eldred, New York, thanks for calling WNYC.
Gregory: Oh, great. Thanks. First time caller here. As soon as I heard the discussion, I knew I had to give thanks. I'm an amateur historian. From about the early '80s through the mid-'90s, a tremendous number of, I guess we'll call it indigenous websites, were up on the web, where individual World War II veterans and their families put up the veterans' personal accounts and photographs.
That was parallel with a movement revelation where the veterans were revealing themselves and the experiences that they had. As time went on, the websites faded away. There were even many from that time period that lasted for another 10 or 15 years. The critical elements there were not just the experiences of the vets, but an area that didn't have much exposure, and that was the prisoner of war experience.
My website was made in the '90s in association with some other folks whose parents were World War II vets. And we put this thing together and we're still active. The Wayback Machine, I don't know exactly when I became aware of it, but I'm able to go there using keywords and retrieve some of these, I will call them gems. What's the right word?
They're naïve in that respect. They came from a time before we used the phrase the greatest generation. Lots of that stuff is now-- Thank you for your service. You're a great hero. These stories were coming from a very naïve place where people were just speaking of their experience, not for fame and glory.
Brigid Bergin: Gregory, thank you so much for that story. I just wanted to give you a chance, Mark, to respond to-- I'm sure stories like Gregory's are ones that you've heard in terms of the types of both personal and historical information that has been created and existed on the internet and now may only be something you can find through using the Internet Archive.
Mark Graham: I was at the edge of my chair there hoping that he was going to say that we had, in fact, archived a lot of that material. It is. It's the individual life stories, it's the big stories of our times that are published by media organizations, made available by governments, academia, et cetera. It's the full breadth of the human experience that we at the Internet Archive work to collect, to preserve, and most importantly, to make available.
We think that a healthy archive is an archive that is used. I would just like to encourage all of the listeners, if you haven't, to go to archive.org, that's the overall website for the Internet Archive, or web.archive.org for the Wayback Machine, and explore. Maybe some of your own history is recorded there that might not be available anywhere else.
Brigid Bergin: In our last minute or so, I'm going to ask two very big questions, Mark. One, there have been requests in the past for websites to be taken down. In 2002, there was a controversy where the Church of Scientology demanded a critical website, not owned by the church, be taken down, and the Wayback Machine complied. How will the Internet Archive stand up to potentially similar requests from the Trump administration?
Mark Graham: Gratefully, we haven't received any. That's an interesting hypothetical question. I would say that we evaluate every single request like that individually. We take into consideration a variety of factors. The public right to know generally being one of them, and the historical significance of the material, et cetera. I have to say, maybe I'm naïve here about this, but I don't anticipate that particular threat.
This is material that was produced by the United States government, that has been published on US Government websites, and that we as a library have simply done-- We're doing our job as a library, frankly, to help make up for a flaw of the web, an architectural flaw of the web: that it does not inherently archive or preserve material. We're going to just continue to press forward and work to be the very best library that we can possibly be.
Brigid Bergin: Then, my final question, and just very briefly, but you've talked about archiving just enormous amounts of information. As I understand it, a lot of it is stored on physical computers. Is there a risk of running out of space? Then how do you protect those machines, short-term and long-term, from physical risks, from hacking, or from any other risk to what is essentially this archive?
Mark Graham: Sure. As long as people continue to make hard drives and we're able to buy them, then I think we're going to be okay. We add somewhere in the neighborhood of 20 petabytes of new hard drives every single year.
Brigid Bergin: I've never even heard of a petabyte. That sounds so enormous.
Mark Graham: It's 1,000 terabytes. We store this material in multiple physical locations. That's one of the ways that we work to ensure the integrity of the material long-term. I think we're going to be okay there. We survive through the kindness of others. We're a non-profit organization. I think if people want to help support us, I would say there's two things they can do.
The first thing they can do, more than anything else, is use us, just like the various callers shared. Come to our website and explore. We are here for the curious. Then the second thing is if you want to support us in our work, we're always grateful for that as well. You don't have to. Just use us. That's the most important thing, and tell your friends.
Brigid Bergin: Thank you so much. Our guest has been Mark Graham, director of the Wayback Machine at the Internet Archive. Mark, thank you so much for coming on the show today. It was really fascinating.
Mark Graham: Fantastic. It's been a pleasure. Thank you very much.
Copyright © 2025 New York Public Radio. All rights reserved. Visit our website terms of use at www.wnyc.org for further information.
New York Public Radio transcripts are created on a rush deadline, often by contractors. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of New York Public Radio’s programming is the audio record.