Archive.org Celebrates Its 20th Anniversary (sfchronicle.com)
20 years ago this week, Archive.org started with just 500,000 sites. An anonymous reader quotes the San Francisco Chronicle:
Now, the nonprofit San Francisco organization -- which celebrated the milestone with a party Wednesday night -- curates a vast digital archive that includes more than 370 million websites and 273 billion pages, many captured before they disappeared forever. It's more than an archive of Internet sites. The organization, founded by computer scientist and entrepreneur Brewster Kahle, now has a virtual storehouse ranging from digitally converted books and historic film to funny memes and audio recordings of Grateful Dead concerts...
The Internet Archive has survived through community donations and by working with about 1,000 libraries around the world that pay the group to help digitize books and other material. But the site itself remains free.
We've written about Archive.org over the years, and its collection of 2,400 DOS games, over 10,000 Amiga games (and other software) and a massive collection of arcade machine emulators. And here's what Slashdot looked like back in 1998. But what's your favorite page on Archive.org?
robots.txt (Score:5, Informative)
One thing I greatly dislike about archive.org is that they retroactively apply current robots.txt contents to archived versions of a site.
I had a website that I sold years ago which now has a no-crawl directive, so its entire history is gone from the archive. Why would they remove archived versions that were captured when crawling was permitted?
Re: (Score:2)
Re: (Score:2)
I couldn't agree more. It is baffling to me that they would do such a thing.
I'm guessing it's tied to some sort of legal liability issue or something like that. If it's not, then I'd love to hear their reasoning on why they do this.
Re: (Score:2)
https://archive.org/post/10194... [archive.org]
https://news.ycombinator.com/i... [ycombinator.com]
https://archive.org/post/18880... [archive.org]
But no solid answers as to why.
Re: (Score:2)
Robots.txt isn't supposed to be a tool for de-indexing yourself anyway -- it's only supposed to control spidering. Archive.org should define a separate file for sites that want to be de-indexed, and that file could specify a retroactive date if desired. Or just make people fill out a form on the site to de-index.
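The distinction drawn above can be sketched with Python's standard `urllib.robotparser`: a robots.txt file only answers whether a given user agent may fetch a given URL right now, and it carries no dates at all, so applying it to old snapshots is a policy choice on the archive's side. The second function models the proposed de-index file with a cutoff date. The robots.txt contents, the `ExampleArchiveBot` user agent, and the `snapshot_visible` rule are all hypothetical illustrations, not how Archive.org actually works.

```python
from datetime import date
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt published by the original owner (crawling allowed).
OLD_ROBOTS = """\
User-agent: *
Allow: /
"""

# Hypothetical robots.txt published later by a new owner (crawling blocked).
NEW_ROBOTS = """\
User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, url: str, agent: str = "ExampleArchiveBot") -> bool:
    """Answer the only question robots.txt encodes: may `agent` fetch `url` now?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def snapshot_visible(captured: date, deindex_from: Optional[date]) -> bool:
    """Hypothetical de-index rule with an explicit cutoff date.

    Snapshots captured before the cutoff stay public; later ones are hidden.
    None means the owner has not asked for de-indexing at all.
    """
    return deindex_from is None or captured < deindex_from


if __name__ == "__main__":
    url = "http://example.com/old-page.html"
    print(may_crawl(OLD_ROBOTS, url))  # True: the old file permitted crawling
    print(may_crawl(NEW_ROBOTS, url))  # False: blocked now, but no dates involved
    # Hypothetical new owner who took over in 2010 hides only their own era:
    print(snapshot_visible(date(2005, 6, 1), date(2010, 1, 1)))  # True
    print(snapshot_visible(date(2012, 6, 1), date(2010, 1, 1)))  # False
```

Under a rule like this, a new domain owner could hide snapshots from their own era without erasing the previous owner's history, which is exactly what the sold-website case above is missing.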
Gimp (Score:2, Insightful)
Saw the article on Gimp in the Slashdot screenshot and realized that it still sucks as badly now as it did 18 years ago.
Re: (Score:1)
Why is the Gimp icon the only one that works?
One cool thing I found... (Score:2)
Audio of back-to-back Jefferson Airplane concerts from October 1966--Signe Anderson's farewell show with the band, followed by Grace Slick's first one the following evening, both at the Fillmore. Part of the Anderson gig was eventually (sometime in the 2000s, I think) released commercially.
She died earlier this year--on the same day as Paul Kantner, IIRC.
Archive.org is getting so old (Score:2)
Re: (Score:1)
There are two separate copies of the Internet Archive. One is at the Bibliotheca Alexandrina (the modern Library of Alexandria), and the other is a distributed system run by many of the same people who are part of Archive Team.
this is my favorite section (Score:2)
If I owned a shortwave broadcasting station, I would play those old radio shows exclusively.
i remember (Score:3)
Re: (Score:2)
Didn't Work. (Score:4, Funny)
Re:Didn't Work. (Score:5, Funny)
Re: (Score:3)
I use Apple's Time Machine. It doesn't work.
Re: (Score:3)
You were holding it wrong!
Re: (Score:3)
Steve Jobs, is that you? Aren't you dead? :P
Re: (Score:3)
Time machine, remember?
Re: (Score:2)
Prove it. :P
Re: (Score:2)
At first, once registration was required to post, I refused to register--over those pesky, untrustworthy cookies.
It took a while before I wanted to post something badly enough to overcome my distrust and allow anything to set a cookie...
10th anniversary was better! (Score:2)
On their 10th anniversary, [archive.org] their front page had things you wanted to read about and things you cared about: conspiracies!
9/11 Revisited: Scientific and Ethical Questions
September 11th Revisited - Were explosives used?
;)
A few other good things (Score:2)
The Old Time Radio archive, the Public Domain Movies collection, and some Kodi add-ons.
10,000 Amiga games GONE (Score:2)
All of the Amiga games were taken down after about a week.
Robot reading of "The Book of Urantia" (Score:2)
I like the audio recordings of the entire "Book of Urantia" read in a computer-generated robot voice, for example "The Paradise Sons of God" from "The Central and Superuniverses": https://archive.org/download/U... [archive.org]
Thousands of games (Score:2)
Most of the current rights holders won't care much, but guys, this isn't right. And surely someone at Archive.org knows this.
Suspicious Treatment of Domain Drop Catching (Score:3)
Archive.org plays it dumb when archived content becomes unavailable due to a domain drop catcher [wikipedia.org] placing a robots.txt archiving exclusion on the domain.
This would not be quite so suspicious were it not for the fact that when the original author of the material "memory holed" by archive.org pays the extortion to the domain drop catcher and requests that archive.org restore the content for the public, archive.org frequently (always?) fails to do so.
Archive.org's motive?
What is Google's motive for making its Usenet archives virtually unusable?
He who controls the past... [goodreads.com]
Bravo, it did not save the mp3 files (Score:1)
Re: (Score:1)
I love archive.org (Score:2)