Thursday, November 18, 2010

Readings

I had a bit of trouble finding the first reading of the week.  I'm not sure why the link provided said Pitt didn't have full-text access.  Perhaps I'll report that to someone in the collections department...

Anyway, much like other readings we've done this semester, the Web Search Engines reading really shed light on how search engines actually work.  We've all been talking about what is wrong with library searching and how Google is able to do such a good job of providing relevant results, but until this reading we hadn't really covered why Google is able to do it so well, or appreciated the amount of work that goes into indexing websites.  I can't say that I fully understand everything that goes into making a search engine work, but this two-part article made me realize that it really is a complex system.
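
To make the indexing part concrete for myself, here's a toy sketch (my own example, not anything from the article) of an inverted index, the basic data structure that lets a search engine answer a query without re-reading every page:

    from collections import defaultdict

    def build_index(pages):
        """pages maps a URL to its text; returns word -> set of URLs."""
        index = defaultdict(set)
        for url, text in pages.items():
            for word in text.lower().split():
                index[word].add(url)
        return index

    def search(index, query):
        """Return the URLs that contain every word in the query."""
        hits = [index.get(word, set()) for word in query.lower().split()]
        return set.intersection(*hits) if hits else set()

    pages = {
        "site-a": "metadata harvesting for digital libraries",
        "site-b": "search engines crawl and index the web",
    }
    index = build_index(pages)
    print(search(index, "index the web"))  # {'site-b'}

Real engines layer crawling, stemming, and ranking on top of something like this, which is where the complexity the article describes comes in.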

The second article, "Current developments and future trends for the OAI protocol for metadata harvesting," went back to the earlier themes of the class:  how librarians can work together with computer scientists to create a better way of searching and indexing.  The OAI community is trying to make things easier, but is running into trouble because of the way metadata works.  One of the biggest problems is, again, the difficulty of creating a standard vocabulary for entering that metadata.  Another problem they are facing is that the task is just too big for one group, and a relatively scattered group at that.  It seems like they want to be an almost "informal" group, but in doing so they are really crushing their chance to even begin to make a dent in the problems they face.
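
For anyone who, like me, finds the protocol abstract: an OAI-PMH harvest is basically a series of HTTP requests.  Here's a rough Python sketch of the ListRecords flow; the endpoint URL is made up, but the verb, metadataPrefix, and resumptionToken paging come from the actual protocol:

    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH XML namespace

    def harvest(base_url, metadata_prefix="oai_dc"):
        """Yield every <record> from a repository, following resumption tokens."""
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        while True:
            url = base_url + "?" + urllib.parse.urlencode(params)
            with urllib.request.urlopen(url) as response:
                tree = ET.parse(response)
            for record in tree.iter(OAI + "record"):
                yield record
            token = tree.find(".//" + OAI + "resumptionToken")
            if token is None or not (token.text or "").strip():
                break  # no token on the last page means the list is complete
            params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}

    # Hypothetical endpoint -- any OAI-PMH repository base URL would work here.
    for record in harvest("http://example.org/oai"):
        print(record.find(".//" + OAI + "identifier").text)

The mechanics are the easy part; the hard part, as the article says, is agreeing on what goes inside those metadata records.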


The first thing I thought of while reading "The Deep Web: Surfacing Hidden Value" was the idea that the more a journal article is cited, the more reputable it becomes.  This article is almost saying the opposite, if I'm understanding it correctly. (I know they are talking about websites rather than scholarly articles, but bear with my metaphor.)  Web crawlers such as Google's find websites to index based on how many times a site is linked from other sites.  So, the more a site is linked and clicked on, the more likely it is to be indexed by a search engine, and the higher it will appear in a results list.  The article seems to be saying that it is sometimes the more important sites that get skipped and left unindexed.  That is an interesting thought.  It makes you think that perhaps Google is doing more harm to the internet than good.
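
To see why link counting can leave important pages behind, here's a toy version of link-based ranking in the spirit of PageRank (my own illustration, not the article's; real ranking is far more elaborate):

    def pagerank(links, damping=0.85, iterations=50):
        """links maps each page to the list of pages it links out to."""
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            new_rank = {p: (1 - damping) / n for p in pages}
            for page, outlinks in links.items():
                targets = outlinks or pages  # a dangling page shares evenly
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            rank = new_rank
        return rank

    # "deep" is a valuable database nobody links to; "popular" gets all the links.
    web = {
        "popular": ["a"],
        "a": ["popular"],
        "b": ["popular"],
        "deep": ["popular"],
    }
    print(pagerank(web))  # "popular" dominates; "deep" sits near the floor

And in a crawl that only discovers pages by following links, "deep" might never even be found, which is exactly the article's point about the deep web.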

3 comments:

  1. Marc, that's an interesting comment you made about the OAI community's desire to remain informal actually making it more difficult for it to succeed. I think you are on the right track, and I also wonder whether the issue with a controlled vocabulary makes communication and access between the groups practically impossible.

  2. Hello Marc,

    I like your line of thought regarding Google's relevance being tied to how many times a site is linked or clicked on. It does bring up the idea that popular is treated as better, even if it is not always correct or even fully relevant to what you are looking for.

    So now I have to ask: if libraries and other institutions take the Google approach, will we have to worry about navigating "popular" choices instead of things actually relevant to the search, or is there a way we can avoid this altogether?

  3. I agree, you've gotta wonder, now, about the rankings on a results list. I sure hope it's not the case that the more important sites are getting passed over, simply because they weren't linked from other Web sites enough to warrant indexing of their own! Kind of makes you think about the results you get, don't it? This was eye-opening to me. I think we DO need to worry about the "popular" vs. "relevant" results, and I'm guessing we will likely encounter this kind of dilemma every day with patrons/users/clients who sit down at a computer and do a Google search.
