PETER HAYES EXAMINES THE POLITICS AND USES OF SEARCH ENGINES
         
        We all know that the Internet is a mine of information,
        but, as with its industrial namesake, dividing the raw
        rock from the precious minerals can sometimes take quite
        some time and effort!
         
        Naturally the giant mining machines of the Internet are
        the so-called "search engines" that help the
        user gain access to the kind of site that will either
        answer a question or give them background information on a
        topic or theme. 
         
        We will look more at the strengths and weaknesses of the
        individual service providers in the second part of this
        series, but today we look at good searching practice and
        the best ways to look for information on the Internet. 
         
        The first thing that we need to explore is the obvious
        and the easy. Most major businesses have web addresses
        that sound like the company name (Sun computers says that
        it enjoys extra UK hits because people are looking for
        the newspaper site!) and I've simply guessed a few in my
        time. 
         
        Failing that they should come up on ALL the engines by
        their name and country alone. Equally some companies even
        buy up sound-alikes/near misses to help the poor typist -
        although these will simply leap to the correct location.
         
        (On a similar note I've come across sites that are dead,
        but have been sold to others who use the address to make
        the user jump to them. Whether you approve of this - or
        want to go there - is quite another matter!)
         
        If you were to divide search engines into two camps, it
        would be those that use robots and those that use humans.
        Robots add sites to the service by looking for included
        words (including in special page headers officially
        called meta tags) and key phrases. They are good at doing
        work in bulk, but tend to store the minor next to the
        major.
         
        Perhaps the key to the success of Yahoo! (the number one
        and most visited engine) is the fact that all its sites
        are vetted by humans. This means that rarely will you
        come across a page of just pictures and basic hellos.
        Equally they rarely include pages from free cyberspace
        providers. They do, however, include small sites just as
        long as they have some business connection or social
        function. 
         
        The first thing to realise when priming your "search
        box" is that words are prioritised: the first word
        carries more weight than the second. Equally you should
        start with the most specific set of words you can think
        of, such as (musical+instruments+Hull+uk), and work your
        way down by clipping words off.
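
        The clipping routine can be sketched in a few lines of
        Python. This is only an illustration: the function name is
        made up, and it assumes that words are joined with "+" and
        clipped from the end, so that the leading (most important)
        word survives longest.

```python
def narrowing_queries(words):
    """Yield search strings from the most specific to the most general.

    Assumption: words are clipped from the end of the list, so the
    first word (the one the engine weights most) is dropped last.
    """
    for count in range(len(words), 0, -1):
        yield "+".join(words[:count])


for query in narrowing_queries(["musical", "instruments", "Hull", "uk"]):
    print(query)
# musical+instruments+Hull+uk
# musical+instruments+Hull
# musical+instruments
# musical
```

        Try each line in turn and stop at the first one that gives
        a manageable number of results.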
         
        The next thing to remember is that not all engines are
        the same. In some cases Yahoo!'s exclusive policy works
        against it and the basic information is best found on the
        robotic sites. Equally there is often a backlog of
        material to be registered and the correct site may be
        locked up in that. 
         
        (This is the problem that many webmasters forget anyway:
        they register too late, thinking that their site will
        appear in a few days. Five or six weeks is more likely,
        if you are lucky!)
         
        Being too general can lead to a whole mountain of
        information: The last thing you want is 10,000 sites to
        swim through for that vital piece of information or data!
         
         
        The first thing that most people do is turn on a computer
        and then think. Too late. You need a strategy in advance.
        I was talking with a friend about a piece of exploitation
        television (on Bravo), "Confessions of a Taxi Driver",
        and the basic irony that the star, Barry Evans, was
        murdered while working as a real taxi driver. However I
        didn't know the full story - until I hit the keypad.
         
        Here typing stuff like "Barry+Evans+actor" came
        back empty on all the engines I tried, although there is
        little harm in trying such a long shot in any search.
        What I needed was a directory of actors and the films
        that they were in. I therefore worked through these key
        words and came across a site about British actors
        (www.uk.imdb.com) which included details of his career
        and death.
         
        (He was hit over the head by a burglar at his home and
        died from his injuries. To the best of my knowledge the
        case remains unsolved.)
         
        The problem with names is that you must be sure that you
        have the right one. Several authors use my name without
        my permission, trying to cash in on my name and
        reputation no doubt. Even more curiously some of them
        take on subjects close to my heart!
         
        Never ever use words such as "and",
        "or" or "the" because this will queer
        your pitch. Equally words such as "sex" or
        "mp3" will simply leave you with too much data.
        Be very careful when looking for sexual education
        material or health issues - obscene spoofs are not
        unknown. 
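
        As a rough sketch of that advice, the following Python
        strips the filler words before a search is made and flags
        the words that are likely to swamp you. The two word lists
        hold only the examples named above and the function name is
        invented for the illustration; no real engine is being
        described.

```python
STOP_WORDS = {"and", "or", "the"}  # filler words to leave out
TOO_BROAD = {"sex", "mp3"}         # words likely to return far too much


def clean_query(text):
    """Return (search string, list of over-broad words still in it)."""
    kept = [w for w in text.split() if w.lower() not in STOP_WORDS]
    warnings = [w for w in kept if w.lower() in TOO_BROAD]
    return "+".join(kept), warnings


query, warnings = clean_query("the history of the mp3 format")
print(query)     # history+of+mp3+format
print(warnings)  # ['mp3']
```

        A warning is a cue to add more specific words, not a
        reason to abandon the search.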
         
        (Spoofs are often used as links and form a kind of
        Internet humour - the Whitehouse site has several spoofs
        that appear to be the real thing, until they are explored...)
         
        Typing "not" will take out examples that don't
        fit the bill (Arsenal not soccer, for example), but this
        is a hard word to use and control. In Yahoo! double meanings
        are automatically divided out. Also the engine can easily
        come up with ties to words that you would never think of
        in a million years - including simple names. 
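
        To see why "not" is hard to control, here is a small
        Python sketch that applies the same idea to a handful of
        made-up page titles. Both the titles and the function are
        invented for the illustration, and real engines each have
        their own syntax for exclusion.

```python
def exclude(results, unwanted):
    """Keep only the results that do not contain the unwanted word."""
    unwanted = unwanted.lower()
    return [r for r in results if unwanted not in r.lower().split()]


# Hypothetical titles that a search for "Arsenal" might return.
sample = [
    "Arsenal soccer club fixtures",
    "Royal Arsenal museum opening times",
    "Arsenal football results",
]

print(exclude(sample, "soccer"))
# ['Royal Arsenal museum opening times', 'Arsenal football results']
```

        The club page that says "football" rather than
        "soccer" slips through, which is exactly the sort of
        near miss that makes the word awkward to rely on.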
         
        Naturally there is a difference between information and
        correct information. I trust material that has been
        published in print and from news agencies far more than
        from a fifteen year old in his back bedroom. Nevertheless
        I've come across things in heavyweight encyclopaedias
        that are plain wrong or are simply opinion presented as
        fact.
        Equally information can go out of date, or the site can
        fall into disuse, and the information is no longer valid. The
        claims of vested interests should also be judged as such
        and the Internet has many wild claims about commercial
        interests that would be challenged in other media.  
         
        Search engines can be dangerous, even if you are not
        looking for dodgy information. Without getting into too
        much detail, there are sexual practices that can be
        classified under their more innocent references. Most
        such sites have warnings that the content is for those
        over 18, but it is not unknown for the engines to bypass
        this page.
         
        If you are going to make commercial use of what you find
        you need to be careful. The biggest joke in this business
        is that "stealing from one is plagiarism, stealing
        from many is research!" The real truth is that we
        journalists borrow from each other all the time, and it
        is a common device when interviews do not go so well to
        include remarks made to others!
         
        Naturally if you are
        presenting work on the Internet itself there is no harm
        in linking to the source of the material and giving
        credit to the person/organisation that first said it.
        Links are a different kettle of fish and you can include
        someone else's collection all you want - there is no
        copyright issue there. 
         
        One of the new ways of searching for information is
        through multimedia CDs. These can save a lot of time and
        are ideal for children, because you know that the site
        has been checked out and given some form of stamp of
        approval. With all the goodwill in the world even
        mainstream media can dip into obscene language and
        show unpleasant scenes.
         
        Tim Berners-Lee invented the hypertext system so that you
        could leap from document to document with the minimum of
        effort. However this breaks down on commercial sites,
        which are hardly likely to plug a rival. Equally the most
        promising name or title can lead to a dead end of weak
        and unhelpful sites, with technical subjects by far the
        worst culprits.
         
        The one thing that is debated is how much of the Internet
        is registered with search engines. The most popular
        theories say that about a sixth of all sites have some
        kind of a listing and about half could be reached by
        links. However this is probably just a guess.
        Nevertheless very little of the unmapped world is
        significant and a lot of it is just personal sites that
        will be largely irrelevant to those that don't know the
        people in question. 
         
        Having said that, there have been times when a site has
        been next to useless in itself - but the links it has
        given have saved hours of work on my own part.
        Trinity 2002 (C) 