Parmeshwor Vetwal: THE INTERNET AND WORLD WIDE WEB

1.0 What is the Internet?
The Internet is a global network of computers, connected by a huge web of telecommunications links. It gives you access to a vast store of data and information held at different sites (called hosts or servers) all around the world. The links that interconnect the host computers use a common method of transmission known as TCP/IP, which stands for Transmission Control Protocol/Internet Protocol.

Each computer connected to the Internet (by the way, it is always spelt with a capital I) can act as a host. A host computer provides information for other people to access and retrieve.

1.1 How did the Internet evolve?

The Internet had its origins in the Cold War between the Soviet Union and the United States during the 1960s. Concerned about the survivability of its communications in the event of a nuclear strike, the US Air Force needed to ensure that it could still communicate with its forces.

Internet as a connection of nodes

The RAND corporation proposed a system with no centralized authority, as any centralized system would be a target of any possible attack.

The proposed system, developed by Paul Baran, suggested a decentralized system that would still operate even if parts of it were destroyed.

All interconnections in the network could send and receive messages, forwarding them onto other interconnection points (called nodes) until the message reached its destination.

Packets travel different paths and can survive node failure

Information would be sent in small packets. Each packet would be self-contained and carry its own address information. Packets would travel from node to node, with each node deciding how to send the packet on to the next available node. Even if some nodes were destroyed, the message could still be sent by an alternative route.
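The idea of routing around destroyed nodes can be tried out with a small sketch. The five-node network and the function name find_route below are made up for illustration, and the search works out a whole route in one go, whereas real nodes decide one hop at a time.

```python
from collections import deque

# Hypothetical five-node network; each key lists the nodes it links to directly.
NETWORK = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def find_route(network, source, destination, destroyed=()):
    """Breadth-first search for a path that avoids destroyed nodes."""
    if source in destroyed or destination in destroyed:
        return None
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == destination:
            return path
        for neighbour in network[node]:
            if neighbour not in visited and neighbour not in destroyed:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None  # no surviving route

print(find_route(NETWORK, "A", "E"))                   # normal route
print(find_route(NETWORK, "A", "E", destroyed={"B"}))  # route when node B is lost
```

With node B destroyed the message still reaches E by going through C. Only when D is lost (the single link to E) is there no route left.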

In this way, the network would withstand a nuclear strike. Once implemented, the network became known as ARPANET (the Advanced Research Projects Agency Network) and was used by the US military and US universities. Gradually, as more and more connections were made, it evolved into the Internet.

1.2 What sort of features or services does the Internet provide?

The Internet provides for a wide range of services. Some of these are listed in the table below.

Service	Description of Service
EMAIL	Electronic mail. Permits the sending and receiving of messages to other users connected to the Internet.
FTP	File Transfer Protocol. A means of sending and receiving files from one computer to another.
GOPHER	An early form of representing information as graphical icons or symbols that could be displayed in a window and then downloaded. It has been replaced by the WWW.
USENET NEWS	A number of discussion groups that allow users to post questions and replies, sorted by topic. Also known as news.
WWW	World Wide Web. A means of locating and displaying information located on the Internet, accessed using a web browser such as Netscape Navigator or Internet Explorer.

1.3 What makes up the Internet?

The Internet consists of computers called hosts, together with interconnecting equipment: routers, and the telecommunication links that join routers and hosts together.

A host is a computer on the Internet. Each host is capable of acting as a provider of information, such as files, documents or images. A host can also access information on another host if it has the required permissions to do so.

A router is a device that joins telecommunications links and groups of computers together. It provides a mechanism for determining a route (or path) between the two computers that want to exchange information.

routers

Each host computer is identified in two ways. Firstly, each computer on the Internet has a unique assigned name, such as host1.cit.ac.nz, which is referred to as its domain name. Secondly, each computer on the Internet also has a unique numerical address, called an IP address (sometimes loosely called a TCP/IP address). This is a group of four numbers joined by dots. For example, the computer known as host1.cit.ac.nz could have an IP address of 156.59.20.49.
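To see what the four dotted numbers mean, here is a minimal sketch using Python's standard ipaddress module with the example address from above. It assumes the classic four-number (IPv4) style of address described in this section. Each number is one byte (0 to 255), and together they form a single 32-bit value.

```python
import ipaddress

address = ipaddress.IPv4Address("156.59.20.49")

# The four dotted numbers are the four bytes of one 32-bit value.
octets = list(address.packed)
as_integer = int(address)

print(octets)
print(as_integer)

# Converting the integer back gives the same dotted form.
print(ipaddress.IPv4Address(as_integer))
```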

The allocation of unique domain names and TCP/IP addresses is controlled by governing bodies in each country.

A host computer that provides information for others to use is normally called a server. Examples of servers are web servers, file servers, mail servers and news servers.

1.4 Who controls the Internet?

No-one actually owns the Internet as such, though telecommunications companies do own and operate the links that carry all the information being exchanged between host computers. There are governing bodies that control or oversee certain parts of the Internet. One example is the allocation of domain names and IP addresses. Another is the control of standards, such as RFCs (Requests for Comments: documents that propose or suggest new standards or services for the Internet).

InterNIC was historically responsible for the worldwide allocation of domain names and IP addresses; today this role is coordinated by ICANN and its IANA function.

1.5 How do I access the Internet?

Each user can access the Internet through a connection on an existing network, or from a remote site such as a private residence via a modem (a device that allows the computer to connect over a telephone line to a remote network or ISP). The data and information that can be accessed on the Internet come in numerous different formats, and there is a wide range of applications that interpret the information for the user.

Internet Service Providers (ISPs) are companies which provide you with access to the Internet. This can be via a dial-up connection using a modem, or using a higher speed connection. Various charging levels may exist, but a popular method for home users is flat rate (a fixed monthly charge for unlimited time and data). Traditionally the Internet was a purely text-based global pool of information, and access was either limited or required a certain specialized knowledge. Today, information also comes in other formats such as graphics, audio and animation, and the interface for such information is a lot more dynamic and user friendly.

1.6 What is the common protocol of the Internet?

The common protocol (the language that computers use to communicate with each other) is TCP/IP, which stands for Transmission Control Protocol/Internet Protocol.

1.7 What are domain names?

Servers or host computers are named partly according to geographical location. Most countries have a country suffix in their domain names; the USA has one too (us), but organizations there commonly leave it off. New Zealand's suffix is nz, while Canada's is ca and Nepal's is np. Typically, the domain name of a host computer is made up of the following parts, separated by dots:

server name
organization name
type of organization
country name

For instance, the server www.kec.edu.np defines it as a host called www, belonging to an organization called kec, which is an educational institution (edu stands for educational) located in Nepal (np means Nepal).

Similarly, the server www.yahoo.com defines it as a host called www, belonging to an organization called yahoo, which is a commercial organization located in the United States.
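The breakdown above can be sketched in a few lines of code. The function name describe_domain is made up for illustration, and it assumes only the two simple layouts shown in this section (four parts with a country suffix, or three parts without one). Real domain names can have other shapes.

```python
def describe_domain(name):
    """Split a domain name into the parts described above.

    Assumes the simple layouts used in this section:
    host.organization.type.country or host.organization.type
    """
    labels = name.lower().split(".")
    if len(labels) == 4:
        host, organization, org_type, country = labels
    elif len(labels) == 3:
        host, organization, org_type = labels
        country = None  # no country suffix, as with www.yahoo.com
    else:
        raise ValueError("unexpected number of parts: " + name)
    return {"host": host, "organization": organization,
            "type": org_type, "country": country}

print(describe_domain("www.kec.edu.np"))
print(describe_domain("www.yahoo.com"))
```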

Some common abbreviations are

Abbreviation	Meaning
ac	academic
co	company
com	commercial
edu	educational
gov	government
govt	government
mil	military
net	large Internet service provider
org	non-profit organisation

Some common country names are

Abbreviation	Meaning
au	Australia
be	Belgium
de	Germany
jp	Japan
mx	Mexico
np	Nepal
uk	United Kingdom

Exercise 1
It is often possible to guess the URL of a specific company. Take for example the company Microsoft, a commercial organization in America. The URL for Microsoft's web server is http://www.microsoft.com

Other organizations use abbreviations in their URLs. For example, Auckland Institute of Technology in New Zealand uses http://www.ait.ac.nz

See if you can guess the URLs of the following

Name	Country	Type	URL
Kathmandu University of Nepal	Nepal	Educational
Ministry of Education	Nepal	Government
NASA	America	Government
Navy	America	Military
Curtin University	Australia	Academic
Harvard University	America	Academic

1.8 What are e-mail addresses?

One way that users communicate with each other on the Internet is via electronic mail. It is just like writing a real letter to someone that you want to communicate with. All letters have a header or address portion. For example, you write the name of the person and where they live on the envelope. This information is used by the postal services to ensure your letter is delivered to the correct destination.

In the same way, electronic mail (email) has a header or address portion, which is used by mail servers on the Internet to determine the correct destination. Electronic mail is created using a mail program.

In the real world you are identified by a name and a location (the place where you live or work). In the electronic world of the Internet, a similar process is used. Each user has an assigned name (called an account) given to them by the network administrator or ISP. This identifies your name. The other portion is your address, which is a server on the Internet that holds your mail (it’s like your physical mail box). These two portions are linked together using the @ (called at) sign.

Here is an example

amajikanepal@hotmail.com

This identifies the username as amajikanepal on a mail server at hotmail.com
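Splitting an address at the @ sign is easy to sketch in code. The function name split_email is made up for illustration, and the check is deliberately simple; it does not cover every address form that real mail systems accept.

```python
def split_email(address):
    """Return (username, mail_server) for a simple user@server address."""
    username, at_sign, server = address.rpartition("@")
    if not at_sign or not username or not server:
        raise ValueError("not a user@server address: " + address)
    return username, server

print(split_email("amajikanepal@hotmail.com"))
```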

Many organizations now offer you the chance to open a free email account (such as hotmail.com, gmail.com and yahoo.com).

1.8.1 What if you do not know someone's email address?

So what do you do if you don't know someone's email address? The first step would be to try to find out what organization they work for. For example, suppose we know John works at a company called First Events, which has a web site at firstevents.com

All mail servers have a special account called postmaster that is used for a number of purposes. We could send an email request to postmaster@firstevents.com requesting John's email address.

CHAPTER II

2.0 What is the World Wide Web?

The World Wide Web is a collection of host machines which deliver documents, graphics and multimedia to users via the Internet. The common protocol that is used on the WWW is HTTP, which stands for Hypertext Transfer Protocol. It runs on top of TCP/IP, the common protocol used for communication between hosts on the Internet.

Each server computer on the World Wide Web can provide files in any format, such as a graphics file, text document, PowerPoint presentation or audio file. Prior to the World Wide Web, the display, searching, and viewing of files was complicated and time consuming.

2.1 What are the advantages of the WWW?

The WWW allows users to link documents together using clickable links. A browser (viewing application that displays web pages) is used to view the content that server computers provide. This browser automatically displays the documents correctly formatted along with the graphic images or additional multimedia components that the author of the page wishes to incorporate.

Some of the advantages of the WWW are

Easy linking of documents to other documents
Content is simple to create and make available for use
Support for mixed multimedia components in a single document (such as text, images, sound)
Open standards
Accessible to anyone connected to the Internet with an appropriate web browser
Facilitates the publication, dissemination and sharing of information on a global basis

2.2 What is a Web Browser?

You are already using a web browser to view this document. A web browser is a software application that interprets documents that you request from a web server on the WWW and displays them for you to view. The two common types of web browsers are Netscape Communicator and Internet Explorer.

2.3 How Do Web Servers Work?

Pages or files are stored on Web Servers. Users access these pages using a graphical browser like Netscape Navigator or Internet Explorer. Pages can include graphics, sound, movies and other media rich content, as well as references to other pages on the same site or other sites.

web server handling client request

When a client requests a document or file from a WWW server, a connection is made to that computer using the HTTP protocol. The WWW server services the request, locates the information, and sends it back to the client. The connection between the client and the WWW server is then released. The client browser software then interprets the retrieved HTML document and formats it on the client computer's screen.
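To make the request and reply a little more concrete, here is a rough sketch of the text that passes between a browser and a web server. The function names and the sample response are made up for illustration, and no network connection is made. It assumes the simple HTTP/1.0 style, where the connection is released after each reply.

```python
def build_get_request(host, path):
    """Return the text a browser might send to ask a web server for a page."""
    return (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "\r\n"
    )

def parse_response(raw):
    """Split a raw HTTP response into status code, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return int(code), headers, body

# A made-up response, standing in for what a server could send back.
SAMPLE_RESPONSE = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html><body>Admission details</body></html>"
)

print(build_get_request("www.kec.edu.np", "/admission.htm"))
print(parse_response(SAMPLE_RESPONSE))
```

The browser would go on to interpret the HTML in the body and draw it on the screen.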

2.4 What is a Hyper-Link?

A hyperlink is a clickable link to another document or resource. It is normally shown in blue underline. When a user clicks on a hyperlink, the client will retrieve the document associated with that link, by requesting the document from the designated server upon which the document resides.

2.5 What is a URL?

A Uniform Resource Locator (URL) is a means of specifying the location of any resource on the Internet. It consists of three parts:

a protocol part
a host part
a document name

For instance, the following URL

http://www.kec.edu.np/admission.htm

specifies the protocol as http, the host or WWW server as www.kec.edu.np and the document as admission.htm.
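The same three parts can be pulled out of a URL with Python's standard urllib.parse module. This is only a small sketch using the example URL above.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.kec.edu.np/admission.htm")

print(parts.scheme)    # protocol part
print(parts.hostname)  # host part
print(parts.path)      # document name, with its leading slash
```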

A URL can be typed into the Address bar Window of a web browser.

web browser address bar

2.6 How do I find Information on the Internet?

Because there is so much information available on the Internet today, it is important to be able to access information quickly.

Several sites have specialized in searching. They do this by creating an index (held in a database) of documents that are stored on WWW servers. Users request that their sites be added to this database by using an on-line form. In some cases this is free; in other cases there could be charges involved.

Some sites which specialize in offering search facilities are

www.yahoo.com
www.excite.com
www.metacrawler.com

Spiders or Web Robots are programs, normally run by these sites, which browse your site and add what they find to the search site's database of subjects and pages. Spiders can create a heavy load on a WWW site and virtually overload it by progressively going through every document, requesting these documents at a very rapid rate.

Searching for information on the Internet is covered in chapter 4.

CHAPTER III

3.0 What services are available?

The Internet and WWW offer a range of services. Some of these deal with transferring of information from one computer to another, such as file transfer or FTP (file transfer protocol). Other services deal with accessing remote computers and running programs on them, such as provided by Telnet.

3.1 Archie

In the early days of the Internet, people commonly used to transfer files between computers or make documents and programs available via FTP (file transfer protocol) servers. These used a command language, unlike the graphical format of web browsers we are familiar with today.

One of the early problems we had was how to find programs and information. Web sites like Yahoo and Excite did not exist, and in fact, the WWW did not exist at that time either. So people used to list the files they had available and send the location of those files to a common server that would index them. This was commonly done overnight, so that new files would appear in the index the following morning. This index was called archie. Only the file names were indexed, not their contents, so if you did not know the name of a file it was very difficult to find where on the Internet it might be located.

As people wrote programs and made them available, archie was a quick way to determine where a program could be found so you could download it and use it. In the early days of the Internet, traffic costs were high, so people used archie to find the nearest location from which they could download the file, thus avoiding costly traffic charges.

Today, large search engines like Yahoo and Excite have replaced archie, making it much easier to locate documents and programs.

3.2 Gopher

It was not long before people using the Internet realized that finding information on the Internet was a hard task. Gopher was developed to make this easier. It consisted of two parts, a Gopher server and a Gopher client.

A Gopher server was a means of providing documents and programs (in fact any type of file) to users. Whereas archie and ftp were purely command type interfaces (where users had to type commands and the responses were given back as text), Gopher implemented a means of displaying the information graphically for the client. This made navigation of the documents available on Gopher servers much easier.

Another advantage was the ability of Gopher servers to link to other Gopher servers. If you could not find the information you were looking for on one Gopher server, you could try the next one and so on till you located the desired information.

The Gopher client was a primitive form of web browser, and displayed the information available on the Gopher server in graphical format using small easily recognizable icons. This meant you could readily identify the information as a text document or a program. Clicking on the associated icon downloaded the file to your computer, where you could save it on disk for later viewing or use.

Picture of a Gopher client

The picture shows a typical Gopher list of the information available on a Gopher server.

Note the use of different icons to represent different file types.

Gopher is seldom used today, with the WWW and web browsers offering much more functionality.

3.3 File Transfer (FTP)

FTP stands for file transfer protocol. It is a means of transferring files between computers of different types on the Internet.

Standard commands are used to connect to computers which hold files. These computers are called ftp servers. When you log on to an ftp server, you need to supply a username and password. Most ftp servers provide an account called anonymous, which allows users to log on with limited access. When users log on to an ftp server using the anonymous account, it is usual for them to use their email address as the password.

The features of ftp are

ftp servers exist to provide a storage place for useful files and programs
users can logon to these servers and download the files to their local computer
you have to know the server name and whereabouts the file is located (subdirectory and filename)
it is command driven using commands like open, bye, get
it is hard to see the general structure or layout of any ftp server (a roadmap or hierarchical diagram of the directory tree would help)

To learn more about ftp, try the associated ftp exercise.

3.4 Telnet

Telnet is a service that allows a user to login and run programs on a remote server on the Internet. This requires a user account (name and password) on the telnet server. On the user computer, a telnet client application is run, similar to the ftp command interface discussed earlier.

The general features of telnet are

you must logon using a specific account and password
you run programs on the host computer
program output is seen on your own computer
it is interactive
you must know the commands and what programs are available

All commands you type are sent to the host computer you are connected to and executed there. You see the program's output on your local computer. In this way, it is possible for a user to run programs on a much more powerful computer than their own, or run software which may not be present on their own local computer.

3.5 E-mail

Electronic mail is a way of sending messages between users over the Internet. As outlined in chapter one, each user must have an email address. An email program allows users to create messages and send them to other users.

There are five essential parts to an email message.

email message

Email uses the protocol SMTP (Simple Mail Transfer Protocol) to forward email messages around the Internet. When you create an email message it is sent to your mail server and held in an outgoing folder. At regular intervals, your mail server will attempt to send the messages in its outgoing folder by contacting the destination mail servers and transferring the messages to them. This principle is called store and forward. If the destination mail server cannot be found, the messages are eventually discarded.

Bounced email occurs when an email message is sent to a mail server, but there is no valid recipient for that message. For example, a message sent to the deliberately misspelt address postmistress@mail.cit.ac.nz would be returned with a 'no such user' error.
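The store and forward principle can be sketched with a toy mail server. Everything here is made up for illustration (the MailServer class, the limit of three attempts, and the unreachable server name), and real mail servers retry over hours or days rather than a fixed count.

```python
from collections import deque

class MailServer:
    """Toy mail server with an outgoing folder (a queue of messages)."""

    def __init__(self, reachable_servers):
        self.reachable_servers = reachable_servers  # server name -> list used as a mailbox
        self.outgoing = deque()

    def accept(self, recipient, text):
        """Store a message until the next delivery attempt."""
        self.outgoing.append({"to": recipient, "text": text, "attempts": 0})

    def deliver_outgoing(self, max_attempts=3):
        """Try each stored message once; keep undelivered ones for later."""
        discarded = []
        for _ in range(len(self.outgoing)):
            message = self.outgoing.popleft()
            server_name = message["to"].rpartition("@")[2]
            mailbox = self.reachable_servers.get(server_name)
            if mailbox is not None:
                mailbox.append(message)        # forwarded to the destination
                continue
            message["attempts"] += 1
            if message["attempts"] >= max_attempts:
                discarded.append(message)      # give up on this message
            else:
                self.outgoing.append(message)  # keep it stored and retry later
        return discarded

hotmail_inbox = []
server = MailServer({"hotmail.com": hotmail_inbox})
server.accept("amajikanepal@hotmail.com", "Hello")
server.accept("john@unreachable.example", "Hi John")
for _ in range(3):  # three delivery rounds, standing in for regular intervals
    discarded = server.deliver_outgoing()
print(len(hotmail_inbox), len(server.outgoing), len(discarded))
```

The first message is forwarded on the first round. The second stays in the outgoing folder and is discarded after the third failed attempt.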

3.6 Whois

This is a service that provides information on Internet users, domain names such as kec.edu.np and organizations. Using it, you can find email addresses, host computer names and domain names. Organizations and users register their names, host computers and organization domain names in a large registry database.

There is a large whois server at http://whois.userland.com/.

Here is a sample output of a whois query.

E:\temp>whois microsoft.com
[whois.internic.net]
Microsoft Corporation (MICROSOFT-DOM) MICROSOFT.COM
Microsoft.com (STARTPLAYING-DOM) STARTPLAYING.COM
Microsoft.com China (SUFUTE-DOM) SUFUTE.COM
Microsoft.com China (AC56-DOM) AC990.NET
Microsoft.com Timko Enterprises (TIMKOSOFT-DOM) TIMKOSOFT.COM
microsoft.com (M1CR0SOFT2-DOM) M1CR0SOFT.NET
microsoft.com (GOTASF-DOM) GOTASF.COM
microsoft.com Micro-Soft Corporation (MICROASS2-DOM) MICROASS.NET
E:\temp>

Not all domains are able to be queried, and many organizations have not bothered to register, so it is of limited value.

3.7 Finger

In the early days of the Internet, the number of host computers or servers was significantly smaller than it is now. In addition, there was not the level of knowledge about companies and who operated what servers that there is today.

Some people operated servers that ran the finger service, a program that responds to finger requests from a client. This finger service would respond with information about who operated the computer and other details like a contact name, address or phone number.

Finger is not used much today. Try the finger exercise.

3.8 Wais

As people and organizations began to publish information on the Internet, a method was required for indexing the contents of documents so that people could search these indexes by keywords or other criteria. This led to the development of Wide Area Information Servers (WAIS, pronounced "ways").

There were four basic elements to WAIS: a client program to perform searching, the WAIS server, the indexed database, and a tool for indexing the documents in the database.

Internet search engines such as Yahoo and Excite have replaced the functionality of WAIS.

3.9 Usenet News

Usenet news is a large discussion type service that allows users to post and reply to messages in certain categories. It is a great way to ask questions and communicate with people of similar interests.

News is held on a news server. This server is connected to the Internet and regularly gets news articles from other news servers. Any new messages that local users write are forwarded from the local news server to the next level up, and so on, until the message propagates to all other news servers in the world.

As you might expect, this can generate a large amount of information traffic. To help with the organization of news, articles are placed into groups. There is a hierarchy of groups, illustrated below.

alt	alternate lifestyles
biz	business companies
comp	computers
misc	miscellaneous
rec	recreational
soc	social issues

In addition, there are also country specific and organization specific news groups. For instance, Nepal has some news groups which begin with np.

To read news you require a news-reader client and access to a news server. News uses the NNTP protocol that runs on top of TCP/IP.

3.10 Ping

The ping command is a program that tests to see if you can reach a specific host on the Internet. It does this by sending a special echo request to the host computer, which then replies. The ping program sends these echo request packets every second, and displays the results on the screen.

A good analogy is sonar between ships and submarines. A ship searching for a submarine sends a sonar ping, and if it hits a submarine, the sonar is echoed back to the ship and thus the ship knows it has found the submarine. Ping works just like this.

The default setting for ping is to send four echo requests one second apart to the host computer, and await each reply.

The following shows a sample output for ping:

C:\>ping ds.internic.net
Pinging ds.internic.net [192.20.239.132] with 32 bytes of data:
Reply from 192.20.239.132: bytes=32 time=101ms TTL=243
Reply from 192.20.239.132: bytes=32 time=100ms TTL=243
Reply from 192.20.239.132: bytes=32 time=120ms TTL=243
Reply from 192.20.239.132: bytes=32 time=120ms TTL=243

C:\>

3.11 Traceroute

Tracert (the Windows version of traceroute) is a program that determines the path taken over the Internet to a specific host computer. It does this by forcing each router along the path to send back a message, and displays these messages on the screen, effectively building up a list of all the routers and links that lie between the client computer and the host computer.

Tracert can establish where a connection is broken. Take for example our own organization at CIT. If we were talking to a computer at, say, microsoft.com, there are a reasonable number of paths and routers involved. Firstly, our lab is connected to a router that connects us to our ISP, which in turn connects us to Kathmandu, which in turn connects us to an overseas link, which in turn connects us across many other routers till we finally end up in Redmond, Washington, USA. If a particular link fails, tracert will display that link as 'unreachable', showing us whether the fault lies in our own organization, that of our ISP, an international link or some link within the USA.
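How are routers forced to reply? Traceroute tools commonly do it with a hop limit (often called time to live, or TTL) carried in each probe: every router lowers the limit by one, and the router that sees it reach zero drops the probe and reports back. The sketch below only simulates this idea. The router names and function names are made up, and no packets are sent.

```python
# Hypothetical path of routers between a client and a destination host.
PATH = ["lab-router", "isp-router", "kathmandu-router",
        "overseas-link", "destination-host"]

def send_probe(path, ttl):
    """Return (device that answered, whether it was the destination)."""
    for device in path:
        ttl -= 1
        if device == path[-1]:
            return device, True    # the destination itself replied
        if ttl == 0:
            return device, False   # this router dropped the probe and reported back
    return None, False

def trace(path, max_hops=30):
    """List the devices discovered by raising the hop limit one step at a time."""
    hops = []
    for ttl in range(1, max_hops + 1):
        device, reached = send_probe(path, ttl)
        hops.append(device)
        if reached:
            break
    return hops

print(trace(PATH))
```

Each probe gets one hop further than the last, so the replies arrive in path order and build up the list of hops.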

Each link is called a hop. The following tracert example is from a computer within CIT (on remote dialup) to Victoria University of Wellington (in the same city but on a different ISP).

C:\WINDOWS>tracert www.vuw.ac.nz

Tracing route to totara.its.vuw.ac.nz [130.195.2.249] over a maximum of 30 hops:

1 143 ms 116 ms 116 ms rasnt.cit.ac.nz [156.59.20.76]
2 124 ms 118 ms 118 ms route64.cit.ac.nz [156.59.64.1]
3 130 ms 119 ms 117 ms portal-e0-10m.cit.ac.nz [156.59.220.2]
4 150 ms 128 ms 126 ms ba2-atm5-0-11-lmtn.Wellington.clix.net.nz [203.97.0.149]
5 139 ms 125 ms 125 ms ba1-fe0-0-0-lmtn.Wellington.clix.net.nz [203.167.244.161]
6 143 ms 131 ms 177 ms netlink1.wix.net.nz [202.7.0.65]
7 145 ms 138 ms 133 ms c8.wlg.netlink.net.nz [203.97.132.146]
8 139 ms 143 ms 134 ms vuw-cl81.netlink.net.nz [203.97.144.81]
9 141 ms 131 ms 135 ms vuw-wn1e2-10m.netlink.net.nz [203.97.176.202]
10 147 ms 134 ms 148 ms totara.its.vuw.ac.nz [130.195.2.249]

Trace complete.

C:\WINDOWS>

3.12 Nslookup

NSlookup is a tool that queries special servers (called domain name servers, part of the Domain Name System, or DNS for short) for computer domain names and their associated IP addresses.

For instance, our server ice.cit.ac.nz has the IP address of 156.59.20.50 and is also a web server. If you typed in the URL of this server in your web browser, what actually happens is that your computer first queries the DNS server with the name, and the IP address of the ice.cit.ac.nz is returned to your computer by the DNS server.

NSlookup allows you to check the configuration of a DNS server. NSlookup is an interactive program (like ftp, it accepts commands, though there are some graphical versions around) and is available on Windows NT workstation computers (it is not on Windows 95 or 98 computers).

The following example runs nslookup and queries for the IP address of the server ice.cit.ac.nz

C:\>nslookup ice.cit.ac.nz
Server: araiahi.cit.ac.nz
Address: 156.59.65.10

Name: ice.cit.ac.nz
Address: 156.59.20.50

C:\>

Try the sample exercises on Ping, Traceroute and Nslookup.

CHAPTER IV

4.0 Search Engines and Finding Information on the Internet

In this section you will learn about search engines and how to locate information on the Internet.

4.1 What is a search engine?

A search engine is a means of searching for information that can be found on the Internet. For example, when accessing a search engine you might specify that you want to search for information about "Polar Bears", in which case the search engine would return all the URLs it knows about that have information about Polar Bears.

This picture is an example of a search on www.excite.com for the phrase "Polar Bear".

search on excite

4.2 How does a search engine know where the information is?

There are a number of ways a search engine can know about where information is to be found. Firstly, a search engine can list information by keywords or page titles. These keywords or titles (subject categories) can either be submitted by users who provide information on the Internet, or can be extracted by accessing web pages and extracting the page title and keywords from the header of the web page.

This keyword extraction relies on the appropriate HTML code in the header of the web page (it is called a meta-tag). The advantage is that it is quicker to index information and less traffic is involved (only headers are requested from web sites, the entire web document is NOT read).
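As a rough illustration of keyword extraction, the sketch below uses Python's standard html.parser module to pick the title and the keywords meta-tag out of a page header. The class name HeaderKeywords and the sample page are made up for illustration; it is not how any particular search engine does it.

```python
from html.parser import HTMLParser

class HeaderKeywords(HTMLParser):
    """Collect the page title and the keywords meta-tag from an HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords = [word.strip() for word in content.split(",") if word.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Made-up page used only for illustration.
PAGE = """<html><head>
<title>Polar Bears</title>
<meta name="keywords" content="polar bear, arctic, wildlife">
</head><body>Polar bears live in the Arctic.</body></html>"""

parser = HeaderKeywords()
parser.feed(PAGE)
print(parser.title, parser.keywords)
```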

The second method a search engine can use relies upon reading every page it knows about (usually pages are submitted for inclusion by web authors). This technique involves the use of programs called 'spiders' or 'web robots' that request every page, extract all the words from its content, and store these words in a large database.

Not all search engines are the same. Some use keyword extraction via meta-tags whilst others use keywords via page content indexing. Content indexing is generally the better method because you are more likely to find specific information. However, this method has a number of problems. One is the sheer size of the resultant database and number of pages involved (which means a lot of traffic, and it might take two weeks to fully search all those pages). As the size of the WWW continues to grow this becomes more and more difficult.
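Content indexing can be sketched as building a table from each word to the pages that contain it. The two pages and the function name build_index below are made up for illustration, and a real search engine's database is of course far larger and more elaborate.

```python
import re
from collections import defaultdict

# Made-up pages standing in for documents fetched by a spider.
PAGES = {
    "http://example.org/bears.htm": "Polar bears live in the Arctic",
    "http://example.org/zoo.htm": "The zoo keeps bears and penguins",
}

def build_index(pages):
    """Map every word to the set of URLs whose content contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

index = build_index(PAGES)
print(sorted(index["bears"]))
print(sorted(index["arctic"]))
```

Looking up a word is then a single step, however many pages were indexed.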

Keeping the database up-to-date is a serious problem. It is common to find that pages returned by a search engine have in fact since been moved or deleted.

A search engine that uses topic keywords and categories is yahoo.com whilst a search engine that uses content indexing is www.altavista.com or www.excite.com.

4.3 How does a search engine work?

When a user accesses a search engine, they are presented with a graphical interface form, on which they specify what they are looking for. When they tell the search engine to start the search (by pressing "enter" or clicking on a specific button), the search engine invokes a program that queries its database (a collection of all the web pages it has access to).

The results are returned to the user as a number of possible URLs. Often, these will be ranked in priority or success rate, with higher values meaning more likely to contain the information you request (what it really means is that it contains more occurrences of the key words you were searching for compared to other documents).
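Ranking by the number of keyword occurrences can be sketched as follows. The documents and the function name rank are made up for illustration, and counting words is only the simple idea described above, not the scoring any real search engine uses.

```python
import re

# Made-up documents used only for illustration.
DOCUMENTS = {
    "http://example.org/a.htm": "Polar bear facts: the polar bear is a large bear",
    "http://example.org/b.htm": "A bear was seen near the town",
    "http://example.org/c.htm": "Penguins live in the south",
}

def rank(documents, query):
    """Order URLs by how often the query words occur in each document."""
    query_words = query.lower().split()
    scores = {}
    for url, text in documents.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        score = sum(words.count(word) for word in query_words)
        if score > 0:
            scores[url] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

print(rank(DOCUMENTS, "polar bear"))
```

The first document mentions the search words most often and so comes first; the page about penguins does not appear at all.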

4.4 What is a MetaSearch engine?

A metasearch engine queries two or more search engines simultaneously and then displays the results. Metasearch engines do not have their own database or index web pages; instead, they send their queries to search engines. The results are integrated, with duplicates often being removed or merged, and the returned results are sometimes ranked.

Metasearch engines are good for quickly identifying page location especially where the keywords used are very specific (using common words returns virtually useless information). In addition, metasearch engines do not return all the pages found to the user. They often only return the top ten or so pages.
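The merging step can be sketched like this. The result lists and the function name merge_results are made up for illustration, and taking results in turn from each engine is just one simple way to combine them.

```python
def merge_results(*result_lists, limit=10):
    """Interleave ranked result lists from several engines, dropping duplicates."""
    merged = []
    seen = set()
    longest = max((len(results) for results in result_lists), default=0)
    for position in range(longest):
        for results in result_lists:
            if position < len(results):
                url = results[position]
                if url not in seen:
                    seen.add(url)
                    merged.append(url)
    return merged[:limit]  # only the top few pages are shown to the user

# Made-up results from two hypothetical search engines.
engine_one = ["http://example.org/a.htm", "http://example.org/b.htm"]
engine_two = ["http://example.org/b.htm", "http://example.org/c.htm"]

print(merge_results(engine_one, engine_two))
```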

You use a metasearch engine in the same way as a standard search engine, except that you will probably also specify which search engines to query. In addition, metasearch engines do not accept the advanced queries that a standard search engine does.

Examples of metasearch engines are www.northernlight.com and www.cyber411.com.

This picture is an example of a search conducted on cyber411 for "Polar Bear".

You should be able to see that not all pages returned by this metasearch engine deal with polar bears.

search on cyber411

4.5 What are Yellow Pages and Directories?

Yellow pages and directories are collections of specific information available on the Internet. This information can include usernames, company information and product information. An example of a directory is www.yahoo.com, which lists a large number of categories. Another is www.infoseek.com.

Exercise 1
Use the yellowpages site listed above and find a company in Wellington that cleans carpets.

Exercise 2
Use www.infoseek.com to browse the directory listings on Topics-Education-Internet-Acceptable Use

4.6 What other types of searchable indexes are there?

There are a number of other databases available on the web. These allow you to search usenet news articles (www.dejanews.com), newspaper articles, journals, magazines, business and personal directories.

Web Site Name	URL	Comment
Bigbook	http://www.bigbook.com	Shopping, City and Consumer Guides, Yellow pages
Dejanews	http://www.dejanews.com	Search usenet news postings
Encyberpedia	http://www.encyberpedia.com	An Internet encyclopedia
Infospace	http://www.infospace.com	Directory of people, business, government, cities
Localeyes	http://www.localeyes.com	Search for business web sites by typing business name
Medsite	http://www.medsite.com	Search for medical information
Whowhere	http://www.whowhere.com	Find people on the Web
World Email	http://www.worldemail.com/	World email addresses
Worldpages	http://www.worldpages.com	Business and people search

There are also many databases online that are subscription service only. One such example is ABI/Inform from http://www.ovid.com/, which provides a comprehensive index of full text articles from thousands of journals and magazines.

4.7 Basic Search Queries

Simple queries involve specifying one or more keywords. Choose your words carefully: the more specific you can be, the better. For instance, the keyword "polar" will also return occurrences of the words "polars" and "polarity". Your query is entered into the search dialog box of the form associated with the search engine, and pressing the enter key (or clicking on an icon called "go" or "search") will instruct the search engine to begin looking at its database for matches.

4.7.1 By Keyword

Keyword searching is popular where you are looking for specific information. As an example, let's say we want to find out when Christopher Columbus discovered America. Typical keywords we could use are "Columbus", "America" and "discover". Note that in this instance I have not used the keyword "Christopher".

The reason for this is that when you use keywords, the search engine returns all pages containing any of those keywords. So, in the above example, we get back every page containing the keyword Columbus, or the keyword America, or the keyword discover.

This is really not what we want. We want all pages that contain all three keywords, so we should modify the query to specify that all three words are required.

4.7.2 Keywords and the AND operator

In the previous example, any page containing any of the keywords was returned. This time, using the AND keyword, we can specify the query as

Columbus AND America AND discover

This will restrict the search so that all three keywords must occur in each returned article.
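The difference between a plain keyword search (any keyword may match) and an AND search (every keyword must match) can be sketched as follows. The two sample page texts and the function names are invented for the example, and the word matching is deliberately naive.

```python
def matches_any(text, keywords):
    """True if at least one keyword occurs in the text (plain keyword search)."""
    words = set(text.lower().split())
    return any(k.lower() in words for k in keywords)

def matches_all(text, keywords):
    """True only if every keyword occurs in the text (AND search)."""
    words = set(text.lower().split())
    return all(k.lower() in words for k in keywords)

keywords = ["Columbus", "America", "discover"]
page_one = "columbus is the capital of ohio"
page_two = "columbus set out to discover a route and reached america"

plain_hits = [matches_any(page_one, keywords), matches_any(page_two, keywords)]
and_hits = [matches_all(page_one, keywords), matches_all(page_two, keywords)]
```

Both pages satisfy the plain keyword search, but only the second page satisfies the AND search.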

4.7.3 By Phrase

Let's take the example that you remember a phrase someone once said, but you have forgotten who said it. Searching by phrase looks for the exact occurrence of the words, in order, in the articles of the database. The phrase is enclosed in double quotes. The following example searches for the phrase "Now is the time of our".

4.7.4 Using the NOT operator

Once you have a number of returns to your query, you might look at these results and determine that the search engine is including other pages that have a common element which you do not want. For example, in our search on Columbus, the search engine might be returning a large number of results that contain items for sale and museums. In this case, we want to restrict our query further. To do this we can use the NOT operator to specify keywords that are to be excluded, which eliminates these unwanted results.

Tip
Use lowercase in search phrases. Most search engines will return both upper and lowercase results if you use lowercase. If you use uppercase, only uppercase matches will be returned.

4.8 Advanced Search Queries

Most search engines allow advanced search options. These allow you to sort the results, specify dates and exclude more results. The following information specifically applies to altavista, but some other search engines support these options.

4.8.1 Using the + operator

The + operator works in the same manner as the AND operator, meaning that the word or phrase must occur in order for the reference to be returned by the search engine.

4.8.2 Using the - operator

The - operator works in the same manner as the NOT operator, meaning that the word or phrase must not occur in order for the reference to be returned by the search engine.
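To make the behaviour of the + and - operators concrete, here is a minimal Python sketch that splits a query into required and excluded words and tests a page against them. It is an illustration only and not how altavista itself parses queries; the function names are hypothetical, and terms without a + or - prefix are simply ignored here.

```python
def parse_query(query):
    """Split a query such as '+columbus -sale' into required and excluded words."""
    required, excluded = [], []
    for term in query.lower().split():
        if term.startswith("+") and len(term) > 1:
            required.append(term[1:])
        elif term.startswith("-") and len(term) > 1:
            excluded.append(term[1:])
        # Terms with no prefix are ignored in this simplified sketch.
    return required, excluded

def page_matches(text, query):
    """True if the text has every required word and none of the excluded ones."""
    words = set(text.lower().split())
    required, excluded = parse_query(query)
    has_required = all(term in words for term in required)
    has_excluded = any(term in words for term in excluded)
    return has_required and not has_excluded

query = "+columbus +america -sale -museum"
wanted = page_matches("columbus reached america in 1492", query)
unwanted = page_matches("columbus america souvenirs for sale", query)
```

The first page is kept because it contains both required words; the second is dropped because it contains the excluded word "sale".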

4.8.3 Restricting the search to certain domains

In our search for information about Columbus, we might decide it is a good idea to restrict our search to only servers that are educational in nature. Thus we can specify that only servers in the .edu domain need to be searched (as this is more likely where the information will be found).

4.8.4 Looking for URL links

There are times when we might want to know who is linked to our pages. We can do this by using the link operator, as illustrated below. This example searches for all links that have the words "smac/csware.htm" anywhere in a URL link.

Note
Most metasearch engines do not support advanced search queries.

4.9 A list of search engines and on-line databases

The following table is a brief summary of some of the online resources you can use to find information on the Internet.

Search Engines and Directories	Name	URL	Comment
	Altavista	http://www.altavista.com	250+ million pages and six million articles from 50,000 Usenet newsgroups. Full text search of pages. Good for academic research.
	Excite	http://www.excite.com	130+ million pages and directories. Full text search.
	Galaxy	http://galaxy.einet.net/galaxy.html	Directory, thousands of web pages.
	Hotbot	http://www.hotbot.com	110+ million pages, WWW, Usenet newsgroups, most keywords on pages indexed.
	Infoseek	http://www.infoseek.com	100+ million pages, web sites, but also images, newsgroups, faqs, news, companies, phone numbers, and email addresses, current news and Directories.
	Lycos	http://www.lycos.com	Directory, reviews of the best Web sites, browse-able by category and rated on a scale of 1-50.
	Magellan	http://www.mckinley.com/	Directory, rates and reviews of sites, academic resources, gopher, ftp, telnet, and newsgroup pages. Organized into 24 primary topics (such as autos, business, education, lifestyle, new, relationships, travel).
	Northern Light	http://www.northernlight.com	Search web sites, articles or a special collection (over 1 million articles from books, magazines, journals, newspapers etc). Sorts results by relevance, and removes duplicate links.
	Webcrawler	http://www.webcrawler.com	Search web sites, usenet, gopher, ftp. Searches only titles, keywords and URL's.
	World Lecture Hall	http://www.utexas.edu/world/lecture/	Directory of academic resources.
	World Virtual Library	http://www.vlib.org/	Directory of resources and subjects.
	Yahoo	http://www.yahoo.com	Directory, useful for browsing, general searching and locating popular sites. 14 broad categories (such as computers/ internet, news and media, regional, society and culture).

MetaSearch Engines	Name	URL	Comment
	Ask Jeeves	http://www.askjeeves.com	Simple interface lets you ask questions.
	Debriefing	http://www.debriefing.com
	Cyber411	http://www.c4.com	Searches up to 17 engines, news engines such as cnn and abcnews, financial news and family search engines. Returns first 50 results from each engine, ranked and removes duplicates.
	Dogpile	http://www.dogpile.com	Search using engines, catalogs, usenet, ftp and Business news.
	Mamma	http://www.mamma.com	The mother of all search engines.
	Metacrawler	http://www.metacrawler.com	Search the web or newsgroups for keywords and phrases. Also has a large browse-able directory.
	Savvy Search	http://www.savvysearch.com	Queries nineteen search engines simultaneously, results listed in order of relevance, removes duplicates, summary of each of the findings.

CHAPTER V

5.0 Web pages and HTML

In this section you will learn about web pages and HTML.

5.1 What is a web page?

A web page is a simple text document that contains information (text, images, sound, video and links) to be displayed and instructions on how to format that information on the screen. The format instructions are called HTML (hypertext mark-up language) tags, and are simple instructions that inform the web browser as to how the information should be displayed.

HTML tags are not displayed by the web browser. A web page can be written using a simple text editor such as notepad. The page is then saved and stored on a web server. When you access that page on a web server by entering the URL of the page, it is downloaded by your web browser and then all the HTML tags are interpreted and the information displayed accordingly.

A web page typically has the file extension .htm, and the beginning document of a web site is normally called default.htm or index.htm.

5.2 What do I use to write a web page?

A web page can be written using a simple text editor such as Notepad. The problem with using Notepad is that you have to know the HTML command tags in order to specify how the information is to be formatted. There are other editors that allow you to write HTML web pages in WYSIWYG (what you see is what you get, pronounced whiz-ee-wig) format, such as Microsoft Word, Frontpage Express and Frontpage 2000, Dreamweaver and Hotmetal Pro.

The advantage of using a sophisticated editor is that it is much quicker to produce HTML pages than with Notepad, and many of these editors can produce complex HTML code simply and quickly.

Frontpage express

This diagram shows this very page being edited in Microsoft Front Page Express.

Note all the formatting buttons on the toolbar for bold, italics and specifying font styles and font sizes.

5.3 What is HTML?

HTML is a series of tags enclosed in < and > brackets. For instance, <head> is an HTML tag that defines the head section of an HTML document. Certain characters are reserved, such as & < >, which are interpreted as part of HTML codes.

Each HTML page adheres to a basic structure. This looks like

<html>
<head>
<title>Title of Document</title>
</head>
<body>
Textual Information to be displayed
</body>
</html>

When viewed in the browser, this page looks like,

Textual Information to be displayed

5.4 Some Simple HTML Command Tags

The following table illustrates some basic HTML command tags.

Tag	Example
BOLD TYPE The BOLD print tag starts with <b> and ends with </b> so that all text in-between the tags is printed in bold.	the source HTML looks like <b>This is bold text</b> and this is not. the output by the browser looks like This is bold text and this is not.
LARGE SIZE TEXT There are 6 header sizes, from H1, the largest, to H6, the smallest. The HTML tags <h1> to <h6> are used to define the size of text. The normal size is about <h4>.	the source HTML looks like <h3>This is header size 3</h3> the output by the browser looks like This is header size 3
ITALIC TEXT The ITALIC print tag starts with <i> and ends with </i> so that all text in-between the tags is printed in italics.	the source HTML looks like <i>This is italic text</i> and this is not. the output by the browser looks like This is italic text and this is not.
ADDING IMAGES Graphic images are added to an HTML page using the <img> tag.	the source HTML looks like <img src="icon.gif" alt="This is an icon"> (icon.gif stands in for the name of the image file) the output by the browser looks like This is an icon
LINKING TO OTHER PAGES This is called a hyper-link. It is written using the <a> tag, shows up in the document as underlined text, and allows the user to load another page.	the source HTML looks like <a href="next.htm">Goto next page</a> (next.htm stands in for the address of the other page) the output by the browser looks like Goto next page

5.5 How do I view my web pages?

You can create web pages using Notepad or another editor and save these on your floppy disk. When you start your web browser, you can specify the pathname to the web page on your floppy instead of a URL. For example, if you saved your web page as A:\mypage.htm then you would specify the address of the web page as A:/mypage.htm

5.6 How do I get my web page on a web server?

To place your web pages on a web server so other people on the Internet can see them, you require access to a web server and permission to store files on it. Some ISPs allocate space for their customers, and there are also a number of companies that provide personal web space for a small fee per month.

Files are normally transferred using ftp, though other methods such as the use of Microsoft Front Page or Web publisher programs are also popular. You will need to check with your ISP as to what method they support.

Try this exercise

CHAPTER VI

6.0 Multimedia and Communication

In this section you will learn about some new multimedia communication methods that the Internet is using today.

6.1 What about bits, bytes and bandwidth?

In this section you will learn about bits, bytes and bandwidth.

6.1.1 Bits

A computer stores information in digital format as a series of on and off states, called logic 1's and logic 0's. The smallest element of information in a computer is a bit, and can hold either a 1 or 0.

6.1.2 Bytes

Obviously, a computer using only one bit can hold two states, either a 1 or a 0. This is not practical for handling a lot of information! So, more than one bit is grouped together. A Byte is actually a grouping of 8 bits, and thus as a whole can hold 256 possible values. A computer uses a byte to hold characters, such as the character 'A' , the digit '3' or the symbol '*'.
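The numbers in this section can be checked with a short Python sketch. It shows the bit pattern a single byte would hold for the three example characters, assuming the common ASCII character codes; the function name byte_pattern is just an illustrative choice.

```python
def byte_pattern(character):
    """Return the 8 bits used to store one character, as a string of 0s and 1s."""
    code = ord(character)  # the character's numeric code
    if code > 255:
        raise ValueError("character does not fit in a single byte")
    return format(code, "08b")

values_in_a_byte = 2 ** 8          # 8 bits, each either 0 or 1
pattern_a = byte_pattern("A")      # '01000001'
pattern_3 = byte_pattern("3")      # '00110011'
pattern_star = byte_pattern("*")   # '00101010'
```

Each extra bit doubles the number of possible values, which is why 8 bits give 256 combinations.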

In computer language, the symbols K and M are used to denote thousand and million. The symbol B represents bytes and the symbol b represents bits.

From this, we can write the amount of memory in a computer as 64KB, meaning 64 thousand bytes (it can hold roughly 64 thousand characters).

Exercise
How much memory is 4Kb expressed as KB?

How many characters (roughly) can be held in 64MB of memory?

6.1.3 Bandwidth

When we transfer information across the Internet, we talk in terms of how much information we can send or receive per second. The maximum amount that can be sent at any one time is a restriction of the connection we are using, and is called bandwidth.

The bandwidth of a connection is expressed in terms of so many bits per second. For instance, you might have a dial-up connection to your ISP that supports 56Kbps (56 thousand bits per second). As there are eight bits in every character, dividing 56K by 8 gives us 7KB per second.
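The same arithmetic can be written as a small Python sketch, which also estimates how long a file would take to transfer. It follows this section in treating K as one thousand and ignores protocol overhead, so the figures are a best case; the function names are made up for the example.

```python
def bits_to_bytes_per_second(bits_per_second):
    """Convert a bandwidth in bits per second to bytes per second."""
    return bits_per_second / 8

def transfer_seconds(file_size_bytes, bits_per_second):
    """Best-case time to move a file over a link of the given bandwidth."""
    return file_size_bytes / bits_to_bytes_per_second(bits_per_second)

modem_bps = 56_000                                   # a 56Kbps dial-up connection
rate = bits_to_bytes_per_second(modem_bps)           # 7000.0 bytes per second
seconds_for_64kb = transfer_seconds(64_000, modem_bps)
```

A 64KB file therefore needs a little over nine seconds on a 56Kbps connection, even before any congestion is taken into account.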

Bandwidth is usually expressed in bits per second. The higher the value, the more information that can be sent per second.

6.2 What is streaming audio and video?

Streaming audio and video is the ability to download video and audio from the Internet and have this play without having to wait for all of the file to be downloaded. This means streaming audio and video can support "live" events, and viewers can watch or listen to events as they occur.

Streaming audio and video requires special servers to provide the content, and special player software to be installed on your computer.

6.2.1 Streaming audio

The major problem associated with the traditional formats of waveform audio (such as wave files) is the file size. When you use a waveform audio file on a web page, the browser must download the entire file before it can be played. For files of reasonable size, this leads to long delays for the receiver. Streaming audio addresses the issue of this delay by allowing the file to be played whilst it is still being downloaded. Special players are required for viewing and playing streaming audio clips.

Streaming audio is primarily associated with Real Networks. "Real Player" supports the playing of streaming audio over the Internet. The advantages of using streaming audio are

the use of a special codec that reduces the bandwidth requirements for audio down to about 5Kbps
instant playing of the audio clip
suits limited bandwidth connections (i.e., a dial-up user with a 28.8Kbps modem)

When a user selects a streaming audio clip, the streaming audio player contacts the server and establishes the bandwidth and delays between the user computer and the host server. A few seconds of the clip is then downloaded into temporary buffer storage before the clip begins to play. As the clip begins to play, more content is downloaded and placed into temporary storage. If there is a congestion delay in the Internet connection, the clip continues to play using the contents of the buffer storage. Hopefully, before the buffer storage is exhausted, the congestion delay will cease and thus the buffer storage will fill up again. In this manner, the streaming player tries to achieve smooth playback of the clip. Without using storage, gaps or breaks in the audio would be evident when a congestion delay occurred.
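The buffering behaviour described above can be imitated with a toy simulation. This is a simplified sketch under stated assumptions, not how Real Player works internally: time moves in one-second ticks, arrivals lists how many seconds of audio arrive during each tick, and the name simulate_playback and the prebuffer parameter are invented for the example.

```python
def simulate_playback(arrivals, prebuffer=3):
    """Simulate a streaming player with a buffer.

    arrivals[i] is the number of seconds of audio received during tick i.
    Returns one True/False per tick after playback starts:
    True if audio played, False if there was an audible gap.
    """
    buffered = 0
    started = False
    played = []
    for received in arrivals:
        buffered += received
        if not started:
            # Wait until a few seconds are stored before starting to play.
            if buffered >= prebuffer:
                started = True
            continue
        if buffered >= 1:
            buffered -= 1          # play one second from the buffer
            played.append(True)
        else:
            played.append(False)   # buffer exhausted: gap in the audio
    return played

# Two ticks of congestion (nothing arrives) in the middle of the clip.
arrivals = [1, 1, 1, 1, 0, 0, 1, 1]
smooth = simulate_playback(arrivals, prebuffer=3)
choppy = simulate_playback(arrivals, prebuffer=1)
```

With three seconds buffered in advance, the clip plays through the congestion without a break; with only one second buffered, the store runs dry and a gap appears.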

A special content provider program is necessary for producing streaming audio clips. A limited free version can be downloaded from the Real Networks web site. In addition, providing streaming audio clips from a server requires the use of a streaming server.

6.2.2 Streaming Video

Streaming video is very similar to streaming audio. We are faced with the same problem of file size and the delay involved in downloading the entire video file before it can be played on the client computer. Streaming video addresses the issue of this delay by allowing the file to be played whilst it is still being downloaded. A special player is required for viewing and playing streaming video.

Streaming video is primarily associated with Real Networks. "Real Player" and "Real Player G2" support the playing of streaming video (and audio) over the Internet. The advantages of using streaming video are

the use of a special codec that reduces the bandwidth requirements for video
instant playing of the video clip
suits limited bandwidth connections (i.e., a dial-up user with a 28.8Kbps modem)

When a user selects a streaming video clip, the streaming video player contacts the server and establishes the bandwidth and delays between the user computer and the host server. A few seconds of the clip is then downloaded into temporary buffer storage before the clip begins to play. As the clip begins to play, more content is downloaded and placed into temporary storage. If there is a congestion delay in the Internet connection, the clip continues to play using the contents of the buffer storage. Hopefully, before the buffer storage is exhausted, the congestion delay will cease and thus the buffer storage will fill up again. In this manner, the streaming player tries to achieve smooth playback of the clip. Without using storage, gaps or breaks in the video would be evident when a congestion delay occurred.

A special content provider program is necessary for producing streaming video clips. A limited free version can be downloaded from the Real Networks web site. In addition, providing streaming video clips from a server requires the use of a streaming server.

Try this exercise on streaming video and audio.

6.3 How can I communicate with others?

In this section you will learn about software that allows you to communicate with others on the Internet.

6.3.1 Microsoft NetMeeting

This software product supports the following

text chat
video
shared whiteboard
transferring of files
directory of connected users

Netmeeting

Using the directory, you can find someone to communicate with and call them. If they answer, you can communicate using text chat, video or audio (provided you and they have the necessary hardware support). You can even exchange files such as documents or pictures.

One of the problems with NetMeeting is that the audio tends to drop out (break up and become inaudible) when using it on the Internet, though it works fine over a high-speed company network. NetMeeting is normally installed with Internet Explorer, or you can download it from http://www.microsoft.com/windows/netmeeting/. There is a web site dedicated to NetMeeting at http://www.netmeet.net/.

Try this exercise on communicating with others using netmeeting.

6.3.2 Ivisit

Ivisit is currently a free video conferencing package that offers similar functionality to that of NetMeeting.

Ivisit supports,

text chat
voice chat
video
directory services
file transfers

Ivisit

One good thing about ivisit is the ability to communicate with more than one person at once. Another good feature is the quality of audio that is possible, though the video picture quality does suffer when using slow connections on the Internet. Ivisit can be downloaded from http://www.ivisit.com.

6.3.3 ICQ and Powwow

These two programs allow you to communicate with your friends on the Internet, using text or voice chat.

6.3.3.1 ICQ
One of the problems of talking with friends on the Internet is knowing if they are online. This program lets you see if your friends are on-line and provides you the means to contact them. As soon as they log on, you can be notified.

ICQ

ICQ provides the following services

voice and text chat
user notification when friends log on to the Internet
file transfer
sending of brief messages
exchanging of URL's
email

In addition, ICQ provides a large user directory where you can post information about yourself. This directory can be searched by you or other ICQ users, so you can find people with similar interests. ICQ can be downloaded from http://www.icq.com/download/.

6.3.3.2 Powwow
Another product that provides similar functionality to ICQ is Powwow. It does provide some interesting animated effects in chat mode, and lets you talk to multiple users at once.

PowWow

Powwow provides the following features,

instant voice and chat messaging
group chat
shared white boards
community bulletin boards
file transfers and URL's

Powwow can be downloaded from http://ww2.tribal.com/download/default.cfm

6.4 What is a directory server?

A directory server is a host on the Internet that holds a list of users. Not all directory servers are the same and most support different products. This means a directory server that supports Netmeeting will not support users using Ivisit.

6.5 Internet Relay Chat

Internet Relay Chat (IRC) allows you to meet online with people of similar interests. IRC consists of a large number of channels (a channel is like a room where messages are exchanged) that contain group conversations. Channels can be created by anyone, and users can join channels and exchange messages.

It is a text-based mode of communication. Users often use cryptic codes and abbreviations to signify phrases of speech, so they do not have to type whole words.

Channels are located on IRC servers, which often talk to each other and share channels, so that messages on one server are relayed to the same channels on other IRC servers.

Accessing IRC requires the use of an IRC client. One such client is available from http://www.mirc.co.uk/.

CHAPTER VII

7.0 Odds and Ends

In this section you will learn about some rules on conduct online, abbreviations to use in text or email, what proxy servers and firewalls are, as well as how to find information using FAQ's.

7.1 FAQ's (Frequently Asked Questions)

An FAQ is a file generated by an individual(s) on the Internet that relates to a particular topic. These FAQ's are generally posted monthly to the Usenet newsgroup news.answers and archived on the following ftp site.

ftp://rtfm.mit.edu/pub/usenet/comp.answers/

FAQ's cover a very wide range, from UNIX, Windows, programming, data communications, cell-relay, graphics, viruses, editors and much more.

An FAQ attempts to answer most of the common questions that you might ask, like where do I find software, help, examples or other questions.

FAQ's follow a standard format layout, so if you were planning on developing one, it would be wise to check the FAQ on how to design an FAQ!

7.2 RFC's (Request for comments)

A Request For Comment is a document that contains information about a protocol (a set of rules for exchanging information, like TCP/IP) or policy associated with the Internet.

A new RFC can be developed by an individual, group or company. The new RFC outlines how the new service will work, and is placed on the Internet for public comment. If it is accepted by an Internet Standards Committee, then the RFC is implemented.

Further information about the process involved in developing an RFC, as well as a list of all the RFC's can be found online at http://www.rfc-editor.org/

7.3 Netiquette

Netiquette means online conduct. It is how you or others behave online. One of the problems with the Internet is that people can pretend to be someone else and mislead others. This is due to the lack of visual and other means of identifying who we are communicating with.

To help set some form of acceptable practice for online conduct, a set of rules was devised. However, there is difficulty in enforcing the rules on the Internet as there is no such thing as Internet police!

In 1994, Virginia Shea wrote a book called 'Netiquette' (Albion books) that is widely accepted as the standard guide to online conduct. In her book, Shea outlined ten basic rules for online conduct.

Rule 1: Remember the Human
Rule 2: Adhere to the same standards of behavior online that you follow in real life
Rule 3: Know where you are in cyberspace
Rule 4: Respect other people's time and bandwidth
Rule 5: Make yourself look good online
Rule 6: Share expert knowledge
Rule 7: Help keep flame wars under control
Rule 8: Respect other people's privacy
Rule 9: Don't abuse your power
Rule 10: Be forgiving of other people's mistakes

There is an online web site dedicated to netiquette at http://www.etiquette.net/index.html

Tip
There is one more important thing you should remember. When communicating with others in email or chat, avoid the use of capital letters. IF YOU USE UPPERCASE, THIS IS DEEMED TO BE SHOUTING AND OTHERS WILL GET THE IMPRESSION YOU ARE ANGRY WITH THEM!

7.4 Emoticons

In the real world, we have so much feedback about what people do when we communicate. This feedback is called paralanguage, and includes the way people say words, their body language (how they stand, use their hands), nodding of the head, context, surroundings and other factors.

On the Internet, most of this paralanguage is lost. Sometimes we feel it would be nice to let other people know how we are feeling. They cannot see us, and in most cases we are communicating using email or text chat.

In 1993 Doherty and David Sanderson published a book called "Smileys" (O'Reilly & Associates) that outlined how to display emotions using simple text characters. This led to the term emoticons, or icons that represent emotions.

Let us start with a simple one. The following emoticon represents a smiling face (turn your head on the side to view it).

:-)

The following emoticon represents a sad face!

:-(

The evolution of chat rooms has led to the development of abbreviations to represent feelings, thus avoiding having to type too much text (this also has the advantage of a quicker typed response). Sometimes people enclose the abbreviations using angle brackets (the <> characters) or using an asterisk. The following table illustrates some of these abbreviations.

Abbreviation	Meaning
<g>	grin
BTW	By the way
BFN	Bye for now
FYI	For your information
GMTA	Great minds think alike
IMHO	In my humble opinion
LOL	Laughing out loud
S	Smile

Exercise
Use your Internet searching skills to find out what the following emoticons and abbreviations stand for.

Abbreviation	Meaning
AAMOF
HTH
IAC
ROTFL
TIA
(-)
:-o
;-)
:@

7.5 Spamming

Spamming refers to the sending out of messages or information to thousands of people on the Internet. This is not acceptable conduct (it breaches several rules of Netiquette as outlined above. Can you determine which ones?). Many ISP's now have policies on spamming, which can involve disabling the offender's account.

7.6 Flaming

Flaming is the sending of abusive email or messages to other people or newsgroups. You flame people when you disrespect their opinion and become abusive, attacking them personally, calling them names or questioning their parentage.

Flaming is considered bad online conduct and is not tolerated well in chat rooms or newsgroups. The other people in the chat room or newsgroup can quickly turn on an individual who flames. It is best to practice good online conduct and avoid getting into personal confrontations with other people.

7.7 Mailing Lists and List Servers

A list server provides a number of mailing lists. A mailing list uses email messages and provides a mechanism for people of similar interests to intercommunicate.

It is different from a newsgroup. A newsgroup is public and open to all, but mailing lists are run on list servers and involve a process of subscription or joining. In addition, to prevent flaming, many mail lists are moderated, where all messages are first checked before they are sent on to all members of the mail list.

When you send an email message to a mail list, it is then forwarded by the list server to all members of that list. A list server also provides a number of commands you use to get an archive of the list messages, see who else is a member, or to leave the list (cease to be a member).
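The forwarding behaviour of a list server can be pictured with a small Python model. It is purely illustrative: real list servers work with email messages and commands sent by mail, whereas the class MailingList, its method names and the example.org addresses below are all hypothetical.

```python
class MailingList:
    """A toy model of one mailing list held on a list server."""

    def __init__(self, name):
        self.name = name
        self.members = []   # subscribed email addresses
        self.archive = []   # every message posted to the list

    def subscribe(self, address):
        if address not in self.members:
            self.members.append(address)

    def unsubscribe(self, address):
        if address in self.members:
            self.members.remove(address)

    def post(self, sender, message):
        """Archive the message and return one delivery per member."""
        self.archive.append((sender, message))
        return [(member, message) for member in self.members]

mail_list = MailingList("distance-learning")
mail_list.subscribe("ann@example.org")
mail_list.subscribe("bob@example.org")
first_delivery = mail_list.post("ann@example.org", "Hello everyone")
mail_list.unsubscribe("bob@example.org")
second_delivery = mail_list.post("ann@example.org", "Is anyone there?")
```

A single message sent to the list results in one copy per member, and a member who leaves the list receives nothing further.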

There are many mail lists available on the Internet, covering a wide range of topics such as education, distance learning and drug abuse.

Exercise
Practise your Internet skills to locate a number of mail lists that interest you.

7.8 Firewalls

A firewall is a means of protecting a corporate network from outside abuse. In the very early days of the development of the motor car, the engine compartment was isolated from the passengers by use of steel plates. In the event of a fire or explosion, this prevented damage to the passenger by helping to limit the spread of flames or prevent engine parts from flying to the passenger area.

This dividing wall is still present in modern day cars and is called a firewall. In Internet terms, the engine bay is like the Internet and holds many dangers that could damage the corporate network (like the passengers). To protect the corporate network, a firewall is used.

In technical terms, a firewall is a router that restricts Internet access by only allowing access to certain host computers (and specified services on those host computers) within the organization. Of course, a firewall not only prevents outside intruders from accessing internal corporate computers, but it can also prevent internal employees from accessing the Internet!
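The rule of only allowing access to certain hosts and certain services on those hosts can be sketched as a simple allow-list check. This is a much-simplified illustration; the host names, port numbers and the function is_allowed are assumptions for the example, and real firewalls examine far more than a host and a port.

```python
# Each rule names an internal host and the one service (port) outsiders may reach.
ALLOWED = {
    ("www.example.org", 80),    # the public web server
    ("mail.example.org", 25),   # the mail server
}

def is_allowed(host, port, rules=ALLOWED):
    """Return True only if this host and service appear in the allow-list."""
    return (host, port) in rules

web_request = is_allowed("www.example.org", 80)
telnet_to_web_server = is_allowed("www.example.org", 23)
request_to_payroll = is_allowed("payroll.example.org", 80)
```

Anything not explicitly listed is refused, so a permitted host is still protected on services that have not been opened.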

7.9 Proxy Servers

When a firewall is used to restrict internal employees from accessing the Internet, a proxy server is used to provide access. When an employee requests a web page for example, this request is forwarded to the proxy server, which then contacts the host computer where the page resides, downloads the page, keeps a copy of it, then sends the page back to the employee.

If another employee requests the same page, the proxy server will know it already has a copy of the page, so it will not need to contact the host computer and instead just forwards its copy of the page back to the employee. This saves Internet traffic, a major reason why proxy servers are used.

So, proxy servers can help in reducing the amount of Internet traffic coming into an organization. The proxy server must have access to the Internet in order to do this.
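The caching behaviour of a proxy server can be sketched in Python as follows. It is a minimal model under stated assumptions: the class CachingProxy and the stand-in function fake_fetch are invented for the example, no real network access takes place, and real proxies also decide when a stored copy has become too old to reuse.

```python
class CachingProxy:
    """A toy proxy that keeps a copy of every page it downloads."""

    def __init__(self, fetch):
        self._fetch = fetch        # function that contacts the host computer
        self._cache = {}           # URL -> stored copy of the page
        self.origin_requests = 0   # how often the host was actually contacted

    def get(self, url):
        if url not in self._cache:
            self.origin_requests += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]

def fake_fetch(url):
    # Stand-in for a real download from the host computer.
    return "<html>contents of " + url + "</html>"

proxy = CachingProxy(fake_fetch)
first = proxy.get("http://example.com/index.htm")    # downloaded from the host
second = proxy.get("http://example.com/index.htm")   # served from the stored copy
```

Both employees receive the same page, but the host computer is contacted only once.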

Parmeshwor Vetwal

Thursday, July 31, 2008
