Web development #1: Internet and the World Wide Web

Sander Rossel

Nov 10, 2014

CPOL

The first in a series on web development.

This is the first installment of a blog series about web development.

For the past four years I’ve worked in VB.NET, C# and WinForms. Wanting to see more than just Microsoft and WinForms, I applied for a new job and was hired. Hooray for me! Anyway, I’m starting in January and I’ve been told my first assignment will be to create a website using ASP.NET MVC and Knockout.js. So that’s great! Away from WinForms and into new territory. I can’t wait to start. I’ve never done serious web development though. Sure, I’ve seen some HTML, CSS and JavaScript, but that’s about it. Of course that’s something that can easily be fixed. So I’m starting a series on web development.
I’ve looked into web development before, and the first thing I noticed was the huge number of protocols, languages and frameworks I supposedly need to know about. But where do I start? We have TCP/IP, HTTP(S), Apache, IIS, XML, HTML, CSS, JavaScript, AJAX, PHP, ASP.NET, ASP.NET MVC, Java, Ruby on Rails, Python, SQL, jQuery, AngularJS, Knockout.js, Node.js, Bootstrap, and there’s new stuff coming almost daily… It doesn’t seem to stop!
So in this post I want to start by explaining some of those technologies and why you do or do not need them (but might want to use them). In the next blog posts we are actually going to use some of them.

Understanding the Internet

Do we need to know how the internet works to create beautiful, responsive web pages? Probably not, but knowing what the internet is, what is involved, and how it works on a high level can certainly help in giving you that edge you need to become a great web developer. So in this post we’re not going to create a web page; instead I’m going to explain, without too much detail, how the internet works.

In short the internet is a worldwide set of computer networks. It spans many technologies, like email, chat, file transfer and web pages. The World Wide Web (or www) is a set of web pages and sites that make up a (substantial) subset of the internet. In fact, the www is so big that the terms internet and www are often used interchangeably.

A little history

You may think that the internet is a phenomenon from the past 15 to 20 years, but the first version, called ARPANET (Advanced Research Projects Agency Network), goes as far back as the late 1960s! It was initially developed to connect universities and laboratories in the U.S. to make it easier to share data. You might be surprised that many of the ideas and technologies that were founded back then are still present in the internet as we know it today! Some of those ideas include: sending data between computers in packets, linked networks should still work if they’re not connected to other networks, computers can be added or removed dynamically, everyone should be able to create programs and devices to connect to it through a uniform interface (protocols), and last, but not least, there is no centralized control over the network.

Organizations

Those last two points seem to contradict each other: there is no centralized control, but we do want standardization and consistency throughout the internet by using protocols. There are actually some organizations that more or less regulate internet technologies. The Internet Engineering Task Force (IETF) creates specifications for the protocols that define how information is exchanged on the internet. ICANN, the Internet Corporation for Assigned Names and Numbers, coordinates web site (domain) names. The World Wide Web Consortium, also known as W3C, creates recommendations for web standards, covering web languages like HTML and CSS and how browsers should interpret those languages.
Additionally there is the ISO (International Organization for Standardization) that provides various standards, including many for IT.

Technologies

We now know a little about the internet and its history. But we still don’t know how it all works. In the next section we’re going to take a look at some technologies and how they work together to get data across the globe.

IP (Internet Protocol)

For now, try to think of the internet as old-fashioned snail mail. A packet of data is sent from point A to B. Like with regular mail, a computer must know where to send a packet. In the real world we have addresses and zip codes or postal codes (unfortunately we have no worldwide standard for this). Our computers have something similar, called an IP address (Internet Protocol address). An IP address consists of four bytes (that is 4×8 bits, making a total of 32 bits). One byte can represent the numbers 0 to 255. A typical IP address could look something like 192.0.78.17 (that’s the IP of WordPress.com). Because 32 bits allow for ‘only’ 4,294,967,296 (2³²) unique addresses, and this proved to be insufficient, another version of IP is available that uses a total of 128 bits (written as eight groups of four hexadecimal digits). The two versions of IP are called IPv4 (32 bits) and IPv6 (128 bits).
A special IP address is the one that a computer can use to connect to itself, 127.0.0.1, also known as localhost.
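To make the numbers a bit more concrete, here is a minimal sketch using Python's standard `ipaddress` module. The WordPress address is the one mentioned above; the IPv6 address `::1` (the IPv6 counterpart of localhost) is my own pick for illustration.

```python
import ipaddress

# An IPv4 address is four bytes; each byte is a number from 0 to 255.
address = ipaddress.ip_address("192.0.78.17")
print(address.version)         # 4
print(list(address.packed))    # [192, 0, 78, 17]
print(int(address))            # the same 32 bits written as one number

# 32 bits allow for 2**32 addresses, 128 bits for 2**128.
print(2 ** 32)                 # 4294967296
print(2 ** 128)

# The address a computer uses to connect to itself.
loopback = ipaddress.ip_address("127.0.0.1")
print(loopback.is_loopback)    # True

# An IPv6 address: 128 bits, written as eight groups of hexadecimal digits.
v6 = ipaddress.ip_address("::1")
print(v6.version)              # 6
print(v6.exploded)             # 0000:0000:0000:0000:0000:0000:0000:0001
```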

Now whenever you send data to another computer, let’s say a blog post, your data is cut into smaller packets. Those packets are then sent to a computer with a unique IP address. A router is used to locate the computer with the specific IP address. Multiple routers typically work together to get your packets to the other computer. All computers are ultimately connected to a router. That means your computer is not connected directly to the other computer; instead, you are connected through a long chain of routers.

TCP (Transmission Control Protocol)

We have seen that the IP protocol is essential in getting data from A to B. The IP protocol is quite minimal and doesn’t actually do much other than getting that data from A to B. If, for some reason, packets of data get lost, corrupted, duplicated or delayed, we need more sophisticated protocols. One such protocol is TCP, or Transmission Control Protocol, often written as TCP/IP because it runs on top of the IP protocol.
TCP solves the problems I just mentioned. It does this by adding extra information to packets of data. With this information TCP can put packets back in the correct order (which is the order in which they were sent, not the order in which they are received), and it can detect errors, duplicates or missing packets and try to solve these problems. So in short, TCP guarantees reliable and in-order delivery of data between computers.
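The idea of numbering packets so they can be put back in order can be sketched in a few lines of Python. This is a toy model and not how TCP is really implemented: real TCP numbers bytes rather than whole packets, uses checksums and asks for retransmission of lost data. The function names `split_into_packets` and `reassemble` are made up for this example.

```python
import random

def split_into_packets(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Cut data into (sequence_number, chunk) pairs of at most `size` bytes."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Drop duplicates and put the chunks back in the order they were sent."""
    unique = {}
    for seq, chunk in packets:
        unique.setdefault(seq, chunk)  # ignore a sequence number we already have
    # A gap in the sequence numbers means a packet got lost along the way.
    # (A lost *last* packet cannot be noticed this way; this is only a toy.)
    missing = set(range(max(unique) + 1)) - set(unique) if unique else set()
    if missing:
        raise ValueError(f"missing packets: {sorted(missing)}")
    return b"".join(unique[seq] for seq in sorted(unique))

message = b"Hello, this is a blog post sent over the internet!"
packets = split_into_packets(message, size=8)

in_transit = packets + [packets[2]]  # one packet arrives twice
random.shuffle(in_transit)           # and everything arrives out of order

print(reassemble(in_transit) == message)  # True
```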

Additionally, TCP associates each program or service with a unique number, making it possible for multiple programs to share the same computer and internet connection. This number is also called a port. Some common internet services have been given a standard port, such as 21 for file transfer (FTP), 25 and 110 for email (SMTP and POP3) and 80 for web (HTTP).
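Here is a small sketch of a port in action, using Python's standard `socket` module. Both the "server" and the "client" run on the same computer and talk over localhost. Instead of a well-known port like 80 I let the operating system pick any free port (that is what port 0 means when binding); the upper-casing echo service is just something I made up for the example.

```python
import socket
import threading

def serve_once(server: socket.socket) -> None:
    """Accept one connection and send back what was received, in upper case."""
    connection, _ = server.accept()
    with connection:
        connection.sendall(connection.recv(1024).upper())

# Bind to localhost; port 0 asks the operating system for any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=serve_once, args=(server,), daemon=True).start()

# The client reaches this particular program through IP address plus port.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)
server.close()

print(port, reply)
```

The IP address gets the data to the right computer; the port gets it to the right program on that computer.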

Other protocols run on top of IP too, like UDP (User Datagram Protocol), which is faster than TCP, but at the cost of reliability. I will not discuss any of them here.

The World Wide Web

So that’s the internet in a nutshell (a really very tiny nutshell…). That’s cool, but what we really want is to write web pages for the www! But before we get to that let’s take a closer look at how the www works.

The www is basically a worldwide set of documents formatted in a language called HTML (HyperText Markup Language), a markup language that looks a lot like XML (Extensible Markup Language), although only its XHTML variant is actually XML. A web server is a computer that is running a special piece of software, like Apache or Microsoft’s Internet Information Services (IIS), that can serve these documents to clients (other computers). Users request these pages through web browsers like IE, Firefox and Chrome.

Domain Name System (DNS) and Uniform Resource Locators (URLs)

As we know, each computer on the internet can be identified by its IP address. Yet, in our browser we type wordpress.com and not 192.0.78.17. We owe this convenience to the Domain Name System (DNS). The plain text name for a web server is called a domain name. Businesses and other users can register domain names, and a hierarchy of DNS servers, starting with a known set of root DNS servers, keeps track of which name belongs to which IP address. So every time you type the name of a website, the IP address is looked up based on the domain name, and you connect to the server using that IP address.
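You can ask your operating system to do such a lookup for you. The sketch below uses `socket.getaddrinfo` from Python's standard library; the helper name `resolve` is my own. To keep the example runnable without an internet connection it only resolves a numeric address, which needs no lookup at all.

```python
import socket

def resolve(name: str) -> list[str]:
    """Return the IPv4 addresses the system resolver finds for a host name."""
    results = socket.getaddrinfo(name, None, family=socket.AF_INET,
                                 type=socket.SOCK_STREAM)
    # Each result ends with an (address, port) pair; keep unique addresses in order.
    addresses = []
    for *_, sockaddr in results:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses

# A numeric address needs no lookup and comes back unchanged.
print(resolve("127.0.0.1"))  # ['127.0.0.1']
```

With a working internet connection, `resolve("wordpress.com")` should return one or more addresses. They may well differ from the one mentioned above, since the addresses behind a domain name can change over time.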

The domain name becomes part of a larger text, known as a Uniform Resource Locator (URL). A URL starts with a protocol, then the host or domain name and finally a path to a file or document on the server. For example, http://www.codeproject.com/Members/SanderRossel indicates that we are going to connect to the codeproject.com server using HTTP (the HyperText Transfer Protocol). On the server we want the folder Members/SanderRossel, which contains the default index.html document. When you type the URL in your browser window, the result is that my CodeProject profile is displayed.
The www part of the URL is a convention and is completely optional (it’s not a standard).
The .com part is the top-level domain and roughly organizes websites by geography, type of organization or content. .com for commercial, .edu for educational, .nl for websites in the Netherlands, .fr for French websites, etc. The most popular top-level domain is, of course, .com (even for non-commercial websites, such as CodeProject!).
URLs can contain other information too, like the port to use and even a specific point in a document (also called an anchor).
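Taking a URL apart is easy with `urllib.parse` from Python's standard library. In the sketch below I added a port (`:80`) and an anchor (`#articles`) to my profile URL purely for illustration; they are not part of the real URL.

```python
from urllib.parse import urlsplit

# The port and the anchor are hypothetical additions for this example.
url = "http://www.codeproject.com:80/Members/SanderRossel#articles"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.hostname)  # www.codeproject.com
print(parts.port)      # 80
print(parts.path)      # /Members/SanderRossel
print(parts.fragment)  # articles

# The top-level domain is the last part of the host name.
top_level_domain = parts.hostname.rsplit(".", 1)[-1]
print(top_level_domain)  # com
```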

HyperText Transfer Protocol (HTTP)

To request data from, and send data to, a web server another protocol is used, the HyperText Transfer Protocol. This is another layer on top of TCP. HTTP is a set of commands that a client, such as a web browser, can send to a web server. These commands include, but are not limited to:

GET, for requesting a file from the server.
POST, for submitting form information to the server.
PUT, for uploading files to the server.

Whenever you request a webpage, your browser sends a GET message to the server, the server sends back the page and your browser displays it.
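Such a GET message is just a few lines of text. The sketch below builds a minimal HTTP/1.1 request in Python without sending it anywhere; the helper name `build_get_request` is made up, and a real browser sends many more headers than these two.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Build the text of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # the command, the path and the protocol version
        f"Host: {host}",         # which website on this server we want
        "Connection: close",     # ask the server to close the connection afterwards
        "",                      # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

request = build_get_request("www.codeproject.com", "/Members/SanderRossel")
print(request.decode("ascii"))
```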

Along with the requested document comes a code, the HTTP status code, which indicates whether your request was successful. These status codes include, but are not limited to:

200, OK.
403, Forbidden to access a page.
404, Page not found.
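Python's standard library knows these codes too, through the `HTTPStatus` enumeration. The sketch below looks up the official phrase for each of the codes listed above; the `describe` helper and its rough grouping into "success" and "client error" are my own simplification.

```python
from http import HTTPStatus

def describe(code: int) -> str:
    """Return the code, its official phrase and a rough category."""
    status = HTTPStatus(code)
    if 200 <= code < 300:
        kind = "success"
    elif 400 <= code < 500:
        kind = "client error"
    else:
        kind = "other"
    return f"{code} {status.phrase} ({kind})"

for code in (200, 403, 404):
    print(describe(code))
# 200 OK (success)
# 403 Forbidden (client error)
# 404 Not Found (client error)
```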

HTTP is a stateless protocol, which means every request stands on its own: the protocol does not remember earlier requests from the same client. This is important when developing websites! HTTP itself does not keep track of the state of an application between requests. I’ll get to that in later blog posts though.

You may have heard of HTTPS as well. The S stands for Secure (not Stateless!). I’m not discussing it here, but I thought I should mention it.

So what do we need?

So in the introduction of this blog post I mentioned some technologies. I have already explained some in this post. What others are we going to need for building websites?

I’ve already mentioned HTML, which is the main language to describe the contents of a website. In addition there’s CSS (Cascading Style Sheets) to supply the stylistic information of a page. JavaScript can be used to make webpages interactive. To create dynamic pages you’ll need a language or framework that generates HTML on a web server, like PHP, Java, Ruby on Rails, ASP.NET (C#), Python or others.
So there’s the good news. Basically you need to know only HTML to create the most basic of websites. To create a website that looks nice you need some CSS too. Add JavaScript to your stack and your webpage can be pretty sweet. And you probably want some server-side language to actually present up-to-date information, like Java or ASP.NET.

All the others, jQuery, AngularJS, Knockout.js etc., are libraries or frameworks written in these languages. They are nice to have, and some are indispensable when solving specific problems, but they aren’t strictly necessary.

So we now have a basic understanding of the internet and the World Wide Web!
In the next blog post we are going to have a look at HTML and CSS, followed by JavaScript. After that we’ll take a look at generating HTML using a server side language.

Stay tuned!

The post Web development #1: Internet and the World Wide Web appeared first on Sander's bits.