From Microformats Wiki

Revision as of 01:50, 24 August 2011

<entry-title>URL formats</entry-title>

Various systems define and represent URLs as a set of named parts. This page documents the implicit formats those systems use.

URL specification

The URL specification is perhaps the most canonical source for the names of the different parts of a URL.

1994 http://www.w3.org/Addressing/URL/url-spec.txt

Names are quoted literally, dropping any "The" prefix and "part" suffix.

  • PrePrefix - e.g. "URL:". The portion before the "http".
  • Scheme - e.g. "http"
  • :
  • Internet protocol parts
    • // (until the following /)
    • user name (if present; followed by the optional password (see next field), then an @)
    • password (if present, preceded by a :)
    • internet domain name - e.g. "www.w3.org"
    • port number (if present, preceded by a :)
  • Path
    • search
  • fragmentid - "the hash sign and following"
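
As a rough illustration of the part names above, here is a minimal Python sketch that maps the standard library's urllib.parse.urlsplit results onto them. The function name spec_parts, the dictionary keys, and the sample URL are made up for this example; the 1994 specification does not define any API.

```python
from urllib.parse import urlsplit

def spec_parts(url):
    """Map urlsplit() results onto the 1994 URL spec's part names (illustrative only)."""
    # Strip the optional "URL:" PrePrefix before handing the rest to urlsplit.
    preprefix = ""
    if url[:4].lower() == "url:":
        preprefix, url = url[:4], url[4:]
    parts = urlsplit(url)
    return {
        "preprefix": preprefix,
        "scheme": parts.scheme,
        "user_name": parts.username,
        "password": parts.password,
        "internet_domain_name": parts.hostname,
        "port_number": parts.port,  # int, or None if absent
        "path": parts.path,
        "search": parts.query,
        # The spec's fragmentid is "the hash sign and following",
        # so the "#" that urlsplit removes is put back here.
        "fragmentid": "#" + parts.fragment if parts.fragment else "",
    }

# Hypothetical sample URL, chosen to exercise every part.
parts = spec_parts("URL:http://user:secret@www.w3.org:8080/Addressing/?q=1#top")
```

Here parts["scheme"] is "http" and parts["internet_domain_name"] is "www.w3.org"; the remaining keys follow the list above in the same order.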

HTTP

The HTTP specification has a few notes about the format/portions of HTTP URLs.

1996 http://www.ietf.org/rfc/rfc1945.txt - 3.2.1 General Syntax

  • URI
    • absoluteURI
      • scheme
      • :
      • relativeURI
        • net_path
          • //
          • net_loc
          • abs_path
            • /
            • rel_path
              • path
                • fsegment
                • segment (zero or more, if present, preceded by /)
              • params (if present, preceded by ;)
              • query (if present, preceded by ?)
    • fragment (if present, preceded by #)

Also:

  • http_URL
    • http://
    • host
    • port (if present, preceded by :)
    • abs_path (as defined above)

Canonicalization:

  • host is lowercased
  • :port is omitted if the port is 80
  • empty abs_path is replaced with /
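
The three canonicalization rules can be sketched in Python as follows. This is only an illustration under a few assumptions: the function name canonicalize_http_url is invented here, any user name or password is dropped because http_URL has no such part, and IPv6 literals are not handled.

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize_http_url(url):
    """Apply the three rules above: lowercase host, drop :80, empty path becomes "/"."""
    parts = urlsplit(url)
    # Rule 1: host is lowercased (the hostname attribute is already lowercase).
    netloc = parts.hostname or ""
    # Rule 2: :port is omitted if the port is 80.
    if parts.port is not None and parts.port != 80:
        netloc += ":" + str(parts.port)
    # Rule 3: an empty abs_path is replaced with "/".
    path = parts.path or "/"
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))

canonical = canonicalize_http_url("http://WWW.Example.COM:80")
```

With this sketch, "http://WWW.Example.COM:80" becomes "http://www.example.com/", while a non-default port such as :8080 is kept.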

DOM

1996 https://developer.mozilla.org/en/DOM/window.location#Properties

The window.location object represents the URL of the window's page and thus also has properties (terms) for the different parts/pieces.

Properties:

  • protocol - e.g. "http:"
  • host - e.g. "www.example.com:80"
    • hostname - e.g. "www.example.com"
    • port - e.g. "80"
  • pathname - e.g. "/search"
  • search - e.g. "?q=devmo"
  • hash - e.g. "#test"
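
To make the property names concrete, the following Python sketch splits a URL into a dictionary keyed by the same names. It is not the DOM API: location_like is a hypothetical helper, the split is purely syntactic, and a real browser may normalise values differently (for example, it may report an empty port when the port is the scheme's default).

```python
from urllib.parse import urlsplit

def location_like(url):
    """Return a dict keyed by the window.location property names (syntactic split only)."""
    parts = urlsplit(url)
    hostname = parts.hostname or ""
    port = "" if parts.port is None else str(parts.port)
    return {
        "protocol": parts.scheme + ":",                       # includes the trailing colon
        "host": hostname + (":" + port if port else ""),       # hostname plus optional :port
        "hostname": hostname,
        "port": port,
        "pathname": parts.path,
        "search": "?" + parts.query if parts.query else "",    # includes the leading "?"
        "hash": "#" + parts.fragment if parts.fragment else "",  # includes the leading "#"
    }

# Sample URL assembled from the example values listed above.
loc = location_like("http://www.example.com:80/search?q=devmo#test")
```

Note how protocol, search, and hash keep their delimiters (":", "?", "#"), unlike the corresponding parts in the URL and HTTP specifications.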

CGI

~1997-1999? Common Gateway Interface, specifically, Environment Variables

  • http://tools.ietf.org/html/rfc3875
  • http://www.citycat.ru/doc/CGI/overview/env.html

http://example.com/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties

Terms:

  • script-URI
    • scheme - derived from SERVER_PROTOCOL (e.g. "http" from "HTTP/1.1")
    • ://
    • server-name - SERVER_NAME
    • :
    • server-port - SERVER_PORT
    • script-path same as SCRIPT_NAME
    • extra-path same as PATH_INFO
    • ?
    • query-string - QUERY_STRING

Environment variables:

  • SERVER_PROTOCOL - not the protocol scheme, e.g. "HTTP/1.1"
  • SERVER_NAME or HTTP_HOST - e.g. "example.com"
  • SERVER_PORT - e.g. "80"
  • REMOTE_USER - the username (but not password)
  • PATH - not the URL path, but the system's executable search path in the web server's environment
  • REQUEST_URI - e.g. "/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties"
    • SCRIPT_NAME - e.g. "/cgi-bin/printenv.pl" (the path up to and including the script; two segments in this example)
    • PATH_INFO - e.g. "/ponylove" (remainder of path)
    • QUERY_STRING - e.g. "q=20%C001er&moar=kitties"
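
The relationship between the environment variables and the script-URI terms can be sketched in Python as below. The helper name script_uri and the example_env dictionary are invented for this illustration (a real script would read os.environ). The sketch takes the scheme from the protocol name in SERVER_PROTOCOL, so it cannot tell HTTPS from HTTP, and it skips any URL-encoding of SCRIPT_NAME and PATH_INFO.

```python
def script_uri(env):
    """Rebuild the script-URI from CGI environment variables (simplified sketch)."""
    # "HTTP/1.1" -> "http": the scheme is derived from SERVER_PROTOCOL, not equal to it.
    scheme = env["SERVER_PROTOCOL"].split("/", 1)[0].lower()
    uri = f'{scheme}://{env["SERVER_NAME"]}:{env["SERVER_PORT"]}'
    # script-path followed by extra-path
    uri += env.get("SCRIPT_NAME", "") + env.get("PATH_INFO", "")
    # "?" and query-string, only when a query is present
    if env.get("QUERY_STRING"):
        uri += "?" + env["QUERY_STRING"]
    return uri

# Values taken from the example entries listed above.
example_env = {
    "SERVER_PROTOCOL": "HTTP/1.1",
    "SERVER_NAME": "example.com",
    "SERVER_PORT": "80",
    "SCRIPT_NAME": "/cgi-bin/printenv.pl",
    "PATH_INFO": "/ponylove",
    "QUERY_STRING": "q=20%C001er&moar=kitties",
}

uri = script_uri(example_env)
request_uri = (example_env["SCRIPT_NAME"] + example_env["PATH_INFO"]
               + "?" + example_env["QUERY_STRING"])
```

For these values, request_uri matches the REQUEST_URI example above, which shows how SCRIPT_NAME, PATH_INFO, and QUERY_STRING partition it.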

Googler

2007 Per Matt Cutts's blog post Talk like a Googler: parts of a url, for example:

http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s

Parts of a url:

  • protocol - e.g. "http"
  • host or hostname - e.g. "video.google.co.uk"
    • subdomain - e.g. "video"
    • domain name - e.g. "google.co.uk"
    • top-level domain or TLD - e.g. "uk" (which in this case is also referred to as a country-code top-level domain or ccTLD)
  • port - e.g. "80"
  • path - e.g. "/videoplay"
  • parameters - e.g. "?docid=-7246927612831078230&hl=en"
    • parameter - e.g. "docid" with value "-7246927612831078230"
  • fragment or named anchor - e.g. "#00h02m30s"
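
A small Python sketch of this vocabulary follows. The helper name googler_parts and its dictionary keys are made up for the example. Because splitting a host into subdomain and domain name cannot be done reliably from the string alone (it needs knowledge of suffixes such as "co.uk"), the sketch assumes the caller supplies the domain name.

```python
from urllib.parse import urlsplit, parse_qsl

def googler_parts(url, domain_name):
    """Label URL parts with the terms above; domain_name must be supplied by the caller."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # The subdomain is whatever precedes ".<domain_name>" in the host.
    suffix = "." + domain_name
    subdomain = host[: -len(suffix)] if host.endswith(suffix) else ""
    return {
        "protocol": parts.scheme,
        "host": host,
        "subdomain": subdomain,
        "domain_name": domain_name,
        "tld": host.rsplit(".", 1)[-1],          # last dot-separated label
        "port": "" if parts.port is None else str(parts.port),
        "path": parts.path,
        "parameters": "?" + parts.query if parts.query else "",
        "parameter_pairs": parse_qsl(parts.query),  # list of (name, value) tuples
        "fragment": "#" + parts.fragment if parts.fragment else "",
    }

g = googler_parts(
    "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s",
    "google.co.uk",
)
```

Here g["subdomain"] is "video", g["tld"] is "uk", and g["parameter_pairs"] holds the docid and hl parameters with their values.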

related