Difference between revisions of "url-formats"

From Microformats Wiki
url-formats
(add DOM location object terms for URL parts/pieces)
(add Googler terms for parts of a URL)
Line 60: Line 60:
 
== CGI ==
 
== CGI ==
 
* http://tools.ietf.org/html/rfc3875
 
* http://tools.ietf.org/html/rfc3875
* http://en.wikipedia.org/wiki/Common_Gateway_Interface - has example: http://example.com/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties
+
* http://en.wikipedia.org/wiki/Common_Gateway_Interface - has example:  
 +
 
 +
<nowiki>http://example.com/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties</nowiki>
  
 
Terms:
 
Terms:
Line 101: Line 103:
 
* '''hash''' - e.g. "#test"
 
* '''hash''' - e.g. "#test"
  
 +
== Googler ==
 +
Per Matt Cutts's blog post <cite>[http://www.mattcutts.com/blog/seo-glossary-url-definitions/ Talk like a Googler: parts of a url]</cite>: of for example:
 +
 +
<nowiki>http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s</nowiki>
  
 +
Parts of a url:
 +
* '''protocol''' - e.g. "http"
 +
* '''host''' or '''hostname''' - e.g. "video.google.co.uk"
 +
** '''subdomain''' - e.g. "video"
 +
** '''domain name''' - e.g. "google.co.uk"
 +
** '''top-level domain''' or '''TLD''' - e.g. "uk" (which in this case is also referred to as a '''country-code top-level domain''' or '''ccTLD'''.
 +
* '''port''' - e.g. "80"
 +
* '''path''' - e.g. "/videoplay"
 +
* '''parameters''' - e.g. "?docid=-7246927612831078230&hl=en"
 +
** '''parameter''' - e.g. "docid" with value "-7246927612831078230"
 +
* '''fragment''' or '''named anchor''' - e.g. "#00h02m30s"
  
 
== related ==
 
== related ==
 
* [[url]]
 
* [[url]]

Revision as of 00:56, 24 August 2011

<entry-title>URL formats</entry-title>

Different systems define and represent URLs as a set of named parts, and the names differ from system to system. This page documents the implicit formats from those systems.

URL specification

The URL specification is perhaps the most canonical source for the names of the different parts of a URL.

1994 http://www.w3.org/Addressing/URL/url-spec.txt

Names are quoted literally, dropping any "The" prefix and "part" suffix.

  • PrePrefix - e.g. "URL:". The portion before the "http".
  • Scheme - e.g. "http"
  • :
  • Internet protocol parts
    • // (until the following /)
    • user name (if present, followed by an @ after optional password (see next field)).
    • password (if present, preceded by a :)
    • internet domain name - e.g. "www.w3.org"
    • port number (if present, preceded by a :)
  • Path
    • search
  • fragmentid - "the hash sign and following"
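
As a rough illustration of the part names above, here is a minimal Python sketch that maps the results of the standard library's urllib.parse.urlsplit onto the 1994 terms. The function name split_1994_parts and the example URL (user name, password, port, search and fragment values) are assumptions made for illustration, not taken from the specification.

```python
from urllib.parse import urlsplit

def split_1994_parts(url: str) -> dict:
    """Map urlsplit() results onto the part names of the 1994 URL spec (sketch)."""
    # Strip the optional "URL:" prePrefix first; otherwise urlsplit would
    # treat "url" as the scheme.
    preprefix = ""
    if url[:4].lower() == "url:":
        preprefix, url = url[:4], url[4:]
    parts = urlsplit(url)
    return {
        "preprefix": preprefix,
        "scheme": parts.scheme,
        "user name": parts.username,
        "password": parts.password,
        "internet domain name": parts.hostname,
        "port number": parts.port,
        "path": parts.path,
        "search": parts.query,
        # The spec's fragmentid is "the hash sign and following".
        "fragmentid": ("#" + parts.fragment) if parts.fragment else "",
    }

# Hypothetical example URL exercising every part.
example = split_1994_parts(
    "URL:http://alice:secret@www.w3.org:8080/Addressing/URL/url-spec.txt?terms#intro"
)
```

Note that urlsplit reports the port as an integer and omits the "?" from the search part.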

HTTP

The HTTP specification has a few notes about the format/portions of HTTP URLs.

1996 http://www.ietf.org/rfc/rfc1945.txt - 3.2.1 General Syntax

  • URI
    • absoluteURI
      • scheme
      • :
      • relativeURI
        • net_path
          • //
          • net_loc
          • abs_path
            • /
            • rel_path
              • path
                • fsegment
                • segment (zero or more, if present, preceded by /)
              • params (if present, preceded by ;)
              • query (if present, preceded by ?)
    • fragment (if present, preceded by #)

Also:

  • http_URL
    • http://
    • host
    • port (if present, preceded by :)
    • abs_path (as defined above)
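
The RFC 1945 pieces listed above can be approximated with Python's urllib.parse.urlparse, which (unlike urlsplit) also separates the ";params" portion of the last path segment. This is only a sketch: the example URL is hypothetical, and urlparse's path keeps the leading "/" that the RFC grammar assigns to abs_path rather than to path.

```python
from urllib.parse import urlparse

# Hypothetical URL chosen to exercise params, query, and fragment together.
parsed = urlparse("http://www.example.com:8080/docs/index.html;type=a?q=1#top")

rfc1945_parts = {
    "scheme": parsed.scheme,      # before the ":"
    "net_loc": parsed.netloc,     # between "//" and the next "/"
    "path": parsed.path,          # includes the leading "/" of abs_path
    "params": parsed.params,      # after ";" in the last path segment
    "query": parsed.query,        # after "?"
    "fragment": parsed.fragment,  # after "#"
}
```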

Canonicalization:

  • host is lowercased
  • :port is omitted if the port is 80
  • empty abs_path is replaced with /
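
The three canonicalization rules above could be sketched in Python as follows. The function name canonicalize_http_url is an assumption for illustration. The sketch drops any user information and does not handle IPv6 literals, since http_URL as listed above has neither.

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize_http_url(url: str) -> str:
    """Apply the three canonicalization rules listed above (sketch)."""
    parts = urlsplit(url)
    # Rule 1: host is lowercased.
    host = (parts.hostname or "").lower()
    # Rule 2: ":port" is omitted if the port is 80.
    netloc = host
    if parts.port is not None and parts.port != 80:
        netloc = f"{host}:{parts.port}"
    # Rule 3: an empty abs_path is replaced with "/".
    path = parts.path or "/"
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))
```

For example, canonicalize_http_url("http://WWW.Example.COM:80") yields "http://www.example.com/", while a non-default port such as 8080 is kept.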

CGI

http://example.com/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties

Terms:

  • script-URI
    • scheme - derived from SERVER_PROTOCOL, e.g. "http" for "HTTP/1.1"
    • ://
    • server-name - SERVER_NAME
    • :
    • server-port - SERVER_PORT
    • script-path same as SCRIPT_NAME
    • extra-path same as PATH_INFO
    • ?
    • query-string - QUERY_STRING


Environment variables:

  • SERVER_PROTOCOL - not the protocol scheme, e.g. "HTTP/1.1"
  • HTTP_HOST - e.g. "example.com"
  • SERVER_PORT - e.g. "80"
  • REMOTE_USER
  • PATH - not the URL path, but the executable search path in the web server's environment on the system
  • REQUEST_URI - e.g. "/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties"
    • SCRIPT_NAME - e.g. "/cgi-bin/printenv.pl"
    • PATH_INFO - e.g. "/ponylove"
    • QUERY_STRING - e.g. "q=20%C001er&moar=kitties"
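
To show how REQUEST_URI relates to the three variables nested under it, here is a minimal Python sketch that splits the example value by plain string operations. The function name split_request_uri is hypothetical, and the sketch assumes the script name is already known. A real server also URL-decodes PATH_INFO, which this sketch does not attempt.

```python
def split_request_uri(request_uri: str, script_name: str) -> dict:
    """Split a REQUEST_URI into SCRIPT_NAME, PATH_INFO and QUERY_STRING (sketch)."""
    # Everything after the first "?" is the query string.
    path, _, query_string = request_uri.partition("?")
    if not path.startswith(script_name):
        raise ValueError("request URI does not start with the script name")
    return {
        "SCRIPT_NAME": script_name,
        # Whatever follows the script name in the path is the extra path.
        "PATH_INFO": path[len(script_name):],
        "QUERY_STRING": query_string,
    }

cgi_vars = split_request_uri(
    "/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties",
    "/cgi-bin/printenv.pl",
)
```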

DOM

The window.location object represents the URL of the window's page and thus also has properties (terms) for the different parts/pieces.

Properties:

  • protocol - e.g. "http:"
  • host - e.g. "www.example.com:80"
    • hostname - e.g. "www.example.com"
    • port - e.g. "80"
  • pathname - e.g. "/search"
  • search - e.g. "?q=devmo"
  • hash - e.g. "#test"
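
For comparison with the other systems on this page, the following Python sketch produces a dictionary shaped like the properties above from a URL string. The function name location_like is an assumption. This is an emulation for illustration rather than the DOM itself: it mirrors the example values listed here, whereas a browser may report an empty port when the port is the default for the scheme.

```python
from urllib.parse import urlsplit

def location_like(url: str) -> dict:
    """Return a dict shaped like the window.location properties (sketch)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme + ":",    # includes the trailing ":"
        "host": parts.netloc,              # hostname plus ":port" if given
        "hostname": parts.hostname or "",
        "port": "" if parts.port is None else str(parts.port),
        "pathname": parts.path,
        "search": ("?" + parts.query) if parts.query else "",      # includes "?"
        "hash": ("#" + parts.fragment) if parts.fragment else "",  # includes "#"
    }

loc = location_like("http://www.example.com:80/search?q=devmo#test")
```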

Googler

Per Matt Cutts's blog post Talk like a Googler: parts of a url, for example:

http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s

Parts of a url:

  • protocol - e.g. "http"
  • host or hostname - e.g. "video.google.co.uk"
    • subdomain - e.g. "video"
    • domain name - e.g. "google.co.uk"
    • top-level domain or TLD - e.g. "uk" (which in this case is also referred to as a country-code top-level domain or ccTLD)
  • port - e.g. "80"
  • path - e.g. "/videoplay"
  • parameters - e.g. "?docid=-7246927612831078230&hl=en"
    • parameter - e.g. "docid" with value "-7246927612831078230"
  • fragment or named anchor - e.g. "#00h02m30s"
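
The terms above can be sketched in Python as follows. Splitting a host into subdomain and domain name requires knowing which suffixes are registrable (e.g. that "co.uk" counts as one unit), so this sketch uses a deliberately tiny, hypothetical set TWO_LABEL_SUFFIXES in place of a full public-suffix list. The function name googler_parts is likewise an assumption.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical, deliberately tiny stand-in for a public-suffix list.
TWO_LABEL_SUFFIXES = {"co.uk"}

def googler_parts(url: str) -> dict:
    """Split a URL into the "Googler" terms listed above (sketch)."""
    parts = urlsplit(url)
    hostname = parts.hostname or ""
    labels = hostname.split(".")
    # Decide whether the registrable suffix spans one label or two.
    suffix_len = 2 if ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES else 1
    return {
        "protocol": parts.scheme,
        "host": hostname,
        "subdomain": ".".join(labels[:-(suffix_len + 1)]),
        "domain name": ".".join(labels[-(suffix_len + 1):]),
        "TLD": labels[-1],
        "port": "" if parts.port is None else str(parts.port),
        "path": parts.path,
        # Each parameter name mapped to its value.
        "parameters": dict(parse_qsl(parts.query)),
        "fragment": parts.fragment,
    }

g = googler_parts(
    "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s"
)
```

Unlike the list above, the sketch returns the parameters and fragment without their leading "?" and "#".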

related