url-formats

From Microformats Wiki
Revision as of 01:19, 24 August 2011 by Tantek (talk | contribs) (→‎CGI: link to mirror of "original" spec environment variables and include a few more details)
Jump to navigation Jump to search
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

<entry-title>URL formats</entry-title>

URLs are often defined and represented in various systems as a set of various pieces/parts. This page documents the implicit formats from those systems.

URL specification

The URL specification is perhaps the most canonical source for the names of the different parts of a URL.

1994 http://www.w3.org/Addressing/URL/url-spec.txt

Names are quoted literally, dropping any "The" prefix and "part" suffix.

  • PrePrefix - e.g. "URL:". The portion before the "http".
  • Scheme - e.g. "http"
  • :
  • Internet protocol parts
    • // (until the following /)
    • user name (if present, followed by an @ after optional password (see next field)).
    • password (if present, preceded by a :)
    • internet domain name - e.g. "www.w3.org"
    • port number (if present, preceded by a :)
  • Path
    • search
  • fragmentid - "the hash sign and following"

HTTP

The HTTP specification has a few notes about the format/portions of HTTP URLs.

1996 http://www.ietf.org/rfc/rfc1945.txt - 3.2.1 General Syntax

  • URI
    • absoluteURI
      • scheme
      • :
      • relativeURI
        • net_path
          • //
          • net_loc
          • abs_path
            • /
            • rel_path
              • path
                • fsegment
                • segment (zero or more, if present, preceded by /)
              • params (if present, preceded by ;)
              • query (if present, preceded by ?)
    • fragment (if present, preceded by #)

Also:

  • http_URL
    • http://
    • host
    • port (if present, preceded by :)
    • abs_path (as defined above)

Canonicalization:

  • host is lowercased
  • :port is omitted if the port is 80
  • empty abs_path is replaced with /

CGI

Common Gateway Interface, specifically, Environment Variables 1999?

http://example.com/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties

Terms:

  • script-URI
    • scheme same as SERVER_PROTOCOL
    • ://
    • server-name - SERVER_NAME
    • :
    • server-port - SERVER_PORT
    • script-path same as SCRIPT_NAME
    • extra-path same as PATH_INFO
    • ?
    • query-string - QUERY_STRING


Environment variables:

  • SERVER_PROTOCOL - not the protocol scheme, e.g. "HTTP/1.1"
  • HTTP_HOST or SERVER_NAME - e.g "example.com"
  • SERVER_PORT - e.g. "80"
  • REMOTE_USER - the username (but not password)
  • PATH - not the URL path, but to the web server on the system
  • REQUEST_URI - e.g. "/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties"
    • SCRIPT_NAME - e.g. "/cgi-bin/printenv.pl"
    • PATH_INFO - e.g. "/ponylove"
    • QUERY_STRING - e.g. "q=20%C001er&moar=kitties"

DOM

The window.location object represent the URL of the window's page and thus also has properties (terms) for the different parts/pieces.

Properties:

  • protocol - e.g. "http:"
  • host - e.g. "www.example.com:80"
    • hostname - e.g. "www.example.com"
    • port - e.g. "80"
  • pathname - e.g. "/search"
  • search - e.g. "?q=devmo"
  • hash - e.g. "#test"

Googler

Per Matt Cutts's blog post Talk like a Googler: parts of a url: of for example:

http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s

Parts of a url:

  • protocol - e.g. "http"
  • host or hostname - e.g. "video.google.co.uk"
    • subdomain - e.g. "video"
    • domain name - e.g. "google.co.uk"
    • top-level domain or TLD - e.g. "uk" (which in this case is also referred to as a country-code top-level domain or ccTLD.
  • port - e.g. "80"
  • path - e.g. "/videoplay"
  • parameters - e.g. "?docid=-7246927612831078230&hl=en"
    • parameter - e.g. "docid" with value "-7246927612831078230"
  • fragment or named anchor - e.g. "#00h02m30s"

related