url-formats: Difference between revisions

From Microformats Wiki

Revision as of 06:05, 24 August 2011

<entry-title>URL formats</entry-title>

URLs are often defined and represented in different systems as a set of named pieces or parts. This page documents the implicit formats (the part names) used by those systems.

URL specification

The URL specification is perhaps the most canonical source for the names of the different parts of a URL.

1994 http://www.w3.org/Addressing/URL/url-spec.txt

Names are quoted literally, dropping any "The" prefix and "part" suffix.

  • PrePrefix - e.g. "URL:". The portion before the "http".
  • Scheme - e.g. "http"
  • :
  • Internet protocol parts
    • // (until the following /)
    • user name (if present; followed by the optional password, if any, and then an @)
    • password (if present, preceded by a :)
    • internet domain name - e.g. "www.w3.org"
    • port number (if present, preceded by a :)
  • Path
    • search
  • fragmentid - "the hash sign and following"
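To illustrate how the 1994 names map onto a parsed URL, here is a minimal sketch using Python 3's `urllib.parse.urlsplit`. The function name `url_spec_parts` and the dictionary keys are hypothetical labels chosen to echo the names above; the "URL:" prefix handling and the decision to keep the "#" in `fragmentid` (per the quoted description) are assumptions of this sketch, not behavior defined by any library.

```python
from urllib.parse import urlsplit

def url_spec_parts(url):
    """Hypothetical helper: label a URL's pieces with the 1994 url-spec names."""
    preprefix = ""
    # Assumption: an optional "URL:" prefix may precede the scheme.
    if url[:4].upper() == "URL:":
        preprefix, url = url[:4], url[4:]
    p = urlsplit(url)
    return {
        "preprefix": preprefix,
        "scheme": p.scheme,
        "user_name": p.username,            # None when absent
        "password": p.password,             # None when absent
        "internet_domain_name": p.hostname,  # lowercased by urlsplit
        "port_number": p.port,              # int, or None when absent
        "path": p.path,
        "search": p.query,                  # without the leading "?"
        # "the hash sign and following": keep the "#" when there is a fragment.
        "fragmentid": "#" + p.fragment if p.fragment else "",
    }
```

For example, `url_spec_parts("URL:http://joe:secret@www.w3.org:8080/Addressing/URL?x=1#top")` labels "URL:" as the preprefix, "joe" and "secret" as user name and password, and "#top" as the fragmentid.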

HTTP

The HTTP specification has a few notes about the format/portions of HTTP URLs.

1996 http://www.ietf.org/rfc/rfc1945.txt - 3.2.1 General Syntax

  • URI
    • absoluteURI
      • scheme
      • :
      • relativeURI
        • net_path
          • //
          • net_loc
          • abs_path
            • /
            • rel_path
              • path
                • fsegment
                • segment (zero or more, each preceded by /)
              • params (if present, preceded by ;)
              • query (if present, preceded by ?)
    • fragment (if present, preceded by #)
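The tree above can be turned into a single regular expression. This is a simplified sketch, not a full RFC 1945 parser: it assumes the net_path form ("//" net_loc ...) and a net_loc containing no "?" or ";" (the RFC grammar technically permits both there). The pattern and the function name `split_rfc1945` are hypothetical. Note that, per the grammar, `path` excludes the leading "/" of abs_path, and params run from the first ";" up to the "?".

```python
import re

# Hypothetical pattern following the RFC 1945 section 3.2.1 names.
# Simplification: net_path form only; net_loc without "?" or ";".
_HTTP_1_0_URI = re.compile(
    r"^(?P<scheme>[A-Za-z0-9+.-]+):"   # scheme ":"
    r"//(?P<net_loc>[^/?#;]*)"         # net_path = "//" net_loc
    r"(?:/(?P<path>[^;?#]*)"           # abs_path = "/" rel_path; rel_path starts with path
    r"(?:;(?P<params>[^?#]*))?"        # [ ";" params ]
    r"(?:\?(?P<query>[^#]*))?"         # [ "?" query ]
    r")?"                              # abs_path itself is optional
    r"(?:#(?P<fragment>.*))?$"         # [ "#" fragment ]
)

def split_rfc1945(uri):
    """Return the named parts of uri, or None if it is not in the assumed form."""
    m = _HTTP_1_0_URI.match(uri)
    return m.groupdict() if m else None
```

Parts that are absent come back as `None`, so `split_rfc1945("http://example.com")["path"]` is `None`, while `split_rfc1945("http://example.com/")["path"]` is the empty string.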

Also:

  • http_URL
    • http://
    • host
    • port (if present, preceded by :)
    • abs_path (as defined above)

Canonicalization:

  • host is lowercased
  • :port is omitted if the port is 80
  • empty abs_path is replaced with /
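The three canonicalization rules can be sketched with Python 3's `urllib.parse`. The function name `canonicalize_http_url` is hypothetical. Assumptions: the input is an http URL with a plain host name (IPv6 literals are not handled, since `.hostname` strips the brackets), any user name or password is dropped (RFC 1945's http_URL has none), and query and fragment are passed through unchanged.

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize_http_url(url):
    """Hypothetical helper applying the three rules above to an http URL."""
    parts = urlsplit(url)  # urlsplit also lowercases the scheme
    if parts.scheme != "http":
        raise ValueError("not an http URL: " + url)
    host = parts.hostname or ""  # .hostname is already lowercased
    # Omit ":port" when it is absent or equal to the default port 80.
    netloc = host if parts.port in (None, 80) else f"{host}:{parts.port}"
    path = parts.path or "/"  # an empty abs_path is replaced with "/"
    return urlunsplit(("http", netloc, path, parts.query, parts.fragment))
```

With this sketch, `canonicalize_http_url("HTTP://WWW.Example.COM:80")` yields "http://www.example.com/", and a non-default port such as 8080 is kept.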

DOM

1996 https://developer.mozilla.org/en/DOM/window.location#Properties

The window.location object represents the URL of the window's page and thus also has properties (terms) for the different parts/pieces.

Properties:

  • protocol - e.g. "http:"
  • host - e.g. "www.example.com:80"
    • hostname - e.g. "www.example.com"
    • port - e.g. "80"
  • pathname - e.g. "/search"
  • search - e.g. "?q=devmo"
  • hash - e.g. "#test"
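To keep all examples on this page in one language, the following Python sketch computes values shaped like these window.location properties from a URL string. The function name `location_parts` is hypothetical and only approximates browser behavior. In particular, it keeps an explicit port as written, whereas current browsers following the WHATWG URL Standard report an empty port (and no ":80" in host) when the port is the scheme's default.

```python
from urllib.parse import urlsplit

def location_parts(url):
    """Hypothetical approximation of window.location-style properties."""
    p = urlsplit(url)
    hostname = p.hostname or ""
    # Kept as written; real browsers drop a default port such as 80 for http.
    port = "" if p.port is None else str(p.port)
    return {
        "protocol": p.scheme + ":",                    # includes the ":"
        "host": hostname + (":" + port if port else ""),
        "hostname": hostname,
        "port": port,                                  # a string, as in the DOM
        "pathname": p.path or "/",
        "search": "?" + p.query if p.query else "",    # includes the "?"
        "hash": "#" + p.fragment if p.fragment else "",  # includes the "#"
    }
```

Applied to "http://www.example.com:80/search?q=devmo#test", this reproduces the example values listed above.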

CGI

~1997-1999? Common Gateway Interface, specifically, Environment Variables

http://example.com/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties

Terms:

  • script-URI
    • scheme - derived from SERVER_PROTOCOL (e.g. "HTTP/1.1" gives "http")
    • ://
    • server-name - SERVER_NAME
    • :
    • server-port - SERVER_PORT
    • script-path same as SCRIPT_NAME
    • extra-path same as PATH_INFO
    • ?
    • query-string - QUERY_STRING

Environment variables:

  • SERVER_PROTOCOL - not the protocol scheme, e.g. "HTTP/1.1"
  • SERVER_NAME or HTTP_HOST - e.g. "example.com"
  • SERVER_PORT - e.g. "80"
  • REMOTE_USER - the username (but not password)
  • PATH - not the URL path, but the executable search path of the web server's process on the system
  • REQUEST_URI - e.g. "/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties"
    • SCRIPT_NAME - e.g. "/cgi-bin/printenv.pl" (the leading path segments that the server maps to the script; here the first two)
    • PATH_INFO - e.g. "/ponylove" (remainder of path)
    • QUERY_STRING - e.g. "q=20%C001er&moar=kitties"
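The script-URI terms can be reassembled from the environment variables. This is a minimal sketch with a hypothetical function name, `build_script_uri`, that takes the variables as a plain dict. It assumes the scheme can be read from the protocol name in SERVER_PROTOCOL, which cannot distinguish https (an HTTPS request still reports "HTTP/1.1"), and it percent-encodes SCRIPT_NAME and PATH_INFO with `urllib.parse.quote`, leaving "/" unencoded.

```python
from urllib.parse import quote

def build_script_uri(env):
    """Hypothetical helper: rebuild script-URI from CGI environment variables."""
    # "HTTP/1.1" -> "http" (cannot detect https; see the note above).
    scheme = env["SERVER_PROTOCOL"].split("/", 1)[0].lower()
    uri = f"{scheme}://{env['SERVER_NAME']}:{env['SERVER_PORT']}"
    # script-path, then extra-path; quote() leaves "/" unencoded by default.
    uri += quote(env.get("SCRIPT_NAME", "")) + quote(env.get("PATH_INFO", ""))
    query = env.get("QUERY_STRING", "")
    if query:
        uri += "?" + query  # already encoded by the client
    return uri
```

Fed the example values above (with SERVER_PORT "80"), it returns "http://example.com:80/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties".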

Python 2

2000[1] Python 2 urlparse

Attributes

  • scheme - e.g. "http"
  • netloc - e.g. "www.cwi.nl:80"
    • username
    • password
    • hostname
    • port
  • path - e.g. "/%7Eguido/Python.html"
  • params
  • query
  • fragment
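In Python 3 the urlparse module became `urllib.parse` (in Python 2 the equivalent import is `from urlparse import urlparse`). The sketch below shows each attribute above on one URL; the user name, password, params, query, and fragment values are made up for illustration. Note that `params` is taken from the last path segment only.

```python
# Python 3 home of Python 2's urlparse.urlparse()
from urllib.parse import urlparse

# Illustrative URL; credentials and the ;params/?query/#fragment are invented.
parts = urlparse("http://guido:secret@www.cwi.nl:80/%7Eguido/Python.html;type=a?q=1#intro")

print(parts.scheme)    # http
print(parts.netloc)    # guido:secret@www.cwi.nl:80
print(parts.username)  # guido
print(parts.password)  # secret
print(parts.hostname)  # www.cwi.nl
print(parts.port)      # 80 (an int)
print(parts.path)      # /%7Eguido/Python.html
print(parts.params)    # type=a
print(parts.query)     # q=1
print(parts.fragment)  # intro
```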

URI specification

2005 URI Generic Syntax (RFC 3986)

foo://example.com:8042/over/there?name=ferret#nose

  • scheme - e.g. "foo"
  • ":"
  • hier-part - e.g. "//example.com:8042/over/there"
    • "//"
    • authority - e.g. "example.com:8042"
    • path - e.g. "/over/there"
      • path-abempty or
      • path-absolute or
      • path-rootless or
      • path-empty
  • query (if present, preceded by "?") e.g. "name=ferret"
  • fragment (if present, preceded by "#") e.g. "nose"
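RFC 3986 itself (Appendix B) gives a regular expression that splits a URI reference into these five components without validating it. The sketch below applies that expression in Python; the wrapper name `split_uri` is hypothetical. Components that are absent come back as `None`, while the path is always present (possibly empty).

```python
import re

# Regular expression from RFC 3986, Appendix B (splits, does not validate).
URI_REGEX = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(uri):
    """Hypothetical wrapper returning the five RFC 3986 components."""
    m = URI_REGEX.match(uri)  # always matches: every group is optional
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }
```

Applied to "foo://example.com:8042/over/there?name=ferret#nose", it yields "foo", "example.com:8042", "/over/there", "name=ferret", and "nose".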

Googler

2007 Per Matt Cutts's blog post Talk like a Googler: parts of a url, using this example URL:

http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s

Parts of a url:

  • protocol - e.g. "http"
  • host or hostname - e.g. "video.google.co.uk"
    • subdomain - e.g. "video"
    • domain name - e.g. "google.co.uk"
    • top-level domain or TLD - e.g. "uk" (which in this case is also referred to as a country-code top-level domain or ccTLD)
  • port - e.g. "80"
  • path - e.g. "/videoplay"
  • parameters - e.g. "?docid=-7246927612831078230&hl=en"
    • parameter - e.g. "docid" with value "-7246927612831078230"
  • fragment or named anchor - e.g. "#00h02m30s"
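Splitting a host name into subdomain and domain name requires knowing which suffixes (like "co.uk") are registries rather than registrable domains. This minimal sketch uses a hypothetical, hard-coded `KNOWN_SUFFIXES` set as a stand-in for that knowledge; a real implementation would need a complete list such as the Public Suffix List. The function name `googler_parts` is also hypothetical, and `urllib.parse.parse_qsl` is used to break the parameters into name/value pairs.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical, tiny stand-in for a real public-suffix list.
KNOWN_SUFFIXES = {"uk", "co.uk", "com"}

def googler_parts(url):
    """Hypothetical helper labelling a URL with the 'Googler' part names."""
    p = urlsplit(url)
    labels = (p.hostname or "").split(".")
    # Longest known suffix matching the end of the host (at least the TLD),
    # always leaving one label in front of it for the domain name.
    suffix_len = max(
        (n for n in range(1, len(labels)) if ".".join(labels[-n:]) in KNOWN_SUFFIXES),
        default=1,
    )
    cut = suffix_len + 1  # suffix plus the one label in front of it
    return {
        "protocol": p.scheme,
        "hostname": p.hostname,
        "subdomain": ".".join(labels[:-cut]),
        "domain_name": ".".join(labels[-cut:]),
        "tld": labels[-1],
        "port": p.port,
        "path": p.path,
        "parameters": parse_qsl(p.query),  # list of (name, value) pairs
        "fragment": p.fragment,            # without the leading "#"
    }
```

On the example URL above, this gives "video" as the subdomain, "google.co.uk" as the domain name, "uk" as the TLD, and the parameter pairs ("docid", "-7246927612831078230") and ("hl", "en").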

related