Revision as of 06:02, 24 August 2011
URL formats
Many systems define and represent URLs as a set of named pieces (parts). This page documents the implicit formats, i.e. the part names, used by those systems.
URL specification
The URL specification is perhaps the most canonical source for the names of the different parts of a URL.
1994 http://www.w3.org/Addressing/URL/url-spec.txt
Names are quoted literally, dropping any "The" prefix and "part" suffix.
- PrePrefix - e.g. "URL:". The portion before the "http".
- Scheme - e.g. "http"
- :
- Internet protocol parts
  - // (until the following /)
  - user name (if present; followed by the optional password, if any, and then an @)
  - password (if present, preceded by a :)
  - internet domain name - e.g. "www.w3.org"
  - port number (if present, preceded by a :)
- Path
- search
- fragmentid - "the hash sign and following"
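As a rough illustration of these names, the sketch below uses a regular expression to split a URL of the shape [URL:]scheme://[user[:password]@]host[:port]/path[?search][#fragmentid] and labels each piece with the 1994 spec's term. The regex, the `split_url_1994` helper and the sample URL are assumptions made for illustration. The pattern is far looser than the spec's grammar and only handles URLs that have an Internet protocol part (the // form).

```python
import re

# Hypothetical, simplified pattern for URLs of the shape
#   [URL:]scheme://[user[:password]@]host[:port][/path][?search][#fragmentid]
# It only illustrates the 1994 part names; it is not the spec's full grammar.
URL_1994 = re.compile(
    r"^(?P<preprefix>URL:)?"
    r"(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*):"
    r"//"
    r"(?:(?P<user>[^:@/]*)(?::(?P<password>[^@/]*))?@)?"
    r"(?P<host>[^:/?#]*)"
    r"(?::(?P<port>[0-9]+))?"
    r"(?P<path>/[^?#]*)?"  # this sketch keeps the leading "/" on the path
    r"(?:\?(?P<search>[^#]*))?"
    r"(?P<fragmentid>#.*)?$"  # "the hash sign and following"
)


def split_url_1994(url):
    """Map 1994-spec part names to substrings; absent parts are None."""
    match = URL_1994.match(url)
    if match is None:
        raise ValueError("not an Internet protocol URL: %r" % url)
    return match.groupdict()


parts = split_url_1994("URL:http://joe:secret@www.w3.org:80/Addressing/?q=url#top")
print(parts["user"], parts["password"], parts["host"], parts["port"])
# joe secret www.w3.org 80
```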
HTTP
The HTTP specification has a few notes about the format and parts of HTTP URLs.
1996 http://www.ietf.org/rfc/rfc1945.txt - 3.2.1 General Syntax
- URI
- absoluteURI
- scheme
- :
- relativeURI
- net_path
- //
- net_loc
- abs_path
- /
- rel_path
- path
- fsegment
- segment (zero or more, if present, preceded by /)
- params (if present, preceded by ;)
- query (if present, preceded by ?)
- path
- net_path
- fragment (if present, preceded by #)
- absoluteURI
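The rel_path rule in RFC 1945, rel_path = [ path ] [ ";" params ] [ "?" query ], can be read as a simple split. Because ";" is not allowed in path and "?" is not allowed in path or params, the first "?" starts the query and the first ";" before it starts the params. Below is a minimal sketch of that reading. The `split_rel_path` name is hypothetical, and the function does not validate the characters themselves.

```python
def split_rel_path(rel_path):
    """Split an RFC 1945 rel_path into (path, params, query).

    Hypothetical helper following rel_path = [ path ] [ ";" params ] [ "?" query ].
    ";" cannot occur in path and "?" cannot occur in path or params, so the
    first "?" starts the query and the first ";" before it starts the params.
    Absent parts are returned as None.
    """
    path_and_params, sep, query = rel_path.partition("?")
    query = query if sep else None
    path, sep, params = path_and_params.partition(";")
    params = params if sep else None
    return (path or None), params, query


# abs_path is "/" followed by rel_path, so strip that "/" first.
print(split_rel_path("/videoplay;type=a?docid=42"[1:]))
# ('videoplay', 'type=a', 'docid=42')
```

Python's `urlparse`, by contrast, only looks for ";" after the last "/" of the path, so for a path like "/a;p/b" it reports empty params.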
Also:
- http_URL
- http://
- host
- port (if present, preceded by :)
- abs_path (as defined above)
Canonicalization:
- host is lowercased
- :port is omitted if the port is 80
- empty abs_path is replaced with /
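The three canonicalization rules can be sketched with Python's `urllib.parse.urlsplit`. The `canonical_http_url` helper is hypothetical and assumes a plain http URL. Because user:password and fragments are not part of http_URL, the sketch drops them.

```python
from urllib.parse import urlsplit


def canonical_http_url(url):
    """Apply the three canonicalization rules above to an http URL.

    Hypothetical helper. Besides the three rules, it drops any user:password
    and fragment (neither appears in http_URL) and drops an empty query.
    """
    parts = urlsplit(url)  # urlsplit lowercases the scheme
    if parts.scheme != "http":
        raise ValueError("not an http URL: %r" % url)
    host = parts.hostname or ""  # .hostname is already lowercased
    port = parts.port
    netloc = host if port in (None, 80) else "%s:%d" % (host, port)
    path = parts.path or "/"  # empty abs_path becomes "/"
    query = "?" + parts.query if parts.query else ""
    return "http://" + netloc + path + query


print(canonical_http_url("http://WWW.Example.COM:80"))
# http://www.example.com/
```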
DOM
1996 https://developer.mozilla.org/en/DOM/window.location#Properties
The window.location object represents the URL of the window's page, and thus also has properties (terms) for the different parts/pieces.
Properties:
- protocol - e.g. "http:"
- host - e.g. "www.example.com:80"
- hostname - e.g. "www.example.com"
- port - e.g. "80"
- pathname - e.g. "/search"
- search - e.g. "?q=devmo"
- hash - e.g. "#test"
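Python has no window.location, but a rough equivalent of these properties can be derived from `urllib.parse.urlsplit`. The `location_like` helper below is a hypothetical sketch. It copies the prefix conventions shown above: protocol ends in ":", search starts with "?", and hash starts with "#". It does not reproduce browser normalization such as dropping a default port from host.

```python
from urllib.parse import urlsplit


def location_like(url):
    """Rough, hypothetical stand-in for window.location's properties."""
    parts = urlsplit(url)
    hostname = parts.hostname or ""
    port = "" if parts.port is None else str(parts.port)
    return {
        "protocol": parts.scheme + ":",  # keeps the trailing ":"
        "host": hostname + (":" + port if port else ""),
        "hostname": hostname,
        "port": port,  # a string, "" when absent
        "pathname": parts.path or "/",  # browsers report "/" for an empty http path
        "search": "?" + parts.query if parts.query else "",
        "hash": "#" + parts.fragment if parts.fragment else "",
    }


loc = location_like("http://www.example.com:80/search?q=devmo#test")
print(loc["host"], loc["search"], loc["hash"])
# www.example.com:80 ?q=devmo #test
```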
CGI
~1997-1999? Common Gateway Interface, specifically its environment variables
- http://tools.ietf.org/html/rfc3875
- http://www.citycat.ru/doc/CGI/overview/env.html
- http://en.wikipedia.org/wiki/Common_Gateway_Interface - has example:
http://example.com/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties
Terms:
- script-URI
  - scheme - derived from SERVER_PROTOCOL (e.g. "HTTP/1.1" gives "http")
  - ://
  - server-name - SERVER_NAME
  - :
  - server-port - SERVER_PORT
  - script-path - SCRIPT_NAME (URL-encoded)
  - extra-path - PATH_INFO (URL-encoded)
  - ?
  - query-string - QUERY_STRING
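Read the other way, the script-URI can be rebuilt from the meta-variables. The sketch below illustrates this under several assumptions and is not a reference implementation. `script_uri` is a hypothetical helper. The scheme is guessed from the protocol name in SERVER_PROTOCOL, which typically reads "HTTP/1.1" for HTTPS requests too, so a real script would need some other hint there. The "?" is left out when QUERY_STRING is empty.

```python
from urllib.parse import quote


def script_uri(environ):
    """Rebuild a CGI script-URI from meta-variables (hypothetical helper).

    Form: scheme "://" server-name ":" server-port script-path extra-path "?" query-string
    """
    # Assumption: scheme = lowercased protocol name, e.g. "HTTP/1.1" -> "http".
    scheme = environ["SERVER_PROTOCOL"].split("/", 1)[0].lower()
    # SCRIPT_NAME and PATH_INFO arrive decoded; quote() re-encodes them,
    # percent-encoding ";", "=", "?" and anything else outside its safe set.
    script_path = quote(environ.get("SCRIPT_NAME", ""), safe="/")
    extra_path = quote(environ.get("PATH_INFO", ""), safe="/")
    uri = "%s://%s:%s%s%s" % (
        scheme,
        environ["SERVER_NAME"],
        environ["SERVER_PORT"],
        script_path,
        extra_path,
    )
    query = environ.get("QUERY_STRING", "")  # already URL-encoded
    return uri + "?" + query if query else uri  # assumption: no bare "?"


example = {
    "SERVER_PROTOCOL": "HTTP/1.1",
    "SERVER_NAME": "example.com",
    "SERVER_PORT": "80",
    "SCRIPT_NAME": "/cgi-bin/printenv.pl",
    "PATH_INFO": "/ponylove",
    "QUERY_STRING": "q=20%C001er&moar=kitties",
}
print(script_uri(example))
# http://example.com:80/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties
```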
Environment variables:
- SERVER_PROTOCOL - not the protocol scheme, e.g. "HTTP/1.1"
- SERVER_NAME or HTTP_HOST - e.g. "example.com"
- SERVER_PORT - e.g. "80"
- REMOTE_USER - the username (but not password)
- PATH - not the URL path, but the operating system's executable search path, as passed through by the web server
- REQUEST_URI - e.g. "/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties"
  - SCRIPT_NAME - e.g. "/cgi-bin/printenv.pl" (here the first two path segments; in general, whatever prefix identifies the script)
  - PATH_INFO - e.g. "/ponylove" (remainder of path)
  - QUERY_STRING - e.g. "q=20%C001er&moar=kitties"
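In this example the values fit together as REQUEST_URI = SCRIPT_NAME + PATH_INFO + "?" + QUERY_STRING. REQUEST_URI is not one of the RFC 3875 meta-variables, though many servers set it. Only the server knows where the script name ends, so the splitter below has to be told SCRIPT_NAME. It is a minimal sketch, and `split_request_uri` is a hypothetical name.

```python
from urllib.parse import unquote


def split_request_uri(request_uri, script_name):
    """Split REQUEST_URI into (SCRIPT_NAME, PATH_INFO, QUERY_STRING).

    Hypothetical helper: the caller supplies script_name, because only the
    server knows where the script path ends. PATH_INFO is returned decoded,
    like the CGI meta-variable; QUERY_STRING is left as sent.
    """
    path, _, query = request_uri.partition("?")
    path = unquote(path)
    rest = path[len(script_name):]
    # The script name must be a whole-segment prefix of the path.
    if not path.startswith(script_name) or rest[:1] not in ("", "/"):
        raise ValueError("%r is not under %r" % (path, script_name))
    return script_name, rest, query


print(split_request_uri(
    "/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties",
    "/cgi-bin/printenv.pl",
))
```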
Python 2
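2000 Python 2 urlparse (date per http://en.wikipedia.org/wiki/Python_%28programming_language%29#History)
- http://docs.python.org/library/urlparse.html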
Attributes
- scheme - e.g. "http"
- netloc - e.g. "www.cwi.nl:80"
  - username
  - password
  - hostname
  - port
- path - e.g. "/%7Eguido/Python.html"
- params
- query
- fragment
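For comparison, Python 3 moved this module to `urllib.parse`. Its `urlparse` returns the same six attributes, plus the derived username, password, hostname and port. The first URL below is the one behind the examples above. The second is made up so that every attribute, including params, is filled in.

```python
# Python 2: from urlparse import urlparse
from urllib.parse import urlparse

r = urlparse("http://www.cwi.nl:80/%7Eguido/Python.html")
print(r.scheme, r.netloc, r.path)  # http www.cwi.nl:80 /%7Eguido/Python.html
print(r.hostname, r.port)          # www.cwi.nl 80

# A made-up URL that fills in every attribute.
r = urlparse("http://joe:secret@Example.com:8080/a/b;type=a?x=1#top")
print(r.username, r.password, r.hostname, r.port)  # joe secret example.com 8080
print(r.path, r.params, r.query, r.fragment)       # /a/b type=a x=1 top
```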
URI specification
2005 URI Generic Syntax
- http://www.ietf.org/rfc/rfc3986.txt with example:
foo://example.com:8042/over/there?name=ferret#nose
- scheme - e.g. "foo"
- ":"
- hier-part - e.g. "//example.com:8042/over/there"
  - "//"
  - authority - e.g. "example.com:8042"
  - path - e.g. "/over/there"
    - path-abempty or
    - path-absolute or
    - path-rootless or
    - path-empty
- query (if present, preceded by "?") - e.g. "name=ferret"
- fragment (if present, preceded by "#") - e.g. "nose"
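RFC 3986 (Appendix B) gives a regular expression that splits any URI reference into these five components without validating it. Below is a short Python sketch that uses that expression. The `split_uri` wrapper is just an illustrative name. Components that are absent come back as None, which keeps them distinct from components that are present but empty.

```python
import re

# Regular expression from RFC 3986, Appendix B. It does not validate a URI;
# it only splits one into its five top-level components.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_uri(uri):
    m = URI_RE.match(uri)  # always matches: every part is optional
    # Group numbers as given in the RFC: scheme=2, authority=4, path=5, query=7, fragment=9.
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


print(split_uri("foo://example.com:8042/over/there?name=ferret#nose"))
# {'scheme': 'foo', 'authority': 'example.com:8042', 'path': '/over/there', 'query': 'name=ferret', 'fragment': 'nose'}
```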
Googler
2007 Per Matt Cutts's blog post "Talk like a Googler: parts of a url", using the example:
http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s
Parts of a url:
- protocol - e.g. "http"
- host or hostname - e.g. "video.google.co.uk"
  - subdomain - e.g. "video"
  - domain name - e.g. "google.co.uk"
    - top-level domain or TLD - e.g. "uk" (in this case also called a country-code top-level domain, or ccTLD)
- port - e.g. "80"
- path - e.g. "/videoplay"
- parameters - e.g. "?docid=-7246927612831078230&hl=en"
  - parameter - e.g. "docid" with value "-7246927612831078230"
- fragment or named anchor - e.g. "#00h02m30s"
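Splitting the host into subdomain, domain name and TLD cannot be done by counting dots. The registrable part of "video.google.co.uk" ends in a two-label suffix ("co.uk"), while that of "www.google.com" ends in a one-label suffix ("com"). The sketch below assumes a tiny, hypothetical suffix set in place of a real public-suffix list, and uses `urllib.parse.parse_qsl` for the parameters. `googler_parts` is an illustrative name.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical stand-in for a public-suffix list; real code needs a full list.
SUFFIXES = {"com", "uk", "co.uk"}


def googler_parts(url, suffixes=SUFFIXES):
    """Label the pieces of a URL with the terms listed above (sketch)."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    # Scan from the longest candidate suffix to the shortest; the first hit is
    # the longest known suffix, and the label before it completes the domain name.
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            break
    else:
        raise ValueError("no known suffix in %r" % parts.hostname)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "subdomain": ".".join(labels[:i - 1]),
        "domain name": ".".join(labels[i - 1:]),
        "TLD": labels[-1],
        "port": parts.port,
        "path": parts.path,
        "parameters": parse_qsl(parts.query),
        "fragment": parts.fragment,
    }


p = googler_parts(
    "http://video.google.co.uk:80/videoplay?docid=-7246927612831078230&hl=en#00h02m30s"
)
print(p["subdomain"], p["domain name"], p["TLD"])  # video google.co.uk uk
print(p["parameters"])  # [('docid', '-7246927612831078230'), ('hl', 'en')]
```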