url-formats: Difference between revisions

From Microformats Wiki
(added HTTP URL syntax list of parts of the URL)
(add CGI terms)
== CGI ==
* http://tools.ietf.org/html/rfc3875
* http://en.wikipedia.org/wiki/Common_Gateway_Interface - has example: http://example.com/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties

Revision as of 23:52, 21 August 2011

<entry-title>URL formats</entry-title>

Different systems define and represent URLs as a set of named parts. This page documents the implicit formats those systems use.

URL specification

The URL specification is perhaps the most canonical source for the names of the different parts of a URL.

1994 http://www.w3.org/Addressing/URL/url-spec.txt

Names are quoted literally, dropping any "The" prefix and "part" suffix.

  • PrePrefix - e.g. "URL:". The portion before the "http".
  • Scheme - e.g. "http"
  • :
  • Internet protocol parts
    • // (until the following /)
    • user name (if present; followed by the optional password (see next field), then an @)
    • password (if present, preceded by a :)
    • internet domain name - e.g. "www.w3.org"
    • port number (if present, preceded by a :)
  • Path
    • search
  • fragmentid - "the hash sign and following"
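
As a rough illustration of how these part names line up with a real URL, the sketch below maps the output of Python's standard `urllib.parse.urlsplit` onto the names listed above. The function name `split_url_1994` and the dictionary keys are hypothetical, chosen only for this example; the 1994 spec itself defines no API.

```python
from urllib.parse import urlsplit


def split_url_1994(url):
    """Map a URL onto the 1994 url-spec part names (illustrative only)."""
    preprefix = ""
    # The optional PrePrefix "URL:" comes before the scheme.
    if url[:4].upper() == "URL:":
        preprefix, url = url[:4], url[4:]
    parts = urlsplit(url)
    return {
        "preprefix": preprefix,
        "scheme": parts.scheme,
        "user name": parts.username,
        "password": parts.password,
        "internet domain name": parts.hostname,
        "port number": parts.port,
        "path": parts.path,
        "search": parts.query,
        # The spec counts the hash sign as part of fragmentid;
        # urlsplit returns the text after the "#" only.
        "fragmentid": parts.fragment,
    }
```

Note that `urlsplit` returns `None` for a user name, password, or port number that is absent, which matches the "if present" wording in the list.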

HTTP

The HTTP specification has a few notes about the format/portions of HTTP URLs.

1996 http://www.ietf.org/rfc/rfc1945.txt - 3.2.1 General Syntax

  • URI
    • absoluteURI
      • scheme
      • :
      • relativeURI
        • net_path
          • //
          • net_loc
          • abs_path
            • /
            • rel_path
              • path
                • fsegment
                • segment (zero or more, if present, preceded by /)
              • params (if present, preceded by ;)
              • query (if present, preceded by ?)
    • fragment (if present, preceded by #)

Also:

  • http_URL
    • http://
    • host
    • port (if present, preceded by :)
    • abs_path (as defined above)

Canonicalization:

  • host is lowercased
  • :port is omitted if the port is 80
  • empty abs_path is replaced with /
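
The three canonicalization rules can be sketched in a few lines of Python. This is a minimal example, assuming plain `http` URLs without user information or IPv6 literals; the function name `canonicalize_http_url` is hypothetical and not taken from the HTTP specification.

```python
from urllib.parse import urlsplit, urlunsplit


def canonicalize_http_url(url):
    """Apply the three rules above to an http URL (sketch only)."""
    parts = urlsplit(url)
    # Rule 1: host is lowercased (urlsplit's .hostname already does this).
    host = parts.hostname or ""
    # Rule 2: ":port" is omitted if the port is 80.
    netloc = host
    if parts.port is not None and parts.port != 80:
        netloc = f"{host}:{parts.port}"
    # Rule 3: an empty abs_path is replaced with "/".
    path = parts.path or "/"
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))
```

For example, `canonicalize_http_url("http://WWW.Example.COM:80")` yields `"http://www.example.com/"`, while a non-default port such as `:8080` is kept.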

CGI

Terms:

  • script-URI
    • scheme - derived from SERVER_PROTOCOL, not identical to it
    • ://
    • server-name - SERVER_NAME
    • :
    • server-port - SERVER_PORT
    • script-path - SCRIPT_NAME, URL-encoded
    • extra-path - PATH_INFO, URL-encoded
    • ?
    • query-string - QUERY_STRING
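
To show how the script-URI terms fit together, here is a minimal Python sketch that assembles one from a dictionary of CGI meta-variables. The function name `script_uri` is hypothetical, and taking the scheme as the lowercased protocol name from SERVER_PROTOCOL is an assumption made for illustration (it would not detect https, for instance).

```python
from urllib.parse import quote


def script_uri(env):
    """Assemble a script-URI from CGI meta-variables (sketch only)."""
    # Assumption: "HTTP/1.1" -> "http". The scheme is derived from
    # SERVER_PROTOCOL; it is not the same string.
    scheme = env["SERVER_PROTOCOL"].split("/", 1)[0].lower()
    uri = f'{scheme}://{env["SERVER_NAME"]}:{env["SERVER_PORT"]}'
    # script-path and extra-path are the URL-encoded SCRIPT_NAME and
    # PATH_INFO; "/" is kept, so ";", "=" and "?" get percent-encoded.
    uri += quote(env.get("SCRIPT_NAME", ""), safe="/")
    uri += quote(env.get("PATH_INFO", ""), safe="/")
    # QUERY_STRING is already URL-encoded, so it is appended unchanged.
    query = env.get("QUERY_STRING", "")
    if query:
        uri += "?" + query
    return uri
```

A real server may omit the port when it is the default for the scheme; this sketch always includes it, following the term list literally.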


Environment variables:

  • SERVER_PROTOCOL - not the protocol scheme, e.g. "HTTP/1.1"
  • HTTP_HOST - e.g. "example.com"
  • SERVER_PORT - e.g. "80"
  • REMOTE_USER
  • PATH - not the URL path, but the system executable search path of the web server process
  • REQUEST_URI - e.g. "/cgi-bin/printenv.pl/ponylove?q=20%C001er&moar=kitties"
    • SCRIPT_NAME - e.g. "/cgi-bin/printenv.pl"
    • PATH_INFO - e.g. "/ponylove"
    • QUERY_STRING - e.g. "q=20%C001er&moar=kitties"
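
The nesting of SCRIPT_NAME, PATH_INFO and QUERY_STRING under REQUEST_URI can be illustrated with a small Python sketch that splits the example value back into its three pieces. The function name `split_request_uri` is hypothetical, and the sketch assumes the script name is already known and ignores the percent-decoding a real server may apply to PATH_INFO.

```python
def split_request_uri(request_uri, script_name):
    """Split a REQUEST_URI into its three nested variables (sketch only)."""
    # Everything after the first "?" is the query string.
    path, _, query = request_uri.partition("?")
    if not path.startswith(script_name):
        raise ValueError("REQUEST_URI does not start with SCRIPT_NAME")
    return {
        "SCRIPT_NAME": script_name,
        # Whatever follows the script name in the path is the extra path.
        "PATH_INFO": path[len(script_name):],
        "QUERY_STRING": query,
    }
```

Called with the example REQUEST_URI above and `"/cgi-bin/printenv.pl"` as the script name, it returns the same PATH_INFO and QUERY_STRING values shown in the list.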


related