url-formats: Difference between revisions

From Microformats Wiki
Revision as of 23:35, 21 August 2011

<entry-title>URL formats</entry-title>

Various systems define and represent URLs as a set of named parts. This page documents the implicit formats used by those systems.

URL specification

The URL specification is perhaps the most canonical source for the names of the different parts of a URL.

1994 http://www.w3.org/Addressing/URL/url-spec.txt

Names are quoted literally, dropping any "The" prefix and "part" suffix.

  • PrePrefix - e.g. "URL:". The portion before the "http".
  • Scheme - e.g. "http"
  • :
  • Internet protocol parts
    • // (until the following /)
    • user name (if present; the user name and the optional password, see next field, are followed by an @)
    • password (if present, preceded by a :)
    • internet domain name - e.g. "www.w3.org"
    • port number (if present, preceded by a :)
  • Path
    • search
  • fragmentid - "the hash sign and following"
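
As a rough illustration of the list above, the following Python sketch maps a URL onto those part names using the standard library's urllib.parse.urlsplit. The helper name split_1994_parts and the example URL are assumptions made for illustration and do not come from the specification. Note that urlsplit returns the path with its leading "/" and the fragment without the hash sign, whereas the spec describes fragmentid as "the hash sign and following".

```python
from urllib.parse import urlsplit


def split_1994_parts(url):
    """Map a URL onto the 1994 part names (hypothetical helper, a sketch only)."""
    # Peel off the optional "URL:" PrePrefix before handing the rest to urlsplit.
    preprefix = ""
    if url[:4].upper() == "URL:":
        preprefix, url = url[:4], url[4:]
    parts = urlsplit(url)
    return {
        "PrePrefix": preprefix,
        "Scheme": parts.scheme,
        "user name": parts.username,             # None when absent
        "password": parts.password,              # None when absent
        "internet domain name": parts.hostname,  # lowercased by urlsplit
        "port number": parts.port,               # int, or None when absent
        "Path": parts.path,                      # includes the leading "/"
        "search": parts.query,                   # text after "?", without the "?"
        "fragmentid": parts.fragment,            # text after "#", without the "#"
    }


example = split_1994_parts("URL:http://user:pw@www.w3.org:8080/Addressing/URL?x=1#top")
```

Here example["internet domain name"] is "www.w3.org" and example["search"] is "x=1".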

HTTP

The HTTP specification includes a few notes on the format and parts of HTTP URLs.

1996 http://www.ietf.org/rfc/rfc1945.txt - 3.2.1 General Syntax

  • URI
    • absoluteURI
      • scheme
      • :
      • relativeURI
        • net_path
          • //
          • net_loc
          • abs_path
            • /
            • rel_path
              • path
                • fsegment
                • segment (zero or more, if present, preceded by /)
              • params (if present, preceded by ;)
              • query (if present, preceded by ?)
    • fragment (if present, preceded by #)
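
The outline above can be approximated in Python with urllib.parse.urlparse, which (unlike urlsplit) separates params from the path. This is a minimal sketch, not a grammar-exact RFC 1945 parser: the helper name split_rfc1945_parts and the example URI are illustrative assumptions, and the code does not validate characters or reject an empty fsegment.

```python
from urllib.parse import urlparse


def split_rfc1945_parts(uri):
    """Approximate the RFC 1945 part names for a URI (hypothetical helper)."""
    p = urlparse(uri)
    # abs_path is "/" followed by rel_path, so drop the leading "/" to get
    # the portion that is split into fsegment and the following segments.
    rel = p.path[1:] if p.path.startswith("/") else p.path
    fsegment, *segments = rel.split("/")
    return {
        "scheme": p.scheme,
        "net_loc": p.netloc,
        "fsegment": fsegment,   # first path segment
        "segments": segments,   # zero or more further segments
        "params": p.params,     # text after ";", without the ";"
        "query": p.query,       # text after "?", without the "?"
        "fragment": p.fragment, # text after "#", without the "#"
    }


example = split_rfc1945_parts("http://example.com/a/b;p=1?q=2#frag")
```

For the example URI, fsegment is "a", segments is ["b"], params is "p=1", query is "q=2" and fragment is "frag".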

Also:

  • http_URL
    • http://
    • host
    • port (if present, preceded by :)
    • abs_path (as defined above)

Canonicalization:

  • host is lowercased
  • :port is omitted if the port is 80
  • empty abs_path is replaced with /
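
The three canonicalization rules above could be sketched in Python as follows. The function name canonicalize_http_url is an assumption made for illustration. Because the http_URL form listed above has no user information or fragment, this sketch simply drops them; that is a design choice of the example, not something the rules themselves state.

```python
from urllib.parse import urlsplit


def canonicalize_http_url(url):
    """Apply the three canonicalization rules to an http URL (a sketch only)."""
    parts = urlsplit(url)
    if parts.scheme != "http":  # urlsplit already lowercases the scheme
        raise ValueError("not an http URL: %r" % url)
    # Rule 1: host is lowercased.
    host = (parts.hostname or "").lower()
    # Rule 2: ":port" is omitted if the port is 80 (or was not given).
    net_loc = host if parts.port in (None, 80) else "%s:%d" % (host, parts.port)
    # Rule 3: an empty abs_path is replaced with "/".
    # urlsplit leaves any ";params" inside the path, so they are preserved here.
    abs_path = parts.path or "/"
    result = "http://" + net_loc + abs_path
    if parts.query:
        result += "?" + parts.query
    return result


print(canonicalize_http_url("HTTP://WWW.Example.COM:80"))
print(canonicalize_http_url("http://Example.com:8080/a;b?c=d"))
```

The first call prints http://www.example.com/ and the second prints http://example.com:8080/a;b?c=d.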

related