uniform resource identifier

Thursday, July 30, 2009


A URI is a compact representation of a resource available to your application on the intranet or Internet. The Uri class defines the properties and methods for handling URIs, including parsing, comparing, and combining. The Uri class properties are read-only; to create a modifiable object, use the UriBuilder class.
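
As a minimal sketch of the read-only versus modifiable distinction described above, the following reads a few Uri properties and then uses UriBuilder to produce a changed copy. The class name UriBuilderSketch, the helper WithPortAndPath, and the contoso.com address are illustrative assumptions, not part of the original documentation.

```csharp
using System;

public static class UriBuilderSketch
{
    // Returns a copy of the given URI with a different port and path.
    // Uri exposes no setters, so the change goes through UriBuilder.
    public static Uri WithPortAndPath(Uri source, int port, string path)
    {
        UriBuilder builder = new UriBuilder(source);
        builder.Port = port;
        builder.Path = path;
        return builder.Uri;
    }

    public static void Main()
    {
        Uri original = new Uri("http://www.contoso.com/index.htm");
        Console.WriteLine(original.Scheme); // read-only property
        Console.WriteLine(original.Host);   // read-only property

        Uri changed = WithPortAndPath(original, 8080, "/new/index.htm");
        Console.WriteLine(changed.AbsoluteUri); // http://www.contoso.com:8080/new/index.htm
    }
}
```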

Relative URIs (for example, "/new/index.htm") must be expanded with respect to a base URI so that they are absolute. The MakeRelative method is provided to convert absolute URIs to relative URIs when necessary.
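
The expansion of a relative URI against a base can be sketched as follows. The base address and the names RelativeUriSketch and Expand are assumptions made for illustration. The sketch calls MakeRelativeUri, the newer counterpart of MakeRelative, because MakeRelative is marked obsolete in the .NET Framework 2.0 and later.

```csharp
using System;

public static class RelativeUriSketch
{
    // Expands a relative reference against an absolute base URI.
    public static Uri Expand(Uri baseUri, string relative)
    {
        return new Uri(baseUri, relative);
    }

    public static void Main()
    {
        Uri baseUri = new Uri("http://www.contoso.com/catalog/");

        Uri absolute = Expand(baseUri, "/new/index.htm");
        Console.WriteLine(absolute.AbsoluteUri); // http://www.contoso.com/new/index.htm

        // Going the other way: express the absolute URI relative to the base.
        Uri relative = baseUri.MakeRelativeUri(absolute);
        Console.WriteLine(relative.IsAbsoluteUri); // False
        Console.WriteLine(relative.ToString());
    }
}
```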

The Uri constructors do not escape URI strings if the string is a well-formed URI including a scheme identifier.

The Uri properties return a canonical data representation in escaped encoding, with all characters with Unicode values greater than 127 replaced with their hexadecimal equivalents. To put the URI in canonical form, the Uri constructor performs the following steps:

* Converts the URI scheme to lowercase.

* Converts the host name to lowercase.

* If the host name is an IPv6 address, the canonical IPv6 address is used. ScopeId and other optional IPv6 data are removed.

* Removes default and empty port numbers.

* Canonicalizes the path for hierarchical URIs by compacting sequences such as /./, /../, //, including escaped representations. Note that there are some schemes for which escaped representations are not compacted.

* For hierarchical URIs, if the host is not terminated with a forward slash (/), one is added.

* By default, any reserved characters in the URI are escaped in accordance with RFC 2396. This behavior changes if International Resource Identifier (IRI) or Internationalized Domain Name (IDN) parsing is enabled, in which case reserved characters in the URI are escaped in accordance with RFC 3986 and RFC 3987.
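
Several of the canonicalization steps above can be observed with a short sketch. The input string is an assumption chosen to exercise scheme and host lowercasing, default-port removal, and path compaction; CanonicalFormSketch and Canonicalize are illustrative names.

```csharp
using System;

public static class CanonicalFormSketch
{
    // Returns the canonical absolute form of the given URI string.
    public static string Canonicalize(string uriString)
    {
        return new Uri(uriString).AbsoluteUri;
    }

    public static void Main()
    {
        Uri uri = new Uri("HTTP://WWW.Contoso.COM:80/a/./b/../c");

        Console.WriteLine(uri.Scheme);        // http
        Console.WriteLine(uri.Host);          // www.contoso.com
        Console.WriteLine(uri.IsDefaultPort); // True (port 80 is the http default)
        Console.WriteLine(uri.AbsolutePath);  // /a/c
        Console.WriteLine(uri.AbsoluteUri);   // http://www.contoso.com/a/c

        // A host without a trailing slash gets one added.
        Console.WriteLine(Canonicalize("http://www.contoso.com")); // http://www.contoso.com/
    }
}
```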

As part of canonicalization in the constructor, escaped representations are compacted for some schemes: file, http, https, net.pipe, and net.tcp. For all other schemes, escaped sequences are not compacted. For example, if you percent-encode the two dots ".." as "%2E%2E", the Uri constructor compacts this sequence for those schemes. The following code sample shows a Uri constructor for the http scheme.

Uri uri = new Uri("http://myUrl/%2E%2E/%2E%2E");
Console.WriteLine(uri.AbsoluteUri);
Console.WriteLine(uri.PathAndQuery);


When this code is executed, it returns the following output with the escaped sequence compacted.

http://myUrl/
/


The following code example shows a URI constructor for the ftp scheme:

Uri uri = new Uri("ftp://myUrl/%2E%2E/%2E%2E");
Console.WriteLine(uri.AbsoluteUri);
Console.WriteLine(uri.PathAndQuery);


When this code is executed, it returns the following output with the escaped sequence not compacted.

ftp://myUrl/%2E%2E/%2E%2E
/%2E%2E/%2E%2E


You can transform the contents of the Uri class from an escape encoded URI reference to a readable URI reference by using the ToString method. Note that some reserved characters might still be escaped in the output of the ToString method. This is to support unambiguous reconstruction of a URI from the value returned by ToString.
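
A small sketch of the difference between the escaped and the readable form, assuming an address whose path contains an escaped space. ReadableFormSketch and Escaped are illustrative names. The exact ToString output depends on which characters the runtime decides to leave escaped, so it is printed rather than stated.

```csharp
using System;

public static class ReadableFormSketch
{
    // Returns the escaped, canonical form of the URI.
    public static string Escaped(string uriString)
    {
        return new Uri(uriString).AbsoluteUri;
    }

    public static void Main()
    {
        Uri uri = new Uri("http://www.contoso.com/my%20file.htm");

        // AbsoluteUri keeps the escape sequence.
        Console.WriteLine(uri.AbsoluteUri); // http://www.contoso.com/my%20file.htm

        // ToString returns the more readable form; some reserved
        // characters may remain escaped.
        Console.WriteLine(uri.ToString());
    }
}
```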

Some URIs include a fragment identifier or a query or both. A fragment identifier is any text that follows a number sign (#), not including the number sign; the fragment text is stored in the Fragment property. Query information is any text that follows a question mark (?) in the URI; the query text is stored in the Query property.
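
The following sketch reads both properties from one URI; the address and the name QueryFragmentSketch are assumptions. Note that the values returned by the Query and Fragment properties include the leading delimiter characters (? and #).

```csharp
using System;

public static class QueryFragmentSketch
{
    public static void Main()
    {
        Uri uri = new Uri("http://www.contoso.com/catalog/list.htm?date=today#top");

        Console.WriteLine(uri.AbsolutePath); // /catalog/list.htm
        Console.WriteLine(uri.Query);        // ?date=today
        Console.WriteLine(uri.Fragment);     // #top
    }
}
```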

In the .NET Framework version 1.1, if the string specified to a constructor contains an unknown scheme and "c:\", the Uri class inserts "//" after the colon. For example, the URI xyz:c:\abc is converted to xyz://c:/abc. In the .NET Framework version 2.0, this behavior has been removed, and the example string is converted to xyz:c:/abc.
Note:

The Uri class supports the use of IP addresses in both quad notation for IPv4 and colon-hexadecimal notation for IPv6. Remember to enclose the IPv6 address in square brackets, as in http://[::1].
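
As an illustrative sketch, the HostNameType property can be used to check how a host was interpreted. The addresses, the port, and the names IpHostSketch and HostKind are assumptions.

```csharp
using System;

public static class IpHostSketch
{
    // Reports how the Uri class classified the host part.
    public static UriHostNameType HostKind(string uriString)
    {
        return new Uri(uriString).HostNameType;
    }

    public static void Main()
    {
        Console.WriteLine(HostKind("http://192.168.0.1/"));  // IPv4
        Console.WriteLine(HostKind("http://[::1]:8080/"));   // IPv6
        Console.WriteLine(HostKind("http://www.contoso.com/")); // Dns

        // The port is parsed separately from the bracketed address.
        Console.WriteLine(new Uri("http://[::1]:8080/").Port); // 8080
    }
}
```
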
International Resource Identifier Support

Web addresses are typically expressed using uniform resource identifiers that consist of a very restricted set of characters:

* Upper and lower case ASCII letters from the English alphabet.

* Digits from 0 to 9.

* A small number of other ASCII symbols.

The specifications for URIs are documented in RFC 2396, RFC 2732, RFC 3986, and RFC 3987 published by the Internet Engineering Task Force (IETF).

With the growth of the Internet, there is a growing need to identify resources using languages other than English. Identifiers that meet this need and allow non-ASCII characters (characters in the Unicode/ISO 10646 character set) are known as International Resource Identifiers (IRIs). The specifications for IRIs are documented in RFC 3987 published by the IETF. Using IRIs allows a URL to contain Unicode characters.

The existing Uri class has been extended in .NET Framework v3.5, 3.0 SP1, and 2.0 SP1 to provide IRI support based on RFC 3987. Current users will not see any change from the .NET Framework 2.0 behavior unless they specifically enable IRI. This ensures application compatibility with prior versions of the .NET Framework.

To enable support for IRI, the following two changes are required:

1. Add the following line to the machine.config file under the .NET Framework 2.0 directory

2. Specify whether you want Internationalized Domain Name (IDN) parsing applied to the domain name and whether IRI parsing rules should be applied. This can be done in the machine.config or in the app.config file. For example, add the following:
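
A sketch of what the two changes might look like. It assumes the standard registration of a `uri` section handled by System.Configuration.UriSection (the version and public key token shown are the usual ones for the .NET Framework 2.0 System assembly and should be checked against your installation), and it uses the `idn` and `iriParsing` element names referred to elsewhere in this section. The values All and true are only examples.

```xml
<!-- Step 1 (assumed form): goes inside <configSections> in machine.config -->
<section name="uri" type="System.Configuration.UriSection, System, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089" />

<!-- Step 2 (assumed form): in machine.config or app.config -->
<configuration>
  <uri>
    <idn enabled="All" />
    <iriParsing enabled="true" />
  </uri>
</configuration>
```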








Enabling IDN will convert all Unicode labels in a domain name to their Punycode equivalents. Punycode names contain only ASCII characters and always start with the xn-- prefix. The reason for this is to support existing DNS servers on the Internet, since most DNS servers only support ASCII characters (see RFC 3490).
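
To illustrate what a Punycode equivalent looks like, the following sketch uses the System.Globalization.IdnMapping class directly. This is a stand-alone illustration of the conversion, not a description of how the Uri class performs it internally; the host name and the names PunycodeSketch and ToAsciiHost are assumptions.

```csharp
using System;
using System.Globalization;

public static class PunycodeSketch
{
    // Converts a Unicode host name to its ASCII (Punycode) form.
    public static string ToAsciiHost(string unicodeHost)
    {
        return new IdnMapping().GetAscii(unicodeHost);
    }

    public static void Main()
    {
        // "b\u00FCcher" is "bücher"; the escape keeps the source file ASCII-only.
        string unicodeHost = "b\u00FCcher.example";

        string asciiHost = ToAsciiHost(unicodeHost);
        Console.WriteLine(asciiHost); // xn--bcher-kva.example

        // The conversion is reversible.
        string roundTrip = new IdnMapping().GetUnicode(asciiHost);
        Console.WriteLine(roundTrip == unicodeHost); // True
    }
}
```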

Enabling IRI and IDN affects the value of the Uri.DnsSafeHost property. Enabling IRI and IDN can also change the behavior of the Equals, GetComponents, and IsWellFormedOriginalString methods and the value of the OriginalString property.

There are three possible values for IDN depending on the DNS servers that are used:

* idn enabled = All

  This value will convert any Unicode domain names to their Punycode equivalents (IDN names).

* idn enabled = AllExceptIntranet

  This value will convert all Unicode domain names not on the local Intranet to use the Punycode equivalents (IDN names). In this case, to handle international names on the local Intranet, the DNS servers that are used for the Intranet should support Unicode name resolution.

* idn enabled = None

  This value will not convert any Unicode domain names to use Punycode. This is the default value, which is consistent with the .NET Framework 2.0 behavior.

Enabling IRI parsing (iriParsing enabled = true) will do normalization and character checking according to the latest IRI rules in RFC 3986 and RFC 3987. The default value is false and will do normalization and character checking according to RFC 2396 and RFC 2732 (for IPv6 literals).

IRI and IDN processing in the Uri class can also be controlled using the System.Configuration.IriParsingElement, System.Configuration.IdnElement, and System.Configuration.UriSection configuration setting classes. The System.Configuration.IriParsingElement setting enables or disables IRI processing in the Uri class. The System.Configuration.IdnElement setting enables or disables IDN processing in the Uri class. The System.Configuration.IriParsingElement setting also indirectly controls IDN. IRI processing must be enabled for IDN processing to be possible. If IRI processing is disabled, then IDN processing will be set to the default setting where the .NET Framework 2.0 behavior is used for compatibility and IDN names are not used.

The configuration setting for the System.Configuration.IriParsingElement and System.Configuration.IdnElement will be read once when the first System.Uri class is constructed. Changes to configuration settings after that time are ignored.

The System.GenericUriParser class has also been extended to allow creating a customizable parser that supports IRI and IDN. The behavior of a System.GenericUriParser object is specified by passing a bitwise combination of the values available in the System.GenericUriParserOptions enumeration to the System.GenericUriParser constructor. The GenericUriParserOptions.IriParsing value indicates the parser supports the parsing rules specified in RFC 3987 for International Resource Identifiers (IRI). Whether IRI is used is dictated by the configuration values previously discussed.

The GenericUriParserOptions.Idn value indicates the parser supports Internationalized Domain Name (IDN) parsing of host names. Whether IDN is used is dictated by the configuration values previously discussed.
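
Registering such a parser can be sketched as follows. The scheme name contosodemo, the default port 4000, the host name, and the names CustomParserSketch and RegisterParser are hypothetical values chosen for illustration. As noted above, whether IRI and IDN rules actually take effect still depends on the configuration settings.

```csharp
using System;

public static class CustomParserSketch
{
    // Hypothetical scheme name used only for this illustration.
    public const string SchemeName = "contosodemo";

    // Registers a parser for the scheme once; a scheme cannot be registered twice.
    public static void RegisterParser()
    {
        if (UriParser.IsKnownScheme(SchemeName))
        {
            return;
        }

        // Bitwise combination of options passed to the constructor.
        GenericUriParserOptions options =
            GenericUriParserOptions.IriParsing | GenericUriParserOptions.Idn;

        // 4000 is an arbitrary default port for the hypothetical scheme.
        UriParser.Register(new GenericUriParser(options), SchemeName, 4000);
    }

    public static void Main()
    {
        RegisterParser();

        Uri uri = new Uri("contosodemo://server.example/resource");
        Console.WriteLine(uri.Scheme); // contosodemo
        Console.WriteLine(uri.Host);
    }
}
```
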
Performance Considerations

If you use a Web.config file that contains URIs to initialize your application, additional time is required to process the URIs if their scheme identifiers are nonstandard. In such a case, initialize the affected parts of your application when the URIs are needed, not at start time.
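
One way to defer the cost is to construct the Uri on first use rather than at startup. The sketch below assumes a hard-coded string standing in for a value read from Web.config; the scheme, the address, and the names LazyUriSketch and ServiceUri are hypothetical. The simple null check is not thread-safe; it is enough to show the idea.

```csharp
using System;

public static class LazyUriSketch
{
    // Stands in for a value read from Web.config (assumption for this sketch).
    private const string ConfiguredValue = "contosodemo://server.example/service";

    private static Uri serviceUri;

    // The URI string is parsed only when the property is first read.
    public static Uri ServiceUri
    {
        get
        {
            if (serviceUri == null)
            {
                serviceUri = new Uri(ConfiguredValue);
            }
            return serviceUri;
        }
    }

    public static void Main()
    {
        // No parsing has happened until this line runs.
        Console.WriteLine(ServiceUri.Scheme); // contosodemo
    }
}
```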

Notes to Callers:

Because of security concerns, your application should use caution when accepting Uri instances from untrusted sources and with dontEscape set to true. You can check a URI string for validity by calling the IsWellFormedOriginalString method.
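
A minimal sketch of validating a string from an untrusted source before using it. The policy of accepting only absolute http and https URIs is an assumption made for the example, as are the names UntrustedUriSketch and TryParseAbsoluteHttp.

```csharp
using System;

public static class UntrustedUriSketch
{
    // Accepts only well-formed, absolute http or https URIs.
    public static bool TryParseAbsoluteHttp(string candidate, out Uri result)
    {
        if (Uri.TryCreate(candidate, UriKind.Absolute, out result)
            && result.IsWellFormedOriginalString()
            && (result.Scheme == Uri.UriSchemeHttp || result.Scheme == Uri.UriSchemeHttps))
        {
            return true;
        }

        result = null;
        return false;
    }

    public static void Main()
    {
        Uri parsed;

        Console.WriteLine(TryParseAbsoluteHttp("http://www.contoso.com/index.htm", out parsed)); // True
        Console.WriteLine(TryParseAbsoluteHttp("not a uri", out parsed));                        // False
        Console.WriteLine(TryParseAbsoluteHttp("ftp://www.contoso.com/file.txt", out parsed));   // False
    }
}
```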

Windows Mobile for Pocket PC, Windows Mobile for Smartphone, Windows CE Platform Note:

The .NET Compact Framework does not differentiate between relative and absolute paths. Also, the .NET Compact Framework processes URI strings prefixed by the file:// scheme differently from the full .NET Framework. A relative file://myfile specification resolves as \\myfile. Using file:///myfile (three slashes) resolves as \myfile in the root directory. To ensure successful operations, specify absolute path information.
Examples

The following example creates an instance of the Uri class and uses it to create a WebRequest instance.
Visual Basic

Dim siteUri As New Uri("http://www.contoso.com/")

Dim wr As WebRequest = WebRequest.Create(siteUri)


C#

Uri siteUri = new Uri("http://www.contoso.com/");

WebRequest wr = WebRequest.Create(siteUri);


Visual C++

Uri^ siteUri = gcnew Uri( "http://www.contoso.com/" );
WebRequest^ wr = WebRequest::Create( siteUri );

JScript

var siteUri : Uri = new Uri("http://www.contoso.com/");
var wr : WebRequest = WebRequest.Create(siteUri);