HttpClient preference architecture

The quality and extent of HTTP/1.0 and HTTP/1.1 specification compliance vary significantly among commonly used HTTP agents and HTTP servers. HttpClient therefore needs to be able to:

  • mimic the (mis-)behavior of widely used web browsers;
  • support a flexible, configurable level of leniency toward non-critical protocol violations, especially in those gray areas of the specification that are subject to different, at times conflicting, interpretations;
  • apply a different set of parameters to individual HTTP methods, hosts, or client instances through a common interface.

HTTP parameters

As of version 3, HttpClient provides a new preference API based on the HttpParams interface. All major components of the HttpClient toolkit (agents, host configurations, methods, connections, connection managers) contain a collection of HTTP parameters that determines the runtime behavior of that component.

HttpClient httpclient = new HttpClient();
HttpVersion ver = (HttpVersion)httpclient.getParams().getParameter("http.protocol.version");

In a nutshell, HTTP parameters are a collection of name/object pairs that can be linked with other collections to form a hierarchy. If a particular parameter value has not been explicitly defined in the collection itself, its value is drawn from the next collection up in the hierarchy.

import org.apache.commons.httpclient.HostConfiguration;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpVersion;
import org.apache.commons.httpclient.methods.GetMethod;

// Client level: defaults for every request executed by this client
HttpClient httpclient = new HttpClient();
httpclient.getParams().setParameter("http.protocol.version", HttpVersion.HTTP_1_1);
httpclient.getParams().setParameter("http.socket.timeout", Integer.valueOf(1000));
httpclient.getParams().setParameter("http.protocol.content-charset", "UTF-8");

// Host level: overrides the protocol version for this host only
HostConfiguration hostconfig = new HostConfiguration();
hostconfig.setHost("www.yahoo.com");
hostconfig.getParams().setParameter("http.protocol.version", HttpVersion.HTTP_1_0);

// Method level: overrides the socket timeout for this request only
GetMethod httpget = new GetMethod("/");
httpget.getParams().setParameter("http.socket.timeout", Integer.valueOf(5000));

try {
  // Internally the parameter collections will be linked together
  // by performing the following operations:
  // hostconfig.getParams().setDefaults(httpclient.getParams());
  // httpget.getParams().setDefaults(hostconfig.getParams());
  httpclient.executeMethod(hostconfig, httpget);
  System.out.println(httpget.getParams().getParameter("http.protocol.version"));
  System.out.println(httpget.getParams().getParameter("http.socket.timeout"));
  System.out.println(httpget.getParams().getParameter("http.protocol.content-charset"));
} finally {
  // Always release the connection, even if the request failed
  httpget.releaseConnection();
}

The code above will produce the following output:

HTTP/1.0
5000
UTF-8

When resolving a parameter, HttpClient uses the following algorithm:

  • start the parameter lookup at the lowest level at which the parameter applies
  • if the parameter is undefined at the current level, defer its resolution to the next level up in the hierarchy
  • return the parameter value from the lowest level in the hierarchy at which the parameter is defined
  • return null if the parameter is undefined at every level

This architecture enables the users to define generic parameters at a higher level (for instance, at the agent level or host level) and selectively override specific parameters at a lower level (for instance, at the method level). Whenever a parameter is not explicitly defined at a given level, the defaults of the upper levels will apply.
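The lookup rules above can be sketched with a small stand-alone class. This is an illustrative model only, not HttpClient's implementation: the names ParamLookupSketch and Level are hypothetical, and protocol versions are plain strings here so that the sketch runs without the HttpClient library. Only the parameter names and values are taken from the example earlier in this section.

```java
import java.util.HashMap;
import java.util.Map;

class ParamLookupSketch {

    /** Hypothetical model of one level in the hierarchy (not an HttpClient class). */
    static class Level {
        private final Map<String, Object> local = new HashMap<>();
        private final Level defaults; // next level up, or null at the top

        Level(Level defaults) {
            this.defaults = defaults;
        }

        void setParameter(String name, Object value) {
            local.put(name, value);
        }

        /** Returns the value from the lowest level that defines it, or null. */
        Object getParameter(String name) {
            for (Level level = this; level != null; level = level.defaults) {
                if (level.local.containsKey(name)) {
                    return level.local.get(name);
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Level client = new Level(null);
        client.setParameter("http.protocol.version", "HTTP/1.1");
        client.setParameter("http.socket.timeout", Integer.valueOf(1000));
        client.setParameter("http.protocol.content-charset", "UTF-8");

        Level host = new Level(client);
        host.setParameter("http.protocol.version", "HTTP/1.0");

        Level method = new Level(host);
        method.setParameter("http.socket.timeout", Integer.valueOf(5000));

        System.out.println(method.getParameter("http.protocol.version"));         // HTTP/1.0 (host level)
        System.out.println(method.getParameter("http.socket.timeout"));           // 5000 (method level)
        System.out.println(method.getParameter("http.protocol.content-charset")); // UTF-8 (client level)
        System.out.println(method.getParameter("http.useragent"));                // null (undefined everywhere)
    }
}
```

The method level wins for the socket timeout, the host level for the protocol version, and the client level supplies the charset, which matches the output shown for the HttpClient example.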

HTTP parameter hierarchy

Presently HttpClient provides the following parameter hierarchy:

global--+                            | DefaultHttpParams
        |                            |
      client                         | HttpClient
        |                            |
        +-- connection manager       | HttpConnectionManager
        |     |                      |
        |     +-- connection         | HttpConnection
        |                            |
        +-- host                     | HostConfiguration
              |                      |
              +-- method             | HttpMethod

Supported parameters

HTTP method parameters

Applicable at the following levels: global -> client -> host -> method

http.useragent
  Type: String
  Description: The content of the User-Agent header used by the HTTP methods.
  Default: official release name, e.g. "Jakarta Commons-HttpClient/3.0"

http.protocol.version
  Type: HttpVersion
  Description: The HTTP protocol version used by default by the HTTP methods.
  Default: HttpVersion.HTTP_1_1

http.protocol.unambiguous-statusline
  Type: Boolean
  Description: Defines whether HTTP methods should reject an ambiguous HTTP status line.
  Default: <undefined>

http.protocol.single-cookie-header
  Type: Boolean
  Description: Defines whether cookies should be put on a single request header.
  Default: <undefined>

http.protocol.strict-transfer-encoding
  Type: Boolean
  Description: Defines whether responses with an invalid Transfer-Encoding header should be rejected.
  Default: <undefined>

http.protocol.reject-head-body
  Type: Boolean
  Description: Defines whether a content body sent in response to a HEAD request should be rejected.
  Default: <undefined>

http.protocol.head-body-timeout
  Type: Integer
  Description: Sets the period of time in milliseconds to wait for a content body sent in response to a HEAD request by a non-compliant server. If the parameter is not set, or is set to -1, the check for a non-compliant response body is disabled.
  Default: <undefined>

http.protocol.expect-continue
  Type: Boolean
  Description: Activates the 'Expect: 100-continue' handshake for the entity enclosing methods. The handshake allows a client that is sending a request message with a request body to determine whether the origin server is willing to accept the request (based on the request headers) before the client sends the request body.

    The use of the 'Expect: 100-continue' handshake can result in a noticeable performance improvement for entity enclosing requests (such as POST and PUT) that require the target server's authentication.

    The 'Expect: 100-continue' handshake should be used with caution, as it may cause problems with HTTP servers and proxies that do not support the HTTP/1.1 protocol.
  Default: <undefined>

http.protocol.credential-charset
  Type: String
  Description: The charset to be used when encoding credentials. If not defined, the value of 'http.protocol.element-charset' is used.
  Default: <undefined>

http.protocol.element-charset
  Type: String
  Description: The charset to be used for encoding/decoding HTTP protocol elements (status line and headers).
  Default: 'US-ASCII'

http.protocol.content-charset
  Type: String
  Description: The charset to be used for encoding the content body.
  Default: 'ISO-8859-1'

http.protocol.cookie-policy
  Type: String
  Description: The cookie policy to be used for cookie management.
  Default: CookiePolicy.RFC_2109

http.protocol.warn-extra-input
  Type: Boolean
  Description: Defines HttpClient's behavior when a response provides more bytes than expected (as specified with the Content-Length header, for example).

    Such surplus data makes the HTTP connection unreliable for keep-alive requests, as malicious response data (faked headers etc.) can lead to undesired results on the next request using that connection.

    If this parameter is set to true, any detection of extra input data will generate a warning in the log.
  Default: <undefined>

http.protocol.status-line-garbage-limit
  Type: Integer
  Description: Defines the maximum number of ignorable lines allowed before the status line of an HTTP response.

    With HTTP/1.1 persistent connections, broken scripts may return a wrong Content-Length (more bytes are sent than specified). Unfortunately, in some cases the surplus data cannot be detected after the bad response, only before the next one, so HttpClient must be able to skip those surplus lines.

    Set this to 0 to disallow any garbage/empty lines before the status line. To specify no limit, use Integer#MAX_VALUE.
  Default: <undefined>

http.socket.timeout
  Type: Integer
  Description: Sets the socket timeout (SO_TIMEOUT) in milliseconds to be used when executing the method. A timeout value of zero is interpreted as an infinite timeout.
  Default: <undefined>

http.method.retry-handler
  Type: HttpMethodRetryHandler
  Description: The method retry handler used for retrying failed methods. For details see the Exception handling guide.
  Default: default implementation

http.dateparser.patterns
  Type: Collection
  Description: Date patterns used for parsing. The patterns are stored in a Collection and must be compatible with SimpleDateFormat.
  Default:
    'EEE, dd MMM yyyy HH:mm:ss zzz',
    'EEEE, dd-MMM-yy HH:mm:ss zzz',
    'EEE MMM d HH:mm:ss yyyy',
    'EEE, dd-MMM-yyyy HH:mm:ss z',
    'EEE, dd-MMM-yyyy HH-mm-ss z',
    'EEE, dd MMM yy HH:mm:ss z',
    'EEE dd-MMM-yyyy HH:mm:ss z',
    'EEE dd MMM yyyy HH:mm:ss z',
    'EEE dd-MMM-yyyy HH-mm-ss z',
    'EEE dd-MMM-yy HH:mm:ss z',
    'EEE dd MMM yy HH:mm:ss z',
    'EEE,dd-MMM-yy HH:mm:ss z',
    'EEE,dd-MMM-yyyy HH:mm:ss z',
    'EEE, dd-MM-yyyy HH:mm:ss z'

http.method.response.buffer.warnlimit
  Type: Integer
  Description: The maximum buffered response size (in bytes) that triggers no warning. Buffered responses exceeding this size will trigger a warning in the log. If not set, the limit is 1 MB.
  Default: <undefined>

http.method.multipart.boundary
  Type: String
  Description: The multipart boundary string to use in conjunction with the MultipartRequestEntity. When not set, a random value will be generated for each request.
  Default: <undefined>

Whenever a parameter is left undefined (no value is explicitly set anywhere in the parameter hierarchy), HttpClient will use its best judgment to pick a value. This default behavior is likely to provide the best compatibility with widely used HTTP servers.
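To illustrate what "compatible with SimpleDateFormat" means for http.dateparser.patterns, the following stand-alone sketch tries a collection of patterns one after another until one of them parses the input. It uses only the JDK; the class DatePatternSketch and its parse method are hypothetical names, and the strategy of trying the patterns in collection order is an assumption made for illustration, not a statement about HttpClient's internals. The three patterns are the first three defaults listed for this parameter.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Collection;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DatePatternSketch {

    // The first three default patterns of http.dateparser.patterns.
    static final Collection<String> PATTERNS = Arrays.asList(
            "EEE, dd MMM yyyy HH:mm:ss zzz",
            "EEEE, dd-MMM-yy HH:mm:ss zzz",
            "EEE MMM d HH:mm:ss yyyy");

    /** Tries each pattern in order; returns null if none of them matches. */
    static Date parse(String value, Collection<String> patterns) {
        for (String pattern : patterns) {
            // English day and month names, independent of the default locale.
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
            format.setTimeZone(TimeZone.getTimeZone("GMT"));
            try {
                return format.parse(value);
            } catch (ParseException e) {
                // This pattern does not fit the input; try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Date date = parse("Sun, 06 Nov 1994 08:49:37 GMT", PATTERNS);
        System.out.println(date.getTime());                  // 784111777000
        System.out.println(parse("not a date", PATTERNS));   // null
    }
}
```

A custom collection stored under this parameter would follow the same rule: every entry has to be a pattern string that the SimpleDateFormat constructor accepts.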

HTTP connection parameters

Applicable at the following levels: global -> client -> connection manager -> connection

http.socket.timeout
  Type: Integer
  Description: The default socket timeout (SO_TIMEOUT) in milliseconds, which is the timeout for waiting for data. A timeout value of zero is interpreted as an infinite timeout. This value is used when no socket timeout is set in the HTTP method parameters.
  Default: <undefined>

http.tcp.nodelay
  Type: Boolean
  Description: Determines whether Nagle's algorithm is to be used. Nagle's algorithm tries to conserve bandwidth by minimizing the number of segments that are sent. Applications that wish to decrease network latency and increase performance can disable Nagle's algorithm (by enabling TCP_NODELAY). Data will be sent earlier, at the cost of an increase in bandwidth consumption and number of packets.
  Default: <undefined>

http.socket.sendbuffer
  Type: Integer
  Description: The value to set on Socket.setSendBufferSize(int). This value is a suggestion to the kernel from the application about the size of the buffers to use for the data to be sent over the socket.
  Default: <undefined>

http.socket.receivebuffer
  Type: Integer
  Description: The value to set on Socket.setReceiveBufferSize(int). This value is a suggestion to the kernel from the application about the size of the buffers to use for the data to be received over the socket.
  Default: <undefined>

http.socket.linger
  Type: Integer
  Description: The linger time (SO_LINGER) in seconds. This option disables/enables immediate return from a close() of a TCP Socket. Enabling this option with a non-zero Integer timeout means that a close() will block pending the transmission and acknowledgement of all data written to the peer, at which point the socket is closed gracefully. Value 0 implies that the option is disabled. Value -1 implies that the JRE default is used.
  Default: <undefined>

http.connection.timeout
  Type: Integer
  Description: The timeout until a connection is established. A value of zero means the timeout is not used.
  Default: <undefined>

http.connection.stalecheck
  Type: Boolean
  Description: Determines whether the stale connection check is to be used. Disabling the stale connection check may result in a slight performance improvement, at the risk of getting an I/O error when executing a request over a connection that has been closed at the server side.
  Default: <undefined>

Whenever a parameter is left undefined (no value is explicitly set anywhere in the parameter hierarchy), HttpClient will use its best judgment to pick a value. This default behavior is likely to provide the best compatibility with widely used HTTP servers.
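Several of the connection parameters correspond directly to standard java.net.Socket options. The sketch below shows that correspondence with JDK calls only. The class SocketParamsSketch and the use of a plain Map in place of an HttpParams collection are assumptions made so the example runs without the HttpClient library; it is not HttpClient's own connection code. Parameters that are absent from the map are simply not applied, so the JRE defaults stay in effect.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.util.HashMap;
import java.util.Map;

class SocketParamsSketch {

    /** Applies only the options present in the map; absent ones keep the JRE defaults. */
    static void apply(Socket socket, Map<String, Object> params) throws IOException {
        Integer timeout = (Integer) params.get("http.socket.timeout");
        if (timeout != null) {
            socket.setSoTimeout(timeout);            // SO_TIMEOUT in milliseconds, 0 = infinite
        }
        Boolean noDelay = (Boolean) params.get("http.tcp.nodelay");
        if (noDelay != null) {
            socket.setTcpNoDelay(noDelay);           // true disables Nagle's algorithm
        }
        Integer sendBuffer = (Integer) params.get("http.socket.sendbuffer");
        if (sendBuffer != null) {
            socket.setSendBufferSize(sendBuffer);    // a hint to the kernel only
        }
        Integer receiveBuffer = (Integer) params.get("http.socket.receivebuffer");
        if (receiveBuffer != null) {
            socket.setReceiveBufferSize(receiveBuffer);
        }
        Integer linger = (Integer) params.get("http.socket.linger");
        if (linger != null && linger >= 0) {         // -1 leaves the JRE default untouched
            socket.setSoLinger(linger > 0, linger);  // 0 disables SO_LINGER
        }
    }

    /** Applies the parameters to an unconnected socket and reports two of the options. */
    static String describe(Map<String, Object> params) {
        try (Socket socket = new Socket()) {
            apply(socket, params);
            return "timeout=" + socket.getSoTimeout() + " nodelay=" + socket.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        params.put("http.socket.timeout", Integer.valueOf(5000));
        params.put("http.tcp.nodelay", Boolean.TRUE);
        System.out.println(describe(params));
    }
}
```

The buffer sizes are not read back in the sketch because the operating system is free to adjust the requested values.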

HTTP connection manager parameters

Applicable at the following levels: global -> client -> connection manager

http.connection-manager.max-per-host
  Type: Map
  Description: Defines the maximum number of connections allowed per host configuration. These values only apply to the number of connections from a particular instance of HttpConnectionManager.

    This parameter expects a value of type Map. The value should map instances of HostConfiguration to Integers. The default value can be specified using ANY_HOST_CONFIGURATION.
  Default: <undefined>

http.connection-manager.max-total
  Type: Integer
  Description: Defines the maximum number of connections allowed overall. This value only applies to the number of connections from a particular instance of HttpConnectionManager.
  Default: <undefined>

Whenever a parameter is left undefined (no value is explicitly set anywhere in the parameter hierarchy), HttpClient will use its best judgment to pick a value. This default behavior is likely to provide the best compatibility with widely used HTTP servers.
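The interplay of the two limits can be pictured with a short stand-alone sketch. It is a simplified model under stated assumptions: hosts are plain strings rather than HostConfiguration instances, the ANY_HOST key is a hypothetical stand-in for ANY_HOST_CONFIGURATION, and the fallback used when neither entry exists is supplied by the caller rather than taken from HttpClient.

```java
import java.util.HashMap;
import java.util.Map;

class ConnectionLimitSketch {

    // Hypothetical stand-in for the ANY_HOST_CONFIGURATION key.
    static final String ANY_HOST = "*";

    /** Per-host limit: the host's own entry, else the ANY_HOST entry, else the fallback. */
    static int limitFor(String host, Map<String, Integer> maxPerHost, int fallback) {
        Integer limit = maxPerHost.get(host);
        if (limit == null) {
            limit = maxPerHost.get(ANY_HOST);
        }
        return limit != null ? limit : fallback;
    }

    /** A new connection may be opened only while both limits are still unreached. */
    static boolean mayOpen(int openForHost, int hostLimit, int openTotal, int maxTotal) {
        return openForHost < hostLimit && openTotal < maxTotal;
    }

    public static void main(String[] args) {
        Map<String, Integer> maxPerHost = new HashMap<>();
        maxPerHost.put(ANY_HOST, Integer.valueOf(2));
        maxPerHost.put("www.yahoo.com", Integer.valueOf(8));

        System.out.println(limitFor("www.yahoo.com", maxPerHost, 1)); // 8
        System.out.println(limitFor("example.org", maxPerHost, 1));   // 2
        System.out.println(mayOpen(2, 2, 5, 20));                      // false
    }
}
```

In the last call the overall limit still has room, but the per-host limit has already been reached, so no further connection to that host would be opened.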

Host configuration parameters

Applicable at the following levels: global -> client -> host

http.default-headers
  Type: Collection
  Description: The request headers to be sent by default with each request. This parameter expects a value of type Collection. The collection is expected to contain HTTP headers.
  Default: <undefined>

Whenever a parameter is left undefined (no value is explicitly set anywhere in the parameter hierarchy), HttpClient will use its best judgment to pick a value. This default behavior is likely to provide the best compatibility with widely used HTTP servers.

HTTP client parameters

Applicable at the following levels: global -> client

http.connection-manager.timeout
  Type: Long
  Description: The timeout in milliseconds used when retrieving an HTTP connection from the HTTP connection manager. 0 means to wait indefinitely.
  Default: <undefined>

http.connection-manager.class
  Type: Class
  Description: The default HTTP connection manager class.
  Default: SimpleHttpConnectionManager class

http.authentication.preemptive
  Type: Boolean
  Description: Defines whether authentication should be attempted preemptively. See the authentication guide.
  Default: <undefined>

http.protocol.reject-relative-redirect
  Type: Boolean
  Description: Defines whether relative redirects should be rejected. Although redirects are supposed to be absolute, it is common internet practice to use relative URLs.
  Default: <undefined>

http.protocol.max-redirects
  Type: Integer
  Description: Defines the maximum number of redirects to be followed. The limit on the number of redirects is intended to prevent infinite loops.
  Default: <undefined>

http.protocol.allow-circular-redirects
  Type: Boolean
  Description: Defines whether circular redirects (redirects to the same location) should be allowed. The HTTP spec is not sufficiently clear on whether circular redirects are permitted, so they can optionally be enabled.
  Default: <undefined>

Whenever a parameter is left undefined (no value is explicitly set anywhere in the parameter hierarchy), HttpClient will use its best judgment to pick a value. This default behavior is likely to provide the best compatibility with widely used HTTP servers.
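How a redirect limit and a circular-redirect check guard against endless loops can be shown with a stand-alone simulation. No network access is involved: redirects are modelled as a map from one location to the next. The class RedirectSketch, its follow method and the exception messages are hypothetical and only mirror the intent of http.protocol.max-redirects and http.protocol.allow-circular-redirects. They do not reproduce HttpClient's redirect handling.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class RedirectSketch {

    /**
     * Follows simulated redirects (location -> next location) and returns the
     * final location. Throws IllegalStateException when a limit is violated.
     */
    static String follow(String start, Map<String, String> redirects,
                         int maxRedirects, boolean allowCircular) {
        Set<String> visited = new HashSet<>();
        visited.add(start);
        String current = start;
        int count = 0;
        while (redirects.containsKey(current)) {
            if (count >= maxRedirects) {
                throw new IllegalStateException("Maximum redirects (" + maxRedirects + ") exceeded");
            }
            String next = redirects.get(current);
            // Set.add returns false when the location was already visited.
            if (!allowCircular && !visited.add(next)) {
                throw new IllegalStateException("Circular redirect to " + next);
            }
            current = next;
            count++;
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> chain = new HashMap<>();
        chain.put("/a", "/b");
        chain.put("/b", "/c");
        System.out.println(follow("/a", chain, 5, false)); // /c

        Map<String, String> loop = new HashMap<>();
        loop.put("/a", "/b");
        loop.put("/b", "/a");
        try {
            follow("/a", loop, 5, false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // Circular redirect to /a
        }
    }
}
```

Even with circular redirects allowed, the redirect limit still terminates the loop, which is why the two settings complement each other.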