|
|
|
|
|
|
[[databuffers]]
= Data Buffers and Codecs

Java NIO provides `ByteBuffer`, but many libraries build their own byte buffer API on top,
especially for network operations where reusing buffers and/or using direct buffers is
beneficial for performance. For example, Netty has the `ByteBuf` hierarchy, Undertow uses
XNIO, Jetty uses pooled byte buffers with a callback to be released, and so on.
The `spring-core` module provides a set of abstractions to work with various byte buffer
APIs, as follows:

* <<databuffers-factory>> abstracts the creation of a data buffer.
* <<databuffers-buffer>> represents a byte buffer, which may be
  <<databuffers-buffer-pooled,pooled>>.
* <<databuffers-utils>> offers utility methods for data buffers.
* <<codecs>> decode or encode data buffer streams into higher-level objects.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
[[databuffers-factory]]
== `DataBufferFactory`

`DataBufferFactory` is used to create data buffers in one of two ways:

. Allocate a new data buffer, optionally specifying the capacity upfront, if known, which is
  more efficient even though implementations of `DataBuffer` can grow and shrink on demand.
. Wrap an existing `byte[]` or `java.nio.ByteBuffer`, which decorates the given data with
  a `DataBuffer` implementation and does not involve allocation.

Note that WebFlux applications do not create a `DataBufferFactory` directly but instead
access it through the `ServerHttpResponse`, or through the `ClientHttpRequest` on the client side.
The type of factory depends on the underlying client or server: for example,
`NettyDataBufferFactory` for Reactor Netty and `DefaultDataBufferFactory` for other
platforms (such as Servlet 3.1+ servers).
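
The following sketch shows both creation styles. `DefaultDataBufferFactory` is assumed here only
to keep the example self-contained; in a WebFlux application, the factory would instead be
obtained as described in the preceding note:

====
[source,java,indent=0]
----
	import java.nio.ByteBuffer;
	import java.nio.charset.StandardCharsets;

	import org.springframework.core.io.buffer.DataBuffer;
	import org.springframework.core.io.buffer.DataBufferFactory;
	import org.springframework.core.io.buffer.DefaultDataBufferFactory;

	public class DataBufferFactoryExample {

		public static void main(String[] args) {
			DataBufferFactory factory = new DefaultDataBufferFactory();

			// 1. Allocate a new buffer, optionally hinting at the expected capacity
			DataBuffer allocated = factory.allocateBuffer(1024);
			allocated.write("Spring".getBytes(StandardCharsets.UTF_8));

			// 2. Wrap existing data; this only decorates it and does not allocate
			DataBuffer wrapped = factory.wrap(ByteBuffer.wrap("Spring".getBytes(StandardCharsets.UTF_8)));

			System.out.println(allocated.readableByteCount()); // 6
			System.out.println(wrapped.readableByteCount());   // 6
		}
	}
----
====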
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
[[databuffers-buffer]]
== `DataBuffer`

The `DataBuffer` interface offers operations similar to those of `java.nio.ByteBuffer` but also
brings a few additional benefits, some of which are inspired by the Netty `ByteBuf`.
Below is a partial list of benefits:

* Read and write with independent positions, that is, not requiring a call to `flip()` to
  alternate between read and write.
* Capacity expanded on demand, as with `java.lang.StringBuilder`.
* Pooled buffers and reference counting via <<databuffers-buffer-pooled>>.
* View a buffer as `java.nio.ByteBuffer`, `InputStream`, or `OutputStream`.
* Determine the index, or the last index, of a given byte.

In general, the following invariant holds for the read position, write position, and the capacity:

====
[literal]
[subs="verbatim,quotes"]
--
	0 <= read position <= write position <= capacity
--
====

When reading bytes from the `DataBuffer`, the read position is automatically advanced by the
amount of data read. Similarly, when writing bytes, the write position is advanced by the amount
of data written, and the capacity is expanded as necessary.
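
A minimal sketch of the independent read and write positions, again assuming a
`DefaultDataBufferFactory` purely for illustration:

====
[source,java,indent=0]
----
	import java.nio.charset.StandardCharsets;

	import org.springframework.core.io.buffer.DataBuffer;
	import org.springframework.core.io.buffer.DefaultDataBufferFactory;

	public class DataBufferPositionsExample {

		public static void main(String[] args) {
			DataBuffer buffer = new DefaultDataBufferFactory().allocateBuffer(4);

			// Writing grows the capacity on demand and advances the write position to 11
			buffer.write("hello world".getBytes(StandardCharsets.UTF_8));

			// Reading advances the read position only; no flip() is needed in between
			byte[] hello = new byte[5];
			buffer.read(hello);

			System.out.println(new String(hello, StandardCharsets.UTF_8)); // hello
			System.out.println(buffer.readPosition());                     // 5
			System.out.println(buffer.writePosition());                    // 11
			System.out.println(buffer.indexOf(b -> b == 'w', 0));          // 6
		}
	}
----
====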
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
[[databuffers-buffer-pooled]]
== `PooledDataBuffer`

As explained in the Javadoc for
https://docs.oracle.com/javase/8/docs/api/java/nio/ByteBuffer.html[ByteBuffer],
byte buffers can be direct or non-direct. Direct buffers may reside outside the Java heap,
which eliminates the need for copying for native I/O operations. That makes direct buffers
particularly useful for receiving and sending data over a socket, but they are also more
expensive to create and release, which leads to the idea of pooling buffers.

`PooledDataBuffer` is an extension of `DataBuffer` that helps with reference counting, which
is essential for byte buffer pooling. How does it work? When a `PooledDataBuffer` is
allocated, the reference count is at 1. Calls to `retain()` increment the count, while
calls to `release()` decrement it. As long as the count is above 0, the buffer is
guaranteed not to be released. When the count is decreased to 0, the pooled buffer can be
released, which in practice could mean the reserved memory for the buffer is returned to
the memory pool.

Note that instead of operating on `PooledDataBuffer` directly, in most cases it is better
to use the convenience methods in `DataBufferUtils` that apply release or retain to a
`DataBuffer` only if it is an instance of `PooledDataBuffer`.
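
The following sketch assumes a `NettyDataBufferFactory` backed by Netty's default pooled
allocator and walks the reference count up and down with the `DataBufferUtils` convenience
methods (which are safe no-ops for buffers that are not pooled):

====
[source,java,indent=0]
----
	import io.netty.buffer.PooledByteBufAllocator;

	import org.springframework.core.io.buffer.DataBuffer;
	import org.springframework.core.io.buffer.DataBufferUtils;
	import org.springframework.core.io.buffer.NettyDataBufferFactory;

	public class ReferenceCountingExample {

		public static void main(String[] args) {
			NettyDataBufferFactory factory = new NettyDataBufferFactory(PooledByteBufAllocator.DEFAULT);

			DataBuffer buffer = factory.allocateBuffer();         // reference count: 1
			DataBufferUtils.retain(buffer);                       // reference count: 2
			DataBufferUtils.release(buffer);                      // reference count: 1, still usable
			boolean released = DataBufferUtils.release(buffer);   // reference count: 0, returned to the pool

			System.out.println(released); // true
		}
	}
----
====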
|
|
|
|
|
|
|
|
|
|
|
[[databuffers-utils]]
== `DataBufferUtils`

`DataBufferUtils` offers a number of utility methods to operate on data buffers:

* Join a stream of data buffers into a single buffer, possibly with zero copy (for example,
  via composite buffers), if that is supported by the underlying byte buffer API.
* Turn an `InputStream` or NIO `Channel` into `Flux<DataBuffer>`, and vice versa turn a
  `Publisher<DataBuffer>` into an `OutputStream` or NIO `Channel`.
* Release or retain a `DataBuffer` if the buffer is an instance of `PooledDataBuffer`.
* Skip or take from a stream of bytes until a specific byte count.

Joining a whole stream (for example, an entire HTTP body) into a single buffer can be convenient
when dealing with older, blocking APIs, but note that it holds all of the data in memory and
therefore uses more memory than a pure streaming solution would.
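
For illustration, the following sketch streams a classpath resource (the `data.txt` name is
hypothetical) as `Flux<DataBuffer>` and then joins the stream into a single buffer:

====
[source,java,indent=0]
----
	import java.nio.charset.StandardCharsets;

	import reactor.core.publisher.Flux;
	import reactor.core.publisher.Mono;

	import org.springframework.core.io.ClassPathResource;
	import org.springframework.core.io.buffer.DataBuffer;
	import org.springframework.core.io.buffer.DataBufferUtils;
	import org.springframework.core.io.buffer.DefaultDataBufferFactory;

	public class DataBufferUtilsExample {

		public static void main(String[] args) {
			// Stream the resource as a sequence of buffers of up to 4096 bytes each
			Flux<DataBuffer> buffers = DataBufferUtils.read(
					new ClassPathResource("data.txt"), new DefaultDataBufferFactory(), 4096);

			// Join the stream into one buffer, copy its content, and release it
			Mono<String> content = DataBufferUtils.join(buffers).map(buffer -> {
				byte[] bytes = new byte[buffer.readableByteCount()];
				buffer.read(bytes);
				DataBufferUtils.release(buffer);
				return new String(bytes, StandardCharsets.UTF_8);
			});

			System.out.println(content.block());
		}
	}
----
====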
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
[[codecs]]
== Codecs

The `org.springframework.core.codec` package provides the following strategy interfaces:

* `Encoder` to encode `Publisher<T>` into a stream of data buffers.
* `Decoder` to decode `Publisher<DataBuffer>` into a stream of higher-level objects.

The `spring-core` module provides `byte[]`, `ByteBuffer`, `DataBuffer`, `Resource`, and
`String` encoder and decoder implementations. The `spring-web` module adds Jackson JSON,
Jackson Smile, JAXB2, Protocol Buffers, and other encoders and decoders. In WebFlux, codecs
convert the contents of a request body into a controller method argument, or convert a return
value into the response body. See <<web-reactive.adoc#webflux-codecs,Codecs>> in the WebFlux
section for more details, including how to configure or customize the default codecs.
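
A minimal sketch of the `Encoder` and `Decoder` contract, using the built-in
`CharSequenceEncoder` and `StringDecoder` from `spring-core` (the latter splits the decoded
text on newlines by default):

====
[source,java,indent=0]
----
	import java.util.Collections;

	import reactor.core.publisher.Flux;

	import org.springframework.core.ResolvableType;
	import org.springframework.core.codec.CharSequenceEncoder;
	import org.springframework.core.codec.StringDecoder;
	import org.springframework.core.io.buffer.DataBuffer;
	import org.springframework.core.io.buffer.DefaultDataBufferFactory;
	import org.springframework.util.MimeTypeUtils;

	public class CodecExample {

		public static void main(String[] args) {
			ResolvableType type = ResolvableType.forClass(String.class);

			// Encoder: Publisher<String> -> stream of data buffers
			Flux<DataBuffer> buffers = CharSequenceEncoder.textPlainOnly().encode(
					Flux.just("hello\n", "world\n"), new DefaultDataBufferFactory(), type,
					MimeTypeUtils.TEXT_PLAIN, Collections.emptyMap());

			// Decoder: stream of data buffers -> Publisher<String> (releases the buffers it reads)
			Flux<String> strings = StringDecoder.textPlainOnly().decode(
					buffers, type, MimeTypeUtils.TEXT_PLAIN, Collections.emptyMap());

			strings.subscribe(System.out::println); // prints "hello" then "world"
		}
	}
----
====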
|
|
|
[[databuffers-using]]
== Using `DataBuffer`
|
|
|
|
|
|
|
|
|
|
|
When working with data buffers, special care must be taken to ensure buffers are released,
since they may be <<databuffers-buffer-pooled,pooled>>. We will use codecs to illustrate how
that works, but the concepts apply more generally. Let's see what codecs must do internally
to manage data buffers.

A `Decoder` is the last to read input data buffers before creating higher-level objects, and
therefore it must release them as follows:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
. If a `Decoder` simply reads each input buffer and is ready to release it immediately,
  it can do so via `DataBufferUtils.release(dataBuffer)`.
. If a `Decoder` is using `Flux` or `Mono` operators such as `flatMap`, `reduce`, and
  others that prefetch and cache data items internally, or is using operators such as
  `filter`, `skip`, and others that leave out items, then
  `doOnDiscard(PooledDataBuffer.class, DataBufferUtils::release)` must be added to the
  composition chain to ensure such buffers are released prior to being discarded, possibly
  also as a result of an error or cancellation signal (see the sketch after this list).
. If a `Decoder` holds on to one or more data buffers in any other way, it must ensure they
  are released when fully read, or in case of an error or cancellation signal that takes
  place before the cached data buffers have been read and released.
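
For example, the following hypothetical helper (a sketch, not an actual `Decoder`
implementation) releases each buffer once it is fully read and registers a `doOnDiscard` hook
so that buffers dropped due to an error or cancellation are released as well:

====
[source,java,indent=0]
----
	import java.nio.charset.StandardCharsets;

	import reactor.core.publisher.Flux;

	import org.springframework.core.io.buffer.DataBuffer;
	import org.springframework.core.io.buffer.DataBufferUtils;
	import org.springframework.core.io.buffer.PooledDataBuffer;

	public class ReleaseAwareDecoding {

		static Flux<String> decodeToStrings(Flux<DataBuffer> input) {
			return input
					.map(buffer -> {
						byte[] bytes = new byte[buffer.readableByteCount()];
						buffer.read(bytes);
						DataBufferUtils.release(buffer); // release as soon as the buffer is fully read
						return new String(bytes, StandardCharsets.UTF_8);
					})
					// ensure buffers that never reach map() (e.g. on cancellation) are also released
					.doOnDiscard(PooledDataBuffer.class, DataBufferUtils::release);
		}
	}
----
====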
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Note that `DataBufferUtils#join` offers a safe and efficient way to aggregate a data buffer
stream into a single data buffer. Likewise, `skipUntilByteCount` and `takeUntilByteCount`
are additional safe methods for decoders to use.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
An `Encoder` allocates data buffers that others must read (and release), so an `Encoder` does
not have much to do. However, an `Encoder` must take care to release a data buffer if a
serialization error occurs while populating the buffer with data. For example:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
====
[source,java,indent=0]
[subs="verbatim,quotes"]
----
	DataBuffer buffer = factory.allocateBuffer();
	boolean release = true;
	try {
		// serialize and populate buffer..
		release = false;
	}
	finally {
		if (release) {
			DataBufferUtils.release(buffer);
		}
	}
	return buffer;
----
====
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
The consumer of an `Encoder` is responsible for releasing the data buffers it receives.
In a WebFlux application, the output of the `Encoder` is used to write to the HTTP server
response or to the client HTTP request, in which case releasing the data buffers is the
responsibility of the code writing to the server response or to the client request.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Note that, when running on Netty, there are debugging options for
https://github.com/netty/netty/wiki/Reference-counted-objects#troubleshooting-buffer-leaks[troubleshooting buffer leaks].
|
|
|
|