From 4faee165db315afe47fc2daca4e1db41ec256c0d Mon Sep 17 00:00:00 2001
From: Rossen Stoyanchev
Date: Thu, 25 Oct 2018 23:43:45 -0400
Subject: [PATCH] Documentation updates for working with DataBuffers

Issue: SPR-17409
---
 .../asciidoc/core/core-databuffer-codec.adoc | 281 +++++++++---------
 src/docs/asciidoc/web/webflux-websocket.adoc |  16 +
 src/docs/asciidoc/web/webflux.adoc           |  22 +-
 3 files changed, 173 insertions(+), 146 deletions(-)

diff --git a/src/docs/asciidoc/core/core-databuffer-codec.adoc b/src/docs/asciidoc/core/core-databuffer-codec.adoc
index c530657fccd..8b953296922 100644
--- a/src/docs/asciidoc/core/core-databuffer-codec.adoc
+++ b/src/docs/asciidoc/core/core-databuffer-codec.adoc
@@ -1,156 +1,95 @@
 [[databuffers]]
 = Data Buffers and Codecs

-The `DataBuffer` interface defines an abstraction over byte buffers.
-The main reason for introducing it (and not using the standard `java.nio.ByteBuffer` instead) is Netty.
-Netty does not use `ByteBuffer` but instead offers `ByteBuf` as an alternative.
-Spring's `DataBuffer` is a simple abstraction over `ByteBuf` that can also be used on non-Netty
-platforms (that is, Servlet 3.1+).
+Java NIO provides `ByteBuffer`, but many libraries build their own byte buffer API on top,
+especially for network operations where reusing buffers and/or using direct buffers is
+beneficial for performance. For example, Netty has the `ByteBuf` hierarchy, Undertow uses
+XNIO, Jetty uses pooled byte buffers that are released through a callback, and so on.
+The `spring-core` module provides a set of abstractions to work with various byte buffer
+APIs as follows:
+
+* <<databuffers-factory>> abstracts the creation of a data buffer.
+* <<databuffers-buffer>> represents a byte buffer, which may be
+<<databuffers-buffer-pooled,pooled>>.
+* <<databuffers-utils>> offers utility methods for data buffers.
+* <<codecs>> decode or encode data buffer streams into higher level objects.
+
+
+[[databuffers-factory]]
 == `DataBufferFactory`

-The `DataBufferFactory` offers functionality to allocate new data buffers as well as to wrap
-existing data.
-The `allocateBuffer` methods allocate a new data buffer with a default or given capacity.
-Though `DataBuffer` implementations grow and shrink on demand, it is more efficient to give the
-capacity upfront, if known.
-The `wrap` methods decorate an existing `ByteBuffer` or byte array.
-Wrapping does not involve allocation. It decorates the given data with a `DataBuffer`
-implementation.
+`DataBufferFactory` is used to create data buffers in one of two ways:

-There are two implementation of `DataBufferFactory`: the `NettyDataBufferFactory`
-(for Netty platforms, such as Reactor Netty) and `DefaultDataBufferFactory`
-(for other platforms, such as Servlet 3.1+ servers).
+. Allocate a new data buffer, optionally specifying capacity upfront, if known, which is
+more efficient even though implementations of `DataBuffer` can grow and shrink on demand.
+. Wrap an existing `byte[]` or `java.nio.ByteBuffer`, which decorates the given data with
+a `DataBuffer` implementation and does not involve any allocation.
+
+Note that WebFlux applications do not create a `DataBufferFactory` directly but instead
+access it through the `ServerHttpResponse`, or through the `ClientHttpRequest` on the
+client side. The type of factory depends on the underlying client or server, e.g.
+`NettyDataBufferFactory` for Reactor Netty, `DefaultDataBufferFactory` for others.
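+
+For example, the following sketch shows both ways of creating a buffer. It assumes a
+standalone setup with `DefaultDataBufferFactory`; in a WebFlux application the factory
+would be obtained from the request or response instead, as noted above:
+
+====
+[source,java,indent=0]
+[subs="verbatim,quotes"]
+----
+	DataBufferFactory factory = new DefaultDataBufferFactory();
+
+	// Allocate a new buffer; providing the capacity upfront avoids resizing later
+	DataBuffer allocated = factory.allocateBuffer(1024);
+
+	// Wrap existing bytes; no allocation is involved
+	DataBuffer wrapped = factory.wrap("Hello".getBytes(StandardCharsets.UTF_8));
+----
+====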
-== The `DataBuffer` Interface
-The `DataBuffer` interface is similar to `ByteBuffer` but offers a number of advantages.
-Similar to Netty's `ByteBuf`, the `DataBuffer` abstraction offers independent read and write
-positions.
-This is different from the JDK's `ByteBuffer`, which exposes only one position for both reading and
-writing and a separate `flip()` operation to switch between the two I/O operations.
-In general, the following invariant holds for the read position, write position, and the capacity:
+[[databuffers-buffer]]
+== `DataBuffer`
-====
-[literal]
-[subs="verbatim,quotes"]
---
- 0 <= read position <= write position <= capacity
---
-====
+The `DataBuffer` interface offers operations similar to those of `java.nio.ByteBuffer` but
+also brings a few additional benefits, some of which are inspired by the Netty `ByteBuf`.
+Below is a partial list of benefits:
-When reading bytes from the `DataBuffer`, the read position is automatically updated in accordance with
-the amount of data read from the buffer.
-Similarly, when writing bytes to the `DataBuffer`, the write position is updated with the amount of
-data written to the buffer.
-Also, when writing data, the capacity of a `DataBuffer` is automatically expanded, in the same fashion as `StringBuilder`,
-`ArrayList`, and similar types.
-
-Besides the reading and writing functionality mentioned above, the `DataBuffer` also has methods to
-view a (slice of a) buffer as a `ByteBuffer`, an `InputStream`, or an `OutputStream`.
-Additionally, it offers methods to determine the index of a given byte.
-
-As mentioned earlier, there are two implementation of `DataBufferFactory`: the `NettyDataBufferFactory`
-(for Netty platforms, such as Reactor Netty) and
-`DefaultDataBufferFactory` (for other platforms, such as
-Servlet 3.1+ servers).
-
-
-
-=== `PooledDataBuffer`
-
-The `PooledDataBuffer` is an extension to `DataBuffer` that adds methods for reference counting.
-The `retain` method increases the reference count by one.
-The `release` method decreases the count by one and releases the buffer's memory when the count
-reaches 0.
-Both of these methods are related to reference counting, a mechanism that we explain <>.
-
-Note that `DataBufferUtils` offers useful utility methods for releasing and retaining pooled data
-buffers.
-These methods take a plain `DataBuffer` as a parameter but only call `retain` or `release` if the
-passed data buffer is an instance of `PooledDataBuffer`.
-
-
-[[databuffer-reference-counting]]
-==== Reference Counting
-
-Reference counting is not a common technique in Java. It is much more common in other programming
-languages, such as Object C and C++.
-In and of itself, reference counting is not complex. It basically involves tracking the number of
-references that apply to an object.
-The reference count of a `PooledDataBuffer` starts at 1, is incremented by calling `retain`,
-and is decremented by calling `release`.
-As long as the buffer's reference count is larger than 0, the buffer is not released.
-When the number decreases to 0, the instance is released.
-In practice, this means that the reserved memory captured by the buffer is returned back to
-the memory pool, ready to be used for future allocations.
-
-In general, the last component to access a `DataBuffer` is responsible for releasing it.
-Within Spring, there are two sorts of components that release buffers: decoders and transports.
-Decoders are responsible for transforming a stream of buffers into other types (see <>),
-and transports are responsible for sending buffers across a network boundary, typically as an HTTP message.
-This means that, if you allocate data buffers for the purpose of putting them into an outbound HTTP
-message (that is, a client-side request or server-side response), they do not have to be released.
-The other consequence of this rule is that if you allocate data buffers that do not end up in the
-body (for instance, because of a thrown exception), you have to release them yourself.
-The following snippet shows a typical `DataBuffer` usage scenario when dealing with methods that
-throw exceptions:
+* Read and write with independent positions, i.e. not requiring a call to `flip()` to
+alternate between read and write.
+* Capacity expanded on demand as with `java.lang.StringBuilder`.
+* Pooled buffers and reference counting via <<databuffers-buffer-pooled>>.
+* View a buffer as `java.nio.ByteBuffer`, `InputStream`, or `OutputStream`.
+* Determine the index, or the last index, for a given byte.
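+
+For example, the independent read and write positions can be used as follows. This is a
+minimal sketch assuming a `factory` created as shown in <<databuffers-factory>>:
+
+====
+[source,java,indent=0]
+[subs="verbatim,quotes"]
+----
+	DataBuffer buffer = factory.allocateBuffer();
+
+	// Writing advances the write position only
+	buffer.write("Hello World".getBytes(StandardCharsets.UTF_8));
+
+	// Reading advances the read position independently; no flip() is required
+	byte[] bytes = new byte[buffer.readableByteCount()];
+	buffer.read(bytes);
+----
+====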
-====
-[source,java,indent=0]
-[subs="verbatim,quotes"]
-----
-	DataBufferFactory factory = ...
-	DataBuffer buffer = factory.allocateBuffer(); <1>
-	boolean release = true; <2>
-	try {
-		writeDataToBuffer(buffer); <3>
-		putBufferInHttpBody(buffer);
-		release = false; <4>
-	}
-	finally {
-		if (release) {
-			DataBufferUtils.release(buffer); <5>
-		}
-	}
-
-	private void writeDataToBuffer(DataBuffer buffer) throws IOException { <3>
-		...
-	}
-----
-<1> A new buffer is allocated.
-<2> A boolean flag indicates whether the allocated buffer should be released.
-<3> This example method loads data into the buffer. Note that the method can throw an `IOException`.
-Therefore, a `finally` block to release the buffer is required.
-<4> If no exception occurred, we switch the `release` flag to `false` as the buffer is now
-released as part of sending the HTTP body across the wire.
-<5> If an exception did occur, the flag is still set to `true`, and the buffer is released
-here.
-====
+[[databuffers-buffer-pooled]]
+== `PooledDataBuffer`
+
+As explained in the Javadoc for
+https://docs.oracle.com/javase/8/docs/api/java/nio/ByteBuffer.html[ByteBuffer],
+byte buffers can be direct or non-direct. Direct buffers may reside outside the Java heap,
+which eliminates the need for copying for native I/O operations. That makes direct buffers
+particularly useful for receiving and sending data over a socket, but they're also more
+expensive to create and release, which leads to the idea of pooling buffers.
+
+`PooledDataBuffer` is an extension of `DataBuffer` that helps with reference counting,
+which is essential for byte buffer pooling. How does it work? When a `PooledDataBuffer` is
+allocated, the reference count is 1. Calls to `retain()` increment the count, while
+calls to `release()` decrement it. As long as the count is above 0, the buffer is
+guaranteed not to be released. When the count is decreased to 0, the pooled buffer can be
+released, which in practice could mean the reserved memory for the buffer is returned to
+the memory pool.
+
+Note that instead of operating on `PooledDataBuffer` directly, in most cases it's better
+to use the convenience methods in `DataBufferUtils` that apply release or retain to a
+`DataBuffer` only if it is an instance of `PooledDataBuffer`.
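+
+For example, reference counting could look as follows. This is a short illustration,
+assuming a buffer obtained elsewhere; the `DataBufferUtils` methods shown are no-ops if
+the buffer is not pooled:
+
+====
+[source,java,indent=0]
+[subs="verbatim,quotes"]
+----
+	DataBuffer buffer = ...; // reference count is 1 if this is a PooledDataBuffer
+
+	// Increment the count before handing the buffer to another component
+	DataBufferUtils.retain(buffer);
+
+	// Each consumer releases the buffer when done; the memory is returned to the
+	// pool once the count reaches 0
+	DataBufferUtils.release(buffer);
+----
+====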
-=== `DataBufferUtils`
-The `DataBufferUtils` class contains various utility methods that operate on data buffers.
-It contains methods for reading a `Flux` of `DataBuffer` objects from an `InputStream` or NIO
-`Channel` and methods for writing a data buffer `Flux` to an `OutputStream` or `Channel`.
-`DataBufferUtils` also exposes `retain` and `release` methods that operate on plain `DataBuffer`
-instances (so that casting to a `PooledDataBuffer` is not required).
-Additionally, `DataBufferUtils` exposes `compose`, which merges a stream of data buffers into one.
-For instance, this method can be used to convert the entire HTTP body into a single buffer (and
-from that, a `String` or `InputStream`).
-This is particularly useful when dealing with older, blocking APIs.
-Note, however, that this puts the entire body in memory, and therefore uses more memory than a pure
-streaming solution would.
+
+[[databuffers-utils]]
+== `DataBufferUtils`
+
+`DataBufferUtils` offers a number of utility methods to operate on data buffers:
+
+* Join a stream of data buffers into a single buffer, possibly with zero copy (e.g. via
+composite buffers), if that's supported by the underlying byte buffer API.
+* Turn an `InputStream` or NIO `Channel` into `Flux<DataBuffer>`, and vice versa a
+`Publisher<DataBuffer>` into an `OutputStream` or NIO `Channel`.
+* Release or retain a `DataBuffer` if the buffer is an instance of `PooledDataBuffer`.
+* Skip or take from a stream of bytes until a specific byte count.
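+
+For example, a stream of data buffers, such as a request body, could be aggregated into a
+single buffer and read as follows. This is a minimal sketch assuming a `Flux<DataBuffer>`
+called `body` obtained elsewhere; note that joining holds the full content in memory:
+
+====
+[source,java,indent=0]
+[subs="verbatim,quotes"]
+----
+	Mono<String> content = DataBufferUtils.join(body)
+			.map(buffer -> {
+				// Copy the bytes out and release the joined buffer
+				byte[] bytes = new byte[buffer.readableByteCount()];
+				buffer.read(bytes);
+				DataBufferUtils.release(buffer);
+				return new String(bytes, StandardCharsets.UTF_8);
+			});
+----
+====
+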
@@ -158,19 +97,73 @@ streaming solution would.
 [[codecs]]
 == Codecs

-The `org.springframework.core.codec` package contains the two main abstractions for converting a
-stream of bytes into a stream of objects or vice-versa.
-The `Encoder` is a strategy interface that encodes a stream of objects into an output stream of
-data buffers.
-The `Decoder` does the reverse: It turns a stream of data buffers into a stream of objects.
-Note that a decoder instance needs to consider <>.
-
-Spring comes with a wide array of default codecs (to convert from and to `String`,
-`ByteBuffer`, and byte arrays) and codecs that support marshalling libraries such as JAXB and
-Jackson (with https://github.com/FasterXML/jackson-core/issues/57[Jackson 2.9+ support for non-blocking parsing]).
-Within the context of Spring WebFlux, codecs are used to convert the request body into a
-`@RequestMapping` parameter or to convert the return type into the response body that is sent back
-to the client.
-The default codecs are configured in the `WebFluxConfigurationSupport` class. You can
-change them by overriding the `configureHttpMessageCodecs` when you inherit from that class.
-For more information about using codecs in WebFlux, see <>.
+The `org.springframework.core.codec` package provides the following strategy interfaces:
+
+* `Encoder` to encode `Publisher<T>` into a stream of data buffers.
+* `Decoder` to decode `Publisher<DataBuffer>` into a stream of higher level objects.
+
+The `spring-core` module provides `byte[]`, `ByteBuffer`, `DataBuffer`, `Resource`, and
+`String` encoder and decoder implementations. The `spring-web` module adds Jackson JSON,
+Jackson Smile, JAXB2, Protocol Buffers and other encoders and decoders. See
+<> in the WebFlux section.
+
+
+
+
+[[databuffers-using]]
+== Using `DataBuffer`
+
+When working with data buffers, special care must be taken to ensure buffers are released,
+since they may be <<databuffers-buffer-pooled,pooled>>. We'll use codecs to illustrate
+how that works, but the concepts apply more generally. Let's see what codecs must do
+internally to manage data buffers.
+
+A `Decoder` is the last to read input data buffers, before creating higher level
+objects, and therefore it must release them as follows:
+
+. If a `Decoder` simply reads each input buffer and is ready to
+release it immediately, it can do so via `DataBufferUtils.release(dataBuffer)`.
+. If a `Decoder` is using `Flux` or `Mono` operators such as `flatMap`, `reduce`, and
+others that prefetch and cache data items internally, or is using operators such as
+`filter`, `skip`, and others that leave out items, then
+`doOnDiscard(PooledDataBuffer.class, DataBufferUtils::release)` must be added to the
+composition chain to ensure such buffers are released prior to being discarded, possibly
+also as a result of an error or cancellation signal, as shown in the sketch after this
+list.
+. If a `Decoder` holds on to one or more data buffers in any other way, it must
+ensure they are released when fully read, or in case of an error or cancellation signal
+that takes place before the cached data buffers have been read and released.
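+
+The following sketch illustrates the second case with a hypothetical decode step, assuming
+`inputBuffers` is a `Flux<DataBuffer>` obtained elsewhere. The `filter` step may drop
+buffers, and the `doOnDiscard` hook ensures those are still released:
+
+====
+[source,java,indent=0]
+[subs="verbatim,quotes"]
+----
+	Flux<String> values = inputBuffers
+			// Buffers rejected here never reach map(), so they would leak without doOnDiscard
+			.filter(buffer -> buffer.readableByteCount() > 0)
+			.map(buffer -> {
+				byte[] bytes = new byte[buffer.readableByteCount()];
+				buffer.read(bytes);
+				DataBufferUtils.release(buffer);
+				return new String(bytes, StandardCharsets.UTF_8);
+			})
+			.doOnDiscard(PooledDataBuffer.class, DataBufferUtils::release);
+----
+====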
+
+Note that `DataBufferUtils#join` offers a safe and efficient way to aggregate a data
+buffer stream into a single data buffer. Likewise `skipUntilByteCount` and
+`takeUntilByteCount` are additional safe methods for decoders to use.
+
+An `Encoder` allocates data buffers that others must read (and release), so an `Encoder`
+itself does not have much to do. However, an `Encoder` must take care to release a data
+buffer if a serialization error occurs while populating the buffer with data. For example:
+
+====
+[source,java,indent=0]
+[subs="verbatim,quotes"]
+----
+	DataBuffer buffer = factory.allocateBuffer();
+	boolean release = true;
+	try {
+		// serialize and populate buffer..
+		release = false;
+	}
+	finally {
+		if (release) {
+			// the buffer was never handed off, so release it here
+			DataBufferUtils.release(buffer);
+		}
+	}
+	return buffer;
+----
+====
+
+The consumer of an `Encoder` is responsible for releasing the data buffers it receives.
+In a WebFlux application, the output of the `Encoder` is used to write to the HTTP server
+response, or to the client HTTP request, in which case releasing the data buffers is the
+responsibility of the code writing to the server response, or to the client request.
+
+Note that when running on Netty, there are debugging options for
+https://github.com/netty/netty/wiki/Reference-counted-objects#troubleshooting-buffer-leaks[troubleshooting buffer leaks].
diff --git a/src/docs/asciidoc/web/webflux-websocket.adoc b/src/docs/asciidoc/web/webflux-websocket.adoc
index 53c28c601a1..e2a9d9bf202 100644
--- a/src/docs/asciidoc/web/webflux-websocket.adoc
+++ b/src/docs/asciidoc/web/webflux-websocket.adoc
@@ -204,6 +204,22 @@ class ExampleHandler implements WebSocketHandler {



+[[webflux-websocket-databuffer]]
+=== `DataBuffer`
+
+`DataBuffer` is the representation for a byte buffer in WebFlux. The Spring Core part of
+the reference has more on that in the section on
+<>. The key point to understand is that on some
+servers like Netty, byte buffers are pooled and reference counted, and must be released
+when consumed to avoid memory leaks.
+
+When running on Netty, applications must use `DataBufferUtils.retain(dataBuffer)` if they
+wish to hold on to input data buffers in order to ensure they are not released, and
+subsequently use `DataBufferUtils.release(dataBuffer)` when the buffers are consumed.
+
+
+
+
 [[webflux-websocket-server-handshake]]
 === Handshake
 [.small]#<>#
diff --git a/src/docs/asciidoc/web/webflux.adoc b/src/docs/asciidoc/web/webflux.adoc
index 35c08b67e1b..92ec1fe7c95 100644
--- a/src/docs/asciidoc/web/webflux.adoc
+++ b/src/docs/asciidoc/web/webflux.adoc
@@ -671,7 +671,7 @@ to encode and decode HTTP message content.
 application, while a `Decoder` can be wrapped with `DecoderHttpMessageReader`.

 * {api-spring-framework}/core/io/buffer/DataBuffer.html[`DataBuffer`] abstracts different
 byte buffer representations (e.g. Netty `ByteBuf`, `java.nio.ByteBuffer`, etc.) and is
-what all codecs work on. See <> in the
+what all codecs work on. See <> in the
 "Spring Core" section for more on this topic.

 The `spring-core` module provides `byte[]`, `ByteBuffer`, `DataBuffer`, `Resource`, and
@@ -741,7 +741,7 @@ consistently for access to the cached form data versus reading from the raw requ


 [[webflux-codecs-multipart]]
-==== Multipart Data
+==== Multipart

 `MultipartHttpMessageReader` and `MultipartHttpMessageWriter` support decoding and
 encoding "multipart/form-data" content. In turn `MultipartHttpMessageReader` delegates to
@@ -772,6 +772,24 @@ comment-only, empty SSE event or any other "no-op" data that would effectively s
 a heartbeat.


+[[webflux-codecs-buffers]]
+==== `DataBuffer`
+
+`DataBuffer` is the representation for a byte buffer in WebFlux. The Spring Core part of
+the reference has more on that in the section on
+<>. The key point to understand is that on some
+servers like Netty, byte buffers are pooled and reference counted, and must be released
+when consumed to avoid memory leaks.
+
+WebFlux applications generally do not need to be concerned with such issues, unless they
+consume or produce data buffers directly, as opposed to relying on codecs to convert to
+and from higher level objects, or unless they choose to create custom codecs. For such
+cases please review the information in <>,
+especially the section on <>.
+
+
+
+
 [[webflux-logging]]
 === Logging