Package org.forgerock.bloomfilter

ForgeRock Bloom Filters

Implementations of thread-safe, scalable and rolling Bloom Filters. These are Set-like data structures that can scale to very large numbers of entries while only using a small amount of memory (a few bits) per element. The trade-off is that the set membership operation may report false positives (i.e., it may claim that an item is a member of the set when it isn't). The probability of false positives can be tuned by increasing the amount of memory used.
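
As a rule of thumb from standard Bloom Filter theory (not specific to this library), the memory needed per element depends only on the target false positive probability p:

     bits per element = -ln(p) / (ln 2)^2 ≈ 1.44 * log2(1/p)

For p = 0.01 this works out to roughly 9.6 bits (about 1.2 bytes) per element, using around log2(1/p) ≈ 7 hash functions, regardless of the size of the elements stored.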

The BloomFilter interface describes the general contract of bloom filters in more detail, and the BloomFilters utility class provides static factory and builder methods for constructing bloom filters for various requirements.

Example


     // UTF8 is a java.nio.charset.Charset, e.g. StandardCharsets.UTF_8
     BloomFilter<CharSequence> blacklistedSessions = BloomFilters.create(Funnels.stringFunnel(UTF8))
              .withInitialCapacity(10000)         // Initial expected number of elements
              .withCapacityGrowthFactor(2.0)      // Double the capacity when full
              .withFalsePositiveProbability(0.01) // 1% probability of false positives
              .withWriteBatchSize(1000)           // Batch writes, 1000 elements at a time
              .build();

     blacklistedSessions.add("Some session token");

     if (blacklistedSessions.mightContain("Some other session")) {
         // The filter may report false positives: take steps to confirm
         // whether the token is actually blacklisted.
     }

Scalable and Rolling Bloom Filters

Beyond fixed-capacity Bloom Filters, whose probability of false positives rapidly increases once they reach capacity, this package also provides scalable and rolling Bloom Filters. The former are an efficient and flexible implementation of the classic Scalable Bloom Filters paper by Almeida et al., Information Processing Letters, 101(6), pp. 255–261, 2007. The latter are a time-limited variation on this idea, whereby buckets in the scalable bloom filter can expire over time, freeing up memory. Expired buckets are then recycled, keeping memory usage bounded.

Scalable Bloom Filters are useful for storing sets of objects where you do not know a priori the number of elements you might need to store. By dynamically expanding the capacity of the Bloom Filter, and by reducing the false positive probability of each subsequent bucket according to a geometric series, the Scalable Bloom Filter can expand to accommodate orders of magnitude more elements than originally estimated, while preserving the overall configured false positive probability. Use the BloomFilters.BloomFilterBuilder.withFalsePositiveProbabilityScaleFactor(double) and BloomFilters.BloomFilterBuilder.withCapacityGrowthFactor(double) builder methods to configure the scale factors for false positive probability and capacity in these implementations. The defaults (0.8 and 2.0 respectively) provide a good trade-off between memory growth and performance.
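
To see why the overall probability is preserved, recall the argument from Almeida et al.: if bucket i is created with false positive probability p0 * r^i, where r is the false positive probability scale factor, the compounded probability across all buckets is bounded by the geometric series

     p0 + p0*r + p0*r^2 + ... = p0 / (1 - r)

With the default r = 0.8 this sum is 5 * p0, so an implementation can stay within an overall budget P by sizing the first bucket at p0 = P * (1 - r) = 0.2 * P, however many buckets are later added.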

Rolling Bloom Filters allow elements in a Bloom Filter to expire over time. Use the BloomFilters.BloomFilterBuilder.withExpiryStrategy(org.forgerock.bloomfilter.ExpiryStrategy) method to configure how elements in your Bloom Filter will expire. By default, elements do not expire.
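
For illustration, a sketch of configuring expiry. The lambda below assumes ExpiryStrategy is a single-method interface returning an element's expiry time in milliseconds; that signature is an assumption made for this sketch, not the documented contract, so consult the ExpiryStrategy Javadoc before relying on it.

     // ASSUMPTION: ExpiryStrategy<T> is a functional interface of the form
     // long expiryTime(T element), returning milliseconds since the epoch.
     BloomFilter<CharSequence> recentlySeenTokens = BloomFilters.create(Funnels.stringFunnel(UTF8))
              .withInitialCapacity(10000)
              .withFalsePositiveProbability(0.01)
              .withExpiryStrategy(token -> System.currentTimeMillis() + 300_000) // lapse ~5 minutes after insertion
              .build();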

Concurrency Strategies

The implementations provided are all thread-safe and adopt a flexible approach to concurrency control. Two concurrency strategies are currently supported:
  • SYNCHRONIZED - uses synchronized blocks to ensure mutual exclusion of critical sections. For fixed-capacity bloom filters all methods are mutually exclusive. For scalable and rolling bloom filters, individual buckets are independently synchronized, allowing some additional concurrency. This implementation provides moderate read and write performance.
  • COPY_ON_WRITE - provides very fast read access at the cost of extremely expensive write operations, which must make a copy of (part of) the Bloom Filter to apply any changes. Write batching (see below) can be used to amortize write costs over many operations, however this implementation will still create additional temporary garbage and pressure on the garbage collector. Suitable for situations in which read performance (mightContain) is paramount and writes are relatively rare (and can tolerate increased latency).
Use the BloomFilters.BloomFilterBuilder.withConcurrencyStrategy(org.forgerock.bloomfilter.ConcurrencyStrategy) method to specify the concurrency strategy to use. The default is COPY_ON_WRITE.
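
For instance, a filter that is written to frequently from many threads might trade some read speed for cheaper writes. This sketch assumes ConcurrencyStrategy is an enum exposing the two values listed above:

     // Prefer per-bucket mutual exclusion over copy-on-write when writes are frequent.
     BloomFilter<CharSequence> frequentlyUpdated = BloomFilters.create(Funnels.stringFunnel(UTF8))
              .withInitialCapacity(10000)
              .withFalsePositiveProbability(0.01)
              .withConcurrencyStrategy(ConcurrencyStrategy.SYNCHRONIZED)
              .build();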

Write Batching

To compensate for the relatively poor write performance of COPY_ON_WRITE concurrency (see the previous section), the implementation supports write batching. When enabled, individual calls to the BloomFilter.add(java.lang.Object) method are buffered in a traditional concurrent collection until the write batch size is reached, at which point the buffer is flushed to the underlying bloom filter implementation in a single operation. This amortizes the cost of copying the underlying data structure over many writes, at the cost of increased worst-case latency (when the buffer is flushed) and increased memory churn. The implementation is highly optimised, supporting very high throughput of both writes and reads, and so is a good choice when throughput is paramount and occasional high write latencies can be tolerated. Use BloomFilters.BloomFilterBuilder.withWriteBatchSize(int) to enable write batching.
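
The pattern at work is roughly the following. This is an illustrative sketch of buffer-and-flush write batching in general, not the library's actual code:

     import java.util.ArrayList;
     import java.util.Collection;
     import java.util.List;
     import java.util.Queue;
     import java.util.concurrent.ConcurrentLinkedQueue;
     import java.util.concurrent.atomic.AtomicInteger;
     import java.util.function.Consumer;

     // Illustrative only: the general shape of write batching in front of an
     // expensive copy-on-write structure.
     final class BatchingWriter<T> {
         private final Queue<T> buffer = new ConcurrentLinkedQueue<>();
         private final AtomicInteger pending = new AtomicInteger();
         private final int batchSize;
         private final Consumer<Collection<T>> flushTarget; // e.g. a bulk add on the copy-on-write filter

         BatchingWriter(int batchSize, Consumer<Collection<T>> flushTarget) {
             this.batchSize = batchSize;
             this.flushTarget = flushTarget;
         }

         void add(T element) {
             buffer.add(element);
             if (pending.incrementAndGet() >= batchSize) {
                 flush(); // one expensive copy covers up to batchSize writes
             }
         }

         private synchronized void flush() {
             // Drain whatever has accumulated and apply it in a single operation.
             List<T> batch = new ArrayList<>();
             T element;
             while ((element = buffer.poll()) != null) {
                 batch.add(element);
             }
             pending.addAndGet(-batch.size());
             if (!batch.isEmpty()) {
                 flushTarget.accept(batch);
             }
         }
     }

Most add calls only touch the cheap concurrent queue; the expensive copy happens once per batch, which is why worst-case latency rises while average throughput improves.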