Migration From V4

One significant improvement of Google Safe Browsing v5 over v4 (specifically, the v4 Update API) is data freshness and coverage. Because protection depends heavily on the client-maintained local database, the delay and size of local database updates are the main contributors to missed protection. In v4, the typical client takes 20 to 50 minutes to obtain the most up-to-date version of the threat lists. Unfortunately, phishing attacks spread fast: as of 2021, 60% of sites that deliver attacks live less than 10 minutes. Our analysis shows that around 25-30% of missing phishing protection is due to such data staleness. Further, some devices are not equipped to manage the entirety of the Google Safe Browsing threat lists, which continue to grow over time.

If you are currently using the v4 Update API, there is a seamless migration path from v4 to v5 without having to reset or erase the local database. This section documents how to do that.

Converting List Updates

Unlike v4, where lists are identified by the tuple of threat type, platform type, and threat entry type, v5 lists are identified simply by name. This provides flexibility when multiple v5 lists share the same threat type. Platform types and threat entry types are removed in v5.

In v4, one would use the threatListUpdates.fetch method to download lists. In v5, one would switch to the hashLists.batchGet method.

The following changes should be made to the request:

  1. Remove the v4 ClientInfo object altogether. Instead of supplying a client's identification using a dedicated field, simply use the well-known User-Agent header. While there is no prescribed format for supplying the client identification in this header, we suggest simply including the original client ID and client version separated by a space character or a slash character.
  2. For each v4 ListUpdateRequest object:
    • Look up the corresponding v5 list name from the available lists and supply that name in the v5 request.
    • Remove unneeded fields such as threat_entry_type or platform_type.
    • The state field in v4 is directly compatible with the v5 versions field. The same byte string that would be sent to the server using the state field in v4 can simply be sent in v5 using the versions field.
    • For the v4 constraints, v5 uses a simplified version called SizeConstraints. Additional fields such as region should be dropped.
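
The request changes above can be sketched in Python as a pure data transformation. This is a minimal illustration under stated assumptions, not the official client: the dictionary keys (threatType, platformType, threatEntryType, state, constraints, maxUpdateEntries, maxDatabaseEntries), the output keys, the V4_TO_V5_LIST_NAME table with its list names, and both helper functions are hypothetical. Take real list names from the available lists and real field names from the API reference.

```python
from typing import Any, Dict, Tuple

# Hypothetical lookup table: v4 (threat type, platform type, threat entry type)
# tuple -> v5 list name. The names are placeholders for illustration only.
V4_TO_V5_LIST_NAME: Dict[Tuple[str, str, str], str] = {
    ("SOCIAL_ENGINEERING", "ANY_PLATFORM", "URL"): "se-4b",
    ("MALWARE", "ANY_PLATFORM", "URL"): "mw-4b",
}

# Assumed v4 constraint keys that carry over to the simplified size constraints.
# Anything else (for example, region) is dropped.
SIZE_CONSTRAINT_KEYS = {
    "maxUpdateEntries": "max_update_entries",
    "maxDatabaseEntries": "max_database_entries",
}


def build_user_agent(client_id: str, client_version: str) -> str:
    """Replace the v4 ClientInfo object with a User-Agent header value."""
    return f"{client_id}/{client_version}"


def convert_list_update_request(v4_request: Dict[str, Any]) -> Dict[str, Any]:
    """Map one v4 ListUpdateRequest-like dict to per-list v5 request data."""
    key = (
        v4_request["threatType"],
        v4_request["platformType"],
        v4_request["threatEntryType"],
    )
    v5_request: Dict[str, Any] = {
        # The list is now identified by name only.
        "name": V4_TO_V5_LIST_NAME[key],
        # The v4 state bytes are passed through unchanged as the v5 version.
        "version": v4_request.get("state", b""),
    }
    constraints = v4_request.get("constraints", {})
    size_constraints = {
        new_key: constraints[old_key]
        for old_key, new_key in SIZE_CONSTRAINT_KEYS.items()
        if old_key in constraints
    }
    if size_constraints:
        v5_request["size_constraints"] = size_constraints
    return v5_request
```

Because the state bytes are forwarded untouched, a client that converts its requests this way can keep its existing local database rather than resetting it.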

The following changes should be made to the response:

  1. The v4 enum ResponseType is simply replaced by a boolean field named partial_update.
  2. The minimum_wait_duration field can now be zero or omitted. If so, the client should make another request immediately. This happens only when the client specifies, in SizeConstraints, a max update size that is smaller than the max database size.
  3. The logic for decoding Rice-Golomb encoded hashes requires two main adjustments:
    • Endianness and Sorting: In v4, the hashes returned were sorted as little-endian values. In v5, they are treated as big-endian values. Because lexicographical sorting of byte strings is equivalent to numerical sorting of big-endian values, clients no longer need to perform a special sorting step. The custom little-endian sort, like the one in the Chromium v4 implementation, can be removed if previously implemented.
    • Variable Hash Lengths: The decoding algorithm must be updated to support the various hash lengths that can be returned in the HashList.compressed_additions field, not just the four-byte hash length used in v4. The length of the hashes in the response can be determined from the HashList.metadata.hash_length value returned by hashLists.list. Alternatively, the name of the requested hash list also indicates the expected hash size. See the Local Database page for more details on the hash lists.
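
The two decoding adjustments above can be illustrated with a short Python sketch. It does not implement Rice-Golomb decoding; it assumes a decoder that already yields integers, and the function names are hypothetical. It shows only that a plain byte-string sort matches big-endian numeric order for equal-length hashes, while the v4-style little-endian order differs, and how a decoded integer might be widened to the list's hash length.

```python
from typing import Iterable, List


def sort_little_endian(hashes: Iterable[bytes]) -> List[bytes]:
    """v4-style ordering: compare hashes as little-endian integers."""
    return sorted(hashes, key=lambda h: int.from_bytes(h, "little"))


def sort_big_endian(hashes: Iterable[bytes]) -> List[bytes]:
    """v5-style ordering: for equal-length hashes, the default lexicographic
    sort of byte strings is the same as sorting big-endian integers."""
    return sorted(hashes)


def ints_to_hashes(values: Iterable[int], hash_length: int) -> List[bytes]:
    """Turn decoded integers into big-endian byte strings of hash_length bytes.

    hash_length stands in for the value the client reads from the list
    metadata (or infers from the list name); it is not hard-coded to 4.
    """
    return [value.to_bytes(hash_length, "big") for value in values]


prefixes = [b"\x00\x00\x00\x02", b"\x01\x00\x00\x00", b"\x00\x00\x01\x00"]
print(sort_big_endian(prefixes))
print(sort_little_endian(prefixes))
```

The same three prefixes come out in different orders under the two conventions, which is why a custom little-endian sort carried over from a v4 client would produce a wrongly ordered database in v5.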

Converting Hash Searches

In v4, one would use the fullHashes.find method to get full hashes. The equivalent method in v5 is the hashes.search method.

The following changes should be made to the request:

  1. Structure the code to only send hash prefixes that are exactly 4 bytes in length.
  2. Remove the v4 ClientInfo objects altogether. Instead of supplying a client's identification using a dedicated field, simply use the well-known User-Agent header. While there is no prescribed format for supplying the client identification in this header, we suggest simply including the original client ID and client version separated by a space character or a slash character.
  3. Remove the client_states field. It is no longer necessary.
  4. Remove threat_types and similar fields. They are no longer needed.
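
As a rough illustration of the first request change, the sketch below truncates full hashes to exactly 4 bytes before they are sent. It assumes that full hashes are 32-byte SHA-256 digests of a URL expression and that a prefix is the leading bytes of that digest; the function names are hypothetical, and the input string is a stand-in rather than a properly canonicalized URL expression.

```python
import hashlib
from typing import Iterable, List

PREFIX_LENGTH = 4  # v5 hash searches take prefixes of exactly 4 bytes


def full_hash(expression: str) -> bytes:
    """Assumed full hash: the SHA-256 digest of the expression's UTF-8 bytes."""
    return hashlib.sha256(expression.encode("utf-8")).digest()


def to_search_prefixes(full_hashes: Iterable[bytes]) -> List[bytes]:
    """Truncate each full hash to 4 bytes, dropping duplicate prefixes."""
    prefixes: List[bytes] = []
    for digest in full_hashes:
        if len(digest) < PREFIX_LENGTH:
            raise ValueError("hash is shorter than the required prefix length")
        prefix = digest[:PREFIX_LENGTH]
        if prefix not in prefixes:
            prefixes.append(prefix)
    return prefixes


# "abc" is only a placeholder input with a well-known SHA-256 digest.
print(to_search_prefixes([full_hash("abc")])[0].hex())  # ba7816bf
```

A v4 client that previously sent longer prefixes would route them through a step like this, so that every prefix in the request has the same 4-byte length.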

The following changes should be made to the response:

  1. The minimum_wait_duration field has been removed. The client can always issue a new request on an as-needed basis.
  2. The v4 ThreatMatch object has been simplified into the FullHash object.
  3. Caching has been simplified into a single cache duration. See the above procedures for interacting with the cache.
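
To make the single cache duration concrete, here is a minimal Python sketch of a prefix-keyed cache in which every entry carries one expiry time. It is a generic illustration, not a reproduction of the cache procedures referenced above: the FullHashCache class, its method names, and the use of seconds as the unit are all assumptions.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class FullHashCache:
    """Hypothetical cache: hash prefix -> (full hashes, expiry time)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable so that expiry can be tested deterministically.
        self._clock = clock
        self._entries: Dict[bytes, Tuple[List[bytes], float]] = {}

    def store(self, prefix: bytes, full_hashes: List[bytes],
              cache_duration_seconds: float) -> None:
        """Record a response for a prefix under one cache duration."""
        expires_at = self._clock() + cache_duration_seconds
        self._entries[prefix] = (list(full_hashes), expires_at)

    def lookup(self, prefix: bytes) -> Optional[List[bytes]]:
        """Return cached full hashes, or None if absent or expired."""
        entry = self._entries.get(prefix)
        if entry is None:
            return None
        full_hashes, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are evicted; the caller issues a new search.
            del self._entries[prefix]
            return None
        return full_hashes
```

With a single duration per response, one timestamp per prefix is enough to decide whether a new hashes.search request is needed.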