One significant improvement of Google Safe Browsing v5 over v4 (specifically, the v4 Update API) is data freshness and coverage. Because protection depends heavily on the client-maintained local database, the delay and size of local database updates are the main contributors to missed protection. In v4, the typical client takes 20 to 50 minutes to obtain the most up-to-date version of the threat lists. Unfortunately, phishing attacks spread fast: as of 2021, 60% of sites that deliver attacks live less than 10 minutes. Our analysis shows that around 25-30% of missing phishing protection is due to such data staleness. Further, some devices are not equipped to manage the entirety of the Google Safe Browsing threat lists, which continue to grow larger over time.
If you are currently using the v4 Update API, there is a seamless migration path from v4 to v5 without having to reset or erase the local database. This section documents how to do that.
Converting List Updates
Unlike v4, where lists are identified by the tuple of threat type, platform type, and threat entry type, v5 identifies lists simply by name. This provides flexibility when multiple v5 lists share the same threat type. Platform types and threat entry types are removed in v5.
In v4, one would use the threatListUpdates.fetch method to download lists. In v5, one would switch to the hashLists.batchGet method.
The following changes should be made to the request:
- Remove the v4 ClientInfo object altogether. Instead of supplying a client's identification using a dedicated field, simply use the well-known User-Agent header. While there is no prescribed format for supplying the client identification in this header, we suggest simply including the original client ID and client version separated by a space character or a slash character.
- For each v4 ListUpdateRequest object:
  * Look up the corresponding v5 list name from the available lists and supply that name in the v5 request.
  * Remove unneeded fields such as threat_entry_type or platform_type.
  * The state field in v4 is directly compatible with the v5 versions field. The same byte string that would be sent to the server using the state field in v4 can simply be sent in v5 using the versions field.
  * For the v4 constraints, v5 uses a simplified version called SizeConstraints. Additional fields such as region should be dropped.
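The request changes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the official client: the dictionary layout, the snake_case key names (including `max_update_entries` and `max_database_entries`), the helper names, and the list names in the mapping are all hypothetical placeholders. The real v5 list names should be looked up from the available lists.

```python
# Hypothetical mapping from v4 list tuples to v5 list names. The names on the
# right are placeholders; look up the real ones from the available lists.
V4_TO_V5_LIST_NAME = {
    ("SOCIAL_ENGINEERING", "ANY_PLATFORM", "URL"): "example-social-engineering",
    ("MALWARE", "ANY_PLATFORM", "URL"): "example-malware",
}


def user_agent(client_id: str, client_version: str) -> str:
    """Build a User-Agent value that stands in for the v4 ClientInfo object."""
    return f"{client_id}/{client_version}"


def convert_list_update_request(v4_request: dict) -> dict:
    """Translate one v4 ListUpdateRequest (as a dict) into v5-style fields."""
    key = (
        v4_request["threat_type"],
        v4_request["platform_type"],
        v4_request["threat_entry_type"],
    )
    v5_request = {
        "name": V4_TO_V5_LIST_NAME[key],
        # The v4 state bytes are sent unchanged as the v5 version.
        "version": v4_request.get("state", b""),
    }
    constraints = v4_request.get("constraints", {})
    # Keep only the size limits; fields such as region are dropped.
    size_constraints = {
        k: constraints[k]
        for k in ("max_update_entries", "max_database_entries")
        if k in constraints
    }
    if size_constraints:
        v5_request["size_constraints"] = size_constraints
    return v5_request
```

Because the state bytes carry over unchanged, a client converting its requests this way can keep its existing local database rather than resetting it.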
The following changes should be made to the response:
- The v4 enum ResponseType is simply replaced by a boolean field named partial_update.
- The minimum_wait_duration field can now be zero or omitted. If it is, the client is requested to immediately make another request. This only happens when the client specifies in SizeConstraints a smaller constraint on max update size than the max database size.
- The logic for decoding Rice-Golomb encoded hashes requires two main adjustments:
  * Endianness and Sorting: In v4, the hashes returned were sorted as little-endian values. In v5, they are treated as big-endian values. Because lexicographical sorting of byte strings is equivalent to numerical sorting of big-endian values, clients no longer need to perform a special sorting step. The custom little-endian sort, like the one in the Chromium v4 implementation, can be removed if previously implemented.
  * Variable Hash Lengths: The decoding algorithm must be updated to support various hash lengths that could be returned in the HashList.compressed_additions field, not just the four byte hash length used in v4. The length of the hashes returned in the response can be determined based on the HashList.metadata.hash_length returned from hashLists.list. Alternatively, the naming of the hash list requested also signifies the expected hash sizes returned from the list. See the Local Database page for more details on the hash lists.
Converting Hash Searches
In v4, one would use the fullHashes.find method to get full hashes. The equivalent method in v5 is the hashes.search method.
The following changes should be made to the request:
- Structure the code to only send hash prefixes that are exactly 4 bytes in length.
- Remove the v4 ClientInfo object altogether. Instead of supplying a client's identification using a dedicated field, simply use the well-known User-Agent header. While there is no prescribed format for supplying the client identification in this header, we suggest simply including the original client ID and client version separated by a space character or a slash character.
- Remove the client_states field. It is no longer necessary.
- It is no longer needed to include threat_types and similar fields.
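As a rough sketch of the request changes above, the following builds 4-byte prefixes from SHA-256 hashes and puts the client identification in a User-Agent header. URL canonicalization and expression generation are omitted, and the request dictionary layout, the `hash_prefixes` key, and the function names are assumptions made for illustration only.

```python
import hashlib

PREFIX_LENGTH = 4  # v5 hash searches take prefixes of exactly 4 bytes


def hash_prefix(expression: str) -> bytes:
    """Return the first 4 bytes of the SHA-256 hash of an expression."""
    full_hash = hashlib.sha256(expression.encode("utf-8")).digest()
    return full_hash[:PREFIX_LENGTH]


def build_search_request(expressions, client_id, client_version):
    """Assemble a v5-style search request without v4-only fields."""
    # Deduplicate prefixes; no client_states or threat_types are included.
    prefixes = sorted({hash_prefix(e) for e in expressions})
    return {
        "headers": {"User-Agent": f"{client_id}/{client_version}"},
        "hash_prefixes": prefixes,
    }


request = build_search_request(
    ["example.test/", "example.test/path"], "myclient", "1.2.3"
)
print(request["headers"]["User-Agent"])
print([p.hex() for p in request["hash_prefixes"]])
```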
The following changes should be made to the response:
- The minimum_wait_duration field has been removed. The client can always issue a new request on an as-needed basis.
- The v4 ThreatMatch object has been simplified into the FullHash object.
- Caching has been simplified into a single cache duration. See the above procedures for interacting with the cache.
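To illustrate what a single cache duration might look like on the client side, here is a minimal sketch. It is not the documented cache procedure: the class, its method names, and the choice to key entries by hash prefix are hypothetical, and the duration is assumed to arrive as a number of seconds.

```python
import time


class FullHashCache:
    """Cache full hashes per prefix, all expiring after one duration."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # prefix -> (expiry time, list of full hashes)

    def store(self, prefix, full_hashes, cache_duration_seconds):
        """Record the full hashes returned for a prefix with one expiry."""
        expiry = self._clock() + cache_duration_seconds
        self._entries[prefix] = (expiry, list(full_hashes))

    def lookup(self, prefix):
        """Return cached full hashes, or None if absent or expired."""
        entry = self._entries.get(prefix)
        if entry is None:
            return None
        expiry, full_hashes = entry
        if self._clock() >= expiry:
            del self._entries[prefix]
            return None
        return full_hashes
```

In this sketch, a `None` result means the client would need to issue a new search, while an empty list is a cached answer that no full hashes matched the prefix.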