Stanford Seminar - Persistent and Unforgeable Watermarks for Deep Neural Networks

Published: 06 October 2024
Channel: Stanford Online

Huiying Li
University of Chicago

Emily Wenger
University of Chicago

October 30, 2019
As deep learning classifiers continue to mature, model providers with sufficient data and computation resources are exploring approaches to monetize the development of increasingly powerful models. Licensing models is a promising approach, but it requires a robust tool for owners to claim ownership of models, i.e., a watermark. Unfortunately, current watermarks are all vulnerable to piracy attacks, where attackers embed forged watermarks into a model to dispute ownership. We believe properties of persistence and piracy resistance are critical to watermarks, but are fundamentally at odds with the current way models are trained and tuned.
In this work, we propose two new training techniques (out-of-bound values and null-embedding) that provide persistence and limit the training of certain inputs into trained models. We then introduce wonder filters, a new primitive that embeds a persistent bit-sequence into a model, but only at initial training time. Wonder filters enable model owners to embed a bit-sequence generated from their private keys into a model at training time. Attackers cannot remove wonder filters via tuning, and cannot add their own filters to pretrained models. We provide analytical proofs of key properties, and experimentally validate them over a variety of tasks and models. Finally, we explore a number of adaptive counter-measures, and show our watermark remains robust.
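
The abstract says the embedded bit-sequence is generated from the owner's private key, but it does not say how. As a rough illustration only, the sketch below assumes the owner holds some secret byte string (for example, a signature produced with the private key) and expands it into bits with SHA-256 in counter mode, then lays the bits out as a small square pattern of +1/-1 values. The names `key_to_bits` and `bits_to_pattern`, the hashing scheme, and the +1/-1 encoding are all hypothetical choices for this example, not the construction presented in the talk.

```python
import hashlib


def key_to_bits(secret: bytes, n_bits: int) -> list[int]:
    """Expand a secret into a deterministic bit sequence.

    Assumption: SHA-256 over (secret || counter) is used as a simple
    expander; the talk does not specify the actual derivation.
    """
    bits: list[int] = []
    counter = 0
    while len(bits) < n_bits:
        digest = hashlib.sha256(secret + counter.to_bytes(4, "big")).digest()
        for byte in digest:
            # Emit the bits of each byte, most significant bit first.
            for shift in range(7, -1, -1):
                bits.append((byte >> shift) & 1)
        counter += 1
    return bits[:n_bits]


def bits_to_pattern(bits: list[int], side: int) -> list[list[int]]:
    """Arrange side*side bits as a square grid, mapping 1 -> +1 and 0 -> -1.

    Assumption: a square +1/-1 grid stands in for the filter pattern here.
    """
    if len(bits) != side * side:
        raise ValueError("need exactly side*side bits")
    return [
        [1 if bits[r * side + c] else -1 for c in range(side)]
        for r in range(side)
    ]


if __name__ == "__main__":
    # Hypothetical secret; in practice this would come from the owner's key.
    pattern = bits_to_pattern(key_to_bits(b"owner-secret", 36), side=6)
    for row in pattern:
        print(row)
```

Because the derivation is deterministic, the same secret always reproduces the same pattern, which is the property a verifier would rely on when checking a claimed watermark against a claimed key.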

View the full playlist: Stanford EE380-Colloquium on Computer...

0:00 Introduction
0:39 DNNS ARE INCREASINGLY POPULAR
0:56 DEEP NEURAL NETWORK (DNN)
1:39 DNNS ARE HARD TO TRAIN
2:43 TWO WAYS TO BUY MODELS FROM COMPANIES
4:43 IP PROTECTION FOR MODEL OWNER
5:41 WATERMARKS ARE WIDELY USED FOR OWNERSHIP PROOF
6:53 THREAT MODEL
7:50 ATTACKS ON WATERMARKS
9:35 EMBED WATERMARK BY REGULARIZER
10:56 EMBED WATERMARK USING BACKDOOR
12:15 EMBED WATERMARK USING CRYPTOGRAPHIC COMMITMENTS
12:59 PROPERTIES
13:35 CHALLENGE
14:28 OUTLINE
14:49 TWO NEW TRAINING TECHNIQUES
15:25 WHAT ARE OUT-OF-BOUND VALUES?
15:59 WHY OUT-OF-BOUND VALUES?
17:35 WHAT IS NULL EMBEDDING?
18:14 WHY NULL EMBEDDING?
19:17 USING NULL EMBEDDING
20:27 WONDER FILTERS: HOW TO DESIGN THE PATTERN
20:54 WONDER FILTERS: HOW TO EMBED THE PATTERN
21:42 WATERMARK DESIGN
22:19 WATERMARK - GENERATION
22:50 WATERMARK - INJECTION
23:34 WATERMARK - VERIFICATION
24:56 REQUIREMENTS
26:35 EVALUATION TASKS AND METRICS
27:37 LOW DISTORTION AND RELIABILITY
28:18 NO FALSE POSITIVES
30:34 AUTHENTICATION
31:14 PIRACY RESISTANCE
32:50 PERSISTENCE
37:14 CONCLUSION