S&P 2026 Cycle 2 Paper #983 Reviews and Comments
===========================================================================
Paper #983 Risky Gaze: Understanding and Mitigating Photosensitive Risks from User-Generated Content on Video Platforms


Review #983A
===========================================================================

Paper Summary
-------------
This paper presents RiskyGaze, a system for detecting videos that may pose photosensitive epilepsy risks by formally encoding existing accessibility guidelines (e.g., WCAG-style flash and color thresholds) and applying them to user-generated video content. The authors give a formal specification of seizure-triggering conditions, implement a detection tool based on this specification, and evaluate it by comparing against existing tools, testing platform uploads of guideline-violating videos, and analyzing how creative effects and video transformations can lead to threshold violations. The paper argues that current platforms inadequately detect or mitigate seizure-triggering content.

Technical Correctness
---------------------
3. Fixable Major Issues

Technical Correctness Comments
------------------------------
The analysis of how seizure-triggering effects could be added to videos and the extraction of formal rules for marking a video as unsafe appear sound. However, there seems to be an error in the policy specification for luminance and chromatic sensitivity. Specifically, the definitions of opposing changes for both luminance flashes and red flashes require that, across three consecutive frames, the signal simultaneously increases then decreases and decreases then increases. As written, these conditions are mutually exclusive and cannot be satisfied. The intended condition appears to be a logical OR, but the specification uses AND; a minimal sketch of the issue is given below.
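To make the concern concrete, here is a minimal Python sketch of the opposing-changes condition over three consecutive frame values. The function names and values are hypothetical and only illustrate the logical structure; they are not the paper's notation or thresholds.

    # Opposing-changes check over three consecutive frame values f0, f1, f2.
    # "Signal" stands for whichever quantity the policy tracks (relative
    # luminance or the red measure); names here are hypothetical.

    def opposing_change_as_specified(f0, f1, f2):
        # AND of "rise then fall" and "fall then rise": unsatisfiable,
        # because f1 > f0 and f1 < f0 cannot both hold.
        return (f1 > f0 and f2 < f1) and (f1 < f0 and f2 > f1)

    def opposing_change_intended(f0, f1, f2):
        # OR of the two patterns: dark-bright-dark or bright-dark-bright.
        return (f1 > f0 and f2 < f1) or (f1 < f0 and f2 > f1)

    # A clear bright-dark-bright transition:
    print(opposing_change_as_specified(0.9, 0.1, 0.9))  # False (False for every input)
    print(opposing_change_intended(0.9, 0.1, 0.9))      # True

Replacing the conjunction with a disjunction in the specification would appear to fix this.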
Scientific Contribution
-----------------------
8. Other

Scientific Contribution Comments
--------------------------------
The paper makes a careful and technically detailed attempt to formalize photosensitive epilepsy guidelines and to apply them to real-world video content from TikTok. From a safety and accessibility perspective, this is a useful contribution.

Presentation
------------
2. Minor Flaws in Presentation

Presentation Comments
---------------------
The paper is generally well organized and readable, but several strong claims are stated categorically despite a limited evaluation scope. In addition, the distinction between security, safety, and accessibility is not always clearly articulated, which contributes to ambiguity about the paper's goals and positioning.

The paper includes extensive detail on how videos are created, edited, and shared on modern platforms. While informative, much of this material describes well-known functionality and does not clearly support the paper's technical contributions.

Comments to Authors
-------------------
Thank you for the submission to S&P'26. The paper addresses an important problem related to photosensitive epilepsy risks on modern video platforms, and the effort to formally encode accessibility guidelines is thoughtful and technically interesting. That said, I have concerns about the paper's framing and assumptions.

First, I am confused about the paper's threat model. While seizure-triggering videos are framed as attacks, the evaluation largely focuses on guideline violations arising from creative effects or platform-provided transformations, which may be used by benign users. The paper does not distinguish between maliciously crafted content and benign triggering content. From a security perspective, this distinction is important, as accidental violations do not imply adversarial behavior or system compromise. As a result, the problem appears more closely related to accessibility compliance and safety engineering than to security.

Second, the paper relies heavily on the assumption that faithfully following established guidelines is sufficient to reduce seizure risk for most viewers. This assumption forms the basis for the entire evaluation. Hence, the system ultimately detects policy violations rather than demonstrating real-world harm reduction, and the paper does not discuss how false positives or false negatives translate to actual user safety.

Another issue is that the defense assumes platforms can inspect all delivered videos and operate in a predictable deployment environment in which the provider knows the supported devices and typical viewing conditions. That is a reasonable modeling start, but it should be clearly scoped as an assumption that limits the guarantees (especially given user-controlled brightness and playback speed, third-party clients, casting, etc.).

Much of the evaluation relies on synthetic or transformed videos (e.g., applying effects or changing speed, brightness, or orientation). While useful for stress-testing thresholds, it is unclear how representative these cases are of real-world harmful content or attacker behavior. The paper would benefit from a discussion of how these transformations map to realistic threats or deployment scenarios.

Lastly, the paper focuses mostly on platform-side automated detection and does not discuss alternative or complementary mitigation strategies, such as user-adjustable playback settings or personalized sensitivity controls, which could address heterogeneity in user risk more directly.

Recommended Decision
--------------------
3. Weak Reject (Can be Convinced by a Champion)

Reviewer Confidence
-------------------
2. Highly Confident

Should this submission be reviewed by the Research Ethics Committee?
--------------------------------------------------------------------
1. No


Review #983B
===========================================================================

Paper Summary
-------------
Unfortunately, social media video content is being exploited by malicious actors to target users with photosensitivity, triggering life-threatening seizures. While this risk is known, detection techniques remain ineffective. To help fill this gap, the authors identify vulnerabilities in 15 popular platforms that can be used to conduct such attacks. Building on these insights, they develop PRISM, a content-moderation system that identifies seizure-triggering videos.

Technical Correctness
---------------------
3. Fixable Major Issues

Technical Correctness Comments
------------------------------
1. Content delivery pipeline and threat model realism

I appreciated the breakdown of the content delivery pipeline in Section 4 and found the explanation clear and well-structured. However, it remains unclear how this pipeline maps to real-world attacker behavior. Is there empirical evidence indicating where malicious actors typically intervene in practice? For example, do attackers primarily upload malicious videos directly, or do they exploit vulnerabilities during the processing or playback stages? How frequently does malicious activity occur at each stage of the pipeline? While the pipeline analysis is helpful conceptually, grounding it in observed attacker behavior or existing data would strengthen the technical validity of the threat model.

2. Dataset construction and realism of seizure-triggering videos

A major concern is the lack of clarity regarding how the evaluation dataset was constructed and whether it realistically represents seizure-triggering videos. While I fully acknowledge the ethical risks involved and understand the decision not to publish epilepsy-triggering content, this choice raises questions about the validity of the evaluation dataset. The authors state that they generated 40 videos using Python and OpenCV that violate relevant guidelines. Could the authors clarify which specific guidelines were violated and how these violations were assessed? More importantly, how was it determined that these videos would plausibly trigger seizures? (A brief illustrative sketch of the kind of detail I have in mind appears at the end of this point.)

At the beginning of Section 5, the authors state that they employ PEAT, but later it becomes unclear whether PEAT was consistently used throughout the experiments. Clarifying the exact role of PEAT in the evaluation pipeline would help resolve this ambiguity.

The evaluated attack surface would also benefit from greater transparency. I recommend including a table that lists each tested visual effect or alteration along with its corresponding outcome. This would make it easier to understand which manipulations were effective and why. Table 4 is a good example of this type of clarity, and a similar approach elsewhere would strengthen the evaluation.
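To illustrate the level of specificity that would help for point 2, here is a hedged sketch of how a clearly guideline-violating clip could be generated with OpenCV. This is my own illustration, not the authors' generation code; the "more than three flashes per second" figure is the commonly cited WCAG-style general flash limit, and all parameters here are placeholders.

    import cv2
    import numpy as np

    # Hypothetical example: a full-frame black/white alternation at 30 fps.
    # Alternating every frame yields roughly 15 bright-dark flash pairs per
    # second over 100% of the frame area, far above a "more than three
    # flashes per second" style general flash threshold.
    fps, seconds, width, height = 30, 5, 640, 360
    writer = cv2.VideoWriter("flash_test.mp4",
                             cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))
    for i in range(fps * seconds):
        value = 255 if i % 2 == 0 else 0      # alternate bright and dark frames
        frame = np.full((height, width, 3), value, dtype=np.uint8)
        writer.write(frame)
    writer.release()

Pairing each generated clip with the specific guideline clause it violates, and the margin by which it violates it, would make the dataset much easier to assess.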
3. Definition and uncertainty of seizure triggers

The paper states that "a video is potentially seizure triggering if there are flashes that exceed the luminance or color threshold, and at the same time, the flashes exceed the area and frequency threshold." This definition appears broad, and as the authors themselves note, these conditions are not well understood. As a result, it is difficult to assess how reliably PRISM can identify seizure-triggering content if the underlying criteria remain uncertain. This uncertainty also raises a broader concern: if the precise indicators of seizure-triggering content are largely unknown, how confident can we be in PRISM's effectiveness? This may also help explain why existing social media platforms have not yet implemented similar detection mechanisms.

4. Threshold selection for PRISM

While the authors do cite prior work to justify the selected thresholds for color, luminance, frequency, and duration, there is substantial variation and disagreement in the literature. It is unclear how the authors decided which studies or thresholds to adopt when constructing PRISM. Could the authors clarify the criteria used to select among prior studies with differing results? For example, were certain thresholds chosen based on recency, empirical validation, conservativeness, or alignment with specific clinical guidelines? Without an explicit rationale for privileging some sources over others, it is difficult to assess whether the chosen thresholds are appropriate or robust. Clarifying this decision-making process would strengthen confidence in the technical foundations of PRISM, particularly given how critical these parameters are to the system's effectiveness. A small sketch below illustrates the kind of explicit, per-threshold provenance I have in mind.

Finally, it appears that the system was not evaluated with human participants. Were any medical experts or epilepsy researchers involved in defining the criteria or validating the assumptions underlying PRISM? Given the medical severity of the problem being addressed, clarifying whether clinical expertise informed the design and evaluation is important for assessing the system's correctness and applicability.
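For concreteness on points 3 and 4, here is a hedged sketch of the quoted trigger definition encoded directly, with each threshold carrying an explicit placeholder value and source. The numbers are WCAG-2.x-style figures used purely as illustrations, not the paper's actual parameters, and the function and dictionary names are hypothetical.

    # Hypothetical encoding of the quoted definition with explicit provenance.
    # Values are illustrative placeholders, not PRISM's actual thresholds.
    THRESHOLDS = {
        "max_flashes_per_second": 3,   # e.g., WCAG 2.x general/red flash limit
        "min_area_fraction": 0.25,     # e.g., fraction of the relevant visual field
        "min_luminance_delta": 0.10,   # e.g., relative-luminance change per flash
        "darker_state_below": 0.80,    # e.g., darker frame's relative luminance
    }

    def potentially_triggering(luminance_flash, red_flash,
                               flashes_per_second, area_fraction):
        # Mirrors the quoted definition: (luminance OR color threshold exceeded)
        # AND (frequency AND area thresholds exceeded). The luminance-delta and
        # darker-state constants would parameterize the per-frame flash
        # detection that produces luminance_flash / red_flash (not shown).
        exceeds_signal = luminance_flash or red_flash
        exceeds_extent = (flashes_per_second > THRESHOLDS["max_flashes_per_second"]
                          and area_fraction >= THRESHOLDS["min_area_fraction"])
        return exceeds_signal and exceeds_extent

A table in the paper with one row per threshold (value, source, and rationale for choosing that source) would serve the same purpose and would make it much easier to judge robustness to the disagreement in the literature.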
Scientific Contribution
-----------------------
3. Creates a New Tool to Enable Future Science
5. Identifies an Impactful Vulnerability

Presentation
------------
1. No Flaws in Presentation

Presentation Comments
---------------------
The paper was well written and easy to follow.

Comments to Authors
-------------------
Thank you to the authors for submitting this paper. This research addresses a very important topic, one that poses not only a cyber threat but also a physical threat to everyday users of social media. The work is strengthened by a clear breakdown of the content delivery pipeline and a thoughtful discussion of the attack surface. However, key methodological concerns remain regarding the realism and validation of the evaluation dataset, the justification of detection thresholds amid conflicting prior work, and the lack of human or clinical validation, which together limit confidence in the technical correctness and real-world applicability of the proposed system. Please see my detailed comments above.

Recommended Decision
--------------------
3. Weak Reject (Can be Convinced by a Champion)

Reviewer Confidence
-------------------
3. Fairly Confident

Should this submission be reviewed by the Research Ethics Committee?
--------------------------------------------------------------------
1. No