IEEE Transactions on Dependable and Secure Computing

Keywords

Costs, Steganography, Security, Distortion, Closed Box, Mathematical Models, Minimization, Black Box, Generative Model, Volatility, Comparative Method, Baseline Methods, Volatility Models, Normal Distribution, Kolmogorov Smirnov Test, Single Image, Mean Of Distribution, Multiple Images, Generative Adversarial Networks, Natural Images, Image Generation, Spatial Images, Neighboring Pixels, Latent Vector, Average Error Rate, Basic Cost, JPEG Images, Cover Image, Security Performance, Negative Costs, Object Contour, Secret Message, Image Steganography, Grayscale Images, Linear Sum, Experimental Subject, Variation Map

Abstract

The development of generative AI applications has revolutionized the data environment for steganography, providing a new source of steganographic cover. However, existing generative data-based steganography methods typically require white-box access, rendering them unsuitable for black-box generative models. To overcome this limitation, we propose a novel steganography method for generated images, which leverages the volatility of generative models and is applicable in black-box scenarios. The volatility of generative models refers to the ability to generate a series of images with slight variations by fine-tuning the input parameters of the model. These generated images exhibit varying degrees of volatility in different areas. To resist steganalysis, we mask steganographic modifications by confusing them with the inherent volatility of the model. Specifically, by modeling distributions of generated pixels and estimating the parameters of the distributions, the occurrence probabilities of generated pixels can be obtained, which serve as an effective measure for steganographic modification probabilities to render stego images as indistinguishable as possible from the images producible by the model. Moreover, we further combine it with existing costs to develop a more comprehensive steganographic algorithm. Experimental results show that the proposed method significantly outperforms baseline and comparative methods in resisting both feature-based and CNN-based steganalyzers.

I.   Introduction

Steganography is an effective technique for concealing messages within seemingly innocuous media to avoid detection of hidden messages [1]. Distinct from cryptography [2], steganography not only safeguards the communication content but also disguises the communication act itself. Various digital media prevalent on the Internet can serve as cover for message embedding. Natural images, extensively used across social platforms, have become the most prevalent cover for steganography. The most successful scheme of natural image steganography is the content-adaptive steganography based on the framework of the minimization distortion model [3], which tends to embed secret messages into textured and noisy regions, making it challenging for steganalysis [4] to detect. The framework of the minimization distortion model can be divided into two tasks: 1) defining modification costs of modifying the elements of the cover using heuristic approaches or deep-learning-based approaches, such as WOW [5], UNIWARD [6], HILL [7], ASDL-GAN [8], UT-GAN [9], SPAR-RL [10], MCTSteg [11], and JoPoL [12]; 2) devising practical embedding methods that minimize the total modification cost, including Syndrome Trellis Codes (STC) [13] and Steganographic Polar Codes (SPC) [14].

In recent years, the field of deep generative models [15] has made significant progress, with some models capable of generating visually stunning images. Consequently, generated images are gaining immense popularity and extensive usage. Discord [16] alone has witnessed over a billion unique AI-generated images created by its users, and this is just the tip of the iceberg. The widespread use of generated images has changed the data landscape of the Internet, providing a new source of cover for steganography. As a result, steganographic algorithms based on generated images have emerged.

Most steganography methods based on generated images integrate message embedding with the image generation process. Yang et al. [17] utilized PixelCNN [18] and drove the autoregressive sampling process with the secret messages, yet the generated stego images are of suboptimal quality. Liu et al. [19] disentangled the images into structure and texture, and encoded the messages to the structure's latent vector, generating high-quality stego images with the help of StyleGAN2 [20]. Despite this, their method has low steganographic capacity. Zhou et al. [21] encoded the secret information as a latent vector and transformed it into an image using the Glow model [22], achieving high-capacity information embedding. These generative steganography methods pursue stego images indistinguishable from the model's standard output, thus disguising covert communication behavior as popular behavior of the general public.

However, the majority of image generation services provided by enterprises, such as Midjourney [23], operate under black-box conditions. Existing generative steganography methods require access to the generation process, making them useful only for open-source or self-trained models, whose use is inconsistent with typical public behavior and poses security risks. A more reasonable approach is to employ popular generative models for steganography. However, black-box conditions forbid access to internal information such as the probability distribution of the generative process, so the advantages of generative steganography cannot be exploited directly. An interesting and challenging question is therefore what favorable conditions black-box generative models can offer for steganography.

In this paper, we propose a novel steganography method for generated images based on the volatility of generative models, which can be applied in black-box scenarios. The volatility of generative models refers to the ability to generate a series of images with slight variations by adjusting the accessible input parameters of the model. These generated images exhibit varying degrees of volatility in different areas. To resist steganalysis, we mask steganographic modifications by confusing them with the volatility of the model. Specifically, the pixels of the generated images approximately obey Gaussian distributions according to the Central Limit Theorem [24], which is also verified by our experiments. The parameters of the Gaussian distributions can be estimated from the generated images. Once these distributions are obtained, the occurrence probability of each pixel can be computed by integration. We use the occurrence probabilities of generated pixels as steganographic modification probabilities to render stego images as indistinguishable as possible from the images producible by the model. Finally, using the Flipping Lemma [13], we can further calculate the steganographic costs of ±1 modifications, denoted as the volatility cost. Nonetheless, the volatility cost considers the security of steganography only in terms of the degree of volatility, which is one-sided. For a comprehensive consideration of steganographic security, we develop a method that combines the volatility cost with existing costs. Experimental results validate that the proposed method significantly outperforms baseline and comparative methods in resisting both feature-based and CNN-based steganalyzers. Our contributions are summarized as follows:

  • We are the first to propose disguising steganographic modifications as the inherent volatility of generative models to improve the security of steganography, which is applicable to black-box generative models.
  • Drawing on the aforementioned concept, we propose the volatility cost as a supplementary perspective to the existing costs. Additionally, a cost combination strategy is designed.
  • Experimental results show that the proposed method can enhance the security of the baseline methods, with an improvement of 3.58% to 8.31% against SRM [25] and 7.38% to 18.76% against CNN-based steganalyzers [26].

In the remainder of this paper, Section II introduces the preliminaries. In Section III, we present the proposed adaptive steganography framework based on the volatility of the generative model. Experimental results and analysis are elaborated in Section IV. Section V concludes the paper.

II.   Preliminaries

A. Notations

Throughout the paper, matrices, vectors, and sets are written in bold letters. Elements are written in lowercase italics. Table I lists some notations used in the paper.

TABLE I Notations Used in the Paper

B. The Minimization Distortion Model

In adaptive steganography, elements in different regions are assigned different costs. Given a cover image $\mathbf{X}=(x_{ij})^{H\times W}$, the cost introduced by modifying $x_{ij}$ to $y_{ij}$ is denoted by $\rho_{ij}$. For additive steganography, the distortion $D(\mathbf{X},\mathbf{Y})$ is the sum of the costs of all elements:
$$D(\mathbf{X},\mathbf{Y})=\sum_{i=1}^{H}\sum_{j=1}^{W}\rho_{ij},\quad \mathbf{Y}\in\mathcal{Y},\tag{1}$$
where $\mathbf{Y}$ is the stego image and $\mathcal{Y}$ is the set of possible stego images.

The modification probability is denoted as $\pi(\mathbf{X},\mathbf{Y})$; thus the steganographic capacity can be up to $H(\pi(\mathbf{X},\mathbf{Y}))$ bits with the expected distortion $E_{\pi}(D)$, where
$$H(\pi(\mathbf{X},\mathbf{Y}))=-\sum_{\mathbf{Y}\in\mathcal{Y}}\pi(\mathbf{X},\mathbf{Y})\log_{2}\pi(\mathbf{X},\mathbf{Y}),\tag{2}$$
$$E_{\pi}(D)=\sum_{\mathbf{Y}\in\mathcal{Y}}\pi(\mathbf{X},\mathbf{Y})D(\mathbf{X},\mathbf{Y}).\tag{3}$$

In the minimization distortion model, minimizing the distortion while embedding a fixed-length message of $L$ bits can be formulated as the following optimization problem:
$$\min_{\pi} E_{\pi}(D),\quad \text{s.t. } H(\pi(\mathbf{X},\mathbf{Y}))=L.\tag{4}$$
This problem can be solved using Lagrange multipliers [27]. For additive steganography, the optimal modification probability $\pi_{\lambda}$ is given by
$$\pi_{\lambda}(x_{ij},y_{ij})=\frac{\exp\left(-\lambda\,\rho_{ij}(x_{ij},y_{ij})\right)}{\sum_{y_{ij}\in\mathcal{I}+x_{ij}}\exp\left(-\lambda\,\rho_{ij}(x_{ij},y_{ij})\right)},\tag{5}$$
where $\lambda$ is the Lagrange multiplier determined by the message length constraint, and $\mathcal{I}$ is the set of allowed steganographic modifications. For example, the $\pm 1$ embedding operation is ternary embedding with $\mathcal{I}=\{-1,0,+1\}$, where $0$ denotes no modification. As proven in [3], the entropy is decreasing in $\lambda$, so $\lambda$ can be quickly determined by binary search.
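For concreteness, the following is a minimal Python sketch (our illustration, not the authors' released code) of computing the probabilities in (5) and determining $\lambda$ by binary search for a target payload; the function names and the log-scale bisection are our own choices.

```python
# A minimal sketch of eq. (5) and the lambda search of eq. (4); illustrative only.
import numpy as np

def gibbs_probs(rho, lam):
    """rho: dict {-1, 0, +1} -> (H, W) cost arrays; returns same-shaped probabilities."""
    w = {k: np.exp(-lam * rho[k]) for k in (-1, 0, 1)}
    z = w[-1] + w[0] + w[1]
    return {k: w[k] / z for k in (-1, 0, 1)}

def entropy_bits(p):
    """Total ternary entropy in bits, summed over all pixels (cf. eq. (2))."""
    h = np.zeros_like(p[0])
    for k in (-1, 0, 1):
        pk = np.clip(p[k], 1e-12, 1.0)
        h -= pk * np.log2(pk)
    return h.sum()

def search_lambda(rho, message_bits, lo=1e-4, hi=1e4, iters=60):
    """Binary search on lambda: the entropy decreases monotonically in lambda [3]."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)            # bisection on a log scale
        if entropy_bits(gibbs_probs(rho, mid)) > message_bits:
            lo = mid                      # too much capacity -> increase lambda
        else:
            hi = mid
    return np.sqrt(lo * hi)
```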

The minimization distortion model gives the relationship between the modification probability and the cost. Conversely, the cost can be obtained from the modification probability based on the Flipping Lemma [13]:
$$\rho_{ij}(k)=\ln\left(\frac{\pi(y_{ij}=x_{ij})}{\pi(y_{ij}=x_{ij}+k)}\right),\quad k\in\mathcal{I},\tag{6}$$
where $\rho_{ij}(+1)$, $\rho_{ij}(-1)$, and $\rho_{ij}(0)$ respectively represent the costs of $+1$, $-1$, and no modification.
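The inverse conversion of (6) is equally direct; a short sketch under the same array layout as above:

```python
# A sketch of eq. (6): converting given modification probabilities back into costs.
# pi is a dict k -> (H, W) probability arrays with pi[-1] + pi[0] + pi[1] = 1.
import numpy as np

def probs_to_costs(pi, eps=1e-12):
    p0 = np.clip(pi[0], eps, 1.0)
    return {k: np.log(p0 / np.clip(pi[k], eps, 1.0)) for k in (-1, +1)}
```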

C. Image Generative Models

Deep generative models are neural networks with multiple hidden layers trained to approximate complex, high-dimensional probability distributions using numerous samples [28]. A representative task is image generation. Deep neural networks based on Generative Adversarial Networks (GANs) [29] have enabled end-to-end trainable image generation. To enable image generative models to synthesize higher-resolution images, subsequent works use multiple stacked generators, such as StackGAN [30] and its enhanced version StackGAN++ [31]. The attention mechanism [32] has also emerged as a key factor for improving visual quality. AttnGAN [33], which builds upon StackGAN++ [31] and incorporates attention into a multi-stage refinement pipeline, has shown significant improvements in image generation. Additionally, the transformer-based [34] image generative model DALL-E [35] has been proposed, achieving considerable performance.

Recently, Stable Diffusion [36], a text-to-image model based on latent diffusion models, has gained considerable interest. Latent diffusion models generate images through iterative denoising in a latent representation space and subsequently decode the representation into a complete image, enabling rapid text-to-image generation within 10 seconds on consumer-grade GPUs. This breakthrough has significantly lowered the barrier to generating high-resolution images and has garnered widespread attention on the Internet. Images generated by Stable Diffusion are widely disseminated, making them ideal candidate covers for steganography.

D. Steganography With Multiple Images of the Same Scene

It is widely acknowledged that incorporating side information at the sender's end can significantly improve steganographic security in practice [37]. Denemark et al. introduced a new form of side information in their work [38], which involves using multiple images of the same scene. When two versions of the cover image are available, one image is randomly chosen as the cover, while the other serves as the side information. The steganographic cost is adjusted as follows:
$$\text{Step 1: set } \rho_{ij}(\pm 1)=\rho_{ij}^{(0)}(\pm 1);\qquad \text{Step 2: if } x_{ij}^{(1)}\neq x_{ij}^{(2)},\ \text{set } \rho_{ij}(s_{ij})=\beta\,\rho_{ij}^{(0)}(s_{ij}),\tag{7}$$
where $s_{ij}=\operatorname{sign}(x_{ij}^{(2)}-x_{ij}^{(1)})$, $\rho_{ij}^{(0)}$ is the original cost, and $\beta\in[0,1]$ is a modulation factor. The essence of this method is to encourage modifications in areas where the two images are inconsistent.
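A sketch of this two-image adjustment, following our reading of (7) (hypothetical array inputs; not the code of [38]):

```python
# x1: cover image, x2: second exposure (side information), rho0: base ternary costs.
import numpy as np

def sss_adjust(x1, x2, rho0, beta=0.5):
    rho = {+1: rho0[+1].copy(), -1: rho0[-1].copy()}    # Step 1: start from base costs
    s = np.sign(x2.astype(np.int64) - x1.astype(np.int64))
    for k in (+1, -1):
        mask = (s == k)                                 # pixels where x1 != x2
        rho[k][mask] = beta * rho0[k][mask]             # Step 2: cheapen moves toward x2
    return rho
```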

When the sender acquires N (N>2) images of the same scene, Denemark et al. selected the closest pair among the N exposures and applied the aforementioned cost adjustment algorithm. Their experimental subjects are natural images, obtained by photographing the same scene several times. However, it is practically impossible to obtain two statistically independent samplings of one object, because acquisition conditions such as exposure time differ slightly between shots, as famously stated by Heraclitus: “No man ever steps in the same river twice, for it is not the same river and he is not the same man.” [38] Therefore, the authors argue that using multiple natural images may introduce more bias, and thus they choose the two-image scheme. When employing generated images as covers, it is possible to obtain multiple samples with small sampling bias through generative models. Consequently, N (N>2) samples can be employed for cost adjustment. However, the cost adjustment method proposed by Denemark et al. is only applicable when N=2 and necessitates further investigation when N>2.

III.   The Proposed Method

As shown in the left half of Fig. 1, existing methods consider the neighboring pixels of $x_{ij}$ when determining its cost, as shown in the equation below:
$$\rho_{ij}^{b}=D^{b}(\text{Neib}(x_{ij})),\tag{8}$$
where $\text{Neib}(x_{ij})$ represents the neighboring pixels of $x_{ij}$, and $D^{b}(\cdot)$ is an existing distortion function based on neighborhood pixels. In this paper, we introduce a novel perspective for defining the steganographic cost, which is based on the volatility of the generated pixels, as shown in the right half of Fig. 1. Generative models offer an opportunity for steganographers to generate multiple images with nuances. We postulate that pixels occupying the same position across different images conform to a certain distribution. If the steganographic modifications move towards regions of higher probability density within the distribution, the resulting stego will more closely resemble the images generated by the given model, thereby enhancing security. Building upon this idea, we propose the volatility cost $\rho_{ij}^{v}$:
$$\rho_{ij}^{v}=D^{v}(x_{ij}^{1},x_{ij}^{2},\ldots,x_{ij}^{n},\ldots,x_{ij}^{N}),\tag{9}$$
where $x_{ij}^{n}\ (n=1,2,\ldots,N)$ denotes the pixel located at position $(i,j)$ within the $n$-th image, and $D^{v}(\cdot)$ is the volatility-based distortion function to be defined.

Fig. 1. Perspectives of defining the steganographic cost: existing methods based on neighborhood pixels (left half) determine the cost through the analysis of neighboring pixels (denoted as Neib(xij)). In contrast, the proposed method (right half) defines the cost based on the volatility of generated pixels.

Next, in Section III-A, we put forward the hypothesis of Gaussian distributions. The subsequent section, Section III-B, presents an explicit definition of volatility cost. In Section III-C, we propose the approach of combining the volatility cost with existing costs. Lastly, Section III-D comprehensively introduces the steganography procedure based on the volatility of the generative models.

A. The Hypothesis That Generated Pixels Obey Gaussian Distributions

The generated images obey a certain distribution. Unfortunately, when the generative model is opaque, we cannot directly access the distribution. It is worth noting that numerous generative models, including Stable Diffusion [36], involve multiple random sampling procedures during image generation. According to the Central Limit Theorem, which asserts that the sum of a large number of random variables roughly conforms to a Gaussian distribution, we can reasonably assume that each pixel in the generated image approximately adheres to a Gaussian distribution: $x_{ij}\sim\mathcal{N}(\mu_{ij},\sigma_{ij}^{2})$.

To test the hypothesis, the following experiment is performed: $N$ images with subtle differences $\mathbf{X}^{n}=(x_{ij}^{n})^{H\times W}\ (n=1,2,\ldots,N)$ are generated by Stable Diffusion. The mean $\bar{x}_{ij}$ and standard deviation $s_{ij}$ can be obtained as follows:
$$\bar{x}_{ij}=\frac{1}{N}\sum_{n=1}^{N}x_{ij}^{n},\tag{10}$$
$$s_{ij}=\sqrt{\frac{1}{N-1}\sum_{n=1}^{N}\left(x_{ij}^{n}-\bar{x}_{ij}\right)^{2}}.\tag{11}$$

Then, each pixel is normalized by the corresponding mean and standard deviation, i.e.,
$$\hat{x}_{ij}^{n}=\left(x_{ij}^{n}-\bar{x}_{ij}\right)/s_{ij}.\tag{12}$$
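A short sketch of (10)-(12), assuming the N generated images are stacked into a single array:

```python
# Per-pixel sample mean, sample standard deviation, and normalization over N images.
import numpy as np

def pixel_statistics(images):
    """images: float array of shape (N, H, W). Returns (mean, std, normalized)."""
    x_bar = images.mean(axis=0)                 # eq. (10)
    s = images.std(axis=0, ddof=1)              # eq. (11), unbiased (N - 1) denominator
    s = np.maximum(s, 1e-8)                     # guard against zero-variance pixels
    z = (images - x_bar) / s                    # eq. (12)
    return x_bar, s, z
```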

The normalized pixels are visualized as a histogram, as shown in Fig. 2. The dashed blue line represents the probability density curve fitted to the histogram of normalized pixels, while the solid red line represents the standard Gaussian distribution curve as a reference. The histogram of normalized pixels displays a distribution resembling that of the standard Gaussian distribution, thus supporting the hypothesis that each pixel approximately conforms to a Gaussian distribution.

Fig. 2. Histogram of standardized pixels. A total of 7,864,320 normalized pixels are visualized from 30 slightly different images generated by Stable Diffusion [36] (N=30, 512×512 pixels/image). The dashed blue line represents the probability density curve fitted to the histogram, while the solid red line serves as a reference, illustrating the standard Gaussian distribution curve.

In addition, we conduct a Kolmogorov-Smirnov (KS) test [39]. The KS test is a non-parametric statistical method used to assess whether samples conform to a specific probability distribution. Our null hypothesis ($H_{0}$) states that the samples follow a Gaussian distribution. We generate 30 similar images and randomly select one position, yielding a total of 30 pixel samples. The test gives a p-value of 0.1975, which represents the significance level of the KS test. Since the p-value is greater than 0.05, the null hypothesis cannot be rejected, which is consistent with the samples following a Gaussian distribution.
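The test can be reproduced, for example, with SciPy (a sketch; the sample mean and standard deviation are plugged in as the reference Gaussian parameters):

```python
# KS test of the N samples at one pixel position against a fitted Gaussian.
import numpy as np
from scipy import stats

def ks_gaussian_pvalue(samples):
    """samples: 1-D array of pixel values at one (i, j) across the N images."""
    mu, sigma = samples.mean(), samples.std(ddof=1)
    stat, p_value = stats.kstest(samples, 'norm', args=(mu, sigma))
    return p_value

# p_value > 0.05 means the Gaussian hypothesis cannot be rejected at the 5% level.
```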

B. The Proposed Volatility Cost

Based on the analysis in the previous section, we assume that $x_{ij}\sim\mathcal{N}(\mu_{ij},\sigma_{ij}^{2})$. The mean $\mu_{ij}$ and variance $\sigma_{ij}^{2}$ of the Gaussian distribution can be estimated from the sample mean and sample variance, i.e.,
$$\mu_{ij}\approx\bar{x}_{ij},\quad \sigma_{ij}^{2}\approx s_{ij}^{2},\tag{13}$$
where $\bar{x}_{ij}$ and $s_{ij}$ are obtained from (10) and (11).

It is worth noting that during the image generation process, the generative model initially produces floating-point pixel values, which are subsequently rounded to integers before being presented to users. As a result, the occurrence probability $P(x_{ij})$ of $x_{ij}$ is a definite integral of the Gaussian density $\mathcal{N}(\mu_{ij},\sigma_{ij}^{2})$ over the interval $[x_{ij}-0.5,\,x_{ij}+0.5)$:
$$P(x_{ij})=\int_{x_{ij}-0.5}^{x_{ij}+0.5}\frac{1}{\sqrt{2\pi}\,\sigma_{ij}}e^{-\frac{(x-\mu_{ij})^{2}}{2\sigma_{ij}^{2}}}\,dx.\tag{14}$$

To enhance the security of steganography, it is crucial to render stego images as indistinguishable as possible from the images producible by the model. To this end, steganographic modifications should be steered towards directions exhibiting higher probabilities of pixel occurrence. As such, the occurrence probabilities of generated pixels can serve as an effective metric for characterizing the probabilities of the associated steganographic modifications, i.e.,
$$\pi(y_{ij}=x_{ij}+k)\triangleq\hat{P}(x_{ij}+k),\quad k\in\{-1,0,+1\},\tag{15}$$
where
$$\hat{P}(x_{ij}+k)=\frac{P(x_{ij}+k)}{P(x_{ij}+1)+P(x_{ij})+P(x_{ij}-1)},\tag{16}$$
and $P(x_{ij})$ represents the occurrence probability of $x_{ij}$.
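A sketch of (14)-(16) using the Gaussian CDF (our illustration; the difference of two `norm.cdf` values gives the integral in (14)):

```python
# Occurrence probability of an integer pixel value and normalized modification probabilities.
import numpy as np
from scipy.stats import norm

def occurrence_prob(v, mu, sigma):
    """Eq. (14): Gaussian mass over [v - 0.5, v + 0.5); arrays broadcast elementwise."""
    return norm.cdf(v + 0.5, mu, sigma) - norm.cdf(v - 0.5, mu, sigma)

def modification_probs(x, mu, sigma):
    """Eqs. (15)-(16): normalized probabilities used as modification probabilities."""
    p = {k: occurrence_prob(x + k, mu, sigma) for k in (-1, 0, 1)}
    z = p[-1] + p[0] + p[1]
    return {k: p[k] / z for k in (-1, 0, 1)}
```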

Fig. 3 depicts the occurrence probabilities of generated pixels. Considering that the primary steganographic embedding is a ternary embedding with $\mathcal{I}=\{-1,0,+1\}$, the probabilities of $x_{ij}$, $x_{ij}+1$, and $x_{ij}-1$ are presented. These values are represented by the colored areas beneath the Gaussian distribution curves.

Fig. 3. Illustration of the occurrence probabilities of generated pixels.

Fig. 3(a) and (b) show the case where the pixel value $x_{ij}$ is exactly equal to the mean $\mu_{ij}$ of the distribution, resulting in equal occurrence probabilities of $x_{ij}+1$ and $x_{ij}-1$. In such situations, it is appropriate to assign symmetrical steganographic modification probabilities. A comparison between Fig. 3(a) and (b) reveals that the variance in Fig. 3(b) is lower than that in Fig. 3(a), implying a lower entropy for the generated pixel. Hence, a lower steganographic modification probability should be assigned in Fig. 3(b). Fig. 3(c) and (d) display situations where the pixel value $x_{ij}$ differs from the mean $\mu_{ij}$ of the distribution. In these cases, $x_{ij}+1$ and $x_{ij}-1$ occur with different probabilities, necessitating the assignment of asymmetric modification probabilities for these pixels.

The above setting (15) actually implies that the payload is
$$\text{payload}=\frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W}H\big(\hat{P}(x_{ij}+1),\hat{P}(x_{ij}),\hat{P}(x_{ij}-1)\big),\tag{17}$$
where $H(\cdot)$ denotes the information entropy, and $H$ and $W$ represent the height and width of the image, respectively. To generalize to various payloads, the modification probabilities need to be transformed into the steganographic cost $\rho_{ij}^{v}$ (called the volatility cost), which can be obtained using the Flipping Lemma [13]:
$$\rho_{ij}^{v}(+1)=\ln\left(\frac{\pi(y_{ij}=x_{ij})}{\pi(y_{ij}=x_{ij}+1)}\right)=\ln\left(\frac{P(x_{ij})}{P(x_{ij}+1)}\right),$$
$$\rho_{ij}^{v}(-1)=\ln\left(\frac{\pi(y_{ij}=x_{ij})}{\pi(y_{ij}=x_{ij}-1)}\right)=\ln\left(\frac{P(x_{ij})}{P(x_{ij}-1)}\right),\tag{18}$$
where $\rho_{ij}^{v}(+1)$ and $\rho_{ij}^{v}(-1)$ respectively denote the volatility costs associated with modifying the pixel $x_{ij}$ by $+1$ and $-1$.
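A corresponding sketch of (18), reusing `occurrence_prob` from the previous sketch; the clipping constant is our own safeguard against vanishing probabilities:

```python
# Ternary volatility costs from the (unnormalized) occurrence probabilities.
import numpy as np

def volatility_cost(x, mu, sigma, eps=1e-12):
    p0 = np.maximum(occurrence_prob(x, mu, sigma), eps)
    return {
        +1: np.log(p0 / np.maximum(occurrence_prob(x + 1, mu, sigma), eps)),
        -1: np.log(p0 / np.maximum(occurrence_prob(x - 1, mu, sigma), eps)),
    }
# Costs can be negative when a neighboring value is more probable than x itself
# (cf. Fig. 3(d)); the ablation in Section IV-D keeps these negative costs.
```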

C. A Combination of the Volatility Cost and Existing Costs

As depicted in Fig. 1, the volatility cost and existing costs based on neighborhood pixels are derived from distinct perspectives. The volatility cost is determined exclusively by the occurrence likelihood of pixels, making it a one-sided metric. To achieve a more comprehensive steganographic cost, we combine the volatility cost $\boldsymbol{\rho}^{v}$ with existing costs based on neighborhood pixels (denoted as basic costs $\boldsymbol{\rho}^{b}$). The combination of costs can be accomplished through various methods, such as cost multiplication and cost addition. Experimental results indicate that the former exerts an excessive influence on the cost distribution and is ineffective. Consequently, we adopt the cost addition approach for combining the costs. The resulting combined cost, denoted as $\boldsymbol{\rho}^{c}$, is defined as follows:
$$\rho^{c}(+1)=\lambda\,\rho^{v}(+1)+(1-\lambda)\,\alpha\,\rho^{b}(+1),$$
$$\rho^{c}(-1)=\lambda\,\rho^{v}(-1)+(1-\lambda)\,\alpha\,\rho^{b}(-1),\tag{19}$$
where $\rho^{c}(+1)$ and $\rho^{c}(-1)$ represent the costs associated with the $+1$ and $-1$ modifications, $\lambda$ is the hyperparameter that determines the proportion of the volatility cost, and $\alpha$ is the scaling factor that equalizes the mean of $\boldsymbol{\rho}^{b}$ with that of the volatility cost $\boldsymbol{\rho}^{v}$. $\alpha$ is calculated as follows:
$$\alpha=\frac{\sum_{i,j}\big([\rho_{ij}^{v}\neq\text{wetcost}]\,\rho_{ij}^{v}\big)\Big/\sum_{i,j}[\rho_{ij}^{v}\neq\text{wetcost}]}{\sum_{i,j}\big([\rho_{ij}^{b}\neq\text{wetcost}]\,\rho_{ij}^{b}\big)\Big/\sum_{i,j}[\rho_{ij}^{b}\neq\text{wetcost}]},\tag{20}$$
that is, $\alpha$ is the ratio between the average value of the volatility cost and that of the basic cost. The Iverson bracket $[Q]$ is defined to be 1 if the logical expression $Q$ is true and 0 otherwise, and "wetcost" denotes a cost tending towards infinity, which is excluded to avoid a substantial impact on the mean. By multiplying the basic cost $\rho_{ij}^{b}$ by the scaling factor $\alpha$, the resulting scaled basic cost can be combined with the volatility cost by selecting an appropriate value of the hyperparameter $\lambda$. A notable advantage of this approach is that the value of $\lambda$ remains constant regardless of the type of basic cost.
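A sketch of (19)-(20); the numeric value used to mark "wet" (forbidden) pixels is an assumption of this illustration:

```python
# Scale the basic cost so its non-wet mean matches the volatility cost's mean,
# then take a convex combination of the two.
import numpy as np

WET = 1e13   # assumed sentinel value for "wet" costs that forbid modification

def combine_costs(rho_v, rho_b, lam=0.05):
    def mean_non_wet(rho):
        vals = np.concatenate([rho[+1].ravel(), rho[-1].ravel()])
        vals = vals[vals < WET]
        return vals.mean()
    alpha = mean_non_wet(rho_v) / mean_non_wet(rho_b)       # eq. (20)
    return {k: lam * rho_v[k] + (1 - lam) * alpha * rho_b[k] for k in (+1, -1)}
```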

D. Full Process of Steganography Based on the Volatility of Generative Models

Fig. 4 illustrates the full process of steganography based on the volatility of generative models. Although many large-scale image generative models remain black boxes to us, they provide users with certain controllable parameters $p$ that facilitate image generation. By fixing the random seed, a deterministic parameter value $\text{val}$ will generate a deterministic image. By fine-tuning the parameter value, i.e.,
$$p=\text{val}+\delta_{n},\quad n=1,2,\ldots,N,\tag{21}$$
where $\text{val}\gg\delta_{n}$, $N$ similar images with minor differences can be generated. Using (10) and (11), we derive a mean map and a variance map from the generated images. As discussed in the previous section, we assume that the generated pixels follow Gaussian distributions and estimate the parameters of the distributions using the generated samples ($N$ images). In this way, we obtain the occurrence distribution for each pixel. During steganography, one image is randomly selected as the cover from the generated images. The steganographic modification probability is set to the normalized occurrence probability of the corresponding pixel, and the volatility cost can be obtained using the Flipping Lemma. Then, the volatility cost $\boldsymbol{\rho}^{v}$ is combined with the existing cost $\boldsymbol{\rho}^{b}$ to obtain a multi-perspective steganographic cost $\boldsymbol{\rho}^{c}$. Finally, stego images can be obtained using steganographic embedding algorithms [13], [14]. The details are given in Algorithm 1.

Fig. 4. Proposed steganography framework based on the volatility of generative models, where X represents the cover image, X¯ represents the mean image, and s2 represents the variance image (the whiter the area, the larger the variance). The output of this framework is the combined cost ρc. Stego images can be further obtained using steganographic embedding algorithms [13],[14], which is omitted in the figure.

Algorithm 1:   The Proposed Steganographic Algorithm Based on the Volatility of Generative Models.

 Input: The generative model $G(\cdot)$, the value of the input parameter $\text{val}$, the basic distortion function $D^{b}(\cdot)$, the number of generated images $N$, the messages to be embedded $\mathbf{m}$, and the steganographic embedding algorithm $\text{Emb}(\cdot)$ [13], [14].
 Output: A stego image $\mathbf{Y}$.
 Fix the random seed of the generative model.
 Generate $N$ images by fine-tuning the controllable parameter: $\mathbf{X}^{n}=G(\text{val}+\delta_{n})$, $\text{val}\gg\delta_{n}$, $n=1,2,\ldots,N$.
 Select the cover: $\mathbf{X}\leftarrow\mathbf{X}^{ran}$, where $ran\leftarrow\text{randint}(1,N)$.
 Calculate the mean map $\overline{\mathbf{X}}$ and the variance map $\mathbf{s}^{2}$ by (10) and (11).
 Assume Gaussian distributions: $x_{ij}\sim\mathcal{N}(\mu_{ij},\sigma_{ij}^{2})$, where $\mu_{ij}$ and $\sigma_{ij}$ are estimated by (13).
 Compute the occurrence probabilities $P(x_{ij})$, $P(x_{ij}+1)$, and $P(x_{ij}-1)$ by (14), where $x_{ij}\in\mathbf{X}$.
 Set the steganographic modification probabilities equal to the normalized occurrence probabilities of the corresponding pixels, as shown in (15).
 Obtain the volatility cost $\boldsymbol{\rho}^{v}$ by (18).
 Calculate the basic cost: $\boldsymbol{\rho}^{b}=D^{b}(\mathbf{X})$.
 Compute the combined cost $\boldsymbol{\rho}^{c}$ by (19).
 Generate the stego image: $\mathbf{Y}=\text{Emb}(\boldsymbol{\rho}^{c},\mathbf{X},\mathbf{m})$.
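For illustration, Algorithm 1 can be orchestrated as in the following sketch, where `generate`, `basic_cost`, and `embed` are placeholders for the black-box model call, an existing cost function such as S-UNIWARD or HILL, and a practical embedder [13], [14] (or simulated embedding); the helper functions are those from the earlier sketches.

```python
# An end-to-end sketch of Algorithm 1 under the stated assumptions.
import numpy as np

def volatility_steganography(generate, basic_cost, embed,
                             val, deltas, message, lam=0.05, seed=0):
    rng = np.random.default_rng(seed)
    images = np.stack([generate(val + d) for d in deltas])   # N similar images
    cover = images[rng.integers(len(images))]                # randomly chosen cover
    mu, sigma, _ = pixel_statistics(images)                  # eqs. (10)-(11), (13)
    rho_v = volatility_cost(cover, mu, sigma)                # eq. (18)
    rho_b = basic_cost(cover)                                # existing neighborhood-based cost
    rho_c = combine_costs(rho_v, rho_b, lam)                 # eq. (19)
    return embed(cover, rho_c, message)                      # stego image
```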

IV.   Experiments

This section presents experimental results and analysis to demonstrate the feasibility and effectiveness of the proposed method.

A. Experimental Setting

1) The Image Generative Model and Dataset Generation

We select Stable Diffusion [36] as the image generative model due to its outstanding ability to generate high-quality images. Stable Diffusion is a text-to-image model that generates semantically related images based on the provided prompt. A demonstration of Stable Diffusion is shown in Fig. 5.

Fig. 5. Demonstration of Stable Diffusion. As depicted in (a) and (b), Stable Diffusion can generate corresponding images based on the “prompt”. Altering the random seed results in different content, as demonstrated by the contrast between (a) and (c). Stable Diffusion also offers parameters for users to fine-tune images. By fine-tuning the image-text similarity (denoted as sim), an image (d) resembling (a) with slight differences is generated. The residual between (a) and (d) is presented in (e), where brighter areas indicate larger residuals. The residuals primarily appear in object contours and areas with complex textures.

To evaluate the security of the proposed method, we generate a database using Stable Diffusion. We employ the 1,000 categories of ImageNet [40] as prompts, and by varying the random seeds, we generate images with different content that match the prompts. For each prompt, we sequentially set the seed values from 1 to 10, generating a total of 10,000 images. The images generated by Stable Diffusion are color images in the spatial domain. Since color image steganography algorithms are generally built upon grayscale image steganography, this study investigates grayscale image steganography by extracting the G-channel from each generated color image and saving it in the "pgm" format. The resulting 10,000 spatial grayscale images, each of $512\times 512$ pixels, constitute the cover database. In addition, to explore the pixel distribution of the generated images, we fine-tune the image-text similarity (denoted as $sim$, depicted in Fig. 5(a) and (d)), generating 30 additional images with subtle differences for each cover image. The specific adjustment is $sim_{t}=7.5000+0.0001\times t$, where $t=-14,-13,\ldots,14,15$. In total, 300,000 ($10{,}000\times 30$) images are generated.
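As an illustration only, a comparable dataset could be produced with the `diffusers` library, assuming the image-text similarity $sim$ corresponds to the classifier-free guidance scale (whose common default is 7.5) and using an illustrative checkpoint identifier; the paper does not specify the exact interface used.

```python
# A hedged sketch of generating 30 slightly different images per prompt with a fixed seed.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")  # illustrative checkpoint
pipe = pipe.to("cuda")

def generate_similar_images(prompt, seed, n=30):
    images = []
    for t in range(-14, n - 14):                                   # t = -14, ..., 15 when n = 30
        sim = 7.5000 + 0.0001 * t
        generator = torch.Generator("cuda").manual_seed(seed)      # same seed -> same latent noise
        img = pipe(prompt, guidance_scale=sim, generator=generator).images[0]
        images.append(img.split()[1])                              # keep the G channel of the RGB output
    return images
```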

2) Basic Steganographic Algorithms

As detailed in Section III-C, the proposed steganographic cost needs to be integrated with existing steganographic algorithms to comprehensively consider steganography security. To evaluate the effectiveness of our proposed method, we combine it with two basic steganographic algorithms: SUNIWARD [6] and HILL [7]. These two classic spatial steganographic algorithms adhere to the texture complexity principle, assigning low costs to pixels in textured areas and high costs in smooth areas. Payloads are set as {0.1,0.2,0.3,0.4,0.5} bits per pixel (bpp), and embedding is performed by simulated embedding [3].
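Simulated embedding can be sketched as follows (reusing `search_lambda` and `gibbs_probs` from Section II and assuming zero cost for "no modification"; the sampling step is the standard optimal-simulator construction of [3], written in our own notation):

```python
# Simulate embedding by sampling each pixel's change from the optimal ternary distribution.
import numpy as np

def simulated_embedding(cover, rho, payload_bpp, seed=0):
    rng = np.random.default_rng(seed)
    rho3 = {0: np.zeros_like(rho[+1]), +1: rho[+1], -1: rho[-1]}   # zero cost for no change
    lam = search_lambda(rho3, payload_bpp * cover.size)            # constraint of eq. (4)
    p = gibbs_probs(rho3, lam)                                     # eq. (5)
    u = rng.random(cover.shape)
    change = np.where(u < p[+1], 1, np.where(u < p[+1] + p[-1], -1, 0))
    return np.clip(cover.astype(np.int64) + change, 0, 255)
```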

3) Security Evaluation Metric

In our scenario, we make the stringent assumption that the attacker possesses a certain number of cover-stego pairs generated by the proposed method. The attacker can then employ steganalysis for detection. The handcrafted-feature spatial rich model (SRM [25]) equipped with ensemble classification, CovNet [26], and LWENet [41] are adopted. The detection error rate $P_{\mathrm{E}}$ on the testing set is used to evaluate the security of steganographic algorithms:
$$P_{\mathrm{E}}=\frac{1}{2}\left(P_{\mathrm{FA}}+P_{\mathrm{MD}}\right),\tag{22}$$
where $P_{\mathrm{FA}}$ and $P_{\mathrm{MD}}$ represent the false-alarm (FA) probability and the missed-detection (MD) probability, respectively.

For SRM, security is quantified by the average error rate $\overline{P}_{\mathrm{E}}$, averaged over 10 random 50/50 splits of the database. A larger $\overline{P}_{\mathrm{E}}$ signifies stronger security.

CovNet and LWENet are implemented in PyTorch [42]. The SGD optimizer [43] is used with mini-batches of 32 cover-stego pairs. The database is partitioned into a training set, a validation set, and a testing set (7,000, 500, and 2,500 pairs, respectively). As training deep-learning-based steganalyzers is time-consuming, we perform only one random split and use the testing-set $P_{\mathrm{E}}$ as the evaluation metric.

4) The Comparative Method

In existing work, Steganography in the Same Scene (referred to as SSS [38]) is the method closest to the setting of this paper. To ensure a fair comparison, the cover used in SSS is identical to the one employed in the proposed method. In SSS, the image nearest to the cover image is selected to provide side information; we therefore choose the reference image whose input parameter is closest to that of the cover. Specifically, images generated with $sim=7.5000$ are chosen as covers, while images generated with $sim=7.5001$ are used to provide side information for SSS. The steganographic cost is adjusted by (7). It is worth noting that in [38], SSS is applied to JPEG images and $\beta$ is related to the quality factor, whereas our experimental subject is spatial images. For a fair comparison, we determine the parameter $\beta$ of SSS through steganalysis experiments: SSS performs best when $\beta=0.5$, so $\beta$ is fixed at 0.5 in the subsequent experiments.

B. Determination of the Hyperparameters

The proposed method involves two hyperparameters: the number of samples N in (9) and the trade-off factor λ in (19). These parameters are determined based on the results of SRM's steganalysis experiments. SUNIWARD is selected as the basic cost, with a payload of 0.5 bpp. To avoid overfitting of hyperparameters, datasets for hyperparameter tuning and datasets for comparative experiments are sourced from different generated image datasets.

Fig. 6 presents the experimental results for hyperparameter determination. The proposed method employs the mean and variance of samples to estimate the distribution. From a statistical standpoint, increasing the number of samples can reduce the estimation bias, which is also indirectly demonstrated by the findings depicted in Fig. 6(a). However, the number of samples cannot be infinitely large due to computational limitations. Moreover, as depicted in Fig. 6(a), security does not improve beyond a sample size of 25. Therefore, we fix N=25 in the subsequent experiments.

Fig. 6. Determination of hyperparameters N and λ. The average detection error rate P¯E as a function of hyperparameters N and λ against SRM using SUNIWARD on the generated database.

From Fig. 6(b), it can be observed that security is highest when $\lambda=0.05$. This suggests that the texture complexity principle has a greater impact on steganographic security. Nevertheless, the research presented in this paper remains valuable: setting $\lambda=0.05$ yields higher security than $\lambda=0$, indicating that even a small proportion of the volatility cost brings a substantial improvement over the existing methods. In the following experiments, we set $\lambda=0.05$.

C. Security Performance

This section presents the security performance of the proposed method, with SUNIWARD and HILL as basic costs. Fig. 7 and Table II display the security performance of the basic costs, the comparative method SSS, and the proposed method against SRM. The proposed method enhances the security of both SUNIWARD and HILL, with an improvement ranging from 3.58% to 8.31%. Although SSS also improves security, it does not perform as well as our proposed method. SSS relies on the side information provided by a single image, which is not fine-grained and may be inaccurate. We analyze the reasons as follows: many generative models, such as Stable Diffusion [36], incorporate sampling procedures. Owing to the inherent randomness of sampling, certain pixels may be sampled from low-probability regions, and the side information provided by these pixels will be inaccurate. Notably, SSS, which relies exclusively on a single reference image, is more vulnerable to such scenarios. In contrast, the proposed method uses a batch of images as references. Even if the side information provided by individual images may be inaccurate, when the number of reference images is sufficiently large, the overall distribution is expected to be statistically accurate.

TABLE II Average Detection Errors P¯E of Different Methods With SUNIWARD and HILL Against SRM

Fig. 7. Curves of SRM's average detection error rate $\overline{P}_{\mathrm{E}}$ w.r.t. payloads (0.1-0.5 bpp).

In scenarios with high payloads, the security of the basic steganographic algorithm is relatively weak, so there is greater room for improvement in security performance. Therefore, the advantages of the proposed method become more pronounced.

As illustrated in Figs. 8 and 9 and Tables III and IV, the proposed method improves the security of the baseline costs by 6.68% to 18.76% against CNN-based steganalyzers. The proposed method has a greater impact on the security against CNN-based steganalysis compared to SRM, probably due to the stronger detection ability of CNN-based steganalyzers and the larger room for security performance improvement. In addition, the proposed method is more effective compared to the comparative method, aligning with the conclusion drawn from resisting SRM.

TABLE III Detection Errors PE of Different Methods With SUNIWARD and HILL Against CovNet

TABLE IV Detection Errors PE of Different Methods With SUNIWARD and HILL Against LWENet

Fig. 8. Curves of CovNet's detection error rate $P_{\mathrm{E}}$ w.r.t. payloads (0.1-0.5 bpp).

Fig. 9. Curves of LWENet's detection error rate $P_{\mathrm{E}}$ w.r.t. payloads (0.1-0.5 bpp).

D. Ablation Study

1) Cost Combination Methods

In this paper, we combine basic costs and volatility costs using linear sums. We also explore alternative combination methods, such as their product. Experiments reveal that the cost-product combination method is ineffective. When combined with SUNIWARD by cost-product, the average detection error rate at 0.5 bpp against SRM is only 0.0044. Multiplying costs significantly impacts the distribution of costs. Moreover, as shown in (5), the probability of modification is exponentially related to the cost, resulting in a greater degree of change in the probability of modification. Given these findings, we opt to combine costs using linear sums.

2) Negative Costs

The majority of steganographic costs in existing studies exhibit non-negative values. However, as shown in (18), a negative steganographic cost occurs when P(xij+1)>P(xij) or P(xij1)>P(xij), exemplified in Fig. 3(d). Statistical analysis revealed that approximately 3% of the volatility cost is negative, indicating that approximately 3% of the pixels are encouraged to be modified, contrary to common belief. To investigate the role of encouraging modifications further, we conduct an ablation experiment where we set the negative volatility costs to 0. Results show a slight decrease in security, with the average error rate P¯E declining from 0.2758 to 0.2703. This indicates that encouraging modifications in certain pixels can have a positive effect on steganographic security. Image generation involves some random sampling processes, and certain pixels may be randomly sampled to low-probability density areas. By applying steganographic modifications, these pixels can be moved to high-probability density areas. In such cases, encouraging modifications is justified.

E. Steganographic Modification Patterns

Fig. 10 illustrates modification plots resulting from various steganographic algorithms. Fig. 10(a) displays the cover image produced by Stable Diffusion, and Fig. 10(e) presents the corresponding variance map. The remaining plots depict steganographic modification outcomes.

Fig. 10. Figures of steganographic modifications from different steganographic algorithms. (a) is the cover image, while (e) is the corresponding variance map obtained by (11), where the white areas indicate large variances. The rest of the images are steganographic modification maps (payload = 0.5 bpp), where cyan indicates no modification, and yellow and dark indicate +1 and −1 modifications, respectively. The values in parentheses represent the ratio of modification pixels. Both the comparative method and the proposed method are asymmetrical costs, leading to an increase of modification pixels.

The steganographic modification plots demonstrate that our proposed method produces more distinct modifications in the outline of the dog, especially around the face, compared to the baseline and the comparison method. The modifications generated by our method almost precisely outline the dog's face, whereas the other methods yield a blurred shape. Fig. 10(e) illustrates that the generated image has a larger variance at the object contour. This indicates that the generative model has greater uncertainty at the object contour. When steganographic modifications are performed at the contour, they are more likely to be confused with pixel fluctuations arising from model uncertainty, thus making it harder for steganalyzers to differentiate them. In conclusion, our approach encourages steganographic modifications in areas of high uncertainty of generative models, such as object contours.

V.   Conclusion

In this paper, we propose an effective steganographic method for generated images based on the volatility of generative models. The generative models make it convenient for steganographers to generate similar images, which can provide valuable side information for steganography. We model the generated pixels as Gaussian distributions and estimate the parameters of the distributions by the generated images. The steganographic modification probabilities are determined by the relative magnitude of pixel occurrence probabilities, and the volatility cost is ultimately derived using the Flipping Lemma. The volatility cost offers a supplementary perspective for existing costs. We further propose a cost combination approach that combines the volatility cost with existing costs. Experimental results validate that the proposed method surpasses baseline and comparative methods in resisting both feature-based and CNN-based steganalyzers. However, the proposed method is only applicable to spatial images. In the future, we will delve into how to enhance the steganographic security of other forms of generative cover, such as JPEG images and audio.

References


  • [1]A. Cheddad, J. Condell, K. Curran, and P. Mc Kevitt, “Digital image steganography: Survey and analysis of current methods,” Signal Process., vol. 90, no. 3, pp. 727–752, 2010.
  • [2]J. Katz and Y. Lindell, Introduction to Modern Cryptography, Boca Raton, FL, USA: CRC Press, 2020.
  • [3]T. Filler and J. Fridrich, “Gibbs construction in steganography,” IEEE Trans. Inf. Forensics Secur., vol. 5, no. 4, pp. 705–720, Dec.2010.
  • [4]K. Karampidis, E. Kavallieratou, and G. Papadourakis, “A review of image steganalysis techniques for digital forensics,” J. Inf. Secur. Appl., vol. 40, pp. 217–235, 2018.
  • [5]V. Holub and J. J. Fridrich, “Designing steganographic distortion using directional filters,” in Proc. IEEE Int. Workshop Inf. Forensics Secur., Costa Adeje, Tenerife, Spain, 2012, pp. 234–239.
  • [6]V. Holub, J. Fridrich, and T. Denemark, “Universal distortion function for steganography in an arbitrary domain,” EURASIP J. Inf. Secur., vol. 2014, no. 1, pp. 1–13, 2014.
  • [7]B. Li, M. Wang, J. Huang, and X. Li, “A new cost function for spatial image steganography,” in Proc. IEEE Int. Conf. Image Process., 2014, pp. 4206–4210.
  • [8]W. Tang, S. Tan, B. Li, and J. Huang, “Automatic steganographic distortion learning using a generative adversarial network,” IEEE Signal Process. Lett., vol. 24, no. 10, pp. 1547–1551, Oct.2017.
  • [9]J. Yang, D. Ruan, J. Huang, X. Kang, and Y.-Q. Shi, “An embedding cost learning framework using GAN,” IEEE Trans. Inf. Forensics Secur., vol. 15, pp. 839–851, 2019.
  • [10]W. Tang, B. Li, M. Barni, J. Li, and J. Huang, “An automatic cost learning framework for image steganography using deep reinforcement learning,” IEEE Trans. Inf. Forensics Secur., vol. 16, pp. 952–967, 2020.
  • [11]X. Mo, S. Tan, B. Li, and J. Huang, “MCTSteg: A Monte Carlo tree search-based reinforcement learning framework for universal non-additive steganography,” IEEE Trans. Inf. Forensics Secur., vol. 16, pp. 4306–4320, 2021.
  • [12]W. Tang, B. Li, W. Li, Y. Wang, and J. Huang, “Reinforcement learning of non-additive joint steganographic embedding costs with attention mechanism,” Sci. China Inf. Sci., vol. 66, no. 3, 2023, Art. no. 132305.
  • [13]T. Filler, J. Judas, and J. Fridrich, “Minimizing additive distortion in steganography using syndrome-trellis codes,” IEEE Trans. Inf. Forensics Secur., vol. 6, no. 3, pp. 920–935, Sep.2011.
  • [14]W. Li, W. Zhang, L. Li, H. Zhou, and N. Yu, “Designing near-optimal steganographic codes in practice based on polar codes,” IEEE Trans. Commun., vol. 68, no. 7, pp. 3948–3962, Jul.2020.
  • [15]S. Bond-Taylor, A. Leach, Y. Long, and C. G. Willcocks, “Deep generative modelling: A comparative review of VAEs, GANs, normalizing flows, energy-based and autoregressive models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 11, pp. 7327–7347, Nov.2022.
  • [16]Discord, 2015. [Online]. Available: https://discord.com
  • [17]K. Yang, K. Chen, W. Zhang, and N. Yu, “Provably secure generative steganography based on autoregressive model,” in Proc. 17th Int. Workshop, Jeju Island, Korea, 2019, pp. 55–68.
  • [18]A. Van Den Oord, N. Kalchbrenner, and K. Kavukcuoglu, “Pixel recurrent neural networks,” in Proc. Int. Conf. Mach. Learn., 2016, pp. 1747–1756.
  • [19]X. Liu, Z. Ma, J. Ma, J. Zhang, G. Schaefer, and H. Fang, “Image disentanglement autoencoder for steganography without embedding,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2022, pp. 2303–2312.
  • [20]T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, and T. Aila, “Analyzing and improving the image quality of StyleGAN,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 8110–8119.
  • [21]Z. Zhou et al., “Secret-to-image reversible transformation for generative steganography,” IEEE Trans. Dependable Secure Comput., vol. 20, no. 5, pp. 4118–4134, Sep./Oct.2023.
  • [22]D. P. Kingma and P. Dhariwal, “Glow: Generative flow with invertible 1x1 convolutions,” in Proc. Adv. Neural Inf. Process. Syst., 2018, pp. 10236–10245.
  • [23]Midjourney, 2022. [Online]. Available: https://midjourney.com
  • [24]P. S. Marquis de Laplace, Théorie Analytique Des Probabilités, Rockland, ME, USA: Courcier, 1820.
  • [25]J. Fridrich and J. Kodovsky, “Rich models for steganalysis of digital images,” IEEE Trans. Inf. Forensics Secur., vol. 7, no. 3, pp. 868–882, Jun.2012.
  • [26]X. Deng, B. Chen, W. Luo, and D. Luo, “Fast and effective global covariance pooling network for image steganalysis,” in Proc. ACM Workshop Inf. Hiding Multimedia Secur., 2019, pp. 230–234.
  • [27]D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Cambridge, MA, USA: Academic Press, 2014.
  • [28]L. Ruthotto and E. Haber, “An introduction to deep generative modeling,” GAMM-Mitteilungen, vol. 44, no. 2, 2021, Art. no. e202100008.
  • [29]S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee, “Generative adversarial text to image synthesis,” in Proc. Int. Conf. Mach. Learn., 2016, pp. 1060–1069.
  • [30]H. Zhang et al., “StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 5907–5915.
  • [31]H. Zhang et al., “StackGAN++: Realistic image synthesis with stacked generative adversarial networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 8, pp. 1947–1962, Aug.2019.
  • [32]M.-H. Guo et al., “Attention mechanisms in computer vision: A survey,” Comput. Vis. Media, vol. 8, no. 3, pp. 331–368, 2022.
  • [33]T. Xu et al., “AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 1316–1324.
  • [34]N. Parmar et al., “Image transformer,” in Proc. Int. Conf. Mach. Learn., 2018, pp. 4055–4064.
  • [35]A. Ramesh et al., “Zero-shot text-to-image generation,” in Proc. Int. Conf. Mach. Learn., 2021, pp. 8821–8831.
  • [36]R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2022, pp. 10684–10695.
  • [37]T. Denemark and J. Fridrich, “Side-informed steganography with additive distortion,” in Proc. IEEE Int. Workshop Inf. Forensics Secur., 2015, pp. 1–6.
  • [38]T. Denemark and J. Fridrich, “Steganography with multiple JPEG images of the same scene,” IEEE Trans. Inf. Forensics Secur., vol. 12, no. 10, pp. 2308–2319, Oct.2017.
  • [39]F. J. Massey Jr., “The Kolmogorov-Smirnov test for goodness of fit,” J. Amer. Statist. Assoc., vol. 46, no. 253, pp. 68–78, 1951.
  • [40]J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2009, pp. 248–255.
  • [41]S. Weng, M. Chen, L. Yu, and S. Sun, “Lightweight and effective deep image steganalysis network,” IEEE Signal Process. Lett., vol. 29, pp. 1888–1892, 2022.
  • [42]A. Paszke et al., “Automatic differentiation in PyTorch,” in NIPS Workshop Autodiff, Long Beach, CA, USA, 2017.
  • [43]L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proc. 19th Int. Conf. Comput. Statist., Paris France, 2010, pp. 177–186.

Jiansong Zhang received the BS degree from the University of Science and Technology of China (USTC), in 2019. He is currently working toward the graduate degree with the University of Science and Technology of China. His research interests include steganography, steganalysis, and deep learning.
Kejiang Chen received the BS degree from Shanghai University (SHU), in 2015, and the PhD degree from the University of Science and Technology of China (USTC), in 2020. Currently, he is an associate research fellow with the University of Science and Technology of China. His research interests include information hiding, image processing, and deep learning.
Weixiang Li received the BS degree from Xidian University (XDU), Xi’an, China, in 2016, and the PhD degree from the University of Science and Technology of China (USTC), Hefei, China, in 2021. He is currently a postdoctoral researcher with Shenzhen University (SZU), Shenzhen, China. His research interests include steganography, steganalysis, and multimedia forensics. He was the recipient of the Best Student Paper Award at the 6th ACM IH&MMSec in 2018.
Weiming Zhang received the MS and PhD degrees from the Zhengzhou Information Science and Technology Institute, China. He is currently a professor with the School of Information Science and Technology, University of Science and Technology of China. His research interests include information hiding and multimedia security.
Nenghai Yu received the BS degree from the Nanjing University of Posts and Telecommunications, in 1987, the ME degree from Tsinghua University, in 1992, and the PhD degree from the University of Science and Technology of China, in 2004, where he is currently a professor. His research interests include multimedia security, multimedia information retrieval, video processing and information hiding.