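People trained in science and engineering are hemmed in by how specialized and detail-heavy modern technology has become. Their heads tend to be crammed with formulas, theories, models, algorithms, code, and so on. They prize rationality, yet often do not feel especially happy, because rationality does not necessarily bring happiness; it is more likely to bring a neutral "calm" or a somewhat negative "coldness".

Humans are, after all, largely emotional creatures, and happiness is a subjective feeling rather than an objective thing. It comes mostly from relationships and society, and technology cannot supply those.

Yet many people in science and engineering spend most of their time on technology. Their social circles are small, their lives are fairly monotonous, and they really have little time to reflect on and experience life and the many facets of society.

If you cannot travel ten thousand miles and meet all kinds of people, you can still read ten thousand books.

I think one should read more, and read widely. Good films, television, and literature are worth the time too, especially works that reflect reality rather than build fantasy worlds. It also helps to listen to the views and insights of wise people and of those who have been through it all.

In this information-rich age, these are not hard to find.

Cultivating a clear and open state of mind is worth more than many external things such as fame and wealth, and it makes peace and happiness easier to come by.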

1. WaterPool: A Watermark Mitigating Trade-offs among Imperceptibility, Efficacy and Robustness

With the increasing use of large language models (LLMs) in daily life, concerns have emerged regarding their potential misuse and societal impact. Watermarking is proposed to trace the usage of specific models by injecting patterns into their generated texts. An ideal watermark should produce outputs that are nearly indistinguishable from those of the original LLM (imperceptibility), while ensuring a high detection rate (efficacy), even when the text is partially altered (robustness). Despite many methods having been proposed, none have simultaneously achieved all three properties, revealing an inherent trade-off. This paper utilizes a key-centered scheme to unify existing watermarking techniques by decomposing a watermark into two distinct modules: a key module and a mark module. Through this decomposition, we demonstrate for the first time that the key module significantly contributes to the trade-off issues observed in prior methods. Specifically, this reflects the conflict between the scale of the key sampling space during generation and the complexity of key restoration during detection. To this end, we introduce WaterPool, a simple yet effective key module that preserves the complete key sampling space required by imperceptibility while utilizing semantics-based search to improve the key restoration process. WaterPool can integrate with most watermarks, acting as a plug-in. Our experiments with three well-known watermarking techniques show that WaterPool significantly enhances their performance, achieving near-optimal imperceptibility and markedly improving efficacy and robustness (+12.73% for KGW, +20.27% for EXP, +7.27% for ITS).

2. Enhancing Watermarked Language Models to Identify Users

A zero-bit watermarked language model produces text that is indistinguishable from that of the underlying model, but which can be detected as machine-generated using a secret key. But merely detecting AI-generated spam, say, as watermarked may not prevent future abuses. If we could additionally trace the text to a spammer's API token, we could then cut off their access to the model.
We introduce multi-user watermarks, which allow tracing model-generated text to individuals or to groups of colluding users. We construct multi-user watermarking schemes from undetectable zero-bit watermarking schemes. Importantly, our schemes provide both zero-bit and multi-user assurances at the same time: detecting shorter snippets as well as the original scheme and tracing longer excerpts to individuals. Along the way, we give a generic construction of a watermarking scheme that embeds long messages into generated text.
Ours are the first black-box reductions between watermarking schemes for language models. A major challenge for black-box reductions is the lack of a unified abstraction for robustness, that is, the property that marked text is detectable after edits. Existing works give incomparable robustness guarantees, based on bespoke requirements on the language model's outputs and the users' edits. We introduce a new abstraction, AEB-robustness, to overcome this challenge. AEB-robustness provides that the watermark is detectable whenever the edited text "approximates enough blocks" of model-generated output. Specifying the robustness condition amounts to defining approximates, enough, and blocks. Using our new abstraction, we relate the robustness properties of our constructions to those of the underlying zero-bit scheme. Whereas prior works only guarantee robustness for a single text generated in response to a single prompt, our schemes are robust against adaptive prompting, a stronger adversarial model.

3. MarkLLM: An Open-Source Toolkit for LLM Watermarking

LLM watermarking, which embeds imperceptible yet algorithmically detectable signals in model outputs to identify LLM-generated text, has become crucial in mitigating the potential misuse of large language models. However, the abundance of LLM watermarking algorithms, their intricate mechanisms, and the complex evaluation procedures and perspectives pose challenges for researchers and the community to easily experiment with, understand, and assess the latest advancements. To address these issues, we introduce MarkLLM, an open-source toolkit for LLM watermarking. MarkLLM offers a unified and extensible framework for implementing LLM watermarking algorithms, while providing user-friendly interfaces to ensure ease of access. Furthermore, it enhances understanding by supporting automatic visualization of the underlying mechanisms of these algorithms. For evaluation, MarkLLM offers a comprehensive suite of 12 tools spanning three perspectives, along with two types of automated evaluation pipelines. Through MarkLLM, we aim to support researchers while improving the comprehension and involvement of the general public in LLM watermarking technology, fostering consensus and driving further advancements in research and application. Our code is available at this https URL.

4. Stylometric Watermarks for Large Language Models

The rapid advancement of large language models (LLMs) has made it increasingly difficult to distinguish between text written by humans and machines. Addressing this, we propose a novel method for generating watermarks that strategically alters token probabilities during generation. Unlike previous works, this method uniquely employs linguistic features such as stylometry. Concretely, we introduce acrostica and sensorimotor norms to LLMs. Further, these features are parameterized by a key, which is updated every sentence. To compute this key, we use semantic zero-shot classification, which enhances resilience. In our evaluation, we find that for three or more sentences, our method achieves a false positive and false negative rate of 0.02. For the case of a cyclic translation attack, we observe similar results for seven or more sentences. This research is of particular interest for proprietary LLMs to facilitate accountability and prevent societal harm.

5. Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution

Ownership verification is currently the most critical and widely adopted post-hoc method to safeguard model copyright. In general, model owners exploit it to identify whether a given suspicious third-party model is stolen from them by examining whether it has particular properties 'inherited' from their released models. Currently, backdoor-based model watermarks are the primary and cutting-edge methods to implant such properties in the released models. However, backdoor-based methods have two fatal drawbacks: harmfulness and ambiguity. The former indicates that they introduce maliciously controllable misclassification behaviors (i.e., backdoors) to the watermarked released models. The latter denotes that malicious users can easily pass the verification by finding other misclassified samples, leading to ownership ambiguity.
In this paper, we argue that both limitations stem from the 'zero-bit' nature of existing watermarking schemes, where they exploit the status (i.e., misclassified) of predictions for verification. Motivated by this understanding, we design a new watermarking paradigm, i.e., Explanation as a Watermark (EaaW), that implants verification behaviors into the explanation of feature attribution instead of model predictions. Specifically, EaaW embeds a 'multi-bit' watermark into the feature attribution explanation of specific trigger samples without changing the original prediction. We correspondingly design the watermark embedding and extraction algorithms inspired by explainable artificial intelligence. In particular, our approach can be used for different tasks (e.g., image classification and text generation). Extensive experiments verify the effectiveness and harmlessness of our EaaW and its resistance to potential attacks.

6. WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights

The advances in Neural Radiance Fields (NeRF) research offer extensive applications in diverse domains, but protecting their copyrights has not yet been researched in depth. Recently, NeRF watermarking has been considered one of the pivotal solutions for safely deploying NeRF-based 3D representations. However, existing methods are designed to apply only to implicit or explicit NeRF representations. In this work, we introduce an innovative watermarking method that can be employed in both representations of NeRF. This is achieved by fine-tuning NeRF to embed binary messages in the rendering process. In detail, we propose utilizing the discrete wavelet transform in the NeRF space for watermarking. Furthermore, we adopt a deferred back-propagation technique and introduce a combination with the patch-wise loss to improve rendering quality and bit accuracy with minimum trade-offs. We evaluate our method in three different aspects: capacity, invisibility, and robustness of the embedded watermarks in the 2D-rendered images. Our method achieves state-of-the-art performance with faster training speed than the compared state-of-the-art methods.

7. Are Watermarks Bugs for Deepfake Detectors? Rethinking Proactive Forensics

AI-generated content has accelerated the topic of media synthesis, particularly Deepfake, which can manipulate our portraits for positive or malicious purposes. Before releasing these threatening face images, one promising forensics solution is the injection of robust watermarks to track their own provenance. However, we argue that current watermarking models, originally devised for genuine images, may harm the deployed Deepfake detectors when directly applied to forged images, since the watermarks are prone to overlap with the forgery signals used for detection. To bridge this gap, we thus propose AdvMark, on behalf of proactive forensics, to exploit the adversarial vulnerability of passive detectors for good. Specifically, AdvMark serves as a plug-and-play procedure for fine-tuning any robust watermarking into adversarial watermarking, to enhance the forensic detectability of watermarked images; meanwhile, the watermarks can still be extracted for provenance tracking. Extensive experiments demonstrate the effectiveness of the proposed AdvMark, leveraging robust watermarking to fool Deepfake detectors, which can help improve the accuracy of downstream Deepfake detection without tuning the in-the-wild detectors. We believe this work will shed some light on harmless proactive forensics against Deepfake.

8. CodeIP: A Grammar-Guided Multi-Bit Watermark for Large Language Models of Code

As Large Language Models (LLMs) are increasingly used to automate code generation, it is often desired to know if the code is AI-generated and by which model, especially for purposes like protecting intellectual property (IP) in industry and preventing academic misconduct in education. Incorporating watermarks into machine-generated content is one way to provide code provenance, but existing solutions are restricted to a single bit or lack flexibility. We present CodeIP, a new watermarking technique for LLM-based code generation. CodeIP enables the insertion of multi-bit information while preserving the semantics of the generated code, improving the strength and diversity of the inserted watermark. This is achieved by training a type predictor to predict the subsequent grammar type of the next token to enhance the syntactical and semantic correctness of the generated code. Experiments on a real-world dataset across five programming languages showcase the effectiveness of CodeIP.

9. Deep Learning-based Text-in-Image Watermarking

In this work, we introduce a novel deep learning-based approach to text-in-image watermarking, a method that embeds and extracts textual information within images to enhance data security and integrity. Leveraging the capabilities of deep learning, specifically through the use of Transformer-based architectures for text processing and Vision Transformers for image feature extraction, our method sets new benchmarks in the domain. The proposed method represents the first application of deep learning in text-in-image watermarking that improves adaptivity, allowing the model to intelligently adjust to specific image characteristics and emerging threats. Through testing and evaluation, our method has demonstrated superior robustness compared to traditional watermarking techniques, achieving enhanced imperceptibility that ensures the watermark remains undetectable across various image contents.

10. Topic-based Watermarks for LLM-Generated Text

Recent advancements of large language models (LLMs) have resulted in indistinguishable text outputs comparable to human-generated text. Watermarking algorithms are potential tools that offer a way to differentiate between LLM- and human-generated text by embedding detectable signatures within LLM-generated output. However, current watermarking schemes lack robustness against known attacks on watermarking algorithms. In addition, they are impractical considering that an LLM generates tens of thousands of text outputs per day and the watermarking algorithm needs to memorize each output it generates for the detection to work. In this work, focusing on the limitations of current watermarking schemes, we propose the concept of a "topic-based watermarking algorithm" for LLMs. The proposed algorithm determines how to generate tokens for the watermarked LLM output based on extracted topics of an input prompt or the output of a non-watermarked LLM. Inspired by previous work, we propose using a pair of lists (generated based on the specified extracted topic(s)) that specify certain tokens to be included or excluded while generating the watermarked output of the LLM. Using the proposed watermarking algorithm, we show the practicality of a watermark detection algorithm. Furthermore, we discuss a wide range of attacks that can emerge against watermarking algorithms for LLMs and the benefit of the proposed watermarking scheme for the feasibility of modeling a potential attacker considering its benefit vs. loss.

11. Bypassing LLM Watermarks with Color-Aware Substitutions

Watermarking approaches are proposed to identify if text being circulated is human or large language model (LLM) generated. The state-of-the-art watermarking strategy of Kirchenbauer et al. (2023a) biases the LLM to generate specific ("green") tokens. However, determining the robustness of this watermarking method is an open problem. Existing attack methods fail to evade detection for longer text segments. We overcome this limitation, and propose Self Color Testing-based Substitution (SCTS), the first "color-aware" attack. SCTS obtains color information by strategically prompting the watermarked LLM and comparing output token frequencies. It uses this information to determine token colors, and substitutes green tokens with non-green ones. In our experiments, SCTS successfully evades watermark detection using fewer edits than related work. Additionally, we show both theoretically and empirically that SCTS can remove the watermark for arbitrarily long watermarked text.
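As a rough illustration of the green-token idea and of why substituting green tokens defeats detection, here is a minimal Python sketch. It is not the SCTS algorithm. The toy vocabulary `VOCAB`, the hash-based colouring rule `is_green`, the key `demo-key`, and the green fraction of 0.5 are all assumptions made for this example. The toy attacker also reads colours directly from the key, whereas the abstract describes SCTS inferring them by prompting the watermarked model.

```python
import hashlib
import math

GAMMA = 0.5                             # assumed fraction of green tokens
VOCAB = [f"w{i}" for i in range(64)]    # hypothetical toy vocabulary


def is_green(prev_token, token, key="demo-key"):
    """Pseudo-randomly colour `token` from the previous token and a secret key."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] / 256.0 < GAMMA


def z_from_counts(green, scored):
    """One-proportion z-statistic for `green` green tokens out of `scored`."""
    return (green - GAMMA * scored) / math.sqrt(scored * GAMMA * (1 - GAMMA))


def z_score(tokens, key="demo-key"):
    """Score a token sequence; the first token has no predecessor and is skipped."""
    pairs = list(zip(tokens, tokens[1:]))
    green = sum(is_green(prev, tok, key) for prev, tok in pairs)
    return z_from_counts(green, len(pairs))


def pick(prev_token, want_green, key="demo-key"):
    """Return the first vocabulary token with the requested colour after `prev_token`."""
    for tok in VOCAB:
        if is_green(prev_token, tok, key) == want_green:
            return tok
    return VOCAB[0]  # practically unreachable with 64 candidates


def generate_watermarked(length, key="demo-key"):
    """Toy 'model' that always emits a green token."""
    tokens = ["<s>"]
    while len(tokens) < length:
        tokens.append(pick(tokens[-1], True, key))
    return tokens


def substitute_green(tokens, key="demo-key"):
    """Toy colour-aware attack: replace every green token, left to right."""
    out = [tokens[0]]
    for tok in tokens[1:]:
        if is_green(out[-1], tok, key):
            tok = pick(out[-1], False, key)
        out.append(tok)
    return out


if __name__ == "__main__":
    text = generate_watermarked(41)
    print("z before attack:", round(z_score(text), 2))
    print("z after attack: ", round(z_score(substitute_green(text)), 2))
```

The attack walks left to right because each token's colour depends on its predecessor, so a substitution can change the colour of the token that follows it.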

12. An Entropy-based Text Watermarking Detection Method

Currently, text watermarking algorithms for large language models (LLMs) can embed hidden features into texts generated by LLMs to facilitate subsequent detection, thus alleviating the problem of misuse of LLMs. Although current text watermarking algorithms perform well in most high-entropy scenarios, their performance in low-entropy scenarios still needs to be improved. In this work, we propose that the influence of token entropy should be fully considered in the watermark detection process, that is, the weight of each token during watermark detection should be adjusted according to its entropy, rather than setting the weights of all tokens to the same value as in previous methods. Specifically, we propose Entropy-based Watermark Detection (EWD), which gives higher-entropy tokens higher influence weights during watermark detection, so as to better reflect the degree of watermarking. Furthermore, the proposed detection process is training-free and fully automated. In our experiments, we find that our method achieves better detection performance in low-entropy scenarios, and that it is also general and can be applied to texts with different entropy distributions. Our code and data will be available online.
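To make the weighting idea concrete, the following Python sketch compares an unweighted green-token z-statistic with an entropy-weighted one. The abstract does not give the weight function, so using the raw Shannon entropy as the weight is an assumption, as are the toy distributions and green flags. The weighted statistic relies only on the fact that, without a watermark, each token is green independently with probability `gamma`, so the weighted green sum has variance `gamma * (1 - gamma) * sum(w_i ** 2)`.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def weighted_z(green_flags, weights, gamma=0.5):
    """z-statistic of the weighted green count; equal weights give the usual test."""
    total = sum(weights)
    observed = sum(w for g, w in zip(green_flags, weights) if g)
    variance = gamma * (1 - gamma) * sum(w * w for w in weights)
    return (observed - gamma * total) / math.sqrt(variance)


# Hypothetical text: 4 high-entropy positions where the watermark could act
# (all green) and 12 near-deterministic positions where it could not (half green).
high = [0.25, 0.25, 0.25, 0.25]
low = [0.97, 0.01, 0.01, 0.01]
distributions = [high] * 4 + [low] * 12
green_flags = [True] * 4 + [True, False] * 6

uniform_weights = [1.0] * len(green_flags)
entropy_weights = [token_entropy(d) for d in distributions]  # assumed weighting

z_uniform = weighted_z(green_flags, uniform_weights)
z_entropy = weighted_z(green_flags, entropy_weights)

if __name__ == "__main__":
    print("uniform weights:", round(z_uniform, 3))
    print("entropy weights:", round(z_entropy, 3))
```

In this toy case the low-entropy positions dilute the unweighted score, while the entropy weighting lets the few informative positions dominate.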

13. Duwak: Dual Watermarks in Large Language Models

As large language models (LLMs) are increasingly used for text generation tasks, it is critical to audit their usage, govern their applications, and mitigate their potential harms. Existing watermark techniques have been shown to be effective in embedding single human-imperceptible and machine-detectable patterns without significantly affecting generated text quality and semantics. However, the efficiency in detecting watermarks, i.e., the minimum number of tokens required to assert detection with significance and robustness against post-editing, is still debatable. In this paper, we propose Duwak to fundamentally enhance the efficiency and quality of watermarking by embedding dual secret patterns in both the token probability distribution and the sampling scheme. To mitigate expression degradation caused by biasing toward certain tokens, we design a contrastive search to watermark the sampling scheme, which minimizes token repetition and enhances diversity. We theoretically explain the interdependency of the two watermarks within Duwak. We evaluate Duwak extensively on Llama2 under various post-editing attacks, against four state-of-the-art watermarking techniques and combinations of them. Our results show that Duwak-marked text achieves the highest watermarked text quality at the lowest required token count for detection, up to 70% fewer tokens than existing approaches, especially under post-paraphrasing.

14. Optimizing watermarks for large language models

With the rise of large language models (LLMs) and concerns about potential misuse, watermarks for generative LLMs have recently attracted much attention. An important aspect of such watermarks is the trade-off between their identifiability and their impact on the quality of the generated text. This paper introduces a systematic approach to this trade-off in terms of a multi-objective optimization problem. For a large class of robust, efficient watermarks, the associated Pareto optimal solutions are identified and shown to outperform the currently default watermark.
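For readers unfamiliar with Pareto optimality, the short Python sketch below filters a set of candidate watermark settings down to the non-dominated ones. It only illustrates the concept. The `(identifiability, quality)` pairs are made-up numbers, both objectives are assumed to be "higher is better", and nothing here reproduces the paper's optimization procedure.

```python
def pareto_front(points):
    """Keep the points that no other point matches or beats on both objectives."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] >= p[0] and q[1] >= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (identifiability, quality) scores for five watermark settings.
candidates = [(0.9, 0.5), (0.8, 0.8), (0.5, 0.9), (0.7, 0.7), (0.4, 0.4)]

if __name__ == "__main__":
    print(pareto_front(candidates))
```

A setting such as `(0.7, 0.7)` is dropped because `(0.8, 0.8)` is at least as good on both objectives, which is the sense in which a default watermark can be outperformed.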

15. Cross-Attention Watermarking of Large Language Models

A new approach to linguistic watermarking of language models is presented in which information is imperceptibly inserted into the output text while preserving its readability and original meaning. A cross-attention mechanism is used to embed watermarks in the text during inference. Two methods using cross-attention are presented that minimize the effect of watermarking on the performance of a pretrained model. Exploration of different training strategies for optimizing the watermarking and of the challenges and implications of applying this approach in real-world scenarios clarified the tradeoff between watermark robustness and text quality. Watermark selection substantially affects the generated output for high-entropy sentences. This proactive watermarking approach has potential application in future model development.

16. Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models

Text watermarking technology aims to tag and identify content produced by large language models (LLMs) to prevent misuse. In this study, we introduce the concept of "cross-lingual consistency" in text watermarking, which assesses the ability of text watermarks to maintain their effectiveness after being translated into other languages. Preliminary empirical results from two LLMs and three watermarking methods reveal that current text watermarking technologies lack consistency when texts are translated into various languages. Based on this observation, we propose a Cross-lingual Watermark Removal Attack (CWRA) to bypass watermarking by first obtaining a response from an LLM in a pivot language, which is then translated into the target language. CWRA can effectively remove watermarks by reducing the Area Under the Curve (AUC) from 0.95 to 0.67 without performance loss. Furthermore, we analyze two key factors that contribute to the cross-lingual consistency in text watermarking and propose a defense method that increases the AUC from 0.67 to 0.88 under CWRA.
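The AUC figures quoted above measure how well a detector's scores separate watermarked from unwatermarked text. As a reminder of what that number means, here is a small Python sketch that computes AUC as the probability that a randomly chosen watermarked sample scores higher than a randomly chosen unwatermarked one, counting ties as one half. The score lists are hypothetical and are not data from the paper.

```python
def auc(positive_scores, negative_scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical detector scores (higher means "more likely watermarked").
watermarked = [0.9, 0.4]
unwatermarked = [0.5, 0.1]

if __name__ == "__main__":
    print(auc(watermarked, unwatermarked))
```

An AUC of 0.5 corresponds to a detector that cannot tell the two classes apart, so a drop from 0.95 to 0.67 moves most of the way toward chance.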

17. WaterMax: breaking the LLM watermark detectability-robustness-quality trade-off

Watermarking is a technical means to dissuade malfeasant usage of Large Language Models. This paper proposes a novel watermarking scheme, so-called WaterMax, that enjoys high detectability while sustaining the quality of the generated text of the original LLM. Its new design leaves the LLM untouched (no modification of the weights, logits, temperature, or sampling technique). WaterMax balances robustness and complexity, contrary to the watermarking techniques of the literature, which inherently provoke a trade-off between quality and robustness. Its performance is both theoretically proven and experimentally validated. It outperforms all the SotA techniques under the most complete benchmark suite.

18. Lost in Overlap: Exploring Watermark Collision in LLMs

The proliferation of large language models (LLMs) in generating content raises concerns about text copyright. Watermarking methods, particularly logit-based approaches, embed imperceptible identifiers into text to address these challenges. However, the widespread use of watermarking across diverse LLMs has led to an inevitable issue known as watermark collision during common tasks like question answering and paraphrasing. This study focuses on dual watermark collisions, where two watermarks are present simultaneously in the same text. The research demonstrates that watermark collision poses a threat to detection performance for detectors of both upstream and downstream watermark algorithms.

19. Learning to Watermark LLM-generated Text via Reinforcement Learning

We study how to watermark LLM outputs, i.e., embedding algorithmically detectable signals into LLM-generated text to track misuse. Unlike the current mainstream methods that work with a fixed LLM, we expand the watermark design space by including the LLM tuning stage in the watermark pipeline. While prior works focus on token-level watermarks that embed signals into the output, we design a model-level watermark that embeds signals into the LLM weights, and such signals can be detected by a paired detector. We propose a co-training framework based on reinforcement learning that iteratively (1) trains a detector to detect the generated watermarked text and (2) tunes the LLM to generate text easily detectable by the detector while keeping its normal utility. We empirically show that our watermarks are more accurate, robust, and adaptable (to new attacks). Our approach also allows watermarked model open-sourcing. In addition, if used together with alignment, the extra overhead introduced is low: only an extra reward model (i.e., our detector) needs to be trained. We hope our work can bring more effort into studying a broader watermark design that is not limited to working with a fixed LLM. We open-source the code: this https URL.

20. Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code

Large Language Models (LLMs) have been widely deployed for their remarkable capability to generate texts resembling human language. However, they could be misused by criminals to create deceptive content, such as fake news and phishing emails, which raises ethical concerns. Watermarking is a key technique to mitigate the misuse of LLMs, which embeds a watermark (e.g., a bit string) into a text generated by an LLM. Consequently, this enables the detection of texts generated by an LLM as well as the tracing of generated texts to a specific user. The major limitation of existing watermark techniques is that they cannot accurately or efficiently extract the watermark from a text, especially when the watermark is a long bit string. This key limitation impedes their deployment for real-world applications, e.g., tracing generated texts to a specific user.
This work introduces a novel watermarking method for LLM-generated text grounded in error-correction codes to address this challenge. We provide strong theoretical analysis, demonstrating that under bounded adversarial word/token edits (insertion, deletion, and substitution), our method can correctly extract watermarks, offering a provable robustness guarantee. This breakthrough is also evidenced by our extensive experimental results. The experiments show that our method substantially outperforms existing baselines in both accuracy and robustness on benchmark datasets. For instance, when embedding a bit string of length 12 into a 200-token generated text, our approach attains an impressive match rate of 98.4%, surpassing the performance of Yoo et al. (state-of-the-art baseline) at 85.6%. When subjected to a copy-paste attack involving the injection of 50 tokens to generated texts with 200 words, our method maintains a substantial match rate of 90.8%, while the match rate of Yoo et al. diminishes to below 65%.

21.Three Bricks to Consolidate Watermarks for Large Language Models(巩固大型语言模型水印的三块砖)

区分生成文本和自然文本的任务越来越具有挑战性。在这种情况下,水印作为一种有前途的技术而出现,用于将生成的文本归因于特定模型。它改变了采样生成过程,以便在生成的输出中留下不可见的痕迹,以便于以后的检测。这项研究基于三个理论和实证考虑整合了大型语言模型的水印。首先,我们引入了新的统计测试,这些测试提供了强大的理论保证,即使在假阳性率较低(低于 10^{-6})的情况下,这些保证仍然有效。其次,我们使用自然语言处理领域的经典基准来比较水印的有效性,深入了解它们的现实世界适用性。第三,我们为可以访问 LLM 的场景以及多位水印开发了先进的检测方案。

The task of discerning between generated and natural texts is increasingly challenging. In this context, watermarking emerges as a promising technique for ascribing generated text to a specific model. It alters the sampling generation process so as to leave an invisible trace in the generated output, facilitating later detection. This research consolidates watermarks for large language models based on three theoretical and empirical considerations. First, we introduce new statistical tests that offer robust theoretical guarantees which remain valid even at low false-positive rates (less than 10^{-6}). Second, we compare the effectiveness of watermarks using classical benchmarks in the field of natural language processing, gaining insights into their real-world applicability. Third, we develop advanced detection schemes for scenarios where access to the LLM is available, as well as multi-bit watermarking.

22.Watermarking Generative Tabular Data(生成表格数据加水印)

在本文中,我们介绍了一种简单而有效的具有统计保证的表格数据水印机制。我们从理论上证明,所提出的水印可以被有效检测,同时忠实地保持数据保真度,并且对加性噪声攻击表现出良好的鲁棒性。总体思路是通过基于简单数据分箱的策略性嵌入来实现水印。具体来说,它将特征的取值范围划分为精细分段的区间,并将水印嵌入到选定的“绿名单”区间中。为了检测水印,我们开发了一个有原则的统计假设检验框架,所需假设极少:只要底层数据分布具有连续的密度函数,它就保持有效。通过严格的理论分析和实证验证证明了水印功效,突出了其在增强合成和现实数据集的安全性方面的实用性。

In this paper, we introduce a simple yet effective tabular data watermarking mechanism with statistical guarantees. We show theoretically that the proposed watermark can be effectively detected, while faithfully preserving the data fidelity, and also demonstrates appealing robustness against additive noise attack. The general idea is to achieve the watermarking through a strategic embedding based on simple data binning. Specifically, it divides the feature’s value range into finely segmented intervals and embeds watermarks into selected “green list” intervals. To detect the watermarks, we develop a principled statistical hypothesis-testing framework with minimal assumptions: it remains valid as long as the underlying data distribution has a continuous density function. The watermarking efficacy is demonstrated through rigorous theoretical analysis and empirical validation, highlighting its utility in enhancing the security of synthetic and real-world datasets.

23.A Watermark for Low-entropy and Unbiased Generation in Large Language Models(大型语言模型中低熵和无偏生成的水印)

大型语言模型 (LLM) 的最新进展凸显了滥用的风险,引发了人们对准确检测 LLM 生成的内容的担忧。检测问题的一个可行解决方案是将难以察觉的标识符注入 LLM,称为水印。之前的工作表明,无偏水印通过维持 LLM 输出概率分布的期望来确保不可伪造性并保持文本质量。然而,以前的无偏水印方法对于本地部署来说是不切实际的,因为它们依赖于对白盒LLM的访问和检测期间的输入提示。而且,这些方法无法为水印检测的II类错误提供统计保证。本研究提出了先采样后接受(STA-1)方法,这是一种无偏水印,不需要访问LLM,也不需要在检测过程中进行提示,并且对II类错误有统计保证。此外,我们提出了无偏水印中水印强度和文本质量之间的新颖权衡。我们表明,在低熵场景中,无偏水印面临水印强度和输出不令人满意的风险之间的权衡。低熵和高熵数据集上的实验结果表明,STA-1 实现了与现有无偏水印相当的文本质量和水印强度,并且输出不令人满意的风险较低。本研究的实施代码可在线获取。

Recent advancements in large language models (LLMs) have highlighted the risk of misuse, raising concerns about accurately detecting LLM-generated content. A viable solution for the detection problem is to inject imperceptible identifiers into LLMs, known as watermarks. Previous work demonstrates that unbiased watermarks ensure unforgeability and preserve text quality by maintaining the expectation of the LLM output probability distribution. However, previous unbiased watermarking methods are impractical for local deployment because they rely on accesses to white-box LLMs and input prompts during detection. Moreover, these methods fail to provide statistical guarantees for the type II error of watermark detection. This study proposes the Sampling One Then Accepting (STA-1) method, an unbiased watermark that does not require access to LLMs nor prompts during detection and has statistical guarantees for the type II error. Moreover, we propose a novel tradeoff between watermark strength and text quality in unbiased watermarks. We show that in low-entropy scenarios, unbiased watermarks face a tradeoff between watermark strength and the risk of unsatisfactory outputs. Experimental results on low-entropy and high-entropy datasets demonstrate that STA-1 achieves text quality and watermark strength comparable to existing unbiased watermarks, with a low risk of unsatisfactory outputs. Implementation codes for this study are available online.

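Only the minimum edit distance between two words of the same length is discussed here. (The case of two words with different lengths is left for later exploration.)

For a single character, an insertion costs 1, a deletion costs 1, and a substitution costs 2 (equivalent to a deletion followed by an insertion).

Taking the edit from stay to play as an example, we can build the following two-dimensional array to store the running minimum cost.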

#   # s t a y
# # 0 1 2 3 4
# p 1 2 3 4 5
# l 2 3 4 5 6
# a 3 4 5 4 5
# y 4 5 6 5 4

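First, the first row holds the cost of building each prefix of str1 from the empty string by insertions, and the first column holds the cost of building each prefix of str2 in the same way. This cost clearly equals the length of the prefix considered so far. (Inserting a string of length n into the empty string costs n.)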

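Second, the state transition equations are:

  • Insertion cost = cost of the state in which the second prefix has its last character removed + 1
  • Deletion cost = cost of the state in which the first prefix has its last character removed + 1
  • Substitution cost = cost of the state in which both prefixes have their last character removed + 2 * (1 if the last characters of the two prefixes differ, otherwise 0)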
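The three rules above can also be written as one recurrence. This is only a restatement in symbols that are introduced here for illustration: $a$ and $b$ are the two strings, $D(i,j)$ is the minimum cost between the first $i$ characters of $a$ and the first $j$ characters of $b$, and $[\cdot]$ is 1 when the condition holds and 0 otherwise.

```latex
\begin{aligned}
D(i,0) &= i, \qquad D(0,j) = j,\\
D(i,j) &= \min\bigl\{\, D(i,j-1) + 1,\;\; D(i-1,j) + 1,\;\; D(i-1,j-1) + 2\,[a_i \neq b_j] \,\bigr\}.
\end{aligned}
```

The entry in the bottom-right corner of the stay/play table, 4, is $D(4,4)$ in this notation.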

From these state transition equations we can write the following code, which computes the running cost:

# Build the dynamic-programming matrix and return the minimum cost.
# As stated above, str1 and str2 are assumed to have the same length.

def min_edit_distance_cost(str1, str2):
    length = len(str1)
    # Initialize a (length + 1) x (length + 1) matrix of zeros
    matrix = [[0] * (length + 1) for _ in range(length + 1)]
    # Row 0 and column 0: cost of building a prefix from the empty string
    for i in range(length + 1):
        matrix[0][i] = i
        matrix[i][0] = i

    for i in range(1, length + 1):
        for j in range(1, length + 1):
            insert_cost = matrix[i][j - 1] + 1
            delete_cost = matrix[i - 1][j] + 1
            if str1[i - 1] == str2[j - 1]:
                # Last characters match, so no substitution is needed
                replace_cost = matrix[i - 1][j - 1]
            else:
                replace_cost = matrix[i - 1][j - 1] + 2
            matrix[i][j] = min(insert_cost, delete_cost, replace_cost)
    print(matrix)
    cost = matrix[length][length]
    return cost
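The case of two words with different lengths, which the note leaves open, can probably be handled by sizing the matrix from both strings. The following is a minimal sketch under that assumption, keeping the same costs (insert 1, delete 1, substitute 2); the function name min_edit_distance and its parameter names are made up for this illustration.

```python
def min_edit_distance(source: str, target: str) -> int:
    """Minimum edit cost with insert = 1, delete = 1, substitute = 2."""
    rows, cols = len(source) + 1, len(target) + 1
    dp = [[0] * cols for _ in range(rows)]

    # Turning a prefix into the empty string (or the reverse) costs its length.
    for i in range(rows):
        dp[i][0] = i
    for j in range(cols):
        dp[0][j] = j

    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            substitute = dp[i - 1][j - 1] + (0 if same else 2)
            dp[i][j] = min(dp[i][j - 1] + 1,  # insertion
                           dp[i - 1][j] + 1,  # deletion
                           substitute)
    return dp[-1][-1]


print(min_edit_distance("stay", "play"))  # 4, the bottom-right entry of the table
```

Only the matrix shape and the two loop bounds differ from the same-length version; the transition rule itself is unchanged.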

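Installing VSCode

Download: https://code.visualstudio.com/

Just download the installer. Once it finishes, open the exe and click Next through the default settings to complete the installation.

Installing the Chinese language pack in VSCode is simple: open VSCode, click Extensions, type Chinese in the search box, then click Install.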

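Installing Python

Download: https://www.python.org/

Just download the latest version.

Once it has downloaded, run the installer, and be sure to check Add Python x.x to PATH.

After installation, open VSCode, click the Manage extensions icon in the sidebar, search for Python, select the first result and click Install.

If you did not check Add Python x.x to PATH when running the Python installer, right-click This PC > Properties > Advanced system settings > Advanced > Environment Variables and add the Python path to the Path entry.

Once everything is installed, create a new .py file, write print('helloworld') and run it to see whether the Python environment is configured correctly.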
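A slightly longer test file can also show which interpreter VSCode actually ran, which helps when several Python installations are present. This is just a sketch; the function name describe_interpreter is made up for this example, while sys.version_info and sys.executable are standard library attributes.

```python
import sys


def describe_interpreter() -> str:
    """Return the helloworld greeting plus the running interpreter's version and path."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return f"helloworld from Python {version} at {sys.executable}"


print(describe_interpreter())
```

If the printed path is not the installation you expected, the Path entry or the interpreter selected in VSCode is the first thing to check.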

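Installing Anaconda

Download: https://www.anaconda.com/

Just click Download.

Click Next all the way through to complete the installation.

After installation, add the Anaconda install path, along with several folders beneath it, to the Path entry under System variables in the environment variables.

For example, the paths I added are: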

C:\Users\dxf\anaconda3
C:\Users\dxf\anaconda3\Scripts
C:\Users\dxf\anaconda3\Library\mingw-w64\bin
C:\Users\dxf\anaconda3\Library\usr\bin
C:\Users\dxf\anaconda3\Library\bin

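After configuring the environment variables, open a cmd console and type conda --version. If a version number is printed, the conda environment is set up correctly.

If it reports that conda is not recognized as an internal or external command, the problem is usually in the step that adds the environment variables.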
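When the command is not found, it can help to ask Python where, if anywhere, conda sits on the current Path. The sketch below uses the standard library function shutil.which; the helper name find_on_path is made up for this example.

```python
import shutil


def find_on_path(command: str):
    """Return the full path of a command found on PATH, or None if it is missing."""
    return shutil.which(command)


location = find_on_path("conda")
if location is None:
    print("conda is not on PATH; recheck the environment variable entries")
else:
    print(f"conda found at {location}")
```

Environment variable changes only reach consoles opened afterwards, so run this from a freshly opened cmd window.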

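Once conda is installed, search for Anaconda in the Windows 10 search box and open the application that appears.

After a lengthy automatic setup, its main window appears.

Since we already installed VSCode, we can click Launch here to open it. In VSCode press Ctrl+Shift+P, type Python Select (the Python: Select Interpreter command), choose the Python interpreter located in the Anaconda directory, and you can start writing code.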

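AI has been booming lately, and the need to get outside the firewall keeps growing. With a shared proxy subscription service, AI providers can easily recognize the IP and block everyone on it at once, leaving you to pay a high price for a higher-tier service. Here I describe the setup I use myself: buy your own cloud server. It makes using AI convenient, and the price is quite fair: 19.95 euros buys 1 core, 2 GB of RAM, 30 GB of storage, 1 Gbps of bandwidth and 2 TB of traffic per month. In my tests the raw connection reached 7-9 mb/s, which is less than advertised, but this is an overseas server after all. After all the configuration, the stable speed is within 3 mb/s. That is not fast, but it is easily enough for watching videos. The downside is that you need to be willing to tinker and have some basic computer knowledge, but for an IP that is truly yours alone, it is worth it.

(1) First, visit the website of the cloud server provider v.ps: https://vps.hosting/cart/nano-kvm-vps/

Only the 19.95 euro plan is available now; the 9.9 plan has sold out.
For the operating system, choosing CentOS 7 is fine.

After choosing, click order. If you have not registered, you will be asked to. After registering you will be told the order is unpaid, and you can simply pay with Alipay. This step can be a bit tedious because your mailbox will receive a lot of registration emails and you need to find the verification email among them to activate your account, but with a little care you will get through it.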

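(2) Download an SSH tool to connect to the server. I recommend Xshell, which has a Chinese interface and is beginner friendly. While it downloads, you can open cmd and type ping, a space, and the IP address of the server you bought:

ping ipaddress

where ipaddress is the IP address of your server.

Once the ping test in cmd shows the IP is reachable, move on to the next step: open the SSH tool and create a new connection. The password is under password on the host details page; click show to view it.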
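A successful ping does not by itself show that the SSH port accepts connections, so a quick TCP check before opening the SSH tool can save some guessing. The sketch below assumes the server listens for SSH on the default port 22, which the post does not state; the function name can_connect is made up for this example, and 203.0.113.10 in the comment is a documentation placeholder address, not a real server.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False


# Example call (replace the placeholder with your server's IP address):
#   can_connect("203.0.113.10", 22)
```

If ping succeeds but this returns False, the SSH port or the provider's firewall settings are the likely places to look.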

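(3) Install the BT Panel (宝塔面板). With the panel deployed, usability improves a lot, and configuring the firewall and editing configuration files become much easier. See the BT Panel official website for the installation method; there is a one-line install command. In Xshell, paste with Shift+Insert.

(4) Once you get this far you are halfway done. The remaining steps cannot be covered here; please refer to https://www.4spaces.org/780.html    or https://www.wxiou.cn/index.php/archives/54/

Note that the methods in these two articles require a domain name. You can do without one, but registering a domain is safer in case your server IP gets blocked. This post is mainly about purchasing the server. A month ago this provider had servers in every location and now only Frankfurt is left, so I wrote this in a hurry in the hope that friends who need one can buy it in time. I will post the detailed follow-up tutorial on my blog.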

@45°夹角的考研力

1.Single Number

class Solution {
public:
    int singleNumber(vector<int>& nums) {
        int result = nums[0];
        for(int i = 1; i < nums.size() ;i++){
            result = result ^ nums[i];
        }
        return result;
    }
};

2.Majority Element

class Solution {
public:
    int majorityElement(vector<int>& nums) {
        // An element that appears more than half the time must be the most frequent one
        map<int, int> mp;
        int max = 0;    // highest count seen so far
        int result = 0; // element with the highest count
        for(int i = 0; i < nums.size(); i++){
            // operator[] inserts a zero count the first time a number is seen
            int count = ++mp[nums[i]];
            if(count > max){
                max = count;
                result = nums[i];
            }
        }
        return result;
    }
};

3.Search a 2D Matrix II

class Solution {
public:
    bool findTarget(vector<vector<int>>& matrix , int i , int j , int target){
        // Plain recursion may time out, so visited cells are set to INT_MIN
        // and skipped when the recursion reaches them again
        if(matrix[i][j] == INT_MIN){
            return false;
        }
        else if(matrix[i][j] == target){
            return true;
        }
        else if(matrix[i][j] > target){
            return false;
        }
        else{
            matrix[i][j] = INT_MIN;
        }

        if(i == matrix.size()-1 && j == matrix[i].size()-1){// last row and last column
            return false;
        }
        else if(i == matrix.size()-1){// last row
            return findTarget(matrix , i , j+1 , target);
        }
        else if(j == matrix[i].size()-1){// last column
            return findTarget(matrix , i+1 , j , target);
        }
        if(matrix[i][j] < target && matrix[i+1][j] > target && matrix[i][j+1] > target){
            return false;
        }
        return findTarget(matrix , i+1 , j , target) || findTarget(matrix , i , j+1 , target);
    }

    bool searchMatrix(vector<vector<int>>& matrix, int target) {
        return findTarget(matrix , 0 , 0 , target);    
    }
};
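The recursive solution above overwrites the matrix in order to remember visited cells. For comparison, a different and commonly used technique is the "staircase" search, which starts at the top-right corner and discards one row or one column per step, so it runs in O(m + n) without modifying the input. The sketch below is not the author's method, and it assumes what the problem statement guarantees: every row and every column is sorted in ascending order. The function name search_matrix is made up for this example.

```python
def search_matrix(matrix, target):
    """Staircase search in a matrix whose rows and columns are sorted ascending."""
    if not matrix or not matrix[0]:
        return False
    row, col = 0, len(matrix[0]) - 1  # start at the top-right corner
    while row < len(matrix) and col >= 0:
        value = matrix[row][col]
        if value == target:
            return True
        if value > target:
            col -= 1  # everything below in this column is even larger
        else:
            row += 1  # everything to the left in this row is even smaller
    return False


grid = [
    [1, 4, 7, 11, 15],
    [2, 5, 8, 12, 19],
    [3, 6, 9, 16, 22],
    [10, 13, 14, 17, 24],
    [18, 21, 23, 26, 30],
]
print(search_matrix(grid, 5), search_matrix(grid, 20))  # True False
```

Because each step removes a full row or column from consideration, no visited markers are needed.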

4.Merge Sorted Array

class Solution {
public:
    void merge(vector<int>& nums1, int m, vector<int>& nums2, int n) {
        vector<int> nums3;
        int p1 = 0;
        int p2 = 0;
        while(p1 < m && p2 < n){
            if(nums1[p1] <= nums2[p2]){
                nums3.push_back(nums1[p1]);
                p1++;
            }
            else{
                nums3.push_back(nums2[p2]);
                p2++;
            }
        }
        if(p1 == m){
            for(int i = p2 ; i < n ; i++){
                nums3.push_back(nums2[i]);
            }
        }
        else if(p2 == n){
            for(int i = p1 ; i < m ; i++){
                nums3.push_back(nums1[i]);
            }
        }
        nums1.resize(0);
        for(int i = 0; i < m+n ;i++){
            nums1.push_back(nums3[i]);
        }
    }
};

5.Valid Palindrome

class Solution {
public:
    bool isPalindrome(string s) {
        // Convert uppercase letters to lowercase
        for(int i = 0; i < s.length() ;i++){
            if(s[i] >= 'A' && s[i] <= 'Z'){
                s[i] = s[i] + 32;
            }
        }
        int i = 0;
        int j = s.length() - 1;
        while(i < j){
            // Skip characters that are neither digits nor letters
            while(!((s[i] >= 'a' && s[i] <= 'z') || (s[i] >= '0' && s[i] <= '9'))){
                i++;
                if(i >= j){
                    return true;
                }
            }
            while(!((s[j] >= 'a' && s[j] <= 'z') || (s[j] >= '0' && s[j] <= '9'))){
                j--;
                if(i >= j){
                    return true;
                }
            }
            if(s[i] != s[j]){
                return false;
            }
            else{
                i++;
                j--;
            }
        }
        return true;
    }
};

6.Valid Anagram

class Solution {
public:
    bool isAnagram(string s, string t) {
        int flag[26];
        memset(flag , 0 , 26*sizeof(int));
        for(int i = 0; i < s.length() ; i++){
            flag[s[i]-'a']++;
        }
        for(int i = 0; i < t.length(); i++){
            flag[t[i]-'a']--;
        }
        for(int i = 0; i < 26 ;i++){
            if(flag[i] != 0){
                return false;
            }
        }
        return true;
    }
};

7.First Unique Character in a String

class Solution {
public:
    int firstUniqChar(string s) {
        int flag[26];
        memset(flag , 0 , 26*sizeof(int));
        for(int i = 0; i < s.length() ; i++){
            flag[s[i]-'a']++;
        }
        for(int i = 0; i < s.length() ; i++){
            if(flag[s[i]-'a'] == 1){
                return i;
            }
        }
        return -1;
    }
};

8.Reverse String

class Solution {
public:
    void swap(char &s1 , char &s2){
        char tmp = s1;
        s1 = s2;
        s2 = tmp;
    }

    void reverseString(vector<char>& s) {
        int i = 0;
        int j = s.size() - 1;
        while(i < j){
            swap(s[i] , s[j]);
            i++;
            j--;
        }
    }
};

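Why are we always told that we turn sentimental as soon as night falls?
Everyone goes through a dark stretch, when life suddenly feels exhausting and nothing goes right. Nobody understands you, and you cannot find anyone to talk to. Everyone is busy, so who would care about your mood?
The strength you held onto for so long collapses all at once.
It is the dead of night, go to sleep! What are you brooding over?
But a person who feels lost has a pen in their heart, full of sheep, and counts them again and again. I know there are many, many sheep, but losing even one still hurts.
I am only waiting for my sheep to come home.
And only at night do you suddenly find that you are living as yourself. You do not have to please anyone. Some people binge shows, some eat late-night snacks, some let their thoughts wander. Only in this moment is life both quiet and lively, lived to the full. Which adult does not keep a few masks in the wardrobe? The alarm rings and you start picking from the closet: which kind of cheerful shall I play today? This face of a hard-working, upbeat young person looks good. All right, this one, then!
When no one is around and the mask comes off, how freely you live!
But one day you grow afraid, afraid of a moment when you stand before the mirror and cannot take the mask off, when you search every wardrobe and cannot find your own real face.
I once watched a short film with a line that really struck me: you have about twenty thousand days left to live, so go do something meaningful. It is like having twenty thousand yuan: go buy something you like. Time is like money; once you start spending it, it runs out fast.
When you are small, time seems to pass one day at a time. A little older, it passes one week at a time. Somewhere past twenty you start working, and time passes one month at a time. You look back at your parents, and their time passes one year at a time. So keep your eyes on yourself, go do the things you want to do, and do not always let "those who have been through it" steer you off course. There are not that many great truths about life. In the end everyone lives their own life, just to be happy.
Sigh. (ó﹏ò。)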

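On a street in the small hours of a summer night, I saw a man in his forties squatting by the road, with a pile of empty bottles scattered at his feet. I walked over and looked at him, and he actually waved for me to sit down. I shook my head and walked past him. His eyes were desolate and made no attempt to keep me, as if my leaving was just what he had expected. But I have always loved hearing stories, and as if pushed by some unseen hand I went and bought 2 bottles of cold beer and sat down beside him!

He looked at me, and tears suddenly welled up in his eyes, blurring his sight. I never got a clear look at his face; I only saw that his tear-filled eyes were blurred and weathered.

Perhaps the small hours always make people sentimental, or perhaps alcohol gives a brief release; either way, I started drinking with him.

“Why haven't you gone home yet?”

“What about you?” the middle-aged man asked back in a hoarse voice.

“I quit my company to prepare for the graduate entrance exam. I came out for a walk to get some air.” I had not expected him to ask, so, caught off guard, I could only answer truthfully.

It had been a long time since I felt such trust between people. Since I started working, I have felt that when talking with people, strangers especially, every sentence I say has to be carefully weighed, and every sentence of theirs that reaches my ears has to be second-guessed for a long while. We do it to save face, and we lose our sincerity.

“How nice, to be able to quit and go back to studying. You must be financially free?”

“Ha, what financial freedom? It is all just making a living, getting by. You still haven't answered my question,” I said, curious.

“I have no home. What home would I go back to!”

“Aren't you married?”

“Married, then divorced!” The middle-aged man's face turned cold.

“Don't you have a job?” I went on.

“If I had a job I wouldn't have divorced. I graduated from university at twenty-three. Back then I was like you are now, and felt that life held endless hope and opportunity for me. University gave me an upright character, but once I entered society I found my uprightness laughable and naive. I worked at a school, but I could not stand the hypocrisy of my superior, who treated students as tools for making money. At one meeting I angrily berated him for an hour!” At this point the middle-aged man paused. His eyes seemed young again, and even the wrinkles at their corners were gone.

“He also harassed the female teachers. After the dinner he organized every year he would always ask female teachers to go singing. I laid out all his ugly deeds at a meeting of the whole faculty and student body!” I could not help admiring his courage and integrity. Then I asked, “And after that?”

“After that, there was no after that. Oh, there was! I was dismissed.”

“Why?” I asked, puzzled.

“No reason. If there has to be one, it is that my university failed me. It taught me integrity but forgot to teach me how to read the situation and shift the blame.”

“And after that?” I pressed on.

“After that I found temporary work, three or four thousand a month. My parents felt I had reached the age and told me to hurry up and marry. I also thought that getting married meant the task was done. The wife I married was very good. She was not after anything I had and did not look down on my job. We were happy.”

“Then why the divorce!”

“After we had a child, my work did not improve much. A person's character is hard to change, after all. I could not be bothered to flatter my bosses and had no wish to bow and scrape, but the pressure of life slowly bent my back. And so the conflicts kept growing, and finally, in the fifth year of our marriage, we divorced.”

“I see. No money!” I said quietly.

“Money. Come to think of it, university failed me there too. It taught me ideals but forgot to teach me that money is the precondition of ideals. Without money, there is nothing!”

“I have nothing!” the middle-aged man murmured, gazing at the sky with tears all over his face!

“This society is full of slaves to desire, while I have been a favored child of ideals all my life. Whoever will not wallow in the mire with the rest does not deserve to exist in this society!”

I shook my head, got up and walked into the distance. Neon lights flickered under the black night sky. Suddenly I stepped into a puddle, and it jolted me awake. I looked back at the spot where I had just been talking with the middle-aged man, but on the ground there were only 2 bottles lying about.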

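All in all, I am anxious and afraid, not knowing which way to go. I face the values that society and tradition have laid out. Once they said, you must study hard and concentrate on your books. I said, fine. They said, you must work hard and learn how to deal with people. I said, all right. They said, it is time you bought a house and a car, married and had children. But I thought, why? I cannot switch roles that fast! More to the point, why must I start a family and build a career, and all the rest? It seems I have no reason that I must do so. Because everyone else does? So my life, too, can only be like this? Right now all I can do every day is work out what to eat for the next meal, and later I will work out what my children and grandchildren like to eat, so I can cook it ahead of time and wait for them to come and enjoy it. And so, scraping a living and feeding a family, day after day, muddling through what is left of life? If one lives only in order to go on living and wait for death, then why live at all?

I do not want to become like that, a tree without roots, drifting with the current, living without knowing why and then dying for no apparent reason. I do not want to follow the prescribed steps and become a different person at each stage. Would I not then be forever split? Shouldn't my life be one whole? Is there no standard that holds throughout? Material things, fame and fortune can all be gained, and they will also leave me. I want to grasp the thing that is eternal, the meaning of my being alive.

God created the world in 7 days, and I have spent 700, hoping to learn what kind of person I can become.

From Socrates to Freud;

From Newton to Einstein;

From linguistics to computer science;

From Christianity to Confucianism, Buddhism and Taoism;

I have searched with all my strength for the meaning of my being alive.

I am sorry for having been born human.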

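I keep feeling that I am about to die. Is it fear? Or anticipation? If it is fear, is it of death or of pain? Probably more of pain. Does a person about to be beheaded fear the pain?
I am certainly afraid of pain, but not of death.
Because there is a next life; or, with enough cultivation, one can be spared the suffering of reincarnation.
Does attaining the Way make one an immortal? Or might I be a Martian in the next life? In any case, in a clean and kind environment, I would go on doing some things.
I never imagined that the ultimate goal of my life would be to become an immortal. Am I ridiculous? No. Why shouldn't it be?
That I can realize this much means I have already been enlightened.
What in the mortal world is worth clinging to? It is only a bridge, a push of the punt pole, a stone step, a press of the pedal.
Whenever I think this, I also feel deeply that I am too heartless.
Am I really so unmindful of the kindness I have received in this life!
This life, my life, is only mine, and another person's life concerns only them.
The gap between you and me is only two words wide, yet it spans mountains and rivers, three thousand worlds, ten thousand miles of drifting cloud, billions of light-years. What dealings do we really have? I cultivate my own life; it has nothing to do with others, only with the wind and the moon. It is only that you and I share some fate, and some feeling ties us together. Parents and children, friends and classmates, passersby and strangers: perhaps it is the same with all of them.
If I took wing and became an immortal, I would surely shed this skin. The skin brings lust and sound and form, brings illness and pain, brings everything… If in the next life I became a breeze or a fine rain among the mountains and waters, with no force binding me, only thought, would that not be free and easy wandering?
But how can one cultivate oneself to that point? Probably by being without desire and without intent.
But I desire to become an immortal, so how can I call myself free of desire?
Yes. This is exactly what Zhuangzi meant by “there is still something he depends on” (犹有所待者也).
Only when one depends on nothing has one entered the realm.
Then how can one depend on nothing?
Depend on nothing, depend on nothing: do not fear pain, do not go toward death, do not cling to life.
Do not surge within; do not be wounded by solitude.
Let the heart hang suspended, neither sinking nor floating, neither hurried nor envious.
As if only breathing.
How does one set out for free and easy wandering?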

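When death is very near
you find your life drifting ever further from that of the living
like the river channel between the underworld and this world
Though in the same world
you feel the daze of dimensions coming apart
The world lines cross less and less
There is no longer any sense of belonging in this world
A hazy, unreal feeling clouds the heart
You draw, of your own accord, a boundary between yourself and the living
and walk, unprompted, toward death

(I imagined what it feels like to be near death, specifically when everyone around you is gone: the feeling of being abandoned by the world.)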