LLM Contribution Policy: Attributing AI In Code & PR Checklist

by Editorial Team

Hey everyone, let's talk about something super important when we're coding: making sure we properly attribute any help we get from Large Language Models (LLMs) and stick to our licensing rules. This is all about keeping things transparent, legally sound, and giving credit where it's due. So, here's the lowdown on how we're going to make sure every new or tweaked file in our project gets the right headers and attribution.

The Lowdown: Why We're Doing This

So why the big fuss about LLM attribution and license headers? It comes down to playing by the rules and giving credit where it's deserved. Our project's policy says that if an AI, like an LLM, helps create or modify a file, we have to acknowledge it. We also need every file to carry the right license header. Imagine you got help with your homework: you'd credit your friend, right? Same idea here. Attribution tells everyone what help went into the code, and the license header spells out the terms of use, so we're all on the same page legally. This is more than a formality, guys; it's about being upfront about where our code comes from and building trust within our team and with anyone who uses our code.

This isn't a random requirement, either. It's a core part of our commitment to open-source principles. When we're open about the tools we use, including AI, the project becomes more trustworthy and easier for others to understand. Clear attribution tells other developers how a file was created, so they can evaluate and use it more effectively. Good documentation and clear licensing are also just good practice. Whether you're a coding newbie or a seasoned pro, this policy applies to you. When everyone follows the same standards, the project stays consistent, we avoid confusion down the line, and everything stays above board. So, let's get into the specifics of how we'll get this done.

Diving into the Details: What Needs to Happen

Alright, let's get into the nitty-gritty of what this means for you, the awesome coder. Whenever you create or update a source file, here's what you need to do. First, add the SPDX license header. SPDX is a standard, machine-readable way to declare which license governs how the code can be used; think of it as the code's official permission slip. Right after that, include the LLM attribution line. It's a short statement that tells everyone an AI had a hand in creating or modifying the file, and it reminds reviewers to check the file for correctness and security, so we've done our due diligence. The attribution line reads: "This file was created or modified with the assistance of an AI (Large Language Model). Review for correctness and security." It's the AI equivalent of a footnote acknowledging its contribution.
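If you'd rather not paste the header by hand, a small script can add it for you. Here's a minimal sketch, assuming Python files and the Apache-2.0 header used in the implementation steps; the names add_header, SPDX_LINE, and ATTRIBUTION_LINE are hypothetical and not part of any existing project tooling. It keeps a shebang line first, and it skips files that already carry the SPDX line, so running it twice won't stack headers. It doesn't handle encoding declarations (which must stay on the first or second line), so check those files by hand.

```python
# Hypothetical helper (not an existing project tool): prepend the SPDX header
# and LLM attribution line to the text of a Python source file.
from __future__ import annotations

import datetime

SPDX_LINE = "# SPDX-License-Identifier: Apache-2.0"
ATTRIBUTION_LINE = (
    "# This file was created or modified with the assistance of an AI "
    "(Large Language Model). Review for correctness and security."
)


def add_header(source: str, owner: str, year: int | None = None) -> str:
    """Return `source` with the three header lines on top.

    - A leading shebang line stays first so the script remains executable.
    - If the SPDX line is already present, `source` is returned unchanged
      (so it will NOT add a missing attribution line to an existing header).
    """
    if SPDX_LINE in source.splitlines():
        return source
    if year is None:
        year = datetime.date.today().year  # default to the current year
    header = f"{SPDX_LINE}\n# Copyright {year} {owner}\n{ATTRIBUTION_LINE}\n"
    if source.startswith("#!"):
        # Keep the shebang as line 1 and put the header right below it.
        shebang, _, rest = source.partition("\n")
        return f"{shebang}\n{header}{rest}"
    return header + source
```

For example, add_header("print('hi')\n", owner="Example Org", year=2024) returns the three header lines followed by print('hi'). Leaving out year uses the current year.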

We want to make this process easy for everyone, so we're updating the PR_BODY.md template with a checklist that covers LLM attribution and the validation commands. When you open a pull request (PR), everything you need will be right there. The checklist covers checking for the correct license header, confirming the LLM attribution line is present, and running the tests. The validation commands lint the code (check it for style issues and common errors), run the tests, and format the code, so everything's shipshape: well-formatted, consistent with our style guidelines, and not breaking anything our tests cover.
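The first two checklist items lend themselves to automation. Here's a minimal sketch of a check you could run before opening a PR, assuming Python files and the exact header text described above; missing_items, main, REQUIRED_LINES, and the five-line HEADER_WINDOW are hypothetical choices, not an existing project tool.

```python
# Hypothetical pre-PR check (not an existing project tool): does each file carry
# the SPDX license header and the LLM attribution line near the top?
from __future__ import annotations

from pathlib import Path

REQUIRED_LINES = {
    "SPDX license header": "# SPDX-License-Identifier: Apache-2.0",
    "LLM attribution": (
        "# This file was created or modified with the assistance of an AI "
        "(Large Language Model). Review for correctness and security."
    ),
}
# Assumption: the header sits in the first 5 lines (leaves room for a shebang).
HEADER_WINDOW = 5


def missing_items(text: str) -> list[str]:
    """Return the names of required header lines absent from the top of `text`."""
    top = [line.rstrip() for line in text.splitlines()[:HEADER_WINDOW]]
    return [name for name, line in REQUIRED_LINES.items() if line not in top]


def main(paths: list[str]) -> int:
    """Print one message per missing item; return 1 if anything is missing, else 0."""
    status = 0
    for path in paths:
        for name in missing_items(Path(path).read_text(encoding="utf-8")):
            print(f"{path}: missing {name}")
            status = 1
    return status
```

You'd call main with the files from your PR, for instance from a small script that ends with sys.exit(main(sys.argv[1:])); a non-zero exit status means at least one file is missing a required line.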

The Checklist and Commands: Your Coding Toolkit

Now, let's talk about the tools that keep everything on the up-and-up. The PR_BODY.md template is going to be your best friend: it's where you'll find the checklist and, more importantly, the validation commands that make up your coding toolkit. When you're ready to submit a PR, run these commands to make sure your changes are ready to go.

Here's a sneak peek at what you might find in the PR_BODY.md template:

  • Ruff Check & Format: ruff check . && ruff format . – This is your first line of defense. ruff check lints your code for style issues and common errors, and if it passes, ruff format rewrites your files to match our formatting standards. It's like having a coding assistant who keeps things tidy.
  • Mypy: mypy . – This statically checks your Python code for type errors. It's a lifesaver, catching potential bugs before the code ever runs.
  • Pytest: pytest tests/ -v --cov=. --cov-report=term-missing – This runs your test suite verbosely and prints a coverage report listing the lines your tests don't exercise. The --cov flags come from the pytest-cov plugin, so make sure it's installed. Passing tests tell you your code works and that you haven't broken anything the tests cover.
  • Flake8: flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics – This narrow selection skips style checks and flags only serious problems: syntax errors (E9), invalid comparisons and tests (F63), misplaced statements such as a break outside a loop (F7), and undefined names (F82). --show-source prints each offending line, and --count and --statistics summarize what was found.
  • Pylint: pylint . --output-format=text – Pylint analyzes your Python code for errors, style issues, and potential problems, and prints a detailed text report to help you improve code quality.
  • Make Native (if applicable): make native 2>&1 | tee native_build.txt – This runs the native build. 2>&1 merges error output into standard output, and tee shows everything in your terminal while also saving it to native_build.txt, so you can debug any build failures afterwards. Heads-up: the pipeline's exit status is tee's, not make's, unless you run it with set -o pipefail in bash.

These commands are your insurance policy: they catch style issues, type errors, and many bugs before they reach the main codebase. Running them is part of your responsibility when contributing to the project. Don't skip them, guys!
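If you want to run the checklist commands in one go, a small runner can chain them the way && does in a shell. Here's a minimal sketch, assuming each tool (including pytest-cov) is installed in your environment; first_failure and CHECKS are hypothetical names, not an existing project script, and the make native step is left out because it's a shell pipeline.

```python
# Hypothetical runner for the PR_BODY.md validation commands (not an existing
# project script). Assumes ruff, mypy, pytest + pytest-cov, flake8 and pylint
# are installed and on PATH.
from __future__ import annotations

import shlex
import subprocess

CHECKS: list[list[str]] = [
    ["ruff", "check", "."],
    ["ruff", "format", "."],
    ["mypy", "."],
    ["pytest", "tests/", "-v", "--cov=.", "--cov-report=term-missing"],
    ["flake8", ".", "--count", "--select=E9,F63,F7,F82", "--show-source", "--statistics"],
    ["pylint", ".", "--output-format=text"],
]


def first_failure(checks: list[list[str]]) -> str | None:
    """Run each command in order; return the first one that fails, or None if all pass.

    Like `a && b` in a shell, later commands are skipped once one fails.
    """
    for cmd in checks:
        printable = shlex.join(cmd)
        print("$", printable)
        try:
            result = subprocess.run(cmd, check=False)
        except FileNotFoundError:
            # The tool itself isn't installed; treat that as a failure too.
            return printable
        if result.returncode != 0:
            return printable
    return None
```

Calling first_failure(CHECKS) from the repository root returns None when every check passes, or the first command that failed, so you know exactly where to start fixing.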

Implementation: How to Get It Done

Okay, let’s get practical. Here’s a simple, step-by-step guide to adding the right headers and attribution to your files:

  1. Review the Files: Go through all the new or modified files in your PR. Make sure you haven't missed anything.

  2. Insert Headers: For Python files, put this at the very top of each file:

    # SPDX-License-Identifier: Apache-2.0
    # Copyright YEAR OWNER
    # This file was created or modified with the assistance of an AI (Large Language Model). Review for correctness and security.
    

    Replace YEAR with the current year and OWNER with the copyright holder's name. If a script starts with a shebang line (such as #!/usr/bin/env python3), keep the shebang first and put the header right below it. For other languages, use that language's comment syntax. It's just like adding normal comments to your code, except these comments have a specific purpose.

  3. Update PR_BODY.md: Open PR_BODY.md and add the checklist covering the license header, the LLM attribution line, and the validation commands.

  4. Commit Your Changes: Use the following commands to stage your changes and then commit them:

    git add <files> PR_BODY.md
    git commit -m