Version for Students/Collaborators: Minimal Team Standards
Story Setup: From “Fighting Alone” to “Dragging Each Other Down”
Your research team has three people: you (a PhD student), a junior master’s student, and an undergraduate intern. You are working on the same project and should be collaborating.
But reality looks like this:
Monday Morning Stand-up
You: “I ran a new model over the weekend. The results look good—95% accuracy.”
Junior: “Great! Can you send me the code? I want to keep improving it based on that.”
You: “Uh… the code is on my computer and it’s kind of messy. I’ll organize it and send it to you.” (In fact, you have no idea where to start organizing.)
Intern: “Did you use the data preprocessing I did last week?”
You and the junior: “Uh… which version did you use?”
Intern: “The link I posted in the group chat…” (All three scroll through the chat history and can’t find it.)
Advisor: “What exactly are you three doing? Why are the numbers each of you reports different?”
Wednesday Code Conflicts
Junior: “Senior, I pushed the code. Pull it.”
You pull the code and get:
CONFLICT (content): Merge conflict in src/model.py
CONFLICT (content): Merge conflict in configs/default.yaml
CONFLICT (content): Merge conflict in train.py
You open the code and see that the junior modified almost every file. And many changes are incomprehensible to you—he added a bunch of new parameters without any comments.
You spend two hours resolving conflicts, only to discover in the end: your code was overwritten, and last week’s good results can no longer be reproduced.
Friday Data Disaster
Intern: “Senior, I accidentally deleted the data/ directory. Do you have a backup?”
You: “What?! That directory is 20GB—data I spent three days processing!”
Intern: “I thought it was temporary… Git wasn’t tracking it…”
Neither you nor the junior has a complete backup. You have to re-download the raw data and rerun three days of preprocessing.
A week passes. Not only has the project made no progress—it has actually regressed.
Why “Individually Strong” Does Not Equal “Team Efficient”
The counterintuitive part of teamwork is this: each individual may be highly capable, yet the team’s output is very low.
Three Major Collaboration Traps
Trap 1: Dependence on Tacit Knowledge
Everyone carries a lot of information in their head that “only they know”:
- Why this parameter is set to this value
- Why this code is written this way
- Why this experiment failed
- Why this dataset must be processed in this manner
When collaboration is required, this tacit knowledge becomes a bottleneck—others can only “wait until you have time to explain,” while you are interrupted repeatedly.
Trap 2: Duplicated Work
Without clear division of labor and interfaces, you end up with:
- Two people writing functionally identical code with different implementations
- Two people processing the same data in different ways
- Two people running the same experiment but recording it differently
On the surface it looks like “parallel work,” but in reality it is “wasted compute and time.”
Trap 3: Exploding Integration Costs
Everyone “does well” on their own branch, but when merging you find:
- Incompatible interfaces
- Dependency version conflicts
- Inconsistent configuration methods
- Inconsistent evaluation criteria
In the end, the time spent on “merging and alignment” exceeds the time spent on development.
Root Cause: Lack of “Team Standards”
A personal project can be “anything goes”—after all, only you need to understand it.
But a team project requires explicit standards:
- How code should be written
- How experiments should be recorded
- How changes should be merged
- How issues should be communicated
No standards = everyone uses their own approach = collaboration becomes impossible.
Minimal Standard 1: Coding Standards (From Chaos to Readability)
Naming Conventions: Make Code Self-Explanatory
File Naming
# ❌ Bad naming
test.py
new.py
model2.py
train_final.py
# ✅ Clear naming
train_baseline.py # Train the baseline model
train_with_attention.py # Train the model with attention
evaluate_on_testset.py # Evaluate on the test set
preprocess_raw_data.py # Preprocess raw data
Variable and Function Naming
# ❌ Bad naming
def f(x, y):
    z = x + y
    return z

# ✅ Clear naming
def compute_weighted_loss(prediction, target, weight):
    """
    Compute weighted loss
    Args:
        prediction: Model predictions (batch_size, num_classes)
        target: Ground-truth labels (batch_size,)
        weight: Class weights (num_classes,)
    Returns:
        loss: Weighted cross-entropy loss
    """
    raw_loss = cross_entropy(prediction, target)
    weighted_loss = raw_loss * weight[target]
    return weighted_loss.mean()
Naming principles:
- Use full words; avoid abbreviations (unless they are widely accepted, such as num, max, avg)
- Start function names with verbs (compute, load, save, train, evaluate)
- Use nouns for variable names (model, dataset, config, metrics)
- Prefix boolean variables with is/has/should (is_training, has_attention, should_save)
Commenting Standards: Written for Future You and Your Teammates
Places That Must Be Commented
- Function docstrings (every function must have one)

def train_one_epoch(model, dataloader, optimizer, device):
    """
    Train for one epoch
    Args:
        model: PyTorch model
        dataloader: Training data loader
        optimizer: Optimizer
        device: Device ('cuda' or 'cpu')
    Returns:
        avg_loss: Average loss
        accuracy: Training accuracy
    """
    ...

- Non-obvious logic

# Apply temperature scaling to attention weights to prevent softmax saturation
attention_scores = attention_scores / temperature

# Use gradient clipping to prevent exploding gradients
# The threshold is set to 1.0 based on preliminary experiments
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

- Known issues and TODOs

# TODO: The implementation here is inefficient and needs optimization
# Currently O(n^2) complexity; it will be slow when the dataset is large

# FIXME: Crashes when batch size = 1
# Temporary workaround: check at the outer level and skip

# NOTE: This hyperparameter has a large impact on the results
# Be sure to test on a small dataset before making changes
Where you should not add comments
# ❌ Do not comment on the obvious
x = x + 1 # Add 1 to x
# ❌ Do not use comments to "explain bad code"; refactor instead
# This function is complex, does many things—read it slowly...
def complicated_function():
...
# ✅ Refactor into clear functions
def load_data():
...
def preprocess_data(data):
...
def train_model(data):
...
PR template: use it even when working alone
A Pull Request (or Merge Request) template forces you to answer key questions before merging.
Create a PR template
# .github/pull_request_template.md
## Summary of changes
[Describe in one sentence what this PR does]
## Type of change
- [ ] New feature (feature)
- [ ] Bug fix (fix)
- [ ] Refactor (refactor)
- [ ] Documentation (docs)
- [ ] Test (test)
## Details
[Explain in detail what changed and why]
## How to verify
# Provide verification steps
make test
python train.py --config configs/test.yaml
## Potential risks
[What issues might this change introduce?]
## Related experiments
- Run ID: [If relevant, provide run_id]
- Results comparison: [Metric comparison between the new/old implementations]
## Checklist
- [ ] Code passes lint checks
- [ ] Necessary tests added
- [ ] Relevant documentation updated
- [ ] All tests pass
- [ ] Changes do not break existing functionality (regression testing)
Why “use it even when working alone”?
- Forces you to think about “what exactly does this change do?”
- Leaves a clear record for future teammates (or future you)
- Once it becomes a habit, team collaboration naturally becomes standardized
Minimal Standard 2: Experiment Standards (From Verbal to Auditable)
Standardize run_id naming
Problem: Everyone names experiments differently, causing confusion.
Solution: The team standardizes a run_id format.
# Team-wide standard format
YYYY-MM-DD_HHMM_<person>_<experiment>
For example:
2026-02-15_1030_zhangsan_baseline
2026-02-15_1400_lisi_attention_ablation
2026-02-16_0900_wangwu_data_augmentation
Benefits:
- Sortable by time
- You know whose experiment it is (so you know who to ask when something goes wrong)
- You know what the experiment is about
Standardize config management
Problem: Everyone uses different config formats, making comparisons impossible.
Solution: The team shares a single base config; individuals write only the differences.
configs/
  base.yaml             # Team baseline configuration
  people/
    zhangsan_*.yaml     # Zhangsan's personal experiment configs
    lisi_*.yaml         # Lisi's personal experiment configs
  paper/                # "Official" configs related to the paper
    baseline.yaml
    main_method.yaml
    ablation_*.yaml
base.yaml example
# configs/base.yaml
# Team baseline configuration; do not modify casually
# If changes are necessary, they must be discussed in a team meeting
model:
  type: "transformer"
  hidden_dim: 512
  num_layers: 6
data:
  path: "/data/shared/project_data_v3"
  batch_size: 32
  num_workers: 4
training:
  epochs: 100
  learning_rate: 3e-4
  optimizer: "adam"
evaluation:
  metric: "accuracy"
  eval_every: 1000
Personal config example
# configs/people/zhangsan_attention_test.yaml
# Inherit the baseline configuration
base: ../base.yaml
# Specify only the differences
model:
  attention_type: "multi_head"  # Change point
  num_heads: 8                  # New parameter added
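Note that YAML itself has no `base:` inheritance key; the loading code has to implement the merge. A minimal sketch of that step, assuming both files are already parsed into dicts (e.g. with PyYAML):

```python
def merge_configs(base: dict, override: dict) -> dict:
    """Recursively merge override into base; override values win on conflict."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)  # merge nested sections
        else:
            merged[key] = value  # scalar or new key: override wins
    return merged

base = {"model": {"type": "transformer", "hidden_dim": 512, "num_layers": 6}}
override = {"model": {"attention_type": "multi_head", "num_heads": 8}}
config = merge_configs(base, override)
# config["model"] now contains both the base keys and the personal overrides
```

Because the merge never mutates `base`, the shared baseline stays untouched no matter what individual configs do.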
Tagging Experiment Metadata
experiment:
  owner: "Zhang San"
  hypothesis: "Multi-head attention can improve accuracy by 2–3%"
  related_runs:
    - "2026-02-15_1030_zhangsan_baseline"
Standardize the logging format
Problem: Everyone records experiments differently, making it impossible to aggregate and compare.
Solution: Use standardized run.json and run.md templates (see Chapter 6).
Team conventions:
- Within 10 minutes after each experiment finishes, you must:
  - Generate run.json (automatic)
  - Fill in run.md (manually, 5 lines)
- Required fields in run.md:
  - Hypothesis: what this experiment aims to validate
  - Change: what was changed compared to what
  - Conclusion: what the result is (one sentence)
  - Next step: what to do next based on the result
  - @ Advisor: whether advisor attention is needed
- If you forget to record:
  - You will be reminded at the group meeting that week
  - If you forget twice consecutively: you must backfill all missing records
Minimal Standard 3: Communication Standards (From Verbal to Traceable)
Weekly meeting norm: only review reproducible runs
Problem: Weekly group meetings turn into a “verbal reporting contest”—claims sound impressive but lack evidence.
Solution: Discuss only experiments with a run_id.
Weekly meeting template
# Weekly Group Meeting (30–45 minutes)
## 1. Round-robin updates (5–10 minutes per person)
Format:
- Experiments completed this week: [list run_id]
- Key findings: [based on experimental data, not speculation]
- Issues encountered: [specific issues, not “not great”]
- Plan for next week: [specific goals, not “keep tuning hyperparameters”]
❌ Unacceptable updates:
- “I think this method should work” (no experimental support)
- “I ran many experiments” (without listing run_id)
- “The results are okay” (without concrete metrics)
✅ Good example: “I ran three experiments:
- 2026-02-15_1030_zhangsan_baseline: 92.0%
- 2026-02-15_1400_zhangsan_attention: 93.5% (+1.5%)
- 2026-02-16_0900_zhangsan_attention_v2: 93.2% (+1.2%)
Conclusion: the attention mechanism is effective, but the improvements introduced in v2 reduced performance instead.
Plan for next week: investigate why v2 is worse and try to fix it.”
## 2. Team decisions (10 minutes)
- Which experiments should be merged into the mainline?
- Which directions should be abandoned?
- Who is responsible for what next week?
## 3. Debt check (5 minutes)
- Check whether last week’s TODOs are completed
- Check whether there are unlogged experiments
- Check whether there is conflicting code that needs to be merged
## 4. Next-week assignments (5 minutes)
Clarify each person’s tasks and deliverables:
- Zhang San: complete ablation experiments (estimated 3 runs)
- Li Si: improve data augmentation (estimated 2 runs)
- Wang Wu: organize scripts for paper figures and tables
Asynchronous communication norm: use documents rather than verbal exchanges
Problem: Teammates interrupt you at inappropriate times to ask questions.
Solution: Build a culture of “read the docs first, then ask people.”
Team documentation structure
docs/
  README.md        # Project overview
  SETUP.md         # Environment setup guide
  WORKFLOW.md      # Workflow
  FAQ.md           # Frequently asked questions
  EXPERIMENTS.md   # Experiment tracking
  DECISIONS.md     # Record of important decisions
  CONTACTS.md      # Who owns what
FAQ.md example
# Frequently Asked Questions
## Q: How do I set up the environment?
See SETUP.md
## Q: Where is the data stored?
`/data/shared/project_data_v3`
Do not modify this directory; it is read-only.
## Q: How do I submit code?
1. Create a branch: git checkout -b exp/your-name-feature
2. Complete changes and test
3. Submit a PR (use the template)
4. Wait for review
5. Delete the branch after merging
## Q: What should I do if an experiment crashes?
1. Check outputs/<run_id>/error.log
2. Search the FAQ to see whether there is a similar issue
3. If you cannot find an answer, ask in an issue (do not message directly on WeChat)
## Q: How do I reproduce someone else’s experiment?
make reproduce RUN=<run_id>
## Q: Can I modify base.yaml?
No. It can be changed only after discussion in a team meeting. For temporary testing, you may create your own config.
## Q: My experimental results differ from someone else’s—what should I do?
1. Check whether you used the same config
2. Check whether you used the same data version
3. Check whether you used the same random seed
4. Discuss at the weekly meeting
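The "same random seed" check is easiest to pass when every script fixes its seeds in one place. A minimal standard-library sketch; the numpy/torch lines are commented out because they are assumptions about your stack:

```python
import os
import random

def set_seed(seed: int = 42) -> None:
    """Fix the sources of randomness so two runs of the same config agree."""
    random.seed(seed)
    # Only affects subprocesses launched after this point:
    os.environ["PYTHONHASHSEED"] = str(seed)
    # If your project uses them (assumption), also fix:
    # numpy.random.seed(seed)
    # torch.manual_seed(seed)

set_seed(123)
first = [random.random() for _ in range(3)]
set_seed(123)
second = [random.random() for _ in range(3)]
assert first == second  # identical draws after reseeding
```

Record the seed in run.json as well, so the check can be done without asking anyone.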
Issue tracking: make problems and tasks searchable
Problem: People discuss issues in the WeChat group, and three days later no one can find them.
Solution: Important issues must be filed as an issue (GitHub/GitLab/Jira).
Issue template
# Bug Report
**Problem description**: [briefly describe the issue]
**Steps to reproduce**:
1. Run the command: python train.py --config ...
2. Observe: [screenshots or logs]
**Expected behavior**: [what should happen]
**Actual behavior**: [what actually happens]
**Environment information**:
- Branch: [branch name]
- Commit: [commit hash]
- Python: [version]
- CUDA: [version]
**Related run_id**: [if any]
# Feature Request
**Request description**: [what functionality you want]
**Use case**: [why it is needed]
**Proposed implementation**: [optional]
# Task
**Task description**: [what needs to be done]
**Acceptance criteria**: [what counts as done]
**Owner**: [@someone]
**Due date**: [YYYY-MM-DD]
**Dependencies**: [does it depend on other tasks?]
Advisor’s perspective: how to help students build good habits
Day 1: Set expectations
Do not assume students will “naturally do it right.” You must explicitly communicate your expectations.
Onboarding checklist
# New Member Onboarding Checklist
Day 1 (Environment Setup)
- Set up the development environment (see SETUP.md)
- Obtain access to the code repository
- Obtain server access
- Obtain access to shared data
- Read README, WORKFLOW, FAQ
Week 1 (Familiarizing Yourself with the Workflow)
- Reproduce an existing experiment (verify the environment setup is correct)
- Run a small experiment (verify your understanding of the workflow)
- Record experiments comprehensively (run.json + run.md)
- Submit your first PR (even if it is very small)
Month 1 (Working Independently)
- Independently complete one exploratory direction
- Proactively identify and solve problems
- Be able to help onboard new members
Regular Reviews: Focus Not Only on Results, but Also on Process
Do not only ask “How are the results?” in weekly meetings; check “Is the process standardized?”
Weekly Code Review Checklist
Checklist:
- Are all experiments from this week fully documented?
- Does the run_id follow the naming convention?
- Have the config files been updated?
- Are commit messages clear?
- Are there improvements that should be merged?
- Is there any technical debt that needs to be addressed?
If issues are found, point them out immediately and require corrections. Do not say “Don’t do it again,” otherwise bad habits will become entrenched.
Reward Good Habits
Explicitly praise what was done well:
- “Your experiment records this week are very clear; I can understand them at a glance.”
- “Your PR description is very detailed; the review went smoothly.”
- “You proactively supplemented the documentation, which helped everyone.”
Make good habits part of the team culture.
From a Student’s Perspective: How to Survive in a Chaotic Project
If the Project Is Already Very Chaotic
You may encounter:
- Your advisor’s code has no documentation
- A senior student’s code does not run
- No one knows where the data is stored
- No one knows how a certain experiment was run
Survival strategies:
Strategy 1: Create a “Clean Zone” for Yourself
# Create your own subproject within a chaotic project
my_workspace/
  src/         # Your code (independent of the chaotic parts)
  configs/     # Your configurations
  outputs/     # Your experiment records
  docs/        # Your documentation
  README.md    # Description of your work
Even if the overall project is messy, at least your part is clear.
Strategy 2: Write Down Your Understanding
# docs/MY_UNDERSTANDING.md
## Project Goals
[What I understand the project goals to be]
## Existing Code
[Key files I found and their roles]
## Known Issues
[Issues I encountered and temporary workarounds]
## My Work
[What I am responsible for and my progress]
This document:
- Helps you clarify your thinking
- Provides a record for future handoffs
- Enables you to align understanding with your advisor
Strategy 3: Proactively Establish Standards
Even if the team has no standards, you can establish standards for yourself:
- All your experiments have a run_id and documentation
- Your code has clear comments
- All your commits have descriptions
Good habits will be noticed and may influence the team.
How to Ask for Help
❌ A poor way to ask for help:
“Senior, my code won’t run—can you take a look?” (no context at all)
✅ A good way to ask for help:
Senior, I ran into an issue while reproducing the baseline experiment. Could you help take a look?
**Issue**: It crashes at epoch 10 with the error CUDA out of memory
**My setup**:
- Branch: main
- Commit: a1b2c3d
- Config: configs/baseline.yaml
- GPU: V100 16GB
**What I tried**:
1. Reduce batch size to 16 (still crashes)
2. Reduce model hidden_dim to 256 (it runs, but the results are incorrect)
**Logs**: see attached error.log
**Question**: Are there other configurations that need adjustment, or do I need a larger GPU?
Clear help requests receive faster and more useful responses.
Conflict Resolution: Common Team Issues
Issue 1: Inconsistent Code Style
Solution: Enforce consistency with automated tools.
# Install formatter and linter
pip install black flake8 isort
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/psf/black
    rev: 23.1.0
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
# Install the pre-commit hook
pre-commit install
# Now formatting will run automatically before each commit
Issue 2: Experimental Results Do Not Match
Troubleshooting checklist:
- Same code version? (commit hash)
- Same configuration? (config diff)
- Same data? (data hash)
- Same random seed? (seed)
- Same environment? (Python/CUDA versions)
- Same evaluation script? (eval script)
Check item by item; you will always find the cause.
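Several of these checks (data hash, environment versions) can be captured automatically when a run starts. A minimal sketch using only the standard library; where the results are stored is up to your run.json convention:

```python
import hashlib
import platform
import sys

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def env_fingerprint() -> dict:
    """Record the interpreter and platform so mismatches are visible later."""
    return {"python": sys.version.split()[0], "platform": platform.platform()}
```

When two people's numbers differ, diffing the stored hashes and fingerprints narrows the cause to one checklist item instead of a guessing game.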
Issue 3: Unclear Responsibilities Lead to Duplicate Work
Solution: Use a Kanban board to visualize tasks.
# You can use GitHub Projects or Trello
Board columns:
- To Do
- In Progress
- Waiting for Review
- Done
Each task card:
- Title: [brief description]
- Owner: [@someone]
- Due date: [date]
- Dependencies: [dependent tasks]
- Status: [current progress]
Everyone can see “who is doing what,” avoiding duplication.
Issue 4: Misaligned Expectations Between Advisor and Student
Solution: Write expectations down explicitly and realign regularly.
# docs/EXPECTATIONS.md
## Advisor Expectations
- At least 3 documented experiments per week
- All code passes linting and tests
- Attend weekly meetings on time and report progress
- Report issues within 24 hours
## Student Expectations
- One 1-on-1 Q&A session per week
- Key decisions discussed in advance
- Code reviews completed within 48 hours
- Requirements stabilized one month before the paper deadline
## Mutual Commitments
- Communicate honestly and do not conceal problems
- Respect each other’s time
- Share responsibility for the project’s success
Making expectations explicit can prevent many latent conflicts.
10-Minute Action: Establish the Team’s First Standard
If you do only one thing right now: establish the team’s first minimal standard.
If You Are a Mentor/Project Lead (10 minutes)
# Create the team standards document
mkdir docs
cat > docs/TEAM_RULES.md <<EOF
# Minimal Team Standards
## Experiment Logging Standards
- Every experiment must have a run_id: YYYY-MM-DD_HHMM_name_experiment
- Every experiment must have run.json (automatic) and run.md (manual, 5 lines)
## Code Commit Standards
- Commit message format: `<type>`: `<description>`
- type: feat | fix | refactor | docs | test
- Run make test before committing
## Weekly Meeting Standards
- Fixed time: every Monday at 10:00
- Must prepare in advance: list run_id and key findings
- Discuss only content supported by experimental evidence
## Communication Standards
- Open a GitHub issue for important matters (do not discuss only on WeChat)
- Urgent matters may be handled on WeChat, but file an issue afterward
## Effective Date
Effective starting next Monday.
EOF
# Announce and discuss at the next group meeting
If You Are a Student/Team Member (10 minutes)
# Establish a personal working standard
cat > MY_WORKFLOW.md <<EOF
# My Workflow
## After each experiment (5 minutes)
- [ ] Generate run.json
- [ ] Fill in the five elements in run.md
- [ ] Commit relevant code changes
## Every Friday (15 minutes)
- [ ] Organize the list of this week’s experiments
- [ ] Prepare materials for the weekly meeting report
- [ ] Check for any missing records
## Before each code commit (2 minutes)
- [ ] Run make test
- [ ] Write a clear commit message
- [ ] If the change is important, open a PR
## Whenever I encounter a problem
- [ ] Check the FAQ and documentation first
- [ ] If no answer is found, open an issue (describe clearly)
- [ ] After the problem is solved, update the FAQ
EOF
# Start executing from today
Proposal for the Next Team Meeting (prepare a 5-minute statement)
Suggest saying in the meeting:
“I’ve noticed that our team is somewhat chaotic in experiment logging and code management. I suggest we establish some minimal standards, such as:
- A unified run_id format
- A unified approach to config management
- A weekly check of experiment record completeness
I drafted an initial version (see docs/TEAM_RULES.md), and everyone can discuss and add to it.
I suggest we pilot it starting next week and evaluate the results after one month.”
After completing this 10-minute action, you will find:
- The team has a “starting point”—a starting point for moving from chaos to order
- Everyone has a shared language—knowing what it means to “do it well”
- Subsequent improvements can be iterative—rather than remaining chaotic indefinitely
Chapter Summary: Standards Are Not Constraints, but the Foundation of Efficient Collaboration
The paradox of teamwork:
- No standards: everyone is “free,” but team efficiency is extremely low
- Over-standardization: processes become cumbersome and constrain creativity
- Minimal standards: just enough—ensuring quality while maintaining flexibility
Three levels of minimal standards:
- Coding standards: make code readable, maintainable, and collaborative
- Experiment standards: make experiments traceable, comparable, and reproducible
- Communication standards: make issues searchable, decisions traceable, and responsibilities clear
Remember:
- Standards are not “management” tools, but “collaboration” tools
- Standards are not “restrictions,” but “friction reduction”
- Standards are not “one-time setup,” but “continuous improvement”
Finally: If your team is chaotic right now, do not be discouraged. Start with a minimal standard and improve step by step. Even if you are the only one to begin, good habits will gradually influence the team.
High-performing teams are not “born”; they are built by establishing standards and executing them consistently.