Last reviewed: 2025-01-11

Purpose: Principles for writing high-quality, maintainable tests that verify behavior without creating false confidence.
- **Test behavior, not internals** - Assert on return values and exceptions, never on `obj._private_attr`
- **Mock at boundaries** - Mock HTTP/network calls, not internal methods like `_extract_frames()`
- **Use Given/When/Then** - Every test needs these comments; they enforce behavioral thinking
- **Prefer real over fake** - Use `tmp_path` for files, real entry points, real pure functions
- **Capture API contracts** - When mocking, verify request structure, not just "was called"
Rule: Verify what the code does, not how it does it. Tests should pass if behavior is correct, even if implementation changes. If the only public surface is a library callback or hook, treat that callback as the boundary and assert on observable effects.
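When a callback is the only public surface, register it through the public API and assert on what the callback observed. A minimal sketch (the `AlertBus` class and all names are hypothetical, not from the real codebase):

```python
class AlertBus:
    """Hypothetical event bus: subscribers register callbacks via subscribe()."""

    def __init__(self) -> None:
        self._subscribers: list = []

    def subscribe(self, callback) -> None:
        self._subscribers.append(callback)

    def publish(self, alert: dict) -> None:
        for callback in self._subscribers:
            callback(alert)


def test_subscriber_receives_published_alert():
    # Given: a bus with one subscriber that records what it observes
    received: list[dict] = []
    bus = AlertBus()
    bus.subscribe(received.append)
    # When: an alert is published
    bus.publish({"risk_level": "high"})
    # Then: assert on the observable effect (what the callback received),
    # not on bus._subscribers or any other internals
    assert received == [{"risk_level": "high"}]
```

The callback here is the boundary: the test never inspects the bus's internal subscriber list, so the test survives any refactor that preserves delivery behavior.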
Bad - Testing internal state:

```python
async def test_shutdown_is_idempotent(self):
    notifier = MQTTNotifier(config)
    await notifier.shutdown()
    # BAD: Checking private attribute
    assert notifier._shutdown_called is True
    assert notifier._connected is False
```

Good - Testing observable behavior:
```python
async def test_shutdown_is_idempotent(self):
    notifier = MQTTNotifier(config)
    await notifier.shutdown()
    await notifier.shutdown()  # Should not raise
    # GOOD: Verify behavior via public interface
    result = await notifier.ping()
    assert result is False  # Shutdown state observable via ping
    # GOOD: Verify operations fail as expected
    with pytest.raises(RuntimeError, match="shut down"):
        await notifier.send(alert)
```

Rule: Mock external dependencies (HTTP, databases) at their boundary, not internal methods. This exercises more of your actual code.
Filesystem is a special case: prefer real tmp_path usage when cheap and deterministic.
Bad - Mocking internal methods:

```python
async def test_analyze_video(self):
    vlm = OpenAIVLM(config)
    # BAD: Mocking internal implementation details
    vlm._extract_frames = MagicMock(return_value=[...])
    vlm._resize_image = MagicMock(return_value=...)
    vlm._call_api = AsyncMock(return_value={"response": "..."})
    result = await vlm.analyze(video_path)
```

Good - Mocking at HTTP boundary:
```python
async def test_analyze_video(self, tmp_path: Path, monkeypatch: pytest.MonkeyPatch):
    # Create real test video
    video_path = tmp_path / "test.mp4"
    _create_test_video(video_path, frames=5)
    # Mock only at HTTP boundary
    mock_response = AsyncMock()
    mock_response.status = 200
    mock_response.json = AsyncMock(return_value={
        "choices": [{"message": {"content": '{"risk_level": "low"}'}}]
    })
    async_cm = AsyncMock()
    async_cm.__aenter__ = AsyncMock(return_value=mock_response)
    async_cm.__aexit__ = AsyncMock(return_value=None)
    mock_session = MagicMock()
    mock_session.post = MagicMock(return_value=async_cm)
    mock_session.close = AsyncMock()
    # Patch the module under test, not the global aiohttp import
    monkeypatch.setattr(
        "homesec.plugins.analyzers.openai.aiohttp.ClientSession",
        lambda **kw: mock_session,
    )
    # Now real frame extraction happens; only HTTP is mocked
    vlm = OpenAIVLM(config)
    result = await vlm.analyze(video_path)
    assert result.risk_level == "low"
```

Rule: When mocking external services, verify that your code calls them correctly by capturing and asserting on the request structure.
Bad - Just checking call count:

```python
async def test_sends_email(self):
    notifier = SendGridNotifier(config)
    await notifier.send(alert)
    # BAD: Only verifies something was called
    mock_session.post.assert_called_once()
```

Good - Verifying request contract:
```python
async def test_sends_email_with_correct_structure(self):
    captured_request: dict[str, Any] = {}
    async_cm = AsyncMock()
    async_cm.__aenter__ = AsyncMock(return_value=mock_response)
    async_cm.__aexit__ = AsyncMock(return_value=None)

    def capture_post(url: str, json: dict, headers: dict) -> AsyncMock:
        captured_request["url"] = url
        captured_request["json"] = json
        captured_request["headers"] = headers
        return async_cm

    mock_session.post = capture_post
    notifier = SendGridNotifier(config)
    await notifier.send(alert)
    # GOOD: Verify the actual API contract
    assert captured_request["url"] == "https://api.sendgrid.com/v3/mail/send"
    assert "Authorization" in captured_request["headers"]
    assert captured_request["json"]["personalizations"][0]["to"][0]["email"] == "user@example.com"
    assert "subject" in captured_request["json"]
```

Rule: Prefer real implementations over mocks when they're fast, deterministic, and don't require external resources.
Use real:
- Filesystem operations via the `tmp_path` fixture
- In-memory data structures
- Pure functions (validators, formatters, parsers)
- Entry point behavior via stubbed `entry_points()` (deterministic; no packaging dependency)

Mock:
- Network calls (HTTP, MQTT, etc.)
- Databases (unless using testcontainers)
- GPU/ML inference
- Time-sensitive operations
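Time-sensitive operations are easiest to mock when the clock is injectable: the test controls a fake time source instead of sleeping. A minimal sketch, assuming a hypothetical `RateLimiter` with a `time_source` hook (names are illustrative, not from the real codebase):

```python
import time


class RateLimiter:
    """Hypothetical limiter: allows one call per `interval` seconds."""

    def __init__(self, interval: float, time_source=time.monotonic) -> None:
        self._interval = interval
        self._time_source = time_source  # Injected so tests can control time
        self._last_call: float | None = None

    def allow(self) -> bool:
        now = self._time_source()
        if self._last_call is None or now - self._last_call >= self._interval:
            self._last_call = now
            return True
        return False


def test_rate_limiter_allows_after_interval():
    # Given: a limiter driven by a fake clock the test fully controls
    fake_now = [0.0]
    limiter = RateLimiter(interval=60.0, time_source=lambda: fake_now[0])
    # When/Then: the first call passes, an immediate retry is blocked
    assert limiter.allow() is True
    assert limiter.allow() is False
    # When: the fake clock advances past the interval
    fake_now[0] += 61.0
    # Then: the limiter allows the call again, with no real sleeping
    assert limiter.allow() is True
```

The test is deterministic and instant; the same behavior tested with `time.sleep()` would be slow and flaky.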
Example - Using real filesystem:

```python
async def test_put_get_delete_roundtrip(self, tmp_path: Path):
    # GOOD: Real filesystem via tmp_path
    storage = LocalStorage(LocalStorageConfig(root=str(tmp_path / "storage")))
    source_file = tmp_path / "source.mp4"
    source_file.write_bytes(b"video content")
    # Real operations on real filesystem
    result = await storage.put_file(source_file, "clips/test.mp4")
    assert await storage.exists(result.storage_uri)
    download_path = tmp_path / "downloaded.mp4"
    await storage.get(result.storage_uri, download_path)
    assert download_path.read_bytes() == b"video content"
```

Example - Using stubbed entry points (deterministic):
```python
def test_returns_stubbed_entry_points(self, monkeypatch):
    # GOOD: Deterministic entry points without relying on installation metadata
    fake_eps = [
        metadata.EntryPoint(name="homesec", value="homesec.cli:main", group="console_scripts")
    ]

    class _FakeEntryPoints(list):
        def select(self, group: str):
            return [ep for ep in self if ep.group == group]

    monkeypatch.setattr(
        "homesec.plugins.utils.metadata.entry_points",
        lambda: _FakeEntryPoints(fake_eps),
    )
    result = list(iter_entry_points("console_scripts"))
    assert [ep.name for ep in result] == ["homesec"]
```

Rule: When faking external clients, track which methods were called rather than setting internal state flags.
Bad - State flags in fake:

```python
class _FakeDropboxClient:
    def __init__(self):
        self.session_started = False
        self.session_finished = False
        self.session_appends = 0

    def files_upload_session_start(self, chunk):
        self.session_started = True  # Internal state flag
        return _FakeSession("session_1")

# Test checks internal state
assert client.session_started is True
assert client.session_appends >= 1
```

Good - Track API calls:
```python
class _FakeDropboxClient:
    def __init__(self):
        self.api_calls: list[str] = []  # Track what was called

    def files_upload_session_start(self, chunk):
        self.api_calls.append("files_upload_session_start")
        return _FakeSession("session_1")

    def files_upload_session_append_v2(self, chunk, cursor):
        self.api_calls.append("files_upload_session_append_v2")
        cursor.offset += len(chunk)

# Test verifies contract
assert "files_upload_session_start" in client.api_calls
assert "files_upload_session_append_v2" in client.api_calls
assert "files_upload_session_finish" in client.api_calls
```

Rule: Use fixtures to save and restore global state between tests. Never let one test pollute another.
Bad - Tests pollute each other:

```python
def test_sets_camera_name(self):
    set_camera_name("front_door")
    # Global state now modified for all subsequent tests!
```

Good - Fixture restores state:
```python
@pytest.fixture(autouse=True)
def reset_logging_state():
    """Reset global logging state before each test."""
    import homesec.logging_setup as module
    original_camera = module._CURRENT_CAMERA_NAME
    original_recording = module._CURRENT_RECORDING_ID
    yield
    # Restore after test completes
    module._CURRENT_CAMERA_NAME = original_camera
    module._CURRENT_RECORDING_ID = original_recording


def test_sets_camera_name(self):
    set_camera_name("front_door")
    # State automatically restored after test
```

Rule: If the test name says it tests something specific, the test must actually test that thing.
Bad - Misleading test name:

```python
async def test_handles_unicode_filenames(self):
    # Claims to test unicode but uses ASCII!
    result = await storage.put_file(source, "clips/test_file.mp4")
    assert await storage.exists(result.storage_uri)
```

Good - Test matches name:
```python
async def test_handles_unicode_filenames(self):
    # Actually tests unicode (Cyrillic and Japanese)
    result = await storage.put_file(source, "clips/камера_передняя.mp4")
    assert await storage.exists(result.storage_uri)
    result2 = await storage.put_file(source, "clips/玄関カメラ.mp4")
    assert await storage.exists(result2.storage_uri)
```

Rule: All tests must use Given/When/Then comments. This structure enforces behavioral thinking and makes tests self-documenting.
The pattern forces you to think about tests as behavior specifications:
- Given = Preconditions (setup state, create dependencies)
- When = Action (the single thing being tested)
- Then = Postconditions (observable outcomes)
This naturally leads to behavioral testing because:
- Given focuses on what state exists, not how to create internal state
- When tests one action, not implementation steps
- Then verifies observable outcomes, not internal changes
```python
async def test_filter_detects_person(self, tmp_path: Path):
    # Given: A video with a person visible
    video_path = tmp_path / "person.mp4"
    _create_test_video(video_path, frames=10)
    filter_plugin = YOLOFilter(config)
    # When: Running detection
    result = await filter_plugin.detect(video_path)
    # Then: Person should be detected with high confidence
    assert "person" in result.detected_classes
    assert result.confidence > 0.5
```

Focus on what state exists, not how to manipulate internals:
Bad:

```python
# Given: A notifier with _connected set to False
notifier = MQTTNotifier(config)
notifier._connected = False  # Manipulating internal state!
```

Good:

```python
# Given: A notifier that failed to connect
fake_client = _FakeClientNoConnect()
monkeypatch.setattr("...mqtt.Client", lambda: fake_client)
notifier = MQTTNotifier(config)  # Naturally not connected
```

Test one action. If you need multiple actions, you're testing a workflow (which may be valid for integration tests).
Bad:

```python
# When: Uploading and then checking
await storage.put_file(source, "test.mp4")
exists = await storage.exists(uri)
await storage.delete(uri)
```

Good:

```python
# When: Uploading a file
result = await storage.put_file(source, "test.mp4")
```

Or for lifecycle tests, make the workflow explicit:
```python
async def test_put_get_delete_roundtrip(self):
    """Full lifecycle: put -> exists -> get -> delete."""
    # Given: A storage instance and source file
    ...
    # When/Then: Each step of the lifecycle
    result = await storage.put_file(source, "test.mp4")
    assert await storage.exists(result.storage_uri)
    ...
```

Assert on observable behavior, not internal state:
Bad:

```python
# Then: Notifier should be shut down
assert notifier._shutdown_called is True
assert notifier._client is None
```

Good:

```python
# Then: Notifier should reject new operations
with pytest.raises(RuntimeError, match="shut down"):
    await notifier.send(alert)
# Then: Ping should return False
assert await notifier.ping() is False
```

| Section | Behavioral Focus | Anti-Pattern |
|---|---|---|
| Given | Set up via public APIs or realistic fakes | Directly setting `obj._private = value` |
| When | Call public methods | Call internal helpers or set flags |
| Then | Assert return values, exceptions, side effects | Assert on `obj._internal_state` |
The key insight: if you can't express your test in Given/When/Then without touching internals, you're testing implementation, not behavior.
| Test Type | What to Verify | What to Mock |
|---|---|---|
| Unit tests | Return values, exceptions raised, side effects | External services |
| Integration | Component interactions, data flow | Nothing (or just external APIs) |
| Contract | Request/response structure | Network layer only |
| Observable Behavior | Internal Implementation |
|---|---|
| Return values | Private attributes (`_foo`) |
| Raised exceptions | State flags |
| Side effects (files created, APIs called) | Method call order |
| Public method behavior | Internal helper methods |
- **Testing implementation details** - If refactoring breaks tests but behavior is unchanged, tests are too coupled
- **Mocking everything** - If you mock 5+ things per test, you're not testing real behavior
- **State flag assertions** - `assert obj._internal_flag is True` tests nothing useful
- **Misleading test names** - Test name must match what's actually being tested
- **Missing cleanup** - Global state must be restored between tests
- **Elaborate fakes that mirror implementation** - Fakes should be simple recorders, not reimplementations
- `AGENTS.md` - Development guidelines and patterns
- `DESIGN.md` - Architecture overview
- `tests/homesec/conftest.py` - Shared fixtures and mocks
- `tests/homesec/mocks/` - Mock implementations