webserv/.github/copilot-instructions.md
2025-11-04 10:22:52 +01:00


Webserv - AI Coding Agent Instructions

Project Overview

A C++20 HTTP/1.1 web server built on an epoll-based, event-driven architecture. Core components: configuration parser, HTTP request/response handling, CGI execution, static file serving, and routing.

Architecture Fundamentals

Event Loop & Request Flow

Client → epoll_wait → Server::handleEvent → Client → Router → Handler → Response

Critical pattern: The server uses a single epoll instance (Server::epoll_fd_) to multiplex I/O:

  1. Server::run() contains the main event loop, which calls epoll_wait() with a 10 ms timeout
  2. Events trigger specific handlers: EPOLLIN → request reading, EPOLLOUT → response writing
  3. Each Client manages its own sockets and handler state machines
  4. Sockets transition through states tracked in ASocket::IoState (READ/WRITE)

Why this matters: All I/O is non-blocking. Never call blocking operations. Use socket->setIOState() and server.update(socket) to change epoll interest masks.
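The loop described above can be sketched with raw epoll calls as follows. This is a minimal illustration, not webserv's Server::run(): runOnce(), addFd(), makeNonBlocking() and EventCallbacks are hypothetical names, and the 10 ms default only mirrors the timeout mentioned in step 1.

```cpp
#include <cassert>
#include <fcntl.h>
#include <functional>
#include <sys/epoll.h>
#include <unistd.h>

// Puts an fd into non-blocking mode; every fd handed to epoll should be non-blocking.
bool makeNonBlocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    return flags >= 0 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Registers an fd with the given interest mask (e.g. EPOLLIN or EPOLLOUT).
bool addFd(int epfd, int fd, unsigned events) {
    epoll_event ev{};
    ev.events = events;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0;
}

// Hypothetical callbacks standing in for the dispatch in Server::handleEvent().
struct EventCallbacks {
    std::function<void(int)> onReadable;  // EPOLLIN: read request data
    std::function<void(int)> onWritable;  // EPOLLOUT: write response data
};

// One iteration of the event loop. Returns the number of ready fds,
// 0 on timeout, or -1 on error (e.g. EINTR, which callers typically retry).
int runOnce(int epfd, const EventCallbacks& cb, int timeoutMs = 10) {
    epoll_event events[64];
    int n = epoll_wait(epfd, events, 64, timeoutMs);
    for (int i = 0; i < n; ++i) {
        int fd = events[i].data.fd;
        if ((events[i].events & EPOLLIN) && cb.onReadable) cb.onReadable(fd);
        if ((events[i].events & EPOLLOUT) && cb.onWritable) cb.onWritable(fd);
    }
    return n;
}
```

Because epoll_wait() returns after the timeout even when nothing is ready, the loop regularly regains control, which is what allows timers and housekeeping to run without any blocking call.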

Configuration System

Three-tier hierarchy: GlobalConfig → ServerConfig → LocationConfig

Directive resolution: Uses inheritance via AConfig::get<T>(name), which searches the current config and falls back to the parent. Example:

auto maxBodySize = locationConfig->get<size_t>("client_max_body_size")
    .value_or(serverConfig->get<size_t>("client_max_body_size").value_or(1048576));
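A minimal sketch of parent-chain lookup, assuming a config node that stores raw string values plus a parent pointer. MiniConfig is a hypothetical stand-in for AConfig, and converting values with operator>> is a simplification. If get<T>() walks the whole chain like this, the explicit serverConfig fallback in the example above is redundant and only the final value_or(1048576) default (1 MiB) is still needed; whether webserv's AConfig walks every level is an assumption here.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical stand-in for AConfig: raw directive values stored as strings,
// with lookups walking up to the parent block when a directive is not set locally.
class MiniConfig {
public:
    explicit MiniConfig(const MiniConfig* parent = nullptr) : parent_(parent) {}

    void set(const std::string& name, const std::string& value) { values_[name] = value; }

    template <typename T>
    std::optional<T> get(const std::string& name) const {
        for (const MiniConfig* c = this; c != nullptr; c = c->parent_) {
            auto it = c->values_.find(name);
            if (it == c->values_.end()) continue;  // not here: try the parent block
            std::istringstream in(it->second);
            T value{};
            if (in >> value) return value;
            return std::nullopt;  // present but not convertible to T: no fallback
        }
        return std::nullopt;  // not set anywhere in the hierarchy
    }

private:
    const MiniConfig* parent_;
    std::map<std::string, std::string> values_;
};
```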

Validation architecture: Two-stage validation in ConfigValidator:

  1. Structural rules (AStructuralValidationRule): Check block-level requirements (e.g., RequiredDirectivesRule)
  2. Directive rules (AValidationRule): Validate individual directive values (e.g., PortValidationRule)

Rules are registered in the ConfigValidator constructor and executed by the ValidationEngine.
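The two stages can be pictured with the simplified engine below. Block, StructuralRule, DirectiveRule, RequiredDirectives, PortRule and Engine are hypothetical, trimmed-down counterparts of the classes named above; the real classes carry more information, such as the context printed with each error.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified block: directive name -> raw value.
using Block = std::map<std::string, std::string>;

// Stage 1: rules that inspect a whole block (cf. AStructuralValidationRule).
struct StructuralRule {
    virtual ~StructuralRule() = default;
    virtual std::vector<std::string> check(const Block& block) const = 0;
};

// Stage 2: rules that inspect a single directive value (cf. AValidationRule).
struct DirectiveRule {
    virtual ~DirectiveRule() = default;
    virtual bool valid(const std::string& value) const = 0;
};

struct RequiredDirectives : StructuralRule {
    std::set<std::string> required;
    explicit RequiredDirectives(std::set<std::string> r) : required(std::move(r)) {}
    std::vector<std::string> check(const Block& block) const override {
        std::vector<std::string> errors;
        for (const auto& name : required)
            if (block.count(name) == 0) errors.push_back("missing directive: " + name);
        return errors;
    }
};

struct PortRule : DirectiveRule {
    bool valid(const std::string& value) const override {
        if (value.empty() || value.size() > 5) return false;
        for (char c : value)
            if (c < '0' || c > '9') return false;
        int port = std::stoi(value);
        return port >= 1 && port <= 65535;
    }
};

// Runs structural rules first, then per-directive rules (cf. ValidationEngine).
class Engine {
public:
    void addStructuralRule(std::unique_ptr<StructuralRule> r) { structural_.push_back(std::move(r)); }
    void addDirectiveRule(const std::string& name, std::unique_ptr<DirectiveRule> r) {
        directive_[name] = std::move(r);
    }
    std::vector<std::string> validate(const Block& block) const {
        std::vector<std::string> errors;
        for (const auto& rule : structural_) {
            auto e = rule->check(block);
            errors.insert(errors.end(), e.begin(), e.end());
        }
        for (const auto& [name, value] : block) {
            auto it = directive_.find(name);
            if (it != directive_.end() && !it->second->valid(value))
                errors.push_back("invalid value for " + name + ": " + value);
        }
        return errors;
    }

private:
    std::vector<std::unique_ptr<StructuralRule>> structural_;
    std::map<std::string, std::unique_ptr<DirectiveRule>> directive_;
};
```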

Context-aware directives: The DirectiveFactory uses a context string ("GSL" = Global/Server/Location) to restrict where directives can appear. Check DirectiveFactory::supportedDirectives when adding new directives.
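A sketch of how such a context check can work, assuming a table that maps each directive name to its context string. The table entries below are illustrative only and are not copied from DirectiveFactory::supportedDirectives.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical excerpt of a supported-directives table: directive -> contexts
// where it may appear ('G' = global, 'S' = server, 'L' = location).
const std::map<std::string, std::string> kSupportedDirectives = {
    {"client_max_body_size", "GSL"},
    {"listen", "S"},
    {"root", "SL"},
};

// True if `directive` is known and may appear in block context `ctx`.
bool allowedIn(const std::string& directive, char ctx) {
    auto it = kSupportedDirectives.find(directive);
    return it != kSupportedDirectives.end() && it->second.find(ctx) != std::string::npos;
}
```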

CGI Execution Pipeline

Process model: pipe2() for stdin/stdout/stderr → fork() → execve() in child

Critical implementation details:

  1. Use pipe2() with O_CLOEXEC | O_NONBLOCK: O_CLOEXEC keeps pipe fds from leaking across execve(), O_NONBLOCK keeps pipe I/O from blocking the event loop
  2. Child process: dup2() pipes to std streams, call Log::clearChannels() before execve()
  3. Parent: Wrap pipe fds in CgiSocket objects, register with Client::addSocket()
  4. Environment: CgiEnvironment class builds CGI/1.1 compliant env vars (required: GATEWAY_INTERFACE, SERVER_PROTOCOL, REQUEST_METHOD, etc.)
  5. Timeout handling: TimerSocket with timerfd_create() registered in epoll
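Steps 1 to 3 can be sketched as follows; stderr, Log::clearChannels() and CgiEnvironment are left out, and CgiChild and spawnCgi() are hypothetical names. The pipes have to exist before fork() so that the child inherits them. One design choice worth noting: pipe2(O_NONBLOCK) marks both ends of each pipe non-blocking, including the ends that become the script's stdin and stdout, which many CGI programs do not expect. This sketch therefore creates the pipes with O_CLOEXEC only and sets O_NONBLOCK on the parent's ends with fcntl(); whether webserv does the same is not stated.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Parent-side handles of a spawned CGI process (hypothetical struct).
struct CgiChild {
    pid_t pid = -1;
    int stdinFd = -1;   // parent writes the request body here
    int stdoutFd = -1;  // parent reads the script's output here
};

// pipe2() -> fork() -> dup2() -> execve(). Returns pid == -1 on failure.
CgiChild spawnCgi(const std::string& path, const std::vector<std::string>& env) {
    CgiChild child;
    int in[2], out[2];
    // O_CLOEXEC: after execve() the child keeps only the dup2()'d copies, so it
    // never holds the write end of its own stdin (which would prevent EOF).
    if (pipe2(in, O_CLOEXEC) != 0) return child;
    if (pipe2(out, O_CLOEXEC) != 0) {
        close(in[0]);
        close(in[1]);
        return child;
    }

    // Build argv/envp before fork(): the child should only call async-signal-safe functions.
    const char* prog = path.c_str();
    std::vector<char*> argv{const_cast<char*>(prog), nullptr};
    std::vector<char*> envp;
    for (const auto& e : env) envp.push_back(const_cast<char*>(e.c_str()));
    envp.push_back(nullptr);

    pid_t pid = fork();
    if (pid < 0) {
        close(in[0]); close(in[1]); close(out[0]); close(out[1]);
        return child;
    }
    if (pid == 0) {
        // Child: wire the pipes to stdin/stdout (dup2() clears FD_CLOEXEC on the copies).
        if (dup2(in[0], STDIN_FILENO) < 0 || dup2(out[1], STDOUT_FILENO) < 0) _exit(127);
        execve(prog, argv.data(), envp.data());
        _exit(127);  // reached only if execve() failed
    }
    // Parent: close the child's ends and make ours non-blocking for epoll.
    close(in[0]);
    close(out[1]);
    fcntl(in[1], F_SETFL, fcntl(in[1], F_GETFL) | O_NONBLOCK);
    fcntl(out[0], F_SETFL, fcntl(out[0], F_GETFL) | O_NONBLOCK);
    child.pid = pid;
    child.stdinFd = in[1];
    child.stdoutFd = out[0];
    return child;
}
```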

State machine: CgiHandler writes request body → reads response headers → parses headers → reads body → waitpid(WNOHANG) to check status.
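The final waitpid(WNOHANG) step can be sketched as a helper that distinguishes a still-running child from a finished one. pollCgiExit() and CgiOutcome are hypothetical names; following the Error Handling section, a Failure outcome would translate into response.setStatus(500).

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum class CgiOutcome { Running, Success, Failure };

// Non-blocking status check for a CGI child. Returns Running while the child
// is alive, Success for exit code 0, and Failure for a non-zero exit code,
// a terminating signal, or a waitpid() error.
CgiOutcome pollCgiExit(pid_t pid) {
    int status = 0;
    pid_t r = waitpid(pid, &status, WNOHANG);
    if (r == 0) return CgiOutcome::Running;  // check again on a later event
    if (r < 0) return CgiOutcome::Failure;   // e.g. already reaped
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? CgiOutcome::Success
                                                          : CgiOutcome::Failure;
}
```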

HTTP Request Parsing

State machine in HttpRequest::State: RequestLine → Headers → Body/Chunked → Complete/ParseError

Chunked transfer encoding: Implemented in parseBufferforChunkedBody():

  • Read chunk size (hex) → validate → read chunk data and its trailing CRLF → repeat until a zero-size chunk
  • Parse errors set State::ParseError and call response.setError(400)
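For reference, a standalone decoder for a fully buffered chunked body might look like this. decodeChunked() is a hypothetical name: unlike parseBufferforChunkedBody(), it is not incremental and it rejects trailer fields, and returning std::nullopt stands in for setting State::ParseError and responding with 400.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Decodes a complete chunked body; std::nullopt means malformed input (-> 400).
std::optional<std::string> decodeChunked(std::string_view in) {
    std::string body;
    std::size_t pos = 0;
    while (true) {
        std::size_t eol = in.find("\r\n", pos);
        if (eol == std::string_view::npos) return std::nullopt;
        std::string_view sizeField = in.substr(pos, eol - pos);
        sizeField = sizeField.substr(0, sizeField.find(';'));  // drop chunk extensions
        if (sizeField.empty() || sizeField.size() > 15) return std::nullopt;  // avoid overflow
        std::size_t size = 0;
        for (char c : sizeField) {
            unsigned char uc = static_cast<unsigned char>(c);
            if (!std::isxdigit(uc)) return std::nullopt;
            int digit = std::isdigit(uc) ? uc - '0' : std::tolower(uc) - 'a' + 10;
            size = size * 16 + static_cast<std::size_t>(digit);
        }
        pos = eol + 2;
        if (size == 0) {
            // Last chunk: only the final empty line may follow (no trailers here).
            if (in.substr(pos) != "\r\n") return std::nullopt;
            return body;
        }
        // Chunk data must be followed by CRLF.
        if (in.size() - pos < size + 2 || in.substr(pos + size, 2) != "\r\n") return std::nullopt;
        body.append(in.substr(pos, size));
        pos += size + 2;
    }
}
```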

Critical validation: The Host header is mandatory in HTTP/1.1; it is checked in setState(State::Complete).

Build & Test System

Build Configuration

  • CMake build types: Release (default), Debug, ASAN (AddressSanitizer)
  • Makefile wrapper: make release, make debug and make asan build the corresponding configuration
  • Environment detection: the Makefile records in build/.build-env whether the last build ran in a container or locally, and cleans automatically when you switch

Test Commands

make test              # Build + run unit tests (Google Test)
make test_verbose      # Run with detailed output
make coverage          # Generate coverage report (requires lcov or gcovr)
./webserv-tester/bin/run_tests.py [--suite SUITE] [--test TEST]

Test structure:

  • Unit tests: tests/ directory, organized by component
  • Integration tests: webserv-tester/ Python test framework
  • Test config: webserv-tester/data/conf/test.conf (port 8080)

Integration Testing with webserv-tester

The webserv-tester/ directory contains a comprehensive Python-based integration test framework that validates HTTP/1.1 compliance, configuration handling, and feature implementation.

Running the tester:

# Run all tests (automatically starts/stops server)
./run_test.sh

# Run specific test suite(s)
./run_test.sh basic
./run_test.sh http
./run_test.sh cgi

Available test suites (in webserv-tester/tests_suites/):

  • basic (basic_tests.py): Smoke tests for fundamental functionality (server start, static files, basic requests)
  • http (http_tests.py): HTTP/1.1 protocol compliance (headers, status codes, chunked encoding, keep-alive, malformed requests)
  • cgi (cgi_tests.py): CGI/1.1 execution (environment variables, stdin/stdout handling, timeouts, error handling)
  • method (method_tests.py): HTTP method support per location (GET, POST, DELETE validation against config)
  • config (config_tests.py): Configuration directives (inheritance, root, index, autoindex, error pages, redirects, location matching)
  • invalid (invalid_config_tests.py): Error handling for malformed configs (missing directives, invalid contexts, syntax errors)
  • upload (upload_tests.py): File upload functionality
  • uri (uri_tests.py): URI parsing and handling
  • redirect (redirect_tests.py): HTTP redirect handling
  • cookie (cookie_tests.py): Cookie handling
  • security (security_tests.py): Security-related tests
  • performance (performance_tests.py): Performance benchmarks

Test framework architecture:

  • core/test_case.py: Base class for all tests with assertion helpers
  • core/server_manager.py: Manages server process lifecycle (start/stop/restart)
  • core/test_runner.py: HTTP request utilities and response validation
  • data/conf/test.conf: Test server configuration (port 8080, multiple locations)
  • data/www/: Test web content (HTML, CGI scripts, static files)

Writing new tests: Tests inherit from the TestCase class and follow this pattern:

class MyTests(TestCase):
    def test_my_feature(self):
        response = self.runner.send_request('GET', '/path')
        self.assert_equals(response.status_code, 200, "Expected 200 OK")
        self.assert_true('Content-Type' in response.headers, "Missing header")

See the test sources in webserv-tester/tests_suites/ to understand the existing scenarios, and add new tests there for your features.

Code Conventions

Include Order (enforced by .clang-format)

  1. Own header ("Class.hpp")
  2. Project headers (<webserv/path/Header.hpp>)
  3. C++ standard library (<string>)
  4. C headers (<unistd.h>)

Logging Pattern

Use Log::trace(LOCATION) at function entry for debugging. Available levels: trace, debug, info, warning, error, fatal.

Important: Always log before throwing exceptions or returning errors.

Error Handling

  • HTTP errors: Call ErrorHandler::createErrorResponse(statusCode, response, config) - handles custom error pages
  • Validation errors: Throw RequestValidator::ValidationException{statusCode} in Router
  • Config errors: Throw std::runtime_error with descriptive message during parsing
  • CGI errors: Check cgiProcess_->getExitCode(), set response.setStatus(500) if non-zero
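A sketch of the exception-based validation flow, using hypothetical stand-ins: ValidationException mirrors RequestValidator::ValidationException{statusCode}, and routeStatus() returns the status code at the point where the Router would build an error response via ErrorHandler::createErrorResponse(). The 405 case (method not allowed in the matched location) is an illustrative choice.

```cpp
#include <cassert>
#include <exception>
#include <set>
#include <string>

// Hypothetical stand-in for RequestValidator::ValidationException.
struct ValidationException : std::exception {
    int statusCode;
    explicit ValidationException(int code) : statusCode(code) {}
    const char* what() const noexcept override { return "request validation failed"; }
};

// Throws 405 when the method is not allowed in the matched location.
void validateMethod(const std::string& method, const std::set<std::string>& allowed) {
    if (allowed.count(method) == 0) throw ValidationException{405};
}

// Router-style wrapper: a validation failure becomes the response status.
int routeStatus(const std::string& method, const std::set<std::string>& allowed) {
    try {
        validateMethod(method, allowed);
        return 200;
    } catch (const ValidationException& e) {
        return e.statusCode;  // webserv would build the error page here
    }
}
```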

Memory Management

  • Use std::unique_ptr for ownership (e.g., Client owns ClientSocket)
  • Use raw pointers or references for non-owning access (e.g., Client holds a Server&)
  • Socket ownership: Server owns ServerSocket, Client owns ClientSocket and CgiSocket
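These ownership rules translate into members like the ones below. The class shapes are hypothetical and reduced to what the rules need: unique_ptr members for owned sockets, a reference to the Server, and a non-owning raw pointer handed back by addSocket().

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct ASocket { virtual ~ASocket() = default; };  // hypothetical minimal base
struct ClientSocket : ASocket {};
struct CgiSocket : ASocket {};
struct Server {};

// Client owns its sockets (unique_ptr) and only refers to the Server (reference).
class Client {
public:
    Client(Server& server, std::unique_ptr<ClientSocket> socket)
        : server_(server), socket_(std::move(socket)) {}

    // Takes ownership of an extra socket (e.g. a CGI pipe) and returns a
    // non-owning pointer that the caller must not delete.
    CgiSocket* addSocket(std::unique_ptr<CgiSocket> s) {
        cgiSockets_.push_back(std::move(s));
        return cgiSockets_.back().get();
    }
    Server& server() const { return server_; }
    std::size_t socketCount() const { return 1 + cgiSockets_.size(); }

private:
    Server& server_;                                      // non-owning
    std::unique_ptr<ClientSocket> socket_;                // owning
    std::vector<std::unique_ptr<CgiSocket>> cgiSockets_;  // owning
};
```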

Common Patterns & Gotchas

Adding a New Handler

  1. Inherit from AHandler, implement handle() and handleTimeout()
  2. Register in Router::handleRequest() based on URI properties
  3. Use startTimer() from the base class if the operation may block
  4. Mark the response complete with response_.setComplete()
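A minimal handler following these steps might look like this. Response, AHandler and TextHandler are simplified, hypothetical versions (startTimer() and the Router registration are omitted), and the 504 status in handleTimeout() is only an example choice.

```cpp
#include <cassert>
#include <string>

// Hypothetical minimal response; the real HTTP response class differs.
struct Response {
    int status = 0;
    std::string body;
    bool complete = false;
    void setComplete() { complete = true; }
};

// Hypothetical minimal handler base with the two required hooks.
class AHandler {
public:
    explicit AHandler(Response& response) : response_(response) {}
    virtual ~AHandler() = default;
    virtual void handle() = 0;         // called when the handler can make progress
    virtual void handleTimeout() = 0;  // called when the handler's timer fires
protected:
    Response& response_;
};

// Example handler that answers immediately, so it never needs a timer.
class TextHandler : public AHandler {
public:
    using AHandler::AHandler;
    void handle() override {
        response_.status = 200;
        response_.body = "ok";
        response_.setComplete();
    }
    void handleTimeout() override {
        response_.status = 504;  // example status for a timed-out operation
        response_.setComplete();
    }
};
```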

Adding a Configuration Directive

  1. Add to DirectiveFactory::supportedDirectives with context string
  2. Create validation rule implementing AValidationRule
  3. Register in ConfigValidator constructor: engine_->addServerRule(name, std::make_unique<Rule>())
  4. Access in code: config->get<Type>("directive_name")

Socket State Management

Critical: After modifying socket interest (read→write or vice versa):

socket->setIOState(ASocket::IoState::WRITE);
socket->markDirty();  // Flags for epoll update
// Server polls dirty sockets in pollSockets() and calls update()
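The dirty flag defers the epoll_ctl() call until the server's next polling pass. A sketch of that flush step, assuming a socket struct with an fd, an IoState and a dirty flag; Sock and flushDirty() are hypothetical names, not webserv's pollSockets() or update().

```cpp
#include <cassert>
#include <sys/epoll.h>
#include <unistd.h>
#include <vector>

enum class IoState { READ, WRITE };

// Hypothetical socket with a dirty flag, mirroring setIOState() + markDirty().
struct Sock {
    int fd;
    IoState state = IoState::READ;
    bool dirty = false;
    void setIOState(IoState s) { state = s; }
    void markDirty() { dirty = true; }
};

// Applies pending interest changes with EPOLL_CTL_MOD and clears the flags.
// Returns the number of sockets updated.
int flushDirty(int epfd, const std::vector<Sock*>& sockets) {
    int updated = 0;
    for (Sock* s : sockets) {
        if (!s->dirty) continue;
        epoll_event ev{};
        ev.events = (s->state == IoState::WRITE) ? EPOLLOUT : EPOLLIN;
        ev.data.fd = s->fd;
        if (epoll_ctl(epfd, EPOLL_CTL_MOD, s->fd, &ev) == 0) {
            s->dirty = false;
            ++updated;
        }
    }
    return updated;
}
```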

URI Resolution

URI class handles path resolution:

  • matchConfig(): Longest prefix match for location blocks
  • getFullPath(): Resolves root + location path + request path
  • isCgi(): Checks if path matches cgi_ext directive
  • isRedirect(): Checks for redirect directive
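Longest-prefix matching and the extension check can be sketched as below. matchLocation() and hasCgiExtension() are hypothetical helpers, not URI::matchConfig() and URI::isCgi(). The matcher compares plain string prefixes; whether webserv also requires a '/' boundary (so that /imagesX would not match /images) is not stated.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Picks the longest configured location that is a prefix of the request path.
std::optional<std::string> matchLocation(const std::string& path,
                                         const std::vector<std::string>& locations) {
    std::optional<std::string> best;
    for (const auto& loc : locations) {
        bool isPrefix = path.compare(0, loc.size(), loc) == 0;
        if (isPrefix && (!best || loc.size() > best->size())) best = loc;
    }
    return best;
}

// True if the path ends with the configured CGI extension (e.g. ".py").
bool hasCgiExtension(const std::string& path, const std::string& ext) {
    return path.size() >= ext.size() &&
           path.compare(path.size() - ext.size(), ext.size(), ext) == 0;
}
```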

Testing Best Practices

Unit Test Structure

Follow GTest patterns in tests/:

  • Use test fixtures inheriting from ::testing::Test
  • Name tests descriptively: TEST_F(ClassTest, MethodName_Scenario_ExpectedBehavior)
  • One assertion per logical check
  • Mock external dependencies (sockets, file I/O)

Integration Test Organization

webserv-tester/tests_suites/ contains:

  • basic_tests.py: Smoke tests (server start, static files)
  • http_tests.py: Protocol compliance (headers, status codes, chunked encoding)
  • cgi_tests.py: CGI execution and environment variables
  • method_tests.py: HTTP method support per location
  • config_tests.py: Directive inheritance and validation
  • invalid_config_tests.py: Error handling for malformed configs

Key Files Reference

  • webserv/main.cpp: Entry point, signal handling
  • webserv/server/Server.{hpp,cpp}: Event loop, epoll management
  • webserv/client/Client.{hpp,cpp}: Per-connection state
  • webserv/config/ConfigManager.hpp: Singleton config access
  • webserv/config/validation/ConfigValidator.cpp: Validation rule registration
  • webserv/router/Router.cpp: Request routing logic
  • webserv/handler/CgiProcess.cpp: fork/exec implementation
  • webserv/http/HttpRequest.cpp: State machine for parsing
  • CMakeLists.txt: Build configuration and test setup

Debugging Tips

AddressSanitizer Build

make asan
./build/webserv config/default.conf

Use it to detect memory leaks, use-after-free, and double-free errors.

CGI Debugging

The CGI child's stderr is connected to a CgiSocket that is read in CgiHandler::error(); check the logs for script output.

Epoll Issues

Enable trace logging by setting Log::setLevel(Log::Level::TRACE) in main.cpp, then follow each socket fd's lifecycle in the logs.

Config Validation

Run the ConfigValidator checks before starting the server. Errors are printed to stderr with their context (global/server/location block and directive name).