Threat Hunting III: HTTP Honeypot
Introduction
I set out to build a honeypot that captures HTTP attack traffic and forwards it directly to Elasticsearch for analysis. Instead of reinventing the wheel, I built on top of honeyhttpd by bocajspear1 and added structured logging, credential extraction, and proper sanitization.
The result is a production-ready honeypot that simulates an Apache server protected by HTTP Basic Authentication, capturing attacker credentials and request metadata in queryable Elasticsearch documents.
Architecture Overview
The honeypot works in three layers:
- ApachePasswordServer: Demands Basic Auth on every request, parses HTTP headers, and collects metadata
- ElasticSearchLogger: Sanitizes logs and indexes them into Elasticsearch
- Docker Container: Runs the entire stack in an isolated environment
Containerizing with Docker
I packaged the honeypot as a Docker container for easy deployment:
FROM python:3
ARG APP_NAME=honeyhttpd
ARG USER_ID="10001"
ARG GROUP_ID="app"
ARG HOME="/app"
ENV HOME=${HOME}
# Create unprivileged user
RUN groupadd --gid ${USER_ID} ${GROUP_ID} && \
useradd --create-home --uid ${USER_ID} --gid ${GROUP_ID} --home-dir /app ${GROUP_ID}
# Install dependencies
RUN apt-get update && \
apt-get install -y --no-install-recommends \
file gcc libwww-perl curl unzip && \
apt-get clean && apt-get autoremove -y
WORKDIR ${HOME}
# Copy application files
COPY ./requirements.txt .
COPY ./config.json .
COPY ./server*.pem .
COPY ./ca.crt .
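# COPY copies a directory's contents, not the directory itself,
# so each package gets its own destination to keep the layout intact
COPY honeyhttpd ./honeyhttpd
COPY logs ./logs
COPY servers ./servers
COPY util ./util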
COPY start.py .
# Install Python dependencies
RUN pip3 install --upgrade pip setuptools wheel && \
pip3 install --no-cache-dir elasticsearch==8.13.0 && \
pip3 install --no-cache-dir -r ./requirements.txt
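# Remove the compiler so it is not available at runtime
# (a later layer cannot shrink the image; a multi-stage build would be needed for that)
RUN apt-get remove gcc --purge -y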
# Set permissions and drop root
RUN chown -R ${USER_ID}:${GROUP_ID} ${HOME}
USER ${USER_ID}
EXPOSE 8443
CMD ["python3", "start.py", "--config", "config.json"]
Build and run:
docker build -t honeyhttpd .
docker run --hostname honeyhttpd -p 8443:8443 honeyhttpd
Configuration
Point the honeypot at your Elasticsearch instance via config.json:
{
  "loggers": {
    "ElasticSearchLogger": {
      "active": true,
      "config": {
        "server": "https://elasticsearch.example.com:9200",
        "verify_certs": true,
        "username": "elastic",
        "password": "your-password",
        "index": "honeypot-http"
      }
    }
  },
  "servers": [
    {
      "handler": "ApachePasswordServer",
      "mode": "https",
      "port": 8443,
      "domain": "target.example.com",
      "timeout": 10,
      "cert_path": "server_cert.pem",
      "key_path": "server_key.pem"
    }
  ]
}
Code Improvements
ApachePasswordServer.py
The server now properly simulates HTTP Basic Authentication and captures credentials in a structured way.
Key features:
- on_request(): Enforces Basic Auth on every request. Returns 401 if the Authorization header is missing
- on_POST(): Stashes POST bodies for logging (critical for capturing login attempts)
- on_complete(): Parses HTTP metadata (method, URL, request/response headers) and decodes Basic Auth credentials
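The challenge step described for on_request() can be sketched in isolation. This is a minimal illustration, not the project's code: the function name, the realm text, and the return shape are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple

REALM = "Restricted Area"  # hypothetical realm text, not taken from the project


def challenge_if_unauthenticated(
    headers: Dict[str, str],
) -> Optional[Tuple[int, Dict[str, str]]]:
    """Return a 401 challenge when no Basic credentials are present, else None."""
    # Header names are case-insensitive, so normalize them before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    scheme, _, token = normalized.get("authorization", "").partition(" ")
    if scheme.lower() != "basic" or not token.strip():
        # WWW-Authenticate is what makes clients prompt for a username/password.
        return 401, {"WWW-Authenticate": f'Basic realm="{REALM}"'}
    # Credentials are present; the caller carries on and logs them.
    return None
```

Returning None when credentials are present keeps the sketch neutral about what the honeypot answers next, since that choice belongs to the handler.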
Helper functions:
import base64

def _decode_basic_auth(b64_string):
    """Decode Base64 Basic-Auth credentials into 'user:pass'."""
    try:
        # validate=True rejects characters outside the Base64 alphabet
        decoded_bytes = base64.b64decode(b64_string, validate=True)
        decoded_str = decoded_bytes.decode('utf-8')
        return decoded_str if ':' in decoded_str else "[invalid format]"
    except Exception as e:
        return f"[decode error: {e}]"

def _extract_post_body(raw_request):
    """Extract POST body from raw HTTP request."""
    try:
        # Headers and body are separated by a blank line (CRLF in well-formed requests)
        if '\r\n\r\n' in raw_request:
            return raw_request.split('\r\n\r\n', 1)[1].strip()
        # Fall back to bare LF for clients that do not send CRLF
        if '\n\n' in raw_request:
            return raw_request.split('\n\n', 1)[1].strip()
    except Exception:
        pass
    return ''
The on_complete() method collects:
- HTTP method and URL
- Request/response headers (User-Agent, Accept, Content-Type, etc.)
- HTTP status code
- Decoded credentials (username:password)
- POST body (for form submissions)
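To make the list above concrete, here is a rough sketch of pulling those fields out of a raw request string. The function name parse_request_metadata and the returned keys are hypothetical; the real on_complete() works with whatever objects honeyhttpd hands it, which may look quite different.

```python
import base64
from typing import Dict


def parse_request_metadata(raw_request: str) -> Dict[str, str]:
    """Split a raw HTTP request into method, URL, User-Agent, creds, and body."""
    head, _, body = raw_request.partition("\r\n\r\n")
    lines = head.split("\r\n")

    # The request line looks like: "GET /path HTTP/1.1"
    parts = lines[0].split(" ")
    method = parts[0]
    url = parts[1] if len(parts) > 1 else ""

    # Header names are case-insensitive, so store them lowercased.
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()

    creds = ""
    scheme, _, token = headers.get("authorization", "").partition(" ")
    if scheme.lower() == "basic" and token:
        try:
            creds = base64.b64decode(token, validate=True).decode("utf-8")
        except ValueError:
            # Covers both invalid Base64 and bytes that are not valid UTF-8.
            creds = "[decode error]"

    return {
        "method": method,
        "url": url,
        "user_agent": headers.get("user-agent", ""),
        "creds": creds,
        "body": body,
    }
```

Attackers often send malformed requests, so every lookup falls back to an empty string instead of raising.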
ElasticSearchLogger.py
The logger strips control characters and truncates oversized values before indexing. This guards against log injection and keeps attacker-controlled data from bloating or corrupting Elasticsearch documents.
Sanitization functions:
import re

def sanitize_string(s, max_length=1000):
    """Remove control characters and truncate strings."""
    if not isinstance(s, str):
        s = str(s)
    # Strip ASCII control characters, keeping tab (\x09), LF (\x0A), and CR (\x0D)
    s = re.sub(r'[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]', '', s)
    if len(s) > max_length:
        s = s[:max_length] + "...[cut]..."
    return s

def sanitize_dict(d, max_length=1000):
    """Recursively sanitize dictionaries and lists."""
    # ... sanitizes nested structures
Elasticsearch indexing:
import datetime

log_entry = {
    "@timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "remote_ip": remote_ip,
    "remote_port": remote_port,
    "protocol": "https" if is_ssl else "http",
    "port": port,
    "request": request,
    "response": response,
    "http.response.status_code": status_code,
    "http.request.method": method,
    "user_agent.original": user_agent,
    "http.request.body.content": post_body,
    "creds": decoded_credentials,
    "host.name": hostname
}
Most of these field names (@timestamp, http.request.method, http.response.status_code, http.request.body.content, user_agent.original, host.name) follow the Elastic Common Schema (ECS), which makes queries and alerts straightforward. The remaining fields, such as remote_ip, remote_port, and creds, are custom names; ECS would use source.ip and source.port for the client address.
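The body of sanitize_dict is not shown above, so the following is only a guess at how a recursive sanitizer could be applied to a log entry before indexing. The name sanitize_value and the sample entry are made up for illustration, and unlike sanitize_string this sketch leaves non-string scalars untouched.

```python
import re
from typing import Any

CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]")


def sanitize_value(value: Any, max_length: int = 1000) -> Any:
    """Recursively clean strings found inside dicts, lists, and tuples."""
    if isinstance(value, dict):
        return {str(k): sanitize_value(v, max_length) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [sanitize_value(v, max_length) for v in value]
    if isinstance(value, str):
        cleaned = CONTROL_CHARS.sub("", value)
        if len(cleaned) > max_length:
            cleaned = cleaned[:max_length] + "...[cut]..."
        return cleaned
    # Numbers, booleans, and None are already JSON-safe; leave them as they are.
    return value


# Hypothetical entry with attacker-controlled control characters in it.
entry = {
    "remote_ip": "203.0.113.7",
    "port": 8443,
    "creds": "admin:\x00hunter2",
    "request": {"headers": ["curl/8.0\x07"]},
}
clean = sanitize_value(entry)
# With the 8.x Python client, the cleaned dict would then be passed along the
# lines of es.index(index="honeypot-http", document=clean); not executed here.
```

Keeping integers as integers means fields such as the port can still be mapped as numeric types in Elasticsearch.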
Advantages Over the Original
| Feature | Original | Improved |
|---|---|---|
| Credential Capture | Basic string parsing | Base64 decoding + validation |
| POST Body Handling | Not captured | Properly extracted and logged |
| Input Sanitization | None | Removes control chars, truncates |
| Error Handling | Minimal | Comprehensive logging |
| Elasticsearch Integration | Manual logging | Direct indexing with ECS field names |
Testing
Once deployed, test the honeypot:
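curl -k -u attacker:password https://localhost:8443/
Because -u makes curl send the Authorization header with the first request, the credentials are captured and indexed in Elasticsearch right away. Omit -u to see the 401 Basic Auth challenge instead. The -k flag skips certificate verification, which is needed when the honeypot uses a self-signed certificate.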
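For reference, this is roughly what curl -u puts on the wire and what the honeypot reverses when it logs credentials. The helper name basic_auth_header is made up for this example.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value that `curl -u user:pass` sends."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


header = basic_auth_header("attacker", "password")
# Reverse the encoding, as the honeypot does before logging the credentials.
decoded = base64.b64decode(header.split(" ", 1)[1], validate=True).decode("utf-8")
print(header)
print(decoded)
```

Base64 is an encoding, not encryption, which is why the honeypot can recover every submitted password in clear text.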
Query Elasticsearch:
curl -X GET "elasticsearch:9200/honeypot-http/_search?q=creds:*" \
  -u elastic:password
Next Steps
- TODO: Automated testing with OWASP ZAP or similar tools
- TODO: Deploy to production honeypot server for live monitoring
- TODO: Submit improvements as pull request to original honeyhttpd project
- TODO: ELK Stack setup guide for visualization and alerting
Summary
This enhanced honeypot transforms a simple HTTP challenge responder into a structured threat hunting tool. By capturing credentials, request metadata, and response data in Elasticsearch, you gain visibility into attack patterns and attacker behavior.
The honeypot is built with production use in mind: it tolerates malformed requests, sanitizes attacker-controlled input before indexing, and writes documents that any Elasticsearch-based SIEM can query directly.