Use Case 06: Data Analysis Example with Google Colab

Use Case: Sea‑Cargo Traffic Analysis with AI (Google Colab)

Analyze Baltic Sea cargo vessel traffic using ChatGPT and Google Colab: clean the data, detect anomalies, and build clear visualizations.

🤖 AI‑assisted Analysis 📊 Anomaly Detection 📈 Visualizations

Overview

This use case demonstrates how to analyze sea‑cargo traffic with ChatGPT and Google Colab, including data cleanup, anomaly detection, and visualization. The dataset covers vessel arrivals to Baltic Sea ports over the period listed under Dataset Details.

Dataset Details

  • Countries: Sweden, Finland, Estonia, Latvia, Lithuania, Poland
  • Period: 2021‑01‑01 00:00 UTC → 2022‑04‑30 23:59 UTC
  • Vessel types: All cargo and all tankers
  • Ship size: length ≥ 65 m
  • Events: 87,624 arrival records
  • Fields: Port ID, Port name, LOCODE, MMSI, IMO, vessel name, destination, vessel type, arrival/departure timestamps

Step‑by‑Step Guide

Step 1: Preliminary Data Analysis with ChatGPT

Prompt example: “You are a maritime cargo shipping expert. Summarize this dataset for me. What trends or anomalies do you see?”

AI findings: Identified Short Sea Shipping patterns, regional trade and bulk logistics corridors; stable contract‑based flows; and data inconsistencies in destination fields.

Issues observed: Same port expressed differently (e.g., “LVVNT”, “LV VNT”, “VENTSPILS”); multi‑port routes like “SE STO FI SKV”; and unrealistically short dwell times.

AI‑recommended next steps: (1) data engineering pass, (2) normalize port and destination fields, (3) validate raw data.

Prompt example: “Harmonize arrival and departure destination fields.”

Normalize to UN/LOCODE, remove spaces (e.g., SE GOT → SEGOT), and retain raw columns for traceability.
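The normalization rule above can be sketched in a few lines of Python. This is a minimal illustration, not the script ChatGPT generated in the use case: the function name normalize_destination, the sample lookup table, and the "_locode" column suffix are assumptions made for the example.

```python
import re

# Hypothetical name-to-code lookup; in the workflow this would come from
# the mapping built by scanning the CSV.
PORT_NAME_TO_LOCODE = {"VENTSPILS": "LVVNT"}

# Shape of a UN/LOCODE: two-letter country code plus a three-character location code.
LOCODE_SHAPE = re.compile(r"^[A-Z]{2}[A-Z0-9]{3}$")


def normalize_destination(raw, name_to_locode=PORT_NAME_TO_LOCODE):
    """Return a 5-character code for a raw destination, or None if unresolved."""
    if raw is None:
        return None
    text = raw.strip().upper()
    # Known port names are resolved first.
    if text in name_to_locode:
        return name_to_locode[text]
    # Otherwise drop spaces and common separators, then check the code shape.
    compact = re.sub(r"[\s\-_/.]+", "", text)
    if LOCODE_SHAPE.match(compact):
        return compact
    # Multi-port routes and free text stay unresolved for manual review.
    return None


# Keep the raw column and write the result to a new one for traceability.
row = {"vesselDestinationArrival": "LV VNT"}
row["vesselDestinationArrival_locode"] = normalize_destination(
    row["vesselDestinationArrival"]
)
```

With this sketch, “LV VNT”, “LVVNT”, and “VENTSPILS” all resolve to LVVNT, while a multi‑port value such as “SE STO FI SKV” returns None instead of being guessed. The shape check is deliberately loose: any unmapped five‑letter string would pass it, so a production version would likely validate codes against the official UN/LOCODE list.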

Prompts used: (1) Create a script that corrects destinations in vesselDestinationArrival and vesselDestinationDeparture to UN/LOCODE; (2) Build a Python dictionary mapping portName → portLocode by scanning the CSV.

Colab workflow: Upload CSV to /content/ais/, set variables like INPUT_CSV="/content/ais/PRJ896.csv" and OUTPUT_PY="/content/ais/port_name_to_unlocode.py", then run the generated scripts.
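A rough sketch of what such a mapping script might look like is shown below. It assumes the CSV has columns named portName and portLocode; the helper names build_port_mapping and write_mapping_module are hypothetical, and the demo writes to a temporary folder with a tiny sample file rather than to /content/ais/.

```python
import csv
import pprint
import tempfile
from pathlib import Path


def build_port_mapping(input_csv, name_col="portName", code_col="portLocode"):
    """Scan the CSV and collect an upper-cased port name -> LOCODE dictionary."""
    mapping = {}
    with open(input_csv, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            name = (row.get(name_col) or "").strip().upper()
            code = (row.get(code_col) or "").replace(" ", "").upper()
            if name and code:
                # The first code seen for a name wins; later rows are ignored.
                mapping.setdefault(name, code)
    return mapping


def write_mapping_module(mapping, output_py):
    """Write the dictionary as an importable Python module."""
    body = "PORT_NAME_TO_LOCODE = " + pprint.pformat(mapping) + "\n"
    Path(output_py).write_text(body, encoding="utf-8")


# Demo with a small sample file (assumed column names and values).
workdir = Path(tempfile.mkdtemp())
INPUT_CSV = workdir / "sample.csv"
OUTPUT_PY = workdir / "port_name_to_unlocode.py"
INPUT_CSV.write_text(
    "portName,portLocode\nVentspils,LVVNT\nVentspils,LV VNT\nGothenburg,SEGOT\n",
    encoding="utf-8",
)
mapping = build_port_mapping(INPUT_CSV)
write_mapping_module(mapping, OUTPUT_PY)
print(mapping)
```

In Colab, INPUT_CSV and OUTPUT_PY would instead point at the uploaded file and the desired output path. Writing the mapping to a .py file keeps it reviewable and lets later cells import it.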

Result: ~124 unique port mappings created; dataset cleaned and destinations standardized.

Prompt example: “Propose Python code for time‑based anomaly filtering.”

The generated script calculates port dwell time and flags anomalies via a function such as apply_time_anomaly_flags(). It also includes a usage snippet for applying and testing the logic in Colab.
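The generated script itself is not reproduced in this use case, so the following is only a sketch of how such a function could work. The record keys (mmsi, arrival, departure), the thresholds, and the flag labels are assumptions chosen for illustration; the three anomaly categories follow the charts shown later (short dwell, long dwell, overlapping calls).

```python
from datetime import datetime

SHORT_DWELL_HOURS = 0.5     # assumed threshold: under 30 minutes in port
LONG_DWELL_HOURS = 14 * 24  # assumed threshold: over 14 days in port


def apply_time_anomaly_flags(calls, short_hours=SHORT_DWELL_HOURS,
                             long_hours=LONG_DWELL_HOURS):
    """Add port_dwell_hours and time_anomaly_flag to each port-call record.

    `calls` is a list of dicts with ISO 8601 "arrival"/"departure" strings
    and a vessel identifier under "mmsi" (hypothetical key names).
    """
    flagged = []
    last_departure = {}  # latest departure seen so far, per vessel
    ordered = sorted(
        calls, key=lambda c: (c["mmsi"], datetime.fromisoformat(c["arrival"]))
    )
    for call in ordered:
        arrival = datetime.fromisoformat(call["arrival"])
        departure = datetime.fromisoformat(call["departure"])
        dwell = (departure - arrival).total_seconds() / 3600
        previous = last_departure.get(call["mmsi"])
        if previous is not None and arrival < previous:
            flag = "overlapping_calls"  # arrived before leaving the prior port
        elif dwell < short_hours:
            flag = "short_dwell"
        elif dwell > long_hours:
            flag = "long_dwell"
        else:
            flag = "normal"
        flagged.append(
            {**call, "port_dwell_hours": round(dwell, 2), "time_anomaly_flag": flag}
        )
        last_departure[call["mmsi"]] = (
            departure if previous is None else max(previous, departure)
        )
    return flagged


# Usage with a few made-up port calls.
calls = [
    {"mmsi": 111, "arrival": "2021-03-01T08:00:00+00:00", "departure": "2021-03-01T20:00:00+00:00"},
    {"mmsi": 111, "arrival": "2021-03-01T18:00:00+00:00", "departure": "2021-03-02T06:00:00+00:00"},
    {"mmsi": 222, "arrival": "2021-03-05T10:00:00+00:00", "departure": "2021-03-05T10:10:00+00:00"},
    {"mmsi": 333, "arrival": "2021-03-01T00:00:00+00:00", "departure": "2021-04-01T00:00:00+00:00"},
]
flagged = apply_time_anomaly_flags(calls)
for record in flagged:
    print(record["mmsi"], record["port_dwell_hours"], record["time_anomaly_flag"])
```

The sketch uses plain dictionaries to stay dependency‑free. In a Colab notebook the same logic would more likely be applied to a pandas DataFrame, and the thresholds would need tuning against the actual dwell‑time distribution.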

Prompt example: “Provide a Python script to visualize port_dwell_hours and time_anomaly_flag.”

Outputs included a reusable Colab‑ready script and adaptations to plot distributions, counts, anomaly categories, and port‑level comparisons.
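As an illustration of the first two chart types, the sketch below plots a log‑scale dwell‑time histogram and a bar chart of anomaly counts with matplotlib. The sample records, flag labels, and output file name are made up for the example, and the Agg backend line is only there so the snippet also runs outside a notebook.

```python
import math
import tempfile
from collections import Counter
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend; can be dropped inside Colab
import matplotlib.pyplot as plt

# Made-up records carrying the two columns named in the prompt.
records = [
    {"port_dwell_hours": 0.2, "time_anomaly_flag": "short_dwell"},
    {"port_dwell_hours": 6.0, "time_anomaly_flag": "normal"},
    {"port_dwell_hours": 12.0, "time_anomaly_flag": "normal"},
    {"port_dwell_hours": 18.5, "time_anomaly_flag": "overlapping_calls"},
    {"port_dwell_hours": 30.0, "time_anomaly_flag": "normal"},
    {"port_dwell_hours": 700.0, "time_anomaly_flag": "long_dwell"},
]

# A log axis cannot show zero or negative dwell times, so filter them out.
dwell = [r["port_dwell_hours"] for r in records if r["port_dwell_hours"] > 0]
counts = Counter(
    r["time_anomaly_flag"] for r in records if r["time_anomaly_flag"] != "normal"
)

# Log-spaced bin edges covering the full range of dwell times.
low = math.floor(math.log10(min(dwell)))
high = math.ceil(math.log10(max(dwell)))
n_bins = 20
bins = [10 ** (low + i * (high - low) / n_bins) for i in range(n_bins + 1)]

fig, (ax_hist, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

ax_hist.hist(dwell, bins=bins)
ax_hist.set_xscale("log")
ax_hist.set_xlabel("Port dwell time (hours, log scale)")
ax_hist.set_ylabel("Port calls")
ax_hist.set_title("Distribution of Port Dwell Time")

ax_bar.bar(list(counts.keys()), list(counts.values()))
ax_bar.set_ylabel("Port calls")
ax_bar.set_title("Time-Based Anomaly Counts")

fig.tight_layout()
out_path = Path(tempfile.gettempdir()) / "dwell_overview.png"
fig.savefig(out_path)
```

In Colab, calling plt.show() instead of saving displays the figure inline. The port‑level box plots would follow the same pattern, grouping dwell times by port name before plotting.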

Cargo‑Traffic Anomalies Visualization

The workflow produces multiple charts, including: (1) Distribution of Port Dwell Time, (2) Time‑Based Anomaly Counts, (3) Port Dwell Time by Anomaly Type, (4) Average Port Dwell Time by Port (Top Ports), and (5) Port Dwell Time Distribution by Port.

Figure 1. Distribution of Port Dwell Time (histogram of port dwell time in hours, log scale)
Figure 2. Time‑Based Anomaly Counts (bar chart of short dwell, long dwell, and overlapping calls)
Figure 3. Port Dwell Time by Anomaly Type (box plot for long dwell, overlapping calls, and short dwell)
Figure 4. Average Port Dwell Time by Port (Top Ports) (bar chart)
Figure 5. Port Dwell Time Distribution by Port (box plots by individual port name)

© aiknowit.eu — Use Case Library • “Sea‑Cargo Traffic Analysis with AI (Google Colab)”
