A complete guide to building an AI project from A to Z (Data → Training → Deploy)

Project flow overview

User → (Upload / Data collection) → Preprocess → Train → Eval → Export model → Serve via API → Client (Web/Mobile) → Feedback → Retrain.

Flow diagram (ASCII)

[User Data] --> [Ingestion/Storage] --> [Preprocessing Pipeline] --> [Train + Val + Test]
     |                                                                        |
     \---------------------------(Feedback & Labels)--------------------------/
                                          |
                                          v
                            [Model Registry / Versioning]
                                          |
                                          v
                         [Model Export (.h5/.tflite/.onnx)]
                                          |
                                          v
             [Model Serving (Flask/FastAPI / TF-Serving / ONNXRuntime)]
                                          |
                                          v
                            [Clients: Web / Mobile / CLI]

1) Ideation & scoping

Clear goals

  • Define the problem type: classification/regression/detection/segmentation.

  • Example: “Identify the name of an ornamental plant (multi-class classification) from an RGB image with a single plant in frame”.

  • Define the output: plant name + confidence. No care tips or advice (per the requirements).

Non-functional requirements

  • Target response time (local demo): <1s per inference (lightweight model).

  • Target accuracy (initial): >= 85% on a real-world test set.

  • Initial deployment: run locally with Flask.

Deliverables

  • A standardized dataset, training scripts, the exported model, a Flask API, and a README with setup instructions.


2) Data design (data plan)

Define the classes & the number of images needed per class

  • Start with 20–50 common species.

  • At least 100 images per class (ideally 300–1000 images/class if possible).

Metadata to collect

  • filename, class_label, source, date_collected, camera_exif (if available), location (opt-in), user_feedback.

Storage

  • Use a standard directory structure:

dataset/
  train/
    class_a/
    class_b/
  val/
  test/
  raw/
  annotations.csv

  • Use object storage (S3/MinIO) if the dataset is large.
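
One way to fill the train/, val/ and test/ folders is a per-class random split, so every class keeps roughly the same proportions in each subset. The sketch below is only an illustration: the function name `split_per_class`, the 80/10/10 ratios, the fixed seed and the file names are assumptions, not part of the plan above.

```python
import random
from collections import defaultdict


def split_per_class(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Split (filename, class_label) pairs into train/val/test per class.

    The 80/10/10 ratios and the seed are illustrative assumptions.
    """
    by_class = defaultdict(list)
    for filename, label in items:
        by_class[label].append(filename)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_class):
        files = sorted(by_class[label])
        rng.shuffle(files)
        n_train = int(len(files) * ratios[0])
        n_val = int(len(files) * ratios[1])
        splits["train"] += [(f, label) for f in files[:n_train]]
        splits["val"] += [(f, label) for f in files[n_train:n_train + n_val]]
        # Whatever remains goes to test, so no file is dropped
        splits["test"] += [(f, label) for f in files[n_train + n_val:]]
    return splits


# Hypothetical file names: 100 images for each of two classes
items = [(f"{c}_{i:03d}.jpg", c) for c in ("class_a", "class_b") for i in range(100)]
splits = split_per_class(items)
print({name: len(rows) for name, rows in splits.items()})
```

Splitting once and recording the result (for example in annotations.csv) keeps the test set stable across experiments.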


3) Data collection & standardization

Collection

  • Take photos yourself, pull from iNaturalist/Flickr/Kaggle (mind the licenses), or crowdsource.

  • Write a scraper script (requests + selenium if needed) or use google-images-download/bing-image-downloader.

Checking & filtering

  • Remove images that are blurry, watermarked, or too small. Use a script to check resolution/size.

  • Open each class and inspect the quality by eye.

Example script that checks the minimum image size (Python)

```python
import os

from PIL import Image

MIN_W, MIN_H = 100, 100
bad = []
for root, _, files in os.walk("dataset/raw"):
    for f in files:
        path = os.path.join(root, f)
        try:
            # Reading .size only needs the header, not the full image
            with Image.open(path) as img:
                w, h = img.size
            if w < MIN_W or h < MIN_H:
                bad.append(path)
        except Exception:
            # Unreadable or non-image file
            bad.append(path)
print("Bad images:", len(bad))
```

4) Exploratory Data Analysis (EDA)

Purpose

  • Understand the class distribution, imbalance, and outliers.

  • Look at sample images, histograms of image sizes, and color distributions.

Tools

  • Jupyter notebook, pandas, matplotlib, seaborn.

Example code

```python
import os
import random

import matplotlib.pyplot as plt
from PIL import Image

classes = os.listdir("dataset/train")
for c in classes[:6]:
    # Show one random sample image per class
    imgs = os.listdir(os.path.join("dataset/train", c))
    sample = random.choice(imgs)
    img = Image.open(os.path.join("dataset/train", c, sample))
    plt.figure()
    plt.title(c)
    plt.imshow(img)
    plt.axis("off")
plt.show()  # not needed inside a Jupyter notebook
```

Checking imbalance

  • If the imbalance is large, plan for oversampling or class weighting.
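
For class weighting, one common heuristic gives each class the weight n_samples / (n_classes * class_count), the same formula scikit-learn uses for `class_weight="balanced"`. The sketch below is a minimal illustration; the function name and the label counts are made up for the example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Hypothetical, imbalanced label list
labels = ["ficus"] * 300 + ["monstera"] * 100
weights = balanced_class_weights(labels)
print(weights)
```

Rare classes get weights above 1 and frequent classes below 1. Keras `model.fit` accepts a `class_weight` dict keyed by class index, so the class names here would first need to be mapped to indices.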


5) Preprocessing pipeline (production-ready)

Pipeline requirements

  • Deterministic preprocessing (the same for train/val/test and inference)

  • Augmentation applied only to the training set

  • Fast (use tf.data or torch.utils.data)

Example using tf.data (TensorFlow)

```python
import tensorflow as tf

IMG_SIZE = (224, 224)


def preprocess(path, label):
    img = tf.io.read_file(path)
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, IMG_SIZE)
    img = img / 255.0
    return img, label

# create dataset…
```

Augmentation (on-the-fly)

  • rotation, flip, random_crop, color_jitter.

  • Use albumentations for PyTorch; use tf.image/keras.preprocessing for TF.


6) Choosing a model & training strategy

General strategy

  • Use transfer learning: a pretrained backbone (MobileNetV2/EfficientNetB0).

  • Freeze the base and train the head, then unfreeze part of the base and fine-tune.

Choice for a local demo + Flask

  • MobileNetV2 or EfficientNetB0: small, fast inference.

Sample Keras code (transfer learning)

```python
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model

num_classes = 20  # placeholder: set to the number of classes in your dataset

base = MobileNetV2(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
x = GlobalAveragePooling2D()(base.output)
x = Dropout(0.3)(x)
x = Dense(256, activation="relu")(x)
out = Dense(num_classes, activation="softmax")(x)
model = Model(inputs=base.input, outputs=out)

# Freeze the pretrained backbone so only the new head trains at first
for layer in base.layers:
    layer.trainable = False

# categorical_crossentropy expects one-hot encoded labels
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```

7) Training & tracking (experiments)

Setting up experiment tracking

  • Use TensorBoard or Weights & Biases (wandb) to track loss/accuracy, learning rate, and gradient histograms.

Required callbacks

  • ModelCheckpoint(save_best_only=True)

  • EarlyStopping(patience=5)

  • ReduceLROnPlateau
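
To make the `patience` setting concrete, here is a simplified pure-Python model of the idea: stop once `patience` consecutive epochs pass without a new best validation loss. This is a sketch of the concept, not Keras's implementation, and the loss values are invented.

```python
def epochs_until_stop(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified model of patience: stop once `patience` consecutive
    epochs pass without a new best validation loss.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0  # new best: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses, one per epoch
val_losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.63, 0.64, 0.59]
print(epochs_until_stop(val_losses, patience=5))
```

In this made-up run the improvement at the last epoch is never reached, which is the trade-off of a small patience: it saves compute but can stop before a late improvement.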

Example training call

```python
history = model.fit(train_ds, validation_data=val_ds, epochs=30, callbacks=[...])
```

Experiment notes

  • Record the config (batch size, lr, backbone, augmentation) in a config.yaml file.

  • Save the model under a descriptive name: plant_mobilenetv2_bs32_lr1e-3_epoch30.h5.
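
Deriving the file name from the config keeps the two from drifting apart. The sketch below is one possible way to do it; the helper names and the config keys are assumptions, and the config is printed as JSON only to avoid a third-party YAML dependency.

```python
import json


def format_lr(lr):
    """Format 0.001 as '1e-3' (rounded to one significant digit)."""
    mantissa, exponent = f"{lr:.0e}".split("e")
    return f"{mantissa}e{int(exponent)}"


def run_name(cfg):
    """Build a descriptive model file name from a config dict."""
    return (
        f"plant_{cfg['backbone']}_bs{cfg['batch_size']}"
        f"_lr{format_lr(cfg['lr'])}_epoch{cfg['epochs']}.h5"
    )


# Hypothetical config keys; adapt them to your own config file
config = {"backbone": "mobilenetv2", "batch_size": 32, "lr": 1e-3, "epochs": 30}
print(run_name(config))
print(json.dumps(config, indent=2))
```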


8) Model evaluation, debugging, validation

Metrics to watch

  • Accuracy (top-1), top-3 accuracy

  • Confusion matrix: reveals pairs of classes that are easily confused

  • Per-class precision & recall

Building the confusion matrix

```python
from sklearn.metrics import classification_report, confusion_matrix

y_true, y_pred = [], []
# fill from dataset + model prediction
cm = confusion_matrix(y_true, y_pred)
```
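
Per-class precision and recall can be read straight off a confusion matrix, and top-k accuracy follows from the raw score vectors. The sketch below uses plain Python lists and assumes the scikit-learn convention that rows are true classes and columns are predicted classes; the function names and the counts are invented for illustration.

```python
def per_class_precision_recall(cm):
    """cm[i][j] = number of samples with true class i predicted as class j."""
    n = len(cm)
    stats = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])  # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        stats.append((precision, recall))
    return stats


def top_k_accuracy(y_true, probs, k=3):
    """Fraction of samples whose true class is among the k highest scores."""
    hits = 0
    for true_idx, row in zip(y_true, probs):
        top = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        hits += true_idx in top
    return hits / len(y_true)


# Made-up counts for three classes
example_cm = [[8, 2, 0],
              [1, 7, 2],
              [0, 1, 9]]
for idx, (p, r) in enumerate(per_class_precision_recall(example_cm)):
    print(f"class {idx}: precision={p:.2f} recall={r:.2f}")
```

A class with high recall but low precision is one the model over-predicts, which usually shows up as a heavy column in the confusion matrix.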

If performance is low

  • Check for data leakage (test images appearing in the training set)

  • Check whether augmentation is so strong that it destroys distinguishing features

  • Add real-world images and reduce overfitting (dropout, weight decay)

  • Try a stronger backbone or a larger dataset
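
A quick leakage check is to hash every file and look for identical content in both the training and test folders. The sketch below assumes the dataset/train and dataset/test layout from the data plan; the function names are illustrative. It only catches byte-identical copies, so resized or re-encoded duplicates would need a perceptual hash instead.

```python
import hashlib
import os


def file_hashes(folder):
    """Map SHA-256 of file content -> path, for every file under folder."""
    hashes = {}
    for root, _, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as fh:
                hashes[hashlib.sha256(fh.read()).hexdigest()] = path
    return hashes


def find_leaks(train_dir, test_dir):
    """Return (train_path, test_path) pairs with byte-identical content."""
    train = file_hashes(train_dir)
    test = file_hashes(test_dir)
    return [(train[h], test[h]) for h in train.keys() & test.keys()]


if __name__ == "__main__":
    for train_path, test_path in find_leaks("dataset/train", "dataset/test"):
        print("duplicate:", train_path, "<->", test_path)
```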


9) Model optimization & export

Export format

  • For Flask local: save Keras .h5 or SavedModel.

  • For mobile: convert to TFLite.

  • For cross-platform: ONNX.

Export Keras .h5

model.save("model/plant_model.h5")

Convert to TFLite (float16 quant)

```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()
with open("model/plant_model.tflite", "wb") as f:
    f.write(tflite_model)
```

Validate exported model

  • Run sample inference on exported model and compare outputs to original model (sanity check).
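
The comparison itself can be as simple as checking that both models agree on the top class and that their probabilities differ by less than a tolerance. The sketch below works on plain lists of probabilities; the function name, the tolerance of 1e-2, and the sample numbers are assumptions chosen for illustration, since quantized models may not reproduce the original outputs exactly.

```python
def outputs_match(original, exported, atol=1e-2):
    """Compare probability vectors from the original and exported model.

    Returns (ok, max_abs_diff). atol=1e-2 is an assumed tolerance.
    """
    top_original = max(range(len(original)), key=original.__getitem__)
    top_exported = max(range(len(exported)), key=exported.__getitem__)
    max_diff = max(abs(a - b) for a, b in zip(original, exported))
    ok = (top_original == top_exported) and (max_diff <= atol)
    return ok, max_diff


# Made-up outputs for one sample image
ok, diff = outputs_match([0.70, 0.20, 0.10], [0.695, 0.204, 0.101])
print("match:", ok, "max abs diff:", round(diff, 4))
```

Running this over a handful of representative images, not just one, gives a more trustworthy sanity check.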


10) Building an API to serve the model (Flask / FastAPI)

Rules

  • Load the model once at server start, not on every request.

  • Preprocessing and postprocessing must match the training pipeline.

Example Flask app (production-ready pattern)

app.py

```python
import io

import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)
# Load the model once at import time, not per request
model = tf.keras.models.load_model("model/plant_model.h5")
labels = [...]  # load from JSON


def preprocess_image(image_bytes):
    # Must match training: RGB, 224x224, pixels scaled to [0, 1]
    img = Image.open(io.BytesIO(image_bytes)).convert("RGB").resize((224, 224))
    arr = np.asarray(img, dtype=np.float32) / 255.0
    return np.expand_dims(arr, 0)


@app.route("/identify", methods=["POST"])
def identify():
    if "image" not in request.files:
        return jsonify({"error": "no file"}), 400
    file = request.files["image"].read()
    inp = preprocess_image(file)
    preds = model.predict(inp)[0]
    idx = int(np.argmax(preds))
    return jsonify({"class": labels[idx], "confidence": float(preds[idx])})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=7860)
```

FastAPI (if you need async + docs)
  • FastAPI generates OpenAPI docs automatically, which helps when developing an API for a front end.


11) Packaging & deployment (Docker, VPS, K8s)

Sample Dockerfile (Flask + Keras)

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENV FLASK_ENV=production
CMD ["gunicorn", "-w", "2", "-b", "0.0.0.0:5000", "app:app"]
```

Docker Compose (nginx + app)

  • Nginx serves as the reverse proxy and handles static files and TLS.

  • The app runs under gunicorn with 2-4 workers.

Deployment options

  • VPS (Ubuntu) + Docker Compose

  • Cloud VM (DigitalOcean, AWS EC2)

  • Container service (AWS ECS, GCP Cloud Run)

  • Kubernetes (GKE/EKS) for large scale

Healthcheck

  • The /healthz endpoint returns 200 OK once the model has loaded successfully.
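
The health check boils down to "report ready only after the model is in memory". In Flask this would normally be one more route on the existing app; the sketch below is written as a plain WSGI callable instead, purely so it runs without any third-party package. The factory name `make_healthz_app`, the `is_ready` callback, and the 503 response for the not-ready case are assumptions.

```python
import json


def make_healthz_app(is_ready):
    """Build a bare WSGI app answering /healthz.

    is_ready is a zero-argument callable that returns True once the
    model has been loaded (hypothetical hook, supplied by the caller).
    """
    def app(environ, start_response):
        headers = [("Content-Type", "application/json")]
        if environ.get("PATH_INFO") != "/healthz":
            start_response("404 Not Found", headers)
            return [b'{"error": "not found"}']
        ready = bool(is_ready())
        # 503 while the model is still loading is an assumed convention
        status = "200 OK" if ready else "503 Service Unavailable"
        start_response(status, headers)
        return [json.dumps({"model_loaded": ready}).encode("utf-8")]

    return app


# Example wiring: readiness is tracked in a simple dict
state = {"model": None}
healthz = make_healthz_app(lambda: state["model"] is not None)
```

A load balancer or container orchestrator can poll this endpoint and hold traffic back until it returns 200.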


12) CI/CD, monitoring, model versioning

CI/CD

  • GitHub Actions / GitLab CI to:

    • Linting, unit tests

    • Build Docker image → push to registry

    • Deploy to staging → run smoke tests

  • Sample workflow: push => build image => run tests => deploy to server.

Model registry & versioning

  • Store model artifacts on S3 / MinIO or with DVC.

  • Store metadata: model_id, version, training config, metrics.

  • Use MLflow/W&B for tracking experiments.
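
A registry entry can start out as a small JSON record stored next to the artifact. The sketch below shows one possible shape; the function name, the field names beyond those listed above, the version string, and the placeholder values are assumptions, and the metrics are deliberately left empty rather than invented.

```python
import hashlib
import json


def build_model_record(model_id, version, config, metrics, artifact_bytes):
    """Assemble a registry entry; the field names are assumptions."""
    return {
        "model_id": model_id,
        "version": version,
        "training_config": config,
        "metrics": metrics,
        # Content hash lets you verify the stored artifact later
        "artifact_sha256": hashlib.sha256(artifact_bytes).hexdigest(),
    }


record = build_model_record(
    model_id="plant_classifier",
    version="1.0.0",
    config={"backbone": "mobilenetv2", "batch_size": 32, "lr": 1e-3},
    metrics={"top1_accuracy": None},  # placeholder: fill in from evaluation
    artifact_bytes=b"placeholder for the bytes of the exported model file",
)
print(json.dumps(record, indent=2))
```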

Monitoring

  • Log every request (input hash, prediction, latency).

  • Use Prometheus + Grafana to monitor latency/throughput.

  • Model drift: monitor accuracy on a “golden test set” over time.

  • Alert when latency or error rate spikes.
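
Request logging with an input hash, the prediction, and the latency can be wrapped around any predict function. The sketch below emits one JSON line per request; `log_prediction`, `fake_predict`, and the field names are illustrative assumptions, and the stand-in predictor returns a fixed answer rather than calling a real model.

```python
import hashlib
import json
import time


def log_prediction(image_bytes, predict_fn, sink=print):
    """Run predict_fn and emit one JSON log line with hash, result, latency."""
    start = time.perf_counter()
    label, confidence = predict_fn(image_bytes)
    latency_ms = (time.perf_counter() - start) * 1000.0
    entry = {
        # Hash instead of the raw image, so logs carry no user content
        "input_sha256": hashlib.sha256(image_bytes).hexdigest(),
        "prediction": label,
        "confidence": confidence,
        "latency_ms": round(latency_ms, 2),
    }
    sink(json.dumps(entry))
    return entry


def fake_predict(image_bytes):
    """Stand-in for the real model call (hypothetical)."""
    return "class_a", 0.5


log_prediction(b"not a real image", fake_predict)
```

JSON lines like these are easy to ship to a log aggregator, and the latency field can feed the Prometheus/Grafana dashboards mentioned above.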


13) Privacy / Ethics / Legal checklist

  • Clearly notify users when their images are stored (privacy policy).

  • Collecting location requires opt-in.

  • Check the licenses of images collected from the internet.

  • If the model is used to identify rare species, consider keeping that information confidential.


14) Final deployment checklist (pre-release)

  • Unit tests cho pipeline (preprocess, predict wrapper)

  • Integration tests (curl requests)

  • Smoke tests post-deploy

  • Model audit: confusion matrix + per-class metrics

  • Monitoring & alerting setup

  • Backup & rollback plan

  • Documentation (README + API docs)

  • Docker image scanned for vulnerabilities
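
As an example of the first checklist item, a unit test for the predict wrapper can target the postprocessing step, which needs no model at all. The sketch below assumes a hypothetical `postprocess` helper that produces the same {"class", "confidence"} shape the API returns; it uses the standard-library unittest module.

```python
import unittest


def postprocess(probs, labels):
    """Turn a probability vector into the API's {"class", "confidence"} dict."""
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    idx = max(range(len(probs)), key=probs.__getitem__)
    return {"class": labels[idx], "confidence": float(probs[idx])}


class PostprocessTest(unittest.TestCase):
    def test_picks_highest_probability(self):
        out = postprocess([0.1, 0.7, 0.2], ["a", "b", "c"])
        self.assertEqual(out, {"class": "b", "confidence": 0.7})

    def test_rejects_length_mismatch(self):
        with self.assertRaises(ValueError):
            postprocess([0.5, 0.5], ["only_one"])
```

Saved in a test file, this can be run with `python -m unittest` as part of the CI step.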


15) Images / diagrams: how to create them & code to generate them

1) Flowchart (Graphviz)

You can create a file named flow.dot:

```dot
digraph G {
  rankdir=LR;
  Data -> Preprocessing -> Training -> Evaluation -> Export -> Serving -> Client;
  Feedback -> Data;
}
```

Generate the PNG:

```bash
dot -Tpng flow.dot -o flow.png
```

2) Simple architecture diagram using matplotlib (python)

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8, 4))
ax.text(0.1, 0.6, "Data\n(Images)", bbox=dict(boxstyle="round", facecolor="lightblue"))
ax.text(0.35, 0.6, "Preprocess\nPipeline", bbox=dict(boxstyle="round", facecolor="lightgreen"))
ax.text(0.6, 0.6, "Model\n(Train/Export)", bbox=dict(boxstyle="round", facecolor="lightcoral"))
ax.text(0.85, 0.6, "Serving\n(Flask)", bbox=dict(boxstyle="round", facecolor="lightgrey"))
ax.arrow(0.23, 0.6, 0.1, 0, head_width=0.02)
ax.arrow(0.48, 0.6, 0.1, 0, head_width=0.02)
ax.arrow(0.73, 0.6, 0.08, 0, head_width=0.02)
ax.axis("off")
plt.savefig("arch.png", dpi=150)
```

3) Confusion matrix heatmap (seaborn)

```python
import matplotlib.pyplot as plt
import seaborn as sns

# cm is the confusion matrix computed in section 8
sns.heatmap(cm, annot=True, fmt="d")
plt.xlabel("Predicted")
plt.ylabel("True")
plt.savefig("confusion.png")
```

16) Sample project: plant identification (quick-repro)

Folder skeleton (local):

plant_project/
  dataset/
  train.py
  model_utils.py
  app/
    app.py
    model/
      plant_model.h5
  Dockerfile
  requirements.txt
  README.md

train.py contains the training pipeline; model_utils.py contains preprocessing + label loading.


17) Common errors and quick debugging

  • Model not found in Flask: check the path (relative vs absolute). Use os.path.join(os.path.dirname(__file__), 'model', 'plant_model.h5').

  • ModuleNotFoundError in a venv: activate the venv and install the packages inside it.

  • Preprocessing mismatch: make sure inference preprocessing equals training preprocessing (resize & normalization).

  • CORS issues: if the frontend is on a different origin, enable CORS in Flask.

  • Performance (slow inference): batch predictions, use ONNX Runtime, or quantize with TFLite.


18) Documentation & learning resources (brief)

  • TensorFlow docs (official)

  • Keras API

  • ONNX Runtime docs

  • Weights & Biases / MLflow (experiment tracking)

  • Docker, Gunicorn, Nginx guides


Conclusion (brief)

This is a comprehensive, step-by-step practical roadmap for delivering a complete AI project, from data design and training through deployment and operations. The keys are quality data, a consistent preprocessing pipeline, and a solid deploy/monitoring process.
