Table of Contents
- About
- Publications
- Camera Calibration Expertise
- Hardware & Platforms
- Products & Tools
- Open Source
- Technical Content
- CV Coaching Roadmap
- Courses
- Workshops & Events
About
Dr. Farshid Pirahansiah — Computer Vision expert with 10+ years of R&D experience. 3 AI patents, 141+ citations (h-index 7), Springer book chapter author. Specializes in real-time image processing and edge AI across Jetson, Raspberry Pi, Hailo, Axelera, and ARM platforms. Full-stack CV/DL: model training → fine-tuning → deployment → API integration.
Metrics: 21 publications (3 patents, 2 books, 6 journals, 11 conferences) · h-index 7 · i10-index 5 · LinkedIn 55K+ · Facebook 15K+
Publications
Patents
1. Face Image Augmentation — WO 2021/060971 A1
Generates realistic face images from surveillance video using GANs. Captures faces from multiple angles, augments through data transformations, selects high-quality images for training. Improves recognition in difficult environments with fuzzy logic quality filtering.
2. Advertisement via Facial Analysis — WO 2020/141969 A2
Facial recognition (CNN, GAN) adjusts digital advertisements based on user demographics and emotions. Identifies single users or groups and provides customized content without collecting personal data. Uses a unique matching mechanism that correlates facial features with business goals.
3. Moving Vehicle Detection — WO 2021/107761 A1
Advanced image processing for vehicle detection. Illumination enhancement, Sobel edge detection, geometric noise filtering. Works in poor lighting. Filters noise using geometric features and relationship to key objects.
Books
Camera Calibration & Video Stabilization for Robot Localization
Springer chapter in “Control Engineering in Robotics and Industrial Automation”. Camera calibration framework for robot localization.
Computational Intelligence: Optical Flow for Video Stabilization
Explores augmented optical flow methods for video stabilization in “Computational Intelligence: From Theory to Application”.
OpenCV 5 Ebook
4 chapters: Introduction → Image Basics → Feature Detection → Advanced Topics. Plus “Computer Vision Meets LLM”.
Journals
- Adaptive Image Thresholding Based on PSNR
- Character & Object Recognition via Global Feature Extraction
- PSNR Global Single Fuzzy Threshold
- PSNR Threshold for Image Segmentation
- 3D SLAM: Simultaneous Localization And Mapping Trends And Humanoid Robot Linkages
- Using an Ant Colony Optimization Algorithm for Image Processing
Conference Papers
- 2D vs 3D Map for Environment Movement Objects
- Adaptive Image Segmentation Based on PSNR for License Plate Recognition
- Classification Techniques Using Enhanced Geometrical Topological Feature Analysis
- Camera Calibration for Multi-Modal Robot Vision
- Character Recognition Based on Global Feature
- Comparison of Single Thresholding Method for Handwritten Images Segmentation
- License Plate Recognition with Multi-Threshold Based on Entropy
- Multi-threshold Approach for License Plate Recognition System
- Pattern Image Significance for Camera Calibration
- TafreshGrid: Grid Computing at Tafresh University
- Computer Vision Meets LLM
Keynotes
- LLMs Meet Computer Vision
Camera Calibration Expertise
Expert across single-camera and multi-camera systems:
- Standard RGB cameras — common imaging tasks
- High-resolution cameras — precision industrial imaging
- Depth cameras — stereo vision, 3D reconstruction
- Infrared cameras — thermal, night vision
- IoT camera systems — real-time monitoring, smart environments
- Robotic vision — autonomous navigation, industrial robotics
- Medical imaging — precise calibration for surgical tools
Techniques: Fixed patterns (chessboard), dynamic automated calibration for real-time/mobile platforms. Works with robotics, IoT, medical technology, industrial automation.
Hardware & Platforms
AI Accelerators
- Axelera AI M2 — Metis AIPU on Raspberry Pi 5, M.2 inference card
- Hailo-15 SBC — AI Vision Processor, Yocto Linux, full BSP
- FPGA Xilinx Kria KV260 — Zynq UltraScale+ Vision AI Starter Kit
- Intel Neural Compute Stick 2 — portable AI inference
- OpenCV AI Kit — integrated AI vision + depth sensing
- Google Coral (TPU) — on-device ML, low-latency
- Nvidia Jetson Nano — edge AI, accelerated vision
- Nvidia GPU (GTX 1080 to RTX 5090) — high-performance training
Edge Devices
- Raspberry Pi 3, 4, 5 — edge computing, low-power
- ARM platforms — mobile CV
- RISC-V chipsets — open-source scalable
Platforms
- ARM — low-power mobile CV
- Apple Silicon — CoreML, MLX, Metal workflows
- x86-64 — large-scale training
OS
- Linux (preferred for CV), Windows, macOS
Products & Tools
AI Model Cost Calculator
Calculates text and image processing costs for GPT-4 Turbo, Gemini 1.5 Pro, Claude 3 Opus with real-time pricing estimates.
Real-time OpenCV GUI
PyQt5-based function tester. Apply OpenCV functions on images with safe code execution, undo functionality. For learning and prototyping.
3D Camera Calibration
Calibration tools and demos for single and multi-camera systems.
AI Todo List Telegram Mini App
IndexedDB persistence, multi-view calendar (day/week/month/year), cross-device compatible. Telegram Bot + Mini App integration.
Telegram Bots
- @pirahansiahbot — Fine-tuned GPT-4 Mini on AWS Lambda for CV queries. Custom dataset, hyperparameter tuning, serverless deployment.
- @image_processing_farshid_bot — Send images, apply OpenCV functions (Canny, etc.), get instant results. Payment via TON/stars.
- @item2cook_bot — Photo to pencil sketch transformer.
Custom ChatGPTs
- CV Developer — Python, OpenCV expertise
- MLOps & DevOps — pipeline optimization
- Career Companion — CV enhancement, interview prep
- German TutorBot — text correction, translations
- Simpli3D Creator — image-to-3D conversion
- Image Inspirer — creative image generation
VSCode Extensions Pack
Essential tools for CV, ML, LLM, PKM: Better Comments, Prettier, Python, Jupyter, Docker.
Open Source
OpenCV NuGet Packages
Static OpenCV 5 library for Visual Studio. Install via NuGet Package Manager in minutes.
- VS2019: Install-Package OpenCV5_StaticLib_VS2019_NuGet
- VS2022: Install-Package OpenCV5_StaticLib_VS22_NuGet
Static OpenCV linking grows the executable from about 200KB to about 18MB, but no OpenCV DLLs are needed at run time.
cvTest — Computer Vision Testing Framework
Unit, integration, system, and acceptance tests for CV/DL. Tests processing time, memory, CPU usage. Output validation via PSNR, SSIM, image quality metrics. Hardware-specific benchmarks. Tests: auto brightness adjustment, sharpening kernel effectiveness, FPS measurement, OCR comparison.
opencv-cpp
C++ OpenCV example projects and templates.
Technical Content
CUDA & GPU Programming
CUDA + OpenCV + VSCode (Windows)
Setup for CUDA C++ development in VS Code:
tasks.json — Build task using nvcc with MSVC include/lib paths. Compiles main.cu → main.exe.
settings.json — Associates .cu files with C++ for syntax highlighting. Uses cmd.exe terminal.
launch.json — Debug config using cppvsdbg. Auto-builds before run, executes in external terminal.
c_cpp_properties.json — IntelliSense with CUDA and MSVC headers. Compiler: nvcc.exe, C++17 standard.
Tips: Use ${env:CUDA_PATH} instead of hardcoding. Add -g for debug symbols. Consider CMake for larger projects.
PyCUDA Kernel Explanation
PyCUDA runs CUDA kernels (C/C++) from Python:
- Import: pycuda.driver as cuda, pycuda.autoinit
- Write kernel as string: __global__ void add(int *a, int *b, int *result) { int idx = threadIdx.x + blockIdx.x * blockDim.x; result[idx] = a[idx] + b[idx]; }
- Compile: SourceModule(kernel_code) — compiles at runtime
- Extract: mod.get_function("add")
- Allocate GPU memory: cuda.mem_alloc(), copy data with cuda.memcpy_htod()
- Run: add(a_gpu, b_gpu, result_gpu, block=(4,1,1), grid=(1,1))
- Retrieve: cuda.memcpy_dtoh(result, result_gpu)
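To see which array elements a launch like the one above touches, the kernel's index arithmetic can be emulated on the CPU. This is a plain-Python sketch, not PyCUDA: `emulate_add_kernel` is a hypothetical helper that loops over the block and thread indices a 1D launch would create, and the bounds guard is an addition the kernel string itself does not have.

```python
def emulate_add_kernel(a, b, block_dim_x, grid_dim_x):
    """CPU emulation of the `add` kernel's 1D indexing (hypothetical helper)."""
    result = [0] * len(a)
    for block_idx_x in range(grid_dim_x):
        for thread_idx_x in range(block_dim_x):
            # Same formula as in the kernel string.
            idx = thread_idx_x + block_idx_x * block_dim_x
            if idx < len(a):  # bounds guard; the kernel string has none
                result[idx] = a[idx] + b[idx]
    return result


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
print(emulate_add_kernel(a, b, block_dim_x=4, grid_dim_x=1))  # [11, 22, 33, 44]
```

With block=(4,1,1) and grid=(1,1) only four threads run, so longer arrays need a larger grid, and the real kernel would then need its own bounds check.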
Numba JIT Tutorial
@jit(nopython=True) compiles a Python function to machine code at runtime, so the compiled function runs without going through the Python interpreter.
Without Numba:
def sum_of_squares(arr):
    total = 0
    for num in arr:
        total += num * num
    return total
With Numba:
from numba import jit
@jit(nopython=True)
def sum_of_squares_jit(arr):
    total = 0
    for num in arr:
        total += num * num
    return total
For 10M numbers: several times faster once compiled (the first call also pays the compilation cost). Works for factorials, matrix multiplication, and most numerical loops over NumPy arrays and scalars.
Optical Flow
Challenges & Solutions
Illumination Variations: Use CLAHE preprocessing, RAFT/PWC-Net deep models, NCC for robust matching.
import cv2
def robust_motion_estimation(frames):
    # Expects 8-bit grayscale frames; CLAHE evens out lighting before flow estimation.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    preprocessed = [clahe.apply(f) for f in frames[:2]]
    return cv2.calcOpticalFlowFarneback(preprocessed[0], preprocessed[1], None, 0.5, 3, 15, 3, 5, 1.2, 0)
Occlusions: Bilateral filtering, backward-forward flow consistency check, MaskFlowNet.
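The backward-forward consistency check can be sketched with NumPy alone. The idea: follow the forward flow from frame A into frame B, read the backward flow there, and flag pixels where the two do not cancel. The function name `occlusion_mask`, the nearest-neighbour lookup, and the 1-pixel tolerance are illustrative assumptions, not taken from the text.

```python
import numpy as np


def occlusion_mask(flow_fwd, flow_bwd, tol=1.0):
    """Return True where forward and backward flow disagree (likely occluded).

    flow_fwd maps frame A -> B, flow_bwd maps B -> A; both have shape (H, W, 2)
    with (dx, dy) per pixel. `tol` (pixels) is an illustrative threshold.
    """
    h, w = flow_fwd.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Where each pixel of A lands in B (nearest neighbour, clamped to the image).
    xb = np.clip(np.rint(xs + flow_fwd[..., 0]).astype(int), 0, w - 1)
    yb = np.clip(np.rint(ys + flow_fwd[..., 1]).astype(int), 0, h - 1)
    # For a consistent pixel, the backward flow at the landing point cancels the forward flow.
    round_trip = flow_fwd + flow_bwd[yb, xb]
    return np.linalg.norm(round_trip, axis=-1) > tol


fwd = np.zeros((4, 4, 2))
fwd[..., 0] = 1.0           # everything moves 1 px to the right
bwd = np.zeros((4, 4, 2))
bwd[..., 0] = -1.0          # consistent backward flow
bwd[2, 2] = (3.0, 0.0)      # one inconsistent pixel in frame B
mask = occlusion_mask(fwd, bwd)
print(int(mask.sum()), bool(mask[2, 1]))  # 1 True
```

Flagged pixels can then be excluded from tracking or filled from neighbouring flow, depending on the application.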
Fast Motion: Pyramidal Lucas-Kanade, FlowNet2, PWC-Net for large displacements.
Textureless Regions: Farneback dense flow, smoothness constraints, RAFT.
Motion Blur: Wiener filtering, Dual TV-L1 optical flow, deblurring preprocessing.
Real-time: GPU-accelerated (CUDA OpenCV), LiteFlowNet, coarse-to-fine approaches.
Scaling: Downscale + multi-scale refinement, image pyramids.
OpenCV Functions
- cv2.calcOpticalFlowFarneback() — dense optical flow
- cv2.calcOpticalFlowPyrLK() — sparse pyramidal Lucas-Kanade
- cv2.createCLAHE() — illumination normalization
- cv::cuda::SparsePyrLKOpticalFlow — GPU-accelerated sparse Lucas-Kanade (C++ class in the OpenCV CUDA modules)
3D Vision & Multi-Camera
Depth to 3D Point Cloud
Deprojection using camera intrinsics:
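import numpy as np

def deproject_point(u, v, depth, camera_matrix):
    # Pinhole model: pixel (u, v) at the given depth -> 3D point in the camera frame
    fx, fy = camera_matrix[0, 0], camera_matrix[1, 1]
    cx, cy = camera_matrix[0, 2], camera_matrix[1, 2]
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])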
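Applying the same deprojection to every pixel of a depth image yields the point cloud. A vectorized sketch follows; `depth_to_point_cloud` is a hypothetical helper, the intrinsics are made-up values, and treating depth <= 0 as invalid is an assumption (sensors differ in how they mark missing depth).

```python
import numpy as np


def depth_to_point_cloud(depth, camera_matrix):
    """Deproject an (H, W) depth image into an (N, 3) point cloud.

    Pixels with depth <= 0 are treated as invalid and dropped (an assumption).
    """
    fx, fy = camera_matrix[0, 0], camera_matrix[1, 1]
    cx, cy = camera_matrix[0, 2], camera_matrix[1, 2]
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))  # pixel column (u) and row (v)
    x = (us - cx) * depth / fx
    y = (vs - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]


K = np.array([[500.0, 0.0, 2.0],
              [0.0, 500.0, 1.0],
              [0.0, 0.0, 1.0]])  # illustrative intrinsics
depth = np.full((3, 5), 2.0)
depth[0, 0] = 0.0  # one invalid pixel
cloud = depth_to_point_cloud(depth, K)
print(cloud.shape)  # (14, 3)
```

The pixel at the principal point (u=2, v=1) maps to (0, 0, 2), which is a quick sanity check for the sign and axis conventions.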
100-Camera Synchronization
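Reality check: 100 HD cameras @ 30fps ≈ 66 Gbps raw (assuming uncompressed 1280×720 RGB, about 0.66 Gbps per camera), well beyond what the USB/PCIe links of a single machine can carry. Frame buffers alone need 2-4GB.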
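The raw figure depends heavily on resolution and pixel format. As a back-of-the-envelope check, this sketch computes the aggregate bandwidth under assumed values (1280×720, 3 bytes per pixel, 30 fps, 100 cameras); `raw_bandwidth_gbps` is a hypothetical helper.

```python
def raw_bandwidth_gbps(width, height, bytes_per_pixel, fps, cameras):
    """Aggregate uncompressed video bandwidth in gigabits per second."""
    bits_per_second = width * height * bytes_per_pixel * 8 * fps * cameras
    return bits_per_second / 1e9


# Assumed format: 1280x720, 3 bytes per pixel (RGB), 30 fps, 100 cameras.
total = raw_bandwidth_gbps(1280, 720, 3, 30, 100)
print(f"{total:.1f} Gbps")  # 66.4 Gbps
```

Lower resolutions or chroma-subsampled formats scale the result down proportionally, but under any common raw format the total stays far above a single host's camera I/O, which is why the compressed, distributed layout is recommended.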
Recommended architecture (distributed):
- 10 machines × 10 cameras each
- Compressed frames (MJPEG/H.264) over network
- Central machine decodes and displays synced grid
Key tools: ZMQ/gRPC for streaming, FFmpeg for encoding, OpenCV+CUDA for GPU decode.
Best approach: GStreamer with ksvideosrc do-timestamp=true, queue max-size-buffers=1 leaky=2, GPU MJPEG decode, Direct3D11 rendering. End-to-end latency ≤ 40ms.
Media Foundation (MF) “no buffer” simulation: call Flush() before every ReadSample(), use MF_SOURCE_READER_IGNORE_CLOCK, and keep only the latest frame per camera by overwriting it in a global array.
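Both low-latency recipes above rest on the same idea: keep only the newest frame per camera and overwrite anything older. A minimal thread-safe sketch of that slot follows; `LatestFrameSlot` is a hypothetical class, and strings stand in for image buffers.

```python
import threading


class LatestFrameSlot:
    """Holds only the newest frame per camera; older frames are overwritten."""

    def __init__(self, num_cameras):
        self._frames = [None] * num_cameras
        self._lock = threading.Lock()

    def put(self, camera_id, frame):
        # Overwrite instead of queueing, so a slow consumer never builds up latency.
        with self._lock:
            self._frames[camera_id] = frame

    def snapshot(self):
        # Copy of the current per-camera frames (None where nothing has arrived yet).
        with self._lock:
            return list(self._frames)


slots = LatestFrameSlot(num_cameras=3)
for i in range(5):
    slots.put(0, f"cam0-frame{i}")  # strings stand in for image buffers
slots.put(2, "cam2-frame0")
print(slots.snapshot())  # ['cam0-frame4', None, 'cam2-frame0']
```

Capture threads call put() as frames arrive, and the display loop calls snapshot() once per refresh, trading dropped frames for bounded latency.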
Multi-Camera Transform
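import numpy as np

def transform_point(point, matrix):
    # Apply a 4x4 homogeneous transform (e.g. camera-to-world extrinsics) to a 3D point
    point_homog = np.append(point, 1.0)
    transformed = np.dot(matrix, point_homog)
    return transformed[:3]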
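In a multi-camera rig the 4x4 matrix usually comes from each camera's extrinsic rotation and translation. The sketch below builds such a matrix and applies it to a batch of points; `make_transform`, `transform_points`, and the example extrinsics are illustrative assumptions.

```python
import numpy as np


def make_transform(rotation, translation):
    """Build a 4x4 homogeneous matrix from a 3x3 rotation and a 3-vector translation."""
    matrix = np.eye(4)
    matrix[:3, :3] = rotation
    matrix[:3, 3] = translation
    return matrix


def transform_points(points, matrix):
    """Apply a 4x4 transform to an (N, 3) array of points."""
    homog = np.hstack([points, np.ones((len(points), 1))])
    return (homog @ matrix.T)[:, :3]


# Illustrative extrinsics: camera B is rotated 90 degrees about Z and shifted 1 unit along X.
rot_z_90 = np.array([[0.0, -1.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0]])
cam_b_to_world = make_transform(rot_z_90, [1.0, 0.0, 0.0])
points_b = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 5.0]])
world = transform_points(points_b, cam_b_to_world)
print(world)  # rows: (1, 1, 0) and (-1, 0, 5)
```

Once every camera's points are expressed in the shared world frame, the clouds can simply be concatenated.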
Motion Detection from Point Cloud
Threshold-based: compare recent positions within time window, detect movement > 0.05 units.
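A minimal sketch of the threshold rule for one tracked point: the 0.05 threshold comes from the text, while the 1-second window, the sample format, and the name `detect_motion` are assumptions.

```python
import math


def detect_motion(track, window_s=1.0, threshold=0.05):
    """Return True if the tracked point moved more than `threshold` within the window.

    `track` is a list of (timestamp_seconds, (x, y, z)) samples, oldest first.
    """
    if not track:
        return False
    latest_time, latest_pos = track[-1]
    recent = [pos for t, pos in track if latest_time - t <= window_s]
    # Compare the newest sample against every other sample inside the window.
    return any(math.dist(latest_pos, pos) > threshold for pos in recent)


still = [(0.0, (1.0, 1.0, 1.0)), (0.5, (1.01, 1.0, 1.0)), (1.0, (1.02, 1.0, 1.0))]
moving = [(0.0, (1.0, 1.0, 1.0)), (0.5, (1.04, 1.0, 1.0)), (1.0, (1.08, 1.0, 1.0))]
print(detect_motion(still), detect_motion(moving))  # False True
```

The threshold should sit above the sensor's depth noise, otherwise jitter in a static scene is reported as motion.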
Optimization
Deep Learning Optimization
Model: Quantization (INT8/FP16), Pruning, Knowledge Distillation
Hardware: GPU/TPU acceleration, CUDA/cuDNN
Data Loading: Multi-threaded DataLoader, real-time augmentation
Architecture: MobileNet, EfficientNet, ResNet
Inference: ONNX Runtime, TensorRT, OpenVINO
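As an illustration of the quantization entry, here is a sketch of symmetric per-tensor INT8 quantization in NumPy. The function names are hypothetical, and production toolchains typically add calibration data and finer-grained scales; this shows only the core idea of storing int8 values plus one float scale.

```python
import numpy as np


def quantize_int8(weights):
    """Symmetric per-tensor quantization: float32 -> int8 plus one scale factor."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.rint(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate float32 weights."""
    return q.astype(np.float32) * scale


rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
max_error = float(np.max(np.abs(w - dequantize(q, scale))))
print(q.nbytes, w.nbytes)  # 4096 16384
```

Storage drops by 4x, and the round-trip error per weight is bounded by about half the scale; whether that is acceptable has to be checked against model accuracy.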
Computer Vision Optimization
Algorithms: YOLO (real-time detection), MobileNet/SqueezeNet (embedded)
Preprocessing: Grayscale conversion, ROI focus, frame skipping
Parallel: Multi-threading, GPU processing via CUDA
Features: ORB, HOG — efficient extraction
Edge: NVIDIA Jetson, TFLite, FPGA/ASIC
Data Optimization
Collection: Diverse sources, balanced classes, high-quality filtering
Preprocessing: Normalization, missing data handling, PCA/t-SNE
Augmentation: Rotation/scaling/cropping (CV), SMOTE (imbalanced), time-series shifts
Underfitting vs Overfitting
Underfitting fix: More layers/features, more complex models, more epochs, tune the learning rate, reduce regularization
Overfitting fix: L1/L2 regularization, dropout, early stopping, data augmentation, reduce complexity, ensemble methods
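Of the overfitting fixes, early stopping is the simplest to show in isolation. The sketch below works on a list of validation losses; the function name, the patience of 3, and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch at which training would stop.

    Stops once the validation loss has not improved by more than `min_delta`
    for `patience` consecutive epochs; returns the last epoch otherwise.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


# Made-up validation losses: improving, then drifting upward as the model overfits.
losses = [0.90, 0.70, 0.60, 0.58, 0.59, 0.61, 0.64, 0.70]
print(early_stopping_epoch(losses))  # 6
```

In practice the weights from the best epoch (index 3 here) are restored, not the ones from the epoch where training stopped.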
RAM Reduction
Attention sinks, mixed-precision training, lower-precision compute, reduce batch size, gradient accumulation, gradient checkpointing, CPU parameter offloading.
Key Libraries
- DL/ML: PyTorch, TensorFlow, Keras, ONNX Runtime, TensorRT, OpenVINO
- CV: OpenCV, Pillow, FFmpeg, GStreamer
- Data: NumPy, Pandas, Albumentations, imbalanced-learn (SMOTE)
- Acceleration: Numba, PyCUDA, CuPy, TFLite
- Distributed: Horovod, Dask, Apache Spark
- Tuning: Ray Tune, GridSearchCV, RandomizedSearchCV
AI & LLM
Orchestrating AI Agents
Multi-agent systems for complex tasks. Components:
- Agents: Autonomous units (single-purpose or general-purpose)
- Orchestrator: Delegates tasks, monitors progress, combines results
- Communication: Message passing, API calls, shared memory
Workflow: Task decomposition → assign to specialized agents → monitor → aggregate results
Benefits: Efficiency (parallelization), scalability, flexibility, improved decision-making
Challenges: Coordination complexity, communication overhead, error handling, resource management
Applications: Research & analysis, content creation, project management
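The workflow above (decompose, assign, monitor, aggregate) can be sketched with the standard library. The two agents are hypothetical stand-ins that only format strings; real agents would call models or external APIs, and the thread pool is one possible choice for parallel dispatch.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical single-purpose agents; real agents would call models or external APIs.
def research_agent(task):
    return f"notes on {task}"


def summary_agent(task):
    return f"summary of {task}"


AGENTS = {"research": research_agent, "summarize": summary_agent}


def orchestrate(subtasks):
    """Run each (agent_name, task) pair in parallel and collect results in order."""
    results = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [(name, pool.submit(AGENTS[name], task)) for name, task in subtasks]
        for name, future in futures:
            try:
                results.append((name, future.result(timeout=10)))
            except Exception as exc:  # one failing agent should not sink the whole job
                results.append((name, f"failed: {exc}"))
    return results


plan = [("research", "edge AI accelerators"), ("summarize", "camera calibration")]
for name, output in orchestrate(plan):
    print(name, "->", output)
```

The per-agent try/except is where the error-handling challenge shows up: the orchestrator has to decide whether to retry, substitute another agent, or return a partial result.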
LLM at the Edge (IoT)
- Ultra Low-Power (watch MCUs): TinyML, quantization, pruning, Edge Impulse
- Common Edge (Raspberry Pi 5): ONNX Runtime, TFLite, model distillation
- RISC-V: Custom compiler optimization (TVM), RISC-V ML frameworks
- Nvidia Edge: Jetson platform, CUDA, TensorRT, DeepStream SDK
RAG vs CAG
- RAG: Retrieval-based, up-to-date info, more complex, slower
- CAG: Cache-based, faster responses, simpler, limited to stable data
Emerging LLM Methods
- Transformer²: Self-adaptive weight matrices for real-time task adjustment
- MML (Modular ML): Smaller components, better reasoning, logic-based decisions
- Mosaic: Composite pruning — smaller models without performance loss
CV Coaching Roadmap
1. Fundamentals
Image formation (cameras, lenses, sensors, lighting). Image representation (pixels, RGB/HSV/YCbCr). Sampling & quantization (resolution, bit depth).
2. Image Processing
Filtering (convolution, Gaussian, Sobel, Canny). Thresholding (Otsu, adaptive). Morphology (erosion, dilation). Histograms (equalization). Features (SIFT, SURF, ORB, FAST, Harris).
3. Object Detection & Recognition
Traditional: Haar cascades, HOG+SVM, template matching. Deep Learning: ResNet/VGG/EfficientNet, YOLO/Faster R-CNN/SSD, U-Net/DeepLab, Mask R-CNN.
4. Depth & 3D Vision
Stereo vision (disparity, epipolar). Structure from Motion. Depth sensors (LiDAR, RealSense, Kinect, ToF). SLAM (ORB-SLAM, LSD-SLAM).
5. Camera Calibration
Intrinsic/extrinsic parameters. Homographies, perspective warp. Epipolar geometry (fundamental/essential matrix).
6. Optical Flow & Motion
Dense vs sparse (Lucas-Kanade, Farneback, Horn-Schunck). Background subtraction (MOG2, KNN). Action recognition (pose, LSTM, 3D CNN).
7. Compression
JPEG/PNG (lossy/lossless). H.264/H.265 (video). Depth map compression.
8. Real-Time & Edge AI
Hardware acceleration (CUDA, TensorRT, OpenVINO). Frameworks (TFLite, ONNX Runtime, OpenCV DNN). Embedded (Jetson, Raspberry Pi, FPGAs).
9. Multi-Camera & Sensor Fusion
Camera synchronization. Multi-view geometry (3D reconstruction, triangulation). IMU+camera, LiDAR+camera fusion.
10. Applications
Autonomous vehicles (lane detection, tracking). Medical imaging (MRI/CT, anomaly detection). Surveillance (face recognition, crowd analysis). AR/VR (pose tracking, spatial mapping).
Tools: Python+OpenCV+NumPy, TensorFlow, PyTorch, scikit-image, SimpleITK, MATLAB.
Courses
- Machine Learning Specialization — ML fundamentals with case studies
- Full Stack Deep Learning — end-to-end DL deployment
- MLOps — ML pipeline operations and monitoring
- ROS — Robot Operating System for automation
- Parallel Programming — GPU and multi-threading techniques
- Modern C++ — C++17/20 for performance-critical systems
- Cloud Native — containerized AI deployment
- IoT Scholarship — IoT fundamentals for edge AI
- TensorFlow Deployment — TF serving and edge deployment
Workshops & Events
- RISC-V — open-source processor architecture
- Edge AI Summit — on-device inference optimization
- Embedded IoT — AI on microcontrollers
- Tesla AI — autonomous driving and vision
- AI Hardware — custom accelerators and NPUs
- OpenVINO — Intel inference optimization toolkit
- Metaverse — XR and spatial computing