🇸🇬 MADE-IN SINGAPORE
★ Headquartered in Singapore
AI / ML INFERENCE

Real-time AI at the sensor.

Inference platforms that put machine learning where the data is generated — cameras, radios, sensors — with deterministic latency and predictable power budgets.

The new wave of inference acceleration

Training silicon is not inference silicon.

The GPUs that win on training throughput often lose on real-world inference — too much idle compute at batch-size-one, too much latency jitter, too much wall power for the deployment site. Over the last few years the stack has split, and so has the hardware. These four shifts drive our silicon choices.

Framework × target matrix

What runs where.

Pick your framework. Pick your target. We’ll handle the conversion, quantization and deployment plumbing.

Framework | FPGA (Kintex/UltraScale+) | ZYNQ MPSoC | Jetson-class | x86 accelerator
PyTorch ≥ 1.12 | ONNX → Vitis AI | Vitis AI | TensorRT | OpenVINO / TRT
TensorFlow 2.x | TF → ONNX → Vitis | Vitis AI | TF-TRT | OpenVINO / TF
ONNX | Native | Native | Native | Native
Custom (C / RTL) | Hand-tuned HLS | HLS + PS | CUDA kernels | Intrinsics
Hugging Face models | Case-by-case | Quantized | TRT-LLM | OpenVINO

Three deployment shapes

One team, one toolchain, three form factors.

We pick the silicon around your latency, power and BOM envelope — not the other way round.

Edge SoC

ZYNQ UltraScale+ or Jetson-class modules for translators, smart cameras and portable scanners with tight power budgets (<15 W typical).


PCIe accelerator

Kintex / Alveo-style cards that slot into standard 1U/2U servers for data-center pipelines at line rate (10-100 GbE ingress).

Rack appliance

1U / 2U servers with multi-FPGA fabric for telco, satellite and sensor analytics — turnkey or OEM-branded.

Optimization techniques

What we do to your model.

Most customer models come in as float32 PyTorch checkpoints. These are the four transformations that close the latency and power gap on real silicon.

Deployment workflow

From checkpoint to production.

Every model we deploy goes through the same 5-step flow. Average time from handoff to deployable artifact: 4-8 weeks.

1
Characterize

Profile your model, pick target silicon, set accuracy/latency budgets.

2
Convert

PyTorch / TF → ONNX. Validate that the converted graph reproduces the original model's outputs bit-accurately.

3
Quantize

Post-training quantization (PTQ) with calibration data; you get an accuracy-delta report before anything is committed.

4
Compile

Target-specific compile (Vitis AI / TRT / OpenVINO); micro-benchmarks.

5
Deploy

Ship model + runtime + OTA update mechanism; acceptance tests.
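
As a rough illustration of step 3, the sketch below shows symmetric per-tensor int8 post-training quantization driven by calibration data, followed by a simple error report. It is a minimal sketch under stated assumptions, not the production flow: the function names (`calibrate_scale`, `quantize`, `dequantize`, `max_abs_error`) are hypothetical, the max-magnitude calibration rule is one common choice among several, and real toolchains such as Vitis AI, TensorRT or OpenVINO apply their own calibrators. It also reports raw numerical error on a tensor, which is only a proxy for a task-level accuracy delta.

```python
from typing import List, Sequence


def calibrate_scale(calibration: Sequence[float], num_bits: int = 8) -> float:
    """Pick a symmetric per-tensor scale from the largest calibration magnitude.

    Assumption: max-abs calibration. Percentile or entropy-based calibration
    is also common and may clip outliers more gracefully.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(x) for x in calibration)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values: Sequence[float], scale: float, num_bits: int = 8) -> List[int]:
    """Map floats to signed integers, clamping anything outside the calibrated range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(quantized: Sequence[int], scale: float) -> List[float]:
    """Map integers back to floats so they can be compared with the reference."""
    return [q * scale for q in quantized]


def max_abs_error(reference: Sequence[float], approx: Sequence[float]) -> float:
    """Worst-case absolute difference between reference and quantized values."""
    return max(abs(r - a) for r, a in zip(reference, approx))


if __name__ == "__main__":
    calibration_data = [-1.0, 0.5, 2.54]   # stand-in for real calibration activations
    weights = [0.113, -0.707, 1.291]       # stand-in for a float32 tensor

    scale = calibrate_scale(calibration_data)
    restored = dequantize(quantize(weights, scale), scale)
    print(f"scale={scale:.6f}  max_abs_error={max_abs_error(weights, restored):.6f}")
```

For values inside the calibrated range, the rounding error of this scheme is bounded by half the scale; values outside the range are clamped, which is why the choice of calibration data matters.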

Model families we deploy today

From vision to signal processing.

These are the model families we’ve shipped to production in customer deployments. Outside this list, most PyTorch / ONNX models deploy through the standard toolchain flow.

Vision

  • YOLOv5 / v8: Edge SoC + FPGA
  • EfficientNet, ResNet: All targets
  • Segmentation (U-Net, DeepLab): All targets
  • Multi-stream tracking: FPGA accelerator

Speech / NLP

  • Whisper / Whisper-tiny: Edge SoC + GPU
  • Conformer ASR: FPGA + GPU
  • Custom translation models (K1): On-device NPU
  • Small-LM inference: GPU + rack accelerator

Signal processing

  • Sensor fusion (Kalman / ML): SoC + GPU
  • Audio / speech inference: FPGA + SoC
  • Vision pre-processing: FPGA native
  • Time-series streaming: FPGA native

Other

  • Anomaly detection: All targets
  • Time-series forecasting: GPU + rack accelerator
  • Custom model porting: Scoped per engagement

When to engage Newstart

The five conversations we have most often.

You don’t always need a full platform build. Some customers come in with a trained model and an impossible power budget. Others have tried an off-the-shelf PCIe card and run out of headroom. Talk to us at any of these five points.

Deployment targets

Where Newstart inference ships today.

Our customers deploy across a wide range of sites, from a surveillance camera on a street corner to a special-purpose appliance inside a telco data center. The three archetypes below cover the majority of engagements.

Edge deployment

Everything from a surveillance camera to a telephone-pole radio to a networking closet inside an office building. Fanless, thermally bounded, often power-over-Ethernet, always with an OTA path and field-calibration hooks.

Data-center specialty

Our partners handle the conventional server build-out — we come in for special-purpose devices: inline inference for telco traffic, radar or sensor-fusion analytics appliances, custom PCIe accelerators where no stock card meets the spec.

Evaluation & dev platforms

For some partners we build and supply the development platforms used to evaluate their silicon or IP — complete boards, firmware, host drivers and a demo stack that reviewers can plug in and run on day one.


Ready to accelerate your next platform?

Talk to our Singapore engineering team about your RF, FPGA/DSP, or AI inference project. We'll help you pick the right silicon and ship on time.