Deploy AI inside infrastructure you control

A local assistant can run on your own machine or on a server you control, so the core inference path and stored business data can remain within your environment.

  • Runs Locally
    Deployed on your own machine or on a server you control, with no recurring third-party API fees for the core assistant path.
  • Private by Design
    In a fully local deployment, prompts and retrieved business data do not need to be sent to third-party LLM APIs. Optional telemetry, remote support, and external integrations are disclosed separately.
  • Efficient Deployment Options
    Smaller or quantized models can run on suitable consumer hardware. Exact requirements depend on the selected model, quantization, context size, latency target, and concurrent usage.
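As a rough illustration of the sizing point above, here is a minimal back-of-the-envelope sketch in Python. The formula and figures are our own simplification covering model weights only, not a vendor specification.

# Rough weights-only sizing; KV cache, activations, and runtime overhead
# add more memory once the model is actually serving requests.
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal GB

# Example: a 7B model at 4-bit -> about 3.5 GB of weights; deployed
# footprints land higher once format overhead and cache are included.
print(f"{weight_size_gb(7, 4):.1f} GB")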
Our Goal: Local AI agents that remain practical

Given the widespread interest in AI agents and the high cost of servers and subscriptions, we develop virtual assistants that can run locally on suitable consumer hardware or on an affordable server. Exact requirements depend on the model, quantization, context size, concurrency, and response-time target. Our priority is customer-controlled deployment and data protection.

eMagicOne: What we do
  • Requirements discovery.
    Talk to you to understand the target workflow, privacy constraints, integration needs, and model fit.

  • Environment and model setup.
    Installation of the runtime environment and the selected model on your device or on infrastructure you control.

  • Agent development.
    Standard agents are available and can be adjusted to your specific use case.

  • Initial knowledge-base ingestion for RAG.
    Gather approved source content, clean and structure it, split it into chunks, attach metadata, generate embeddings, and index it for retrieval; a minimal ingestion sketch follows this list. Web scraping is optional and used only when appropriate.

  • Optional fine-tuning or adapter training.
    Used only when the target use case requires additional behavior or style adaptation beyond prompt engineering and RAG.

  • Custom development.
    A basic chat interface is included, with optional API and interface work for integration into your infrastructure.
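To make the knowledge-base ingestion step concrete, here is a minimal sketch, assuming sentence-transformers for local embeddings and FAISS for the index. The model name, chunking parameters, and placeholder content are illustrative, not fixed parts of our pipeline.

import faiss
from sentence_transformers import SentenceTransformer

def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Naive fixed-size chunking with overlap; production pipelines
    # usually split on document structure (headings, paragraphs).
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

docs = ["...approved, already cleaned source content..."]  # placeholder
chunks = [c for d in docs for c in chunk(d)]
metadata = [{"source": "doc-0", "chunk": i} for i in range(len(chunks))]

model = SentenceTransformer("all-MiniLM-L6-v2")  # runs locally once downloaded
emb = model.encode(chunks, normalize_embeddings=True)  # unit vectors

index = faiss.IndexFlatIP(emb.shape[1])  # inner product == cosine on unit vectors
index.add(emb)

# Retrieval: embed the query the same way and take the top-k chunks.
query = model.encode(["How do I reset my password?"], normalize_embeddings=True)
scores, ids = index.search(query, 3)
print([(metadata[i], round(float(s), 3)) for i, s in zip(ids[0], scores[0]) if i >= 0])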

Our competence
Data Privacy: Built around customer-controlled deployment

We made a deliberate decision not to rely on large AI models hosted by third-party companies for the core assistant path. The main inference path can run inside customer-controlled infrastructure, with the model and data stored on hardware you control.

Any optional external services used for telemetry, remote support, OCR, speech, or other auxiliary components are identified explicitly, so the privacy boundary stays clear.

  • Core LLM path
    Remains entirely within customer-controlled infrastructure in a fully local deployment.

  • Auxiliary services
    Are listed separately whenever any external components are enabled.

  • Operational benefit
    Reduces dependence on third-party LLM APIs for prompts and retrieved internal data.
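As a concrete illustration of this boundary, the core path can be exercised with an OpenAI-compatible client pointed at a local server. The endpoint URL, port, and model name below are assumptions; many local runtimes (llama.cpp's server, Ollama, vLLM) expose this same API.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # local runtime, not api.openai.com
    api_key="not-needed-locally",          # required by the client, unused locally
)

resp = client.chat.completions.create(
    model="mistral",  # whichever model the local runtime has loaded
    messages=[{"role": "user", "content": "Summarize our return policy."}],
)
print(resp.choices[0].message.content)

Because the client only sees a base URL, the same application code can later be pointed at different customer-controlled hardware without changes.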

Technical Characteristics: AI Agents in detail

The numbers below are presented as practical deployment guidance, not universal one-line requirements.

  • AI Models
  • Mistral 7B: 7B parameters. Approximate memory depends on runtime and quantization. For example, a 4-bit deployment may place the model weights roughly in the 4.5–6 GB range, while practical working VRAM is often higher once context and KV-cache overhead are included.
  • Qwen3-8B: 8.2B parameters. Practical memory depends on quantization and runtime. A quantized deployment may fit roughly in the 5–6.5 GB model range, while total working memory rises with context length and active sessions.
  • Llama 3.1 8B: 8B parameters. A quantized deployment is directionally similar in size to other 8B-class models, but working memory still increases once cache and runtime overhead are added.
  • gpt-oss-20b: 20B-class model. Memory figures are quoted together with the runtime assumption; the official MXFP4-quantized setup is described as operating within 16 GB of memory, not as one bare rounded RAM number.
  • Technical details
  • Hardware: Minimum requirements depend on the selected model and deployment mode. We give separate guidance for system RAM, GPU VRAM, disk space, expected concurrency, and latency target rather than one flat “starting from 5 GB RAM” figure; a sizing sketch follows this list.
  • Software: PyTorch is the core runtime dependency. Supported OS, Python version, and CUDA/runtime notes are documented as part of the software stack, and we assist with installation and configuration.
  • Connectivity: Internet access may be required for installation, model downloads, updates, optional integrations, and remote administration. Once everything is installed locally, fully offline inference needs no connection.
  • Concurrency: Requirements also depend on expected request/session concurrency, context size, and latency target. Simultaneous chats are counted as concurrent requests or sessions, not as threads.
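The sketch below makes the context and concurrency point concrete. The architecture numbers are illustrative assumptions (roughly an 8B-class model with grouped-query attention), not published requirements for any specific model.

def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, sessions: int,
                bytes_per_elem: int = 2) -> float:
    # K and V are cached per layer, per token:
    # 2 * layers * kv_heads * head_dim * bytes, times tokens and sessions.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * context_tokens * sessions / 1e9

weights_gb = 8e9 * 4 / 8 / 1e9  # 8B parameters at 4-bit: about 4 GB
cache_gb = kv_cache_gb(n_layers=32, n_kv_heads=8, head_dim=128,
                       context_tokens=8192, sessions=4)
print(f"weights ~{weights_gb:.1f} GB, KV cache ~{cache_gb:.1f} GB "
      f"for 4 sessions at 8k context")

With these assumptions the cache alone roughly matches the quantized weights, which is why context size and concurrent sessions belong in any sizing estimate.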

Terminology note: memory capacity is quoted in GB, system RAM is kept separate from GPU VRAM, exact official model names are used, and framework and training terms are spelled consistently (PyTorch, fine-tuning).

Trusted by

We are proud to contribute to the success of the world’s leading brands

Kimberly-Clark
Huggies
Kotex
24 shopping
Best Deal Kitchens
SAV Milano

Trusted by the world’s leading brands

Apart from the daily benefits it offers in terms of time and efficiency, I was particularly impressed by the opportunity it offered to work offline (for example, from a laptop computer on a train or plane). Also, being able to add more than 10 photos for one product in just one click is a great development!
Some retailers use it, above all, to manage their catalog, for example to reduce prices for a category of products by 20% for sales… again with just one click! Others will opt to use it to improve customer relations and to take advantage of its very powerful import/export functions. We are well known as a “difficult project company” but we have only one secret: we discovered PrestaShop Store Manager! In conclusion, in view of its low price and the time it saves traders (about 2 hours a day), it is an absolute must-have!

Bruno Lévêque
PrestaShop Co-Founder