中数科技 operates a data middleware stack built on Java, Kafka, Spark, Flink, and ClickHouse, all standard components for ETL and real-time streaming. The technology choices point to a company focused on high-volume, multi-source data pipelines: Hadoop/HDFS for storage, Hudi for incremental updates, SeaTunnel for data movement, and web-scraping tools (Scrapy, Pyspider, Nutch) for ingestion. Active projects center on data collection, distributed crawling, and a data middle-platform architecture. Documented pain points are ETL performance, crawler efficiency, and handling concurrent load, which are typical friction points for early-stage data infrastructure companies.
Notable leadership hires: Marketing Director, Sales Director
中数科技 is a China-based data platform company building infrastructure for data collection, transformation, and supply-chain analytics. The product stack spans web data acquisition (distributed crawlers and schedulers), ETL pipelines (SeaTunnel, DolphinScheduler), and analytics storage (ClickHouse, StarRocks). The core team is small (about four people) but covers a broad scope: data engineering, supply-chain systems, and market expansion. Current operational challenges include optimizing pipeline performance, maintaining data quality at scale, and expanding adoption beyond the initial customer segments.
The stack uses Java, Spring Cloud, Kafka, Apache Spark, Flink, HBase, and ClickHouse for streaming and batch analytics. Web data collection runs on Scrapy, Pyspider, and Nutch, with DolphinScheduler handling orchestration.
Active projects include data middleware platform development, distributed crawler and scheduler systems, data quality modules, supply-chain systems, and market expansion. Focus areas include ETL performance optimization and high-concurrency handling.