河北华网计算机技术有限公司 (Hebei Huawang Computer Technology Co., Ltd.) Tech Stack

Distributed data collection and integration platform for multi-source web data

Software Development · Shijiazhuang, Hebei Province

河北华网 operates a data infrastructure stack built on Java, Hadoop, Spark, and Flink, with emerging adoption of ClickHouse, Hudi, and SeaTunnel. The stack is designed for large-scale web data collection and for integrating heterogeneous sources. The company's active projects (distributed crawlers, high-availability collection platforms, data governance, API layers) and its stated challenges around multi-source integration and data quality suggest a platform positioned between raw data acquisition and downstream analytics.

Tech Stack (27 technologies)

Core Stack: Java, GitHub, Linux, Python, Oracle, MySQL, Hadoop, Kafka, Apache Spark, Apache Flink, ClickHouse, Android, SQLite, Bash, Hive, HDFS, HBase, ZooKeeper, YARN, Hudi, SeaTunnel, DolphinScheduler, SQL, Scrapy, Pyspider, Nutch, Jsoup

What 河北华网计算机技术有限公司 Is Building

◆ Challenges

Diagnosing data governance issues
Heterogeneous data integration
Multi-source data integration
Improving data quality in key areas

▲ Active Projects

Distributed crawler system
High-availability data collection platform
Data governance project implementation
Heterogeneous data integration
Website CAPTCHA cracking
Data application system development
API development
Network data capture platform

Hiring Activity

Minimal: 6 roles · 0 new in the last 30 days

Department: Data, Engineering

Seniority: Mid, Manager, Senior

About 河北华网计算机技术有限公司

河北华网 is a software development company headquartered in Shijiazhuang, Hebei Province, China. The company builds distributed systems for web data collection, integration, and governance, serving use cases that require ingesting, unifying, and quality-checking data from heterogeneous sources. Its technical foundation combines open-source big-data tooling (Hadoop, Spark, Flink, Kafka, HBase) with web-scraping frameworks (Scrapy, Pyspider, Nutch) and newer data-pipeline tools (SeaTunnel for data integration, DolphinScheduler for workflow scheduling). Current work centers on crawler infrastructure, collection-platform reliability, data governance implementation, and API development.

Headquarters: Shijiazhuang, Hebei Province

Hiring Markets: China

Frequently Asked Questions

What is 河北华网's tech stack?

Java, Python, Hadoop, Spark, Flink, Kafka, HBase, ClickHouse, Hudi, SeaTunnel, DolphinScheduler, plus web-scraping tools (Scrapy, Pyspider, Nutch). Data storage spans SQLite, MySQL, Oracle, and HDFS.

What projects is 河北华网 actively working on?

A distributed crawler system, a high-availability data collection platform, data governance implementation, heterogeneous data integration, website CAPTCHA cracking, data application system development, API development, and a network data capture platform.

Is 河北华网 hiring?

Yes, though hiring activity is minimal: 6 roles across data and engineering teams, with 0 new in the last 30 days, primarily at mid-level and manager seniority. Hiring is currently limited to China.