CytoNET

ASP.NET Core, Entity Framework Core, SQLite, Razor Pages, Bootstrap, C#

Project Overview

CytoNET is a bioinformatics platform for managing and analyzing protein data for research. Built for a client with extensive biological datasets, it handles over 21,000 database entries across several interconnected biological domains: proteins, protein interactions, modifications, small molecules, and tissue distributions. A data seeding system lets researchers import new datasets on demand; each file is processed and loaded into the appropriate database context while preserving data integrity and relationships.

Key Features

Maps showing protein interactions and modifications across different tissues

Multi-database architecture supporting proteins, interactions, modifications, small molecules, and tissue distributions

Automated CSV data seeding with intelligent file processing and validation

Automatic data cleaning and preprocessing pipeline for imported CSV files

Production-ready deployment with environment-specific database configurations

Technical Challenges

Designing a scalable database schema to handle complex biological relationships across 21,000+ entries

Implementing robust CSV parsing and validation to handle diverse biological data formats

Creating an efficient background seeding system that processes large datasets without impacting application performance

Ensuring data integrity across multiple related database contexts during bulk import operations

Optimizing database queries for complex biological data relationships and large-scale analytics

Technology Stack

ASP.NET Core

Leveraged for building a high-performance web application with robust dependency injection, middleware pipeline, and hosting capabilities optimized for data-intensive bioinformatics operations.

Entity Framework Core

Implemented with multiple DbContexts (Protein, ProteinModification, ProteinInteraction, SmallMolecule, TissueDistribution) using a Code First approach with migrations and relationship mapping.

SQLite

Selected for its lightweight nature and excellent performance with read-heavy biological data operations, with separate database files for each biological domain to optimize query performance.
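To make the one-context-per-domain idea concrete, here is a minimal sketch of how two such contexts might be registered, each backed by its own SQLite file. It is an illustration under stated assumptions, not the project's actual code: the context class names, entity shapes (Protein, SmallMolecule), and database file names are hypothetical, and the Microsoft.EntityFrameworkCore.Sqlite package is assumed to be installed.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

// One SQLite file per biological domain (file names are hypothetical).
builder.Services.AddDbContext<ProteinContext>(options =>
    options.UseSqlite("Data Source=proteins.db"));
builder.Services.AddDbContext<SmallMoleculeContext>(options =>
    options.UseSqlite("Data Source=small_molecules.db"));

var app = builder.Build();

// Create each database if it does not exist yet. A project that uses
// migrations would typically call Database.Migrate() here instead.
using (var scope = app.Services.CreateScope())
{
    scope.ServiceProvider.GetRequiredService<ProteinContext>().Database.EnsureCreated();
    scope.ServiceProvider.GetRequiredService<SmallMoleculeContext>().Database.EnsureCreated();
}

app.Run();

// Hypothetical entity shapes, for illustration only.
public class Protein
{
    public int Id { get; set; }
    public string Accession { get; set; } = "";
    public string Name { get; set; } = "";
}

public class SmallMolecule
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public class ProteinContext : DbContext
{
    public ProteinContext(DbContextOptions<ProteinContext> options) : base(options) { }
    public DbSet<Protein> Proteins => Set<Protein>();
}

public class SmallMoleculeContext : DbContext
{
    public SmallMoleculeContext(DbContextOptions<SmallMoleculeContext> options) : base(options) { }
    public DbSet<SmallMolecule> SmallMolecules => Set<SmallMolecule>();
}
```

One consequence of this layout is that SQLite cannot enforce foreign keys across separate database files, so any cross-domain relationship has to be checked in application code.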

Background Task Processing

Implemented asynchronous data seeding using Task.Run with comprehensive error handling to process large CSV datasets without impacting user experience or causing application timeouts.
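A minimal sketch of what Task.Run-based seeding with error handling could look like. The BackgroundSeeder class and the seedAsync delegate are hypothetical names introduced for illustration; the sketch assumes it runs inside an ASP.NET Core host, where ILoggerFactory is registered by default.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;

// Hypothetical helper, not the project's actual class.
public static class BackgroundSeeder
{
    // Starts seeding on a thread-pool thread and returns the running task.
    public static Task Start(IServiceProvider rootServices, Func<IServiceProvider, Task> seedAsync)
    {
        return Task.Run(async () =>
        {
            // Use a fresh scope so scoped services such as DbContexts
            // are not shared with any in-flight HTTP request.
            using var scope = rootServices.CreateScope();
            var logger = scope.ServiceProvider
                .GetRequiredService<ILoggerFactory>()
                .CreateLogger("Seeding");
            try
            {
                await seedAsync(scope.ServiceProvider);
                logger.LogInformation("Seeding finished.");
            }
            catch (Exception ex)
            {
                // Log and swallow so a bad dataset cannot crash the host.
                logger.LogError(ex, "Seeding failed.");
            }
        });
    }
}
```

At startup this might be invoked as `_ = BackgroundSeeder.Start(app.Services, services => SeedProteinsAsync(services));`, where SeedProteinsAsync is a hypothetical method that reads the CSV files and saves rows through the scoped contexts. ASP.NET Core also offers hosted services (BackgroundService) for long-running work tied to the application lifetime.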

CSV Processing Engine

Built a custom CsvCleaner class with intelligent data validation, formatting, and error correction specifically designed for biological data standards and research dataset requirements.
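The kind of cleaning such a class performs can be sketched as follows. This is not the project's CsvCleaner: the class name CsvCleanerSketch and its rules (trim every field, skip blank lines, reject rows with the wrong column count or an empty first column) are assumptions chosen for illustration, and the naive comma split does not handle quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical cleaner, for illustration only.
public static class CsvCleanerSketch
{
    // Returns the cleaned data rows (header excluded) and reports the
    // 1-based line numbers of rows that were rejected.
    public static List<string[]> Clean(IEnumerable<string> lines, out List<int> rejectedLines)
    {
        rejectedLines = new List<int>();
        var rows = new List<string[]>();
        int expectedColumns = -1;
        int lineNumber = 0;

        foreach (var raw in lines)
        {
            lineNumber++;
            if (string.IsNullOrWhiteSpace(raw)) continue; // skip blank lines

            // Naive split: quoted fields containing commas are not supported.
            var fields = raw.Split(',').Select(f => f.Trim()).ToArray();

            if (expectedColumns < 0)
            {
                // The first non-blank line is treated as the header.
                expectedColumns = fields.Length;
                continue;
            }

            // Reject rows with a wrong column count or an empty first
            // column (assumed here to be the record identifier).
            if (fields.Length != expectedColumns || fields[0].Length == 0)
            {
                rejectedLines.Add(lineNumber);
                continue;
            }

            rows.Add(fields);
        }

        return rows;
    }
}
```

Reporting rejected line numbers rather than silently dropping rows lets a researcher see which parts of an uploaded file need attention.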

Debugging API

Created comprehensive RESTful debugging endpoints for real-time monitoring of database status, seeding progress, and data integrity validation across all biological contexts.
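As an illustration of such an endpoint, the sketch below exposes seeding progress through a minimal API route. The SeedingStatus class, its members, and the /api/debug/seeding path are hypothetical; the real endpoints and their payloads are not described in this write-up.

```csharp
using System.Threading;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

// A single shared tracker that the seeding code updates as it runs.
builder.Services.AddSingleton<SeedingStatus>();

var app = builder.Build();

// Hypothetical route: reports the current seeding state as JSON.
app.MapGet("/api/debug/seeding", (SeedingStatus status) => Results.Ok(new
{
    status.IsRunning,
    status.RowsImported,
    status.LastError
}));

app.Run();

// Hypothetical progress tracker, for illustration only.
public class SeedingStatus
{
    private long _rowsImported;

    public bool IsRunning { get; set; }
    public string? LastError { get; set; }

    // Interlocked keeps the counter consistent when the seeding thread
    // and request threads access it concurrently.
    public long RowsImported => Interlocked.Read(ref _rowsImported);
    public void AddRows(int count) => Interlocked.Add(ref _rowsImported, count);
}
```

Debug routes like this are usually restricted to development environments or placed behind authentication, since they reveal internal state.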

Development Process

Client Requirements Analysis

Analyzed client needs for managing over 21,000 biological database entries, including the complex relationships between proteins, interactions, modifications, and tissue distributions.

Database Architecture Design

Designed a multi-context Entity Framework architecture with separate databases for each biological domain while maintaining referential integrity and optimized query performance.

CSV Processing Pipeline Development

Built a robust data import system with automatic CSV cleaning, validation, and transformation capabilities to handle diverse biological data formats from research sources.

Background Processing Implementation

Implemented asynchronous seeding processes to handle large dataset imports without blocking the main application, including comprehensive error handling and progress monitoring.

Debugging and Monitoring System

Created comprehensive debugging endpoints and monitoring tools to track seeding progress, database status, and data integrity across all biological contexts.

Production Deployment and Optimization

Deployed the system with environment-specific configurations, database migration automation, and performance optimizations for handling large-scale biological datasets.
