Join the Largest Global Community in Computing

Engage with computer engineers, scientists, academics, and industry professionals from all areas of computing, and fuel global technological advancement

About IEEE Computer Society

Pioneering Excellence in Computing and Technology

The IEEE Computer Society is the world’s leading organization for computing professionals. We advance the theory, practice, and application of computer and information processing technology through resources, conferences, and publications. Join us to drive innovation and professional growth.

926,000+

Research Articles in the Digital Library

217+

Active Technical Standards

195+

Conferences Worldwide

4,000+

Technical and Networking Events


From the Blog

Community news, technical analysis, and career advice to keep you informed.

Community Voices
Autonomous Observability: AI Agents That Debug AI
Today’s data engineering teams operate massive, highly distributed systems: data pipelines with tens of interdependent services, real-time ML inference platforms, and pipelines retraining business-critical models around the clock. Manual monitoring, traditional dashboarding, and rule-based alerts have become increasingly impractical. Telemetry grows exponentially, alert fatigue sets in, and systems fail in unexpected cascading ways. The costs of outages, in lost revenue and customer trust, can be extraordinary.

Autonomous observability is a paradigm in which AI agents continuously consume, analyze, and act on telemetry and logs. Rather than simply notifying human operators, these agents diagnose, localize, and may even remediate issues automatically. Incidents become opportunities for learning and self-optimization, setting a new standard for AI and data operational excellence.

Autonomous Observability: Concept and Architecture

At its core, autonomous observability brings together several key agent roles within a unified framework:

Metric Agent: Continuously ingests and analyzes a broad array of system, application, and infrastructure metrics (latency, resource utilization, error rates, ML model performance) using advanced anomaly detection algorithms, unsupervised learning, and even LLMs for structured and unstructured data.

Root Cause Agent: Leverages distributed tracing, causal inference, and knowledge graphs to map dependencies and the flow of operations. When an anomaly arises, this agent builds a ranked list of hypotheses about likely sources, correlating symptoms across logs, trace spans, and temporal patterns.

Remediation Agent: Receives ranked hypotheses and executes automated or semi-automated mitigations. This can involve restarting failed ETL stages, rolling back model versions, provisioning additional resources, or even opening PRs/merge requests with suggested code/config changes. Human-in-the-loop review workflows ensure safety.

Learning & Feedback Loop Agent: Archives each incident, including actions taken, efficacy, and operator feedback. It retrains anomaly detection models and remediation policies, closing the loop for continuous platform improvement.

Technical Stack:
Metrics/tracing: Prometheus, OpenTelemetry. Log management: ELK/Datadog.
ML/graph: scikit-learn, PyTorch, Neo4j.
Orchestration: Kubernetes API, Argo Workflows, GitOps for autonomic rollbacks.
Interfaces: …
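The Metric Agent role above can be sketched in a few lines. This toy uses a rolling z-score over a latency stream rather than the unsupervised or LLM-based detectors the article names; the window size, threshold, and metric values are illustrative assumptions:

```python
from collections import deque

class MetricAgent:
    """Toy Metric Agent: flags anomalies in a metric stream via rolling z-score."""

    def __init__(self, window=60, threshold=3.0):
        self.window = deque(maxlen=window)  # recent metric values
        self.threshold = threshold          # z-score cutoff for "anomalous"

    def observe(self, value):
        """Return True if `value` is anomalous relative to the recent window."""
        anomalous = False
        if len(self.window) >= 10:  # wait for a baseline before judging
            mean = sum(self.window) / len(self.window)
            var = sum((x - mean) ** 2 for x in self.window) / len(self.window)
            std = var ** 0.5
            if std > 0 and abs(value - mean) / std > self.threshold:
                anomalous = True
        self.window.append(value)
        return anomalous

agent = MetricAgent()
for latency_ms in [100, 102, 99, 101, 98, 100, 103, 97, 101, 100, 950]:
    if agent.observe(latency_ms):
        print(f"anomaly: latency {latency_ms} ms")  # fires only for 950
```

In a real deployment this loop would consume Prometheus or OpenTelemetry streams and hand flagged anomalies to the Root Cause Agent rather than printing them.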

Rambabu Bandam

Community Voices
Disaggregating LLM Infrastructure: Solving the Hidden Bottleneck in AI Inference
Large language models (LLMs) are accelerating in capability, but their infrastructure is falling behind. Despite massive advances in generative AI, current serving architectures are inefficient at inference time, especially when forced to handle highly asymmetric compute patterns. Disaggregated inference, the separation of input processing and output generation, offers a hardware-aware architecture that can dramatically improve performance, efficiency, and scalability.

Today, most state-of-the-art LLMs like GPT-4, Claude, and Llama rely on monolithic server configurations that struggle to serve diverse AI applications efficiently. This article explores the fundamental inefficiencies of conventional model serving, the technical reasoning behind disaggregation, and how it is reshaping inference performance at cloud scale.

The Problem: LLM Inference Isn’t One Thing

Inference in large language models happens in two computationally distinct phases:

Prefill: The model encodes the input prompt, a batch-parallel, compute-heavy task.

Decode: The model generates tokens one at a time, a memory-bound, latency-sensitive task.

This split leads to radically different hardware requirements. Prefill benefits from high-throughput compute (e.g., tensor-core-heavy workloads), while decode suffers from irregular memory access patterns, poor batching efficiency, and low GPU utilization. In practical terms, the same GPU might run at 90% utilization during prefill but only 25–30% during decode, wasting energy and compute resources.

As IEEE Micro notes, phase-splitting LLM inference lets teams map prefill and decode to the right hardware class, improving throughput and cost.

Why Conventional Hardware Doesn’t Fit Both

Modern GPUs like the NVIDIA A100 and H100 are not designed to optimize both phases simultaneously. The H100’s massive compute capabilities offer excellent prefill performance, but decode hits memory bottlenecks. Real-world metrics show decode operations achieving as little as 15–35% utilization of available hardware.

This asymmetry creates inefficiencies in cost, power consumption, and latency. Traditional co-located serving, where prefill and decode run on the same device, forces a lowest-common-denominator configuration, leading to overprovisioning of expensive accelerators…
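The compute-bound vs. memory-bound asymmetry can be shown with a back-of-the-envelope cost model for attention. The constants below (fp16 bytes, model width, layer count, and the factor-of-two FLOP conventions) are illustrative assumptions, not measurements of any real model:

```python
def phase_costs(prompt_len, gen_len, d_model=4096, n_layers=32):
    """Toy per-request cost model for attention in each inference phase.

    Prefill: one parallel pass over the whole prompt.
    Decode: gen_len serial steps, each re-reading the full KV cache.
    Returns (prefill, decode) arithmetic intensity in FLOPs per byte.
    """
    bytes_per = 2  # fp16

    # Prefill: QK^T and AV over the full prompt, plus writing the K/V cache.
    prefill_flops = 2 * 2 * prompt_len * prompt_len * d_model * n_layers
    prefill_bytes = 2 * prompt_len * d_model * n_layers * bytes_per

    # Decode: each step attends over the growing context, re-reading all K/V.
    decode_flops = decode_bytes = 0
    for t in range(gen_len):
        ctx = prompt_len + t
        decode_flops += 2 * 2 * ctx * d_model * n_layers
        decode_bytes += 2 * ctx * d_model * n_layers * bytes_per
    return prefill_flops / prefill_bytes, decode_flops / decode_bytes

pre_ai, dec_ai = phase_costs(prompt_len=1024, gen_len=256)
print(f"prefill arithmetic intensity: {pre_ai:.0f} FLOPs/byte")  # ~prompt_len
print(f"decode  arithmetic intensity: {dec_ai:.0f} FLOPs/byte")  # ~1
```

Even this crude model shows why one accelerator fits both phases poorly: prefill does on the order of a thousand FLOPs per byte moved (compute-bound), while decode does roughly one (memory-bound), which is the gap disaggregation exploits.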

Anat Heilper

Community Voices
Copilot Ergonomics: UI Patterns that Reduce Cognitive Load
AI copilots help people draft, review, and decide. The payoff appears in the last step: the user interface where a person reads a suggestion and chooses what to do next. When that screen hides uncertainty or punishes exploration, progress slows. When it makes the next good action obvious and safe, teams move faster with fewer mistakes. This article offers six patterns you can ship in a month. They lower cognitive load without changing your model stack, and they track to established Human–AI guidance. For clarity, UI means user interface and LLM means large language model.

1. Confidence Bands That Change Behavior

People need a quick sense of how much to rely on a suggestion. Avoid pretend precision such as “87 percent confident.” Use three confidence bands and connect each band to what the interface allows. High confidence can default the focus to Accept and allow a single click. Medium confidence can require a short rationale preview before Accept. Low confidence can allow copy only and require an edit before commit. You can set bands with simple rules, such as stronger source overlap raising confidence and policy triggers lowering it. The aim is behavior, not color: the band should change what the user can do so they don’t over-trust. This is consistent with guidance to surface uncertainty in ways that calibrate reliance (see Microsoft’s Guidelines for Human–AI Interaction).

A practical check is task latency. Acceptance should be faster for high confidence than for low. If times don’t separate, the bands aren’t helping decisions.

2. Two Alternatives by Default

One option invites over-trust. Offer two compact alternatives side by side and make them meaningfully different. You can vary the prompt, the retrieval scope, or the tone. Keep the layout identical and include a short note on how they differ. Provide an easy keyboard toggle so comparisons are…
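The band-to-behavior mapping in pattern 1 can be sketched as a small policy table. The thresholds, signal names, and action flags below are hypothetical illustrations of the rule-based approach the article describes, not values from it:

```python
# Hypothetical policy table: confidence band -> what the UI lets the user do.
BAND_POLICY = {
    "high":   {"accept_one_click": True,  "rationale_first": False, "copy_only": False},
    "medium": {"accept_one_click": False, "rationale_first": True,  "copy_only": False},
    "low":    {"accept_one_click": False, "rationale_first": False, "copy_only": True},
}

def band_for(score, source_overlap, policy_triggered):
    """Map a raw model score plus simple signals to a band.

    Thresholds are illustrative; the point is that signals, not raw
    percentages, decide what the interface allows.
    """
    if policy_triggered:
        return "low"  # a policy trigger always lowers the band
    if score >= 0.8 and source_overlap >= 0.5:
        return "high"  # strong score AND strong source support
    if score >= 0.5:
        return "medium"
    return "low"

def allowed_actions(score, source_overlap=0.0, policy_triggered=False):
    return BAND_POLICY[band_for(score, source_overlap, policy_triggered)]

print(allowed_actions(0.9, source_overlap=0.7))    # high band: one-click Accept
print(allowed_actions(0.9, policy_triggered=True))  # low band despite high score
```

Note that a high raw score with a policy trigger still lands in the copy-only band; the UI gates behavior, which is the calibration effect the pattern is after.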

Kostakis Bouzoukas

375,000+

Community Members

12,000+

Volunteers

1,000+

Chapters

157+

Countries Represented

Create More Connections

Find awards, volunteer opportunities, and educational courses to propel your career forward.

Recognize Excellence

Discover prestigious awards that acknowledge outstanding achievements in your field. Gain the recognition you deserve and elevate your professional profile.

Award Nominations →

Give Back, Grow Forward

Engage in meaningful volunteer opportunities that make a difference. Enhance your skills, expand your network, and contribute to causes you care about.

Volunteering Opportunities →

Learn and Lead

Explore a variety of educational courses designed to boost your knowledge and expertise. Get ahead in your career with cutting-edge learning opportunities.

Education Courses →

IEEE Computer Society Publications

Explore the forefront of technology with IEEE Computer Society’s leading journals and magazines, providing in-depth research, expert analysis, and innovative insights across a diverse range of computing and engineering fields.

Magazines

Journals

Access research and network today!

Become a member


Broaden Participation

Fostering a Culture of Belonging

At the Computer Society, we are committed to creating an environment where everyone feels valued and respected. Our initiatives to broaden participation aim to promote opportunities, celebrate diverse perspectives, and cultivate a supportive community. Join us in our mission to make the Computer Society a place where all voices are heard and everyone has the chance to thrive.

Latest Report

Insights and Updates from Our Latest Research

Stay informed with the latest findings and developments from our recent report. Dive into comprehensive analyses, data-driven insights, and innovative solutions shaping the future of technology.

Read the Latest Reports →

“Computer scientists and engineers now support nearly every global industry, expanding our ability to positively impact the future of our world.”

Jyotika Athavale – 2024 IEEE Computer Society President

Sign Up To Our Newsletter

Sign up to read about upcoming Computer Society events, webinars, calls for papers, and more… delivered to your inbox.

Meet Our Corporate Partners

The Computer Society is proud to have the support of leading companies in building and empowering technical innovation.

Advantest · Apple · AWS · Q-CTRL · Quantinuum · Quantum Machines · Start Train · SuperQ