{"id":14152,"date":"2025-04-28T22:00:00","date_gmt":"2025-04-28T22:00:00","guid":{"rendered":"https:\/\/modernsciences.org\/staging\/4414\/?p=14152"},"modified":"2025-04-16T06:28:36","modified_gmt":"2025-04-16T06:28:36","slug":"ai-alignment-measurement-human-goals-misalignment-research-april-2025","status":"publish","type":"post","link":"https:\/\/modernsciences.org\/staging\/4414\/ai-alignment-measurement-human-goals-misalignment-research-april-2025\/","title":{"rendered":"Getting AIs working toward human goals \u2212 study shows how to measure misalignment"},"content":{"rendered":"\n\n\n<div class=\"theconversation-article-body\">\n    <figure>\n      <img  decoding=\"async\"  src=\"data:image\/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABAQMAAAAl21bKAAAAA1BMVEUAAP+KeNJXAAAAAXRSTlMAQObYZgAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAApJREFUCNdjYAAAAAIAAeIhvDMAAAAASUVORK5CYII=\"  class=\" pk-lazyload\"  data-pk-sizes=\"auto\"  data-pk-src=\"https:\/\/images.theconversation.com\/files\/659679\/original\/file-20250403-56-17bcw4.jpg?ixlib=rb-4.1.0&#038;rect=0%2C189%2C3725%2C2091&#038;q=45&#038;auto=format&#038;w=754&#038;fit=clip\" >\n        <figcaption>\n          Self-driving cars are only one example where it\u2019s tricky but critical to align AI and human goals.\n          <span class=\"attribution\"><a class=\"source\" href=\"https:\/\/newsroom.ap.org\/detail\/SelfDrivingCarsSurrealRide\/ac864953bdd8486699d2fc528fcc23f2\/photo\" target=\"_blank\" rel=\"noopener\">AP Photo\/Michael Liedtke<\/a><\/span>\n        <\/figcaption>\n    <\/figure>\n\n  <span><a href=\"https:\/\/theconversation.com\/profiles\/aidan-kierans-2343298\" target=\"_blank\" rel=\"noopener\">Aidan Kierans<\/a>, <em><a href=\"https:\/\/theconversation.com\/institutions\/university-of-connecticut-1342\" target=\"_blank\" rel=\"noopener\">University of Connecticut<\/a><\/em><\/span>\n\n  <p>Ideally, artificial intelligence agents aim to help humans, but what does that mean when humans want conflicting things? My colleagues <a href=\"https:\/\/scholar.google.com\/citations?user=0S_1aNwAAAAJ&amp;hl=en\" target=\"_blank\" rel=\"noopener\">and I<\/a> have come up with a way to measure the alignment of the goals of a group of humans and AI agents.<\/p>\n\n<p>The <a href=\"https:\/\/alignmentsurvey.com\/\" target=\"_blank\" rel=\"noopener\">alignment problem<\/a> \u2013 making sure that AI systems act according to human values \u2013 has become more urgent as <a href=\"https:\/\/doi.org\/10.48550\/arXiv.2503.14499\" target=\"_blank\" rel=\"noopener\">AI capabilities grow exponentially<\/a>. But aligning AI to humanity seems impossible in the real world because everyone has their own priorities. For example, a pedestrian might want a self-driving car to slam on the brakes if an accident seems likely, but a passenger in the car might prefer to swerve.<\/p>\n\n<p>By looking at examples like this, we developed <a href=\"https:\/\/doi.org\/10.48550\/arXiv.2406.04231\" target=\"_blank\" rel=\"noopener\">a score for misalignment<\/a> based on three key factors: the humans and AI agents involved, their specific goals for different issues, and how important each issue is to them. Our model of misalignment is based on a simple insight: A group of humans and AI agents are most aligned when the group\u2019s goals are most compatible.<\/p>\n\n<p>In simulations, we found that misalignment peaks when goals are evenly distributed among agents. This makes sense \u2013 if everyone wants something different, conflict is highest. 
<h2 id="why-it-matters">Why it matters</h2>

<p>Most AI safety research treats alignment as an all-or-nothing property. Our framework shows it’s more complex. The same AI can be aligned with humans in one context but misaligned in another.</p>

<p>This matters because it helps AI developers be more precise about what they mean by aligned AI. Instead of vague goals, such as “align with human values,” researchers and developers can talk about specific contexts and roles for AI more clearly. For example, an AI recommender system – those “you might like” product suggestions – that entices someone to make an unnecessary purchase could be aligned with the retailer’s goal of increasing sales but misaligned with the customer’s goal of living within their means.</p>

<figure>
  <iframe width="440" height="260" src="https://www.youtube.com/embed/pGntmcy_HX8?wmode=transparent&amp;start=0" frameborder="0" allowfullscreen=""></iframe>
  <figcaption><span class="caption">Recommender systems use sophisticated AI technologies to influence consumers, making it all the more important that they aren’t out of alignment with human values.</span></figcaption>
</figure>

<p>For policymakers, <a href="https://doi.org/10.1007/s00146-025-02181-5">evaluation frameworks</a> like ours offer a way to measure misalignment in systems that are in use and to <a href="https://www.igi-global.com/chapter/ai-standards-and-regulations/365873">create standards</a> for alignment. For AI developers and safety teams, it provides a framework to <a href="https://doi.org/10.48550/arXiv.2502.16320">balance competing stakeholder interests</a>.</p>

<p>For everyone, having a clear understanding of the problem makes people better able to <a href="https://doi.org/10.1007/s43681-024-00624-1">help solve it</a>.</p>

<h2 id="what-other-research-is-happening">What other research is happening</h2>

<p>To measure alignment, our research assumes we can compare what humans want with what AI wants. Human value data can be collected through surveys, and the field of social choice <a href="https://doi.org/10.48550/arXiv.2404.10271">offers useful tools</a> to interpret it for AI alignment. Unfortunately, learning the goals of AI agents is much harder.</p>

<p>Today’s smartest AI systems are large language models, and their <a href="https://theconversation.com/what-is-a-black-box-a-computer-scientist-explains-what-it-means-when-the-inner-workings-of-ais-are-hidden-203888">black box</a> nature makes it hard to learn the goals of the AI agents, such as ChatGPT, that they power. Interpretability research might help by <a href="https://doi.org/10.48550/arXiv.2501.15740">revealing the models’ inner “thoughts”</a>, or researchers could design AI that <a href="https://doi.org/10.1109/MIS.2023.3268724">thinks transparently to begin with</a>. But for now, it’s impossible to know whether an AI system is truly aligned.</p>
<h2 id="whats-next">What’s next</h2>

<p>For now, we recognize that sometimes goals and preferences <a href="https://doi.org/10.1007/s11098-024-02249-w">don’t fully reflect what humans want</a>. To address trickier scenarios, we are working on approaches for <a href="https://doi.org/10.1007/s43681-025-00664-1">aligning AI to moral philosophy experts</a>.</p>

<p>Moving forward, we hope that developers will implement practical tools to measure and improve alignment across diverse human populations.</p>

<p><em>The <a href="https://theconversation.com/us/topics/research-brief-83231">Research Brief</a> is a short take on interesting academic work.</em></p>

<p><a href="https://theconversation.com/profiles/aidan-kierans-2343298">Aidan Kierans</a>, Ph.D. Student in Computer Science and Engineering, <em><a href="https://theconversation.com/institutions/university-of-connecticut-1342">University of Connecticut</a></em></p>

<p>This article is republished from <a href="https://theconversation.com">The Conversation</a> under a Creative Commons license. Read the <a href="https://theconversation.com/getting-ais-working-toward-human-goals-study-shows-how-to-measure-misalignment-251896">original article</a>.</p>