Mechanisms of Artificial Consciousness, 黒川 衛, beyond Intelligence: April 2023


Won't AI run into a computational explosion as it crosses the uncanny valley?

Me

ChatGPT-4 imitates human conversational responses and also has built-in functions for avoiding ethical complaints about matters such as discrimination, religious belief and personal information. Combined with the human tendency to anthropomorphise, this lets GPT-4 cross the uncanny valley, so that talking to it starts to feel like a real conversation. However, it has no body and no way of grounding its representations in sensation. As a result, parts of the exchange feel like talking with a person who has no first-hand experience. In infancy we learn the properties of objects from physical events such as collision, approach, falling, friction, movement, rotation, occlusion and optical reflection. Most things fall of their own accord and need friction to stay put. Near objects hide far ones, and a surface looks larger as it comes closer. A collision stops an object moving in its original direction and makes it rotate or move off. From the shading that light, such as sunlight, produces on things, we learn which lies above which and how they are arranged relative to one another. This grasp of space forms under everyday conditions (air at one atmosphere and a constant, comfortable temperature) and within the bounds of visual space. Suppose large-scale visual processing (images and video) and auditory processing were integrated with large-scale natural-language processing. Wouldn't talking with AI then come closer to talking with a person who has real experience? Humans have what might be called a language of the mind, which integrates linguistic, visual and auditory information. What technical and algorithmic difficulties arise when integrating ChatGPT-4 with audiovisual information?

GPT-4

AI faces the following technical difficulties when integrating linguistic information with visual and auditory information:

Increased model complexity: a model that handles linguistic and visual/auditory information simultaneously is more complex than models that handle each type of information separately. This substantially increases the computation needed for training and inference, and with it the cost in computing resources and time.

Integration of different data formats: linguistic information is represented as text data, visual information as image data and auditory information as audio data, each in a different format. Integrating them requires algorithms that convert these different forms of data into a common representation. Such algorithms are difficult to develop, and they may not achieve adequate performance.

Understanding context: it is not easy to grasp the context carried by linguistic, visual and auditory information and then combine these contexts appropriately. For example, understanding a sentence may require relating it to the image or sound it refers to. Developing algorithms that make such associations accurately is difficult.

Lack of labelled training data: data for learning to integrate linguistic information with visual and auditory information is harder to obtain than data covering a single modality, so labelled (supervised) training data is often scarce. As a result, training may not progress far enough, and performance may suffer.

Limited generality: AI techniques that integrate linguistic and visual/auditory information may perform well on specific tasks yet fail to adapt to others. This happens because the model overfits to those tasks and loses generality. Improving generalisation requires rethinking the model's structure and training method, and that brings its own technical difficulties.

Difficulties with zero-shot and transfer learning: multimodal AI models need zero-shot and transfer learning to cope with unfamiliar situations and tasks. However, these methods can be hard to apply when integrating linguistic information with visual/auditory information, because representations shared across different modalities are difficult to find.

Effects of incomplete data and noise: real-world data is often incomplete or noisy. AI models that integrate linguistic and visual/auditory information need to be robust to such data. However, models can overfit, which makes incomplete and noisy data hard to handle.
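
The first difficulty above, increased model complexity, is also the "computational explosion" the title asks about. A back-of-the-envelope count makes it concrete for a Transformer-style model that simply concatenates all modalities into one token sequence. Full self-attention scores every token against every other token, so the cost per layer grows with the square of the sequence length. The Python sketch below uses a hypothetical tokenisation: 16x16 patches on a 224x224 image, 50 audio frames per second and 16 video frames. All of these token counts are assumptions chosen for illustration, not the figures of any particular model, GPT-4 included.

```python
# Back-of-the-envelope count of self-attention scores when every modality is
# concatenated into one token sequence. All token counts are hypothetical.

def image_patch_tokens(height: int, width: int, patch: int) -> int:
    """Tokens for one image cut into non-overlapping square patches (assumed ViT-style)."""
    return (height // patch) * (width // patch)


def audio_frame_tokens(seconds: float, frames_per_second: int) -> int:
    """Tokens for an audio clip at an assumed fixed frame rate."""
    return int(seconds * frames_per_second)


def attention_scores(n_tokens: int) -> int:
    """Full self-attention compares each token with every token: n * n scores per head per layer."""
    return n_tokens * n_tokens


TEXT = 512                                # hypothetical prompt length
IMAGE = image_patch_tokens(224, 224, 16)  # 14 * 14 = 196 patch tokens
AUDIO = audio_frame_tokens(10.0, 50)      # 10 s at 50 frames/s = 500 tokens
VIDEO = 16 * IMAGE                        # 16 frames = 3136 tokens

SCENARIOS = {
    "text only": TEXT,
    "text + image": TEXT + IMAGE,
    "text + image + audio": TEXT + IMAGE + AUDIO,
    "text + video + audio": TEXT + VIDEO + AUDIO,
}

baseline = attention_scores(TEXT)
for name, n in SCENARIOS.items():
    scores = attention_scores(n)
    print(f"{name:<22} {n:>5} tokens {scores:>12,} scores  x{scores / baseline:.1f}")
```

Text alone gives 512 tokens and 262,144 scores. Adding 16 video frames and 10 s of audio gives 4,148 tokens, which multiplies the score count by roughly 66 even though the sequence is only about 8 times longer. Practical systems reduce this with measures such as compressing visual tokens or using sparser attention. The sketch only shows the quadratic trend.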

To address these technical difficulties, researchers are working on several fronts: improving model architectures and training methods, devising better data pre-processing and annotation, and developing new learning methods such as transfer learning and meta-learning. It is hoped that further research will overcome these difficulties and lead to more advanced multimodal AI systems.
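
Several of the points above depend on a "common representation": different data formats, zero-shot and transfer learning, and incomplete data. That idea can be sketched as a shared embedding space. Each modality gets its own projection into one vector space, and vectors are compared by cosine similarity. A missing modality is simply left out of the average. The Python sketch below is a toy under stated assumptions. The feature names, the 2-dimensional shared space and the hand-set projection matrices are all hypothetical stand-ins. A real system would have to learn these weights from paired data (for example by contrastive training), which is exactly where the shortage of training data noted above bites.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical hand-set projections into a 2-D shared space whose axes mean
# [animal-ness, vehicle-ness]. A real model would learn these from paired data.
W_IMAGE: Matrix = [[1.0, 0.0, 0.0],      # image features: [fur, wheels, metal]
                   [0.0, 1.0, 1.0]]
W_AUDIO: Matrix = [[1.0, 0.0],           # audio features: [bark, engine]
                   [0.0, 1.0]]
W_TEXT: Matrix = [[1.0, 0.0, 1.0, 0.0],  # text features: bag of words over VOCAB
                  [0.0, 1.0, 0.0, 1.0]]
VOCAB = ["dog", "car", "animal", "vehicle"]


def project(w: Matrix, x: Sequence[float]) -> Vector:
    """Linear map W @ x from a modality-specific feature space into the shared space."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def normalise(v: Sequence[float]) -> Vector:
    """Scale to unit length so that only the direction matters."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(normalise(a), normalise(b)))


def fuse(embeddings: Dict[str, Optional[Vector]]) -> Vector:
    """Average the unit vectors of the modalities present; None marks a missing modality."""
    present = [normalise(e) for e in embeddings.values() if e is not None]
    if not present:
        raise ValueError("no modality available")
    dim = len(present[0])
    return normalise([sum(e[i] for e in present) / len(present) for i in range(dim)])


def text_features(words: Sequence[str]) -> Vector:
    return [1.0 if token in words else 0.0 for token in VOCAB]


def zero_shot(query: Vector, labels: Dict[str, Vector]) -> str:
    """Pick the text label whose embedding points in the most similar direction."""
    return max(labels, key=lambda name: cosine(query, labels[name]))


LABELS = {
    "a dog": project(W_TEXT, text_features(["dog", "animal"])),
    "a car": project(W_TEXT, text_features(["car", "vehicle"])),
}

dog_photo = project(W_IMAGE, [0.9, 0.1, 0.0])
bark_sound = project(W_AUDIO, [0.8, 0.1])
car_photo = project(W_IMAGE, [0.0, 0.9, 0.8])

print(zero_shot(fuse({"image": dog_photo, "audio": bark_sound}), LABELS))  # a dog
print(zero_shot(fuse({"image": dog_photo, "audio": None}), LABELS))        # a dog (audio missing)
print(zero_shot(fuse({"image": car_photo, "audio": None}), LABELS))        # a car
```

Leaving the audio out (None) still yields "a dog", which is the kind of graceful degradation the point on incomplete data asks for. With learned rather than hand-set weights, whether this holds depends entirely on training.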


ChatGPT-4 is suited to chatting and summarising

Me

ChatGPT's thinking is processed as text, and I feel it is weak at the following:

1. picturing a three-dimensional object, then mentally moving it around or turning it over;

2. picturing a sports scene and examining the significance of each movement and form;

3. constructing mathematical formulae while weighing the elegance of a proof technique; and

4. delicate but important propositions, such as the discriminatory (and non-discriminatory) aspects of religious beliefs and of concepts of beauty and ugliness.

What improvements would ChatGPT need to overcome these weaknesses?

GPT-4

ChatGPT currently processes information mainly as text and lacks a human's intuitive spatial understanding and sensibility. Addressing these issues will require several technical improvements to AI:

Integration of diverse data: AI needs a model that takes in diverse data, such as images and video as well as natural language, and integrates them into a single understanding. This would let it process information about the shapes of three-dimensional objects and about motion scenes more directly.

Improving spatial understanding: to let AI understand spatial relationships and the arrangement of objects, it is important to develop dedicated models and algorithms for handling spatial knowledge.

Modelling sensibility and concepts of beauty and ugliness: to model human sensibility and notions of beauty and ugliness, it is important to incorporate data from a range of cultures and value systems. It would also help to personalise the AI to each user's preferences and sensibilities.

Understanding ethical and philosophical issues: to handle complex ethical and philosophical questions, AI needs a deeper understanding of them. It also needs to be able to weigh diverse viewpoints and arguments as a whole.
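
Item 1 of the question is mentally moving a solid object or turning it over. The answer calls for "dedicated models and algorithms for handling spatial knowledge", and both can be illustrated with a tiny explicit spatial model. The Python sketch below represents a shape as a set of unit cubes on an integer grid. This is a hypothetical simplification in the spirit of the classic mental-rotation test figures. The sketch checks whether two shapes are the same object by trying all 24 rotations of a cube. Nothing here describes how GPT-4 works internally; it only shows what an explicit 3-D representation makes easy.

```python
from itertools import permutations, product
from typing import FrozenSet, Iterable, Set, Tuple

Point = Tuple[int, int, int]
Rotation = Tuple[Tuple[int, ...], Tuple[int, ...]]


def _parity(perm: Tuple[int, ...]) -> int:
    """+1 for an even permutation, -1 for an odd one (counted via inversions)."""
    inversions = sum(
        1 for i in range(len(perm)) for j in range(i + 1, len(perm)) if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


# Signed permutation matrices with determinant +1 are exactly the 24 rotations
# that map the cubic grid onto itself (reflections have determinant -1).
ROTATIONS = [
    (perm, signs)
    for perm in permutations(range(3))
    for signs in product((1, -1), repeat=3)
    if _parity(perm) * signs[0] * signs[1] * signs[2] == 1
]


def rotate(shape: Iterable[Point], rotation: Rotation) -> Set[Point]:
    """Apply one grid rotation: new axis i takes old axis perm[i], times signs[i]."""
    perm, signs = rotation
    return {tuple(signs[i] * p[perm[i]] for i in range(3)) for p in shape}


def canonical(shape: Iterable[Point]) -> FrozenSet[Point]:
    """Translate the shape so its smallest coordinate on every axis is 0."""
    pts = list(shape)
    mins = [min(p[i] for p in pts) for i in range(3)]
    return frozenset(tuple(p[i] - mins[i] for i in range(3)) for p in pts)


def same_up_to_rotation(a: Iterable[Point], b: Iterable[Point]) -> bool:
    """True if some rotation plus translation turns shape a into shape b."""
    shape_a = list(a)  # materialise once so it can be rotated 24 times
    target = canonical(b)
    return any(canonical(rotate(shape_a, r)) == target for r in ROTATIONS)


def mirror(shape: Iterable[Point]) -> Set[Point]:
    """Reflect in the plane x = 0; no rotation can produce this."""
    return {(-x, y, z) for x, y, z in shape}


# Hypothetical example: a chiral four-cube piece built from steps of +x, +y, +z.
SCREW = {(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)}
flipped = rotate(SCREW, ((0, 1, 2), (1, -1, -1)))  # turned over: 180 degrees about x

print(len(ROTATIONS))                             # 24
print(same_up_to_rotation(flipped, SCREW))        # True
print(same_up_to_rotation(mirror(SCREW), SCREW))  # False
```

Turning the piece over (a 180-degree rotation about the x-axis) leaves it the same object. Its mirror image, by contrast, cannot be reached by any rotation, and this is the distinction mental-rotation tasks probe.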

Realising these improvements will require advances in AI technology, better data collection, and innovation in how AI models are designed and trained. Through these efforts, AI such as ChatGPT is expected to move closer to human intuition and sensibility.


Friday, 7 April 2023

Won't AI run into a computational explosion as it crosses the uncanny valley?

Wednesday, 5 April 2023

ChatGPT-4 is suited to chatting and summarising

Transcending the Human-Centered View of Mind Through the Lens of Life-Extension Entities
