

『Simplified-Chinese Edition』 Programming Massively Parallel Processors (English reprint of the 4th edition) — Wen-mei W. Hwu (US), David B. Kirk (US), Izzat El Hajj (Lebanon)

Store catalog no.: 4097760
Category: Simplified-Chinese Books → Mainland China Books → Computers/Networks → Programming
Authors: Wen-mei W. Hwu (US), David B. Kirk (US), Izzat El Hajj (Lebanon)
ISBN: 9787111774716
Publisher: China Machine Press (机械工业出版社)
Publication date: 2025-03-01
Pages/word count: /
Format: 16mo; Binding: paperback
Price: NT$658


Editorial Recommendation:
Key updates in the 4th edition:
- New material on CUDA, including newer libraries such as cuDNN.
- New chapters on common parallel patterns (stencil, reduction, sorting), along with thorough updates to earlier chapters (convolution, histogram, sparse matrix, graph traversal, deep learning).
- A new chapter devoted to GPU architecture, with examples drawn from newer architectures such as Ampere.
- Refined discussion of problem-decomposition strategies and performance considerations, with a new optimization checklist.
About the Book:
This book is concise, intuitive, and practical, emphasizing computational thinking and parallel programming skills. It is organized into four parts. Part I introduces the fundamentals of heterogeneous parallel computing, including data parallelism, GPU architecture, CUDA programming, and program performance optimization. Part II covers parallel patterns, including convolution, stencil, parallel histogram, reduction, prefix sum, and merge. Part III covers advanced patterns and applications, including sorting, sparse matrix computation, graph traversal, deep learning, iterative MRI reconstruction, electrostatic potential maps, and computational thinking. Part IV covers advanced programming practices, including programming heterogeneous computing clusters and CUDA dynamic parallelism. The book is suitable for students in computer-related university programs as well as a reference for practitioners in parallel computing.
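Part I's running example (Section 2.3 in the contents below) is a vector addition kernel. As a rough illustration of the style of CUDA C code the book teaches — a minimal sketch, not the book's own listing, with illustrative identifiers — such a program looks like this:

```cuda
// Minimal CUDA C vector addition: one thread computes one output element.
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

__global__ void vecAddKernel(const float *A, const float *B, float *C, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) C[i] = A[i] + B[i];                  // guard: grid may exceed n
}

int main(void) {
    const int n = 1 << 20;
    size_t size = n * sizeof(float);

    // Host allocations and initialization.
    float *h_A = (float *)malloc(size);
    float *h_B = (float *)malloc(size);
    float *h_C = (float *)malloc(size);
    for (int i = 0; i < n; ++i) { h_A[i] = (float)i; h_B[i] = 2.0f * i; }

    // Device allocations and host-to-device transfers.
    float *d_A, *d_B, *d_C;
    cudaMalloc(&d_A, size); cudaMalloc(&d_B, size); cudaMalloc(&d_C, size);
    cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);

    // Launch enough blocks to cover all n elements (ceiling division).
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    vecAddKernel<<<blocks, threadsPerBlock>>>(d_A, d_B, d_C, n);

    cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);
    printf("C[1] = %f\n", h_C[1]);  // 1.0 + 2.0

    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
    free(h_A); free(h_B); free(h_C);
    return 0;
}
```

Chapters 4-6 then explain why choices like the block size and the boundary guard matter for scheduling, occupancy, and memory coalescing.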
About the Authors:
Wen-mei W. Hwu is a Distinguished Research Scientist and Senior Director of Research at NVIDIA, Professor Emeritus at the University of Illinois at Urbana-Champaign, and Chief Scientist of its parallel computing research center. For his outstanding contributions to compiler design, computer architecture, microarchitecture, and parallel computing, he is a Fellow of both the IEEE and the ACM and has received numerous awards, including the ACM-IEEE CS Eckert-Mauchly Award, the ACM Grace Murray Hopper Award, and the ACM SIGARCH Maurice Wilkes Award. He holds a PhD in computer science from the University of California, Berkeley.
David B. Kirk is a member of the US National Academy of Engineering, an NVIDIA Fellow, and a former Chief Scientist of NVIDIA. In 2002 he received the ACM SIGGRAPH Computer Graphics Achievement Award for his outstanding role in bringing high-performance computer graphics systems to the mass market. He holds a PhD in computer science from the California Institute of Technology.
Izzat El Hajj is an Assistant Professor in the Department of Computer Science at the American University of Beirut. His research focuses on application acceleration and programming support for emerging parallel processors and memory technologies, particularly GPUs and processing-in-memory. He holds a PhD in electrical and computer engineering from the University of Illinois at Urbana-Champaign.
Contents
Foreword
Preface 
Acknowledgments 
CHAPTER 1 Introduction 1
1.1 Heterogeneous parallel computing 3
1.2 Why more speed or parallelism 7
1.3 Speeding up real applications 9
1.4 Challenges in parallel programming 11
1.5 Related parallel programming interfaces 13
1.6 Overarching goals 14
1.7 Organization of the book 15
References 19
Part I Fundamental Concepts
CHAPTER 2 Heterogeneous data parallel computing 23
With special contribution from David Luebke
2.1 Data parallelism 23
2.2 CUDA C program structure 27
2.3 A vector addition kernel 28
2.4 Device global memory and data transfer 31
2.5 Kernel functions and threading 35
2.6 Calling kernel functions 40
2.7 Compilation 42
2.8 Summary 43
Exercises 44
References 46
CHAPTER 3 Multidimensional grids and data 47
3.1 Multidimensional grid organization 47
3.2 Mapping threads to multidimensional data 51
3.3 Image blur: a more complex kernel 58
3.4 Matrix multiplication 62
3.5 Summary 66
Exercises 67
CHAPTER 4 Compute architecture and scheduling 69
4.1 Architecture of a modern GPU 70
4.2 Block scheduling 70
4.3 Synchronization and transparent scalability 71
4.4 Warps and SIMD hardware 74
4.5 Control divergence 79
4.6 Warp scheduling and latency tolerance 83
4.7 Resource partitioning and occupancy 85
4.8 Querying device properties 87
4.9 Summary 90
Exercises 90
References 92
CHAPTER 5 Memory architecture and data locality 93
5.1 Importance of memory access efficiency 94
5.2 CUDA memory types 96
5.3 Tiling for reduced memory traffic 103
5.4 A tiled matrix multiplication kernel 107
5.5 Boundary checks 112
5.6 Impact of memory usage on occupancy 115
5.7 Summary 118
Exercises 119
CHAPTER 6 Performance considerations 123
6.1 Memory coalescing 124
6.2 Hiding memory latency 133
6.3 Thread coarsening 138
6.4 A checklist of optimizations 141
6.5 Knowing your computation’s bottleneck 145
6.6 Summary 146
Exercises 146
References 147
Part II Parallel Patterns
CHAPTER 7 Convolution
An introduction to constant memory and caching 151
7.1 Background 152
7.2 Parallel convolution: a basic algorithm 156
7.3 Constant memory and caching 159
7.4 Tiled convolution with halo cells 163
7.5 Tiled convolution using caches for halo cells 168
7.6 Summary 170
Exercises 171
CHAPTER 8 Stencil 173
8.1 Background 174
8.2 Parallel stencil: a basic algorithm 178
8.3 Shared memory tiling for stencil sweep 179
8.4 Thread coarsening 183
8.5 Register tiling 186
8.6 Summary 188
Exercises 188
CHAPTER 9 Parallel histogram 191
9.1 Background 192
9.2 Atomic operations and a basic histogram kernel 194
9.3 Latency and throughput of atomic operations 198
9.4 Privatization 200
9.5 Coarsening 203
9.6 Aggregation 206
9.7 Summary 208
Exercises 209
References 210
CHAPTER 10 Reduction
And minimizing divergence 211
10.1 Background 211
10.2 Reduction trees 213
10.3 A simple reduction kernel 217
10.4 Minimizing control divergence 219
10.5 Minimizing memory divergence 223
10.6 Minimizing global memory accesses
Excerpt
Preface
We are proud to introduce to you the fourth edition of Programming Massively Parallel Processors: A Hands-on Approach.
Mass market computing systems that combine multicore CPUs and many-thread GPUs have brought terascale computing to laptops and exascale computing to clusters. Armed with such computing power, we are at the dawn of the widespread use of computational experiments in the science, engineering, medical, and business disciplines. We are also witnessing the wide adoption of GPU computing in key industry vertical markets, such as finance, e-commerce, oil and gas, and manufacturing. Breakthroughs in these disciplines will be achieved by using computational experiments that are of unprecedented levels of scale, accuracy, safety, controllability, and observability. This book provides a critical ingredient for this vision: teaching parallel programming to millions of graduate and undergraduate students so that computational thinking and parallel programming skills will become as pervasive as calculus skills.
The primary target audience of this book consists of graduate and undergraduate students in all science and engineering disciplines in which computational thinking and parallel programming skills are needed to achieve breakthroughs. The book has also been used successfully by industry professional developers who need to refresh their parallel computing skills and keep up to date with the ever-increasing speed of technology evolution. These professional developers work in fields such as machine learning, network security, autonomous vehicles, computational financing, data analytics, cognitive computing, mechanical engineering, civil engineering, electrical engineering, bioengineering, physics, chemistry, astronomy, and geography, and they use computation to advance their fields. Thus these developers are both experts in their domains and programmers. The book takes the approach of teaching parallel programming by building up an intuitive understanding of the techniques. We assume that the reader has at least some basic C programming experience. We use CUDA C, a parallel programming environment that is supported on NVIDIA GPUs. There are more than 1 billion of these processors in the hands of consumers and professionals, and more than 400,000 programmers are actively using CUDA. The applications that you will develop as part of your learning experience will be runnable by a very large user community.
Since the third edition came out in 2016, we have received numerous comments from our readers and instructors. Many of them told us about the existing features they value. Others gave us ideas about how we should expand the book's contents to make it even more valuable. Furthermore, the hardware and software for heterogeneous parallel computing have advanced tremendously since 2016. In the hardware arena, three more generations of GPU computing architectures, namely, Volta, Turing, and Ampere, have been introduced since the third edition. In the software domain, CUDA 9 through CUDA 11 have allowed programmers to access new hardware and system features. New algorithms have also been developed. Accordingly, we added four new chapters and rewrote a substantial number of the existing chapters.
The four newly added chapters include one new foundational chapter, namely, Chapter 4 (Compute Architecture and Scheduling), and three new parallel patterns and applications chapters: Chapter 8 (Stencil), Chapter 10 (Reduction and Minimizing Divergence), and Chapter 13 (Sorting). Our motivation for adding these chapters is as follows:
- Chapter 4 (Compute Architecture and Scheduling): In the previous edition the discussions of architecture and scheduling considerations were scattered across multiple chapters. In this edition, Chapter 4 consolidates these discussions into one focused chapter that serves as a centralized reference for readers who are particularly interested in this topic.
- Chapter 8 (Stencil): In the previous edition the stencil pat
megBook.com.tw
Copyright (C) 2013 - 2025 (香港)大書城有限公司 All Rights Reserved.