BERT parameter counts
BERT configuration: BERT-Base, Uncased: 12-layer, 768-hidden, 12-heads, 110M parameters; vocabulary size: 30522
Embedding layer:
token embedding: 30522*768, the initial encoding of each token
position embedding: 512*768
type (segment) embedding: 2*768
layer norm: weight + bias, 768*2
Self-attention layer:
query, key, value: (768*768 + 768)*3
dense (output projection): 768*768 + 768
layer norm: weight + bias, 768*2
Feed-forward layer:
two linear layers, first expanding 768 -> 3072, then projecting back 3072 -> 768
layer norm: weight + bias, 768*2
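The sizes above can be checked with a quick arithmetic sketch (pure Python, no framework needed). Note that the headline 110M figure also includes the pooler layer (768*768 + 768 on top of [CLS]), which is outside the encoder stack and not shown in the dump below:

```python
H, V, P, T, L, FF = 768, 30522, 512, 2, 12, 3072

# Embedding layer: token + position + type embeddings, plus one LayerNorm
embeddings = V * H + P * H + T * H + 2 * H  # LayerNorm has weight + bias, H each

# One encoder layer
qkv        = 3 * (H * H + H)               # query/key/value projections with bias
attn_dense = H * H + H                     # attention output projection
attn_ln    = 2 * H                         # LayerNorm weight + bias
ffn        = (H * FF + FF) + (FF * H + H)  # expand 768->3072, project back 3072->768
ffn_ln     = 2 * H
per_layer  = qkv + attn_dense + attn_ln + ffn + ffn_ln

pooler = H * H + H                         # dense layer on [CLS], outside the encoder
total  = embeddings + L * per_layer + pooler

print(embeddings)  # 23837184
print(per_layer)   # 7087872
print(total)       # 109482240, i.e. ~110M
```

The per-component terms match the dump below line by line (e.g. `word_embeddings.weight` = 30522*768 = 23440896).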
bert.embeddings.word_embeddings.weight torch.Size([30522, 768]) number of parameters: 23440896
bert.embeddings.position_embeddings.weight torch.Size([512, 768]) number of parameters: 393216
bert.embeddings.token_type_embeddings.weight torch.Size([2, 768]) number of parameters: 1536
bert.embeddings.LayerNorm.weight torch.Size([768]) number of parameters: 768
bert.embeddings.LayerNorm.bias torch.Size([768]) number of parameters: 768
bert.encoder.layer.0.attention.self.query.weight torch.Size([768, 768]) number of parameters: 589824
bert.encoder.layer.0.attention.self.query.bias torch.Size([768]) number of parameters: 768
bert.encoder.layer.0.attention.self.key.weight torch.Size([768, 768]) number of parameters: 589824
bert.encoder.layer.0.attention.self.key.bias torch.Size([768]) number of parameters: 768
bert.encoder.layer.0.attention.self.value.weight torch.Size([768, 768]) number of parameters: 589824
bert.encoder.layer.0.attention.self.value.bias torch.Size([768]) number of parameters: 768
bert.encoder.layer.0.attention.output.dense.weight torch.Size([768, 768]) number of parameters: 589824
bert.encoder.layer.0.attention.output.dense.bias torch.Size([768]) number of parameters: 768
bert.encoder.layer.0.attention.output.LayerNorm.weight torch.Size([768]) number of parameters: 768
bert.encoder.layer.0.attention.output.LayerNorm.bias torch.Size([768]) number of parameters: 768
bert.encoder.layer.0.intermediate.dense.weight torch.Size([3072, 768]) number of parameters: 2359296
bert.encoder.layer.0.intermediate.dense.bias torch.Size([3072]) number of parameters: 3072
bert.encoder.layer.0.output.dense.weight torch.Size([768, 3072]) number of parameters: 2359296
bert.encoder.layer.0.output.dense.bias torch.Size([768]) number of parameters: 768
bert.encoder.layer.0.output.LayerNorm.weight torch.Size([768]) number of parameters: 768
bert.encoder.layer.0.output.LayerNorm.bias torch.Size([768]) number of parameters: 768
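A dump in this format is typically produced by iterating over a PyTorch model's `named_parameters()`. A framework-free sketch of the same bookkeeping, using a few (name, shape) pairs copied from the listing above:

```python
from math import prod

# (name, shape) pairs copied from the dump above
params = [
    ("bert.embeddings.word_embeddings.weight", (30522, 768)),
    ("bert.embeddings.LayerNorm.weight", (768,)),
    ("bert.encoder.layer.0.attention.self.query.weight", (768, 768)),
    ("bert.encoder.layer.0.intermediate.dense.weight", (3072, 768)),
]

for name, shape in params:
    # count = product of all dimensions of the tensor shape
    print(name, f"torch.Size({list(shape)})", "number of parameters:", prod(shape))

# Against a real checkpoint the same loop would look like this
# (assuming the transformers library is installed; not run here):
#   model = transformers.BertModel.from_pretrained("bert-base-uncased")
#   for name, p in model.named_parameters():
#       print(name, p.shape, "number of parameters:", p.numel())
```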