Libraries needed: jieba (third-party; collections is in the standard library)    Editor: VS Code    Language: Python

Given the following text:

'''1 telescope set
1 brass scales
Students may also bring an owl OR a cat OR a toad
PARENTS ARE REMINDED THAT FIRST YEARS ARE NOT ALLOWED THEIR OWN BROOMSTICKS
“Can we buy all this in London?” Harry wondered aloud.
“If yeh know where to go,” said Hagrid.
Harry had never been to London before. Although Hagrid seemed to know where he was going, he was obviously not used to getting there in an ordinary way. He got stuck in the ticket barrier on the Underground, and complained loudly that the seats were too small and the trains too slow.
“I don't know how the Muggles manage without magic,” he said as they climbed a broken-down escalator that led up to a bustling road lined with shops.
Hagrid was so huge that he parted the crowd easily; all Harry had to do was keep close behind him. They passed book shops and music stores, hamburger restaurants and cinemas, but nowhere that looked as if it could sell you a magic wand. This was just an ordinary street full of ordinary people. Could there really be piles of wizard gold buried miles beneath them? Were there really shops that sold spell books and broomsticks? Might this not all be some huge joke that the Dursleys had cooked up? If Harry hadn't known that the Dursleys had no sense of humor, he might have thought so; yet somehow, even though everything Hagrid had told him so far was unbelievable, Harry couldn't help trusting him.
“This is it,” said Hagrid, coming to a halt, “the Leaky Cauldron. It's a famous place.”
It was a tiny, grubby-looking pub. If Hagrid hadn't pointed it out, Harry wouldn't have noticed it was there. The people hurrying by didn't glance at it. Their eyes slid from the big book shop on one side to the record shop on the other as if they couldn't see the Leaky Cauldron at all. In fact, Harry had the most peculiar feeling that only he and Hagrid could see it. Before he could mention this, Hagrid had steered him inside.'''

Count the number of words in the text

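import jieba  # third-party word segmenter
from collections import Counter

punctuation = [".", ",", "“", ";", "-", "”", "?", "1", ":", "'"]  # characters to strip (including the digit 1 from the list items)
n = 0  # running total of words
# The text has already been saved to test.txt, which saves escaping the quotes (laziness wins); open it as UTF-8 to prevent decoding errors
with open("test.txt", 'r', encoding='utf-8') as x:  # the with block closes the file once it has been read
    s = x.read()  # read the whole file
# print(s)  # debug output

# Strip punctuation and other non-word characters
for o in punctuation:
    s = s.replace(o, "")

# Split into tokens; jieba also returns spaces and line breaks as tokens, so drop those
j = [w for w in jieba.lcut(s) if w.strip()]

dic = Counter(j)  # count how often each token occurs
st = set(j)  # unordered set of unique tokens, used as the keys of dic
print("/".join(j))  # join the token list with forward slashes to inspect the segmentation

for w in st:  # every element of the set is a key in dic
    n = n + dic.get(w)  # accumulate the total
    print(w + ":" + str(dic.get(w)))  # print the count for this word
print("Total: " + str(n))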
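Because the passage is English, a similar count can also be sketched with the standard library alone. This is an alternative under stated assumptions, not the jieba approach used above: the helper name `count_words` and the regular expression are illustrative, and a word is taken to be a run of ASCII letters that may contain an apostrophe or hyphen, so the totals can differ slightly from the jieba result.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters, optionally joined by ' or -
WORD_RE = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def count_words(text):
    """Return (total, Counter) for the words found in text (hypothetical helper)."""
    words = WORD_RE.findall(text)
    return len(words), Counter(words)

sample = "1 brass scales\nHarry didn't see the broken-down escalator. Harry smiled."
total, freq = count_words(sample)
print(total)          # total number of words in the sample
print(freq["Harry"])  # how often "Harry" occurs
```

Digits such as the leading `1` never match the pattern, so no separate stripping step is needed in this variant.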

Count the number of sentences in the text (sentences are separated by '.')

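n = 0  # sentence counter
# The text has already been saved to test.txt; open it as UTF-8 to prevent decoding errors
with open("test.txt", 'r', encoding='utf-8') as x:
    m = x.read()
for piece in m.split('.'):  # split into sentences on '.'
    # print(piece)  # uncomment to inspect the split
    if piece.strip():  # skip the empty piece left after the final '.'
        n = n + 1  # count the sentence
print('Number of sentences: ' + str(n))  # print the result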
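One detail of splitting on '.' is worth seeing in isolation: `str.split` always returns one more piece than there are separators, so a text that ends with a period produces a trailing piece that is empty or whitespace only. A minimal sketch, using a hypothetical helper `count_sentences`, that counts only the non-empty pieces:

```python
def count_sentences(text, sep="."):
    """Count the non-empty pieces of text separated by sep (hypothetical helper)."""
    return sum(1 for piece in text.split(sep) if piece.strip())

sample = "Harry had never been to London before. He got stuck in the ticket barrier.\n"
print(len(sample.split(".")))   # raw number of pieces, including the trailing "\n"
print(count_sentences(sample))  # pieces that actually contain text
```

Under this definition '?' and '!' do not end a sentence, which matches the rule stated in the exercise.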

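Find all proper nouns in the text (a proper noun here is a word whose first letter is capitalized and which is not the first word of a sentence)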

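import jieba  # third-party word segmenter

punctuation = [".", ",", "“", ";", "-", "”", "?", "1", ":", "'"]  # characters to strip
n = 0  # number of proper nouns found
names = []  # the proper nouns themselves
m = []  # sentences after splitting and cleaning
# The text has already been saved to test.txt; open it as UTF-8 to prevent decoding errors
with open("test.txt", 'r', encoding='utf-8') as x:
    s = x.read()  # read the whole file

# Part 1: split into sentences and strip punctuation
for z in s.split('.'):  # split into sentences on '.'
    z = z.replace('\n', ' ')  # turn line breaks into spaces so adjacent words are not glued together
    for o in punctuation:
        z = z.replace(o, "")  # delete the punctuation character
    m.append(z)  # m now holds the cleaned sentences

# Part 2: tokenize and count
for k in m:
    j = [w for w in jieba.lcut(k) if w.strip()]  # tokenize the sentence and drop whitespace tokens
    for yy, tem in enumerate(j, start=1):  # yy is the position of the word within the sentence
        if yy == 1:  # the first word of a sentence is capitalized anyway, so skip it
            continue
        if tem.istitle():  # first letter uppercase, the rest lowercase
            n = n + 1  # count the proper noun
            names.append(tem)  # remember the word itself
print("Proper nouns: " + ", ".join(names))
print("Number of proper nouns: " + str(n))  # print the result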
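The rule itself (capitalized, and not the first word of a '.'-delimited sentence) can be tried on a short string without jieba. The sketch below is only an illustration: `proper_nouns` and `WORD_RE` are hypothetical names, and words are assumed to be plain runs of ASCII letters.

```python
import re

# Assumption: a word is a run of ASCII letters
WORD_RE = re.compile(r"[A-Za-z]+")

def proper_nouns(text):
    """Return capitalized words that are not the first word of a '.'-delimited sentence."""
    found = []
    for sentence in text.split("."):
        words = WORD_RE.findall(sentence)
        # words[1:] skips the first word of the sentence; istitle() keeps words like "Hagrid"
        found.extend(w for w in words[1:] if w.istitle())
    return found

sample = "Harry had never been to London before. Although Hagrid seemed to know, he said nothing."
print(proper_nouns(sample))
```

Note that `istitle()` rejects all-caps words such as `OR` or `PARENTS`, and that a capitalized word following '?' is still counted, because only '.' ends a sentence under this rule.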

 
