IOTXING

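Python Threads in Detail

Posted on 2018-05-13 Edited on 2022-03-19 In Tech

Threads

Concepts

A thread is the smallest unit of program execution. A standard thread consists of a thread ID, the current instruction pointer, a register set, and a stack. A thread is an entity inside a process and is the basic unit that the system schedules and dispatches independently. A thread owns essentially no system resources of its own, but it shares the resources of its process with the other threads in that process. One thread can create or cancel another, and multiple threads in one process can run concurrently.

Because threads constrain one another, their execution is intermittent, so a thread has three basic states: blocked, ready, and running. Ready means the thread has everything it needs to run and is waiting for a processor. Running means the thread holds a processor and is executing. Blocked means the thread is waiting for an event and logically cannot execute. Every application has at least one process and one thread; a thread is a single flow of control within a program. Several threads working inside one process is called multithreading.

Example: a robot is made up of several parts, such as a head and hands. Different workshops manufacture these parts, and each workshop runs several production lines at the same time. The robot factory corresponds to the application, each workshop to a process, and each production line to a thread.

Multithreading in Python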

import threading
import time

def show(i):
    time.sleep(2)
    print('thread %d' % i)

# start 20 threads; each one sleeps for 2 seconds and then prints its number
for i in range(20):
    t = threading.Thread(target=show, args=(i,))
    t.start()

thread 0
thread 1
thread 2
thread 3
thread 6
thread 8
thread 5
thread 4
thread 10
thread 9
thread 7
thread 12
thread 11
thread 13
thread 15
thread 17
thread 16
thread 14
thread 18
thread 19

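We created 20 threads and then handed control over to the system. The scheduler decides, according to its own algorithm, when each thread gets a time slice, which is why the output order is not fixed.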
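To make the relationship between threads and their process concrete, here is a minimal sketch written for Python 3. The function name `record_identity` and the `worker-N` thread names are made up for this illustration. Each thread records the process ID it sees, and all of them should report the same one because they live in the same process.

```python
import os
import threading

records = []  # shared by all threads, because they share the process's memory

def record_identity():
    # every thread sees the same process ID but has its own thread name
    records.append((os.getpid(), threading.current_thread().name))

threads = [threading.Thread(target=record_identity, name='worker-%d' % i)
           for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

pids = {pid for pid, _ in records}
print('%d threads ran in %d process' % (len(records), len(pids)))
```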

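The main methods of the Thread class are:

start()

Puts the thread into the ready state, where it waits to be scheduled.

setName

Sets the thread's name (the name attribute can be used for the same purpose).

getName

Returns the thread's name.

join()

Blocks the calling thread (here, the main thread) until the child thread finishes.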

Wait until the thread terminates.

This blocks the calling thread until the thread whose join() method is called terminates – either normally or through an unhandled exception or until the optional timeout occurs.

When the timeout argument is present and not None, it should be a floating point number specifying a timeout for the operation in seconds (or fractions thereof). As join() always returns None, you must call isAlive() after join() to decide whether a timeout happened – if the thread is still alive, the join() call timed out.

When the timeout argument is not present or None, the operation will block until the thread terminates.

A thread can be join()ed many times.

join() raises a RuntimeError if an attempt is made to join the current thread as that would cause a deadlock. It is also an error to join() a thread before it has been started and attempts to do so raises the same exception.

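The above is the documented definition of join(). In short, the calling thread (here, the main thread) is suspended until the thread whose join() was called terminates, whether normally or through an unhandled exception, or until the optional timeout expires.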
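The timeout behaviour described in the quoted documentation can be sketched as follows. This is an illustration under assumptions, not part of the original post: the `slow_task` function and the 0.5 s / 0.05 s durations are invented, and it uses `is_alive()`, the current spelling of the `isAlive()` method named in the quote.

```python
import threading
import time

def slow_task():
    time.sleep(0.5)  # stands in for real work

t = threading.Thread(target=slow_task, name='slowWorker')
t.start()

t.join(timeout=0.05)      # returns None whether or not the timeout expired
timed_out = t.is_alive()  # still alive, so the join above timed out

t.join()                  # no timeout: block until the thread terminates
finished = not t.is_alive()

print(timed_out, finished)
```

Because join() always returns None, checking `is_alive()` afterwards is the only way to tell a timeout from a normal finish.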

First, let's see what happens without join():

import threading

def loop():
    print('now running thread is %s' % threading.current_thread().name)
    n = 0
    while n < 10:
        print('thread %s ------%d ' % (threading.current_thread().name, n))
        n += 1
    print('thread  %s ended' % threading.current_thread().name)


print('now running thread is %s' % threading.current_thread().name)
t = threading.Thread(target=loop, name='childLoop')
t.start()
print('thread %s ended.' % threading.current_thread().name)

now running thread is MainThread
now running thread is childLoop
thread childLoop ------0
thread childLoop ------1
thread childLoop ------2
thread childLoop ------3
thread childLoop ------4
thread childLoop ------5
thread MainThread ended.
thread childLoop ------6
thread childLoop ------7
thread childLoop ------8
thread childLoop ------9

thread  childLoop ended

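As you can see, the main thread ran to its last line without waiting for the child thread to finish. In many cases we need the main thread to wait until the child threads have completed their work before it exits, so this version does not meet our needs.

Now let's look at the same code with join():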

    import threading

    def loop():
        print('now running thread is %s' % threading.current_thread().name)
        n = 0
        while n < 10:
            print('thread %s ------%d ' % (threading.current_thread().name, n))
            n += 1
        print('thread  %s ended' % threading.current_thread().name)


    print('now running thread is %s' % threading.current_thread().name)
    t = threading.Thread(target=loop, name='childLoop')
    t.start()
    t.join()  # wait here until childLoop has finished
    print('thread %s ended.' % threading.current_thread().name)


    now running thread is MainThread
    now running thread is childLoop
    thread childLoop ------0
    thread childLoop ------1
    thread childLoop ------2
    thread childLoop ------3
    thread childLoop ------4
    thread childLoop ------5
    thread childLoop ------6
    thread childLoop ------7
    thread childLoop ------8
    thread childLoop ------9
    thread  childLoop ended
    thread MainThread ended.

> Here it is clear that the main thread finishes only after the child thread has completed.

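#### Locks

Resources are shared between threads: all threads in a process access the resources of that same process. This creates a problem, because when several threads operate on one resource at the same time, the result can easily become inconsistent.

Let's simulate a scenario first: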

import threading

result = 0

def changeResult(n):
    global result
    result += n
    result -= n

def threadWorker(n):
    for i in range(10000):
        changeResult(n)

# repeat the experiment ten times and print the final value each time
for i in range(10):
    t1 = threading.Thread(target=threadWorker, args=(20,))
    t2 = threading.Thread(target=threadWorker, args=(10,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    print(result)

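Here we define a global variable result with an initial value of 0, and use the changeResult function to add to it and then subtract from it. Normally, result should still be 0 after every call to changeResult. Let's look at the outcome.

This is the result of the run: we repeated it ten times, and only two of the runs gave the expected result.

This happens because the threads are scheduled by the system. As long as the code runs often enough, the result can end up being something other than 0.

Let's analyze this from the beginning:

A single statement that we write is broken down into several steps when it actually runs. Take the simplest calculation as an example: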

result += 1

At run time, this is executed in two steps:

tmp = result + 1
result = tmp


Here tmp is a local variable, and each thread has its own tmp.
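The claim that one statement becomes several steps can be checked with the standard `dis` module. The following is a sketch for Python 3; the function name `change_result` is chosen for this illustration, and the exact opcode names printed depend on the interpreter version. What matters is that loading `result` and storing it back are separate instructions, so another thread can run between them.

```python
import dis

result = 0

def change_result(n):
    global result
    result += n

# list the bytecode instruction names that make up the function body
ops = [ins.opname for ins in dis.get_instructions(change_result)]
print(ops)
```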

Ideally, the operations above would run as follows:

result = 0
t1:  tmp = result +20   ## tmp = 20
t1:  result = tmp       ## result = 20

t1:  tmp = result -20   ## tmp = 0
t1:  result = tmp       ## result  = 0 

t2:  tmp = result +10   ## tmp = 10
t2:  result = tmp       ## result = 10

t2:  tmp = result -10   ## tmp = 0
t2:  result = tmp       ## result  = 0 

result = 0


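In this ideal case the data is as expected, because the threads operate on the resource in an orderly way.

In reality, threads are scheduled by the system and their steps interleave: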

result = 0
t1: tmp = result + 20    ##  tmp = 20

t2: tmp = result +10     ##  tmp = 10
t2: result = tmp         ##  result = 10

t1: result = tmp         ##  result = 20

t2: tmp = result - 10    ##  tmp = 10
t2: result = tmp         ##  result = 10

t1: tmp = result - 20    ##  tmp = -10
t1: result = tmp         ##   result = -10

result = -10

The final result is wrong, because the updates were interleaved.
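The interleaving traced above can be replayed deterministically, without real threads, to confirm the arithmetic. This is only a simulation sketch: the helper names `load` and `store` and the `schedule` list are invented for the illustration, and each tuple stands for one of the two steps of `result += n` or `result -= n`.

```python
result = 0
tmp = {'t1': 0, 't2': 0}  # each thread's private temporary value

def load(thread, delta):
    # step 1: tmp = result + delta
    tmp[thread] = result + delta

def store(thread):
    # step 2: result = tmp
    global result
    result = tmp[thread]

# the same order of steps as in the trace above
schedule = [
    (load, 't1', 20), (load, 't2', 10), (store, 't2'), (store, 't1'),
    (load, 't2', -10), (store, 't2'), (load, 't1', -20), (store, 't1'),
]
for step in schedule:
    step[0](*step[1:])

print(result)  # -10
```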

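We therefore need to guarantee that only one thread at a time can operate on the resource, which rules out this kind of error at the root.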

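##### Types of locks

##### Plain lock (Lock)

> Mutual exclusion: only one thread at a time can hold the lock and run the protected code.

import threading

result = 0
lock = threading.Lock()

def changeResult(n):
    global result
    lock.acquire()       # block until this thread holds the lock
    try:
        result += n
        result -= n
    finally:
        lock.release()   # always release, even if an exception is raised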

    

threading.Lock() creates the lock. Call acquire() to take the lock before touching the shared data, and call release() to release it once the work is done.
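The acquire/release pair can also be written with a `with` statement, which releases the lock automatically. The following sketch reruns the two-thread experiment with the lock in place; the snake_case names `change_result` and `worker` are chosen for this example.

```python
import threading

result = 0
lock = threading.Lock()

def change_result(n):
    global result
    with lock:  # acquire() on entry, release() on exit, even on an exception
        result += n
        result -= n

def worker(n):
    for _ in range(10000):
        change_result(n)

threads = [threading.Thread(target=worker, args=(n,)) for n in (20, 10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(result)  # 0
```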

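threading.RLock() is a reentrant lock that can be acquired in a nested fashion. For example, if a function acquires the lock and then calls another function that acquires it again, both acquisitions succeed as long as they happen in the same thread. The lock must be released once for every acquire.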
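A small sketch of the difference between the two lock types, using invented function names `outer` and `inner`: the nested acquisition works with an RLock, while a second non-blocking attempt on a plain Lock that is already held fails.

```python
import threading

rlock = threading.RLock()
calls = []

def inner():
    with rlock:  # the same thread already holds rlock; re-acquiring succeeds
        calls.append('inner')

def outer():
    with rlock:
        calls.append('outer')
        inner()

outer()

plain = threading.Lock()
plain.acquire()
second_try = plain.acquire(False)  # non-blocking attempt on a held plain Lock
plain.release()

print(calls, second_try)
```

With a blocking `acquire()` in place of the non-blocking one, the plain Lock version would hang forever, which is exactly the situation RLock is meant to avoid.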

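With a lock in place, no matter how many threads run at the same time, only one of them can operate on the resource at any moment. The price is lower performance, because the protected section effectively runs single-threaded. In addition, when several locks exist, different threads may each hold one lock while trying to acquire a lock held by another. This can cause a deadlock, in which the threads involved hang and can neither continue nor finish, and the only way out is to have the operating system kill the process.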
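One common way to avoid the deadlock described above is to make every thread acquire the locks in the same fixed order. The sketch below is an illustration of that idea rather than something from the original post; the `balance` dictionary and the function names are hypothetical.

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
balance = {'a': 100, 'b': 100}

def transfer(src, dst, amount):
    # Every caller takes lock_a first and lock_b second, whatever the
    # direction of the transfer, so no thread can hold lock_b while
    # waiting for lock_a.
    with lock_a:
        with lock_b:
            balance[src] -= amount
            balance[dst] += amount

def repeat_transfer(src, dst):
    for _ in range(1000):
        transfer(src, dst, 1)

t1 = threading.Thread(target=repeat_transfer, args=('a', 'b'))
t2 = threading.Thread(target=repeat_transfer, args=('b', 'a'))
t1.start()
t2.start()
t1.join()
t2.join()

print(balance)
```

If `transfer` instead locked "the source account first", the two threads would take the locks in opposite orders and could block each other indefinitely.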

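##### Semaphore

> A counted lock: a fixed number of threads can hold it at the same time, and any further threads must wait their turn.

import threading

result = 0
semaphore = threading.BoundedSemaphore(10)

def changeResult(n):
    global result
    # Up to 10 threads may be inside this section at once, so the semaphore
    # limits concurrency but does not by itself make the update atomic.
    semaphore.acquire()
    result += n
    result -= n
    semaphore.release()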


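When acquire() is called, the semaphore first checks its internal counter. If the counter is greater than 0, it is decremented and acquire() returns immediately. If it is 0, the caller blocks until another thread calls release(), which increments the counter. When several threads are waiting, each release() wakes up only one of them, and which one is not specified. If acquire() is given a timeout (supported in Python 3.2 and later), it blocks for at most timeout seconds and returns False if the timeout expires.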
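The counting behaviour can be observed by tracking how many threads are inside the guarded section at once. This sketch assumes a limit of 3 and uses a short sleep to stand in for real work; `LIMIT`, `limited_task`, and the counters are names invented for the example.

```python
import threading
import time

LIMIT = 3
semaphore = threading.BoundedSemaphore(LIMIT)
state_lock = threading.Lock()  # protects the two counters below
active = 0
max_active = 0

def limited_task():
    global active, max_active
    with semaphore:  # at most LIMIT threads get past this line at once
        with state_lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.02)  # stands in for real work
        with state_lock:
            active -= 1

threads = [threading.Thread(target=limited_task) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(max_active)
```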

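##### Event

> An Event provides three main methods: set, wait, and clear.

The event pattern controls threads through a flag. Calling set() sets the flag to True, so every thread that has called wait() can continue. Calling clear() sets the flag to False, so every thread that calls wait() blocks until the flag becomes True again. wait() accepts a timeout argument: if the flag is still False when the timeout expires, wait() returns False. Otherwise it either returns True immediately or blocks until the flag is set to True.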
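A minimal sketch of the three methods in action. The names `ready`, `waiter`, and `log` are made up for this example, and the 0.05 s timeouts are arbitrary.

```python
import threading

ready = threading.Event()
log = []

def waiter():
    ready.wait()  # blocks until the flag is True
    log.append('released')

t = threading.Thread(target=waiter)
t.start()

before_set = ready.wait(0.05)  # flag is still False, so this times out
ready.set()                    # flag becomes True; the waiter continues
t.join()
after_set = ready.wait(0.05)   # flag is True, so this returns at once
ready.clear()                  # flag back to False

print(before_set, after_set, ready.is_set())  # False True False
```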

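##### Condition

> Makes threads wait until some condition is met, then releases n threads.

import threading

cv = threading.Condition()

# an_item_is_available(), get_an_available_item() and
# make_an_item_available() are placeholders for application code.

# Consume one item
cv.acquire()
while not an_item_is_available():
    cv.wait()
get_an_available_item()
cv.release()

# Produce one item
cv.acquire()
make_an_item_available()
cv.notify()
cv.release()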

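The above is a producer-consumer model. The consumer acquires the lock and checks whether the condition is met. If it is not, the consumer stays in the wait state; once the condition is met, it carries on and then releases the lock. The producer first acquires the lock, does its work, and then calls notify() to wake a thread that is waiting. Note that notify() does not release the lock; it only notifies the suspended thread, which cannot proceed until the producer actually releases the lock. Finally, the producer releases the lock.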
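A runnable version of the producer-consumer model might look like the sketch below. It assumes a shared `deque` as the item store and a single consumer; `items`, `consumed`, `producer`, and `consumer` are illustrative names, and the `with cv:` form is used in place of explicit acquire()/release() calls.

```python
import threading
from collections import deque

cv = threading.Condition()
items = deque()   # the shared resource guarded by cv
consumed = []

def consumer(count):
    for _ in range(count):
        with cv:
            while not items:  # re-check the condition after every wake-up
                cv.wait()     # releases the lock while waiting
            consumed.append(items.popleft())

def producer(values):
    for v in values:
        with cv:
            items.append(v)
            cv.notify()  # wakes one waiter; it runs after the lock is released

c = threading.Thread(target=consumer, args=(5,))
p = threading.Thread(target=producer, args=(range(5),))
c.start()
p.start()
c.join()
p.join()

print(consumed)
```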

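Global Interpreter Lock (GIL)

At the interpreter level, Python (CPython) allows only one thread of a program to actually execute on the CPU at any given moment. No matter how many threads are started, only one is really running at a time, so for CPU-bound work multithreaded code is sometimes no faster than single-threaded code, or even slower. Using multiple processes avoids this problem.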
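As a rough sketch of the multi-process approach, the standard `concurrent.futures.ProcessPoolExecutor` can spread CPU-bound work over several processes, each with its own interpreter and its own GIL. The prime-counting task, the function names, and the worker count of 2 are assumptions made for this example.

```python
from concurrent.futures import ProcessPoolExecutor

def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def count_in_processes(limits, workers=2):
    # each call to count_primes runs in a separate worker process
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))

if __name__ == '__main__':
    print(count_in_processes([10000, 20000]))
```

The `if __name__ == '__main__':` guard matters on platforms that start worker processes by re-importing the main module.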

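Nginx Dynamic Blacklist

Posted on 2018-05-09 Edited on 2022-03-19 In Tech

Recently someone has been maliciously calling an API on one of my servers, starting at one request every few seconds and later reaching one request per second, which is a plain waste of the server's performance and bandwidth. Here I use fail2ban to monitor the nginx access log and automatically block the offending IPs with iptables.

Installing fail2ban

yum install fail2ban

It can be installed directly with yum.

Configuring fail2ban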

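Implementing Design Patterns in Python: the Observer Pattern

Posted on 2018-04-26 Edited on 2022-03-19 In Tech

Observer pattern

Developing Distributed Applications with Django

Posted on 2018-04-02 Edited on 2022-03-19 In Tech

Developing distributed applications with Django and Celery

Packages to install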

pip install celery django-celery

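Django Extension Packages I Often Use at Work

Posted on 2018-04-02 Edited on 2022-03-19 In Tech

BeautifulSoup

A superb tool for parsing HTML text; extremely convenient to use.

djangorestframework

A plugin that lets Django provide RESTful-style APIs.

djangorestframework_jwt

A JWT authentication tool used together with djangorestframework; it takes only a few lines to switch your project to JWT authentication.

django_rest_swagger

A tool for browsing your API that also lets you try out endpoints directly.

django-cors-headers

Solves cross-origin (CORS) problems in Django.

django-crontab

A plugin for implementing scheduled tasks in Django.

django_celery

Celery integration for Django, a distributed task framework that enables asynchronous tasks.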

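Python *args vs. **kwargs

Posted on 2018-03-21 Edited on 2022-03-19 In Tech

When the arguments that will be passed to a function are not known in advance, we can use *args or **kwargs as parameters. There are some differences between the two, which we will go through with a few examples. First, let's look at their types: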

In [22]: def showArgs(*args):
    ...:     print type(args)

In [23]: showArgs()
<type 'tuple'>


In [24]: def showKwargs(**kwargs):
    ...:     print type(kwargs)
    ...:     

In [26]: showKwargs()
<type 'dict'>
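Beyond the types, it helps to see what actually ends up in each container. The following sketch (Python 3 syntax, with invented function names `show` and `add`) passes positional and keyword arguments, then uses the same `*` and `**` syntax in the other direction to unpack a tuple and a dict into a call.

```python
def show(*args, **kwargs):
    # positional arguments arrive as a tuple, keyword arguments as a dict
    return args, kwargs

args, kwargs = show(1, 2, name='thread', count=3)
print(args)    # (1, 2)
print(kwargs)

def add(a, b, c=0):
    return a + b + c

# * unpacks a sequence into positional arguments, ** a dict into keywords
total = add(*(1, 2), **{'c': 3})
print(total)   # 6
```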

Python range vs. xrange

Posted on 2018-03-20 Edited on 2022-03-19 In Tech

The difference between range and xrange. Let's start with a test:

import time

# Python 2 only: xrange does not exist in Python 3.
def rangeTest():
    startTime = time.time()
    for i in range(1, 100000000):   # builds the full list first
        pass
    stopTime = time.time()
    print('range')
    print(stopTime - startTime)
    print('------')

def xrangeTest():
    startTime = time.time()
    for i in xrange(1, 100000000):  # yields values one at a time
        pass
    stopTime = time.time()
    print('xrange')
    print(stopTime - startTime)
    print('------')

rangeTest()
xrangeTest()
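The test above only runs on Python 2. In Python 3, `xrange` is gone and `range` itself is lazy, so the same contrast can be sketched by comparing a `range` object with a fully built list. The variable names and the size of one million are arbitrary choices for this example.

```python
import sys

n = 1000000
lazy = range(n)          # computes values on demand, like Python 2's xrange
eager = list(range(n))   # stores every element, like Python 2's range

# the lazy object stays tiny no matter how large n is
print(sys.getsizeof(lazy), sys.getsizeof(eager))
print(lazy[10], eager[10])
```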

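Pitfalls of Calling the DingTalk API

Posted on 2018-03-16 Edited on 2022-03-19 In Tech

Message encryption

msg_encrypt = Base64_Encode( AES_Encrypt[random(16B) + msg_len(4B) + msg + $key] )

The buffer to be AES-encrypted consists of a 16-byte random string, the 4-byte length of msg, the plaintext msg, and $key. msg_len is the number of bytes in msg, in network byte order.

* For ISVs, $key is the corresponding suitekey.
* For ordinary enterprise development, $key is the enterprise's Corpid.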
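The layout of the buffer can be sketched with the standard `struct` module. This is only an illustration of the byte layout described above, not DingTalk's SDK: the function names `build_plaintext` and `split_plaintext` are hypothetical, `os.urandom` is an assumed source for the 16 random bytes, the key value is a placeholder, and the AES and Base64 steps are left out because AES needs a third-party library.

```python
import os
import struct

def build_plaintext(msg, key):
    """Assemble random(16B) + msg_len(4B) + msg + key, ready for AES."""
    msg_bytes = msg.encode('utf-8')
    # '!' selects network (big-endian) byte order, 'I' a 4-byte unsigned int
    msg_len = struct.pack('!I', len(msg_bytes))
    return os.urandom(16) + msg_len + msg_bytes + key.encode('utf-8')

def split_plaintext(buf):
    """Reverse of build_plaintext: return (msg, key) from a decrypted buffer."""
    (msg_len,) = struct.unpack('!I', buf[16:20])
    msg = buf[20:20 + msg_len].decode('utf-8')
    key = buf[20 + msg_len:].decode('utf-8')
    return msg, key

buf = build_plaintext('hello', 'corp-id-placeholder')
print(len(buf), split_plaintext(buf))
```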

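Implementing DingTalk Message API Encryption in Python

Posted on 2018-03-15 Edited on 2022-03-19 In Tech

While integrating the DingTalk message API recently, I ran into quite a challenge: the official encryption/decryption SDK has no Python version, and the official documentation suggested plenty of pitfalls ahead. To make things easier for others, I extracted my encryption/decryption module into a standalone package. The package has been uploaded to PyPI and can be used directly.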

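A Disaster Caused by base64 Newlines

Posted on 2018-03-15 Edited on 2022-03-19 In Tech

When encoding a string with base64, I found that a few newline characters always appeared in the output, and the encrypted result could never be decrypted correctly. A search online showed that this is common: once the output exceeds 76 characters, base64 automatically inserts a newline after every 76 characters, and these newlines are counted as part of the data when it is decrypted. They therefore have to be removed by hand:

encrypt = base64.encodestring(xxxx).replace('\n','')

Once the \n characters were removed, everything worked normally again.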
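The newline behaviour is easy to reproduce. The sketch below is written for Python 3, where the function is called `base64.encodebytes` (the `encodestring` name belongs to older versions); the 100-byte sample input is arbitrary. It also shows that `base64.b64encode` produces the same data without any newlines, which may be simpler than stripping them afterwards.

```python
import base64

data = b'x' * 100  # long enough that the encoded form exceeds 76 characters

wrapped = base64.encodebytes(data)  # inserts a newline after every 76 chars
clean = base64.b64encode(data)      # same encoding, no newlines

print(b'\n' in wrapped, b'\n' in clean)  # True False
print(wrapped.replace(b'\n', b'') == clean)
```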