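Python: pitfalls I ran into while writing crawlers
img src="data:base64"
This was my first time scraping this kind of image, and one pitfall took me a long time to solve. It turned out to be an encoding problem: the data I copied from the rendered web page decoded into a normal image, but the data I got in the console, i.e. fetched with Python, never did. In the end I realized the cause was HTML encoding.
Data 1, fetched with Python. I missed it at first, but on closer inspection every + in it was HTML-escaped (the escaped form renders as a plain + here):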
1 | data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAIBAQEBAQIBAQECAgICAgQDAgICAgUEBAMEBgUGBgYFBgYGBwkIBgcJBwYGCAsICQoKCgoKBggLDAsKDAkKCgr/2wBDAQICAgICAgUDAwUKBwYHCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgr/wAARCAAoAGQDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD9/CQoLE8DrUVjf2Wp2q3unXcc8L52SxOGVsHBwR7gio9W1OLR9Ol1Oe2uZkhXLR2ls80jc4+VEBZj7AV5UT4E8NeDvFuo/DT4W+NDPFqy6jdaPbpq+mfbrqfYHa3zgFedziEbNyklc81wYvGfVZLWNrNu7d9FfRJSb2d1v2vsB6bJ4t8NxeKovBEmsQjVprJryKwJ/eNArBWkA9AxA+prRrwq30rx3p3xxuNbk+BmoTJa+CI/K1WLxjqE0rs9zIzWqyyusZP7pWMYXeCVJbBAr1D4VXtx4i8AWXiLUfCms6DcavD9putI1q/klurRnH3GZnJQgAfKCNvoDmubL8yqYurKnONneVtJrRWW8opXu+/otwNN/GHhiPxYngWTWoF1eWxa8j09mxI8CsEMgHcBiAfrWlXhfizwnqekfELxx4z8OfDzXbuTw34EMGi6jJrGsPc6neOHuPssJFz+9hyIgfLG7zCRuyMDWttB1nxB8U/DWk+LfCWqLaQ+AkkutVtr7VY1jvXlXdbmRZ9hI2lv3u6UDHz4znCnm2K55QnTV+ays5JW5uXVuHdPbTbvcZ69RXBfs4R68vw287xJoF/pl1Lqt432TUbm9llCecwRs3kskoBUAgbgvPygA4rva9fCV3isLCs1bmSdu1/VJ/ghBRRRXQAUUUUAFFFFAEV9fWWmWkmoaleRW8EKFpZppAqIo6kk8AV8/eEtS+H3xE0z4rWGpaRP4jtfE3jqK40zSdMkIk1S2j0/TkilVgQFt2mt5F85iIzscbj0PuPjbwP4T+I/hq58HeONEh1LS7wKLqyuM7JQGDAHBGRkDis3xz8JvC3jm2tTI95pd7p8ZTTNV0O8e0ubRDjKK8ZGYzhcxtlDtGVOBXiZtgsZjZx5FFxino9XJyjKMk7qySi9LqSk3ZpJatHzZ4c+DvjLTPjP8Q7q78KaX4gvk8KaNLf+GdOu5LRLWOaTUR9ksJAVAKxomXkA8xyzYXdge+fBm+0Lxd+zzoyfCuPUdDtBopsdLi1W2ZbmxeDdBslRuSyPGQT3xkHkGuRH7K3xR0nX9Q8T+Ev2rfEFnfapHDHfXV5oFhcSzJCGESs4jQsF3vj/AHj613ngjwD8Q/DPg19D8Q/Ga/1zVZpMya3d6
XbxmJc8iKJBtU46F/MGRkgjivDyHLMTl9eUXh5xi1Uu26bvzTco2kp86dnZprlvZpxtqM8j+Pfh7TPAXhPw/wCDda8CeEvGGovrKTeGPCUGm3Mdze3e/dJcPI1xISi5Mss0oZSQC2WIqL9kzxt481jwzol9r9toN74q1Hw95kGo6zq0yXN/YC5lYIjLCyssTsylRkqChPDLXsmi/B3wz4c/tPWdPM134g1SzeC68RarL591ICDtXdwEjBORHGFQdlFczp/7K3gy++BHh74M+Nby5uZ/DtqosNf0udrS8tLkA5nt5UO6JuT6gjhgRxUyyTM6WaLFUEopQaUbq2jgopyab5nHms7OMLcq3k5F9DqvhNo3i7w94fn0bxT4R8N6MkN/KdNtfDN7LNCbdm3K0nmQxES5J3YBBPzZGcDqa534XeB9Z+HvhKHw1rvxF1jxRPEzH+09c8nzyueFJiRAcDHJyScnPYdFX1+BhKng4RkmmktHy3Xl7vu6baaCCiiiuoAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigD/2Q== |
Data 2, copied from the web page:
1 | data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAIBAQEBAQIBAQECAgICAgQDAgICAgUEBAMEBgUGBgYFBgYGBwkIBgcJBwYGCAsICQoKCgoKBggLDAsKDAkKCgr/2wBDAQICAgICAgUDAwUKBwYHCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgoKCgr/wAARCAAoAGQDASIAAhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQAAAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEAAwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSExBhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwD9/CQoLE8DrUVjf2Wp2q3unXcc8L52SxOGVsHBwR7gio9W1OLR9Ol1Oe2uZkhXLR2ls80jc4+VEBZj7AV5UT4E8NeDvFuo/DT4W+NDPFqy6jdaPbpq+mfbrqfYHa3zgFedziEbNyklc81wYvGfVZLWNrNu7d9FfRJSb2d1v2vsB6bJ4t8NxeKovBEmsQjVprJryKwJ/eNArBWkA9AxA+prRrwq30rx3p3xxuNbk+BmoTJa+CI/K1WLxjqE0rs9zIzWqyyusZP7pWMYXeCVJbBAr1D4VXtx4i8AWXiLUfCms6DcavD9putI1q/klurRnH3GZnJQgAfKCNvoDmubL8yqYurKnONneVtJrRWW8opXu+/otwNN/GHhiPxYngWTWoF1eWxa8j09mxI8CsEMgHcBiAfrWlXhfizwnqekfELxx4z8OfDzXbuTw34EMGi6jJrGsPc6neOHuPssJFz+9hyIgfLG7zCRuyMDWttB1nxB8U/DWk+LfCWqLaQ+AkkutVtr7VY1jvXlXdbmRZ9hI2lv3u6UDHz4znCnm2K55QnTV+ays5JW5uXVuHdPbTbvcZ69RXBfs4R68vw287xJoF/pl1Lqt432TUbm9llCecwRs3kskoBUAgbgvPygA4rva9fCV3isLCs1bmSdu1/VJ/ghBRRRXQAUUUUAFFFFAEV9fWWmWkmoaleRW8EKFpZppAqIo6kk8AV8/eEtS+H3xE0z4rWGpaRP4jtfE3jqK40zSdMkIk1S2j0/TkilVgQFt2mt5F85iIzscbj0PuPjbwP4T+I/hq58HeONEh1LS7wKLqyuM7JQGDAHBGRkDis3xz8JvC3jm2tTI95pd7p8ZTTNV0O8e0ubRDjKK8ZGYzhcxtlDtGVOBXiZtgsZjZx5FFxino9XJyjKMk7qySi9LqSk3ZpJatHzZ4c+DvjLTPjP8Q7q78KaX4gvk8KaNLf+GdOu5LRLWOaTUR9ksJAVAKxomXkA8xyzYXdge+fBm+0Lxd+zzoyfCuPUdDtBopsdLi1W2ZbmxeDdBslRuSyPGQT3xkHkGuRH7K3xR0nX9Q8T+Ev2rfEFnfapHDHfXV5oFhcSzJCGESs4jQsF3vj/AHj613ngjwD8Q/DPg19D8Q/Ga/1zVZpMya3d6
XbxmJc8iKJBtU46F/MGRkgjivDyHLMTl9eUXh5xi1Uu26bvzTco2kp86dnZprlvZpxtqM8j+Pfh7TPAXhPw/wCDda8CeEvGGovrKTeGPCUGm3Mdze3e/dJcPI1xISi5Mss0oZSQC2WIqL9kzxt481jwzol9r9toN74q1Hw95kGo6zq0yXN/YC5lYIjLCyssTsylRkqChPDLXsmi/B3wz4c/tPWdPM134g1SzeC68RarL591ICDtXdwEjBORHGFQdlFczp/7K3gy++BHh74M+Nby5uZ/DtqosNf0udrS8tLkA5nt5UO6JuT6gjhgRxUyyTM6WaLFUEopQaUbq2jgopyab5nHms7OMLcq3k5F9DqvhNo3i7w94fn0bxT4R8N6MkN/KdNtfDN7LNCbdm3K0nmQxES5J3YBBPzZGcDqa534XeB9Z+HvhKHw1rvxF1jxRPEzH+09c8nzyueFJiRAcDHJyScnPYdFX1+BhKng4RkmmktHy3Xl7vu6baaCCiiiuoAooooAKKKKACiiigAooooAKKKKACiiigAooooAKKKKACiiigD/2Q== |
A big pitfall.
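As a rough illustration of the fix, here is a minimal sketch of decoding such a src value in Python. It assumes the escaping is ordinary HTML entity encoding (for example &#43; standing in for +; the exact entity is an assumption, since the post does not show it). The function name decode_data_uri is made up for this example. Note that base64.b64decode silently discards characters outside the base64 alphabet by default, which is probably why the broken data produced a corrupt image rather than an error; passing validate=True makes that failure loud.

```python
import base64
import html


def decode_data_uri(raw_src: str) -> bytes:
    """Turn the raw src attribute of an <img> tag into image bytes."""
    # Undo HTML entity escaping first (e.g. "&#43;" becomes "+").
    src = html.unescape(raw_src).strip()

    # Split the "data:image/jpeg;base64" header from the payload.
    header, _, payload = src.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")

    # Drop any whitespace or line breaks inside the payload.
    payload = "".join(payload.split())

    # validate=True raises binascii.Error on stray characters
    # instead of silently skipping them.
    return base64.b64decode(payload, validate=True)
```

With the src string taken from the page source, the image can then be saved with something like open("captcha.jpg", "wb").write(decode_data_uri(src)), where captcha.jpg is just a placeholder file name.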
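Chinese CAPTCHA recognition
I recommend Baidu's API, the high-precision version. It allows 500 calls per day, which is enough, and the recognition rate is quite good.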
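For reference, a call to the high-precision OCR could look roughly like the sketch below. It uses only the standard library and assumes Baidu's REST endpoint accurate_basic, a form-encoded image field carrying base64 data, and a words_result list in the response; all of these are recalled from Baidu's public documentation and should be checked against the current docs. The access token is assumed to have been obtained separately, and the function names are made up for this example.

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumed endpoint of Baidu's high-precision general OCR; verify before use.
OCR_URL = "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate_basic"


def build_ocr_request(image_bytes: bytes, access_token: str) -> urllib.request.Request:
    """Build the POST request carrying the image as URL-encoded base64."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    # urlencode escapes "+" as "%2B", so the base64 survives the form encoding.
    body = urllib.parse.urlencode({"image": image_b64}).encode("ascii")
    url = OCR_URL + "?" + urllib.parse.urlencode({"access_token": access_token})
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Join the recognized fragments (assumed response shape) into one string."""
    return "".join(item["words"] for item in response_json.get("words_result", []))


def recognize_captcha(image_bytes: bytes, access_token: str, timeout: float = 10.0) -> str:
    """Send the image to the OCR service and return the recognized text."""
    request = build_ocr_request(image_bytes, access_token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_text(json.load(response))
```

The same + problem shows up here in another form: the base64 payload has to be URL-encoded in the request body, otherwise the server would read each + as a space.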
Author: NoOne
Original post: https://noonegroup.xyz/posts/6717abad/
Copyright notice: please credit the source when reposting!